From patchwork Tue Oct 7 19:39:54 2014
X-Patchwork-Submitter: Christoffer Dall
X-Patchwork-Id: 38428
Date: Tue, 7 Oct 2014 21:39:54 +0200
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Marc Zyngier
Cc: Joel Schopp, jungseoklee85@gmail.com, kvm@vger.kernel.org,
 Catalin Marinas, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Message-ID: <20141007193954.GI3717@cbox>
References: <1412627427-28629-1-git-send-email-christoffer.dall@linaro.org>
 <1412627427-28629-2-git-send-email-christoffer.dall@linaro.org>
 <20141007104846.GB12675@e104818-lin.cambridge.arm.com>
 <5433EA8B.6010502@arm.com>
In-Reply-To: <5433EA8B.6010502@arm.com>
On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote:
> On 07/10/14 11:48, Catalin Marinas wrote:
> > On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote:
> >> +/**
> >> + * kvm_prealloc_hwpgd - allocate initial table for VTTBR
> >> + * @kvm: The KVM struct pointer for the VM.
> >> + * @pgd: The kernel pseudo pgd
> >> + *
> >> + * When the kernel uses more levels of page tables than the guest, we allocate
> >> + * a fake PGD and pre-populate it to point to the next-level page table, which
> >> + * will be the real initial page table pointed to by the VTTBR.
> >> + *
> >> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> >> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
> >> + * allocate 2 consecutive PUD pages.
> >> + */
> >> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3
> >> +#define KVM_PREALLOC_LEVEL 2
> >> +#define PTRS_PER_S2_PGD 1
> >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >
> > I agree that my magic equation wasn't readable ;) (I had trouble
> > re-understanding it as well), but you also have some constants here where
> > it is not immediately obvious where you got them from.  IIUC,
> > KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands
> > stage 2 pmd and pte.  I guess you could look into the ARM ARM tables but
> > it's still not clear.
> >
> > Let's look at PTRS_PER_S2_PGD as I think it's simpler.  My proposal was:
> >
> > #if PGDIR_SHIFT > KVM_PHYS_SHIFT
> > #define PTRS_PER_S2_PGD (1)
> > #else
> > #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> > #endif
> >
> > In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1.  The 4K
> > and 4 levels case below is also correct.
> >
> > For the KVM start level calculation, we could assume that KVM needs either
> > host levels or host levels - 1 (unless we go for some weirdly small
> > KVM_PHYS_SHIFT).  So we could define KVM_PREALLOC_LEVEL as:
> >
> > #if PTRS_PER_S2_PGD <= 16
> > #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> > #else
> > #define KVM_PREALLOC_LEVEL (0)
> > #endif
> >
> > Basically, if you can concatenate 16 or fewer pages at the level below the
> > top, the architecture does not allow a small top level.  In this case,
> > (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the
> > host and we add 1 to go to the next level for KVM stage 2 when
> > PTRS_PER_S2_PGD is 16 or less.  We use 0 when we don't need to
> > preallocate.
>
> I think this makes the whole thing clearer (at least for me), as it
> makes the relationship between KVM_PREALLOC_LEVEL and
> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me
> initially).

Agreed.
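
To make the arithmetic concrete, here is a small compile-time check
(illustration only, not part of the patch): S2_PGD_PTRS and
S2_PREALLOC_LVL are just local stand-ins for the proposed macros, and
KVM_PHYS_SHIFT is assumed to be 40 (the 40-bit IPA space we use today).

/* Illustration only: evaluate the proposed definitions for the host
 * configurations discussed above, assuming KVM_PHYS_SHIFT == 40. */
#define S2_PGD_PTRS(pgdir_shift) \
        ((pgdir_shift) > 40 ? 1 : (1 << (40 - (pgdir_shift))))
#define S2_PREALLOC_LVL(levels, pgdir_shift) \
        (S2_PGD_PTRS(pgdir_shift) <= 16 ? (4 - (levels) + 1) : 0)

/* 64K pages, 3 levels: PGDIR_SHIFT == 42 */
_Static_assert(S2_PGD_PTRS(42) == 1, "single pgd entry");
_Static_assert(S2_PREALLOC_LVL(3, 42) == 2, "pre-allocate a PMD page");

/* 4K pages, 4 levels: PGDIR_SHIFT == 39 */
_Static_assert(S2_PGD_PTRS(39) == 2, "two pgd entries");
_Static_assert(S2_PREALLOC_LVL(4, 39) == 1, "pre-allocate 2 PUD pages");

/* 4K pages, 3 levels: PGDIR_SHIFT == 30 */
_Static_assert(S2_PGD_PTRS(30) == 1024, "more than 16 entries");
_Static_assert(S2_PREALLOC_LVL(3, 30) == 0, "no pre-allocation needed");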
>
> >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> >> +{
> >> +        pud_t *pud;
> >> +        pmd_t *pmd;
> >> +
> >> +        pud = pud_offset(pgd, 0);
> >> +        pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> >> +
> >> +        if (!pmd)
> >> +                return -ENOMEM;
> >> +        pud_populate(NULL, pud, pmd);
> >> +
> >> +        return 0;
> >> +}
> >> +
> >> +static inline void kvm_free_hwpgd(struct kvm *kvm)
> >> +{
> >> +        pgd_t *pgd = kvm->arch.pgd;
> >> +        pud_t *pud = pud_offset(pgd, 0);
> >> +        pmd_t *pmd = pmd_offset(pud, 0);
> >> +        free_pages((unsigned long)pmd, 0);
> >> +}
> >> +
> >> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm)
> >> +{
> >> +        pgd_t *pgd = kvm->arch.pgd;
> >> +        pud_t *pud = pud_offset(pgd, 0);
> >> +        pmd_t *pmd = pmd_offset(pud, 0);
> >> +        return virt_to_phys(pmd);
> >> +
> >> +}
> >> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4
> >> +#define KVM_PREALLOC_LEVEL 1
> >> +#define PTRS_PER_S2_PGD 2
> >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >
> > Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39))
> > which is 2 and KVM_PREALLOC_LEVEL == 1.
> >
> >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> >> +{
> >> +        pud_t *pud;
> >> +
> >> +        pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> >> +        if (!pud)
> >> +                return -ENOMEM;
> >> +        pgd_populate(NULL, pgd, pud);
> >> +        pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD);
> >> +
> >> +        return 0;
> >> +}
> >
> > You still need to define these functions, but you can make their
> > implementation depend solely on KVM_PREALLOC_LEVEL rather than on the
> > 64K/4K and levels combinations.  If KVM_PREALLOC_LEVEL is 1, you
> > allocate the pud and populate the pgds (in a loop based on
> > PTRS_PER_S2_PGD).  If it is 2, you allocate the pmd and populate the pud
> > (still in a loop, though it would probably be 1 iteration).  We know,
> > based on the assumption above, that you can't get KVM_PREALLOC_LEVEL == 2
> > and CONFIG_ARM64_PGTABLE_LEVELS == 4.
> >
>
> Also agreed. Most of what you wrote here could also be gathered as
> comments in the patch.
>
Yes, I reworded some of the text slightly as comments for the next
version of the patch.

However, I'm not sure I have a clear idea of what you'd like these
functions to look like.  I came up with the following based on your
feedback, but I personally don't find it a lot easier to read than what
I had already.  Suggestions are welcome:

Thanks,
-Christoffer

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a030d16..7941a51 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -41,6 +41,18 @@
  */
 #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK)

+/*
+ * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
+ * levels in addition to the PGD and potentially the PUD which are
+ * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
+ * tables use one level of tables less than the kernel).
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define KVM_MMU_CACHE_MIN_PAGES 1
+#else
+#define KVM_MMU_CACHE_MIN_PAGES 2
+#endif
+
 #ifdef __ASSEMBLY__

 /*
@@ -53,6 +65,7 @@

 #else

+#include
 #include
 #include

@@ -65,10 +78,6 @@
 #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT)
 #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL)

-/* Make sure we get the right size, and thus the right alignment */
-#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
-#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
-
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_boot_hyp_pgd(void);
@@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)

 static inline void kvm_clean_pgd(pgd_t *pgd) {}
+static inline void kvm_clean_pmd(pmd_t *pmd) {}
 static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
 static inline void kvm_clean_pte(pte_t *pte) {}
 static inline void kvm_clean_pte_entry(pte_t *pte) {}
@@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr)
 }

 #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep)
-#ifndef CONFIG_ARM64_64K_PAGES
-#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
-#else
+
+#ifdef __PAGETABLE_PMD_FOLDED
 #define kvm_pmd_table_empty(pmdp) (0)
+#else
+#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
 #endif
+
+#ifdef __PAGETABLE_PUD_FOLDED
 #define kvm_pud_table_empty(pudp) (0)
+#else
+#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp)
+#endif
+
+/*
+ * If PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, a single pgd entry can
+ * address the entire IPA input range, so we need only one pgd entry in
+ * the stage-2 fake PGD.
+ */
+#if PGDIR_SHIFT > KVM_PHYS_SHIFT
+#define PTRS_PER_S2_PGD (1)
+#else
+#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
+#endif
+#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
+
+/*
+ * If we are concatenating first level stage-2 page tables, we would have
+ * 16 or fewer pointers in the fake PGD, because that's what the
+ * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
+ * represents the first level for the host, and we add 1 to go to the next
+ * level (which uses concatenation) for the stage-2 tables.
+ */
+#if PTRS_PER_S2_PGD <= 16
+#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
+#else
+#define KVM_PREALLOC_LEVEL (0)
+#endif
+
+/**
+ * kvm_prealloc_hwpgd - allocate initial table for VTTBR
+ * @kvm: The KVM struct pointer for the VM.
+ * @pgd: The kernel pseudo pgd
+ *
+ * When the kernel uses more levels of page tables than the guest, we allocate
+ * a fake PGD and pre-populate it to point to the next-level page table, which
+ * will be the real initial page table pointed to by the VTTBR.
+ *
+ * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
+ * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
+ * allocate 2 consecutive PUD pages.
+ */
+static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
+{
+        pud_t *pud;
+        pmd_t *pmd;
+        unsigned int order, i;
+        unsigned long hwpgd;
+
+        if (KVM_PREALLOC_LEVEL == 0)
+                return 0;
+
+        order = get_order(PTRS_PER_S2_PGD * PAGE_SIZE);
+        hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+        if (!hwpgd)
+                return -ENOMEM;
+
+        if (KVM_PREALLOC_LEVEL == 1) {
+                pud = (pud_t *)hwpgd;
+                for (i = 0; i < PTRS_PER_S2_PGD; i++)
+                        pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
+        } else if (KVM_PREALLOC_LEVEL == 2) {
+                pud = pud_offset(pgd, 0);
+                pmd = (pmd_t *)hwpgd;
+                for (i = 0; i < PTRS_PER_S2_PGD; i++)
+                        pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
+        }
+
+        return 0;
+}
+
+static inline void *kvm_get_hwpgd(struct kvm *kvm)
+{
+        pgd_t *pgd = kvm->arch.pgd;
+        pud_t *pud;
+        pmd_t *pmd;
+
+        switch (KVM_PREALLOC_LEVEL) {
+        case 0:
+                return pgd;
+        case 1:
+                pud = pud_offset(pgd, 0);
+                return pud;
+        case 2:
+                pud = pud_offset(pgd, 0);
+                pmd = pmd_offset(pud, 0);
+                return pmd;
+        default:
+                BUG();
+                return NULL;
+        }
+}
+
+static inline void kvm_free_hwpgd(struct kvm *kvm)
+{
+        if (KVM_PREALLOC_LEVEL > 0) {
+                unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm);
+                free_pages(hwpgd, get_order(PTRS_PER_S2_PGD * PAGE_SIZE));
+        }
+}

 struct kvm;
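
As an aside, here is a rough sketch of how these helpers are intended to
be used from the stage-2 setup and teardown paths.  This is illustration
only and not part of the patch; the actual call sites would be
kvm_alloc_stage2_pgd()/kvm_free_stage2_pgd() in arch/arm/kvm/mmu.c, and
the naming and error handling below are assumptions, not the real mmu.c
changes:

/* Illustration only: assumed call sites for the helpers above. */
static int example_alloc_stage2_pgd(struct kvm *kvm)
{
        pgd_t *pgd;
        int ret;

        /* The "fake" PGD walked by the kernel's page table accessors. */
        pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER);
        if (!pgd)
                return -ENOMEM;

        /* Pre-allocate the table the hardware will actually walk, if any. */
        ret = kvm_prealloc_hwpgd(kvm, pgd);
        if (ret) {
                free_pages((unsigned long)pgd, S2_PGD_ORDER);
                return ret;
        }

        kvm->arch.pgd = pgd;
        /* The VTTBR base address is then derived from
         * virt_to_phys(kvm_get_hwpgd(kvm)), not from the fake PGD. */
        return 0;
}

static void example_free_stage2_pgd(struct kvm *kvm)
{
        /* Free the pre-allocated hardware table before the fake PGD. */
        kvm_free_hwpgd(kvm);
        free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
        kvm->arch.pgd = NULL;
}

The point is simply that the fake PGD (sized by S2_PGD_ORDER) is what the
kernel's pgd/pud/pmd accessors walk, while the VTTBR points at the table
returned by kvm_get_hwpgd().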