From patchwork Tue Oct 7 19:39:54 2014
X-Patchwork-Submitter: Christoffer Dall
X-Patchwork-Id: 38428
Date: Tue, 7 Oct 2014 21:39:54 +0200
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Marc Zyngier
Cc: Joel Schopp, jungseoklee85@gmail.com, kvm@vger.kernel.org,
 Catalin Marinas, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2 and Stage-2
Message-ID: <20141007193954.GI3717@cbox>
References: <1412627427-28629-1-git-send-email-christoffer.dall@linaro.org>
 <1412627427-28629-2-git-send-email-christoffer.dall@linaro.org>
 <20141007104846.GB12675@e104818-lin.cambridge.arm.com>
 <5433EA8B.6010502@arm.com>
In-Reply-To: <5433EA8B.6010502@arm.com>
On Tue, Oct 07, 2014 at 02:28:43PM +0100, Marc Zyngier wrote:
> On 07/10/14 11:48, Catalin Marinas wrote:
> > On Mon, Oct 06, 2014 at 09:30:25PM +0100, Christoffer Dall wrote:
> >> +/**
> >> + * kvm_prealloc_hwpgd - allocate initial table for VTTBR
> >> + * @kvm: The KVM struct pointer for the VM.
> >> + * @pgd: The kernel pseudo pgd
> >> + *
> >> + * When the kernel uses more levels of page tables than the guest, we allocate
> >> + * a fake PGD and pre-populate it to point to the next-level page table, which
> >> + * will be the real initial page table pointed to by the VTTBR.
> >> + *
> >> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> >> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
> >> + * allocate 2 consecutive PUD pages.
> >> + */
> >> +#if defined(CONFIG_ARM64_64K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 3
> >> +#define KVM_PREALLOC_LEVEL 2
> >> +#define PTRS_PER_S2_PGD 1
> >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >
> > I agree that my magic equation wasn't readable ;) (I had trouble
> > re-understanding it as well), but you also have some constants here where
> > it is not immediately obvious where you got them from.  IIUC,
> > KVM_PREALLOC_LEVEL == 2 here means that the hardware only understands
> > stage 2 pmd and pte.  I guess you could look into the ARM ARM tables but
> > it's still not clear.
> >
> > Let's look at PTRS_PER_S2_PGD as I think it's simpler.  My proposal was:
> >
> > #if PGDIR_SHIFT > KVM_PHYS_SHIFT
> > #define PTRS_PER_S2_PGD (1)
> > #else
> > #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> > #endif
> >
> > In this case PGDIR_SHIFT is 42, so we get PTRS_PER_S2_PGD == 1.  The 4K
> > and 4 levels case below is also correct.
> >
> > For the KVM start level calculation, we could assume that KVM needs either
> > host levels or host levels - 1 (unless we go for some weirdly small
> > KVM_PHYS_SHIFT).  So we could define KVM_PREALLOC_LEVEL as:
> >
> > #if PTRS_PER_S2_PGD <= 16
> > #define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> > #else
> > #define KVM_PREALLOC_LEVEL (0)
> > #endif
> >
> > Basically, if you can concatenate 16 or fewer pages at the level below the
> > top, the architecture does not allow a small top level.  In this case,
> > (4 - CONFIG_ARM64_PGTABLE_LEVELS) represents the first level for the
> > host and we add 1 to go to the next level for KVM stage 2 when
> > PTRS_PER_S2_PGD is 16 or less.  We use 0 when we don't need to
> > preallocate.
>
> I think this makes the whole thing clearer (at least for me), as it
> makes the relationship between KVM_PREALLOC_LEVEL and
> CONFIG_ARM64_PGTABLE_LEVELS explicit (it wasn't completely obvious to me
> initially).

Agreed.
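
To make the arithmetic concrete, here is a small compile-time check
(illustration only, not part of the patch): S2_PGD_PTRS and
S2_PREALLOC_LVL are just local stand-ins for the proposed macros, and
KVM_PHYS_SHIFT is assumed to be 40 (the 40-bit IPA space we use today).

/* Illustration only: evaluate the proposed definitions for the host
 * configurations discussed above, assuming KVM_PHYS_SHIFT == 40. */
#define S2_PGD_PTRS(pgdir_shift) \
        ((pgdir_shift) > 40 ? 1 : (1 << (40 - (pgdir_shift))))
#define S2_PREALLOC_LVL(levels, pgdir_shift) \
        (S2_PGD_PTRS(pgdir_shift) <= 16 ? (4 - (levels) + 1) : 0)

/* 64K pages, 3 levels: PGDIR_SHIFT == 42 */
_Static_assert(S2_PGD_PTRS(42) == 1, "single pgd entry");
_Static_assert(S2_PREALLOC_LVL(3, 42) == 2, "pre-allocate a PMD page");

/* 4K pages, 4 levels: PGDIR_SHIFT == 39 */
_Static_assert(S2_PGD_PTRS(39) == 2, "two pgd entries");
_Static_assert(S2_PREALLOC_LVL(4, 39) == 1, "pre-allocate 2 PUD pages");

/* 4K pages, 3 levels: PGDIR_SHIFT == 30 */
_Static_assert(S2_PGD_PTRS(30) == 1024, "more than 16 entries");
_Static_assert(S2_PREALLOC_LVL(3, 30) == 0, "no pre-allocation needed");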
>
> >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> >> +{
> >> +        pud_t *pud;
> >> +        pmd_t *pmd;
> >> +
> >> +        pud = pud_offset(pgd, 0);
> >> +        pmd = (pmd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 0);
> >> +
> >> +        if (!pmd)
> >> +                return -ENOMEM;
> >> +        pud_populate(NULL, pud, pmd);
> >> +
> >> +        return 0;
> >> +}
> >> +
> >> +static inline void kvm_free_hwpgd(struct kvm *kvm)
> >> +{
> >> +        pgd_t *pgd = kvm->arch.pgd;
> >> +        pud_t *pud = pud_offset(pgd, 0);
> >> +        pmd_t *pmd = pmd_offset(pud, 0);
> >> +        free_pages((unsigned long)pmd, 0);
> >> +}
> >> +
> >> +static inline phys_addr_t kvm_get_hwpgd(struct kvm *kvm)
> >> +{
> >> +        pgd_t *pgd = kvm->arch.pgd;
> >> +        pud_t *pud = pud_offset(pgd, 0);
> >> +        pmd_t *pmd = pmd_offset(pud, 0);
> >> +        return virt_to_phys(pmd);
> >> +
> >> +}
> >> +#elif defined(CONFIG_ARM64_4K_PAGES) && CONFIG_ARM64_PGTABLE_LEVELS == 4
> >> +#define KVM_PREALLOC_LEVEL 1
> >> +#define PTRS_PER_S2_PGD 2
> >> +#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >
> > Here PGDIR_SHIFT is 39, so we get PTRS_PER_S2_PGD == (1 << (40 - 39))
> > which is 2 and KVM_PREALLOC_LEVEL == 1.
> >
> >> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> >> +{
> >> +        pud_t *pud;
> >> +
> >> +        pud = (pud_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 1);
> >> +        if (!pud)
> >> +                return -ENOMEM;
> >> +        pgd_populate(NULL, pgd, pud);
> >> +        pgd_populate(NULL, pgd + 1, pud + PTRS_PER_PUD);
> >> +
> >> +        return 0;
> >> +}
> >
> > You still need to define these functions, but you can make their
> > implementation depend solely on KVM_PREALLOC_LEVEL rather than on the
> > 64K/4K and levels combinations.  If KVM_PREALLOC_LEVEL is 1, you
> > allocate the pud and populate the pgds (in a loop based on
> > PTRS_PER_S2_PGD).  If it is 2, you allocate the pmd and populate the pud
> > (still in a loop, though it would probably be 1 iteration).  We know,
> > based on the assumption above, that you can't get KVM_PREALLOC_LEVEL == 2
> > and CONFIG_ARM64_PGTABLE_LEVELS == 4.
> >
>
> Also agreed. Most of what you wrote here could also be gathered as
> comments in the patch.
>
Yes, I reworded some of the text slightly as comments for the next
version of the patch.

However, I'm not sure I have a clear idea of what you'd like these
functions to look like.  I came up with the following based on your
feedback, but I personally don't find it a lot easier to read than what
I had already.  Suggestions are welcome:

Thanks,
-Christoffer

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a030d16..7941a51 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -41,6 +41,18 @@
  */
 #define TRAMPOLINE_VA (HYP_PAGE_OFFSET_MASK & PAGE_MASK)

+/*
+ * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
+ * levels in addition to the PGD and potentially the PUD which are
+ * pre-allocated (we pre-allocate the fake PGD and the PUD when the Stage-2
+ * tables use one level of tables less than the kernel).
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define KVM_MMU_CACHE_MIN_PAGES 1
+#else
+#define KVM_MMU_CACHE_MIN_PAGES 2
+#endif
+
 #ifdef __ASSEMBLY__

 /*
@@ -53,6 +65,7 @@

 #else

+#include
 #include
 #include

@@ -65,10 +78,6 @@
 #define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT)
 #define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL)

-/* Make sure we get the right size, and thus the right alignment */
-#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
-#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
-
 int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_boot_hyp_pgd(void);
@@ -93,6 +102,7 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd)

 static inline void kvm_clean_pgd(pgd_t *pgd) {}
+static inline void kvm_clean_pmd(pmd_t *pmd) {}
 static inline void kvm_clean_pmd_entry(pmd_t *pmd) {}
 static inline void kvm_clean_pte(pte_t *pte) {}
 static inline void kvm_clean_pte_entry(pte_t *pte) {}
@@ -118,13 +128,115 @@ static inline bool kvm_page_empty(void *ptr)
 }

 #define kvm_pte_table_empty(ptep) kvm_page_empty(ptep)
-#ifndef CONFIG_ARM64_64K_PAGES
-#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
-#else
+
+#ifdef __PAGETABLE_PMD_FOLDED
 #define kvm_pmd_table_empty(pmdp) (0)
+#else
+#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
 #endif
+
+#ifdef __PAGETABLE_PUD_FOLDED
 #define kvm_pud_table_empty(pudp) (0)
+#else
+#define kvm_pud_table_empty(pudp) kvm_page_empty(pudp)
+#endif
+
+/*
+ * If PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, a single pgd entry can
+ * address the entire IPA input range, so we need only one pgd entry in
+ * the stage-2 fake PGD.
+ */
+#if PGDIR_SHIFT > KVM_PHYS_SHIFT
+#define PTRS_PER_S2_PGD (1)
+#else
+#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
+#endif
+#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
+
+/*
+ * If we are concatenating first level stage-2 page tables, we would have
+ * 16 or fewer pointers in the fake PGD, because that's what the
+ * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
+ * represents the first level for the host, and we add 1 to go to the next
+ * level (which uses concatenation) for the stage-2 tables.
+ */
+#if PTRS_PER_S2_PGD <= 16
+#define KVM_PREALLOC_LEVEL (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
+#else
+#define KVM_PREALLOC_LEVEL (0)
+#endif
+
+/**
+ * kvm_prealloc_hwpgd - allocate initial table for VTTBR
+ * @kvm: The KVM struct pointer for the VM.
+ * @pgd: The kernel pseudo pgd
+ *
+ * When the kernel uses more levels of page tables than the guest, we allocate
+ * a fake PGD and pre-populate it to point to the next-level page table, which
+ * will be the real initial page table pointed to by the VTTBR.
+ *
+ * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
+ * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
+ * allocate 2 consecutive PUD pages.
+ */
+static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
+{
+        pud_t *pud;
+        pmd_t *pmd;
+        unsigned int order, i;
+        unsigned long hwpgd;
+
+        if (KVM_PREALLOC_LEVEL == 0)
+                return 0;
+
+        order = get_order(PTRS_PER_S2_PGD * PAGE_SIZE);
+        hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+        if (!hwpgd)
+                return -ENOMEM;
+
+        if (KVM_PREALLOC_LEVEL == 1) {
+                pud = (pud_t *)hwpgd;
+                for (i = 0; i < PTRS_PER_S2_PGD; i++)
+                        pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
+        } else if (KVM_PREALLOC_LEVEL == 2) {
+                pud = pud_offset(pgd, 0);
+                pmd = (pmd_t *)hwpgd;
+                for (i = 0; i < PTRS_PER_S2_PGD; i++)
+                        pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
+        }
+
+        return 0;
+}
+
+static inline void *kvm_get_hwpgd(struct kvm *kvm)
+{
+        pgd_t *pgd = kvm->arch.pgd;
+        pud_t *pud;
+        pmd_t *pmd;
+
+        switch (KVM_PREALLOC_LEVEL) {
+        case 0:
+                return pgd;
+        case 1:
+                pud = pud_offset(pgd, 0);
+                return pud;
+        case 2:
+                pud = pud_offset(pgd, 0);
+                pmd = pmd_offset(pud, 0);
+                return pmd;
+        default:
+                BUG();
+                return NULL;
+        }
+}
+
+static inline void kvm_free_hwpgd(struct kvm *kvm)
+{
+        if (KVM_PREALLOC_LEVEL > 0) {
+                unsigned long hwpgd = (unsigned long)kvm_get_hwpgd(kvm);
+                free_pages(hwpgd, get_order(PTRS_PER_S2_PGD * PAGE_SIZE));
+        }
+}

 struct kvm;
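
As an aside, here is a rough sketch of how these helpers are intended to
be used from the stage-2 setup and teardown paths.  This is illustration
only and not part of the patch; the actual call sites would be
kvm_alloc_stage2_pgd()/kvm_free_stage2_pgd() in arch/arm/kvm/mmu.c, and
the naming and error handling below are assumptions, not the real mmu.c
changes:

/* Illustration only: assumed call sites for the helpers above. */
static int example_alloc_stage2_pgd(struct kvm *kvm)
{
        pgd_t *pgd;
        int ret;

        /* The "fake" PGD walked by the kernel's page table accessors. */
        pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER);
        if (!pgd)
                return -ENOMEM;

        /* Pre-allocate the table the hardware will actually walk, if any. */
        ret = kvm_prealloc_hwpgd(kvm, pgd);
        if (ret) {
                free_pages((unsigned long)pgd, S2_PGD_ORDER);
                return ret;
        }

        kvm->arch.pgd = pgd;
        /* The VTTBR base address is then derived from
         * virt_to_phys(kvm_get_hwpgd(kvm)), not from the fake PGD. */
        return 0;
}

static void example_free_stage2_pgd(struct kvm *kvm)
{
        /* Free the pre-allocated hardware table before the fake PGD. */
        kvm_free_hwpgd(kvm);
        free_pages((unsigned long)kvm->arch.pgd, S2_PGD_ORDER);
        kvm->arch.pgd = NULL;
}

The point is simply that the fake PGD (sized by S2_PGD_ORDER) is what the
kernel's pgd/pud/pmd accessors walk, while the VTTBR points at the table
returned by kvm_get_hwpgd().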