From patchwork Wed Mar 20 14:57:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suzuki K Poulose X-Patchwork-Id: 160695 Delivered-To: patch@linaro.org Received: by 2002:a02:c6d8:0:0:0:0:0 with SMTP id r24csp386758jan; Wed, 20 Mar 2019 07:57:35 -0700 (PDT) X-Google-Smtp-Source: APXvYqyv1+EZDg4zXrSXBSQLKZjKJVIXGVGeRdVwULPFmg1Svmxc6Q/62EjcPbPBvofnqjziJiuR X-Received: by 2002:aa7:8384:: with SMTP id u4mr7884281pfm.190.1553093855517; Wed, 20 Mar 2019 07:57:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553093855; cv=none; d=google.com; s=arc-20160816; b=HGxEfQosqBvC6oH+9E4CU0k7HAy1YdeK4G3K+U3dJ2Lmok4z7DjiO2bxtu9Ym/4dEu VRRjMPZAcX5zqymwAZIo38Cea9J20Q9dTfnzp064bPLMLQjtWEAmUYq145+y/nQMCWzB xqxExQ+PKImOGR/lUIBl8QvjjqeGnYgUToEZkNIzQ4pEzVcYGSRuiBH/p1iRpPRCezY5 OPG8vzuW2XWrY2aKknj9afntHfL2YoeFrekc5Y8iTG/icXT62o4pjnndNZFrHNcclm3o w0YdTqQUF7ow+pmVKPU69Ma+tbv/kGyB5AhNul5STv7cwhcgGhTlqskqGnjFeE3+zax0 S/Yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from; bh=I9z6O5LwSXo3FrT6FHvl7W71n6J/rVgHhv2JOdOvqwM=; b=JY19VQcZ1GOlWwFZrFLNrCW7g8OHh/fE+fQ9qEszXin7702P6tooq7HgL2vIFcvUp0 qLftZw3OZEPOFfTiHloKdzAExOtDHOMm+6CcA0xPEXsnXEjeuVLibLil5aOQZMo0XAv+ jTHXcWeiy0P+tf0fwvumVn8mNtEuFWC4ujzIGKTlnknvbUireWZn2CjNNSMEsSrILoc+ +7nFGrD2B34dF0ecQmJUar8PSTUGOppxw2QqMHTd3JFiOgexIDWDLH0i3eGBFtAS3j7l sSdUBNU0hbHe4kAsAWHtH70TdKLoDArOI4mPvApEOBDPWCwwo3ll7pL8ncLo3FSaPJ/u kYhw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d4si1745554pgk.501.2019.03.20.07.57.35; Wed, 20 Mar 2019 07:57:35 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728278AbfCTO5e (ORCPT + 31 others); Wed, 20 Mar 2019 10:57:34 -0400 Received: from foss.arm.com ([217.140.101.70]:41468 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727495AbfCTO5d (ORCPT ); Wed, 20 Mar 2019 10:57:33 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 34EAD1650; Wed, 20 Mar 2019 07:57:33 -0700 (PDT) Received: from en101.cambridge.arm.com (en101.cambridge.arm.com [10.1.196.93]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 882833F614; Wed, 20 Mar 2019 07:57:30 -0700 (PDT) From: Suzuki K Poulose To: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu, will.deacon@arm.com, catalin.marinas@arm.com, james.morse@arm.com, julien.thierry@arm.com, wanghaibin.wang@huawei.com, lious.lilei@hisilicon.com, lishuo1@hisilicon.com, zhengxiang9@huawei.com, yuzenghui@huawei.com, Suzuki K Poulose , Marc Zyngier , Christoffer Dall Subject: [PATCH v2] kvm: arm: Fix handling of stage2 huge mappings Date: Wed, 20 Mar 2019 14:57:19 +0000 Message-Id: <1553093839-29317-1-git-send-email-suzuki.poulose@arm.com> X-Mailer: git-send-email 2.7.4 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zhenghui Yu Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since v1: - Fix build break on arm32 - Add missing definitions for S2_PUD_* - Use kvm_pud_pfn() instead of pud_pfn() - Make PUD handling similar to PMD, by dropping the else case arch/arm/include/asm/stage2_pgtable.h | 2 ++ virt/kvm/arm/mmu.c | 59 +++++++++++++++++++++++++---------- 2 files changed, 45 insertions(+), 16 deletions(-) -- 2.7.4 diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de20895..9e11dce 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -75,6 +75,8 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm) #define S2_PMD_MASK PMD_MASK #define S2_PMD_SIZE PMD_SIZE +#define S2_PUD_MASK PUD_MASK +#define S2_PUD_SIZE PUD_SIZE static inline bool kvm_stage2_has_pmd(struct kvm *kvm) { diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index fce0983..97b5417 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache { pmd_t *pmd, old_pmd; +retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); old_pmd = *pmd; + /* + * Multiple vcpus faulting on the same PMD entry, can + * lead to them sequentially updating the PMD with the + * same value. Following the break-before-make + * (pmd_clear() followed by tlb_flush()) process can + * hinder forward progress due to refaults generated + * on missing translations. + * + * Skip updating the page table if the entry is + * unchanged. + */ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + if (pmd_present(old_pmd)) { /* - * Multiple vcpus faulting on the same PMD entry, can - * lead to them sequentially updating the PMD with the - * same value. Following the break-before-make - * (pmd_clear() followed by tlb_flush()) process can - * hinder forward progress due to refaults generated - * on missing translations. + * If we already have PTE level mapping for this block, + * we must unmap it to avoid inconsistent TLB state and + * leaking the table page. We could end up in this situation + * if the memory slot was marked for dirty logging and was + * reverted, leaving PTE level mappings for the pages accessed + * during the period. So, unmap the PTE level mapping for this + * block and retry, as we could have released the upper level + * table in the process. * - * Skip updating the page table if the entry is - * unchanged. + * Normal THP split/merge follows mmu_notifier callbacks and do + * get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + } /* * Mapping in huge pages should only happen through a * fault. If a page is merged into a transparent huge @@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * should become splitting first, unmapped, merged, * and mapped back in on-demand. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); - + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -1107,6 +1124,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac { pud_t *pudp, old_pud; +retry: pudp = stage2_get_pud(kvm, cache, addr); VM_BUG_ON(!pudp); @@ -1114,14 +1132,23 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac /* * A large number of vcpus faulting on the same stage 2 entry, - * can lead to a refault due to the - * stage2_pud_clear()/tlb_flush(). Skip updating the page - * tables if there is no change. + * can lead to a refault due to the stage2_pud_clear()/tlb_flush(). + * Skip updating the page tables if there is no change. */ if (pud_val(old_pud) == pud_val(*new_pudp)) return 0; if (stage2_pud_present(kvm, old_pud)) { + /* + * If we already have table level mapping for this block, unmap + * the range for this block and retry. + */ + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); + goto retry; + } + + WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp)); stage2_pud_clear(kvm, pudp); kvm_tlb_flush_vmid_ipa(kvm, addr); } else {