diff mbox series

[Part2,v6,41/49] KVM: SVM: Add support to handle the RMP nested page fault

Message ID d7decd3cb48d962da086afb65feb94a124e5c537.1655761627.git.ashish.kalra@amd.com
State New
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) | expand

Commit Message

Ashish Kalra June 20, 2022, 11:13 p.m. UTC
From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled in the guest, the hardware places restrictions on
all memory accesses based on the contents of the RMP table. When hardware
encounters RMP check failure caused by the guest memory access it raises
the #NPF. The error code contains additional information on the access
type. See the APM volume 2 for additional information.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
---
 arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c | 14 +++++---
 2 files changed, 86 insertions(+), 4 deletions(-)

Comments

Ashish Kalra Oct. 11, 2022, 2:32 a.m. UTC | #1
Hello Alper,

On 10/10/2022 5:03 PM, Alper Gun wrote:
> On Mon, Jun 20, 2022 at 4:13 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>
>> From: Brijesh Singh <brijesh.singh@amd.com>
>>
>> When SEV-SNP is enabled in the guest, the hardware places restrictions on
>> all memory accesses based on the contents of the RMP table. When hardware
>> encounters RMP check failure caused by the guest memory access it raises
>> the #NPF. The error code contains additional information on the access
>> type. See the APM volume 2 for additional information.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>> ---
>>   arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
>>   arch/x86/kvm/svm/svm.c | 14 +++++---
>>   2 files changed, 86 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 4ed90331bca0..7fc0fad87054 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>>
>>          spin_unlock(&sev->psc_lock);
>>   }
>> +
>> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
>> +{
>> +       int rmp_level, npt_level, rc, assigned;
>> +       struct kvm *kvm = vcpu->kvm;
>> +       gfn_t gfn = gpa_to_gfn(gpa);
>> +       bool need_psc = false;
>> +       enum psc_op psc_op;
>> +       kvm_pfn_t pfn;
>> +       bool private;
>> +
>> +       write_lock(&kvm->mmu_lock);
>> +
>> +       if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
>> +               goto unlock;
>> +
>> +       assigned = snp_lookup_rmpentry(pfn, &rmp_level);
>> +       if (unlikely(assigned < 0))
>> +               goto unlock;
>> +
>> +       private = !!(error_code & PFERR_GUEST_ENC_MASK);
>> +
>> +       /*
>> +        * If the fault was due to size mismatch, or NPT and RMP page level's
>> +        * are not in sync, then use PSMASH to split the RMP entry into 4K.
>> +        */
>> +       if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
>> +           (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
>> +               rc = snp_rmptable_psmash(kvm, pfn);
> 
> 
> Regarding this case:
> RMP level is 4K
> Page table level is 2M
> 
> Does this also cause a page fault with size mismatch? If so, we
> shouldn't try psmash because the rmp entry is already 4K.
> 
> I see these errors in our tests and I think it may be happening
> because rmp size is already 4K.
> 
> [ 1848.752952] psmash failed, gpa 0x191560000 pfn 0x536cd60 rc 7
> [ 2922.879635] psmash failed, gpa 0x102830000 pfn 0x37c8230 rc 7
> [ 3010.983090] psmash failed, gpa 0x104220000 pfn 0x6cf1e20 rc 7
> [ 3170.792050] psmash failed, gpa 0x108a80000 pfn 0x20e0080 rc 7
> [ 3345.955147] psmash failed, gpa 0x11b480000 pfn 0x1545e480 rc 7
> 
> Shouldn't we use AND instead of OR in the if statement?
> 

I believe this we can't do, looking at the typical usage case below :

[   37.243969] #VMEXIT (NPF) - SIZEM, err 0xc80000005 npt_level 2, 
rmp_level 2, private 1
[   37.243973] trying psmash gpa 0x7f790000 pfn 0x1f5d90

This is typically the case with #VMEXIT(NPF) with SIZEM error code, when 
the guest tries to do PVALIDATE on 4K GHCB pages, in this case both the
RMP table and NPT will be optimally setup to 2M hugepage as can be seen.

Is it possible to investigate in more depth, when is the this case being 
observed:
RMP level is 4K
Page table level is 2M
We shouldn't try psmash because the rmp entry is already 4K.

Thanks,
Ashish

> if ((error_code & PFERR_GUEST_SIZEM_MASK) && ...
>
Alper Gun Oct. 12, 2022, 10:53 p.m. UTC | #2
On Mon, Oct 10, 2022 at 7:32 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
>
> Hello Alper,
>
> On 10/10/2022 5:03 PM, Alper Gun wrote:
> > On Mon, Jun 20, 2022 at 4:13 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
> >>
> >> From: Brijesh Singh <brijesh.singh@amd.com>
> >>
> >> When SEV-SNP is enabled in the guest, the hardware places restrictions on
> >> all memory accesses based on the contents of the RMP table. When hardware
> >> encounters RMP check failure caused by the guest memory access it raises
> >> the #NPF. The error code contains additional information on the access
> >> type. See the APM volume 2 for additional information.
> >>
> >> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> >> ---
> >>   arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
> >>   arch/x86/kvm/svm/svm.c | 14 +++++---
> >>   2 files changed, 86 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> >> index 4ed90331bca0..7fc0fad87054 100644
> >> --- a/arch/x86/kvm/svm/sev.c
> >> +++ b/arch/x86/kvm/svm/sev.c
> >> @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> >>
> >>          spin_unlock(&sev->psc_lock);
> >>   }
> >> +
> >> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
> >> +{
> >> +       int rmp_level, npt_level, rc, assigned;
> >> +       struct kvm *kvm = vcpu->kvm;
> >> +       gfn_t gfn = gpa_to_gfn(gpa);
> >> +       bool need_psc = false;
> >> +       enum psc_op psc_op;
> >> +       kvm_pfn_t pfn;
> >> +       bool private;
> >> +
> >> +       write_lock(&kvm->mmu_lock);
> >> +
> >> +       if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
> >> +               goto unlock;
> >> +
> >> +       assigned = snp_lookup_rmpentry(pfn, &rmp_level);
> >> +       if (unlikely(assigned < 0))
> >> +               goto unlock;
> >> +
> >> +       private = !!(error_code & PFERR_GUEST_ENC_MASK);
> >> +
> >> +       /*
> >> +        * If the fault was due to size mismatch, or NPT and RMP page level's
> >> +        * are not in sync, then use PSMASH to split the RMP entry into 4K.
> >> +        */
> >> +       if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
> >> +           (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
> >> +               rc = snp_rmptable_psmash(kvm, pfn);
> >
> >
> > Regarding this case:
> > RMP level is 4K
> > Page table level is 2M
> >
> > Does this also cause a page fault with size mismatch? If so, we
> > shouldn't try psmash because the rmp entry is already 4K.
> >
> > I see these errors in our tests and I think it may be happening
> > because rmp size is already 4K.
> >
> > [ 1848.752952] psmash failed, gpa 0x191560000 pfn 0x536cd60 rc 7
> > [ 2922.879635] psmash failed, gpa 0x102830000 pfn 0x37c8230 rc 7
> > [ 3010.983090] psmash failed, gpa 0x104220000 pfn 0x6cf1e20 rc 7
> > [ 3170.792050] psmash failed, gpa 0x108a80000 pfn 0x20e0080 rc 7
> > [ 3345.955147] psmash failed, gpa 0x11b480000 pfn 0x1545e480 rc 7
> >
> > Shouldn't we use AND instead of OR in the if statement?
> >
>
> I believe this we can't do, looking at the typical usage case below :
>
> [   37.243969] #VMEXIT (NPF) - SIZEM, err 0xc80000005 npt_level 2,
> rmp_level 2, private 1
> [   37.243973] trying psmash gpa 0x7f790000 pfn 0x1f5d90
>
> This is typically the case with #VMEXIT(NPF) with SIZEM error code, when
> the guest tries to do PVALIDATE on 4K GHCB pages, in this case both the
> RMP table and NPT will be optimally setup to 2M hugepage as can be seen.
>
> Is it possible to investigate in more depth, when is the this case being
> observed:

Yes, I added more logs and I can see that these errors happen when RMP
level is 4K and NPT level is 2M.
psmash fails as expected. I think it is just a log, there is no real
issue but the best is not trying psmash if rmp level is 4K.

> RMP level is 4K
> Page table level is 2M
> We shouldn't try psmash because the rmp entry is already 4K.
>
> Thanks,
> Ashish
>
> > if ((error_code & PFERR_GUEST_SIZEM_MASK) && ...
> >
Ashish Kalra Oct. 13, 2022, 3 p.m. UTC | #3
On 10/12/2022 5:53 PM, Alper Gun wrote:
> On Mon, Oct 10, 2022 at 7:32 PM Kalra, Ashish <ashish.kalra@amd.com> wrote:
>>
>> Hello Alper,
>>
>> On 10/10/2022 5:03 PM, Alper Gun wrote:
>>> On Mon, Jun 20, 2022 at 4:13 PM Ashish Kalra <Ashish.Kalra@amd.com> wrote:
>>>>
>>>> From: Brijesh Singh <brijesh.singh@amd.com>
>>>>
>>>> When SEV-SNP is enabled in the guest, the hardware places restrictions on
>>>> all memory accesses based on the contents of the RMP table. When hardware
>>>> encounters RMP check failure caused by the guest memory access it raises
>>>> the #NPF. The error code contains additional information on the access
>>>> type. See the APM volume 2 for additional information.
>>>>
>>>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>>> ---
>>>>    arch/x86/kvm/svm/sev.c | 76 ++++++++++++++++++++++++++++++++++++++++++
>>>>    arch/x86/kvm/svm/svm.c | 14 +++++---
>>>>    2 files changed, 86 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>> index 4ed90331bca0..7fc0fad87054 100644
>>>> --- a/arch/x86/kvm/svm/sev.c
>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>> @@ -4009,3 +4009,79 @@ void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
>>>>
>>>>           spin_unlock(&sev->psc_lock);
>>>>    }
>>>> +
>>>> +void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
>>>> +{
>>>> +       int rmp_level, npt_level, rc, assigned;
>>>> +       struct kvm *kvm = vcpu->kvm;
>>>> +       gfn_t gfn = gpa_to_gfn(gpa);
>>>> +       bool need_psc = false;
>>>> +       enum psc_op psc_op;
>>>> +       kvm_pfn_t pfn;
>>>> +       bool private;
>>>> +
>>>> +       write_lock(&kvm->mmu_lock);
>>>> +
>>>> +       if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
>>>> +               goto unlock;
>>>> +
>>>> +       assigned = snp_lookup_rmpentry(pfn, &rmp_level);
>>>> +       if (unlikely(assigned < 0))
>>>> +               goto unlock;
>>>> +
>>>> +       private = !!(error_code & PFERR_GUEST_ENC_MASK);
>>>> +
>>>> +       /*
>>>> +        * If the fault was due to size mismatch, or NPT and RMP page level's
>>>> +        * are not in sync, then use PSMASH to split the RMP entry into 4K.
>>>> +        */
>>>> +       if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
>>>> +           (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
>>>> +               rc = snp_rmptable_psmash(kvm, pfn);
>>>
>>>
>>> Regarding this case:
>>> RMP level is 4K
>>> Page table level is 2M
>>>
>>> Does this also cause a page fault with size mismatch? If so, we
>>> shouldn't try psmash because the rmp entry is already 4K.
>>>
>>> I see these errors in our tests and I think it may be happening
>>> because rmp size is already 4K.
>>>
>>> [ 1848.752952] psmash failed, gpa 0x191560000 pfn 0x536cd60 rc 7
>>> [ 2922.879635] psmash failed, gpa 0x102830000 pfn 0x37c8230 rc 7
>>> [ 3010.983090] psmash failed, gpa 0x104220000 pfn 0x6cf1e20 rc 7
>>> [ 3170.792050] psmash failed, gpa 0x108a80000 pfn 0x20e0080 rc 7
>>> [ 3345.955147] psmash failed, gpa 0x11b480000 pfn 0x1545e480 rc 7
>>>
>>> Shouldn't we use AND instead of OR in the if statement?
>>>
>>
>> I believe this we can't do, looking at the typical usage case below :
>>
>> [   37.243969] #VMEXIT (NPF) - SIZEM, err 0xc80000005 npt_level 2,
>> rmp_level 2, private 1
>> [   37.243973] trying psmash gpa 0x7f790000 pfn 0x1f5d90
>>
>> This is typically the case with #VMEXIT(NPF) with SIZEM error code, when
>> the guest tries to do PVALIDATE on 4K GHCB pages, in this case both the
>> RMP table and NPT will be optimally setup to 2M hugepage as can be seen.
>>
>> Is it possible to investigate in more depth, when is the this case being
>> observed:
> 
> Yes, I added more logs and I can see that these errors happen when RMP
> level is 4K and NPT level is 2M.
> psmash fails as expected. I think it is just a log, there is no real
> issue but the best is not trying psmash if rmp level is 4K.
> 

Now, the SIZEM bit is only set when PVALIDATE or RMPADJUST fails due to
guest attempting to validate a 4K page that is backed by a 2MB RMP 
entry, which is not the case here as RMP level is 4K.

Also, this does not fall into the second case for the same reason.

#NPF will happen during Guest page table walk if RMP checks fail
for 2M nested page and RMP.SubPage_Count !=0 OR
RMP.PageSize != Nested table page size, but then that shouldn't have
the SIZEM fault bit set.

This raises concern about some existing race condition, it probably
can race with
snp_handle_page_state_change()->snp_make_page_shared()->snp_rmptable_psmash(),
but that code path seems to be protected from this nested RMP #PF 
handler as they both acquire the kvm->mmu_lock.

So, this still needs more investigation.

Can you share what kind of tests are you running to reproduce this
issue ?

Thanks,
Ashish

>> RMP level is 4K
>> Page table level is 2M
>> We shouldn't try psmash because the rmp entry is already 4K.
>>
>> Thanks,
>> Ashish
>>
>>> if ((error_code & PFERR_GUEST_SIZEM_MASK) && ...
>>>
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4ed90331bca0..7fc0fad87054 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4009,3 +4009,79 @@  void sev_post_unmap_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
 
 	spin_unlock(&sev->psc_lock);
 }
+
+void handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
+{
+	int rmp_level, npt_level, rc, assigned;
+	struct kvm *kvm = vcpu->kvm;
+	gfn_t gfn = gpa_to_gfn(gpa);
+	bool need_psc = false;
+	enum psc_op psc_op;
+	kvm_pfn_t pfn;
+	bool private;
+
+	write_lock(&kvm->mmu_lock);
+
+	if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &npt_level)))
+		goto unlock;
+
+	assigned = snp_lookup_rmpentry(pfn, &rmp_level);
+	if (unlikely(assigned < 0))
+		goto unlock;
+
+	private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+	/*
+	 * If the fault was due to size mismatch, or NPT and RMP page level's
+	 * are not in sync, then use PSMASH to split the RMP entry into 4K.
+	 */
+	if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+	    (npt_level == PG_LEVEL_4K && rmp_level == PG_LEVEL_2M && private)) {
+		rc = snp_rmptable_psmash(kvm, pfn);
+		if (rc)
+			pr_err_ratelimited("psmash failed, gpa 0x%llx pfn 0x%llx rc %d\n",
+					   gpa, pfn, rc);
+		goto out;
+	}
+
+	/*
+	 * If it's a private access, and the page is not assigned in the
+	 * RMP table, create a new private RMP entry. This can happen if
+	 * guest did not use the PSC VMGEXIT to transition the page state
+	 * before the access.
+	 */
+	if (!assigned && private) {
+		need_psc = 1;
+		psc_op = SNP_PAGE_STATE_PRIVATE;
+		goto out;
+	}
+
+	/*
+	 * If it's a shared access, but the page is private in the RMP table
+	 * then make the page shared in the RMP table. This can happen if
+	 * the guest did not use the PSC VMGEXIT to transition the page
+	 * state before the access.
+	 */
+	if (assigned && !private) {
+		need_psc = 1;
+		psc_op = SNP_PAGE_STATE_SHARED;
+	}
+
+out:
+	write_unlock(&kvm->mmu_lock);
+
+	if (need_psc)
+		rc = __snp_handle_page_state_change(vcpu, psc_op, gpa, PG_LEVEL_4K);
+
+	/*
+	 * The fault handler has updated the RMP pagesize, zap the existing
+	 * rmaps for large entry ranges so that nested page table gets rebuilt
+	 * with the updated RMP pagesize.
+	 */
+	gfn = gpa_to_gfn(gpa) & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+	kvm_zap_gfn_range(kvm, gfn, gfn + PTRS_PER_PMD);
+	return;
+
+unlock:
+	write_unlock(&kvm->mmu_lock);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1c8e035ba011..7742bc986afc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1866,15 +1866,21 @@  static int pf_interception(struct kvm_vcpu *vcpu)
 static int npf_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	int rc;
 
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
 	trace_kvm_page_fault(fault_address, error_code);
-	return kvm_mmu_page_fault(vcpu, fault_address, error_code,
-			static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
-			svm->vmcb->control.insn_bytes : NULL,
-			svm->vmcb->control.insn_len);
+	rc = kvm_mmu_page_fault(vcpu, fault_address, error_code,
+				static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
+				svm->vmcb->control.insn_bytes : NULL,
+				svm->vmcb->control.insn_len);
+
+	if (error_code & PFERR_GUEST_RMP_MASK)
+		handle_rmp_page_fault(vcpu, fault_address, error_code);
+
+	return rc;
 }
 
 static int db_interception(struct kvm_vcpu *vcpu)