diff mbox series

[V7,1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

Message ID 20211213071407.314309-2-ltykernel@gmail.com
State New
Headers show
Series x86/Hyper-V: Add Hyper-V Isolation VM support(Second part) | expand

Commit Message

Tianyu Lan Dec. 13, 2021, 7:14 a.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
extra address space which is above shared_gpa_boundary (E.G 39 bit
address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
physical address will be original physical address + shared_gpa_boundary.
The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
memory(vTOM). Memory addresses below vTOM are automatically treated as
private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base for platforms to set unencrypted
memory base offset and platform calls swiotlb_update_mem_attributes()
to remap swiotlb mem to unencrypted address space. memremap() can
not be called in the early stage and so put remapping code into
swiotlb_update_mem_attributes(). Store remap address and use it to copy
data from/to swiotlb bounce buffer.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calloing of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().

Change since v2:
	* Leave mem->vaddr with phys_to_virt(mem->start) when fail
	  to remap swiotlb memory.

Change since v1:
	* Rework comment in the swiotlb_init_io_tlb_mem()
	* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++++++
 kernel/dma/swiotlb.c    | 43 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 2 deletions(-)

Comments

Dave Hansen Dec. 13, 2021, 4:45 p.m. UTC | #1
On 12/12/21 11:14 PM, Tianyu Lan wrote:
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory(vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.

This seems to be independently reintroducing some of the SEV
infrastructure.  Is it really OK that this doesn't interact at all with
any existing SEV code?

For instance, do we need a new 'swiotlb_unencrypted_base', or should
this just be using sme_me_mask somehow?
Tianyu Lan Dec. 14, 2021, 4:36 a.m. UTC | #2
On 12/14/2021 12:45 AM, Dave Hansen wrote:
> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>> physical address will be original physical address + shared_gpa_boundary.
>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>> private while memory above vTOM is treated as shared.
> 
> This seems to be independently reintroducing some of the SEV
> infrastructure.  Is it really OK that this doesn't interact at all with
> any existing SEV code?
> 
> For instance, do we need a new 'swiotlb_unencrypted_base', or should
> this just be using sme_me_mask somehow?

Hi Dave:
        Thanks for your review. Hyper-V provides a para-virtualized
confidential computing solution based on the AMD SEV function and not
expose sev&sme capabilities to guest. So sme_me_mask is unset in the
Hyper-V Isolation VM. swiotlb_unencrypted_base is more general solution
to handle such case of different address space for encrypted and
decrypted memory and other platform also may reuse it.
Dave Hansen Dec. 14, 2021, 6:40 p.m. UTC | #3
On 12/13/21 8:36 PM, Tianyu Lan wrote:
> On 12/14/2021 12:45 AM, Dave Hansen wrote:
>> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>>> physical address will be original physical address +
>>> shared_gpa_boundary.
>>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>>> private while memory above vTOM is treated as shared.
>>
>> This seems to be independently reintroducing some of the SEV
>> infrastructure.  Is it really OK that this doesn't interact at all with
>> any existing SEV code?
>>
>> For instance, do we need a new 'swiotlb_unencrypted_base', or should
>> this just be using sme_me_mask somehow?
> 
>        Thanks for your review. Hyper-V provides a para-virtualized
> confidential computing solution based on the AMD SEV function and not
> expose sev&sme capabilities to guest. So sme_me_mask is unset in the
> Hyper-V Isolation VM. swiotlb_unencrypted_base is more general solution
> to handle such case of different address space for encrypted and
> decrypted memory and other platform also may reuse it.

I don't really understand how this can be more general any *not* get
utilized by the existing SEV support.
Tom Lendacky Dec. 14, 2021, 10:23 p.m. UTC | #4
On 12/14/21 12:40 PM, Dave Hansen wrote:
> On 12/13/21 8:36 PM, Tianyu Lan wrote:
>> On 12/14/2021 12:45 AM, Dave Hansen wrote:
>>> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>>>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>>>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>>>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>>>> physical address will be original physical address +
>>>> shared_gpa_boundary.
>>>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>>>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>>>> private while memory above vTOM is treated as shared.
>>>
>>> This seems to be independently reintroducing some of the SEV
>>> infrastructure.  Is it really OK that this doesn't interact at all with
>>> any existing SEV code?
>>>
>>> For instance, do we need a new 'swiotlb_unencrypted_base', or should
>>> this just be using sme_me_mask somehow?
>>
>>         Thanks for your review. Hyper-V provides a para-virtualized
>> confidential computing solution based on the AMD SEV function and not
>> expose sev&sme capabilities to guest. So sme_me_mask is unset in the
>> Hyper-V Isolation VM. swiotlb_unencrypted_base is more general solution
>> to handle such case of different address space for encrypted and
>> decrypted memory and other platform also may reuse it.
> 
> I don't really understand how this can be more general any *not* get
> utilized by the existing SEV support.

The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is 
meant to be used with a (relatively) un-enlightened guest. The idea is 
that the C-bit in the guest page tables must be 0 for all accesses. It is 
only the physical address relative to VTOM that determines if the access 
is encrypted or not. So setting sme_me_mask will actually cause issues 
when running with this feature. Since all DMA for an SEV-SNP guest must 
still be to shared (unencrypted) memory, some enlightenment is needed. In 
this case, memory mapped above VTOM will provide that via the SWIOTLB 
update. For SEV-SNP guests running with VTOM, they are likely to also be 
running with the Reflect #VC feature, allowing a "paravisor" to handle any 
#VCs generated by the guest.

See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC" in 
volume 2 of the AMD APM [1].

I'm not sure if that will answer your question or generate more :)

Thanks,
Tom

[1] https://www.amd.com/system/files/TechDocs/24593.pdf

>
Dave Hansen Dec. 14, 2021, 10:40 p.m. UTC | #5
On 12/14/21 2:23 PM, Tom Lendacky wrote:
>> I don't really understand how this can be more general any *not* get
>> utilized by the existing SEV support.
> 
> The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> meant to be used with a (relatively) un-enlightened guest. The idea is
> that the C-bit in the guest page tables must be 0 for all accesses. It
> is only the physical address relative to VTOM that determines if the
> access is encrypted or not. So setting sme_me_mask will actually cause
> issues when running with this feature. Since all DMA for an SEV-SNP
> guest must still be to shared (unencrypted) memory, some enlightenment
> is needed. In this case, memory mapped above VTOM will provide that via
> the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> likely to also be running with the Reflect #VC feature, allowing a
> "paravisor" to handle any #VCs generated by the guest.
> 
> See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> in volume 2 of the AMD APM [1].

Thanks, Tom, that's pretty much what I was looking for.

The C-bit normally comes from the page tables.  But, the hardware also
provides an alternative way to effectively get C-bit behavior without
actually setting the bit in the page tables: Virtual Top-of-Memory
(VTOM).  Right?

It sounds like Hyper-V has chosen to use VTOM instead of requiring the
guest to do the C-bit in its page tables.

But, the thing that confuses me is when you said: "it (VTOM) is meant to
be used with a (relatively) un-enlightened guest".  We don't have an
unenlightened guest here.  We have Linux, which is quite enlightened.

Is VTOM being used because there's something that completely rules out
using the C-bit in the page tables?  What's that "something"?
Tianyu Lan Dec. 15, 2021, 5 a.m. UTC | #6
On 12/15/2021 6:40 AM, Dave Hansen wrote:
> On 12/14/21 2:23 PM, Tom Lendacky wrote:
>>> I don't really understand how this can be more general any *not* get
>>> utilized by the existing SEV support.
>>
>> The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
>> meant to be used with a (relatively) un-enlightened guest. The idea is
>> that the C-bit in the guest page tables must be 0 for all accesses. It
>> is only the physical address relative to VTOM that determines if the
>> access is encrypted or not. So setting sme_me_mask will actually cause
>> issues when running with this feature. Since all DMA for an SEV-SNP
>> guest must still be to shared (unencrypted) memory, some enlightenment
>> is needed. In this case, memory mapped above VTOM will provide that via
>> the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
>> likely to also be running with the Reflect #VC feature, allowing a
>> "paravisor" to handle any #VCs generated by the guest.
>>
>> See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
>> in volume 2 of the AMD APM [1].
> 
> Thanks, Tom, that's pretty much what I was looking for.
> 
> The C-bit normally comes from the page tables.  But, the hardware also
> provides an alternative way to effectively get C-bit behavior without
> actually setting the bit in the page tables: Virtual Top-of-Memory
> (VTOM).  Right?
> 
> It sounds like Hyper-V has chosen to use VTOM instead of requiring the
> guest to do the C-bit in its page tables.
> 
> But, the thing that confuses me is when you said: "it (VTOM) is meant to
> be used with a (relatively) un-enlightened guest".  We don't have an
> unenlightened guest here.  We have Linux, which is quite enlightened.
>
>> Is VTOM being used because there's something that completely rules out
>> using the C-bit in the page tables?  What's that "something"?


For "un-enlightened" guest, there is an another system running insider
the VM to emulate some functions(tsc, timer, interrupt and so on) and
this helps not to modify OS(Linux/Windows) a lot. In Hyper-V Isolation
VM, we called the new system as HCL/paravisor. HCL runs in the VMPL0 and 
Linux runs in VMPL2. This is similar with nested virtualization. HCL
plays similar role as L1 hypervisor to emulate some general functions
(e.g, rdmsr/wrmsr accessing and interrupt injection) which needs to be
enlightened in the enlightened guest. Linux kernel needs to handle
#vc/#ve exception directly in the enlightened guest. HCL handles such
exception in un-enlightened guest and emulate interrupt injection which
helps not to modify OS core part code. Using vTOM also is same purpose.
Hyper-V uses vTOM avoid changing page table related code in OS(include
Windows and Linux)and just needs to map memory into decrypted address
space above vTOM in the driver code.

Linux has generic swiotlb bounce buffer implementation and so introduce
swiotlb_unencrypted_base here to set shared memory boundary or vTOM.
Hyper-V Isolation VM is un-enlightened guest. Hyper-V doesn't expose 
sev/sme capability to guest and so SEV code actually doesn't work.
So we also can't interact current existing SEV code and these code is
for enlightened guest support without HCL/paravisor. If other platforms
or SEV want to use similar vTOM feature, swiotlb_unencrypted_base can
be reused. So swiotlb_unencrypted_base is a general solution for all
platforms besides SEV and Hyper-V.
Wei Liu Dec. 16, 2021, 11:05 a.m. UTC | #7
On Wed, Dec 15, 2021 at 01:00:38PM +0800, Tianyu Lan wrote:
> 
> 
> On 12/15/2021 6:40 AM, Dave Hansen wrote:
> > On 12/14/21 2:23 PM, Tom Lendacky wrote:
> > > > I don't really understand how this can be more general any *not* get
> > > > utilized by the existing SEV support.
> > > 
> > > The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> > > meant to be used with a (relatively) un-enlightened guest. The idea is
> > > that the C-bit in the guest page tables must be 0 for all accesses. It
> > > is only the physical address relative to VTOM that determines if the
> > > access is encrypted or not. So setting sme_me_mask will actually cause
> > > issues when running with this feature. Since all DMA for an SEV-SNP
> > > guest must still be to shared (unencrypted) memory, some enlightenment
> > > is needed. In this case, memory mapped above VTOM will provide that via
> > > the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> > > likely to also be running with the Reflect #VC feature, allowing a
> > > "paravisor" to handle any #VCs generated by the guest.
> > > 
> > > See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> > > in volume 2 of the AMD APM [1].
> > 
> > Thanks, Tom, that's pretty much what I was looking for.
> > 
> > The C-bit normally comes from the page tables.  But, the hardware also
> > provides an alternative way to effectively get C-bit behavior without
> > actually setting the bit in the page tables: Virtual Top-of-Memory
> > (VTOM).  Right?
> > 
> > It sounds like Hyper-V has chosen to use VTOM instead of requiring the
> > guest to do the C-bit in its page tables.
> > 
> > But, the thing that confuses me is when you said: "it (VTOM) is meant to
> > be used with a (relatively) un-enlightened guest".  We don't have an
> > unenlightened guest here.  We have Linux, which is quite enlightened.
> > 
> > > Is VTOM being used because there's something that completely rules out
> > > using the C-bit in the page tables?  What's that "something"?
> 
> 
> For "un-enlightened" guest, there is an another system running insider
> the VM to emulate some functions(tsc, timer, interrupt and so on) and
> this helps not to modify OS(Linux/Windows) a lot. In Hyper-V Isolation
> VM, we called the new system as HCL/paravisor. HCL runs in the VMPL0 and
> Linux runs in VMPL2. This is similar with nested virtualization. HCL
> plays similar role as L1 hypervisor to emulate some general functions
> (e.g, rdmsr/wrmsr accessing and interrupt injection) which needs to be
> enlightened in the enlightened guest. Linux kernel needs to handle
> #vc/#ve exception directly in the enlightened guest. HCL handles such
> exception in un-enlightened guest and emulate interrupt injection which
> helps not to modify OS core part code. Using vTOM also is same purpose.
> Hyper-V uses vTOM avoid changing page table related code in OS(include
> Windows and Linux)and just needs to map memory into decrypted address
> space above vTOM in the driver code.
> 
> Linux has generic swiotlb bounce buffer implementation and so introduce
> swiotlb_unencrypted_base here to set shared memory boundary or vTOM.
> Hyper-V Isolation VM is un-enlightened guest. Hyper-V doesn't expose sev/sme
> capability to guest and so SEV code actually doesn't work.
> So we also can't interact current existing SEV code and these code is
> for enlightened guest support without HCL/paravisor. If other platforms
> or SEV want to use similar vTOM feature, swiotlb_unencrypted_base can
> be reused. So swiotlb_unencrypted_base is a general solution for all
> platforms besides SEV and Hyper-V.
> 

Thanks for the detailed explanation.

Dave, are you happy with this?

The code looks pretty solid to my untrained eyes. And the series has
collected necessary acks from stakeholders. If I don't hear objection by
EOD Friday I will apply this series to hyperv-next.

Wei.
diff mbox series

Patch

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@  extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ *		may be remapped in the memory encrypted case and store virtual
+ *		address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -92,6 +95,7 @@  extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
@@ -186,4 +190,6 @@  static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@ 
 #include <asm/io.h>
 #include <asm/dma.h>
 
+#include <linux/io.h>
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/iommu-helper.h>
@@ -72,6 +73,8 @@  enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@  static inline unsigned long nr_slots(u64 val)
 	return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+	void *vaddr = NULL;
+
+	if (swiotlb_unencrypted_base) {
+		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+		if (!vaddr)
+			pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+			       paddr, bytes);
+	}
+
+	return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@  void __init swiotlb_update_mem_attributes(void)
 	vaddr = phys_to_virt(mem->start);
 	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+
+	mem->vaddr = swiotlb_mem_remap(mem, bytes);
+	if (!mem->vaddr)
+		mem->vaddr = vaddr;
+
+	memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@  static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size = 0;
 	}
+
+	/*
+	 * If swiotlb_unencrypted_base is set, the bounce buffer memory will
+	 * be remapped and cleared in swiotlb_update_mem_attributes.
+	 */
+	if (swiotlb_unencrypted_base)
+		return;
+
 	memset(vaddr, 0, bytes);
+	mem->vaddr = vaddr;
+	return;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -371,7 +410,7 @@  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
-	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
 	unsigned int tlb_offset, orig_addr_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)