[Resend,RFC,V2,00/12] x86/Hyper-V: Add Hyper-V Isolation VM support

Message ID 20210414144945.3460554-1-ltykernel@gmail.com

Message

Tianyu Lan April 14, 2021, 2:49 p.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

"Resend all patches because someone in CC list didn't receive all
patchset. Sorry for nosy."

Hyper-V provides two kinds of Isolation VM: VBS (Virtualization-Based
Security) VMs and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted, and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and the
guest must call it to mark memory visible to the host before sharing
that memory with the host. For security, network/storage stack memory
must not be shared with the host directly, so bounce buffers are
required.

The VMbus channel ring buffer already plays the bounce buffer role,
because all data to and from the host is copied between the ring buffer
and I/O stack memory. So the ring buffer is simply marked host-visible.

There are two exceptions: packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain I/O stack memory
addresses that the host will access directly, so bounce buffer
allocation support is added to VMbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via an
extra address space, which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory is the bounce buffer GPA plus the shared_gpa_boundary
reported by that CPUID leaf.
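
For illustration, the calculation looks roughly like the sketch below.
The helper name hv_shared_gpa() is made up for this sketch;
virt_to_hvpfn(), HV_HYP_PAGE_SHIFT and ms_hyperv.shared_gpa_boundary are
the names used in the patches.

/* Sketch only: host-visible alias address of a guest buffer. */
static inline phys_addr_t hv_shared_gpa(void *va)
{
	return ((phys_addr_t)virt_to_hvpfn(va) << HV_HYP_PAGE_SHIFT) +
		ms_hyperv.shared_gpa_boundary;
}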

Tianyu Lan (12):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in Isolation VM
  x86/Hyper-V: Add new hvcall guest address host visibility support
  HV: Add Write/Read MSR registers via ghcb
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  UIO/Hyper-V: Not load UIO HV driver in the isolation VM.
  swiotlb: Add bounce buffer remap address setting function
  HV/IOMMU: Add Hyper-V dma ops support
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile           |   2 +-
 arch/x86/hyperv/hv_init.c          |  70 +++++--
 arch/x86/hyperv/ivm.c              | 289 +++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h |  22 +++
 arch/x86/include/asm/mshyperv.h    |  90 +++++++--
 arch/x86/kernel/cpu/mshyperv.c     |   5 +
 arch/x86/kernel/pci-swiotlb.c      |   3 +-
 drivers/hv/channel.c               |  44 ++++-
 drivers/hv/connection.c            |  68 ++++++-
 drivers/hv/hv.c                    |  73 ++++++--
 drivers/hv/hyperv_vmbus.h          |   3 +
 drivers/hv/ring_buffer.c           |  83 ++++++---
 drivers/hv/vmbus_drv.c             |   3 +
 drivers/iommu/hyperv-iommu.c       | 127 +++++++++++++
 drivers/net/hyperv/hyperv_net.h    |  11 ++
 drivers/net/hyperv/netvsc.c        | 137 +++++++++++++-
 drivers/net/hyperv/rndis_filter.c  |   3 +
 drivers/scsi/storvsc_drv.c         |  67 ++++++-
 drivers/uio/uio_hv_generic.c       |   5 +
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h     |  18 +-
 include/linux/hyperv.h             |  12 +-
 include/linux/swiotlb.h            |   5 +
 kernel/dma/swiotlb.c               |  13 +-
 mm/ioremap.c                       |   1 +
 mm/vmalloc.c                       |   1 +
 26 files changed, 1068 insertions(+), 88 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

Comments

Christoph Hellwig April 14, 2021, 3:40 p.m. UTC | #1
> +/*
> + * hv_set_mem_host_visibility - Set host visibility for specified memory.
> + */

I don't think this comment really clarifies anything over the function
name.  What is 'host visibility'?

> +int hv_set_mem_host_visibility(void *kbuffer, u32 size, u32 visibility)

Should size be a size_t?
Should visibility be an enum of some kind?
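
(A hypothetical sketch of what such a signature could look like,
reusing the HV_MAP_GPA_* flags quoted further down; the enum name and
values here are illustrative only:)

enum hv_mem_host_visibility {
	VMBUS_PAGE_NOT_VISIBLE		= 0,
	VMBUS_PAGE_VISIBLE_READ_ONLY	= HV_MAP_GPA_READABLE,
	VMBUS_PAGE_VISIBLE_READ_WRITE	= HV_MAP_GPA_READABLE |
					  HV_MAP_GPA_WRITABLE,
};

int hv_set_mem_host_visibility(void *kbuffer, size_t size,
			       enum hv_mem_host_visibility visibility);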

> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[], u32 visibility)

Not sure what this does either.

> +	local_irq_save(flags);
> +	input_pcpu = (struct hv_input_modify_sparse_gpa_page_host_visibility **)

Is there a chance we could find a shorter but still descriptive
name for this variable?  Why do we need the cast?

> +#define VMBUS_PAGE_VISIBLE_READ_ONLY HV_MAP_GPA_READABLE
> +#define VMBUS_PAGE_VISIBLE_READ_WRITE (HV_MAP_GPA_READABLE|HV_MAP_GPA_WRITABLE)

pointlessly overlong line.
Christoph Hellwig April 14, 2021, 3:50 p.m. UTC | #2
> +struct dma_range {
> +	dma_addr_t dma;
> +	u32 mapping_size;
> +};

That's a rather generic name that is bound to create a conflict sooner
or later.

>  #include "hyperv_net.h"
>  #include "netvsc_trace.h"
> +#include "../../hv/hyperv_vmbus.h"

Please move public interfaces out of the private header rather than doing
this.

> +	if (hv_isolation_type_snp()) {
> +		area = get_vm_area(buf_size, VM_IOREMAP);

Err, no.  get_vm_area is private for a reason.

> +		if (!area)
> +			goto cleanup;
> +
> +		vaddr = (unsigned long)area->addr;
> +		for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
> +			extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
> +				<< HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
> +			ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
> +					   vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
> +					   extra_phys, PAGE_KERNEL_IO);
> +		}
> +
> +		if (ret)
> +			goto cleanup;

And this is not something a driver should ever do.  I think you are badly
reimplementing functionality that should be in the dma coherent allocator
here.
Tianyu Lan April 15, 2021, 8:39 a.m. UTC | #3
On 4/14/2021 11:50 PM, Christoph Hellwig wrote:
>> +struct dma_range {
>> +	dma_addr_t dma;
>> +	u32 mapping_size;
>> +};
> 
> That's a rather generic name that is bound to create a conflict sooner
> or later.

Good point. Will update.

>>   #include "hyperv_net.h"
>>   #include "netvsc_trace.h"
>> +#include "../../hv/hyperv_vmbus.h"
> 
> Please move public interfaces out of the private header rather than doing
> this.

OK. Will update.

>> +	if (hv_isolation_type_snp()) {
>> +		area = get_vm_area(buf_size, VM_IOREMAP);
> 
> Err, no.  get_vm_area is private for a reason.
> 
>> +		if (!area)
>> +			goto cleanup;
>> +
>> +		vaddr = (unsigned long)area->addr;
>> +		for (i = 0; i < buf_size / HV_HYP_PAGE_SIZE; i++) {
>> +			extra_phys = (virt_to_hvpfn(net_device->recv_buf + i * HV_HYP_PAGE_SIZE)
>> +				<< HV_HYP_PAGE_SHIFT) + ms_hyperv.shared_gpa_boundary;
>> +			ret |= ioremap_page_range(vaddr + i * HV_HYP_PAGE_SIZE,
>> +					   vaddr + (i + 1) * HV_HYP_PAGE_SIZE,
>> +					   extra_phys, PAGE_KERNEL_IO);
>> +		}
>> +
>> +		if (ret)
>> +			goto cleanup;
> 
> And this is not something a driver should ever do.  I think you are badly
> reimplementing functionality that should be in the dma coherent allocator
> here.
> 

OK. I will try hiding these in the Hyper-V dma ops callback. Thanks.
Konrad Rzeszutek Wilk April 15, 2021, 8:24 p.m. UTC | #4
On Wed, Apr 14, 2021 at 10:49:40AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> VMbus ring buffers are shared with the host and need to be
> accessed via the extra address space of an Isolation VM with
> SNP support. This patch maps the ring buffer
> address into the extra address space via ioremap(). The HV host

Why do you need to use ioremap()? Why not just use vmap?


> visibility hvcall smears data in the ring buffer, so
> the ring buffer memory is reset to zero after calling the
> visibility hvcall.

So you are exposing these two:
 EXPORT_SYMBOL_GPL(get_vm_area);
 EXPORT_SYMBOL_GPL(ioremap_page_range);

But if you used vmap wouldn't you get the same thing for free?

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  drivers/hv/channel.c      | 10 +++++
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 83 +++++++++++++++++++++++++++++----------
>  mm/ioremap.c              |  1 +
>  mm/vmalloc.c              |  1 +
>  5 files changed, 76 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 407b74d72f3f..4a9fb7ad4c72 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>  	if (err)
>  		goto error_clean_ring;
>  
> +	err = hv_ringbuffer_post_init(&newchannel->outbound,
> +				      page, send_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
> +	err = hv_ringbuffer_post_init(&newchannel->inbound,
> +				      &page[send_pages], recv_pages);
> +	if (err)
> +		goto error_free_gpadl;
> +
>  	/* Create and init the channel open message */
>  	open_info = kzalloc(sizeof(*open_info) +
>  			   sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 0778add21a9c..d78a04ad5490 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  /* Interface */
>  
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		struct page *pages, u32 page_cnt);
>  
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  		       struct page *pages, u32 pagecnt);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 35833d4d1a1d..c8b0f7b45158 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include <linux/vmalloc.h>
>  #include <linux/slab.h>
>  #include <linux/prefetch.h>
> +#include <linux/io.h>
> +#include <asm/mshyperv.h>
>  
>  #include "hyperv_vmbus.h"
>  
> @@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
>  	mutex_init(&channel->outbound.ring_buffer_mutex);
>  }
>  
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +		       struct page *pages, u32 page_cnt)
> +{
> +	struct vm_struct *area;
> +	u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> +	unsigned long vaddr;
> +	int err = 0;
> +
> +	if (!hv_isolation_type_snp())
> +		return 0;
> +
> +	physic_addr += ms_hyperv.shared_gpa_boundary;
> +	area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
> +	if (!area || !area->addr)
> +		return -EFAULT;
> +
> +	vaddr = (unsigned long)area->addr;
> +	err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
> +			   physic_addr, PAGE_KERNEL_IO);
> +	err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
> +				  vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
> +				  physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
> +	if (err) {
> +		vunmap((void *)vaddr);
> +		return -EFAULT;
> +	}
> +
> +	/* Clean memory after setting host visibility. */
> +	memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
> +
> +	ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
> +	ring_info->ring_buffer->read_index = 0;
> +	ring_info->ring_buffer->write_index = 0;
> +	ring_info->ring_buffer->feature_bits.value = 1;
> +
> +	return 0;
> +}
> +
>  /* Initialize the ring buffer. */
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  		       struct page *pages, u32 page_cnt)
> @@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  
>  	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
>  
> -	/*
> -	 * First page holds struct hv_ring_buffer, do wraparound mapping for
> -	 * the rest.
> -	 */
> -	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -				   GFP_KERNEL);
> -	if (!pages_wraparound)
> -		return -ENOMEM;
> -
> -	pages_wraparound[0] = pages;
> -	for (i = 0; i < 2 * (page_cnt - 1); i++)
> -		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
> +	if (!hv_isolation_type_snp()) {
> +		/*
> +		 * First page holds struct hv_ring_buffer, do wraparound mapping for
> +		 * the rest.
> +		 */
> +		pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> +					   GFP_KERNEL);
> +		if (!pages_wraparound)
> +			return -ENOMEM;
>  
> -	ring_info->ring_buffer = (struct hv_ring_buffer *)
> -		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
> +		pages_wraparound[0] = pages;
> +		for (i = 0; i < 2 * (page_cnt - 1); i++)
> +			pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
>  
> -	kfree(pages_wraparound);
> +		ring_info->ring_buffer = (struct hv_ring_buffer *)
> +			vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
>  
> +		kfree(pages_wraparound);
>  
> -	if (!ring_info->ring_buffer)
> -		return -ENOMEM;
> +		if (!ring_info->ring_buffer)
> +			return -ENOMEM;
>  
> -	ring_info->ring_buffer->read_index =
> -		ring_info->ring_buffer->write_index = 0;
> +		ring_info->ring_buffer->read_index =
> +			ring_info->ring_buffer->write_index = 0;
>  
> -	/* Set the feature bit for enabling flow control. */
> -	ring_info->ring_buffer->feature_bits.value = 1;
> +		/* Set the feature bit for enabling flow control. */
> +		ring_info->ring_buffer->feature_bits.value = 1;
> +	}
>  
>  	ring_info->ring_size = page_cnt << PAGE_SHIFT;
>  	ring_info->ring_size_div10_reciprocal =
> diff --git a/mm/ioremap.c b/mm/ioremap.c
> index 5fa1ab41d152..d63c4ba067f9 100644
> --- a/mm/ioremap.c
> +++ b/mm/ioremap.c
> @@ -248,6 +248,7 @@ int ioremap_page_range(unsigned long addr,
>  
>  	return err;
>  }
> +EXPORT_SYMBOL_GPL(ioremap_page_range);
>  
>  #ifdef CONFIG_GENERIC_IOREMAP
>  void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot)
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e6f352bf0498..19724a8ebcb7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2131,6 +2131,7 @@ struct vm_struct *get_vm_area(unsigned long size, unsigned long flags)
>  				  NUMA_NO_NODE, GFP_KERNEL,
>  				  __builtin_return_address(0));
>  }
> +EXPORT_SYMBOL_GPL(get_vm_area);
>  
>  struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
>  				const void *caller)
> -- 
> 2.25.1
>
Christoph Hellwig April 19, 2021, 6:36 a.m. UTC | #5
On Thu, Apr 15, 2021 at 04:24:15PM -0400, Konrad Rzeszutek Wilk wrote:
> So you are exposing these two:
>  EXPORT_SYMBOL_GPL(get_vm_area);
>  EXPORT_SYMBOL_GPL(ioremap_page_range);
> 
> But if you used vmap wouldn't you get the same thing for free?

Yes, this needs to go into some vmap version, preferably reusing the
existing code in kernel/dma/remap.c.

Exporting get_vm_area is a complete dealbreaker and not going to happen.
We worked hard on not exposing it to modules.
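
(For illustration, a minimal sketch of the vmap-based direction being
suggested here, assuming CONFIG_VMAP_PFN; the helper name
hv_ring_buffer_remap() is hypothetical and error handling is trimmed:)

#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <asm/mshyperv.h>

/*
 * Map the ring-buffer pages at their alias above shared_gpa_boundary
 * without exporting get_vm_area()/ioremap_page_range(). vmap_pfn()
 * takes raw PFNs, which fits here because the aliased PFNs have no
 * struct page behind them.
 */
static void *hv_ring_buffer_remap(struct page *pages, u32 page_cnt)
{
	unsigned long *pfns;
	void *vaddr;
	u32 i;

	pfns = kcalloc(page_cnt, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return NULL;

	for (i = 0; i < page_cnt; i++)
		pfns[i] = page_to_pfn(&pages[i]) +
			  (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);

	vaddr = vmap_pfn(pfns, page_cnt, PAGE_KERNEL);
	kfree(pfns);
	return vaddr;
}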
Tianyu Lan May 12, 2021, 4:01 p.m. UTC | #6
Hi Christoph and Konrad:
      The current swiotlb bounce buffer implementation uses one pool for
all devices. There is high overhead in getting or freeing bounce buffers
during performance tests. The swiotlb code uses a global spin lock to
protect the bounce buffer data, and several device queues contending for
that lock introduce additional overhead.

From both a performance and a security perspective, each device should
have a separate swiotlb bounce buffer pool, so this part needs rework.
I want to check whether this is the right way to resolve the performance
issues with the swiotlb bounce buffer. If you have other suggestions,
they are welcome.

Thanks.

On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>> +				  unsigned long offset, size_t size,
>> +				  enum dma_data_direction dir,
>> +				  unsigned long attrs)
>> +{
>> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>> +
>> +	if (!hv_is_isolation_supported())
>> +		return phys;
>> +
>> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>> +				     attrs);
>> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>> +		return DMA_MAPPING_ERROR;
>> +
>> +	return map;
>> +}
> 
> This largely duplicates what dma-direct + swiotlb does.  Please use
> force_dma_unencrypted to force bounce buffering and just use the generic
> code.
> 
>> +	if (hv_isolation_type_snp()) {
>> +		ret = hv_set_mem_host_visibility(
>> +				phys_to_virt(hyperv_io_tlb_start),
>> +				hyperv_io_tlb_size,
>> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
>> +		if (ret)
>> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>> +			      __func__, ret);
>> +
>> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>> +					    + ms_hyperv.shared_gpa_boundary,
>> +						    hyperv_io_tlb_size);
>> +		if (!hyperv_io_tlb_remap)
>> +			panic("%s: Fail to remap io tlb.\n", __func__);
>> +
>> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
> 
> And this really needs to go into a common hook where we currently just
> call set_memory_decrypted so that all the different schemes for these
> trusted VMs (we have about half a dozen now) can share code rather than
> reinventing it.
> 
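
(To make the "use the generic code" suggestion quoted above concrete: a
hypothetical, heavily simplified sketch in which x86's
force_dma_unencrypted() also reports true for a Hyper-V Isolation VM,
so that dma-direct bounces through swiotlb without a Hyper-V specific
dma_map_ops:)

/* Heavily simplified, illustrative sketch of arch/x86/mm/mem_encrypt.c. */
bool force_dma_unencrypted(struct device *dev)
{
	/* SEV guests never share DMA buffers with the host directly. */
	if (sev_active())
		return true;

	/* Hypothetical addition: the same holds for Hyper-V Isolation VMs. */
	if (hv_is_isolation_supported())
		return true;

	return false;
}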
Robin Murphy May 12, 2021, 5:29 p.m. UTC | #7
On 2021-05-12 17:01, Tianyu Lan wrote:
> Hi Christoph and Konrad:
>       The current swiotlb bounce buffer implementation uses one pool for
> all devices. There is high overhead in getting or freeing bounce buffers
> during performance tests. The swiotlb code uses a global spin lock to
> protect the bounce buffer data, and several device queues contending for
> that lock introduce additional overhead.
> 
> From both a performance and a security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs rework.
> I want to check whether this is the right way to resolve the performance
> issues with the swiotlb bounce buffer. If you have other suggestions,
> they are welcome.

We're already well on the way to factoring out SWIOTLB to allow for just 
this sort of more flexible usage like per-device bounce pools - see here:

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/T/#t

FWIW this looks to have an awful lot in common with what's going to be 
needed for Android's protected KVM and Arm's Confidential Compute 
Architecture, so we'll all be better off by doing it right. I'm getting 
the feeling that set_memory_decrypted() wants to grow into a more 
general abstraction that can return an alias at a different GPA if 
necessary.

Robin.

> 
> Thanks.
> 
> On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>>> +                  unsigned long offset, size_t size,
>>> +                  enum dma_data_direction dir,
>>> +                  unsigned long attrs)
>>> +{
>>> +    phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>>> +
>>> +    if (!hv_is_isolation_supported())
>>> +        return phys;
>>> +
>>> +    map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>>> +                     attrs);
>>> +    if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>>> +        return DMA_MAPPING_ERROR;
>>> +
>>> +    return map;
>>> +}
>>
>> This largely duplicates what dma-direct + swiotlb does.  Please use
>> force_dma_unencrypted to force bounce buffering and just use the generic
>> code.
>>
>>> +    if (hv_isolation_type_snp()) {
>>> +        ret = hv_set_mem_host_visibility(
>>> +                phys_to_virt(hyperv_io_tlb_start),
>>> +                hyperv_io_tlb_size,
>>> +                VMBUS_PAGE_VISIBLE_READ_WRITE);
>>> +        if (ret)
>>> +            panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>>> +                  __func__, ret);
>>> +
>>> +        hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>>> +                        + ms_hyperv.shared_gpa_boundary,
>>> +                            hyperv_io_tlb_size);
>>> +        if (!hyperv_io_tlb_remap)
>>> +            panic("%s: Fail to remap io tlb.\n", __func__);
>>> +
>>> +        memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>>> +        swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
>>
>> And this really needs to go into a common hook where we currently just
>> call set_memory_decrypted so that all the different schemes for these
>> trusted VMs (we have about half a dozen now) can share code rather than
>> reinventing it.
>>
Baolu Lu May 13, 2021, 3:19 a.m. UTC | #8
On 5/13/21 12:01 AM, Tianyu Lan wrote:
> Hi Christoph and Konrad:
>       The current swiotlb bounce buffer implementation uses one pool for
> all devices. There is high overhead in getting or freeing bounce buffers
> during performance tests. The swiotlb code uses a global spin lock to
> protect the bounce buffer data, and several device queues contending for
> that lock introduce additional overhead.
> 
> From both a performance and a security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs rework.
> I want to check whether this is the right way to resolve the performance
> issues with the swiotlb bounce buffer. If you have other suggestions,
> they are welcome.

Is this what you want?

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/

Best regards,
baolu