[Resend,RFC,V2,10/12] HV/IOMMU: Add Hyper-V dma ops support

Message ID	20210414144945.3460554-11-ltykernel@gmail.com
State	New
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Tianyu Lan <ltykernel@gmail.com> To: kys@microsoft.com, haiyangz@microsoft.com, sthemmin@microsoft.com, wei.liu@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, arnd@arndb.de, akpm@linux-foundation.org, gregkh@linuxfoundation.org, konrad.wilk@oracle.com, hch@lst.de, m.szyprowski@samsung.com, robin.murphy@arm.com, joro@8bytes.org, will@kernel.org, davem@davemloft.net, kuba@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>, iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets@redhat.com, thomas.lendacky@amd.com, brijesh.singh@amd.com, sunilmut@microsoft.com Subject: [Resend RFC PATCH V2 10/12] HV/IOMMU: Add Hyper-V dma ops support Date: Wed, 14 Apr 2021 10:49:43 -0400 Message-Id: <20210414144945.3460554-11-ltykernel@gmail.com> In-Reply-To: <20210414144945.3460554-1-ltykernel@gmail.com> References: <20210414144945.3460554-1-ltykernel@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	x86/Hyper-V: Add Hyper-V Isolation VM support \| expand [Resend,RFC,V2,00/12] x86/Hyper-V: Add Hyper-V Isolation VM support [Resend,RFC,V2,02/12] x86/HV: Initialize shared memory boundary in Isolation VM [Resend,RFC,V2,04/12] HV: Add Write/Read MSR registers via ghcb [Resend,RFC,V2,06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message [Resend,RFC,V2,08/12] UIO/Hyper-V: Not load UIO HV driver in the isolation VM. [Resend,RFC,V2,10/12] HV/IOMMU: Add Hyper-V dma ops support [Resend,RFC,V2,12/12] HV/Storvsc: Add Isolation VM support for storvsc driver

Message ID

20210414144945.3460554-11-ltykernel@gmail.com

State

New

Headers

From: Tianyu Lan <ltykernel@gmail.com>
To: kys@microsoft.com, haiyangz@microsoft.com, sthemmin@microsoft.com,
	wei.liu@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com, arnd@arndb.de,
	akpm@linux-foundation.org, gregkh@linuxfoundation.org,
	konrad.wilk@oracle.com, hch@lst.de, m.szyprowski@samsung.com,
	robin.murphy@arm.com, joro@8bytes.org, will@kernel.org,
	davem@davemloft.net, kuba@kernel.org, jejb@linux.ibm.com,
	martin.petersen@oracle.com
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>,
	iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-scsi@vger.kernel.org,
	netdev@vger.kernel.org, vkuznets@redhat.com,
	thomas.lendacky@amd.com, brijesh.singh@amd.com, sunilmut@microsoft.com
Subject: [Resend RFC PATCH V2 10/12] HV/IOMMU: Add Hyper-V dma ops support
Date: Wed, 14 Apr 2021 10:49:43 -0400
Message-Id: <20210414144945.3460554-11-ltykernel@gmail.com>
In-Reply-To: <20210414144945.3460554-1-ltykernel@gmail.com>
References: <20210414144945.3460554-1-ltykernel@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

x86/Hyper-V: Add Hyper-V Isolation VM support | expand

Commit Message

Tianyu Lan April 14, 2021, 2:49 p.m. UTC

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V Isolation VM requires bounce buffer support. To use swiotlb
bounce buffer, add Hyper-V dma ops and use swiotlb functions in the
map and unmap callback.

Allocate bounce buffer in the Hyper-V code because bounce buffer
needs to be accessed via extra address space(e.g, address above 39bit)
in the AMD SEV SNP based Isolation VM.

ioremap_cache() can't use in the hyperv_iommu_swiotlb_init() which
is too early place and remap bounce buffer in the hyperv_iommu_swiotlb_
later_init().

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/kernel/pci-swiotlb.c |   3 +-
 drivers/hv/vmbus_drv.c        |   3 +
 drivers/iommu/hyperv-iommu.c  | 127 ++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig April 14, 2021, 3:47 p.m. UTC | #1

> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
> +				  unsigned long offset, size_t size,
> +				  enum dma_data_direction dir,
> +				  unsigned long attrs)
> +{
> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
> +
> +	if (!hv_is_isolation_supported())
> +		return phys;
> +
> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
> +				     attrs);
> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> +		return DMA_MAPPING_ERROR;
> +
> +	return map;
> +}

This largerly duplicates what dma-direct + swiotlb does.  Please use
force_dma_unencrypted to force bounce buffering and just use the generic
code.

> +	if (hv_isolation_type_snp()) {
> +		ret = hv_set_mem_host_visibility(
> +				phys_to_virt(hyperv_io_tlb_start),
> +				hyperv_io_tlb_size,
> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
> +		if (ret)
> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
> +			      __func__, ret);
> +
> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
> +					    + ms_hyperv.shared_gpa_boundary,
> +						    hyperv_io_tlb_size);
> +		if (!hyperv_io_tlb_remap)
> +			panic("%s: Fail to remap io tlb.\n", __func__);
> +
> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);

And this really needs to go into a common hook where we currently just
call set_memory_decrypted so that all the different schemes for these
trusted VMs (we have about half a dozen now) can share code rather than
reinventing it.

Robin Murphy May 12, 2021, 5:29 p.m. UTC | #2

On 2021-05-12 17:01, Tianyu Lan wrote:
> Hi Christoph and Konrad:
>       Current Swiotlb bounce buffer uses a pool for all devices. There
> is a high overhead to get or free bounce buffer during performance test.
> Swiotlb code now use a global spin lock to protect bounce buffer data.
> Several device queues try to acquire the spin lock and this introduce
> additional overhead.
> 
> For performance and security perspective, each devices should have a
> separate swiotlb bounce buffer pool and so this part needs to rework.
> I want to check this is right way to resolve performance issues with 
> swiotlb bounce buffer. If you have some other suggestions,welcome.

We're already well on the way to factoring out SWIOTLB to allow for just 
this sort of more flexible usage like per-device bounce pools - see here:

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/T/#t

FWIW this looks to have an awful lot in common with what's going to be 
needed for Android's protected KVM and Arm's Confidential Compute 
Architecture, so we'll all be better off by doing it right. I'm getting 
the feeling that set_memory_decrypted() wants to grow into a more 
general abstraction that can return an alias at a different GPA if 
necessary.

Robin.

> 
> Thanks.
> 
> On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page 
>>> *page,
>>> +                  unsigned long offset, size_t size,
>>> +                  enum dma_data_direction dir,
>>> +                  unsigned long attrs)
>>> +{
>>> +    phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>>> +
>>> +    if (!hv_is_isolation_supported())
>>> +        return phys;
>>> +
>>> +    map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, 
>>> dir,
>>> +                     attrs);
>>> +    if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>>> +        return DMA_MAPPING_ERROR;
>>> +
>>> +    return map;
>>> +}
>>
>> This largerly duplicates what dma-direct + swiotlb does.  Please use
>> force_dma_unencrypted to force bounce buffering and just use the generic
>> code.
>>
>>> +    if (hv_isolation_type_snp()) {
>>> +        ret = hv_set_mem_host_visibility(
>>> +                phys_to_virt(hyperv_io_tlb_start),
>>> +                hyperv_io_tlb_size,
>>> +                VMBUS_PAGE_VISIBLE_READ_WRITE);
>>> +        if (ret)
>>> +            panic("%s: Fail to mark Hyper-v swiotlb buffer visible 
>>> to host. err=%d\n",
>>> +                  __func__, ret);
>>> +
>>> +        hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>>> +                        + ms_hyperv.shared_gpa_boundary,
>>> +                            hyperv_io_tlb_size);
>>> +        if (!hyperv_io_tlb_remap)
>>> +            panic("%s: Fail to remap io tlb.\n", __func__);
>>> +
>>> +        memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>>> +        swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
>>
>> And this really needs to go into a common hook where we currently just
>> call set_memory_decrypted so that all the different schemes for these
>> trusted VMs (we have about half a dozen now) can share code rather than
>> reinventing it.
>>

Baolu Lu May 13, 2021, 3:19 a.m. UTC | #3

On 5/13/21 12:01 AM, Tianyu Lan wrote:
> Hi Christoph and Konrad:

>       Current Swiotlb bounce buffer uses a pool for all devices. There

> is a high overhead to get or free bounce buffer during performance test.

> Swiotlb code now use a global spin lock to protect bounce buffer data.

> Several device queues try to acquire the spin lock and this introduce

> additional overhead.

> 

> For performance and security perspective, each devices should have a

> separate swiotlb bounce buffer pool and so this part needs to rework.

> I want to check this is right way to resolve performance issues with 

> swiotlb bounce buffer. If you have some other suggestions,welcome.


Is this what you want?

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/

Best regards,
baolu

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..caaf68c06f24 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -15,6 +15,7 @@ 
 #include <asm/iommu_table.h>
 
 int swiotlb __read_mostly;
+extern int hyperv_swiotlb;
 
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
@@ -68,7 +69,7 @@  void __init pci_swiotlb_init(void)
 void __init pci_swiotlb_late_init(void)
 {
 	/* An IOMMU turned us off. */
-	if (!swiotlb)
+	if (!swiotlb && !hyperv_swiotlb)
 		swiotlb_exit();
 	else {
 		printk(KERN_INFO "PCI-DMA: "
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..0ee6ec3a5de6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@ 
 #include <linux/cpu.h>
 #include <linux/sched/task_stack.h>
 
+#include <linux/dma-map-ops.h>
 #include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/ptrace.h>
@@ -2030,6 +2031,7 @@  struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2070,6 +2072,7 @@  int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);
 
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
 	return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..588ba847f0cc 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,19 +13,28 @@ 
 #include <linux/irq.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
+#include <linux/hyperv.h>
 
+#include <asm/io.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hw_irq.h>
 #include <asm/io_apic.h>
+#include <asm/iommu.h>
+#include <asm/iommu_table.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
 #include <asm/mshyperv.h>
+#include <asm/swiotlb.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-direct.h>
 
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
 
+int hyperv_swiotlb __read_mostly;
+
 /*
  * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
  * Redirection Table. Hyper-V exposes one single IO-APIC and so define
@@ -36,6 +45,10 @@ 
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; 
+static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size;
+static void *hyperv_io_tlb_remap;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
 		const struct cpumask *mask, bool force)
 {
@@ -337,4 +350,118 @@  static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
 
+static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
+				  unsigned long offset, size_t size,
+				  enum dma_data_direction dir,
+				  unsigned long attrs)
+{
+	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+
+	if (!hv_is_isolation_supported())
+		return phys;
+
+	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
+				     attrs);
+	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
+		return DMA_MAPPING_ERROR;
+
+	return map;
+}
+
+static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	if (!hv_is_isolation_supported())
+		return;
+
+	swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir,
+				attrs);
+}
+
+int __init hyperv_swiotlb_init(void)
+{
+	unsigned long bytes;
+	void *vstart = 0;
+
+	bytes = 200 * 1024 * 1024;
+	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
+	hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT;
+	hyperv_io_tlb_size = bytes;
+
+	if (!vstart) {
+		pr_warn("Fail to allocate swiotlb.\n");
+		return -ENOMEM;
+	}
+
+	hyperv_io_tlb_start = virt_to_phys(vstart);
+	if (!hyperv_io_tlb_start)
+		panic("%s: Failed to allocate %lu bytes align=0x%lx.\n",
+		      __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+
+	if (swiotlb_init_with_tbl(vstart, hyperv_io_tlb_nslabs, 1))
+		panic("%s: Cannot allocate SWIOTLB buffer.\n", __func__);
+
+	swiotlb_set_max_segment(PAGE_SIZE);
+	hyperv_io_tlb_end = hyperv_io_tlb_start + bytes;
+	return 0;
+}
+
+const struct dma_map_ops hyperv_dma_ops = {
+	.map_page = hyperv_map_page,
+	.unmap_page = hyperv_unmap_page,
+};
+
+int __init hyperv_swiotlb_detect(void)
+{
+	dma_ops = &hyperv_dma_ops;
+
+	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
+	    && hv_is_isolation_supported()) {
+		/*
+		 * Disable generic swiotlb and allocate Hyper-v swiotlb
+		 * in the hyperv_iommu_swiotlb_init().
+		 */
+		swiotlb = 0;
+		hyperv_swiotlb = 1;
+
+		return 1;
+	}
+
+	return 0;
+}
+
+void __init hyperv_iommu_swiotlb_init(void)
+{
+	hyperv_swiotlb_init();
+}
+
+void __init hyperv_iommu_swiotlb_later_init(void)
+{
+	int ret;
+
+	/* Mask bounce buffer visible to host and remap extra address. */
+	if (hv_isolation_type_snp()) {
+		ret = hv_set_mem_host_visibility(
+				phys_to_virt(hyperv_io_tlb_start),
+				hyperv_io_tlb_size,
+				VMBUS_PAGE_VISIBLE_READ_WRITE);
+		if (ret)
+			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
+			      __func__, ret);
+
+		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
+					    + ms_hyperv.shared_gpa_boundary,
+						    hyperv_io_tlb_size);
+		if (!hyperv_io_tlb_remap)
+			panic("%s: Fail to remap io tlb.\n", __func__);
+
+		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
+		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
+	}
+}
+
+IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
+		  NULL, hyperv_iommu_swiotlb_init,
+		  hyperv_iommu_swiotlb_later_init);
+
 #endif

[Resend,RFC,V2,10/12] HV/IOMMU: Add Hyper-V dma ops support

Commit Message

Comments

Patch