[v3,0/6] iommu/iova: improve the allocation performance of dma64

Message ID 1495094397-9132-1-git-send-email-thunder.leizhen@huawei.com

Zhen Lei May 18, 2017, 7:59 a.m. UTC
v2 -> v3:
It's been a long time. I have not received any advice except Robin Murphy's. So
the only major change is that I dropped an old patch ("iommu/iova: fix incorrect variable types")
and merged it into patch 5 of this version.

v1 -> v2:
Because of a problem with my email server, all patches sent to Joerg Roedel <joro@8bytes.org> failed.
So I reposted all these patches; there are no changes.

v1:
64-bit devices are very common now. But currently we only define a cached32_node
to optimize the allocation performance of dma32, and I saw that some dma64 drivers choose
to allocate iova from the dma32 space first, maybe because of the current dma64 performance
problem or for some other reason.

For example (in drivers/iommu/amd_iommu.c):
static unsigned long dma_ops_alloc_iova(......
{
	......
	if (dma_mask > DMA_BIT_MASK(32))
		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
				      IOVA_PFN(DMA_BIT_MASK(32)));
	if (!pfn)
		pfn = alloc_iova_fast(&dma_dom->iovad, pages, IOVA_PFN(dma_mask));
		
For details on why dma64 iova allocation performance is so bad, please refer to the
description of patch 5.

In this patch series, I add a cached64_node to manage the dma64 iova space (iova >= 4G). It
plays the same role for that space as cached32_node does for iova < 4G.
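
To illustrate the idea only, here is a minimal user-space sketch of picking a
cached starting point by address range. It is not the code from this series:
struct toy_iova_domain, toy_search_start() and TOY_DMA32_LAST_PFN are
hypothetical names, the cached nodes are modelled as plain pfn values instead
of rbtree nodes, and a 4K page size is assumed.

```c
#include <stdint.h>

/* Hypothetical: last page frame below 4G, assuming 4K pages (2^32 / 2^12 - 1). */
#define TOY_DMA32_LAST_PFN 0xFFFFFULL

/* Hypothetical toy model; the real struct iova_domain caches rbtree nodes. */
struct toy_iova_domain {
	uint64_t cached32_pfn;	/* where the last search below 4G ended */
	uint64_t cached64_pfn;	/* where the last search at or above 4G ended */
};

/*
 * Return the pfn at which a top-down search should start for a request
 * limited to limit_pfn. Requests that fit below 4G use the dma32 cache,
 * all others use the dma64 cache. The result never exceeds limit_pfn.
 */
static uint64_t toy_search_start(const struct toy_iova_domain *d,
				 uint64_t limit_pfn)
{
	uint64_t cached = (limit_pfn <= TOY_DMA32_LAST_PFN) ?
			  d->cached32_pfn : d->cached64_pfn;

	return cached < limit_pfn ? cached : limit_pfn;
}
```

With both caches populated, a dma64 request no longer has to start from the
top of the address space on every allocation; it resumes from cached64_pfn in
the same way a dma32 request resumes from cached32_pfn.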

Below is the performance data before and after my patch series:
(before)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35898
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.2 sec  7.88 MBytes  6.48 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35900
[  5]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 35902
[  4]  0.0-10.3 sec  7.88 MBytes  6.43 Mbits/sec

(after)$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36330
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  5] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36332
[  5]  0.0-10.0 sec  1.10 GBytes   939 Mbits/sec
[  4] local 192.168.1.106 port 5001 connected with 192.168.1.198 port 36334
[  4]  0.0-10.0 sec  1.10 GBytes   938 Mbits/sec

Zhen Lei (6):
  iommu/iova: cut down judgement times
  iommu/iova: insert start_pfn boundary of dma32
  iommu/iova: adjust __cached_rbnode_insert_update
  iommu/iova: to optimize the allocation performance of dma64
  iommu/iova: move the caculation of pad mask out of loop
  iommu/iova: fix iovad->dma_32bit_pfn as the last pfn of dma32

 drivers/iommu/amd_iommu.c        |   7 +-
 drivers/iommu/dma-iommu.c        |  21 ++----
 drivers/iommu/intel-iommu.c      |  11 +--
 drivers/iommu/iova.c             | 143 +++++++++++++++++++++------------------
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h             |   7 +-
 6 files changed, 93 insertions(+), 99 deletions(-)

-- 
2.5.0