[0/6] dma mapping/iommu: Allow IOMMU IOVA rcache range to be configured

Message ID 1616160348-29451-1-git-send-email-john.garry@huawei.com

John Garry March 19, 2021, 1:25 p.m. UTC
For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
exceeds the IOVA rcache upper limit (meaning that they are not cached),
performance can be reduced. 

This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry
from last rb tree node if iova search fails"), as discussed at [0].

IOVAs which cannot be cached are highly involved in the IOVA aging issue,
as discussed at [1].

This series attempts to allow the device driver to hint at the upper limit
of its DMA mapping IOVA lengths, so that the caching range may be
increased.

Some figures for a storage scenario:
v5.12-rc3 baseline:             600K IOPS
With series:                    1300K IOPS
With 4e89dce72521 reverted:     1250K IOPS

All above are for IOMMU strict mode. Non-strict mode gives ~1750K IOPS in
all scenarios.

I will say that the APIs and their semantics are a bit ropey - any better
ideas welcome...

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/
[1] https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.garry@huawei.com/

John Garry (6):
  iommu: Move IOVA power-of-2 roundup into allocator
  iova: Add a per-domain count of reserved nodes
  iova: Allow rcache range upper limit to be configurable
  iommu: Add iommu_dma_set_opt_size()
  dma-mapping/iommu: Add dma_set_max_opt_size()
  scsi: hisi_sas: Set max optimal DMA size for v3 hw

 drivers/iommu/dma-iommu.c              | 23 ++++---
 drivers/iommu/iova.c                   | 88 ++++++++++++++++++++------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  2 +
 include/linux/dma-map-ops.h            |  1 +
 include/linux/dma-mapping.h            |  5 ++
 include/linux/iova.h                   | 12 +++-
 kernel/dma/mapping.c                   | 11 ++++
 7 files changed, 115 insertions(+), 27 deletions(-)

-- 
2.26.2

Comments

Christoph Hellwig March 19, 2021, 1:40 p.m. UTC | #1
On Fri, Mar 19, 2021 at 09:25:42PM +0800, John Garry wrote:
> For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
> exceeds the IOVA rcache upper limit (meaning that they are not cached),
> performance can be reduced.
>
> This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry
> from last rb tree node if iova search fails"), as discussed at [0].
>
> IOVAs which cannot be cached are highly involved in the IOVA aging issue,
> as discussed at [1].

I'm confused.  If this is a limit in the IOVA allocator, dma-iommu should
be able to just not grow the allocation so large without help from
the driver.

If contrary to the above description it is device-specific, the driver
could simply use dma_get_max_seg_size().

John Garry March 19, 2021, 3:42 p.m. UTC | #2
On 19/03/2021 13:40, Christoph Hellwig wrote:
> On Fri, Mar 19, 2021 at 09:25:42PM +0800, John Garry wrote:
>> For streaming DMA mappings involving an IOMMU and whose IOVA len regularly
>> exceeds the IOVA rcache upper limit (meaning that they are not cached),
>> performance can be reduced.
>>
>> This is much more pronounced from commit 4e89dce72521 ("iommu/iova: Retry
>> from last rb tree node if iova search fails"), as discussed at [0].
>>
>> IOVAs which cannot be cached are highly involved in the IOVA aging issue,
>> as discussed at [1].
>
> I'm confused.  If this is a limit in the IOVA allocator, dma-iommu should
> be able to just not grow the allocation so large without help from
> the driver.

This is not an issue with the IOVA allocator.

The issue is with how the IOVA code handles caching of IOVAs. 
Specifically, when we DMA unmap, for an IOVA whose length is above a 
fixed threshold, the IOVA is freed, rather than being cached. See 
free_iova_fast().

For performance reasons, I want that threshold increased for my driver 
to avail of the caching of all lengths of IOVA which we may see - 
currently we see IOVAs whose length exceeds that threshold. But it may 
not be good to increase that threshold for everyone.
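
To illustrate the threshold described above, here is a rough userspace
sketch of the check, not the kernel code itself. It assumes the limit is
expressed as a log2 of the size in pages with a value of 6 (modelled on
IOVA_RANGE_CACHE_MAX_SIZE in the v5.12-era IOVA code); the helper names
order_base_2_ul() and iova_is_cacheable() are made up for this example.

```c
#include <stdbool.h>

/*
 * Assumed limit: log2 of the first IOVA size (in pages) that is NOT cached.
 * Modelled on IOVA_RANGE_CACHE_MAX_SIZE; treat the value as an assumption.
 */
#define RCACHE_MAX_LOG_SIZE 6

/*
 * Hypothetical helper: smallest order such that 2^order >= pages.
 * Mirrors what the kernel's order_base_2() computes for pages >= 1.
 */
static unsigned int order_base_2_ul(unsigned long pages)
{
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: would an IOVA of this many pages be placed in the
 * rcache on unmap, or be freed back to the rb tree instead?
 */
static bool iova_is_cacheable(unsigned long pages)
{
	return order_base_2_ul(pages) < RCACHE_MAX_LOG_SIZE;
}
```

Under these assumptions an IOVA of up to 32 pages is cached on unmap,
while one of 33 pages or more is freed outright, which is the case this
series is trying to address for drivers that regularly map larger lengths.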

> If contrary to the above description it is device-specific, the driver
> could simply use dma_get_max_seg_size().

But that is for a single segment, right? Is there something equivalent
to tell how many scatter-gather elements we may generate, like
scsi_host_template.sg_tablesize?

Thanks,
John