
[v5,0/5] DMA mapping changes for SCSI core

Message ID 1656590892-42307-1-git-send-email-john.garry@huawei.com

Message

John Garry June 30, 2022, 12:08 p.m. UTC
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.

This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.
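
As a rough illustration only (not necessarily the exact patch), a host
driver could apply the new limit along these lines, assuming dma_dev is
the struct device used for the mappings:

    if (dma_dev->dma_mask) {
        shost->max_sectors = min_t(unsigned int, shost->max_sectors,
                dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
    }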

Only the SCSI SAS transport code is modified to use this limit. For now
I did not want to touch other hosts, as I have a concern that this
change could cause a performance regression.

I also added a patch for libata-scsi as it does not currently honour the
shost max_sectors limit.

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/

Changes since v4:
- tweak libata and other patch titles
- Add Robin's tag (thanks!)
- Clarify description of new DMA mapping API

Changes since v3:
- Apply the DMA optimal mapping limit to SAS hosts only
  Note: Even though "scsi: core: Cap shost max_sectors only once when
  adding" is a subset of a previous patch, I did not transfer the RB tags
- Rebase on v5.19-rc4

John Garry (5):
  dma-mapping: Add dma_opt_mapping_size()
  dma-iommu: Add iommu_dma_opt_mapping_size()
  scsi: core: Cap shost max_sectors according to DMA limits only once
  scsi: scsi_transport_sas: Cap shost max_sectors according to DMA
    optimal limit
  ata: libata-scsi: Cap ata_device->max_sectors according to
    shost->max_sectors

 Documentation/core-api/dma-api.rst | 14 ++++++++++++++
 drivers/ata/libata-scsi.c          |  1 +
 drivers/iommu/dma-iommu.c          |  6 ++++++
 drivers/iommu/iova.c               |  5 +++++
 drivers/scsi/hosts.c               |  5 +++++
 drivers/scsi/scsi_lib.c            |  4 ----
 drivers/scsi/scsi_transport_sas.c  |  6 ++++++
 include/linux/dma-map-ops.h        |  1 +
 include/linux/dma-mapping.h        |  5 +++++
 include/linux/iova.h               |  2 ++
 kernel/dma/mapping.c               | 12 ++++++++++++
 11 files changed, 57 insertions(+), 4 deletions(-)

Comments

John Garry July 1, 2022, 8:02 a.m. UTC | #1
On 01/07/2022 00:41, Damien Le Moal wrote:
>>   
>>   	shost->dma_dev = dma_dev;
>>   
>> +	if (dma_dev->dma_mask) {
>> +		shost->max_sectors = min_t(unsigned int, shost->max_sectors,
>> +				dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);
>> +	}
> Nit: you could remove the curly brackets... But it being a multi-line
> statement, having them is OK too I think.
> 

tglx seems to think that they are ok, and I generally agree (now):

https://lore.kernel.org/linux-arm-kernel/877djwdorz.ffs@nanos.tec.linutronix.de/

AFAICT coding-style.rst is ok with them in this scenario too

Cheers,
John
John Garry July 6, 2022, 1:40 p.m. UTC | #2
On 30/06/2022 13:08, John Garry wrote:

Hi Christoph,

Can you please consider picking up this series? A few things to note 
beforehand:

- In this version I changed the series to apply the mapping limit to SAS
hosts only. I would need a fresh ack from Martin for those SCSI parts,
but wanted to make sure you were OK with it first.
- Damien had some doubts about updating the shost max_sectors as opposed
to the per-request queue default, but I think he's OK with it - see
patch 4/5.

Thanks,
John


> As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
> limit may see a big performance hit.
> 
> This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
> that drivers may know this limit when performance is a factor in the
> mapping.
> 
> Only the SCSI SAS transport code is modified to use this limit. For now
> I did not want to touch other hosts, as I have a concern that this
> change could cause a performance regression.
> 
> I also added a patch for libata-scsi as it does not currently honour the
> shost max_sectors limit.
> 
> [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/
> 
> Changes since v4:
> - tweak libata and other patch titles
> - Add Robin's tag (thanks!)
> - Clarify description of new DMA mapping API
Martin K. Petersen July 7, 2022, 8:35 p.m. UTC | #3
Christoph,

> Yes, I've mostly been waiting for an ACK from Martin.

Sorry, I'm on vacation this week. The series looks OK to me although I
do agree that it would be great if the max was reflected in the queue's
hard limit and opt in the soft limit.

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry July 8, 2022, 4:17 p.m. UTC | #4
On 07/07/2022 21:35, Martin K. Petersen wrote:
> Christoph,
> 
>> Yes, I've mostly been waiting for an ACK from Martin.
> Sorry, I'm on vacation this week. The series looks OK to me although I
> do agree that it would be great if the max was reflected in the queue's
> hard limit and opt in the soft limit.

Ah, I think that I misunderstood Damien's question. I thought he was 
asking why not keep shost max_sectors at dma_max_mapping_size() and then 
init each sdev request queue max hw sectors at dma_opt_mapping_size().

But it seems that you want to know why the request queue max sectors is
not set to dma_opt_mapping_size(). The answer is related to the meaning
of dma_opt_mapping_size(): any mapping which exceeds this size may
suffer a big DMA mapping performance hit. So I set max hw sectors at
this ‘opt’ mapping size to ensure that we get no mappings which exceed
it. Indeed, I think max sectors is 128KB today for my host, which would
be the same as the dma_opt_mapping_size() value with an IOMMU enabled.
And I find that only a small % of requests exceed this 128KB size, but
they still have a big performance impact.

> 
> Acked-by: Martin K. Petersen<martin.petersen@oracle.com>

Thanks,
John
Damien Le Moal July 10, 2022, 11:08 p.m. UTC | #5
On 7/9/22 01:17, John Garry wrote:
> On 07/07/2022 21:35, Martin K. Petersen wrote:
>> Christoph,
>>
>>> Yes, I've mostly been waiting for an ACK from Martin.
>> Sorry, I'm on vacation this week. The series looks OK to me although I
>> do agree that it would be great if the max was reflected in the queue's
>> hard limit and opt in the soft limit.
> 
> Ah, I think that I misunderstood Damien's question. I thought he was 
> asking why not keep shost max_sectors at dma_max_mapping_size() and then 
> init each sdev request queue max hw sectors at dma_opt_mapping_size().

I was suggesting the reverse :) Keep the device hard limit
(max_hw_sectors) to the max dma mapping and set the soft limit
(max_sectors) to the optimal dma mapping size.

> 
> But it seems that you want to know why the request queue max sectors is
> not set to dma_opt_mapping_size(). The answer is related to the meaning
> of dma_opt_mapping_size(): any mapping which exceeds this size may
> suffer a big DMA mapping performance hit. So I set max hw sectors at
> this ‘opt’ mapping size to ensure that we get no mappings which exceed
> it. Indeed, I think max sectors is 128KB today for my host, which would
> be the same as the dma_opt_mapping_size() value with an IOMMU enabled.
> And I find that only a small % of requests exceed this 128KB size, but
> they still have a big performance impact.
> 
>>
>> Acked-by: Martin K. Petersen<martin.petersen@oracle.com>
> 
> Thanks,
> John
John Garry July 11, 2022, 7:36 a.m. UTC | #6
On 11/07/2022 00:08, Damien Le Moal wrote:
>> Ah, I think that I misunderstood Damien's question. I thought he was
>> asking why not keep shost max_sectors at dma_max_mapping_size() and then
>> init each sdev request queue max hw sectors at dma_opt_mapping_size().
> I was suggesting the reverse:)  Keep the device hard limit
> (max_hw_sectors) to the max dma mapping and set the soft limit
> (max_sectors) to the optimal dma mapping size.

Sure, but as I mentioned below, I only see a small % of requests whose 
mapping size exceeds max_sectors but that still causes a big performance 
hit. So that is why I want to set the hard limit as the optimal dma 
mapping size.

Indeed, the IOMMU IOVA caching limit is already the same as the default
max_sectors for the disks in my system - 128KB for a 4KB page size.
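
For reference, my understanding of where that 128KB figure comes from,
as a sketch only (assuming the IOVA rcache caches allocations up to
order 5; the helper name here is made up):

    static unsigned long iova_cached_size_limit(void)
    {
        /* largest cached IOVA order is 5: 2^5 pages * 4KB = 128KB */
        return PAGE_SIZE << 5;
    }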

> 
>> But it seems that you want to know why the request queue max sectors is
>> not set to dma_opt_mapping_size(). The answer is related to the meaning
>> of dma_opt_mapping_size(): any mapping which exceeds this size may
>> suffer a big DMA mapping performance hit. So I set max hw sectors at
>> this ‘opt’ mapping size to ensure that we get no mappings which exceed
>> it. Indeed, I think max sectors is 128KB today for my host, which would
>> be the same as the dma_opt_mapping_size() value with an IOMMU enabled.
>> And I find that only a small % of requests exceed this 128KB size, but
>> they still have a big performance impact.
>>

Thanks,
John
Damien Le Moal July 11, 2022, 10:40 a.m. UTC | #7
On 7/11/22 16:36, John Garry wrote:
> On 11/07/2022 00:08, Damien Le Moal wrote:
>>> Ah, I think that I misunderstood Damien's question. I thought he was
>>> asking why not keep shost max_sectors at dma_max_mapping_size() and then
>>> init each sdev request queue max hw sectors at dma_opt_mapping_size().
>> I was suggesting the reverse:)  Keep the device hard limit
>> (max_hw_sectors) to the max dma mapping and set the soft limit
>> (max_sectors) to the optimal dma mapping size.
> 
> Sure, but as I mentioned below, I only see a small % of requests whose 
> mapping size exceeds max_sectors but that still causes a big performance 
> hit. So that is why I want to set the hard limit as the optimal dma 
> mapping size.

How can you possibly end up with requests larger than max_sectors? BIO
splitting is done using this limit, right? Or is it that request merging
is allowed up to max_hw_sectors even if the resulting request size
exceeds max_sectors?

> 
> Indeed, the IOMMU IOVA caching limit is already the same as the default
> max_sectors for the disks in my system - 128KB for a 4KB page size.
> 
>>
>>> But it seems that you want to know why the request queue max sectors is
>>> not set to dma_opt_mapping_size(). The answer is related to the meaning
>>> of dma_opt_mapping_size(): any mapping which exceeds this size may
>>> suffer a big DMA mapping performance hit. So I set max hw sectors at
>>> this ‘opt’ mapping size to ensure that we get no mappings which exceed
>>> it. Indeed, I think max sectors is 128KB today for my host, which would
>>> be the same as the dma_opt_mapping_size() value with an IOMMU enabled.
>>> And I find that only a small % of requests exceed this 128KB size, but
>>> they still have a big performance impact.
>>>
> 
> Thanks,
> John
John Garry July 11, 2022, 2:49 p.m. UTC | #8
On 11/07/2022 11:40, Damien Le Moal wrote:
> On 7/11/22 16:36, John Garry wrote:
>> On 11/07/2022 00:08, Damien Le Moal wrote:
>>>> Ah, I think that I misunderstood Damien's question. I thought he was
>>>> asking why not keep shost max_sectors at dma_max_mapping_size() and then
>>>> init each sdev request queue max hw sectors at dma_opt_mapping_size().
>>> I was suggesting the reverse:)  Keep the device hard limit
>>> (max_hw_sectors) to the max dma mapping and set the soft limit
>>> (max_sectors) to the optimal dma mapping size.
>>
>> Sure, but as I mentioned below, I only see a small % of requests whose
>> mapping size exceeds max_sectors but that still causes a big performance
>> hit. So that is why I want to set the hard limit as the optimal dma
>> mapping size.
> 
> How can you possibly end up with requests larger than max_sectors? BIO
> splitting is done using this limit, right? Or is it that request merging
> is allowed up to max_hw_sectors even if the resulting request size
> exceeds max_sectors?
> 

Ah, I see now why I thought that I was seeing requests whose size
exceeded max_sectors. Somebody must have changed a single disk in my
system, and this odd disk has a higher default max_sectors_kb -- 512 vs
128 for the rest.

So, ignoring my nonsense about seeing oversize requests, as for the idea
to set the default max_sectors at dma_opt_mapping_size(), I see some
issues:
- For SAS disks I have no common point at which to impose this limit.
Maybe in the slave_configure callback, but each SAS driver generally has
its own implementation.
- Even if we do configure it in the slave_configure callback, the
max_sectors value is overwritten later in sd_revalidate_disk().

The following change could sort the issue, though:

---8<----
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 895b56c8f25e..bb49bea3d161 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3214,6 +3214,8 @@ static int sd_revalidate_disk(struct gendisk *disk)
         sector_t old_capacity = sdkp->capacity;
         unsigned char *buffer;
         unsigned int dev_max, rw_max;
+       struct Scsi_Host *host = sdp->host;
+       struct device *dev = host->dma_dev;

         SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp,
                                       "sd_revalidate_disk\n"));
@@ -3296,8 +3298,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
                                       (sector_t)BLK_DEF_MAX_SECTORS);
         }

-       /* Do not exceed controller limit */
-       rw_max = min(rw_max, queue_max_hw_sectors(q));
+       if (dev->dma_mask) {
+               /* Do not exceed dma optimal limit */
+               rw_max = min_t(unsigned int, rw_max,
+                               dma_opt_mapping_size(dev) >> SECTOR_SHIFT);
+       } else {
+               rw_max = min(rw_max, queue_max_hw_sectors(q));
+       }

         /*
          * Only update max_sectors if previously unset or if the current value

--->8---

Or I could go with the method in this series, which is not preferred. 
Let me know what you think.

Thanks,
John
Martin K. Petersen July 14, 2022, 3:10 a.m. UTC | #9
John,

> So I set max hw sectors at this ‘opt’ mapping size to ensure that we
> get no mappings which exceed it. Indeed, I think max sectors is 128KB
> today for my host, which would be the same as the dma_opt_mapping_size()
> value with an IOMMU enabled. And I find that only a small % of requests
> exceed this 128KB size, but they still have a big performance impact.

The purpose of the soft limit is to pick the appropriate I/O size
(i.e. for best performance). The purpose of the hard limit is to ensure
we don't submit something the hardware can't handle or describe.

IOW, the hard limit is not about performance at all. The hard limit is
mainly relevant for things that are way bigger than anything we'd issue
as regular filesystem I/O such as multi-megabyte firmware images, etc.

It's perfectly fine for firmware download performance to be
"suboptimal". What is typically more important in that scenario is that
the firmware image makes it inside a single I/O.
John Garry July 14, 2022, 7:52 a.m. UTC | #10
On 14/07/2022 04:10, Martin K. Petersen wrote:

Hi Martin,

>> So I set max hw sectors at this ‘opt’ mapping size to ensure that we
>> get no mappings which exceed it. Indeed, I think max sectors is 128KB
>> today for my host, which would be the same as the dma_opt_mapping_size()
>> value with an IOMMU enabled. And I find that only a small % of requests
>> exceed this 128KB size, but they still have a big performance impact.
> The purpose of the soft limit is to pick the appropriate I/O size
> (i.e. for best performance). The purpose of the hard limit is to ensure
> we don't submit something the hardware can't handle or describe.
> 
> IOW, the hard limit is not about performance at all. The hard limit is
> mainly relevant for things that are way bigger than anything we'd issue
> as regular filesystem I/O such as multi-megabyte firmware images, etc.
> 
> It's perfectly fine for firmware download performance to be
> "suboptimal". What is typically more important in that scenario is that
> the firmware image makes it inside a single I/O.

OK, fine. I've reworked the next version so that the DMA mapping optimal
limit only affects the max_sectors default.
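
Concretely, something along these lines in sd_revalidate_disk() (a
sketch only - the v6 details may differ), keeping the controller hard
limit in place and capping only the default transfer size:

    /* cap the default transfer size at the DMA optimal mapping size */
    if (dev->dma_mask)
        rw_max = min_t(unsigned int, rw_max,
                dma_opt_mapping_size(dev) >> SECTOR_SHIFT);

    /* do not exceed the controller hard limit */
    rw_max = min(rw_max, queue_max_hw_sectors(q));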

Thanks,
John