
[5.10] scsi: hisi_sas: Revert "scsi: hisi_sas: Limit max hw sectors for v3 HW"

Message ID 20220927130116.1013775-1-yukuai3@huawei.com
State New
Series [5.10] scsi: hisi_sas: Revert "scsi: hisi_sas: Limit max hw sectors for v3 HW"

Commit Message

Yu Kuai Sept. 27, 2022, 1:01 p.m. UTC
This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.

1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if
iova search fails") tries to fix IOVA allocation failing while free
space is still available. It is not backported to 5.10 stable.
2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3
HW") fixes the performance regression introduced by 1). However, it is
only a temporary workaround and causes an I/O performance regression of
its own, because it limits the max I/O size to PAGE_SIZE * 32 (128k for
a 4k page size; see the sketch below the diffstat).
3) John Garry posted a patchset to fix the problem properly.
4) The temporary workaround is reverted.

It is odd that the patch in 2) was backported to 5.10 stable on its own;
the right thing to do is to backport the whole series together.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 -------
 1 file changed, 7 deletions(-)
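
The 128k figure in point 2) works out as follows. This is a minimal
userspace sketch, not kernel code; the 4 KiB page size and 512-byte
sector shift are assumptions for illustration only:

#include <stdio.h>

#define PAGE_SIZE    4096u   /* assumed 4 KiB page */
#define SECTOR_SHIFT 9       /* 512-byte sectors */

int main(void)
{
	/* mirrors the reverted cap: (PAGE_SIZE * 32) >> SECTOR_SHIFT */
	unsigned int max_sectors = (PAGE_SIZE * 32) >> SECTOR_SHIFT;

	/* 4096 * 32 = 131072 bytes; 131072 >> 9 = 256 sectors = 128 KiB */
	printf("cap = %u sectors (%u KiB)\n",
	       max_sectors, (max_sectors << SECTOR_SHIFT) / 1024);
	return 0;
}

With a 64 KiB page the same expression allows 2 MiB, so the cap is most
restrictive on 4 KiB-page systems.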

Comments

John Garry Sept. 27, 2022, 1:06 p.m. UTC | #1
On 27/09/2022 14:01, Yu Kuai wrote:
> This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.
> 
> 1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if
> iova search fails") tries to fix that iova allocation can fail while
> there are still free space available. This is not backported to 5.10
> stable.

This arrived in 5.11, I think

> 2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3
> HW") fix the performance regression introduced by 1), however, this
> is just a temporary solution and will cause io performance regression
> because it limit max io size to PAGE_SIZE * 32(128k for 4k page_size).

Did you really notice a performance regression? In what scenario? Which
kernel versions?

> 3) John Garry posted a patchset to fix the problem.
> 4) The temporary solution is reverted.
> 


> It's weird that the patch in 2) is backported to 5.10 stable alone,
> while the right thing to do is to backport them all together.

I would tend to agree. I did not notice fce54ed02757 being backported at
all, but I did consider backporting it to address 4e89dce72521. Anyway,
the proper solution is merged for 6.0 in 4cbfca5f7750 ("scsi:
scsi_transport_sas: cap shost opt_sectors according to DMA optimal
limit"), and I have a revert of "scsi: hisi_sas: Limit max hw sectors
for v3 HW" queued for 6.1, but I do not plan on reverting it for stable.

Please let me know if there is any issue here.

Thanks,
John

> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 -------
>   1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> index dfe7e6370d84..cd41dc061d87 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> @@ -2738,7 +2738,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
>   	struct hisi_hba *hisi_hba = shost_priv(shost);
>   	struct device *dev = hisi_hba->dev;
>   	int ret = sas_slave_configure(sdev);
> -	unsigned int max_sectors;
>   
>   	if (ret)
>   		return ret;
> @@ -2756,12 +2755,6 @@ static int slave_configure_v3_hw(struct scsi_device *sdev)
>   		}
>   	}
>   
> -	/* Set according to IOMMU IOVA caching limit */
> -	max_sectors = min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
> -			    (PAGE_SIZE * 32) >> SECTOR_SHIFT);
> -
> -	blk_queue_max_hw_sectors(sdev->request_queue, max_sectors);
> -
>   	return 0;
>   }
>
Yu Kuai Sept. 27, 2022, 2:05 p.m. UTC | #2
Hi, John

On 2022/09/27 21:45, John Garry wrote:
> On 27/09/2022 14:14, Yu Kuai wrote:
>> Hi, John
>>
>> On 2022/09/27 21:06, John Garry wrote:
>>> On 27/09/2022 14:01, Yu Kuai wrote:
>>>> This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.
>>>>
>>>> 1) commit 4e89dce72521 ("iommu/iova: Retry from last rb tree node if
>>>> iova search fails") tries to fix that iova allocation can fail while
>>>> there are still free space available. This is not backported to 5.10
>>>> stable.
>>>
>>> This arrived in 5.11, I think
>>>
>>>> 2) commit fce54ed02757 ("scsi: hisi_sas: Limit max hw sectors for v3
>>>> HW") fix the performance regression introduced by 1), however, this
>>>> is just a temporary solution and will cause io performance regression
>>>> because it limit max io size to PAGE_SIZE * 32(128k for 4k page_size).
>>>
>>> Did you really notice a performance regression? In what scenario? 
>>> which kernel versions?
>>
>> We are using 5.10, and test tool is fs_mark and it's doing writeback,
>> and benefits from io merge, before this patch, avgqusz is 300+, and this
>> patch will limit avgqusz to 128.
> 
> OK, so I think it's ok to revert for 5.10
> 
>>
>> I think that in any other case that io size is greater than 128k, this
>> patch will probably have defects.
> 
> However both 5.15 stable and 5.19 mainline include fce54ed02757 - it was 
> automatically backported for 5.15 stable. Please double check that.
> 
> And can you also check performance there for those kernels?

I'm pretty sure that splitting I/O can degrade performance, especially
on HDDs, because blk-mq can't guarantee that the split I/Os are
dispatched to the disk sequentially. However, this is usually not a
problem with a proper max_sectors_kb.
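
To make the splitting overhead concrete, here is a hypothetical
back-of-the-envelope sketch (illustrative numbers only, not
measurements): with max_sectors_kb set to 128, each 1 MiB request is
split into 8 sub-requests, so the number of fragments competing for
dispatch grows with the job count:

#include <stdio.h>

int main(void)
{
	/* hypothetical values: 1 MiB I/Os capped at 128 KiB per request */
	unsigned int io_kib = 1024, cap_kib = 128;
	unsigned int splits = io_kib / cap_kib;   /* 8 sub-requests */

	for (unsigned int numjobs = 1; numjobs <= 512; numjobs *= 2)
		printf("numjobs=%3u -> up to %4u split requests in flight\n",
		       numjobs, numjobs * splits);
	return 0;
}

The more fragments in flight, the harder it is to keep each original
1 MiB I/O sequential on an HDD, which matches the trend in the table
below.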

Here is an example showing that if max_sectors_kb is 128k, performance
drops a lot under high concurrency:

https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/

Here I set max_sectors_kb to 128k manually, and 1m random I/O
performance drops as I/O concurrency increases:

| numjobs | v5.18-rc1 |
| ------- | --------- |
| 1       | 67.7      |
| 2       | 67.7      |
| 4       | 67.7      |
| 8       | 67.7      |
| 16      | 64.8      |
| 32      | 59.8      |
| 64      | 54.9      |
| 128     | 49        |
| 256     | 37.7      |
| 512     | 31.8      |

Thanks,
Kuai
> 
> The reason which we had fce54ed02757 was because 4e89dce72521 hammered 
> performance when IOMMU enabled, and at least I saw no performance 
> regression for fce54ed02757 in other scenarios.
> 
> Thanks,
> John
> 
> 
> 
> 
> 
> .
>
Yu Kuai Sept. 28, 2022, 1:35 a.m. UTC | #3
Hi, John

On 2022/09/27 23:54, John Garry wrote:
> On 27/09/2022 15:05, Yu Kuai wrote:
>>>
>>> However both 5.15 stable and 5.19 mainline include fce54ed02757 - it 
>>> was automatically backported for 5.15 stable. Please double check that.
>>>
>>> And can you also check performance there for those kernels?
>>
>> I'm pretty sure io split can decline performance, especially for HDD,
>> because blk-mq can't guarantee that split io can be dispatched to disk
>> sequentially. However, this is usually not common with proper
>> max_sectors_kb.
>>
>> Here is an example that if max_sector_kb is 128k, performance will
>> drop a lot under high concurrency:
>>
>> https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/
>>
>> Here I set max_sectors_kb to 128k manually, and 1m random io performance
>> will drop while io concurrency increase:
>>
>> | numjobs | v5.18-rc1 |
>> | ------- | --------- |
>> | 1       | 67.7      |
>> | 2       | 67.7      |
>> | 4       | 67.7      |
>> | 8       | 67.7      |
>> | 16      | 64.8      |
>> | 32      | 59.8      |
>> | 64      | 54.9      |
>> | 128     | 49        |
>> | 256     | 37.7      |
>> | 512     | 31.8      |
> 
> Commit fce54ed02757 was to circumvent a terrible performance hit for 
> IOMMU enabled from 4e89dce72521 - have you ever tested with IOMMU enabled?

I understand that fce54ed02757 fixes a terrible performance regression,
but I'm not familiar with the IOMMU and have never tested that.
> 
> If fce54ed02757 really does cause a performance regression in some 
> scenarios, then we can consider reverting it from any stable kernel and 
> also backporting [0] when it is included in Linus' kernel

That sounds good.

For 5.10 stable, I think it's OK to revert it for now, and if someone
cares about the problem 4e89dce72521 fixed, they can try to backport it
together with the follow-up patches.

Thanks,
Kuai

> 
> [0] 
> https://lore.kernel.org/linux-iommu/495de02c-59ce-917f-1cb4-5425a37063ed@huawei.com/T/#m6a655d596fdf30e4e8b90100e16f75ae5d67341a 
> 
> 
> thanks,
> John
> .
>
John Garry Sept. 28, 2022, 7:36 a.m. UTC | #4
On 28/09/2022 02:35, Yu Kuai wrote:
>>>> However both 5.15 stable and 5.19 mainline include fce54ed02757 - it 
>>>> was automatically backported for 5.15 stable. Please double check that.
>>>>
>>>> And can you also check performance there for those kernels?
>>>
>>> I'm pretty sure io split can decline performance, especially for HDD,
>>> because blk-mq can't guarantee that split io can be dispatched to disk
>>> sequentially. However, this is usually not common with proper
>>> max_sectors_kb.
>>>
>>> Here is an example that if max_sector_kb is 128k, performance will
>>> drop a lot under high concurrency:
>>>
>>> https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/

This never got merged in any form, right?

>>>
>>> Here I set max_sectors_kb to 128k manually, and 1m random io performance
>>> will drop while io concurrency increase:
>>>
>>> | numjobs | v5.18-rc1 |
>>> | ------- | --------- |
>>> | 1       | 67.7      |
>>> | 2       | 67.7      |
>>> | 4       | 67.7      |
>>> | 8       | 67.7      |
>>> | 16      | 64.8      |
>>> | 32      | 59.8      |
>>> | 64      | 54.9      |
>>> | 128     | 49        |
>>> | 256     | 37.7      |
>>> | 512     | 31.8      |
>>
>> Commit fce54ed02757 was to circumvent a terrible performance hit for 
>> IOMMU enabled from 4e89dce72521 - have you ever tested with IOMMU 
>> enabled?
> 
> I understand that fce54ed02757 fix a terrible performance regression,
> and I'm not familiar with IOMMU and I never test that.
>>
>> If fce54ed02757 really does cause a performance regression in some 
>> scenarios, then we can consider reverting it from any stable kernel 
>> and also backporting [0] when it is included in Linus' kernel
> 
> That sounds good.
> 
> For 5.10 stable, I think it's ok to revert it for now, and if someone
> cares about the problem 4e89dce72521 fixed, they can try to backport it
> together with follow up patches.

For 5.10 stable revert only,

Reviewed-by: John Garry <john.garry@huawei.com>

Thanks,
John
Yu Kuai Sept. 29, 2022, 2:57 a.m. UTC | #5
Hi,

On 2022/09/28 15:36, John Garry wrote:
> On 28/09/2022 02:35, Yu Kuai wrote:
>>>>> However both 5.15 stable and 5.19 mainline include fce54ed02757 - 
>>>>> it was automatically backported for 5.15 stable. Please double 
>>>>> check that.
>>>>>
>>>>> And can you also check performance there for those kernels?
>>>>
>>>> I'm pretty sure io split can decline performance, especially for HDD,
>>>> because blk-mq can't guarantee that split io can be dispatched to disk
>>>> sequentially. However, this is usually not common with proper
>>>> max_sectors_kb.
>>>>
>>>> Here is an example that if max_sector_kb is 128k, performance will
>>>> drop a lot under high concurrency:
>>>>
>>>> https://lore.kernel.org/all/20220408073916.1428590-1-yukuai3@huawei.com/ 
>>>>
> 
> This never got merged in any form, right?

Yes.

> 
>>>>
>>>> Here I set max_sectors_kb to 128k manually, and 1m random io 
>>>> performance
>>>> will drop while io concurrency increase:
>>>>
>>>> | numjobs | v5.18-rc1 |
>>>> | ------- | --------- |
>>>> | 1       | 67.7      |
>>>> | 2       | 67.7      |
>>>> | 4       | 67.7      |
>>>> | 8       | 67.7      |
>>>> | 16      | 64.8      |
>>>> | 32      | 59.8      |
>>>> | 64      | 54.9      |
>>>> | 128     | 49        |
>>>> | 256     | 37.7      |
>>>> | 512     | 31.8      |
>>>
>>> Commit fce54ed02757 was to circumvent a terrible performance hit for 
>>> IOMMU enabled from 4e89dce72521 - have you ever tested with IOMMU 
>>> enabled?
>>
>> I understand that fce54ed02757 fix a terrible performance regression,
>> and I'm not familiar with IOMMU and I never test that.
>>>
>>> If fce54ed02757 really does cause a performance regression in some 
>>> scenarios, then we can consider reverting it from any stable kernel 
>>> and also backporting [0] when it is included in Linus' kernel
>>
>> That sounds good.
>>
>> For 5.10 stable, I think it's ok to revert it for now, and if someone
>> cares about the problem 4e89dce72521 fixed, they can try to backport it
>> together with follow up patches.
> 
> For 5.10 stable revert only,
> 
> Reviewed-by: John Garry <john.garry@huawei.com>

Thanks for the review!

Kuai
> 
> Thanks,
> John
> .
>

Patch

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index dfe7e6370d84..cd41dc061d87 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2738,7 +2738,6 @@  static int slave_configure_v3_hw(struct scsi_device *sdev)
 	struct hisi_hba *hisi_hba = shost_priv(shost);
 	struct device *dev = hisi_hba->dev;
 	int ret = sas_slave_configure(sdev);
-	unsigned int max_sectors;
 
 	if (ret)
 		return ret;
@@ -2756,12 +2755,6 @@  static int slave_configure_v3_hw(struct scsi_device *sdev)
 		}
 	}
 
-	/* Set according to IOMMU IOVA caching limit */
-	max_sectors = min_t(size_t, queue_max_hw_sectors(sdev->request_queue),
-			    (PAGE_SIZE * 32) >> SECTOR_SHIFT);
-
-	blk_queue_max_hw_sectors(sdev->request_queue, max_sectors);
-
 	return 0;
 }