mbox series

[v3,0/9] Add support for segments smaller than one page

Message ID 20230118225447.2809787-1-bvanassche@acm.org
Headers show
Series Add support for segments smaller than one page | expand

Message

Bart Van Assche Jan. 18, 2023, 10:54 p.m. UTC
Hi Jens,

Several embedded storage controllers need support for DMA segments that are
smaller than the size of one virtual memory page. Hence this patch series.
Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v2:
- For SCSI drivers, only set flag QUEUE_FLAG_SUB_PAGE_SEGMENTS if necessary.
- In the scsi_debug patch, sorted kernel module parameters alphabetically.
  Only set flag QUEUE_FLAG_SUB_PAGE_SEGMENTS if necessary.
- Added a patch for the UFS Exynos driver that enables
  CONFIG_BLK_SUB_PAGE_SEGMENTS if the page size exceeds 4 KiB.

Changes compared to v1:
- Added a CONFIG variable that controls whether or not small segment support
  is enabled.
- Improved patch descriptions.

Bart Van Assche (9):
  block: Introduce QUEUE_FLAG_SUB_PAGE_SEGMENTS and
    CONFIG_BLK_SUB_PAGE_SEGMENTS
  block: Support configuring limits below the page size
  block: Support submitting passthrough requests with small segments
  block: Add support for filesystem requests and small segments
  block: Add support for small segments in blk_rq_map_user_iov()
  scsi_debug: Support configuring the maximum segment size
  null_blk: Support configuring the maximum segment size
  scsi: core: Set BLK_SUB_PAGE_SEGMENTS for small max_segment_size
    values
  scsi: ufs: exynos: Select CONFIG_BLK_SUB_PAGE_SEGMENTS for lage page
    sizes

 block/Kconfig                     |  9 +++++++
 block/blk-map.c                   | 43 ++++++++++++++++++++++++++-----
 block/blk-merge.c                 |  6 +++--
 block/blk-mq.c                    |  2 ++
 block/blk-settings.c              | 20 ++++++++------
 block/blk.h                       | 22 +++++++++++-----
 drivers/block/null_blk/main.c     | 21 ++++++++++++---
 drivers/block/null_blk/null_blk.h |  1 +
 drivers/scsi/scsi_debug.c         | 15 +++++++++++
 drivers/scsi/scsi_lib.c           |  3 +++
 drivers/ufs/host/Kconfig          |  1 +
 include/linux/blkdev.h            |  7 +++++
 12 files changed, 125 insertions(+), 25 deletions(-)

Comments

Bart Van Assche Jan. 18, 2023, 11:40 p.m. UTC | #1
On 1/18/23 15:02, Jens Axboe wrote:
> On 1/18/23 3:54 PM, Bart Van Assche wrote:
>> Prepare for introducing support for segments smaller than the page size
>> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS.
>> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of
>> block drivers that support segments >= PAGE_SIZE would be affected.
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Keith Busch <kbusch@kernel.org>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/Kconfig          | 9 +++++++++
>>   include/linux/blkdev.h | 7 +++++++
>>   2 files changed, 16 insertions(+)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 5d9d9c84d516..e85061d2175b 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD
>>   	  created on demand, but scripts that manually create device nodes and
>>   	  then call losetup might rely on this behavior.
>>   
>> +config BLK_SUB_PAGE_SEGMENTS
>> +       bool "Support segments smaller than the page size"
>> +       default n
>> +       help
>> +	  Most storage controllers support DMA segments larger than the typical
>> +	  size of a virtual memory page. Some embedded controllers only support
>> +	  DMA segments smaller than the page size. Enable this option to support
>> +	  such controllers.
> 
> This should not be a visible option at all, affected drivers should just
> select it.

Hi Jens,

Thanks for the feedback. Unless anyone requests me to realize this in another
way, I plan to merge the patch below into the patch above:


diff --git a/block/Kconfig b/block/Kconfig
index e85061d2175b..b44b449c47bf 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -36,13 +36,7 @@ config BLOCK_LEGACY_AUTOLOAD
  	  then call losetup might rely on this behavior.

  config BLK_SUB_PAGE_SEGMENTS
-       bool "Support segments smaller than the page size"
-       default n
-       help
-	  Most storage controllers support DMA segments larger than the typical
-	  size of a virtual memory page. Some embedded controllers only support
-	  DMA segments smaller than the page size. Enable this option to support
-	  such controllers.
+       bool

  config BLK_RQ_ALLOC_TIME
  	bool
Jens Axboe Jan. 19, 2023, 1:18 a.m. UTC | #2
On 1/18/23 3:54?PM, Bart Van Assche wrote:
> Hi Jens,
> 
> Several embedded storage controllers need support for DMA segments that are
> smaller than the size of one virtual memory page. Hence this patch series.
> Please consider this patch series for the next merge window.

Before any real reviews are done, I have to ask "why?". This is pretty
hairy code in the middle of the fast path for some obscure controller.
Why would anyone ship that with > 4k page sizes rather than ship it with
a controller that is sane?
Bart Van Assche Jan. 19, 2023, 4:43 a.m. UTC | #3
On 1/18/23 17:18, Jens Axboe wrote:
> On 1/18/23 3:54?PM, Bart Van Assche wrote:
>> Hi Jens,
>>
>> Several embedded storage controllers need support for DMA segments that are
>> smaller than the size of one virtual memory page. Hence this patch series.
>> Please consider this patch series for the next merge window.
> 
> Before any real reviews are done, I have to ask "why?". This is pretty
> hairy code in the middle of the fast path for some obscure controller.
> Why would anyone ship that with > 4k page sizes rather than ship it with
> a controller that is sane?

Hi Jens,

The new config variable CONFIG_BLK_SUB_PAGE_SEGMENTS has been introduced 
to make sure that the hot path is *not* affected if that config variable 
is disabled.

Regarding the question "why": the Google Android team would like to 
improve performance by switching from 4 KiB pages to 16 KiB pages. One 
of the widely used UFS controllers (Exynos) has a maximum segment size 
of 4 KiB. Hence this patch series.

This patch series is not only useful for phones but also for Tesla cars 
since Tesla cars use an Exynos UFS controller.

A contributor to the MMC driver told me that this patch series would 
allow to simplify the MMC driver significantly (I have not yet double 
checked this).

Bart.
Bart Van Assche Jan. 21, 2023, 12:11 a.m. UTC | #4
On 1/18/23 15:02, Jens Axboe wrote:
> On 1/18/23 3:54 PM, Bart Van Assche wrote:
>> Prepare for introducing support for segments smaller than the page size
>> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS.
>> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of
>> block drivers that support segments >= PAGE_SIZE would be affected.
>>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Ming Lei <ming.lei@redhat.com>
>> Cc: Keith Busch <kbusch@kernel.org>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
>>   block/Kconfig          | 9 +++++++++
>>   include/linux/blkdev.h | 7 +++++++
>>   2 files changed, 16 insertions(+)
>>
>> diff --git a/block/Kconfig b/block/Kconfig
>> index 5d9d9c84d516..e85061d2175b 100644
>> --- a/block/Kconfig
>> +++ b/block/Kconfig
>> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD
>>   	  created on demand, but scripts that manually create device nodes and
>>   	  then call losetup might rely on this behavior.
>>   
>> +config BLK_SUB_PAGE_SEGMENTS
>> +       bool "Support segments smaller than the page size"
>> +       default n
>> +       help
>> +	  Most storage controllers support DMA segments larger than the typical
>> +	  size of a virtual memory page. Some embedded controllers only support
>> +	  DMA segments smaller than the page size. Enable this option to support
>> +	  such controllers.
> 
> This should not be a visible option at all, affected drivers should just
> select it.

Hi Jens,

If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this 
option be enabled for the scsi_debug and null_blk drivers? Adding 
"select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers 
would have the unfortunate side effect that enabling either driver would 
make all block drivers slower. How about making sub-page segment support 
configurable for the scsi_debug and null_blk drivers only? That would 
allow kernel developers who want to test the sub-page segment support to 
enable this functionality without making e.g. distro kernels slower.

Thanks,

Bart.
Jens Axboe Jan. 21, 2023, 2:43 a.m. UTC | #5
On 1/20/23 5:11?PM, Bart Van Assche wrote:
> On 1/18/23 15:02, Jens Axboe wrote:
>> On 1/18/23 3:54?PM, Bart Van Assche wrote:
>>> Prepare for introducing support for segments smaller than the page size
>>> by introducing the request queue flag QUEUE_FLAG_SUB_PAGE_SEGMENTS.
>>> Introduce CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that performance of
>>> block drivers that support segments >= PAGE_SIZE would be affected.
>>>
>>> Cc: Christoph Hellwig <hch@lst.de>
>>> Cc: Ming Lei <ming.lei@redhat.com>
>>> Cc: Keith Busch <kbusch@kernel.org>
>>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>>> ---
>>>   block/Kconfig          | 9 +++++++++
>>>   include/linux/blkdev.h | 7 +++++++
>>>   2 files changed, 16 insertions(+)
>>>
>>> diff --git a/block/Kconfig b/block/Kconfig
>>> index 5d9d9c84d516..e85061d2175b 100644
>>> --- a/block/Kconfig
>>> +++ b/block/Kconfig
>>> @@ -35,6 +35,15 @@ config BLOCK_LEGACY_AUTOLOAD
>>>         created on demand, but scripts that manually create device nodes and
>>>         then call losetup might rely on this behavior.
>>>   +config BLK_SUB_PAGE_SEGMENTS
>>> +       bool "Support segments smaller than the page size"
>>> +       default n
>>> +       help
>>> +      Most storage controllers support DMA segments larger than the typical
>>> +      size of a virtual memory page. Some embedded controllers only support
>>> +      DMA segments smaller than the page size. Enable this option to support
>>> +      such controllers.
>>
>> This should not be a visible option at all, affected drivers should just
>> select it.
> 
> Hi Jens,
> 
> If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this
> option be enabled for the scsi_debug and null_blk drivers? Adding
> "select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers
> would have the unfortunate side effect that enabling either driver
> would make all block drivers slower. How about making sub-page segment
> support configurable for the scsi_debug and null_blk drivers only?
> That would allow kernel developers who want to test the sub-page
> segment support to enable this functionality without making e.g.
> distro kernels slower.

You'd need to have a separate sub-option for each of them, Kconfig
style. But this also highlights the usual issue with pretending that
Kconfig options means that this doesn't matter, because inevitably they
end up getting enabled by default in distros anyway. And then you may as
well not even have them...

Why can't we just handle this in the driver? The segment path is hard
enough to grok in the first place, and this just makes it worse.
Generally not a huge fan of punting this to the driver, but just maybe
it'd make sense in this case since it's just the one. At least that
seems a lot more palatable than the alternative.
Bart Van Assche Jan. 23, 2023, 7:58 p.m. UTC | #6
On 1/20/23 18:43, Jens Axboe wrote:
> On 1/20/23 5:11?PM, Bart Van Assche wrote:
>> If CONFIG_BLK_SUB_PAGE_SEGMENTS is made invisible, how should this
>> option be enabled for the scsi_debug and null_blk drivers? Adding
>> "select BLK_SUB_PAGE_SEGMENTS" to the Kconfig section of these drivers
>> would have the unfortunate side effect that enabling either driver
>> would make all block drivers slower. How about making sub-page segment
>> support configurable for the scsi_debug and null_blk drivers only?
>> That would allow kernel developers who want to test the sub-page
>> segment support to enable this functionality without making e.g.
>> distro kernels slower.
> 
> You'd need to have a separate sub-option for each of them, Kconfig
> style. But this also highlights the usual issue with pretending that
> Kconfig options means that this doesn't matter, because inevitably they
> end up getting enabled by default in distros anyway. And then you may as
> well not even have them...

Hi Jens,

How about the following approach?
* Remove CONFIG_BLK_SUB_PAGE_SEGMENTS.
* Use the static key mechanism instead of #ifdef
   CONFIG_BLK_SUB_PAGE_SEGMENTS to prevent that this patch series makes
   the hot path in the block layer slower.
* Count the number of request queues that need support for segments
   smaller than a page or max_hw_sector values smaller than
   PAGE_SIZE >> SECTOR_SHIFT. If that number changes from zero to one,
   enable the code introduced by this patch series. If that number drops
   to zero, toggle the static key such that the overhead of the code that
   supports small segments is eliminated.

> Why can't we just handle this in the driver? The segment path is hard
> enough to grok in the first place, and this just makes it worse.
> Generally not a huge fan of punting this to the driver, but just maybe
> it'd make sense in this case since it's just the one. At least that
> seems a lot more palatable than the alternative.

One of the functions modified by this patch series is 
blk_rq_append_bio(). That function is called by code that builds a bio 
(e.g. filesystems). Code that builds a bio does not call any block 
driver code. This is why I think it would be hard to move the 
functionality introduced by this patch series into block drivers.

Thanks,

Bart.