mbox series

[00/24] Optimize the hot path in the UFS driver

Message ID 20250403211937.2225615-1-bvanassche@acm.org
Headers show
Series Optimize the hot path in the UFS driver | expand

Message

Bart Van Assche April 3, 2025, 9:17 p.m. UTC
Hi Martin,

This patch series increases IOPS by 1% and reduces CPU per I/O by 10% on my
UFS 4.0 test setup. Please consider this patch series for the next merge
window.

Thanks,

Bart.

Bart Van Assche (23):
  scsi: core: Make scsi_cmd_to_rq() accept const arguments
  scsi: core: Make scsi_cmd_priv() accept const arguments
  scsi: core: Use scsi_cmd_priv() instead of open-coding it
  scsi: core: Introduce scsi_host_update_can_queue()
  scsi: ufs: core: Change the type of one ufshcd_add_cmd_upiu_trace()
    argument
  scsi: ufs: core: Only call ufshcd_add_command_trace() for SCSI
    commands
  scsi: ufs: core: Change the type of one ufshcd_add_command_trace()
    argument
  scsi: ufs: core: Change the type of one ufshcd_send_command() argument
  scsi: ufs: core: Only call ufshcd_should_inform_monitor() for SCSI
    commands
  scsi: ufs: core: Change the monitor function argument types
  scsi: ufs: core: Rework ufshcd_mcq_compl_pending_transfer()
  scsi: ufs: core: Rework ufshcd_eh_device_reset_handler()
  scsi: ufs: core: Cache the DMA buffer sizes
  scsi: ufs: core: Add an argument to ufshcd_mcq_decide_queue_depth()
  scsi: ufs: core: Add an argument to ufshcd_alloc_mcq()
  scsi: ufs: core: Call ufshcd_mcq_init() once
  scsi: ufs: core: Allocate the SCSI host earlier
  scsi: ufs: core: Call ufshcd_init_lrb() later
  scsi: ufs: core: Use hba->reserved_slot
  scsi: ufs: core: Allocate the reserved slot as a reserved request
  scsi: ufs: core: Do not clear driver-private command data
  scsi: ufs: core: Optimize the hot path
  scsi: ufs: core: Remove the ufshcd_lrb task_tag member

Hannes Reinecke (1):
  scsi: core: Implement reserved command handling

 drivers/scsi/hosts.c             |   3 +
 drivers/scsi/scsi.c              |  26 ++
 drivers/scsi/scsi_lib.c          |   6 +-
 drivers/scsi/scsi_logging.c      |  10 +-
 drivers/ufs/core/ufs-mcq.c       |  31 +-
 drivers/ufs/core/ufshcd-crypto.h |  18 +-
 drivers/ufs/core/ufshcd-priv.h   |  27 +-
 drivers/ufs/core/ufshcd.c        | 660 +++++++++++++++++--------------
 include/scsi/scsi_cmnd.h         |  17 +-
 include/scsi/scsi_host.h         |  24 +-
 include/ufs/ufshcd.h             |  11 +-
 11 files changed, 487 insertions(+), 346 deletions(-)

Comments

John Garry April 4, 2025, 10:15 a.m. UTC | #1
On 03/04/2025 22:17, Bart Van Assche wrote:
> Hi Martin,
> 
> This patch series increases IOPS by 1% and reduces CPU per I/O by 10% on my
> UFS 4.0 test setup. Please consider this patch series for the next merge
> window.

That cover letter is a bit thin for something which now implements SCSI 
reserved command handling (in addition to any UFS optimisation).

> 
> Thanks,
> 
> Bart.
> 
> Bart Van Assche (23):
>    scsi: core: Make scsi_cmd_to_rq() accept const arguments
>    scsi: core: Make scsi_cmd_priv() accept const arguments
>    scsi: core: Use scsi_cmd_priv() instead of open-coding it
>    scsi: core: Introduce scsi_host_update_can_queue()
>    scsi: ufs: core: Change the type of one ufshcd_add_cmd_upiu_trace()
>      argument
>    scsi: ufs: core: Only call ufshcd_add_command_trace() for SCSI
>      commands
>    scsi: ufs: core: Change the type of one ufshcd_add_command_trace()
>      argument
>    scsi: ufs: core: Change the type of one ufshcd_send_command() argument
>    scsi: ufs: core: Only call ufshcd_should_inform_monitor() for SCSI
>      commands
>    scsi: ufs: core: Change the monitor function argument types
>    scsi: ufs: core: Rework ufshcd_mcq_compl_pending_transfer()
>    scsi: ufs: core: Rework ufshcd_eh_device_reset_handler()
>    scsi: ufs: core: Cache the DMA buffer sizes
>    scsi: ufs: core: Add an argument to ufshcd_mcq_decide_queue_depth()
>    scsi: ufs: core: Add an argument to ufshcd_alloc_mcq()
>    scsi: ufs: core: Call ufshcd_mcq_init() once
>    scsi: ufs: core: Allocate the SCSI host earlier
>    scsi: ufs: core: Call ufshcd_init_lrb() later
>    scsi: ufs: core: Use hba->reserved_slot
>    scsi: ufs: core: Allocate the reserved slot as a reserved request
>    scsi: ufs: core: Do not clear driver-private command data
>    scsi: ufs: core: Optimize the hot path
>    scsi: ufs: core: Remove the ufshcd_lrb task_tag member
> 
> Hannes Reinecke (1):
>    scsi: core: Implement reserved command handling
> 
>   drivers/scsi/hosts.c             |   3 +
>   drivers/scsi/scsi.c              |  26 ++
>   drivers/scsi/scsi_lib.c          |   6 +-
>   drivers/scsi/scsi_logging.c      |  10 +-
>   drivers/ufs/core/ufs-mcq.c       |  31 +-
>   drivers/ufs/core/ufshcd-crypto.h |  18 +-
>   drivers/ufs/core/ufshcd-priv.h   |  27 +-
>   drivers/ufs/core/ufshcd.c        | 660 +++++++++++++++++--------------
>   include/scsi/scsi_cmnd.h         |  17 +-
>   include/scsi/scsi_host.h         |  24 +-
>   include/ufs/ufshcd.h             |  11 +-
>   11 files changed, 487 insertions(+), 346 deletions(-)
> 
>
Bart Van Assche April 4, 2025, 5:05 p.m. UTC | #2
On 4/4/25 3:15 AM, John Garry wrote:
> On 03/04/2025 22:17, Bart Van Assche wrote:
>> Hi Martin,
>>
>> This patch series increases IOPS by 1% and reduces CPU per I/O by 10% 
>> on my
>> UFS 4.0 test setup. Please consider this patch series for the next merge
>> window.
> 
> That cover letter is a bit thin for something which now implements SCSI 
> reserved command handling (in addition to any UFS optimisation).
Hi John,

Here is a more detailed version:

This patch series optimizes the hot path of the UFS driver by making
struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data
structures adjacent is realized as follows:

@@ -9040,6 +9046,7 @@ static const struct scsi_host_template 
ufshcd_driver_template = {
  	.name			= UFSHCD,
  	.proc_name		= UFSHCD,
  	.map_queues		= ufshcd_map_queues,
+	.cmd_size		= sizeof(struct ufshcd_lrb),
  	.init_cmd_priv		= ufshcd_init_cmd_priv,
  	.queuecommand		= ufshcd_queuecommand,
  	.mq_poll		= ufshcd_poll,

The following changes had to be made prior to making these two data
structures adjacent:
* Instead of making the reserved command slot (hba->reserved_slot)
   invisible to the SCSI core, let the SCSI core allocate memory for the
   reserved slot and tell the block layer to reserve one slot. This is
   why patch 04/24 adds minimal support in the SCSI core for reserved
   command handling.
* Remove all UFS data structure members that are no longer needed
   because struct scsi_cmnd and struct ufshcd_lrb are now adjacent
* Call ufshcd_init_lrb() from inside ufshcd_queuecommand() instead of
   calling this function before I/O starts. This is necessary because
   ufshcd_memory_alloc() allocates fewer instances than the block layer
   allocates requests. See also the following code in the block layer
   core:

	if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx,
				hctx->numa_node))

   Although the UFS driver could be modified such that ufshcd_init_lrb()
   is called from ufshcd_init_cmd_priv(), realizing this would require
   moving the memory allocations that happen from inside
   ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make
   this patch series even larger.
* ufshcd_add_scsi_host() happens now before any device management
   commands are submitted. This change is necessary because this patch
   makes device management command allocation happen when the SCSI host
   is allocated.
* Start with allocating 32 command slots and increase this number later
   after it is clear whether or not the UFS device supports more than 32
   command slots. Introduce scsi_host_update_can_queue() to support this
   approach.

Bart.