Message ID | 20250403211937.2225615-1-bvanassche@acm.org |
---|---|
Headers | show |
Series | Optimize the hot path in the UFS driver | expand |
On 03/04/2025 22:17, Bart Van Assche wrote: > Hi Martin, > > This patch series increases IOPS by 1% and reduces CPU per I/O by 10% on my > UFS 4.0 test setup. Please consider this patch series for the next merge > window. That cover letter is a bit thin for something which now implements SCSI reserved command handling (in addition to any UFS optimisation). > > Thanks, > > Bart. > > Bart Van Assche (23): > scsi: core: Make scsi_cmd_to_rq() accept const arguments > scsi: core: Make scsi_cmd_priv() accept const arguments > scsi: core: Use scsi_cmd_priv() instead of open-coding it > scsi: core: Introduce scsi_host_update_can_queue() > scsi: ufs: core: Change the type of one ufshcd_add_cmd_upiu_trace() > argument > scsi: ufs: core: Only call ufshcd_add_command_trace() for SCSI > commands > scsi: ufs: core: Change the type of one ufshcd_add_command_trace() > argument > scsi: ufs: core: Change the type of one ufshcd_send_command() argument > scsi: ufs: core: Only call ufshcd_should_inform_monitor() for SCSI > commands > scsi: ufs: core: Change the monitor function argument types > scsi: ufs: core: Rework ufshcd_mcq_compl_pending_transfer() > scsi: ufs: core: Rework ufshcd_eh_device_reset_handler() > scsi: ufs: core: Cache the DMA buffer sizes > scsi: ufs: core: Add an argument to ufshcd_mcq_decide_queue_depth() > scsi: ufs: core: Add an argument to ufshcd_alloc_mcq() > scsi: ufs: core: Call ufshcd_mcq_init() once > scsi: ufs: core: Allocate the SCSI host earlier > scsi: ufs: core: Call ufshcd_init_lrb() later > scsi: ufs: core: Use hba->reserved_slot > scsi: ufs: core: Allocate the reserved slot as a reserved request > scsi: ufs: core: Do not clear driver-private command data > scsi: ufs: core: Optimize the hot path > scsi: ufs: core: Remove the ufshcd_lrb task_tag member > > Hannes Reinecke (1): > scsi: core: Implement reserved command handling > > drivers/scsi/hosts.c | 3 + > drivers/scsi/scsi.c | 26 ++ > drivers/scsi/scsi_lib.c | 6 +- > drivers/scsi/scsi_logging.c | 10 +- > drivers/ufs/core/ufs-mcq.c | 31 +- > drivers/ufs/core/ufshcd-crypto.h | 18 +- > drivers/ufs/core/ufshcd-priv.h | 27 +- > drivers/ufs/core/ufshcd.c | 660 +++++++++++++++++-------------- > include/scsi/scsi_cmnd.h | 17 +- > include/scsi/scsi_host.h | 24 +- > include/ufs/ufshcd.h | 11 +- > 11 files changed, 487 insertions(+), 346 deletions(-) > >
On 4/4/25 3:15 AM, John Garry wrote: > On 03/04/2025 22:17, Bart Van Assche wrote: >> Hi Martin, >> >> This patch series increases IOPS by 1% and reduces CPU per I/O by 10% >> on my >> UFS 4.0 test setup. Please consider this patch series for the next merge >> window. > > That cover letter is a bit thin for something which now implements SCSI > reserved command handling (in addition to any UFS optimisation). Hi John, Here is a more detailed version: This patch series optimizes the hot path of the UFS driver by making struct scsi_cmnd and struct ufshcd_lrb adjacent. Making these two data structures adjacent is realized as follows: @@ -9040,6 +9046,7 @@ static const struct scsi_host_template ufshcd_driver_template = { .name = UFSHCD, .proc_name = UFSHCD, .map_queues = ufshcd_map_queues, + .cmd_size = sizeof(struct ufshcd_lrb), .init_cmd_priv = ufshcd_init_cmd_priv, .queuecommand = ufshcd_queuecommand, .mq_poll = ufshcd_poll, The following changes had to be made prior to making these two data structures adjacent: * Instead of making the reserved command slot (hba->reserved_slot) invisible to the SCSI core, let the SCSI core allocate memory for the reserved slot and tell the block layer to reserve one slot. This is why patch 04/24 adds minimal support in the SCSI core for reserved command handling. * Remove all UFS data structure members that are no longer needed because struct scsi_cmnd and struct ufshcd_lrb are now adjacent * Call ufshcd_init_lrb() from inside ufshcd_queuecommand() instead of calling this function before I/O starts. This is necessary because ufshcd_memory_alloc() allocates fewer instances than the block layer allocates requests. See also the following code in the block layer core: if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx, hctx->numa_node)) Although the UFS driver could be modified such that ufshcd_init_lrb() is called from ufshcd_init_cmd_priv(), realizing this would require moving the memory allocations that happen from inside ufshcd_memory_alloc() into ufshcd_init_cmd_priv(). That would make this patch series even larger. * ufshcd_add_scsi_host() happens now before any device management commands are submitted. This change is necessary because this patch makes device management command allocation happen when the SCSI host is allocated. * Start with allocating 32 command slots and increase this number later after it is clear whether or not the UFS device supports more than 32 command slots. Introduce scsi_host_update_can_queue() to support this approach. Bart.