mbox series

[v8,0/9] Improve performance for zoned UFS devices

Message ID 20230811213604.548235-1-bvanassche@acm.org
Headers show
Series Improve performance for zoned UFS devices | expand

Message

Bart Van Assche Aug. 11, 2023, 9:35 p.m. UTC
Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Please
consider this patch series for the next merge window.

Thank you,

Bart.

Changes compared to v7:
 - Split the queue_limits member variable `use_zone_write_lock' into two member
   variables: `use_zone_write_lock' (set by disk_set_zoned()) and
   `driver_preserves_write_order' (set by the block driver or SCSI LLD). This
   should clear up the confusion about the purpose of this variable.
 - Moved the code for sorting SCSI commands by LBA from the SCSI error handler
   into the SCSI disk (sd) driver as requested by Christoph.
   
Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (9):
  block: Introduce more member variables related to zone write locking
  block/mq-deadline: Only use zone locking if necessary
  scsi: core: Call .eh_prepare_resubmit() before resubmitting
  scsi: sd: Sort commands by LBA before resubmitting
  scsi: core: Retry unaligned zoned writes
  scsi: scsi_debug: Support disabling zone write locking
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: ufs: Split an if-condition
  scsi: ufs: Inform the block layer about write ordering

 block/blk-settings.c           |  7 +++
 block/mq-deadline.c            | 14 +++---
 drivers/scsi/Kconfig           |  2 +
 drivers/scsi/Kconfig.kunit     |  4 ++
 drivers/scsi/Makefile          |  2 +
 drivers/scsi/Makefile.kunit    |  1 +
 drivers/scsi/scsi_debug.c      | 20 +++++++-
 drivers/scsi/scsi_error.c      | 59 ++++++++++++++++++++++
 drivers/scsi/scsi_error_test.c | 92 ++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c        |  1 +
 drivers/scsi/scsi_priv.h       |  1 +
 drivers/scsi/sd.c              | 25 +++++++++
 drivers/ufs/core/ufshcd.c      | 40 +++++++++++++--
 include/linux/blkdev.h         | 10 ++++
 include/scsi/scsi.h            |  1 +
 include/scsi/scsi_driver.h     |  1 +
 16 files changed, 270 insertions(+), 10 deletions(-)
 create mode 100644 drivers/scsi/Kconfig.kunit
 create mode 100644 drivers/scsi/Makefile.kunit
 create mode 100644 drivers/scsi/scsi_error_test.c

Comments

Bao D. Nguyen Aug. 12, 2023, 5:09 p.m. UTC | #1
On 8/11/2023 2:35 PM, Bart Van Assche wrote:
>  From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
> 
>  From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
> 
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
>     pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
> 
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
> 
> Notes:
> - For legacy mode this is only correct if the host submits one
>    command at a time. The UFS driver does this.
> - Also in legacy mode, the command order is not preserved if
>    auto-hibernation is enabled in the UFS controller. Hence, enable
>    zone write locking if auto-hibernation is enabled.
> 
> This patch improves performance as follows on my test setup:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
>    zone locking: 4x more IOPS for small writes.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Can Guo <quic_cang@quicinc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 35 ++++++++++++++++++++++++++++++++++-
>   1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index ae7b868f9c26..71cee10c75ad 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4337,23 +4337,48 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_uic_hibern8_exit);
>   
> +static void ufshcd_update_preserves_write_order(struct ufs_hba *hba,
> +						bool preserves_write_order)
> +{
> +	struct scsi_device *sdev;
> +
> +	shost_for_each_device(sdev, hba->host)
> +		blk_freeze_queue_start(sdev->request_queue);
> +	shost_for_each_device(sdev, hba->host) {
> +		struct request_queue *q = sdev->request_queue;
> +
> +		blk_mq_freeze_queue_wait(q);
> +		q->limits.driver_preserves_write_order = preserves_write_order;
> +		blk_mq_unfreeze_queue(q);
> +	}
> +}
> +
>   void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   {
>   	unsigned long flags;
> -	bool update = false;
> +	bool prev_state, new_state, update = false;
>   
>   	if (!ufshcd_is_auto_hibern8_supported(hba))
>   		return;
>   
>   	spin_lock_irqsave(hba->host->host_lock, flags);
> +	prev_state = ufshcd_is_auto_hibern8_enabled(hba);
>   	if (hba->ahit != ahit) {
>   		hba->ahit = ahit;
>   		update = true;
>   	}
> +	new_state = ufshcd_is_auto_hibern8_enabled(hba);
>   	spin_unlock_irqrestore(hba->host->host_lock, flags);
>   
>   	if (!update)
>   		return;
> +	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
> +		/*
> +		 * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
> +		 * the block layer that write requests may be reordered.
> +		 */
> +		ufshcd_update_preserves_write_order(hba, false);
Hi Bart,
I am not reviewing other patches in this series, so I don't know the 
whole context. Here is my comment on this patch alone.

Looks like you rely on ufshcd_auto_hibern8_update() being invoked so 
that you can update the driver_preserves_write_order. However, the 
hba->ahit value can be updated by the vendor's driver, and 
ufshcd_auto_hibern8_enable() can be invoked without 
ufshcd_auto_hibern8_update(). Therefore, you may have a situation where 
the driver_preserves_write_order is true by default, but 
Auto-hibernation is enabled by default.

Thanks,
Bao

> +	}
>   	if (!pm_runtime_suspended(&hba->ufs_device_wlun->sdev_gendev)) {
>   		ufshcd_rpm_get_sync(hba);
>   		ufshcd_hold(hba);
> @@ -4361,6 +4386,13 @@ void ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   		ufshcd_release(hba);
>   		ufshcd_rpm_put_sync(hba);
>   	}
> +	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
> +		/*
> +		 * Auto-hibernation has been disabled. Tell the block layer that
> +		 * the order of write requests is preserved.
> +		 */
> +		ufshcd_update_preserves_write_order(hba, true);
> +	}
>   }
>   EXPORT_SYMBOL_GPL(ufshcd_auto_hibern8_update);
>   
> @@ -5140,6 +5172,7 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
>   
>   	ufshcd_hpb_configure(hba, sdev);
>   
> +	q->limits.driver_preserves_write_order = true;
>   	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>   	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
>   		blk_queue_update_dma_alignment(q, SZ_4K - 1);