mbox series

[v9,00/17] Improve performance for zoned UFS devices

Message ID 20230816195447.3703954-1-bvanassche@acm.org
Headers show
Series Improve performance for zoned UFS devices | expand

Message

Bart Van Assche Aug. 16, 2023, 7:53 p.m. UTC
Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Please
consider this patch series for the next merge window.

Thank you,

Bart.

Changes compared to v8:
 - Fixed handling of 'driver_preserves_write_order' and 'use_zone_write_lock'
   in blk_stack_limits().
 - Added a comment in disk_set_zoned().
 - Modified blk_req_needs_zone_write_lock() such that it returns false if
   q->limits.use_zone_write_lock is false.
 - Modified disk_clear_zone_settings() such that it clears
   q->limits.use_zone_write_lock.
 - Left out one change from the mq-deadline patch that became superfluous due to
   the blk_req_needs_zone_write_lock() change.
 - Modified scsi_call_prepare_resubmit() such that it only calls list_sort() if
   zoned writes have to be resubmitted for which zone write locking is disabled.
 - Added an additional unit test for scsi_call_prepare_resubmit().
 - Modified the sorting code in the sd driver such that only those SCSI commands
   are sorted for which write locking is disabled.
 - Modified sd_zbc.c such that ELEVATOR_F_ZBD_SEQ_WRITE is only set if the
   write order is not preserved.
 - Included three patches for UFS host drivers that rework code that wrote
   directly to the auto-hibernation controller register.
 - Modified the UFS driver such that enabling auto-hibernation is not allowed
   if a zoned logical unit is present and if the controller operates in legacy
   mode.
 - Also in the UFS driver, simplified ufshcd_auto_hibern8_update().

Changes compared to v7:
 - Split the queue_limits member variable `use_zone_write_lock' into two member
   variables: `use_zone_write_lock' (set by disk_set_zoned()) and
   `driver_preserves_write_order' (set by the block driver or SCSI LLD). This
   should clear up the confusion about the purpose of this variable.
 - Moved the code for sorting SCSI commands by LBA from the SCSI error handler
   into the SCSI disk (sd) driver as requested by Christoph.
   
Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (17):
  block: Introduce more member variables related to zone write locking
  block: Only use write locking if necessary
  block/mq-deadline: Only use zone locking if necessary
  scsi: core: Call .eh_prepare_resubmit() before resubmitting
  scsi: core: Add unit tests for scsi_call_prepare_resubmit()
  scsi: sd: Sort commands by LBA before resubmitting
  scsi: core: Retry unaligned zoned writes
  scsi: sd_zbc: Only require an I/O scheduler if needed
  scsi: scsi_debug: Support disabling zone write locking
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: ufs: hisi: Rework the code that disables auto-hibernation
  scsi: ufs: mediatek: Rework the code for disabling auto-hibernation
  scsi: ufs: sprd: Rework the code for disabling auto-hibernation
  scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  scsi: ufs: Simplify ufshcd_auto_hibern8_update()
  scsi: ufs: Forbid auto-hibernation without I/O scheduler
  scsi: ufs: Inform the block layer about write ordering

 block/blk-settings.c            |  15 +++
 block/blk-zoned.c               |  10 +-
 block/mq-deadline.c             |  11 +-
 drivers/scsi/Kconfig            |   2 +
 drivers/scsi/Kconfig.kunit      |   4 +
 drivers/scsi/Makefile           |   2 +
 drivers/scsi/Makefile.kunit     |   1 +
 drivers/scsi/scsi_debug.c       |  20 +++-
 drivers/scsi/scsi_error.c       |  81 +++++++++++++
 drivers/scsi/scsi_error_test.c  | 196 ++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c         |   1 +
 drivers/scsi/scsi_priv.h        |   1 +
 drivers/scsi/sd.c               |  41 +++++++
 drivers/scsi/sd_zbc.c           |   4 +-
 drivers/ufs/core/ufs-sysfs.c    |   2 +-
 drivers/ufs/core/ufshcd-priv.h  |   1 -
 drivers/ufs/core/ufshcd.c       | 110 ++++++++++++++----
 drivers/ufs/host/ufs-hisi.c     |   5 +-
 drivers/ufs/host/ufs-mediatek.c |   2 +-
 drivers/ufs/host/ufs-sprd.c     |  11 +-
 include/linux/blkdev.h          |  10 ++
 include/scsi/scsi.h             |   1 +
 include/scsi/scsi_driver.h      |   1 +
 include/ufs/ufshcd.h            |   3 +-
 24 files changed, 485 insertions(+), 50 deletions(-)
 create mode 100644 drivers/scsi/Kconfig.kunit
 create mode 100644 drivers/scsi/Makefile.kunit
 create mode 100644 drivers/scsi/scsi_error_test.c

Comments

Damien Le Moal Aug. 17, 2023, 11:01 a.m. UTC | #1
On 8/17/23 04:53, Bart Van Assche wrote:
> Make blk_req_needs_zone_write_lock() return false if
> q->limits.use_zone_write_lock is false. Inline this function because it
> is short and because it is called from the hot path of the mq-deadline
> I/O scheduler.

Your are not actually inlining the function. Did you forget to move it to
blkdev.h ?

Otherwise, the change looks OK.

> 
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-zoned.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 112620985bff..d8a80cce832f 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -53,11 +53,16 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond)
>  EXPORT_SYMBOL_GPL(blk_zone_cond_str);
>  
>  /*
> - * Return true if a request is a write requests that needs zone write locking.
> + * Return true if a request is a write request that needs zone write locking.
>   */
>  bool blk_req_needs_zone_write_lock(struct request *rq)
>  {
> -	if (!rq->q->disk->seq_zones_wlock)
> +	struct request_queue *q = rq->q;
> +
> +	if (!q->limits.use_zone_write_lock)
> +		return false;
> +
> +	if (!q->disk->seq_zones_wlock)
>  		return false;
>  
>  	return blk_rq_is_seq_zoned_write(rq);
Damien Le Moal Aug. 17, 2023, 11:10 a.m. UTC | #2
On 8/17/23 04:53, Bart Van Assche wrote:
> Introduce the .eh_prepare_resubmit function pointer in struct scsi_driver.
> Make the error handler call .eh_prepare_resubmit() before resubmitting
> commands. A later patch will use this functionality to sort SCSI commands
> by LBA from inside the SCSI disk driver.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c  | 65 ++++++++++++++++++++++++++++++++++++++
>  drivers/scsi/scsi_priv.h   |  1 +
>  include/scsi/scsi_driver.h |  1 +
>  3 files changed, 67 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index c67cdcdc3ba8..4393a7fd8a07 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -27,6 +27,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/delay.h>
>  #include <linux/jiffies.h>
> +#include <linux/list_sort.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -2186,6 +2187,68 @@ void scsi_eh_ready_devs(struct Scsi_Host *shost,
>  }
>  EXPORT_SYMBOL_GPL(scsi_eh_ready_devs);
>  
> +/*
> + * Returns true if the commands in @done_q should be sorted in LBA order
> + * before being resubmitted.
> + */
> +static bool scsi_needs_sorting(struct list_head *done_q)
> +{
> +	struct scsi_cmnd *scmd;
> +
> +	list_for_each_entry(scmd, done_q, eh_entry) {
> +		struct request *rq = scsi_cmd_to_rq(scmd);
> +
> +		if (!rq->q->limits.use_zone_write_lock &&
> +		    blk_rq_is_seq_zoned_write(rq))
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Comparison function that allows to sort SCSI commands by ULD driver.
> + */
> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
> +			const struct list_head *_b)
> +{
> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
> +
> +	/* See also the comment above the list_sort() definition. */
> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
> +}
> +
> +void scsi_call_prepare_resubmit(struct list_head *done_q)
> +{
> +	struct scsi_cmnd *scmd, *next;
> +
> +	if (!scsi_needs_sorting(done_q))
> +		return;

This is strange. The eh_prepare_resubmit callback is generic and its name does
not indicate anything related to sorting by LBAs. So this check would prevent
other actions not related to sorting by LBA. This should go away.

In patch 6, based on the device characteristics, the sd driver should decides
if it needs to set .eh_prepare_resubmit or not.

And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
return here before going through the list of commands to resubmit. Given that
this list should generally be small, going through it and doing nothing should
be OK though...

> +
> +	/* Sort pending SCSI commands by ULD. */
> +	list_sort(NULL, done_q, scsi_cmp_uld);
> +
> +	/*
> +	 * Call .eh_prepare_resubmit for each range of commands with identical
> +	 * ULD driver pointer.
> +	 */
> +	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
> +		struct scsi_driver *uld = scsi_cmd_to_driver(scmd);
> +		struct list_head *prev, uld_cmd_list;
> +
> +		while (&next->eh_entry != done_q &&
> +		       scsi_cmd_to_driver(next) == uld)
> +			next = list_next_entry(next, eh_entry);
> +		if (!uld->eh_prepare_resubmit)
> +			continue;
> +		prev = scmd->eh_entry.prev;
> +		list_cut_position(&uld_cmd_list, prev, next->eh_entry.prev);
> +		uld->eh_prepare_resubmit(&uld_cmd_list);
> +		list_splice(&uld_cmd_list, prev);
> +	}
> +}
> +
>  /**
>   * scsi_eh_flush_done_q - finish processed commands or retry them.
>   * @done_q:	list_head of processed commands.
> @@ -2194,6 +2257,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *next;
>  
> +	scsi_call_prepare_resubmit(done_q);
> +
>  	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
>  		list_del_init(&scmd->eh_entry);
>  		if (scsi_device_online(scmd->device) &&
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index f42388ecb024..df4af4645430 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -101,6 +101,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>  		      struct list_head *done_q);
>  bool scsi_noretry_cmd(struct scsi_cmnd *scmd);
>  void scsi_eh_done(struct scsi_cmnd *scmd);
> +void scsi_call_prepare_resubmit(struct list_head *done_q);
>  
>  /* scsi_lib.c */
>  extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
> index 4ce1988b2ba0..2b11be896eee 100644
> --- a/include/scsi/scsi_driver.h
> +++ b/include/scsi/scsi_driver.h
> @@ -18,6 +18,7 @@ struct scsi_driver {
>  	int (*done)(struct scsi_cmnd *);
>  	int (*eh_action)(struct scsi_cmnd *, int);
>  	void (*eh_reset)(struct scsi_cmnd *);
> +	void (*eh_prepare_resubmit)(struct list_head *cmd_list);
>  };
>  #define to_scsi_driver(drv) \
>  	container_of((drv), struct scsi_driver, gendrv)
Damien Le Moal Aug. 17, 2023, 11:21 a.m. UTC | #3
On 8/17/23 04:53, Bart Van Assche wrote:
> If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
> a starting LBA that differs from the write pointer, e.g. because zoned
> writes have been reordered, then the storage device will respond with an
> UNALIGNED WRITE COMMAND error. Send commands that failed with an
> unaligned write error to the SCSI error handler if zone write locking is
> disabled. The SCSI error handler will sort SCSI commands per LBA before
> resubmitting these.
> 
> If zone write locking is disabled, increase the number of retries for
> write commands sent to a sequential zone to the maximum number of
> outstanding commands because in the worst case the number of times
> reordered zoned writes have to be retried is (number of outstanding
> writes per sequential zone) - 1.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Looks OK.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Bart Van Assche Aug. 17, 2023, 2:21 p.m. UTC | #4
On 8/17/23 04:01, Damien Le Moal wrote:
> On 8/17/23 04:53, Bart Van Assche wrote:
>> Make blk_req_needs_zone_write_lock() return false if
>> q->limits.use_zone_write_lock is false. Inline this function because it
>> is short and because it is called from the hot path of the mq-deadline
>> I/O scheduler.
> 
> Your are not actually inlining the function. Did you forget to move it to
> blkdev.h ?

I considered inlining but eventually decided not to inline 
blk_req_needs_zone_write_lock(). I will update the patch description.

Thanks,

Bart.
Bart Van Assche Aug. 17, 2023, 2:26 p.m. UTC | #5
On 8/17/23 04:10, Damien Le Moal wrote:
> On 8/17/23 04:53, Bart Van Assche wrote:
>> +/*
>> + * Returns true if the commands in @done_q should be sorted in LBA order
>> + * before being resubmitted.
>> + */
>> +static bool scsi_needs_sorting(struct list_head *done_q)
>> +{
>> +	struct scsi_cmnd *scmd;
>> +
>> +	list_for_each_entry(scmd, done_q, eh_entry) {
>> +		struct request *rq = scsi_cmd_to_rq(scmd);
>> +
>> +		if (!rq->q->limits.use_zone_write_lock &&
>> +		    blk_rq_is_seq_zoned_write(rq))
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>> + */
>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>> +			const struct list_head *_b)
>> +{
>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>> +
>> +	/* See also the comment above the list_sort() definition. */
>> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
>> +}
>> +
>> +void scsi_call_prepare_resubmit(struct list_head *done_q)
>> +{
>> +	struct scsi_cmnd *scmd, *next;
>> +
>> +	if (!scsi_needs_sorting(done_q))
>> +		return;
> 
> This is strange. The eh_prepare_resubmit callback is generic and its name does
> not indicate anything related to sorting by LBAs. So this check would prevent
> other actions not related to sorting by LBA. This should go away.
> 
> In patch 6, based on the device characteristics, the sd driver should decides
> if it needs to set .eh_prepare_resubmit or not.
> 
> And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
> return here before going through the list of commands to resubmit. Given that
> this list should generally be small, going through it and doing nothing should
> be OK though...

I can add a eh_prepare_resubmit == NULL check early in 
scsi_call_prepare_resubmit(). Regarding the code inside 
scsi_needs_sorting(), how about moving that code into an additional 
callback function, e.g. eh_needs_prepare_resubmit? Setting 
.eh_prepare_resubmit depending on the zone model would prevent 
constification of struct scsi_driver.

Thanks,

Bart.
Bao D. Nguyen Aug. 17, 2023, 7 p.m. UTC | #6
On 8/16/2023 12:53 PM, Bart Van Assche wrote:
>  From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
> 
>  From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
> 
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
>     pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
> 
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
> 
> Notes:
> - For legacy mode this is only correct if the host submits one
>    command at a time. The UFS driver does this.
> - Also in legacy mode, the command order is not preserved if
>    auto-hibernation is enabled in the UFS controller. Hence, enable
>    zone write locking if auto-hibernation is enabled.
> 
> This patch improves performance as follows on my test setup:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
>    zone locking: 4x more IOPS for small writes.
> 
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Can Guo <quic_cang@quicinc.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 37d430d20939..7f5049a66cca 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -4356,6 +4356,19 @@ static int ufshcd_update_preserves_write_order(struct ufs_hba *hba,
>   				return -EPERM;
>   		}
>   	}
> +	shost_for_each_device(sdev, hba->host)
> +		blk_freeze_queue_start(sdev->request_queue);
> +	shost_for_each_device(sdev, hba->host) {
> +		struct request_queue *q = sdev->request_queue;
> +
> +		blk_mq_freeze_queue_wait(q);
> +		q->limits.driver_preserves_write_order = preserves_write_order;
> +		blk_queue_required_elevator_features(q,
> +			preserves_write_order ? 0 : ELEVATOR_F_ZBD_SEQ_WRITE);
> +		if (q->disk)
> +			disk_set_zoned(q->disk, q->limits.zoned);
> +		blk_mq_unfreeze_queue(q);
> +	}
>   
>   	return 0;
>   }
> @@ -4393,7 +4406,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   
>   	if (!is_mcq_enabled(hba) && !prev_state && new_state) {
>   		/*
> -		 * Auto-hibernation will be enabled for legacy UFSHCI mode.
> +		 * Auto-hibernation will be enabled for legacy UFSHCI mode. Tell
> +		 * the block layer that write requests may be reordered.
>   		 */
>   		ret = ufshcd_update_preserves_write_order(hba, false);
>   		if (ret)
> @@ -4409,7 +4423,8 @@ int ufshcd_auto_hibern8_update(struct ufs_hba *hba, u32 ahit)
>   	}
>   	if (!is_mcq_enabled(hba) && prev_state && !new_state) {
>   		/*
> -		 * Auto-hibernation has been disabled.
> +		 * Auto-hibernation has been disabled. Tell the block layer that
> +		 * the order of write requests is preserved.
>   		 */
>   		ret = ufshcd_update_preserves_write_order(hba, true);
>   		WARN_ON_ONCE(ret);
> @@ -5187,6 +5202,9 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
>   
>   	ufshcd_hpb_configure(hba, sdev);
>   
> +	q->limits.driver_preserves_write_order =
> +		!ufshcd_is_auto_hibern8_supported(hba) ||
> +		FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
In legacy SDB mode with auto-hibernation enabled, the default setting 
for the driver_preserves_write_order = false.

Using the default setting, it may be missing this check that is part of 
the ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().

Auto-hibernation should not be enabled by default unless these 
conditions are met?
	if (blk_queue_is_zoned(q) && !q->elevator)
		return -EPERM;


>   	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
>   	if (hba->quirks & UFSHCD_QUIRK_4KB_DMA_ALIGNMENT)
>   		blk_queue_update_dma_alignment(q, SZ_4K - 1);
Bart Van Assche Aug. 17, 2023, 7:34 p.m. UTC | #7
On 8/17/23 12:00, Bao D. Nguyen wrote:
> In legacy SDB mode with auto-hibernation enabled, the default setting for the
> driver_preserves_write_order = false.
 >
> Using the default setting, it may be missing this check that is part of the ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().

If auto-hibernation is enabled by the host driver, driver_preserves_write_order
is set by the following code in ufshcd_slave_configure():

	q->limits.driver_preserves_write_order =
		!ufshcd_is_auto_hibern8_supported(hba) ||
		FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;

Does this answer your question?

Thanks,

Bart.
Bao D. Nguyen Aug. 17, 2023, 9:47 p.m. UTC | #8
On 8/17/2023 12:34 PM, Bart Van Assche wrote:
> On 8/17/23 12:00, Bao D. Nguyen wrote:
>> In legacy SDB mode with auto-hibernation enabled, the default setting 
>> for the
>> driver_preserves_write_order = false.
>  >
>> Using the default setting, it may be missing this check that is part 
>> of the 
>> ufshcd_auto_hibern8_update()->ufshcd_update_preserves_write_order().
> 
> If auto-hibernation is enabled by the host driver, 
> driver_preserves_write_order
> is set by the following code in ufshcd_slave_configure():
> 
>      q->limits.driver_preserves_write_order =
>          !ufshcd_is_auto_hibern8_supported(hba) ||
>          FIELD_GET(UFSHCI_AHIBERN8_TIMER_MASK, hba->ahit) == 0;
> 
> Does this answer your question?
Hi Bart,
My concern is that in the ufshcd_update_preserves_write_order() you have 
this logic:

	if (!preserves_write_order) {
		shost_for_each_device(sdev, hba->host) {
			struct request_queue *q = sdev->request_queue;
			/*...*/
			if (blk_queue_is_zoned(q) && !q->elevator)
				return -EPERM;
		}
	}

The above logic is only invoked when ufshcd_auto_hibern8_update() is 
called. During initialization, ufshcd_auto_hibern8_update() is not 
called. Therefore, you may have SDB mode with auto hibernate enabled -> 
preserves_write_order = false, and (blk_queue_is_zoned(q) && 
!q->elevator) is true. Is that a problem? If you called 
ufshcd_auto_hibern8_update() during ufs probe, you would have returned 
-EPERM and not enable auto-hibernation in that case.

Thanks,
Bao


> 
> Thanks,
> 
> Bart.
Bart Van Assche Aug. 17, 2023, 10:05 p.m. UTC | #9
On 8/17/23 14:47, Bao D. Nguyen wrote:
> During initialization, ufshcd_auto_hibern8_update() is not called. Therefore,
> you may have SDB mode with auto hibernate enabled -> preserves_write_order = false, [ ... ]

Hi Bao,

ufshcd_slave_configure() is called before any SCSI commands are submitted to a
logical unit. ufshcd_slave_configure() sets 'preserves_write_order' depending on
the value of hba->ahit. Does this answer your question?

Thanks,

Bart.
Bao D. Nguyen Aug. 18, 2023, 12:19 a.m. UTC | #10
On 8/17/2023 3:05 PM, Bart Van Assche wrote:
> On 8/17/23 14:47, Bao D. Nguyen wrote:
>> During initialization, ufshcd_auto_hibern8_update() is not called. 
>> Therefore,
>> you may have SDB mode with auto hibernate enabled -> 
>> preserves_write_order = false, [ ... ]
> 
> Hi Bao,
> 
> ufshcd_slave_configure() is called before any SCSI commands are 
> submitted to a
> logical unit. ufshcd_slave_configure() sets 'preserves_write_order' 
> depending on
> the value of hba->ahit. Does this answer your question?

Sorry Bart. Not yet :-) Please let me try to explain myself again.

For example, in SDB mode, after the probe and you want to enable 
auto-hibern8, you would call ufshcd_auto_hibern8_update() which then calls
ufshcd_update_preserves_write_order(). Before auto-hibern8 is enabled, 
you would check this condition:
	if (blk_queue_is_zoned(q) && !q->elevator)

In other words, auto-hibern8 is enabled only if the above condition false.

However, the during a normal operation, the ufshcd_auto_hibern8_update() 
may not get called at all, and auto-hibern8 can be enabled in SDB mode 
as part of ufs init. Would that be a problem to have auto-hibern8 
enabled without checking whether the above condition is false?

Thanks
Bao

> 
> Thanks,
> 
> Bart.
Damien Le Moal Aug. 18, 2023, 2:38 a.m. UTC | #11
On 2023/08/17 23:26, Bart Van Assche wrote:
> On 8/17/23 04:10, Damien Le Moal wrote:
>> On 8/17/23 04:53, Bart Van Assche wrote:
>>> +/*
>>> + * Returns true if the commands in @done_q should be sorted in LBA order
>>> + * before being resubmitted.
>>> + */
>>> +static bool scsi_needs_sorting(struct list_head *done_q)
>>> +{
>>> +	struct scsi_cmnd *scmd;
>>> +
>>> +	list_for_each_entry(scmd, done_q, eh_entry) {
>>> +		struct request *rq = scsi_cmd_to_rq(scmd);
>>> +
>>> +		if (!rq->q->limits.use_zone_write_lock &&
>>> +		    blk_rq_is_seq_zoned_write(rq))
>>> +			return true;
>>> +	}
>>> +
>>> +	return false;
>>> +}
>>> +
>>> +/*
>>> + * Comparison function that allows to sort SCSI commands by ULD driver.
>>> + */
>>> +static int scsi_cmp_uld(void *priv, const struct list_head *_a,
>>> +			const struct list_head *_b)
>>> +{
>>> +	struct scsi_cmnd *a = list_entry(_a, typeof(*a), eh_entry);
>>> +	struct scsi_cmnd *b = list_entry(_b, typeof(*b), eh_entry);
>>> +
>>> +	/* See also the comment above the list_sort() definition. */
>>> +	return scsi_cmd_to_driver(a) > scsi_cmd_to_driver(b);
>>> +}
>>> +
>>> +void scsi_call_prepare_resubmit(struct list_head *done_q)
>>> +{
>>> +	struct scsi_cmnd *scmd, *next;
>>> +
>>> +	if (!scsi_needs_sorting(done_q))
>>> +		return;
>>
>> This is strange. The eh_prepare_resubmit callback is generic and its name does
>> not indicate anything related to sorting by LBAs. So this check would prevent
>> other actions not related to sorting by LBA. This should go away.
>>
>> In patch 6, based on the device characteristics, the sd driver should decides
>> if it needs to set .eh_prepare_resubmit or not.
>>
>> And ideally, if all ULDs have eh_prepare_resubmit == NULL, this function should
>> return here before going through the list of commands to resubmit. Given that
>> this list should generally be small, going through it and doing nothing should
>> be OK though...
> 
> I can add a eh_prepare_resubmit == NULL check early in 
> scsi_call_prepare_resubmit(). Regarding the code inside 
> scsi_needs_sorting(), how about moving that code into an additional 
> callback function, e.g. eh_needs_prepare_resubmit? Setting 
> .eh_prepare_resubmit depending on the zone model would prevent 
> constification of struct scsi_driver.

Sounds OK.

> 
> Thanks,
> 
> Bart.
Bart Van Assche Aug. 18, 2023, 5:56 p.m. UTC | #12
On 8/17/23 17:19, Bao D. Nguyen wrote:
> For example, in SDB mode, after the probe and you want to enable auto-hibern8, you would call ufshcd_auto_hibern8_update() which then calls
> ufshcd_update_preserves_write_order(). Before auto-hibern8 is enabled, you would check this condition:
>      if (blk_queue_is_zoned(q) && !q->elevator)
> 
> In other words, auto-hibern8 is enabled only if the above condition false.
> 
> However, the during a normal operation, the ufshcd_auto_hibern8_update() may not get called at all, and auto-hibern8 can be enabled in SDB mode as part of ufs init. Would that be a problem to have auto-hibern8 enabled without checking whether the above condition is false?

Hi Bao,

Probing the host controller happens as follows:
* The host controller probing function is called, e.g. ufs_qcom_probe().
* ufshcd_alloc_host() and ufshcd_init() are called directly or indirectly
   by the host controller probe function.
* ufshcd_init() calls scsi_add_host() and async_schedule(ufshcd_async_scan, hba).

Once ufshcd_async_scan() is called asynchronously, it performs the
following actions:
* ufs_probe_hba() is called. This function calls ufshcd_link_startup()
   indirectly. That function invokes the link_startup_notify vop if it has
   been defined. Some host drivers set hba->ahit from inside that vop.
* scsi_scan_host() is called and calls scsi_alloc_sdev() indirectly.
* scsi_alloc_sdev() calls blk_mq_init_queue() and shost->hostt->slave_alloc().
* scsi_add_lun() calls sdev->host->hostt->slave_configure().
* scsi_add_lun() calls scsi_sysfs_add_sdev(). This indirectly causes sd_probe()
   to be called asynchronously.
* sd_probe() calls sd_revalidate_disk(). This results in an indirect call of
   disk_set_zoned(). Additionally, the ELEVATOR_F_ZBD_SEQ_WRITE flag is set by
   the sd driver if q->limits.driver_preserves_write_order has not been set.
* Still from inside sd_probe(), device_add_disk() is called and a scheduler
   is selected based on the value of q->required_elevator_features.

There are two ways in which host drivers initialize auto-hibernation:
* Either by setting hba->ahit before ufshcd_init() is called.
* Or by calling ufshcd_auto_hibern8_update() from the link_startup_notify
   vop.

I think the above shows that the zoned model is queried *after* the above methods
for enabling auto-hibernation have completed and hence that blk_queue_is_zoned()
evaluates to false if a host driver calls ufshcd_auto_hibern8_update() from
inside the link_startup_notify vop.

If auto-hibernation is enabled from inside the host driver then that should
happen before sd_probe() is called. sd_probe() will use the value of
q->limits.driver_preserves_write_order that has been set by
ufshcd_slave_configure() so there is no risk for inconsistencies between the
auto-hibernation configuration and the request queue configuration.

Bart.