diff mbox series

[v4,03/10] scsi: core: Support failing requests while recovering

Message ID 20221018202958.1902564-4-bvanassche@acm.org
State New
Headers show
Series Fix a deadlock in the UFS driver | expand

Commit Message

Bart Van Assche Oct. 18, 2022, 8:29 p.m. UTC
The current behavior for SCSI commands submitted while error recovery
is ongoing is to retry command submission after error recovery has
finished. See also the scsi_host_in_recovery() check in
scsi_host_queue_ready(). Add support for failing SCSI commands while
host recovery is in progress. This functionality will be used to fix a
deadlock in the UFS driver.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_lib.c  | 8 +++++---
 include/scsi/scsi_cmnd.h | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

Comments

Mike Christie Oct. 19, 2022, 7:52 p.m. UTC | #1
On 10/18/22 3:29 PM, Bart Van Assche wrote:
> The current behavior for SCSI commands submitted while error recovery
> is ongoing is to retry command submission after error recovery has
> finished. See also the scsi_host_in_recovery() check in
> scsi_host_queue_ready(). Add support for failing SCSI commands while
> host recovery is in progress. This functionality will be used to fix a
> deadlock in the UFS driver.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: John Garry <john.garry@huawei.com>
> Cc: Mike Christie <michael.christie@oracle.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_lib.c  | 8 +++++---
>  include/scsi/scsi_cmnd.h | 3 ++-
>  2 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index fa96d3cfdfa3..ec890865abae 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1341,9 +1341,6 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>  				   struct scsi_device *sdev,
>  				   struct scsi_cmnd *cmd)
>  {
> -	if (scsi_host_in_recovery(shost))
> -		return 0;
> -
>  	if (atomic_read(&shost->host_blocked) > 0) {
>  		if (scsi_host_busy(shost) > 0)
>  			goto starved;
> @@ -1732,6 +1729,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	ret = BLK_STS_RESOURCE;
>  	if (!scsi_target_queue_ready(shost, sdev))
>  		goto out_put_budget;
> +	if (unlikely(scsi_host_in_recovery(shost))) {
> +		if (cmd->flags & SCMD_FAIL_IF_RECOVERING)
> +			ret = BLK_STS_OFFLINE;
> +		goto out_dec_target_busy;
> +	}

Hey,

Will we always hit this check? For example, if we have hit the
device's queue depth limit so

scsi_mq_get_budget -> scsi_dev_queue_ready

is returning -1, will we not even call scsi_queue_rq? So because
we are in recovery and no commands are completing, we will be
stuck waiting for a token to be put back on the sdev->budget_map.

Do you need a similar check in scsi_dev_queue_ready or should
the check go in there only or can we hit a race for the latter?
Bart Van Assche Oct. 19, 2022, 9:11 p.m. UTC | #2
On 10/19/22 12:52, Mike Christie wrote:
> Will we always hit this check? For example, if we have hit the
> device's queue depth limit so
> 
> scsi_mq_get_budget -> scsi_dev_queue_ready
> 
> is returning -1, will we not even call scsi_queue_rq? So because
> we are in recovery and no commands are completing, we will be
> stuck waiting for a token to be put back on the sdev->budget_map.
> 
> Do you need a similar check in scsi_dev_queue_ready or should
> the check go in there only or can we hit a race for the latter?

Hi Mike,

A later patch in this series uses the SCMD_FAIL_IF_RECOVERING flag in 
the suspend path of the UFS driver. The UFS suspend code waits for the 
SCSI error handler to finish before the UFS device is suspended. Does 
this answer your question?

Bart.
Mike Christie Oct. 20, 2022, 5:20 p.m. UTC | #3
On 10/19/22 4:11 PM, Bart Van Assche wrote:
> On 10/19/22 12:52, Mike Christie wrote:
>> Will we always hit this check? For example, if we have hit the
>> device's queue depth limit so
>>
>> scsi_mq_get_budget -> scsi_dev_queue_ready
>>
>> is returning -1, will we not even call scsi_queue_rq? So because
>> we are in recovery and no commands are completing, we will be
>> stuck waiting for a token to be put back on the sdev->budget_map.
>>
>> Do you need a similar check in scsi_dev_queue_ready or should
>> the check go in there only or can we hit a race for the latter?
> 
> Hi Mike,
> 
> A later patch in this series uses the SCMD_FAIL_IF_RECOVERING flag in the suspend path of the UFS driver. The UFS suspend code waits for the SCSI error handler to finish before the UFS device is suspended. Does this answer your question?

The reply for patch 9 about no other commands should be running
at this time answers my question, so we should be ok.

Reviewed-by: Mike Christie <michael.christie@oracle.com>
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index fa96d3cfdfa3..ec890865abae 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1341,9 +1341,6 @@  static inline int scsi_host_queue_ready(struct request_queue *q,
 				   struct scsi_device *sdev,
 				   struct scsi_cmnd *cmd)
 {
-	if (scsi_host_in_recovery(shost))
-		return 0;
-
 	if (atomic_read(&shost->host_blocked) > 0) {
 		if (scsi_host_busy(shost) > 0)
 			goto starved;
@@ -1732,6 +1729,11 @@  static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 	ret = BLK_STS_RESOURCE;
 	if (!scsi_target_queue_ready(shost, sdev))
 		goto out_put_budget;
+	if (unlikely(scsi_host_in_recovery(shost))) {
+		if (cmd->flags & SCMD_FAIL_IF_RECOVERING)
+			ret = BLK_STS_OFFLINE;
+		goto out_dec_target_busy;
+	}
 	if (!scsi_host_queue_ready(q, shost, sdev, cmd))
 		goto out_dec_target_busy;
 
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 7d3622db38ed..c2cb5f69635c 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -52,8 +52,9 @@  struct scsi_pointer {
 #define SCMD_TAGGED		(1 << 0)
 #define SCMD_INITIALIZED	(1 << 1)
 #define SCMD_LAST		(1 << 2)
+#define SCMD_FAIL_IF_RECOVERING	(1 << 4)
 /* flags preserved across unprep / reprep */
-#define SCMD_PRESERVED_FLAGS	(SCMD_INITIALIZED)
+#define SCMD_PRESERVED_FLAGS	(SCMD_INITIALIZED | SCMD_FAIL_IF_RECOVERING)
 
 /* for scmd->state */
 #define SCMD_STATE_COMPLETE	0