mbox series

[v9,00/33] scsi: Allow scsi_execute users to control retries

Message ID 20230711214620.87232-1-michael.christie@oracle.com
Headers show
Series scsi: Allow scsi_execute users to control retries | expand

Message

Mike Christie July 11, 2023, 9:45 p.m. UTC
The following patches were made over Martin's 6.6 scsi-queue branch.
They allow scsi_execute_cmd users to control exactly which errors are
retried, so we can reduce the sense/sshd handling they have to do and
have it one place.

The patches allow scsi_execute_cmd users to pass in an array of failures
which they want retried/failed and also specify how many times they want
them retried. If we hit an error that the user did not specify then we drop
down to the default behavior. This allows us to remove almost all the
retry logic from scsi_execute_cmd users. We just have the special cases
where we want to retry with a difference size command or sleep between
retries.

v9:
- Drop spi_execute changes from [PATCH] scsi: spi: Fix sshdr use
- Change git commit message for sshdr use fixes to try and make it
more clear when we need to check and when we can check.

v8:
- Rebase.
- Saw the discussion about the possible bug where callers are
accessing the sshdr when it's not setup, so I added some patches
for that since I was going over the same code.

v7:
- Rebase against scsi_execute_cmd patchset.

v6:
- Fix kunit build when it's built as a module but scsi is not.
- Drop [PATCH 17/35] scsi: ufshcd: Convert to scsi_exec_req because
  that driver no longer uses scsi_execute.
- Convert ufshcd to use the scsi_failures struct because it now just does
  direct retries and does not do it's own deadline timing.
- Add back memset in read_capacity_16.
- Remove memset in get_sectorsize and replace with { } to init buf.

v5:
- Fix spelling (made sure I ran checkpatch strict)
- Drop SCMD_FAILURE_NONE
- Rename SCMD_FAILURE_ANY
- Fix media_not_present handling where it was being retried instead of
failed.
- Fix ILLEGAL_REQUEST handling in read_capacity_16 so it was not retried.
- Fix coding style, spelling and and naming convention in kunit and added
  more tests to handle cases like the media_not_present one where we want
  to force failures instead of retries.
- Drop cxlflash patch because it actually checked it's internal state before
  performing a retry which we currently do not support.

v4:
- Redefine cmd definitions if the cmd is touched.
- Fix up coding style issues.
- Use sam_status enum.
- Move failures initialization to scsi_initialize_rq
(also fixes KASAN error).
- Add kunit test.
- Add function comments.

v3:
- Use a for loop in scsi_check_passthrough
- Fix result handling/testing.
- Fix scsi_status_is_good handling.
- make __scsi_exec_req take a const arg
- Fix formatting in patch 24

v2:
- Rename scsi_prep_sense
- Change scsi_check_passthrough's loop and added some fixes
- Modified scsi_execute* so it uses a struct to pass in args

Comments

Mike Christie July 11, 2023, 9:50 p.m. UTC | #1
I'm sorry to have to spam the list. While sending this patchset
outlook started to return failures on me, so only patches 0 -24
got sent.

I'm going to look into it, and will resend this patchset when I
figure out what happened.

On 7/11/23 4:45 PM, Mike Christie wrote:
> The following patches were made over Martin's 6.6 scsi-queue branch.
> They allow scsi_execute_cmd users to control exactly which errors are
> retried, so we can reduce the sense/sshd handling they have to do and
> have it one place.
> 
> The patches allow scsi_execute_cmd users to pass in an array of failures
> which they want retried/failed and also specify how many times they want
> them retried. If we hit an error that the user did not specify then we drop
> down to the default behavior. This allows us to remove almost all the
> retry logic from scsi_execute_cmd users. We just have the special cases
> where we want to retry with a difference size command or sleep between
> retries.
John Garry July 12, 2023, 11:17 a.m. UTC | #2
On 11/07/2023 22:45, Mike Christie wrote:
> This has scsi_probe_lun ask scsi-ml to retry UAs instead of driving them
> itself.
> 
> There is one behavior change with this patch. We used to get a total of
> 3 retries for both UAs we were checking for. We now get 3 retries for
> each.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: John Garry <john.g.garry@oracle.com>

> ---
>   drivers/scsi/scsi_scan.c | 42 +++++++++++++++++++++++-----------------
>   1 file changed, 24 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index aa13feb17c62..519755952254 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -647,10 +647,29 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>   	int first_inquiry_len, try_inquiry_len, next_inquiry_len;
>   	int response_len = 0;
>   	int pass, count, result, resid;
> -	struct scsi_sense_hdr sshdr;
> +	/*
> +	 * not-ready to ready transition [asc/ascq=0x28/0x0] or power-on,
> +	 * reset [asc/ascq=0x29/0x0], continue. INQUIRY should not yield
> +	 * UNIT_ATTENTION but many buggy devices do so anyway.
> +	 */
> +	struct scsi_failure failures[] = {
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x28,
> +			.allowed = 3,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{
> +			.sense = UNIT_ATTENTION,
> +			.asc = 0x29,
> +			.allowed = 3,
> +			.result = SAM_STAT_CHECK_CONDITION,
> +		},
> +		{},

nit: no need for ',' on a sentinel

> +	};
>   	const struct scsi_exec_args exec_args = {
> -		.sshdr = &sshdr,
>   		.resid = &resid,
> +		.failures = failures,
>   	};
>   
>   	*bflags = 0;
> @@ -668,6 +687,8 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>   				pass, try_inquiry_len));
>   
>   	/* Each pass gets up to three chances to ignore Unit Attention */
> +	scsi_reset_failures(failures);
> +
>   	for (count = 0; count < 3; ++count) {
>   		memset(scsi_cmd, 0, 6);
>   		scsi_cmd[0] = INQUIRY;
> @@ -684,22 +705,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>   				"scsi scan: INQUIRY %s with code 0x%x\n",
>   				result ? "failed" : "successful", result));
>   
> -		if (result > 0) {
> -			/*
> -			 * not-ready to ready transition [asc/ascq=0x28/0x0]
> -			 * or power-on, reset [asc/ascq=0x29/0x0], continue.
> -			 * INQUIRY should not yield UNIT_ATTENTION
> -			 * but many buggy devices do so anyway.
> -			 */
> -			if (scsi_status_is_check_condition(result) &&
> -			    scsi_sense_valid(&sshdr)) {
> -				if ((sshdr.sense_key == UNIT_ATTENTION) &&
> -				    ((sshdr.asc == 0x28) ||
> -				     (sshdr.asc == 0x29)) &&
> -				    (sshdr.ascq == 0))
> -					continue;
> -			}
> -		} else if (result == 0) {
> +		if (result == 0) {
>   			/*
>   			 * if nothing was transferred, we try
>   			 * again. It's a workaround for some USB