[RESEND] scsi: scan: retry INQUIRY after timeout

Message ID 20220808202018.22224-1-mwilck@suse.com
State New
Series [RESEND] scsi: scan: retry INQUIRY after timeout

Commit Message

Martin Wilck Aug. 8, 2022, 8:20 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
during SCSI device probing, causing device probing to fail. This has been
observed in FCoE uplink failover tests, for example.

This patch fixes the issue by retrying the INQUIRY up to 3 times (in practice,
we never observed more than a single retry).

Signed-off-by: Martin Wilck <mwilck@suse.com>
Tested-by: Dave Prizer <dave.prizer@hpe.com>

---
This patch was previously part of the series "Fixes for device probing
on flaky connections", submitted on 2022/06/15. The first patch of the
series has been dropped as discussed in the review process. Testing
verified that just this patch was sufficient to solve the observed
issues.

---
 drivers/scsi/scsi_scan.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Mike Christie Aug. 8, 2022, 10:11 p.m. UTC | #1
On 8/8/22 3:20 PM, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
> scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
> during SCSI device probing, causing device probing to fail. This has been
> observed in FCoE uplink failover tests, for example.

What about the other scan/probe-related commands and other transient transport
errors like this (i.e. when we get to the point where DID_TRANSPORT_DISRUPTED is
returned)? I think if you changed your test a little so that the FC port state
changed, we could still hit the same end problem. We can hit similar errors with
iSCSI and plain old FC.

For REPORT_LUNS it looks like we retry almost all errors 3 times. For the
probe/setup commands, at least for disks, it looks like we also are more
forgiving and will retry DID_TIME_OUT/DID_TRANSPORT_DISRUPTED 3 times for
commands like SAI_READ_CAPACITY_16 (I didn't check every sd operation and
other upper level drivers).

However, for the other probe/setup operations that rely on scsi_attach_vpd
succeeding, like sd_read_block_limits, we will hit issues where the device
is only partially set up. Should scsi_vpd_inquiry be retrying 3 times as well?
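
Roughly what a caller-side retry in scsi_vpd_inquiry could look like
(untested sketch; the retry count and the error set are illustrative and
not part of the posted patch):

	/* Sketch only: retry the VPD INQUIRY on transient transport errors,
	 * mirroring the loop this patch adds to scsi_probe_lun(). */
	for (tries = 0; tries < 3; tries++) {
		result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buffer,
					  len, NULL, 30 * HZ, 3, NULL);
		if (result == 0 ||
		    (host_byte(result) != DID_TIME_OUT &&
		     host_byte(result) != DID_TRANSPORT_DISRUPTED))
			break;
	}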

An alternative to changing all the callers would be to make scsi_noretry_cmd
detect when it's an internal passthrough command and just retry these types of
errors. For SG_IO type passthrough we still want to fail right away.

> 
> This patch fixes the issue by retrying the INQUIRY up to 3 times (in practice,
> we never observed more than a single retry),
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> Tested-by: Dave Prizer <dave.prizer@hpe.com>
> 
> ---
> This patch was previously part of the series "Fixes for device probing
> on flaky connections", submitted on 2022/06/15. The first patch of the
> series has been dropped as discussed in the review process. Testing
> verified that just this patch was sufficient to solve the observed
> issues.
> 
> ---
>  drivers/scsi/scsi_scan.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 91ac901a66826..e859a648033f9 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -697,6 +697,11 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
>  				    (sshdr.ascq == 0))
>  					continue;
>  			}
> +			if (host_byte(result) == DID_TIME_OUT) {
> +				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
> +						"scsi scan: retry inquiry after timeout\n"));
> +				continue;
> +			}
>  		} else if (result == 0) {
>  			/*
>  			 * if nothing was transferred, we try

Christoph Hellwig Aug. 9, 2022, 6:52 a.m. UTC | #2
On Mon, Aug 08, 2022 at 05:11:27PM -0500, Mike Christie wrote:
> For REPORT_LUNS it looks like we retry almost all errors 3 times. For the
> probe/setup commands, at least for disks, it looks like we also are more
> forgiving and will retry DID_TIME_OUT/DID_TRANSPORT_DISRUPTED 3 times for
> commands like SAI_READ_CAPACITY_16 (I didn't check every sd operation and
> other upper level drivers).
> 
> However, for the other probe/setup  operations that rely on scsi_attach_vpd
> succeeding like sd_read_block_limits then we will hit issues where the device
> is partially setup. Should scsi_vpd_inquiry be retrying 3 times as well?
> 
> An alternative to changing all the callers would be we could make scsi_noretry_cmd
> detect when it's an internal passthrough command and just retry these types of
> errors. For SG IO type of passthough we still want to fail right away.

Yes, I think one single place to do retries for setup path commands
is much better than growing ad-hoc logic.

I just made a similar comment on a similar nvme patch from SuSE a few
days ago.
Martin Wilck Aug. 9, 2022, 8:21 a.m. UTC | #3
On Mon, 2022-08-08 at 17:11 -0500, Mike Christie wrote:
> On 8/8/22 3:20 PM, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
> > scsi_noretry_cmd()). Packet loss in the fabric can cause spurious
> > timeouts
> > during SCSI device probing, causing device probing to fail. This
> > has been
> > observed in FCoE uplink failover tests, for example.
> 
> What about the other scan/probe related commands and other transient
> transport
> errors like this (so when we get to the point DID_TRANSPORT_DISRUPTED
> is returned)?
> I think if you changed your test a little so the fc port state
> changed, we could
> still hit the same end problem. We can hit similar errors with iscsi
> and plain old
> FC.

All true. My focus was to fix an issue that has been encountered 
frequently by HPE. In the test scenario at hand, I expected to still
see some errors after applying this patch, but we didn't. Can we agree
to fix this issue now, and see later what else may need fixing? 

I suppose it's impossible to do error-proof probing in the presence of
random transport layer errors, so whatever we do will be just a gradual
improvement, helping in some scenarios while possibly slowing down
probing in others. Also, verifying changes in this area with meaningful
tests is difficult and a time- and resource-consuming endeavour.


> For REPORT_LUNS it looks like we retry almost all errors 3 times. For
> the
> probe/setup commands, at least for disks, it looks like we also are
> more
> forgiving and will retry DID_TIME_OUT/DID_TRANSPORT_DISRUPTED 3 times
> for
> commands like SAI_READ_CAPACITY_16 (I didn't check every sd operation
> and
> other upper level drivers).
> 
> However, for the other probe/setup  operations that rely on
> scsi_attach_vpd
> succeeding like sd_read_block_limits then we will hit issues where
> the device
> is partially setup. Should scsi_vpd_inquiry be retrying 3 times as
> well?

I think so. A frequent cause of errors in the multipath context is that
the udev rules assume that as soon as the "inquiry" sysfs attribute is
valid, the attributes "vpd_pg80" and "vpd_pg83" will be valid, too. But
in the presence of transport errors, any of the vpd attributes may be
invalid unless we retry.

Perhaps it also makes sense to discuss the default timeouts? Given that
the max delay is (n_retries * timeout), the worst-case delay caused by
a single probing command would not change if we cut the timeout in half
and retry DID_TIME_OUT instead. In the case at hand, that would
probably have made sense - if the INQUIRY response wasn't received
after a few seconds, it wouldn't make sense to wait any longer. But I
guess there are other scenarios where a timeout of 20s or more is
required.
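
To put rough numbers on it (hypothetical values, not the actual
defaults): a single try with a 20s timeout and two tries with a 10s
timeout both bound the worst case at 20s, but with the shorter timeout a
single lost INQUIRY only costs 10s before the retry is sent.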

Note that the kernel isn't the only point of failure. udev rules
calling sg_inq or other similar tools may fall into the same trap. It
is even worse there, because commands called from udev rules are
expected to terminate quickly, thus there isn't much room for retries.
sg_inq uses a default passthrough timeout of 60s, and no retries.

> An alternative to changing all the callers would be we could make
> scsi_noretry_cmd
> detect when it's an internal passthrough command and just retry these
> types of
> errors. For SG IO type of passthough we still want to fail right
> away.

We can't distinguish these two cases. I am not sure if we ever could,
but at least since da6269da4cfe2 ("block: remove
REQ_OP_SCSI_{IN,OUT}"), we obviously can't: both kernel-internal
passthrough and SG_IO now arrive as plain REQ_OP_DRV_IN/OUT requests,
with nothing in the request itself to tell them apart.

Martin K. P., Christoph, thoughts?

Regards,
Martin
Martin Wilck Aug. 9, 2022, 8:50 a.m. UTC | #4
On Tue, 2022-08-09 at 08:52 +0200, Christoph Hellwig wrote:
> > 
> > An alternative to changing all the callers would be we could make
> > scsi_noretry_cmd
> > detect when it's an internal passthrough command and just retry
> > these types of
> > errors. For SG IO type of passthough we still want to fail right
> > away.
> 
> Yes, I think one single place to do retries for setup path command
> is much better than growing ad-hoc logic.
> 
> I just made a similar comment to similar nvme patch from SuSE a few
> days
> ago..

Are you suggesting that we re-invent REQ_OP_SCSI_{IN,OUT} to distinguish
SG_IO from kernel-internal passthrough?

Martin
Christoph Hellwig Aug. 9, 2022, 8:51 a.m. UTC | #5
On Tue, Aug 09, 2022 at 10:50:50AM +0200, Martin Wilck wrote:
> Are you suggesting to re-invent REQ_OP_SCSI_{IN,OUT} to distinguish
> SG_IO from kernel-internal passthrough?

No.  We just need a flag in struct scsi_cmnd that controls the retry
behavior for passthrough commands.
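
Something like this is what I mean (completely untested sketch; the flag
name and bit value are made up here, and setting the flag in the internal
submission paths is not shown):

	/*
	 * Hypothetical flag in struct scsi_cmnd (invented name/bit, not an
	 * existing kernel symbol): internal submitters would set it, SG_IO
	 * would leave it clear.
	 */
	#define SCMD_RETRY_PASSTHROUGH	(1 << 3)

	/* In scsi_noretry_cmd(), transient transport errors would then remain
	 * retryable for commands carrying the flag (schematic only): */
	switch (host_byte(scmd->result)) {
	case DID_TIME_OUT:
	case DID_TRANSPORT_DISRUPTED:
		if (scmd->flags & SCMD_RETRY_PASSTHROUGH)
			return 0;	/* may still be retried */
		break;
	}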

Patch

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 91ac901a66826..e859a648033f9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -697,6 +697,11 @@  static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				    (sshdr.ascq == 0))
 					continue;
 			}
+			if (host_byte(result) == DID_TIME_OUT) {
+				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
+						"scsi scan: retry inquiry after timeout\n"));
+				continue;
+			}
 		} else if (result == 0) {
 			/*
 			 * if nothing was transferred, we try