diff mbox series

[09/10] scsi_error: map FAST_IO_FAIL to -EAGAIN in SCSI EH

Message ID 20231023092837.33786-10-hare@suse.de
State New
Headers show
Series scsi: EH rework, main part | expand

Commit Message

Hannes Reinecke Oct. 23, 2023, 9:28 a.m. UTC
Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
valid, and indicates that the request could not be executed due to
the transport being busy.
But that is not an I/O error, and we should return -EAGAIN from
scsi_ioctl_reset() to correctly inform userspace.

Suggested-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

Comments

Mike Christie Oct. 26, 2023, 4:50 p.m. UTC | #1
On 10/23/23 4:28 AM, Hannes Reinecke wrote:
> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
> valid, and indicates that the request could not be executed due to
> the transport being busy.

I'm not sure if that's completely correct or maybe we have a different
view of what it means to be busy.

FC, iSCSI and SRP normally return it when the transport is marked as
offline/lost, so for normal IO we fail it upwards and userspace gets
an error.

What drivers use it as temp busy error code? Is it the lpfc one
or a snic one?

> But that is not an I/O error, and we should return -EAGAIN from
> scsi_ioctl_reset() to correctly inform userspace.
For the sg_reset example, if you tried to run sg_reset again it would
fail. When sg_reset tries to open the device the open function will
return failure in the open call because the device is in the
transport-offline state for FC/iSCSI/SRP.

If you are going to change the return value why not sync it with
what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK?
Hannes Reinecke Oct. 27, 2023, 5:05 a.m. UTC | #2
On 10/26/23 18:50, Mike Christie wrote:
> On 10/23/23 4:28 AM, Hannes Reinecke wrote:
>> Returning FAST_IO_FAIL from any of the SCSI EH functions is perfectly
>> valid, and indicates that the request could not be executed due to
>> the transport being busy.
> 
> I'm not sure if that's completely correct or maybe we have a different
> view of what it means to be busy.
> 
> FC, iSCSI and SRP normally return it when the transport is marked as
> offline/lost, so for normal IO we fail it upwards and userspace gets
> an error.
> 
> What drivers use it as temp busy error code? Is it the lpfc one
> or a snic one?
> 
This patch is primarily for sg_reset(), which would return -EIO whenever
one of the EH functions would return FAST_IO_FAIL.
It got triggered by my patch to zfcp, which now returns FAST_IO_FAIL
from host reset as remote port login is happening from a worker thread,
and the devices are not (yet) available.

>> But that is not an I/O error, and we should return -EAGAIN from
>> scsi_ioctl_reset() to correctly inform userspace.
> For the sg_reset example, if you tried to run sg_reset again it would
> fail. When sg_reset tries to open the device the open function will
> return failure in the open call because the device is in the
> transport-offline state for FC/iSCSI/SRP.
> 
> If you are going to change the return value why not sync it with
> what we return for normal IO and return BLK_STS_TRANSPORT/-ENOLINK?

Good point.
The alternative would be to map FAST_IO_FAIL back to SUCCESS as after
all the host reset completed successfully, and blocked ports are not
really an error.
Lemme check.

Cheers,

Hannes
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 7c655d08a305..8e184d92abe9 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2453,22 +2453,25 @@  scsi_ioctl_reset(struct scsi_device *dev, int __user *arg)
 		break;
 	case SG_SCSI_RESET_DEVICE:
 		rtn = scsi_try_bus_device_reset(dev);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_TARGET:
 		rtn = scsi_try_target_reset(shost, starget);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_BUS:
 		rtn = scsi_try_bus_reset(shost, dev->channel);
-		if (rtn == SUCCESS || (val & SG_SCSI_RESET_NO_ESCALATE))
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL ||
+		    (val & SG_SCSI_RESET_NO_ESCALATE))
 			break;
 		fallthrough;
 	case SG_SCSI_RESET_HOST:
 		rtn = scsi_try_host_reset(shost);
-		if (rtn == SUCCESS)
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL)
 			break;
 		fallthrough;
 	default:
@@ -2476,7 +2479,17 @@  scsi_ioctl_reset(struct scsi_device *dev, int __user *arg)
 		break;
 	}
 
-	error = (rtn == SUCCESS) ? 0 : -EIO;
+	switch (rtn) {
+	case SUCCESS:
+		error = 0;
+		break;
+	case FAST_IO_FAIL:
+		error = -EAGAIN;
+		break;
+	default:
+		error = -EIO;
+		break;
+	}
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	shost->tmf_in_progress = 0;