diff mbox series

[5/9] scsi: set host byte after EH completed

Message ID 20231016121542.111501-6-hare@suse.de
State New
Headers show
Series scsi: EH rework, main part | expand

Commit Message

Hannes Reinecke Oct. 16, 2023, 12:15 p.m. UTC
When SCSI EH completes we should be setting the host byte to
DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
the caller that some EH processing has happened.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

Comments

Johannes Thumshirn Oct. 16, 2023, 1:59 p.m. UTC | #1
On 16.10.23 14:16, Hannes Reinecke wrote:
> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
> +			int host_byte)
>   {
> +	if (host_byte)
> +		set_host_byte(scmd, host_byte);

I think the 'if' is not needed here.
Hannes Reinecke Oct. 17, 2023, 7:13 a.m. UTC | #2
On 10/16/23 15:59, Johannes Thumshirn wrote:
> On 16.10.23 14:16, Hannes Reinecke wrote:
>> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
>> +			int host_byte)
>>    {
>> +	if (host_byte)
>> +		set_host_byte(scmd, host_byte);
> 
> I think the 'if' is not needed here.
> 
Actually, I prefer to keep it as passing in 'DID_OK' would _not_ modify
the status; leaving out the 'if' would always overwrite it.

Cheers,

Hannes
Christoph Hellwig Oct. 17, 2023, 7:25 a.m. UTC | #3
On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote:
> When SCSI EH completes we should be setting the host byte to
> DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
> the caller that some EH processing has happened.

I have a hard time following this commit log.  Yes, we probably
should.  But so far we haven't, so why is this suddenly a problem?

> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
> +			int host_byte)
>  {
> +	if (host_byte)
> +		set_host_byte(scmd, host_byte);
>  	list_move_tail(&scmd->eh_entry, done_q);

What is the point of passing in the host_byte vs just setting it in
the caller?

In fat I'm not even quite sure what the point of the existing helper
is, as moving the command to the passed in queue doesn't provide
much of a useful abstraction.
Hannes Reinecke Oct. 17, 2023, 8:14 a.m. UTC | #4
On 10/17/23 09:25, Christoph Hellwig wrote:
> On Mon, Oct 16, 2023 at 02:15:38PM +0200, Hannes Reinecke wrote:
>> When SCSI EH completes we should be setting the host byte to
>> DID_ABORT, DID_RESET, or DID_TRANSPORT_DISRUPTED to inform
>> the caller that some EH processing has happened.
> 
> I have a hard time following this commit log.  Yes, we probably
> should.  But so far we haven't, so why is this suddenly a problem?
> 
>> -void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>> +void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
>> +			int host_byte)
>>   {
>> +	if (host_byte)
>> +		set_host_byte(scmd, host_byte);
>>   	list_move_tail(&scmd->eh_entry, done_q);
> 
> What is the point of passing in the host_byte vs just setting it in
> the caller?
> 
Hmm. Sure, could do.

> In fat I'm not even quite sure what the point of the existing helper
> is, as moving the command to the passed in queue doesn't provide
> much of a useful abstraction.
> 
Share your sentiments.
One could open-code it, but if I move the host_byte setting into the
caller this function won't be touched at all.
And it's time to clean up the entire list splice-and-dice game in
SCSI EH once this rework is in; my plan is to move failed commands
onto a per-entity list, and merge them onto the next higher entity
list once an escalation fails.

So maybe shelf it for now, and just open-code the host_byte setting
in the caller.

Cheers,

Hannes
Christoph Hellwig Oct. 18, 2023, 5:46 a.m. UTC | #5
On Tue, Oct 17, 2023 at 10:14:19AM +0200, Hannes Reinecke wrote:
>> What is the point of passing in the host_byte vs just setting it in
>> the caller?
>>
> Hmm. Sure, could do.
>
>> In fat I'm not even quite sure what the point of the existing helper
>> is, as moving the command to the passed in queue doesn't provide
>> much of a useful abstraction.
>>
> Share your sentiments.
> One could open-code it, but if I move the host_byte setting into the
> caller this function won't be touched at all.

Yes, I can rant about existing code, not just new code :)

> And it's time to clean up the entire list splice-and-dice game in
> SCSI EH once this rework is in; my plan is to move failed commands
> onto a per-entity list, and merge them onto the next higher entity
> list once an escalation fails.
>
> So maybe shelf it for now, and just open-code the host_byte setting
> in the caller.

Sounds good.
Mike Christie Oct. 19, 2023, 8:30 p.m. UTC | #6
On 10/16/23 7:15 AM, Hannes Reinecke wrote:
> @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>  			if (rtn == SUCCESS)
>  				list_move_tail(&scmd->eh_entry, &check_list);
>  			else if (rtn == FAST_IO_FAIL)
> -				scsi_eh_finish_cmd(scmd, done_q);
> +				__scsi_eh_finish_cmd(scmd, done_q,
> +						     DID_TRANSPORT_DISRUPTED);
>  			else
>  				/* push back on work queue for further processing */
>  				list_move(&scmd->eh_entry, work_q);
> @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>  			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>  				if (channel == scmd_channel(scmd)) {
>  					if (rtn == FAST_IO_FAIL)
> -						scsi_eh_finish_cmd(scmd,
> -								   done_q);
> +						__scsi_eh_finish_cmd(scmd,
> +								     done_q,
> +								     DID_TRANSPORT_DISRUPTED);
>  					else
>  						list_move_tail(&scmd->eh_entry,
>  							       &check_list);
> @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost,
>  		if (rtn == SUCCESS) {
>  			list_splice_init(work_q, &check_list);
>  		} else if (rtn == FAST_IO_FAIL) {
> -			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -					scsi_eh_finish_cmd(scmd, done_q);
> -			}
> +			list_for_each_entry_safe(scmd, next, work_q, eh_entry)
> +				__scsi_eh_finish_cmd(scmd, done_q,
> +						     DID_TRANSPORT_DISRUPTED);
>  		} else {
>  			SCSI_LOG_ERROR_RECOVERY(3,
>  				shost_printk(KERN_INFO, shost,

For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST  because when
drivers return that, they normally have hit their fast io fail timer or have
hit a hard transport issue like the port is offline. For example for FC drivers
they do:

err = fc_block_rport(rport);
if (err)
	return err;
 
where fc_block_rport does:

if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
	return FAST_IO_FAIL;

and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST
when FC_RPORT_FAST_FAIL_TIMEDOUT is set.

So using DID_TRANSPORT_FAILFAST would align the return values for that
state.

One question I had is why you added those checks for target and host
reset but not scsi_eh_bus_device_reset because drivers will do something
similar to above where they call fc_block_rport for that callout as
well.
Hannes Reinecke Oct. 20, 2023, 5:55 a.m. UTC | #7
On 10/19/23 22:30, Mike Christie wrote:
> On 10/16/23 7:15 AM, Hannes Reinecke wrote:
>> @@ -1671,7 +1682,8 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
>>   			if (rtn == SUCCESS)
>>   				list_move_tail(&scmd->eh_entry, &check_list);
>>   			else if (rtn == FAST_IO_FAIL)
>> -				scsi_eh_finish_cmd(scmd, done_q);
>> +				__scsi_eh_finish_cmd(scmd, done_q,
>> +						     DID_TRANSPORT_DISRUPTED);
>>   			else
>>   				/* push back on work queue for further processing */
>>   				list_move(&scmd->eh_entry, work_q);
>> @@ -1736,8 +1748,9 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
>>   			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>   				if (channel == scmd_channel(scmd)) {
>>   					if (rtn == FAST_IO_FAIL)
>> -						scsi_eh_finish_cmd(scmd,
>> -								   done_q);
>> +						__scsi_eh_finish_cmd(scmd,
>> +								     done_q,
>> +								     DID_TRANSPORT_DISRUPTED);
>>   					else
>>   						list_move_tail(&scmd->eh_entry,
>>   							       &check_list);
>> @@ -1780,9 +1793,9 @@ static int scsi_eh_host_reset(struct Scsi_Host *shost,
>>   		if (rtn == SUCCESS) {
>>   			list_splice_init(work_q, &check_list);
>>   		} else if (rtn == FAST_IO_FAIL) {
>> -			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>> -					scsi_eh_finish_cmd(scmd, done_q);
>> -			}
>> +			list_for_each_entry_safe(scmd, next, work_q, eh_entry)
>> +				__scsi_eh_finish_cmd(scmd, done_q,
>> +						     DID_TRANSPORT_DISRUPTED);
>>   		} else {
>>   			SCSI_LOG_ERROR_RECOVERY(3,
>>   				shost_printk(KERN_INFO, shost,
> 
> For FAST_IO_FAIL I think you want to use DID_TRANSPORT_FAILFAST  because when
> drivers return that, they normally have hit their fast io fail timer or have
> hit a hard transport issue like the port is offline. For example for FC drivers
> they do:
> 
Ah, yes, you are right. Will be modifying that.

> err = fc_block_rport(rport);
> if (err)
> 	return err;
>   
> where fc_block_rport does:
> 
> if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
> 	return FAST_IO_FAIL;
> 
> and then for fc_remote_port_chkready we return DID_TRANSPORT_FAILFAST
> when FC_RPORT_FAST_FAIL_TIMEDOUT is set.
> 
> So using DID_TRANSPORT_FAILFAST would align the return values for that
> state.
> 
> One question I had is why you added those checks for target and host
> reset but not scsi_eh_bus_device_reset because drivers will do something
> similar to above where they call fc_block_rport for that callout as
> well.
> 
That's done in one of the later patches where I convert the loop in
scsi_eh_bus_device_reset().

Cheers,

Hannes
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 826bc7f4d59f..cdbad217d013 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1262,9 +1262,10 @@  scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn)
 }
 
 /**
- * scsi_eh_finish_cmd - Handle a cmd that eh is finished with.
+ * __scsi_eh_finish_cmd - Handle a cmd that eh is finished with.
  * @scmd:	Original SCSI cmd that eh has finished.
  * @done_q:	Queue for processed commands.
+ * @host_byte:	Host byte of the command status to be set
  *
  * Notes:
  *    We don't want to use the normal command completion while we are are
@@ -1273,10 +1274,18 @@  scsi_eh_action(struct scsi_cmnd *scmd, enum scsi_disposition rtn)
  *    keep a list of pending commands for final completion, and once we
  *    are ready to leave error handling we handle completion for real.
  */
-void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
+void __scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q,
+			int host_byte)
 {
+	if (host_byte)
+		set_host_byte(scmd, host_byte);
 	list_move_tail(&scmd->eh_entry, done_q);
 }
+
+void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
+{
+	__scsi_eh_finish_cmd(scmd, done_q, DID_OK);
+}
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
 
 /**
@@ -1451,7 +1460,8 @@  static int scsi_eh_test_devices(struct list_head *cmd_list,
 				if (finish_cmds &&
 				    (try_stu ||
 				     scsi_eh_action(scmd, SUCCESS) == SUCCESS))
-					scsi_eh_finish_cmd(scmd, done_q);
+					__scsi_eh_finish_cmd(scmd, done_q,
+							     DID_RESET);
 				else
 					list_move_tail(&scmd->eh_entry, work_q);
 			}
@@ -1600,8 +1610,9 @@  static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
 							 work_q, eh_entry) {
 					if (scmd->device == sdev &&
 					    scsi_eh_action(scmd, rtn) != FAILED)
-						scsi_eh_finish_cmd(scmd,
-								   done_q);
+						__scsi_eh_finish_cmd(scmd,
+								     done_q,
+								     DID_RESET);
 				}
 			}
 		} else {
@@ -1671,7 +1682,8 @@  static int scsi_eh_target_reset(struct Scsi_Host *shost,
 			if (rtn == SUCCESS)
 				list_move_tail(&scmd->eh_entry, &check_list);
 			else if (rtn == FAST_IO_FAIL)
-				scsi_eh_finish_cmd(scmd, done_q);
+				__scsi_eh_finish_cmd(scmd, done_q,
+						     DID_TRANSPORT_DISRUPTED);
 			else
 				/* push back on work queue for further processing */
 				list_move(&scmd->eh_entry, work_q);
@@ -1736,8 +1748,9 @@  static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
 				if (channel == scmd_channel(scmd)) {
 					if (rtn == FAST_IO_FAIL)
-						scsi_eh_finish_cmd(scmd,
-								   done_q);
+						__scsi_eh_finish_cmd(scmd,
+								     done_q,
+								     DID_TRANSPORT_DISRUPTED);
 					else
 						list_move_tail(&scmd->eh_entry,
 							       &check_list);
@@ -1780,9 +1793,9 @@  static int scsi_eh_host_reset(struct Scsi_Host *shost,
 		if (rtn == SUCCESS) {
 			list_splice_init(work_q, &check_list);
 		} else if (rtn == FAST_IO_FAIL) {
-			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-					scsi_eh_finish_cmd(scmd, done_q);
-			}
+			list_for_each_entry_safe(scmd, next, work_q, eh_entry)
+				__scsi_eh_finish_cmd(scmd, done_q,
+						     DID_TRANSPORT_DISRUPTED);
 		} else {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				shost_printk(KERN_INFO, shost,