diff mbox series

[v3,03/17] scsi: No retries on abort success

Message ID 1602732462-10443-4-git-send-email-muneendra.kumar@broadcom.com
State New
Headers show
Series [v3,01/17] scsi: Added a new definition in scsi_cmnd.h | expand

Commit Message

Muneendra Kumar Oct. 15, 2020, 3:27 a.m. UTC
Made an additional check in scsi_noretry_cmd to verify whether user has
decided not to do retries on abort success by setting the
SCMD_NORETRIES_ABORT bit

If SCMD_NORETRIES_ABORT bit is set we are making sure there won't be any
retries done on the same path and also setting the host byte as
DID_TRANSPORT_MARGINAL so that the error can be propogated as recoverable
transport error to the blk layers.

Added a code in scsi_result_to_blk_status to translate
a new error DID_TRANSPORT_MARGINAL to the corresponding blk_status_t
i.e BLK_STS_TRANSPORT

Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

---
v3:
Merged  first part of the previous patch(v2 patch3) with
this patch.

v2:
set the hostbyte as DID_TRANSPORT_MARGINAL instead of
DID_TRANSPORT_FAILFAST.
---
 drivers/scsi/scsi_error.c | 10 ++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 2 files changed, 11 insertions(+)

Comments

Mike Christie Oct. 16, 2020, 6:37 p.m. UTC | #1
On 10/14/20 10:27 PM, Muneendra wrote:
> Made an additional check in scsi_noretry_cmd to verify whether user has

> decided not to do retries on abort success by setting the

> SCMD_NORETRIES_ABORT bit

> 

> If SCMD_NORETRIES_ABORT bit is set we are making sure there won't be any

> retries done on the same path and also setting the host byte as

> DID_TRANSPORT_MARGINAL so that the error can be propogated as recoverable

> transport error to the blk layers.

> 

> Added a code in scsi_result_to_blk_status to translate

> a new error DID_TRANSPORT_MARGINAL to the corresponding blk_status_t

> i.e BLK_STS_TRANSPORT

> 

> Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

> 

> ---

> v3:

> Merged  first part of the previous patch(v2 patch3) with

> this patch.

> 

> v2:

> set the hostbyte as DID_TRANSPORT_MARGINAL instead of

> DID_TRANSPORT_FAILFAST.

> ---

>   drivers/scsi/scsi_error.c | 10 ++++++++++

>   drivers/scsi/scsi_lib.c   |  1 +

>   2 files changed, 11 insertions(+)

> 

> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c

> index ae80daa5d831..aa30c1c9e9db 100644

> --- a/drivers/scsi/scsi_error.c

> +++ b/drivers/scsi/scsi_error.c

> @@ -1763,6 +1763,16 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)

>   		return 0;

>   

>   check_type:

> +	/*

> +	 * Check whether caller has decided not to do retries on

> +	 * abort success by setting the SCMD_NORETRIES_ABORT bit


The comment seems wrong here because we are not setting that bit.

> +	 */

> +	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&

> +		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {

> +		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);

> +		return 1;

> +	}

> +

>   	/*

>   	 * assume caller has checked sense and determined

>   	 * the check condition was retryable.

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c

> index 2b5dea07498e..9606bad1542f 100644

> --- a/drivers/scsi/scsi_lib.c

> +++ b/drivers/scsi/scsi_lib.c

> @@ -629,6 +629,7 @@ static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)

>   			return BLK_STS_OK;

>   		return BLK_STS_IOERR;

>   	case DID_TRANSPORT_FAILFAST:

> +	case DID_TRANSPORT_MARGINAL:

>   		return BLK_STS_TRANSPORT;

>   	case DID_TARGET_FAILURE:

>   		set_host_byte(cmd, DID_OK);

>
Mike Christie Oct. 19, 2020, 7:05 p.m. UTC | #2
On 10/14/20 10:27 PM, Muneendra wrote:
>   check_type:
> +	/*
> +	 * Check whether caller has decided not to do retries on
> +	 * abort success by setting the SCMD_NORETRIES_ABORT bit
> +	 */
> +	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&
> +		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {
> +		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);

Hey, one other thing that might be confusing me is this check and the 
naming. The 0/17 email description and the SCMD_NORETRIES_ABORT name 
makes me think we want to only run this if the cmd timedout and we went 
through the SCSI EH TMF operations. However, I think this will end up 
failing other errors with DID_TRANSPORT_MARGINAL right?

Did you want just the the SCSI EH timeout/abort case to hit this or any 
errors that hit this code path?

> +		return 1;
> +	}
> +
Muneendra Kumar Oct. 19, 2020, 7:26 p.m. UTC | #3
Hi Michael,

>   check_type:

> +	/*

> +	 * Check whether caller has decided not to do retries on

> +	 * abort success by setting the SCMD_NORETRIES_ABORT bit

> +	 */

> +	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&

> +		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {

> +		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);


>Hey, one other thing that might be confusing me is this check and the

>naming. The 0/17 email description and the SCMD_NORETRIES_ABORT name makes

>me think we want to only run this if the cmd timedout and we went >through

>the SCSI EH TMF operations. However, I think this will end up failing other

>errors with DID_TRANSPORT_MARGINAL right?


>Did you want just the the SCSI EH timeout/abort case to hit this or any

>errors that hit this code path?

[Muneendra]At present we want SCSI EH timeout/abort case to hit this.

Regards,
Muneendra.
Mike Christie Oct. 20, 2020, 7:53 p.m. UTC | #4
On 10/19/20 2:26 PM, Muneendra Kumar M wrote:
> Hi Michael,
> 
>>   check_type:
>> +	/*
>> +	 * Check whether caller has decided not to do retries on
>> +	 * abort success by setting the SCMD_NORETRIES_ABORT bit
>> +	 */
>> +	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&
>> +		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {
>> +		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);
> 
>> Hey, one other thing that might be confusing me is this check and the
>> naming. The 0/17 email description and the SCMD_NORETRIES_ABORT name makes
>> me think we want to only run this if the cmd timedout and we went >through
>> the SCSI EH TMF operations. However, I think this will end up failing other
>> errors with DID_TRANSPORT_MARGINAL right?
> 
>> Did you want just the the SCSI EH timeout/abort case to hit this or any
>> errors that hit this code path?
> [Muneendra]At present we want SCSI EH timeout/abort case to hit this.

What about adding a new eh callout that the transport classes can implement.
They can check the port state at this time and decide if the cmd should be
retried or failed. It would be similar to fc_eh_timed_out for example.
Mike Christie Oct. 20, 2020, 8:18 p.m. UTC | #5
On 10/20/20 2:53 PM, Mike Christie wrote:
> On 10/19/20 2:26 PM, Muneendra Kumar M wrote:

>> Hi Michael,

>>

>>>   check_type:

>>> +	/*

>>> +	 * Check whether caller has decided not to do retries on

>>> +	 * abort success by setting the SCMD_NORETRIES_ABORT bit

>>> +	 */

>>> +	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&

>>> +		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {

>>> +		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);

>>

>>> Hey, one other thing that might be confusing me is this check and the

>>> naming. The 0/17 email description and the SCMD_NORETRIES_ABORT name makes

>>> me think we want to only run this if the cmd timedout and we went >through

>>> the SCSI EH TMF operations. However, I think this will end up failing other

>>> errors with DID_TRANSPORT_MARGINAL right?

>>

>>> Did you want just the the SCSI EH timeout/abort case to hit this or any

>>> errors that hit this code path?

>> [Muneendra]At present we want SCSI EH timeout/abort case to hit this.

> 

> What about adding a new eh callout that the transport classes can implement.

> They can check the port state at this time and decide if the cmd should be

> retried or failed. It would be similar to fc_eh_timed_out for example.

> 


Or, in the fc_eh_timed_out callout set the SCMD_NORETRIES_ABORT if port
state is marginal and it indicates it wants to abort the cmd. In
scsi_noretry_cmd above then you know we got there, because the cmd
timedout and we tried/were going to abort it. We don't need the code to
loop over running commands, the chkready changes, etc.
Muneendra Kumar Oct. 21, 2020, 6:51 p.m. UTC | #6
Hi Michael,
Thanks for the input.

>Or, in the fc_eh_timed_out callout set the SCMD_NORETRIES_ABORT if port
>state is marginal and it indicates it wants to abort the cmd. In
>scsi_noretry_cmd above then you know we got there, because the cmd timedout
>and we >tried/were going to abort it. We don't need the code to loop over
>running commands, the chkready changes, etc.

Agreed.
I will incorporate all your review comments.

Regards,
Muneendra.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ae80daa5d831..aa30c1c9e9db 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1763,6 +1763,16 @@  int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return 0;
 
 check_type:
+	/*
+	 * Check whether caller has decided not to do retries on
+	 * abort success by setting the SCMD_NORETRIES_ABORT bit
+	 */
+	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&
+		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {
+		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);
+		return 1;
+	}
+
 	/*
 	 * assume caller has checked sense and determined
 	 * the check condition was retryable.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2b5dea07498e..9606bad1542f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -629,6 +629,7 @@  static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 			return BLK_STS_OK;
 		return BLK_STS_IOERR;
 	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
 		return BLK_STS_TRANSPORT;
 	case DID_TARGET_FAILURE:
 		set_host_byte(cmd, DID_OK);