mbox series

[v2,00/18] Add Command Duration Limits support

Message ID 20230112140412.667308-1-niklas.cassel@wdc.com
Headers show
Series Add Command Duration Limits support | expand

Message

Niklas Cassel Jan. 12, 2023, 2:03 p.m. UTC
Hello,

This series adds support for Command Duration Limits.
The series is based on linux-next tag: next-20230112
The series can also be found in git:
https://github.com/floatious/linux/commits/cdl-v2


=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).

CDL defines Duration Limits Descriptors (DLD).
7 DLDs for read commands and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.

A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.

The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.

A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort), command will be completed with command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable), command will be completed without
error (ATA) or status GOOD (SCSI), with sense data indicating that the
command timed out. Note that the command will not have transferred any
data to/from the device when the command timed out, even though the
command returned success.

Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.

The DLDs are defined in the CDL log page(s) and are readable and writable.
For convenience, the kernel provides a sysfs interface for reading the
descriptors. If a user really wants to change the descriptors, they can do
so using a user-space application that sends passthrough commands,
one such application is cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools


==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be enabled using:
echo 1 > /sys/block/$bdev/device/duration_limits/enable

In order for user-space to be able to select a specific DLD for an I/O,
we have decided to reuse the I/O priority API.

This means that we introduce a new priority class (IOPRIO_CLASS_DL).
When using this class, the existing I/O priority levels (0-7) directly
indicates the DLD index to use.

By reusing the I/O priority API, the user can both define DLD to use
per AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread
(ioprio_set()).


=======
Testing
=======
With the following fio patch that simply adds the new priority class:
https://github.com/westerndigitalcorporation/cdl-tools/blob/main/patches/fio-3.29-and-newer/0001-os-linux-Add-IORPIO_CLASS_DL-definition.patch

CDL can be tested using fio, e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_class=4 --cmdprio=DLD_index

A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.

We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools/blob/main/README.md#testing-a-system-command-duration-limits-support


We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)


===================
Further information
===================
For further information about CDL, see Damien's slides:

Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf

Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing


================
Changes since V1
================
-libata patches that were not strictly related to CDL have been dropped,
 as they have already been sent out and been accepted as a separate series:
 https://lore.kernel.org/linux-ide/20221229170005.49118-1-niklas.cassel@wdc.com/
-Sending all patches to all mailing lists (libata,scsi,block), instead of only
 sending the cover letter + relevant patches to each list (Chaitanya Kulkarni).
-Added my Signed-off-by on the patches developed solely by Damien (John Garry).
-Fixed comments on the Documentation (Bagas Sanjaya).
-Renamed SCSI ML helper while moving it (Mike Christie).
-Modified patch 17 ("ata: libata: handle completion of CDL commands using policy
 0xD") to not trigger the libata fast drain timer (which is 2 seconds) while
 scheduling SCSI EH for a CDL command (i.e. while waiting for other commands to
 finish normally). The regular 30 second SCSI timeout is still in place for the
 commands that we are waiting for to finish normally.
-Rebased and resolved merge conflicts in libata introduced by the now accepted
 libata FUA cleanup series.
-Rebased and resolved merge conflicts in bfq-iosched introduced by the now
 accepted BFQ multi actuator series.


Kind regards,
Niklas & Damien

Damien Le Moal (12):
  scsi: support retrieving sub-pages of mode pages
  scsi: support service action in scsi_report_opcode()
  block: introduce duration-limits priority class
  block: introduce BLK_STS_DURATION_LIMIT
  ata: libata: detect support for command duration limits
  ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in()
  ata: libata-scsi: add support for CDL pages mode sense
  ata: libata: add ATA feature control sub-page translation
  ata: libata: set read/write commands CDL index
  scsi: sd: detect support for command duration limits
  scsi: sd: set read/write commands CDL index
  Documentation: sysfs-block-device: document command duration limits

Niklas Cassel (6):
  ata: libata: allow ata_scsi_set_sense() to not set CHECK_CONDITION
  ata: libata: allow ata_eh_request_sense() to not set CHECK_CONDITION
  scsi: core: allow libata to complete successful commands via EH
  scsi: rename and move get_scsi_ml_byte()
  scsi: sd: handle read/write CDL timeout failures
  ata: libata: handle completion of CDL commands using policy 0xD

 Documentation/ABI/testing/sysfs-block-device | 150 ++++
 block/bfq-iosched.c                          |  10 +
 block/blk-core.c                             |   3 +
 block/blk-ioprio.c                           |   3 +
 block/ioprio.c                               |   3 +-
 block/mq-deadline.c                          |   1 +
 drivers/ata/libata-core.c                    | 215 ++++-
 drivers/ata/libata-eh.c                      | 117 ++-
 drivers/ata/libata-sata.c                    | 104 ++-
 drivers/ata/libata-scsi.c                    | 403 +++++++--
 drivers/ata/libata.h                         |   6 +-
 drivers/scsi/Makefile                        |   2 +-
 drivers/scsi/scsi.c                          |  28 +-
 drivers/scsi/scsi_error.c                    |  49 +-
 drivers/scsi/scsi_lib.c                      |  15 +-
 drivers/scsi/scsi_priv.h                     |   6 +
 drivers/scsi/scsi_transport_sas.c            |   2 +-
 drivers/scsi/sd.c                            |  37 +-
 drivers/scsi/sd.h                            |  71 ++
 drivers/scsi/sd_cdl.c                        | 894 +++++++++++++++++++
 drivers/scsi/sr.c                            |   2 +-
 include/linux/ata.h                          |  11 +-
 include/linux/blk_types.h                    |   6 +
 include/linux/ioprio.h                       |   2 +-
 include/linux/libata.h                       |  42 +-
 include/scsi/scsi_cmnd.h                     |   5 +
 include/scsi/scsi_device.h                   |   8 +-
 include/uapi/linux/ioprio.h                  |   7 +
 28 files changed, 2051 insertions(+), 151 deletions(-)
 create mode 100644 drivers/scsi/sd_cdl.c

Comments

Hannes Reinecke Jan. 13, 2023, 7:49 a.m. UTC | #1
On 1/12/23 15:03, Niklas Cassel wrote:
> Current ata_scsi_set_sense() calls scsi_build_sense(), which,
> in addition to setting the sense data, unconditionally sets the
> scsicmd->result to SAM_STAT_CHECK_CONDITION.
> 
> For Command Duration Limits policy 0xD:
> The device shall complete the command without error (SAM_STAT_GOOD)
> with the additional sense code set to DATA CURRENTLY UNAVAILABLE.
> 
> It is perfectly fine to have sense data for a command that returned
> completion without error.
> 
> In order to support for CDL policy 0xD, we have to remove this
> assumption that having sense data means that the command failed
> (SAM_STAT_CHECK_CONDITION).
> 
> Add a new parameter to ata_scsi_set_sense() to allow us to set
> sense data without unconditionally setting SAM_STAT_CHECK_CONDITION.
> This new parameter will be used in a follow-up patch.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   drivers/ata/libata-eh.c   |  3 ++-
>   drivers/ata/libata-sata.c |  4 ++--
>   drivers/ata/libata-scsi.c | 38 ++++++++++++++++++++------------------
>   drivers/ata/libata.h      |  4 ++--
>   4 files changed, 26 insertions(+), 23 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Hannes Reinecke Jan. 13, 2023, 7:57 a.m. UTC | #2
On 1/12/23 15:03, Niklas Cassel wrote:
> In SCSI, we get the sense data as part of the completion, for ATA
> however, we need to fetch the sense data as an extra step. For an
> aborted ATA command the sense data is fetched via libata's
> ->eh_strategy_handler().
> 
> For Command Duration Limits policy 0xD:
> The device shall complete the command without error with the additional
> sense code set to DATA CURRENTLY UNAVAILABLE.
> 
> In order to handle this policy in libata, we intend to send a successful
> command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
> sense data for the good command. This is similar to how we handle an
> aborted ATA command, just that we need to read the Successful NCQ
> Commands log instead of the NCQ Command Error log.
> 
> When we get a SATA completion with successful commands, ATA_SENSE will
> be set, indicating that some commands in the completion have sense data.
> 
> The sense_valid bitmask in the Sense Data for Successful NCQ Commands
> log will inform exactly which commands that had sense data, which might
> be a subset of all the commands that was completed in the same
> completion. (Yet all will have ATA_SENSE set, since the status is per
> completion.)
> 
> The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE"
> sense data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will
> not set the scmd->result to DID_TIME_OUT for these commands. However,
> the successful commands that did not have sense data, must not get their
> result marked as DID_TIME_OUT by SCSI EH.
> 
> Add a new flag SCMD_EH_SUCCESS_CMD, which tells SCSI EH to not mark a
> command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.
> 
> This will be used by libata in a follow-up patch.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   drivers/scsi/scsi_error.c | 3 ++-
>   include/scsi/scsi_cmnd.h  | 5 +++++
>   2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 2aa2c2aee6e7..51aa5c1e31b5 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -2165,7 +2165,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>   			 * scsi_eh_get_sense), scmd->result is already
>   			 * set, do not set DID_TIME_OUT.
>   			 */
> -			if (!scmd->result)
> +			if (!scmd->result &&
> +			    !(scmd->flags & SCMD_EH_SUCCESS_CMD))
>   				scmd->result |= (DID_TIME_OUT << 16);
>   			SCSI_LOG_ERROR_RECOVERY(3,
>   				scmd_printk(KERN_INFO, scmd,
Wouldn't it be better to use '!scmd->result && !scsi_sense_valid(scmd)' 
instead of a new flag?
After all, if we have a valid sense code we _have_ been able to 
communicate with the device. And as we did so it's questionable whether 
it should count as a command time out ...

Cheers,

Hannes
Hannes Reinecke Jan. 13, 2023, 7:59 a.m. UTC | #3
On 1/12/23 15:03, Niklas Cassel wrote:
> From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> 
> Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding
> the subpage argument. Change all the current caller sites to specify
> the subpage 0.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   drivers/scsi/scsi_lib.c           | 4 +++-
>   drivers/scsi/scsi_transport_sas.c | 2 +-
>   drivers/scsi/sd.c                 | 9 ++++-----
>   drivers/scsi/sr.c                 | 2 +-
>   include/scsi/scsi_device.h        | 5 +++--
>   5 files changed, 12 insertions(+), 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Damien Le Moal Jan. 13, 2023, 12:44 p.m. UTC | #4
On 1/13/23 20:55, Hannes Reinecke wrote:
> On 1/12/23 15:03, Niklas Cassel wrote:
>> From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>>
>> Introduce the IOPRIO_CLASS_DL priority class to indicate that IOs should
>> be executed using duration-limits targets. The duration target to apply
>> to a command is indicated using the priority level. Up to 8 levels are
>> supported, with level 0 indiating "no limit".
>>
>> This priority class has effect only if the target device supports the
>> command duration limits feature and this feature is enabled by the user.
>> In BFQ and mq-deadline, all requests with this new priority class are
>> handled using the highest priority class RT and priority level 0.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>> ---
>>   block/bfq-iosched.c         | 10 ++++++++++
>>   block/blk-ioprio.c          |  3 +++
>>   block/ioprio.c              |  3 ++-
>>   block/mq-deadline.c         |  1 +
>>   include/linux/ioprio.h      |  2 +-
>>   include/uapi/linux/ioprio.h |  7 +++++++
>>   6 files changed, 24 insertions(+), 2 deletions(-)
>>
> I wonder.
> 
> _Normally_ a command timeout is only in force once the command is being 
> handed off to the driver. As such it doesn't apply for any scheduling 
> being done before that; most notably the I/O scheduler is not affected 
> by any command timeout.
> 
> And I was under the impression that CDL really only allows the drive to 
> impose a command timeout on its own.
> (Pray correct me if I'm mistaken)

The CDL descriptors to be used for read/write commands are defined by the
user and set on the drive (write log command). By default, the drive does
not have any CDL descriptor set, so no limit == regular behavior (no
timeout aborts). Also keep in mind that CDLs may be of the (1) "best
effort" flavor, or the (2) "abort" flavor. For (2), a command missing its
CDL limit will be aborted by the device (limit set by the user !). But for
(1), the drive will continue executing the command even if the CDL limit
is exceeded, so no timeout error.

> Hence: does CDL really impinge on the I/O scheduler? Or shouldn't we 
> treat CDL just like a 'normal' command timeout, only to be activated 
> when normal command timeout is enabled?

No impact on the IO scheduler, at least for now. But given that CDL is all
about controlling IO latency, we would miss that goal if we have an IO
scheduler that excessively delays a request with a short CDL set...

The main point of this patch is to have CDL commands treated similarly to
high priority commands to avoid such delays. This is also consistant with
the fact that CDL and ATA NCQ priority commands are mutually exclusive
features. They cannot be used together. So there we cannot have a mix of
CDL and NCQ prio commands together to the same drive.
Niklas Cassel Jan. 16, 2023, 12:43 p.m. UTC | #5
On Fri, Jan 13, 2023 at 08:57:37AM +0100, Hannes Reinecke wrote:
> On 1/12/23 15:03, Niklas Cassel wrote:
> > In SCSI, we get the sense data as part of the completion, for ATA
> > however, we need to fetch the sense data as an extra step. For an
> > aborted ATA command the sense data is fetched via libata's
> > ->eh_strategy_handler().
> > 
> > For Command Duration Limits policy 0xD:
> > The device shall complete the command without error with the additional
> > sense code set to DATA CURRENTLY UNAVAILABLE.
> > 
> > In order to handle this policy in libata, we intend to send a successful
> > command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
> > sense data for the good command. This is similar to how we handle an
> > aborted ATA command, just that we need to read the Successful NCQ
> > Commands log instead of the NCQ Command Error log.
> > 
> > When we get a SATA completion with successful commands, ATA_SENSE will
> > be set, indicating that some commands in the completion have sense data.
> > 
> > The sense_valid bitmask in the Sense Data for Successful NCQ Commands
> > log will inform exactly which commands that had sense data, which might
> > be a subset of all the commands that was completed in the same
> > completion. (Yet all will have ATA_SENSE set, since the status is per
> > completion.)
> > 
> > The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE"
> > sense data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will
> > not set the scmd->result to DID_TIME_OUT for these commands. However,
> > the successful commands that did not have sense data, must not get their
> > result marked as DID_TIME_OUT by SCSI EH.
> > 
> > Add a new flag SCMD_EH_SUCCESS_CMD, which tells SCSI EH to not mark a
> > command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.
> > 
> > This will be used by libata in a follow-up patch.
> > 
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >   drivers/scsi/scsi_error.c | 3 ++-
> >   include/scsi/scsi_cmnd.h  | 5 +++++
> >   2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 2aa2c2aee6e7..51aa5c1e31b5 100644
> > --- a/drivers/scsi/scsi_error.c
> > +++ b/drivers/scsi/scsi_error.c
> > @@ -2165,7 +2165,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
> >   			 * scsi_eh_get_sense), scmd->result is already
> >   			 * set, do not set DID_TIME_OUT.
> >   			 */
> > -			if (!scmd->result)
> > +			if (!scmd->result &&
> > +			    !(scmd->flags & SCMD_EH_SUCCESS_CMD))
> >   				scmd->result |= (DID_TIME_OUT << 16);
> >   			SCSI_LOG_ERROR_RECOVERY(3,
> >   				scmd_printk(KERN_INFO, scmd,
> Wouldn't it be better to use '!scmd->result && !scsi_sense_valid(scmd)'
> instead of a new flag?
> After all, if we have a valid sense code we _have_ been able to communicate
> with the device. And as we did so it's questionable whether it should count
> as a command time out ...

Hello Hannes,

Thanks a lot for helping out reviewing this series!

Unfortunately, your suggestion won't work.


Let me explain:

When you get a FIS, the ACT register will have a bit set for each
command that finished, however, all the commands will share a single
STATUS value (since there is just a shared STATUS field in the FIS).

So let's say that tags 0-3 got finished (i.e. bits 0-3 are set in the
ACT field) and the STATUS field has the "Sense Data Available" bit set.

This just tells us that at least one of tags 0-3 has sense data.


In order to know which of these tags that actually has sense data,
we need to read the "Sense Data for Successful NCQ Commands log",
which contains a sense_valid bitmask (which contains one bit for
each of the 32 tags).

So reading the "Sense Data for Successful NCQ Commands log" might
tell us that just tag 0-1 have sense data.

So, libata calls ata_qc_schedule_eh() on tags 0-3, wait until SCSI calls
libata .eh_strategy_handler(). libata .eh_strategy_handler() will read the
"Sense Data for Successful NCQ Commands log", which will see that there is
sense data for tags 0-1, and will add sense data for those commands, and
call scsi_check_sense() for tags 0-1.

ata_eh_finish() will finally be called, to determine what to do with the
commands that belonged to EH.

The code looks like this:
if (qc->flags & ATA_QCFLAG_SENSE_VALID ||
    qc->flags & ATA_QCFLAG_EH_SUCCESS_CMD) {
	ata_eh_qc_complete(qc);
}

So it will call complete for all 4 tags, regardless is they had sense data
or not.


scsi_eh_flush_done_q() will soon be called, and since ata_eh_qc_complete()
sets scmd->retries == scmd->allowed, none of the four commands will be retired.

if (!scmd->result &&
    !(scmd->flags & SCMD_EH_SUCCESS_CMD))
	scmd->result |= (DID_TIME_OUT << 16);

The 2 commands with sense data will not get DID_TIMEOUT,
because scmd->result has the SCSI ML byte set
(which is set by scsi_check_sense()).

The 2 commands without sense data will have scmd->result == 0,
so they will get DID_TIME_OUT set if we don't have the extra
!(scmd->flags & SCMD_EH_SUCCESS_CMD)) condition.


SCSI could add an additional check for:

if (!scmd->result && !scsi_sense_valid(scmd) &&
    !(scmd->flags & SCMD_EH_SUCCESS_CMD))
	scmd->result |= (DID_TIME_OUT << 16);

so that a command with does have sense data, but scsi_check_sense()
did not set any SCSI ML byte, does not get DID_TIME_OUT set.

However, for CDL policy 0xD, which this patch cares about,
we would still need the "&& !(scmd->flags & SCMD_EH_SUCCESS_CMD))",
so at least from a CDL perspective, I don't see how any benefit of
also adding a check for "&& !scsi_sense_valid(scmd)".


Kind regards,
Niklas
Hannes Reinecke Jan. 17, 2023, 11:44 a.m. UTC | #6
On 1/16/23 13:43, Niklas Cassel wrote:
> On Fri, Jan 13, 2023 at 08:57:37AM +0100, Hannes Reinecke wrote:
>> On 1/12/23 15:03, Niklas Cassel wrote:
>>> In SCSI, we get the sense data as part of the completion, for ATA
>>> however, we need to fetch the sense data as an extra step. For an
>>> aborted ATA command the sense data is fetched via libata's
>>> ->eh_strategy_handler().
>>>
>>> For Command Duration Limits policy 0xD:
>>> The device shall complete the command without error with the additional
>>> sense code set to DATA CURRENTLY UNAVAILABLE.
>>>
>>> In order to handle this policy in libata, we intend to send a successful
>>> command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
>>> sense data for the good command. This is similar to how we handle an
>>> aborted ATA command, just that we need to read the Successful NCQ
>>> Commands log instead of the NCQ Command Error log.
>>>
>>> When we get a SATA completion with successful commands, ATA_SENSE will
>>> be set, indicating that some commands in the completion have sense data.
>>>
>>> The sense_valid bitmask in the Sense Data for Successful NCQ Commands
>>> log will inform exactly which commands that had sense data, which might
>>> be a subset of all the commands that was completed in the same
>>> completion. (Yet all will have ATA_SENSE set, since the status is per
>>> completion.)
>>>
>>> The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE"
>>> sense data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will
>>> not set the scmd->result to DID_TIME_OUT for these commands. However,
>>> the successful commands that did not have sense data, must not get their
>>> result marked as DID_TIME_OUT by SCSI EH.
>>>
>>> Add a new flag SCMD_EH_SUCCESS_CMD, which tells SCSI EH to not mark a
>>> command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.
>>>
>>> This will be used by libata in a follow-up patch.
>>>
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>> ---
>>>    drivers/scsi/scsi_error.c | 3 ++-
>>>    include/scsi/scsi_cmnd.h  | 5 +++++
>>>    2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>> index 2aa2c2aee6e7..51aa5c1e31b5 100644
>>> --- a/drivers/scsi/scsi_error.c
>>> +++ b/drivers/scsi/scsi_error.c
>>> @@ -2165,7 +2165,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
>>>    			 * scsi_eh_get_sense), scmd->result is already
>>>    			 * set, do not set DID_TIME_OUT.
>>>    			 */
>>> -			if (!scmd->result)
>>> +			if (!scmd->result &&
>>> +			    !(scmd->flags & SCMD_EH_SUCCESS_CMD))
>>>    				scmd->result |= (DID_TIME_OUT << 16);
>>>    			SCSI_LOG_ERROR_RECOVERY(3,
>>>    				scmd_printk(KERN_INFO, scmd,
>> Wouldn't it be better to use '!scmd->result && !scsi_sense_valid(scmd)'
>> instead of a new flag?
>> After all, if we have a valid sense code we _have_ been able to communicate
>> with the device. And as we did so it's questionable whether it should count
>> as a command time out ...
> 
> Hello Hannes,
> 
> Thanks a lot for helping out reviewing this series!
> 
> Unfortunately, your suggestion won't work.
> 
> 
> Let me explain:
> 
> When you get a FIS, the ACT register will have a bit set for each
> command that finished, however, all the commands will share a single
> STATUS value (since there is just a shared STATUS field in the FIS).
> 
> So let's say that tags 0-3 got finished (i.e. bits 0-3 are set in the
> ACT field) and the STATUS field has the "Sense Data Available" bit set.
> 
> This just tells us that at least one of tags 0-3 has sense data.
> 
> 
> In order to know which of these tags that actually has sense data,
> we need to read the "Sense Data for Successful NCQ Commands log",
> which contains a sense_valid bitmask (which contains one bit for
> each of the 32 tags).
> 
> So reading the "Sense Data for Successful NCQ Commands log" might
> tell us that just tag 0-1 have sense data.
> 
> So, libata calls ata_qc_schedule_eh() on tags 0-3, wait until SCSI calls
> libata .eh_strategy_handler(). libata .eh_strategy_handler() will read the
> "Sense Data for Successful NCQ Commands log", which will see that there is
> sense data for tags 0-1, and will add sense data for those commands, and
> call scsi_check_sense() for tags 0-1.
> 
> ata_eh_finish() will finally be called, to determine what to do with the
> commands that belonged to EH.
> 
> The code looks like this:
> if (qc->flags & ATA_QCFLAG_SENSE_VALID ||
>      qc->flags & ATA_QCFLAG_EH_SUCCESS_CMD) {
> 	ata_eh_qc_complete(qc);
> }
> 
> So it will call complete for all 4 tags, regardless is they had sense data
> or not.
> 
> 
> scsi_eh_flush_done_q() will soon be called, and since ata_eh_qc_complete()
> sets scmd->retries == scmd->allowed, none of the four commands will be retired.
> 
> if (!scmd->result &&
>      !(scmd->flags & SCMD_EH_SUCCESS_CMD))
> 	scmd->result |= (DID_TIME_OUT << 16);
> 
> The 2 commands with sense data will not get DID_TIMEOUT,
> because scmd->result has the SCSI ML byte set
> (which is set by scsi_check_sense()).
> 
> The 2 commands without sense data will have scmd->result == 0,
> so they will get DID_TIME_OUT set if we don't have the extra
> !(scmd->flags & SCMD_EH_SUCCESS_CMD)) condition.
> 
> 
> SCSI could add an additional check for:
> 
> if (!scmd->result && !scsi_sense_valid(scmd) &&
>      !(scmd->flags & SCMD_EH_SUCCESS_CMD))
> 	scmd->result |= (DID_TIME_OUT << 16);
> 
> so that a command with does have sense data, but scsi_check_sense()
> did not set any SCSI ML byte, does not get DID_TIME_OUT set.
> 
> However, for CDL policy 0xD, which this patch cares about,
> we would still need the "&& !(scmd->flags & SCMD_EH_SUCCESS_CMD))",
> so at least from a CDL perspective, I don't see how any benefit of
> also adding a check for "&& !scsi_sense_valid(scmd)".
> 

Right, I see.
Thanks for the explanation.

You can add:

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes