[v3,00/18] Add Command Duration Limits support

Message ID 20230124190308.127318-1-niklas.cassel@wdc.com

Niklas Cassel Jan. 24, 2023, 7:02 p.m. UTC
Hello,

This series adds support for Command Duration Limits.
The series is based on linux-next tag: next-20230124
The series can also be found in git:
https://github.com/floatious/linux/commits/cdl-v3


=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).

CDL defines Duration Limits Descriptors (DLDs): 7 DLDs for read commands
and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.

A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.

The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.
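
As an illustration only, the index rule above can be sketched in a few lines
of Python. The function name and the list layout are hypothetical and are not
part of this series; the sketch just assumes the 7 descriptors for one
direction (read or write) are held in a list.

```python
DLD_INDEX_BITS = 3   # width of the DLD index field in the command
NUM_DLDS = 7         # descriptors per direction (read or write)


def select_dld(descriptors, dld_index):
    """Return the descriptor selected by dld_index, or None for index 0.

    Hypothetical helper: 'descriptors' is assumed to be a list holding
    DLD 1-7 at list positions 0-6.
    """
    if not 0 <= dld_index < (1 << DLD_INDEX_BITS):
        raise ValueError("DLD index must fit in 3 bits (0-7)")
    if len(descriptors) != NUM_DLDS:
        raise ValueError("expected exactly 7 descriptors")
    if dld_index == 0:
        return None  # index 0: no descriptor, so no limit
    return descriptors[dld_index - 1]
```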

A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort): the command is completed with a command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable): the command is completed without
error (ATA) or with status GOOD (SCSI), with sense data indicating that the
command timed out. Note that no data will have been transferred to/from
the device when the command timed out, even though the command returned
success.

Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.
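
The two policies can be summarized with a small Python model. This is only a
sketch of the behavior described above, not kernel code; the function name and
the returned tuple are made up for illustration.

```python
import errno

POLICY_COMPLETE_UNAVAILABLE = 0xD
POLICY_ABORT = 0xF


def cdl_timeout_outcome(policy):
    """Model what happens when a duration limit is exceeded.

    Returns (device_status, errno_seen_by_user_space). The status strings
    are descriptive labels only, not values taken from any header.
    """
    if policy == POLICY_ABORT:
        device_status = "command aborted (ATA) / CHECK CONDITION (SCSI)"
    elif policy == POLICY_COMPLETE_UNAVAILABLE:
        device_status = "no error (ATA) / GOOD (SCSI)"
    else:
        raise ValueError("policy not covered by this sketch: %#x" % policy)
    # In both cases the sense data reports the timeout, and the I/O
    # results in -ETIME to user-space.
    return device_status, errno.ETIME
```

The point of the model is that the device-level status differs between the two
policies, while the error seen by user-space does not.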

The DLDs are defined in the CDL log page(s) and are readable and writable.
For convenience, the kernel provides a sysfs interface for reading the
descriptors. If a user really wants to change the descriptors, they can do
so using a user-space application that sends passthrough commands.
One such application is cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools


==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be enabled using:
echo 1 > /sys/block/$bdev/device/duration_limits/enable
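
For scripted setups, the same write can be done from Python. This is a minimal
sketch: the helper name and the sysfs_root parameter (which exists only so the
helper can be exercised against a scratch directory) are assumptions, and only
the attribute path comes from this series.

```python
from pathlib import Path


def enable_cdl(bdev, sysfs_root="/sys"):
    """Write '1' to the duration_limits/enable attribute of a block device.

    Equivalent to: echo 1 > /sys/block/$bdev/device/duration_limits/enable
    Writing to the real sysfs attribute normally requires root.
    """
    path = Path(sysfs_root) / "block" / bdev / "device" / "duration_limits" / "enable"
    path.write_text("1\n")  # echo appends a newline, so do the same here
    return path
```

For example, enable_cdl("sda") targets
/sys/block/sda/device/duration_limits/enable.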

In order for user-space to be able to select a specific DLD for an I/O,
we have decided to reuse the I/O priority API.

This means that we introduce a new priority class (IOPRIO_CLASS_DL).
When using this class, the existing I/O priority levels (0-7) directly
indicate the DLD index to use.

By reusing the I/O priority API, the user can define the DLD to use either
per AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per thread
(ioprio_set()).
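
To make the encoding concrete, here is a sketch of how such an ioprio value
could be built. It assumes the existing I/O priority layout (class in the bits
above IOPRIO_CLASS_SHIFT, which is 13, level in the low bits) and assumes that
IOPRIO_CLASS_DL has the value 4, as the fio command line in the Testing
section suggests. The helper names are hypothetical.

```python
IOPRIO_CLASS_SHIFT = 13                           # existing ioprio layout
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1  # low bits hold the level
IOPRIO_CLASS_DL = 4                               # assumption, see lead-in


def ioprio_value_for_dld(dld_index):
    """Build an ioprio value selecting the given DLD index (0-7)."""
    if not 0 <= dld_index <= 7:
        raise ValueError("DLD index must be in the range 0-7")
    return (IOPRIO_CLASS_DL << IOPRIO_CLASS_SHIFT) | dld_index


def ioprio_class(value):
    """Extract the priority class from an ioprio value."""
    return value >> IOPRIO_CLASS_SHIFT


def ioprio_level(value):
    """Extract the priority level (here: the DLD index)."""
    return value & IOPRIO_PRIO_MASK
```

The resulting value is what would be placed in sqe->ioprio or
iocb->aio_reqprio, or passed to ioprio_set() for per-thread use.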


=======
Testing
=======
With the following fio patch that simply adds the new priority class:
https://github.com/westerndigitalcorporation/cdl-tools/blob/main/patches/fio-3.29-and-newer/0001-os-linux-Add-IORPIO_CLASS_DL-definition.patch

CDL can be tested using fio, e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_class=4 --cmdprio=DLD_index

A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.
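
A test program may want to tell a CDL timeout apart from other failures. The
sketch below assumes that a plain I/O timeout (BLK_STS_TIMEOUT) surfaces as
ETIMEDOUT, as with the existing block layer mapping, while a CDL timeout
surfaces as ETIME. The function name and the returned labels are hypothetical.

```python
import errno


def classify_io_error(exc):
    """Classify an OSError raised by a failed read or write."""
    if exc.errno == errno.ETIME:
        return "duration-limit"  # device responded: limit was exceeded
    if exc.errno == errno.ETIMEDOUT:
        return "timeout"         # device did not respond in time
    return "other"
```

A caller would typically wrap the read or write in try/except OSError and pass
the exception to this helper.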

We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools/blob/main/README.md#testing-a-system-command-duration-limits-support


We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)


===================
Further information
===================
For further information about CDL, see Damien's slides:

Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf

Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing


================
Changes since V2
================
-Reordered the patches by subsystem, so that the different subsystem maintainers
 can pick up a single range of patches to their respective tree.
-Dropped extern keyword when modifying SCSI function declarations. (Christoph)
-Renamed flag SCMD_EH_SUCCESS_CMD to SCMD_FORCE_EH_SUCCESS. (Christoph)
-Improved commit message for patch "block: introduce duration-limits priority
 class". (Christoph)
-Added a new patch (10/18) that removes unnecessary !cmd checks. (Christoph)
-Modified ata_eh_request_sense(), instead of taking an extra parameter,
 let the caller set scsicmd->result. (Christoph)
-Dropped the patch that changed ata_scsi_set_sense(), let CDL specific code
 call scsi_build_sense_buffer() directly instead. (Christoph)
-Picked up Reviewed-by tags from Hannes and Christoph.


For older change logs, see previous patch series versions:
https://lore.kernel.org/linux-scsi/20230112140412.667308-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20221208105947.2399894-1-niklas.cassel@wdc.com/


Kind regards,
Niklas & Damien

Damien Le Moal (12):
  block: introduce duration-limits priority class
  block: introduce BLK_STS_DURATION_LIMIT
  scsi: support retrieving sub-pages of mode pages
  scsi: support service action in scsi_report_opcode()
  scsi: sd: detect support for command duration limits
  scsi: sd: set read/write commands CDL index
  ata: libata: detect support for command duration limits
  ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in()
  ata: libata-scsi: add support for CDL pages mode sense
  ata: libata: add ATA feature control sub-page translation
  ata: libata: set read/write commands CDL index
  Documentation: sysfs-block-device: document command duration limits

Niklas Cassel (6):
  scsi: core: allow libata to complete successful commands via EH
  scsi: rename and move get_scsi_ml_byte()
  scsi: sd: handle read/write CDL timeout failures
  ata: libata-scsi: remove unnecessary !cmd checks
  ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION
  ata: libata: handle completion of CDL commands using policy 0xD

 Documentation/ABI/testing/sysfs-block-device | 150 ++++
 block/bfq-iosched.c                          |  10 +
 block/blk-core.c                             |   3 +
 block/blk-ioprio.c                           |   3 +
 block/ioprio.c                               |   3 +-
 block/mq-deadline.c                          |   1 +
 drivers/ata/libata-core.c                    | 215 ++++-
 drivers/ata/libata-eh.c                      | 130 ++-
 drivers/ata/libata-sata.c                    | 103 ++-
 drivers/ata/libata-scsi.c                    | 371 ++++++--
 drivers/ata/libata.h                         |   2 +-
 drivers/scsi/Makefile                        |   2 +-
 drivers/scsi/scsi.c                          |  28 +-
 drivers/scsi/scsi_error.c                    |  49 +-
 drivers/scsi/scsi_lib.c                      |  15 +-
 drivers/scsi/scsi_priv.h                     |   6 +
 drivers/scsi/scsi_transport_sas.c            |   2 +-
 drivers/scsi/sd.c                            |  37 +-
 drivers/scsi/sd.h                            |  71 ++
 drivers/scsi/sd_cdl.c                        | 894 +++++++++++++++++++
 drivers/scsi/sr.c                            |   2 +-
 include/linux/ata.h                          |  11 +-
 include/linux/blk_types.h                    |   6 +
 include/linux/ioprio.h                       |   2 +-
 include/linux/libata.h                       |  42 +-
 include/scsi/scsi_cmnd.h                     |   5 +
 include/scsi/scsi_device.h                   |  13 +-
 include/uapi/linux/ioprio.h                  |   7 +
 28 files changed, 2039 insertions(+), 144 deletions(-)
 create mode 100644 drivers/scsi/sd_cdl.c

Comments

Bart Van Assche Jan. 24, 2023, 8:32 p.m. UTC | #1
On 1/24/23 11:59, Keith Busch wrote:
> On Tue, Jan 24, 2023 at 11:29:10AM -0800, Bart Van Assche wrote:
>> On 1/24/23 11:02, Niklas Cassel wrote:
>>> Introduce the new block IO status BLK_STS_DURATION_LIMIT for LLDDs to
>>> report command that failed due to a command duration limit being
>>> exceeded. This new status is mapped to the ETIME error code to allow
>>> users to differentiate "soft" duration limit failures from other more
>>> serious hardware related errors.
>>
>> What makes exceeding the duration limit different from an I/O timeout
>> (BLK_STS_TIMEOUT)? Why is it important to tell the difference between an I/O
>> timeout and exceeding the command duration limit?
> 
> BLK_STS_TIMEOUT should be used if the target device doesn't provide any
> response to the command. The DURATION_LIMIT status is used when the device
> completes a command with that status.

Hi Keith,

From SPC-6: "The MAX ACTIVE TIME field specifies an upper limit on the
time that elapses from the time at which the device server initiates 
actions to access, transfer, or act upon the specified data until the 
time the device server returns status for the command."

My interpretation of the above text is that the SCSI command duration 
limit specifies a hard limit, the same type of limit reported by the 
status code BLK_STS_TIMEOUT. It is not clear to me from the patch 
description why a new status code is needed for reporting that the 
command duration limit has been exceeded.

Thanks,

Bart.

Damien Le Moal Jan. 24, 2023, 9:34 p.m. UTC | #2
On 1/25/23 04:29, Bart Van Assche wrote:
> On 1/24/23 11:02, Niklas Cassel wrote:
>> Introduce the new block IO status BLK_STS_DURATION_LIMIT for LLDDs to
>> report command that failed due to a command duration limit being
>> exceeded. This new status is mapped to the ETIME error code to allow
>> users to differentiate "soft" duration limit failures from other more
>> serious hardware related errors.
> 
> What makes exceeding the duration limit different from an I/O timeout 
> (BLK_STS_TIMEOUT)? Why is it important to tell the difference between an 
> I/O timeout and exceeding the command duration limit?

If the device fails to execute a command in time, it will either
1) Fail the command with an error and sense data set (policy 0xF for the
time limit)
2) Return a success status for the command with sense data set telling the
host "data not available". This (weird) case is in essence equivalent to
(1) but was defined to avoid the penalty of a queue abort with SATA drives
(NCQ command errors always result in all on-going commands being aborted).

In both cases, the drive is still responsive and operational.
BLK_STS_TIMEOUT is used if a command timed out, indicating that the drive
is *not* responding. BLK_STS_TIMEOUT thus generally means "something is
wrong" (not always, but most of the time).

So we certainly do not want to overload BLK_STS_TIMEOUT to indicate failed
CDL IOs, as that would not allow the user to distinguish them from more
serious hardware issues.

Damien Le Moal Jan. 24, 2023, 9:39 p.m. UTC | #3
On 1/25/23 05:32, Bart Van Assche wrote:
> On 1/24/23 11:59, Keith Busch wrote:
>> On Tue, Jan 24, 2023 at 11:29:10AM -0800, Bart Van Assche wrote:
>>> On 1/24/23 11:02, Niklas Cassel wrote:
>>>> Introduce the new block IO status BLK_STS_DURATION_LIMIT for LLDDs to
>>>> report command that failed due to a command duration limit being
>>>> exceeded. This new status is mapped to the ETIME error code to allow
>>>> users to differentiate "soft" duration limit failures from other more
>>>> serious hardware related errors.
>>>
>>> What makes exceeding the duration limit different from an I/O timeout
>>> (BLK_STS_TIMEOUT)? Why is it important to tell the difference between an I/O
>>> timeout and exceeding the command duration limit?
>>
>> BLK_STS_TIMEOUT should be used if the target device doesn't provide any
>> response to the command. The DURATION_LIMIT status is used when the device
>> completes a command with that status.
> 
> Hi Keith,
> 
>  From SPC-6: "The MAX ACTIVE TIME field specifies an upper limit on the 
> time that elapses from the time at which the device server initiates 
> actions to access, transfer, or act upon the specified data until the 
> time the device server returns status for the command."
> 
> My interpretation of the above text is that the SCSI command duration 
> limit specifies a hard limit, the same type of limit reported by the 
> status code BLK_STS_TIMEOUT. It is not clear to me from the patch 
> description why a new status code is needed for reporting that the 
> command duration limit has been exceeded.

As explained, this allows differentiating the "drive gave a response"
(BLK_STS_DURATION_LIMIT) from the "drive is not responding" case with
BLK_STS_TIMEOUT. We took care of mapping BLK_STS_DURATION_LIMIT to ETIME
(timer expired) for user space too, to not overload ETIMEDOUT used with
BLK_STS_TIMEOUT.

We can certainly improve the commit message to describe all of this in
more detail.

> 
> Thanks,
> 
> Bart.