mbox series

[v7,00/19] Add Command Duration Limits support

Message ID 20230511011356.227789-1-nks@flawful.org
Headers show
Series Add Command Duration Limits support | expand

Message

Niklas Cassel May 11, 2023, 1:13 a.m. UTC
From: Niklas Cassel <niklas.cassel@wdc.com>

Hello,

This series adds support for Command Duration Limits.
The series is based on linux tag: v6.4-rc1
The series can also be found in git:
https://github.com/floatious/linux/commits/cdl-v7


=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).

CDL defines Duration Limits Descriptors (DLD).
7 DLDs for read commands and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.

A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.

The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.

A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort), command will be completed with command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable), command will be completed without
error (ATA) or status GOOD (SCSI), with sense data indicating that the
command timed out. Note that the command will not have transferred any
data to/from the device when the command timed out, even though the
command returned success.

Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.

The DLDs are defined in the CDL log page(s) and are readable and writable.
Reading and writing the CDL DLDs are outside the scope of the kernel.
If a user wants to read or write the descriptors, they can do so using a
user-space application that sends passthrough commands, such as cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools


================================
The introduction of ioprio hints
================================
What the kernel does provide, is a method to let I/O use one of the CDL DLDs
defined in the device. Note that the kernel will simply forward the DLD index
to the device, so the kernel currently does not know, nor does it need to know,
how the DLDs are defined inside the device.

The way that the CDL DLD index is supplied to the kernel is by introducing a
new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition.

Currently, only 6 out of the 16 ioprio bits are in use, the remaining 10 bits
are unused, and are currently explicitly disallowed to be set by the kernel.

For now, we only add ioprio hints representing CDL DLD index 1-7. Additional
ioprio hints for other QoS features could be defined in the future.

A theoretical future work could be to make an I/O scheduler aware of these
hints. E.g. for CDL, an I/O scheduler could make use of the duration limit
in each descriptor, and take that information into account while scheduling
commands. Right now, the ioprio hints will be ignored by the I/O schedulers.


==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be explicitly enabled using:
echo 1 > /sys/block/$bdev/device/cdl_enable

Since the ioprio hints are supplied through the existing I/O priority API,
it should be simple for an application to make use of the ioprio hints.

It simply has to reuse one of the new macros defined in
include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(),
and supply one of the new hints defined in include/uapi/linux/ioprio.h:
IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should
use the corresponding CDL DLD index 1-7.

By reusing the I/O priority API, the user can both define a DLD to use per
AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread
(ioprio_set()).


=======
Testing
=======
With the following fio patches:
https://github.com/floatious/fio/commits/cdl

fio adds support for ioprio hints, such that CDL can be tested using e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index

A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.

We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support


We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)


===================
Further information
===================
For further information about CDL, see Damien's slides:

Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf

Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing


================
Changes since V6
================
-Rebased series on v6.4-rc1.
-Picked up Reviewed-by tags from Hannes (Thank you Hannes!)
-Picked up Reviewed-by tag from Christoph (Thank you Christoph!)
-Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes.


For older change logs, see previous patch series versions:
https://lore.kernel.org/linux-scsi/20230406113252.41211-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/20230404182428.715140-1-nks@flawful.org/
https://lore.kernel.org/linux-scsi/20230309215516.3800571-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20230124190308.127318-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20230112140412.667308-1-niklas.cassel@wdc.com/
https://lore.kernel.org/linux-scsi/20221208105947.2399894-1-niklas.cassel@wdc.com/


Kind regards,
Niklas & Damien

Damien Le Moal (13):
  ioprio: cleanup interface definition
  block: introduce ioprio hints
  block: introduce BLK_STS_DURATION_LIMIT
  scsi: support retrieving sub-pages of mode pages
  scsi: support service action in scsi_report_opcode()
  scsi: detect support for command duration limits
  scsi: allow enabling and disabling command duration limits
  scsi: sd: set read/write commands CDL index
  ata: libata: detect support for command duration limits
  ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in()
  ata: libata-scsi: add support for CDL pages mode sense
  ata: libata: add ATA feature control sub-page translation
  ata: libata: set read/write commands CDL index

Niklas Cassel (6):
  scsi: core: allow libata to complete successful commands via EH
  scsi: rename and move get_scsi_ml_byte()
  scsi: sd: handle read/write CDL timeout failures
  ata: libata-scsi: remove unnecessary !cmd checks
  ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION
  ata: libata: handle completion of CDL commands using policy 0xD

 Documentation/ABI/testing/sysfs-block-device |  22 ++
 block/bfq-iosched.c                          |   8 +-
 block/blk-core.c                             |   3 +
 block/ioprio.c                               |   6 +-
 drivers/ata/libata-core.c                    | 204 +++++++++-
 drivers/ata/libata-eh.c                      | 130 ++++++-
 drivers/ata/libata-sata.c                    | 103 ++++-
 drivers/ata/libata-scsi.c                    | 384 +++++++++++++++----
 drivers/ata/libata.h                         |   2 +-
 drivers/scsi/scsi.c                          | 171 ++++++++-
 drivers/scsi/scsi_error.c                    |  48 ++-
 drivers/scsi/scsi_lib.c                      |  15 +-
 drivers/scsi/scsi_priv.h                     |   6 +
 drivers/scsi/scsi_scan.c                     |   3 +
 drivers/scsi/scsi_sysfs.c                    |  30 ++
 drivers/scsi/scsi_transport_sas.c            |   2 +-
 drivers/scsi/sd.c                            |  59 ++-
 drivers/scsi/sr.c                            |   2 +-
 include/linux/ata.h                          |  11 +-
 include/linux/blk_types.h                    |   6 +
 include/linux/libata.h                       |  42 +-
 include/scsi/scsi_cmnd.h                     |   5 +
 include/scsi/scsi_device.h                   |  18 +-
 include/uapi/linux/ioprio.h                  |  68 +++-
 24 files changed, 1198 insertions(+), 150 deletions(-)

Comments

Damien Le Moal May 11, 2023, 12:34 p.m. UTC | #1
On 2023/05/11 13:22, Douglas Gilbert wrote:
> On 2023-05-10 21:13, Niklas Cassel wrote:
>> From: Niklas Cassel <niklas.cassel@wdc.com>
>>
>> Hello,
>>
>> This series adds support for Command Duration Limits.
>> The series is based on linux tag: v6.4-rc1
>> The series can also be found in git:
>> https://github.com/floatious/linux/commits/cdl-v7
>>
>>
>> =================
>> CDL in ATA / SCSI
>> =================
>> Command Duration Limits is defined in:
>> T13 ATA Command Set - 5 (ACS-5) and
>> T10 SCSI Primary Commands - 6 (SPC-6) respectively
>> (a simpler version of CDL is defined in T10 SPC-5).
>>
>> CDL defines Duration Limits Descriptors (DLD).
>> 7 DLDs for read commands and 7 DLDs for write commands.
>> Simply put, a DLD contains a limit and a policy.
>>
>> A command can specify that a certain limit should be applied by setting
>> the DLD index field (3 bits, so 0-7) in the command itself.
>>
>> The DLD index points to one of the 7 DLDs.
>> DLD index 0 means no descriptor, so no limit.
>> DLD index 1-7 means DLD 1-7.
>>
>> A DLD can have a few different policies, but the two major ones are:
>> -Policy 0xF (abort), command will be completed with command aborted error
>> (ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
>> the command timed out.
>> -Policy 0xD (complete-unavailable), command will be completed without
>> error (ATA) or status GOOD (SCSI), with sense data indicating that the
>> command timed out. Note that the command will not have transferred any
>> data to/from the device when the command timed out, even though the
>> command returned success.
>>
>> Regardless of the CDL policy, in case of a CDL timeout, the I/O will
>> result in a -ETIME error to user-space.
>>
>> The DLDs are defined in the CDL log page(s) and are readable and writable.
> 
> The above sentence may be correct for ATA disks, but for SCSI disks the CDL log
> page is for statistics. Those statistics counters can be reset (to zero) but
> are not writable in the normal sense. To "define" (or change from the default
> values) CDL settings in SCSI, the CDL _mode_ pages are provided. Confusingly

Yes, indeed, for SCSI, CDL descriptors are defined in mode pages.

> there are four such mode pages (A, B, T2A and T2B). Which one(s) do you
> support/use?

All of them (see patch 8). For the ATA devices, the mode sense translation can
only report support for T2A/T2B (see patch 17).

Note that the kernel does not handle directly the CDL descriptor mode pages.
These need to be set by the user using passthrough tools such as sg3utils or
cdl-tools.
Damien Le Moal May 15, 2023, 10:58 p.m. UTC | #2
On 2023/05/11 10:13, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>

[...]

> Damien Le Moal (13):
>   ioprio: cleanup interface definition
>   block: introduce ioprio hints
>   block: introduce BLK_STS_DURATION_LIMIT

Jens,

We really need your review / Ack (if appropriate) of the first 3 patches of this
series. Can you please do that as soon as possible so that Martin can queue the
series through the scsi tree ?

Thank you.

>   scsi: support retrieving sub-pages of mode pages
>   scsi: support service action in scsi_report_opcode()
>   scsi: detect support for command duration limits
>   scsi: allow enabling and disabling command duration limits
>   scsi: sd: set read/write commands CDL index
>   ata: libata: detect support for command duration limits
>   ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in()
>   ata: libata-scsi: add support for CDL pages mode sense
>   ata: libata: add ATA feature control sub-page translation
>   ata: libata: set read/write commands CDL index
> 
> Niklas Cassel (6):
>   scsi: core: allow libata to complete successful commands via EH
>   scsi: rename and move get_scsi_ml_byte()
>   scsi: sd: handle read/write CDL timeout failures
>   ata: libata-scsi: remove unnecessary !cmd checks
>   ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION
>   ata: libata: handle completion of CDL commands using policy 0xD
Martin K. Petersen May 22, 2023, 9:41 p.m. UTC | #3
Niklas,

> This series adds support for Command Duration Limits.

Applied to 6.5/scsi-staging, thanks!
Damien Le Moal May 22, 2023, 11:12 p.m. UTC | #4
On 5/23/23 06:41, Martin K. Petersen wrote:
> 
> Niklas,
> 
>> This series adds support for Command Duration Limits.
> 
> Applied to 6.5/scsi-staging, thanks!
> 

Thanks Martin !
Damien Le Moal May 23, 2023, 10:08 a.m. UTC | #5
On 5/23/23 18:56, Niklas Cassel wrote:
> On Mon, May 22, 2023 at 05:41:19PM -0400, Martin K. Petersen wrote:
>>
>> Niklas,
>>
>>> This series adds support for Command Duration Limits.
>>
>> Applied to 6.5/scsi-staging, thanks!
> 
> Thank you Martin!
> 
> 
> Damien, Martin,
> considering that the libata changes depend on the scsi changes,
> and considering that further libata EH cleanups are planned for
> 6.5 now when the IPR driver is gone, I think that the best move
> is to follow the advice of:
> https://docs.kernel.org/maintainer/rebasing-and-merging.html#merging-from-sibling-or-upstream-trees

Hannes cleanup of EH will create a conflict with the scsi tree but can go in
through the ata tree independently so I was not planning on doing a rebase,
especially not on the scsi tree. I will notify Stephen about the conflict send
him a resolution to apply and carry for linux-next. When the 6.5 merge window
open, I will wait for the James to send the scsi PR and send my PR to Linus
after that with the conflict resolution, as usual.

So far, I do not see any big issue with that.

> 
> Specifically:
> "Merging another subsystem tree to resolve a dependency risks bringing in
> other bugs and should almost never be done. If that subsystem tree fails
> to be pulled upstream, whatever problems it had will block the merging of
> your tree as well.
> Preferable alternatives include agreeing with the maintainer to carry both
> sets of changes in one of the trees or creating a topic branch dedicated
> to the prerequisite commits that can be merged into both trees."
> 
> 
> 
> Martin created a topic branch/SHA1 for the CDL series:
> 18bd7718b5c489b3161b6c2ab4685d57c1e2da3b
> in order for him to be able to have a nice merge commit:
> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=6.5/scsi-staging&id=8b60e2189fcd8b10b592608256eb97aebfcff147
> 
> So, I suggest that, after this has been applied to
> 6.5/scsi-queue (right now it is only in 6.5/scsi-staging),
> that Damien merges the same topic branch/SHA1:
> 18bd7718b5c489b3161b6c2ab4685d57c1e2da3b
> to libata/for-6.5.
> 
> Perhaps the fix:
> https://lore.kernel.org/linux-scsi/20230523074701.293502-1-dlemoal@kernel.org/T/#u
> could be applied on top of that SHA1, or folded in,
> the important thing is that libata merges the exact same SHA1
> for the CDL series as scsi-queue.
> (Especially since I noticed that Martin did some minor changes to
> the ioprio hints patch, namely changed IO to I/O in the comments
> describing the macros, so Damien can't just take the patches from
> the list as is, as that would create conflicts for Linus when he
> merges the two different subsystem trees.)
> 
> 
> Kind regards,
> Niklas
Niklas Cassel May 23, 2023, 10:35 a.m. UTC | #6
On Tue, May 23, 2023 at 07:08:27PM +0900, Damien Le Moal wrote:
> On 5/23/23 18:56, Niklas Cassel wrote:
> > On Mon, May 22, 2023 at 05:41:19PM -0400, Martin K. Petersen wrote:
> >>
> >> Niklas,
> >>
> >>> This series adds support for Command Duration Limits.
> >>
> >> Applied to 6.5/scsi-staging, thanks!
> > 
> > Thank you Martin!
> > 
> > 
> > Damien, Martin,
> > considering that the libata changes depend on the scsi changes,
> > and considering that further libata EH cleanups are planned for
> > 6.5 now when the IPR driver is gone, I think that the best move
> > is to follow the advice of:
> > https://docs.kernel.org/maintainer/rebasing-and-merging.html#merging-from-sibling-or-upstream-trees
> 
> Hannes cleanup of EH will create a conflict with the scsi tree but can go in
> through the ata tree independently so I was not planning on doing a rebase,
> especially not on the scsi tree. I will notify Stephen about the conflict send
> him a resolution to apply and carry for linux-next. When the 6.5 merge window
> open, I will wait for the James to send the scsi PR and send my PR to Linus
> after that with the conflict resolution, as usual.

I wasn't suggesting that you do a rebase on the scsi tree, I was
suggesting for a topic branch to be merged through both trees.

But I was just thinking how to make it as easy as possible to
handle additional libata patches that should go on top of the
CDL series (without creating conflicts for Stephen and Linus).

> 
> So far, I do not see any big issue with that.

If you think that the conflicts will be trivial, feel free to
ignore my suggestion :)


Kind regards,
Niklas
Damien Le Moal May 23, 2023, 10:53 a.m. UTC | #7
On 5/23/23 19:35, Niklas Cassel wrote:
> On Tue, May 23, 2023 at 07:08:27PM +0900, Damien Le Moal wrote:
>> On 5/23/23 18:56, Niklas Cassel wrote:
>>> On Mon, May 22, 2023 at 05:41:19PM -0400, Martin K. Petersen wrote:
>>>>
>>>> Niklas,
>>>>
>>>>> This series adds support for Command Duration Limits.
>>>>
>>>> Applied to 6.5/scsi-staging, thanks!
>>>
>>> Thank you Martin!
>>>
>>>
>>> Damien, Martin,
>>> considering that the libata changes depend on the scsi changes,
>>> and considering that further libata EH cleanups are planned for
>>> 6.5 now when the IPR driver is gone, I think that the best move
>>> is to follow the advice of:
>>> https://docs.kernel.org/maintainer/rebasing-and-merging.html#merging-from-sibling-or-upstream-trees
>>
>> Hannes cleanup of EH will create a conflict with the scsi tree but can go in
>> through the ata tree independently so I was not planning on doing a rebase,
>> especially not on the scsi tree. I will notify Stephen about the conflict send
>> him a resolution to apply and carry for linux-next. When the 6.5 merge window
>> open, I will wait for the James to send the scsi PR and send my PR to Linus
>> after that with the conflict resolution, as usual.
> 
> I wasn't suggesting that you do a rebase on the scsi tree, I was
> suggesting for a topic branch to be merged through both trees.
> 
> But I was just thinking how to make it as easy as possible to
> handle additional libata patches that should go on top of the
> CDL series (without creating conflicts for Stephen and Linus).
> 
>>
>> So far, I do not see any big issue with that.
> 
> If you think that the conflicts will be trivial, feel free to
> ignore my suggestion :)

So far, the conflicts do not look bad at all. So we should be fine. Will adapt
if needed.

> 
> 
> Kind regards,
> Niklas
Martin K. Petersen June 1, 2023, 12:43 a.m. UTC | #8
On Thu, 11 May 2023 03:13:33 +0200, Niklas Cassel wrote:

> This series adds support for Command Duration Limits.
> The series is based on linux tag: v6.4-rc1
> The series can also be found in git:
> https://github.com/floatious/linux/commits/cdl-v7
> 
> 
> =================
> CDL in ATA / SCSI
> =================
> Command Duration Limits is defined in:
> T13 ATA Command Set - 5 (ACS-5) and
> T10 SCSI Primary Commands - 6 (SPC-6) respectively
> (a simpler version of CDL is defined in T10 SPC-5).
> 
> [...]

Applied to 6.5/scsi-queue, thanks!

[01/19] ioprio: cleanup interface definition
        https://git.kernel.org/mkp/scsi/c/eca2040972b4
[02/19] block: introduce ioprio hints
        https://git.kernel.org/mkp/scsi/c/6c913257226a
[03/19] block: introduce BLK_STS_DURATION_LIMIT
        https://git.kernel.org/mkp/scsi/c/dffc480d2df1
[04/19] scsi: core: allow libata to complete successful commands via EH
        https://git.kernel.org/mkp/scsi/c/3d848ca1ebc8
[05/19] scsi: rename and move get_scsi_ml_byte()
        https://git.kernel.org/mkp/scsi/c/734326937b65
[06/19] scsi: support retrieving sub-pages of mode pages
        https://git.kernel.org/mkp/scsi/c/a6cdc35fab0d
[07/19] scsi: support service action in scsi_report_opcode()
        https://git.kernel.org/mkp/scsi/c/152e52fb6ff1
[08/19] scsi: detect support for command duration limits
        https://git.kernel.org/mkp/scsi/c/624885209f31
[09/19] scsi: allow enabling and disabling command duration limits
        https://git.kernel.org/mkp/scsi/c/1b22cfb14142
[10/19] scsi: sd: set read/write commands CDL index
        https://git.kernel.org/mkp/scsi/c/e59e80cfef60
[11/19] scsi: sd: handle read/write CDL timeout failures
        https://git.kernel.org/mkp/scsi/c/390e2d1a5874
[12/19] ata: libata-scsi: remove unnecessary !cmd checks
        https://git.kernel.org/mkp/scsi/c/91a8967ca7f4
[13/19] ata: libata: change ata_eh_request_sense() to not set CHECK_CONDITION
        https://git.kernel.org/mkp/scsi/c/24aeebbf8ea9
[14/19] ata: libata: detect support for command duration limits
        https://git.kernel.org/mkp/scsi/c/62e4a60e0cdb
[15/19] ata: libata-scsi: handle CDL bits in ata_scsiop_maint_in()
        https://git.kernel.org/mkp/scsi/c/0de558015286
[16/19] ata: libata-scsi: add support for CDL pages mode sense
        https://git.kernel.org/mkp/scsi/c/673b2fe6ff1d
[17/19] ata: libata: add ATA feature control sub-page translation
        https://git.kernel.org/mkp/scsi/c/df60f9c64576
[18/19] ata: libata: set read/write commands CDL index
        https://git.kernel.org/mkp/scsi/c/eafe804bda7b
[19/19] ata: libata: handle completion of CDL commands using policy 0xD
        https://git.kernel.org/mkp/scsi/c/18bd7718b5c4