[PATCHv8,00/18] scsi: enabled reserved commands for LLDDs

Message ID 20210503150333.130310-1-hare@suse.de

Message

Hannes Reinecke May 3, 2021, 3:03 p.m.
Hi all,

quite a few drivers use internal commands for various purposes, most
commonly for sending TMFs or querying the HBA status.
While these commands use the same submission mechanism as normal
I/O commands, they are not counted as outstanding commands,
requiring those drivers to implement their own mechanism to track
outstanding commands.
The block layer already has the concept of 'reserved' tags for
precisely this purpose, namely non-I/O tags which are allocated from a
separate tag pool. That guarantees that these commands can always be sent
and won't be affected by starvation of the I/O tag pool.
This patchset enables the use of reserved tags for the SCSI midlayer
by allocating a virtual LUN for the HBA itself, which just serves
as a resource to allocate valid tags from.
This removes quite a few hacks which were required for some
drivers (e.g. fnic or snic), and allows the use of tagset
iterators within the drivers.
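A minimal userspace model of the reserved/I/O split, as a sketch only: `struct tag_pool` and the `pool_*()` functions are hypothetical and are not the blk-mq API. It assumes, as in blk-mq, that reserved tags occupy the low end of the tag space, and that `can_queue` counts only the I/O tags (per the v3 changelog), so the whole tag space is `nr_reserved + can_queue`. Exhausting the I/O range leaves the reserved range untouched, and releasing a normal tag through the reserved path is refused with a warning, loosely mirroring the v5 change to `scsi_put_reserved_cmd()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TAGS 64

/* Toy model of a host tag space split into two pools (hypothetical). */
struct tag_pool {
	int nr_reserved;	/* tags [0, nr_reserved) are reserved */
	int can_queue;		/* tags [nr_reserved, nr_reserved + can_queue) are I/O */
	bool busy[MAX_TAGS];
};

static void pool_init(struct tag_pool *p, int nr_reserved, int can_queue)
{
	assert(nr_reserved + can_queue <= MAX_TAGS);
	p->nr_reserved = nr_reserved;
	p->can_queue = can_queue;
	for (int i = 0; i < MAX_TAGS; i++)
		p->busy[i] = false;
}

/* Take the first free tag in [lo, hi); return -1 if that range is exhausted. */
static int pool_get_range(struct tag_pool *p, int lo, int hi)
{
	for (int tag = lo; tag < hi; tag++) {
		if (!p->busy[tag]) {
			p->busy[tag] = true;
			return tag;
		}
	}
	return -1;
}

/* Normal I/O only ever draws from the I/O range. */
static int pool_get_io(struct tag_pool *p)
{
	return pool_get_range(p, p->nr_reserved, p->nr_reserved + p->can_queue);
}

/* Internal commands only ever draw from the reserved range. */
static int pool_get_reserved(struct tag_pool *p)
{
	return pool_get_range(p, 0, p->nr_reserved);
}

/* Release a reserved tag; warn and refuse if handed a normal I/O tag. */
static bool pool_put_reserved(struct tag_pool *p, int tag)
{
	if (tag < 0 || tag >= p->nr_reserved) {
		fprintf(stderr, "WARNING: tag %d is not a reserved tag\n", tag);
		return false;
	}
	p->busy[tag] = false;
	return true;
}
```

With `nr_reserved = 2` and `can_queue = 4`, four I/O allocations return tags 2 to 5 and a fifth fails, yet `pool_get_reserved()` still returns tag 0.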

Command allocation currently ignores the hardware queues, as none
of the modified drivers is mq-capable.

The entire patchset can be found at

git://git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git
reserved-tags.v8

This patchset also includes the busy_iter patches for fnic, which
were also sent as a separate patchset. So if they are applied
separately they can be dropped from this patchset.

As usual, comments and reviews are welcome.

Changes to v7:
- Drop changes to hisi_sas, pm8001, and mv_sas
- Drop patch to introduce REQ_INTERNAL flag
- Include reviews from John Garry

Changes to v6:
- Remove patch to drop gdth
- Rework libsas to use a tag per slow task
- Update hisi_sas, pm8001, and mv_sas

Changes to v5:
- Remove patch for csiostor
- Warn on normal commands in scsi_put_reserved_cmd()
- Fixup aacraid to only call scsi_put_internal_cmd() for
  reserved commands
- Add 'nr_reserved_cmds' field to host template
- Reshuffle patches

Changes to v4:
- Fixup kbuild warning
- Include reviews from Bart

Changes to v3:
- Kill gdth
- Only convert fnic, snic, hpsa, and aacraid
- Drop command emulation for pseudo host device
- make 'can_queue' exclude the number of reserved tags
- Drop persistent commands proposal
- Sanitize host device handling

Changes to v2:
- Update patches from John Garry
- Use virtual LUN as suggested by Christoph
- Improve SCSI Host device to present a real SCSI device
- Implement 'persistent' commands for AENs
- Convert Megaraid SAS

Changes to v1:
- Make scsi_{get, put}_reserved_cmd() work on the Scsi host
- Previously we had separate scsi_{get, put}_reserved_cmd() for sdev
  and scsi_host_get_reserved_cmd() for the host
- Fix how Scsi_Host.can_queue is set in the virtio-scsi change
- Drop Scsi_Host.use_reserved_cmd_q
- Drop scsi_is_reserved_cmd()
- Add support in libsas and associated HBA drivers
- Allocate reserved command in slow task
- Switch hisi_sas to use reserved Scsi command
- Reorder the series a little
- Some tidying


Hannes Reinecke (18):
  fnic: kill 'exclude_id' argument to fnic_cleanup_io()
  fnic: use scsi_host_busy_iter() to traverse commands
  scsi: add scsi_{get,put}_internal_cmd() helper
  fnic: use internal commands
  scsi: use real inquiry data when initialising devices
  scsi: Use dummy inquiry data for the host device
  scsi: revamp host device handling
  snic: use reserved commands
  snic: use tagset iter for traversing commands
  scsi: implement reserved command handling
  hpsa: move hpsa_hba_inquiry after scsi_add_host()
  hpsa: use reserved commands
  hpsa: use scsi_host_busy_iter() to traverse outstanding commands
  hpsa: drop refcount field from CommandList
  aacraid: move scsi_add_host()
  aacraid: store target id in host_scribble
  aacraid: use scsi_get_internal_cmd()
  aacraid: use scsi_host_busy_iter() to traverse outstanding commands

 drivers/scsi/aacraid/aachba.c   | 137 ++---
 drivers/scsi/aacraid/aacraid.h  |  10 +-
 drivers/scsi/aacraid/commctrl.c |  25 +-
 drivers/scsi/aacraid/comminit.c |   2 +-
 drivers/scsi/aacraid/commsup.c  | 118 ++--
 drivers/scsi/aacraid/dpcsup.c   |   2 +-
 drivers/scsi/aacraid/linit.c    | 175 +++---
 drivers/scsi/fnic/fnic_scsi.c   | 927 +++++++++++++++-----------------
 drivers/scsi/hosts.c            |   3 +
 drivers/scsi/hpsa.c             | 365 ++++++-------
 drivers/scsi/hpsa.h             |   3 +-
 drivers/scsi/hpsa_cmd.h         |  10 -
 drivers/scsi/scsi_devinfo.c     |   1 +
 drivers/scsi/scsi_lib.c         |  48 +-
 drivers/scsi/scsi_scan.c        |  96 ++--
 drivers/scsi/scsi_sysfs.c       |   5 +-
 drivers/scsi/snic/snic.h        |   4 +-
 drivers/scsi/snic/snic_main.c   |   7 +
 drivers/scsi/snic/snic_scsi.c   | 524 +++++++++---------
 include/scsi/scsi_device.h      |   3 +
 include/scsi/scsi_host.h        |  36 +-
 21 files changed, 1234 insertions(+), 1267 deletions(-)

Comments

Bart Van Assche May 4, 2021, 2:12 a.m. | #1
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> 'exclude_id' is always SCSI_NO_TAG, which will never be reached
> when traversing the list of tags.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
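The point of this review comment can be checked with a toy loop. `cleanup_io()` is a hypothetical stand-in for a tag-walking cleanup routine such as fnic_cleanup_io(), not the driver's code, and `SCSI_NO_TAG` is defined locally as -1 to mirror the kernel's value. Because tags are enumerated from 0 upward, a comparison against -1 never matches, so the `exclude_id` argument never excludes anything.

```c
#include <assert.h>

#define SCSI_NO_TAG (-1)	/* mirrors the kernel's value */

/*
 * Hypothetical stand-in for a tag-walking cleanup loop: count how many
 * of nr_tags tags would be cleaned up, skipping exclude_id.
 */
static int cleanup_io(int nr_tags, int exclude_id)
{
	int cleaned = 0;

	for (int tag = 0; tag < nr_tags; tag++) {
		if (tag == exclude_id)
			continue;	/* never taken when exclude_id == SCSI_NO_TAG */
		cleaned++;
	}
	return cleaned;
}
```

Calling `cleanup_io(16, SCSI_NO_TAG)` visits all 16 tags, exactly as if the argument did not exist.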
Bart Van Assche May 4, 2021, 2:25 a.m. | #2
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> -	tag = sc->request->tag;
> -	if (unlikely(tag < 0)) {
> -		/*
> -		 * Really should fix the midlayer to pass in a proper
> -		 * request for ioctls...
> -		 */
> -		tag = fnic_scsi_host_start_tag(fnic, sc);
> -		if (unlikely(tag == SCSI_NO_TAG))
> -			goto fnic_device_reset_end;
> -		tag_gen_flag = 1;
> -		new_sc = 1;
> -	}

Since this patch removes the only callers of fnic_scsi_host_start_tag()
and fnic_scsi_host_end_tag(), please modify this patch such that it also
removes these functions.

Thanks,

Bart.
Bart Van Assche May 4, 2021, 2:28 a.m. | #3
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> Use dummy inquiry data when initialising devices and not just
> some 'nullnullnull' string.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Bart Van Assche May 4, 2021, 3:20 a.m. | #4
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> These commands are set aside before allocating the block-mq tag bitmap,
> so they'll never show up as busy in the tag map.

That doesn't sound correct to me. Should the above perhaps be changed
into "blk_mq_start_request() is never called for internal commands so
they'll never show up as busy in the tag map"?

Thanks,

Bart.
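The rewording hinges on how blk-mq decides that a tag is busy. The sketch below is a toy model with hypothetical names, assuming (as in blk-mq) that the busy-tag iterator only visits requests whose state has left idle, which is the state change blk_mq_start_request() performs. An internal command that holds a tag but was never started is therefore invisible to a busy count until it is started, which is also the option Hannes raises later in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Request states, modelled loosely on blk-mq's MQ_RQ_* states. */
enum rq_state { RQ_IDLE, RQ_IN_FLIGHT, RQ_COMPLETE };

/* Hypothetical toy request. */
struct toy_rq {
	bool allocated;		/* a tag is held in the tag map */
	enum rq_state state;
};

/* Anything that has left the idle state counts as started. */
static bool rq_started(const struct toy_rq *rq)
{
	return rq->state != RQ_IDLE;
}

/* Count what a busy iterator would visit: allocated AND started. */
static int count_busy(const struct toy_rq *rqs, int nr)
{
	int busy = 0;

	for (int i = 0; i < nr; i++)
		if (rqs[i].allocated && rq_started(&rqs[i]))
			busy++;
	return busy;
}

/* Model of starting a request: move it from idle to in-flight. */
static void rq_start(struct toy_rq *rq)
{
	rq->state = RQ_IN_FLIGHT;
}
```

Merely holding a tag is not enough to be counted; only the start transition makes the request visible to busy iteration.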
Bart Van Assche May 4, 2021, 3:22 a.m. | #5
On 5/3/21 8:03 AM, Hannes Reinecke wrote:
> The probe_container mechanism requires a target id to be present,
> even if the device itself isn't present (yet).
> As we're now allocating internal commands the target id of the
> device is immutable, so store the requested target id in the
> host_scribble field.

A more elegant solution is probably to introduce private data per SCSI
command and to set the .cmd_size member in the SCSI host template. I'd
like to get rid of the host_scribble field because it makes the SCSI
command data structure larger than necessary for SCSI LLDs that don't
use 'host_scribble'.

Thanks,

Bart.
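A sketch of the `.cmd_size` approach, modelled in userspace: `struct toy_cmnd`, `struct aac_cmd_priv`, `toy_alloc_cmd()` and `toy_cmd_priv()` are hypothetical. It assumes the layout used by the kernel's scsi_cmd_priv(), where the driver-private area sits directly behind the command, so a per-command field such as a target id needs no `host_scribble` pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct scsi_cmnd (hypothetical, heavily trimmed). */
struct toy_cmnd {
	unsigned char cdb[16];
	void *request;
};

/* Hypothetical driver-private per-command data replacing host_scribble. */
struct aac_cmd_priv {
	int target_id;
};

/*
 * Allocate a command with cmd_size bytes of private data placed directly
 * behind it, similar to what a host template's .cmd_size requests.
 */
static struct toy_cmnd *toy_alloc_cmd(size_t cmd_size)
{
	return calloc(1, sizeof(struct toy_cmnd) + cmd_size);
}

/* Same idea as scsi_cmd_priv(): the private data follows the command. */
static void *toy_cmd_priv(struct toy_cmnd *cmd)
{
	return cmd + 1;
}
```

With this layout the private area is sized once per host through the template, and drivers that set no `cmd_size` pay nothing for it.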
Hannes Reinecke May 4, 2021, 6:12 a.m. | #6
On 5/4/21 4:25 AM, Bart Van Assche wrote:
> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>> -	tag = sc->request->tag;
>> -	if (unlikely(tag < 0)) {
>> -		/*
>> -		 * Really should fix the midlayer to pass in a proper
>> -		 * request for ioctls...
>> -		 */
>> -		tag = fnic_scsi_host_start_tag(fnic, sc);
>> -		if (unlikely(tag == SCSI_NO_TAG))
>> -			goto fnic_device_reset_end;
>> -		tag_gen_flag = 1;
>> -		new_sc = 1;
>> -	}
>
> Since this patch removes the only callers of fnic_scsi_host_start_tag()
> and fnic_scsi_host_end_tag(), please modify this patch such that it also
> removes these functions.
> 

Of course.

Will do in the next round.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 4, 2021, 6:17 a.m. | #7
On 5/4/21 5:20 AM, Bart Van Assche wrote:
> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
>
> That doesn't sound correct to me. Should the above perhaps be changed
> into "blk_mq_start_request() is never called for internal commands so
> they'll never show up as busy in the tag map"?
> 

Yes, will do.

Cheers,

Hannes
Hannes Reinecke May 4, 2021, 6:18 a.m. | #8
On 5/4/21 5:22 AM, Bart Van Assche wrote:
> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>> The probe_container mechanism requires a target id to be present,
>> even if the device itself isn't present (yet).
>> As we're now allocating internal commands the target id of the
>> device is immutable, so store the requested target id in the
>> host_scribble field.
>
> A more elegant solution is probably to introduce private data per SCSI
> command and to set the .cmd_size member in the SCSI host template. I'd
> like to get rid of the host_scribble field because it makes the SCSI
> command data structure larger than necessary for SCSI LLDs that don't
> use 'host_scribble'.
> 

Ah. Good idea, both with using the .cmd_size and removing the 
host_scribble field.

Cheers,

Hannes
Christoph Hellwig May 4, 2021, 9:49 a.m. | #9
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig May 4, 2021, 9:55 a.m. | #10
On Mon, May 03, 2021 at 05:03:20PM +0200, Hannes Reinecke wrote:
> Use dummy inquiry data when initialising devices and not just
> some 'nullnullnull' string.

Why?

> +/*
> + * Dummy inquiry for virtual LUNs:
> + *
> + * standard INQUIRY: [qualifier indicates no connected LU]
> + *  PQual=1  Device_type=31  RMB=0  LU_CONG=0  version=0x05  [SPC-3]
> + *  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
> + *  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
> + *  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
> + *  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
> + *    length=36 (0x24)   Peripheral device type: no physical device on this lu
> + * Vendor identification: LINUX
> + * Product identification: VIRTUALLUN
> + * Product revision level: 1.0
> + */

You don't just set this up for virtual LUNs, but as a default for all
scsi_devices before calling INQUIRY.  I'd much prefer a helper
to fill out fake inquiry data rather than having seemingly valid data
for all devices before INQUIRY is called or if it fails.
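Christoph's suggestion could look roughly like this userspace sketch. `fill_dummy_inquiry()` and `put_ascii()` are hypothetical names, not existing kernel helpers. The byte layout assumes the SPC standard INQUIRY format (peripheral qualifier in bits 7-5 of byte 0, device type in bits 4-0, version in byte 2, response data format in byte 3, additional length in byte 4, and space-padded vendor, product and revision fields at bytes 8, 16 and 32); the values come from the quoted comment. Callers opt in explicitly instead of every scsi_device starting out with seemingly valid data.

```c
#include <assert.h>
#include <string.h>

#define DUMMY_INQUIRY_LEN 36

/* Copy s into a space-padded, non-NUL-terminated ASCII field. */
static void put_ascii(unsigned char *dst, size_t len, const char *s)
{
	size_t n = strlen(s);

	memset(dst, ' ', len);
	memcpy(dst, s, n < len ? n : len);
}

/*
 * Hypothetical helper: build the dummy standard INQUIRY data described
 * in the quoted comment, only for callers that explicitly ask for it.
 */
static void fill_dummy_inquiry(unsigned char *buf)
{
	memset(buf, 0, DUMMY_INQUIRY_LEN);
	buf[0] = (1 << 5) | 0x1f;		/* PQual=1, device type 31 */
	buf[2] = 0x05;				/* version: SPC-3 */
	buf[3] = 0x02;				/* response data format 2 */
	buf[4] = DUMMY_INQUIRY_LEN - 5;		/* additional length */
	put_ascii(&buf[8], 8, "LINUX");		/* vendor identification */
	put_ascii(&buf[16], 16, "VIRTUALLUN");	/* product identification */
	put_ascii(&buf[32], 4, "1.0");		/* product revision level */
}
```

The additional length in byte 4 counts the bytes after byte 4, so a 36-byte response carries 31 there.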
Christoph Hellwig May 4, 2021, 9:59 a.m. | #11
Same questions as for fnic: why does the driver implement its own
command completion in the EH path?
John Garry May 4, 2021, 10:55 a.m. | #12
On 04/05/2021 07:17, Hannes Reinecke wrote:
> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>> These commands are set aside before allocating the block-mq tag bitmap,
>>> so they'll never show up as busy in the tag map.
>>
>> That doesn't sound correct to me. Should the above perhaps be changed
>> into "blk_mq_start_request() is never called for internal commands so
>> they'll never show up as busy in the tag map"?
>>
> Yes, will do.

So why don't these - or shouldn't these - turn up in the busy tag map?

One of the motivations to use these block requests for internal commands 
is that we can take advantage of the block layer handling for CPU 
hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then 
they would be missed in blk_mq_hctx_notify_offline() -> 
blk_mq_hctx_has_requests(), right? And who knows what else...

Thanks,
John
Hannes Reinecke May 4, 2021, 12:57 p.m. | #13
On 5/4/21 11:55 AM, Christoph Hellwig wrote:
> On Mon, May 03, 2021 at 05:03:20PM +0200, Hannes Reinecke wrote:
>> Use dummy inquiry data when initialising devices and not just
>> some 'nullnullnull' string.
>
> Why?
> 

Because it's really weird if you start up scsi_debug with thousands of 
devices and then call 'lsscsi' repeatedly. That will print out several
devices with 'nullnullnull', only to be replaced with the 'real' inquiry 
data during device discovery.
I'd rather have a valid inquiry right from the start.

>> +/*
>> + * Dummy inquiry for virtual LUNs:
>> + *
>> + * standard INQUIRY: [qualifier indicates no connected LU]
>> + *  PQual=1  Device_type=31  RMB=0  LU_CONG=0  version=0x05  [SPC-3]
>> + *  [AERC=0]  [TrmTsk=0]  NormACA=0  HiSUP=0  Resp_data_format=2
>> + *  SCCS=0  ACC=0  TPGS=0  3PC=0  Protect=0  [BQue=0]
>> + *  EncServ=0  MultiP=0  [MChngr=0]  [ACKREQQ=0]  Addr16=0
>> + *  [RelAdr=0]  WBus16=0  Sync=0  [Linked=0]  [TranDis=0]  CmdQue=0
>> + *    length=36 (0x24)   Peripheral device type: no physical device on this lu
>> + * Vendor identification: LINUX
>> + * Product identification: VIRTUALLUN
>> + * Product revision level: 1.0
>> + */
>
> You don't just set this up for virtual LUNs, but as a default for all
> scsi_devices before calling INQUIRY.  I'd much prefer a helper
> to fill out fake inquiry data rather than having seemingly valid data
> for all devices before INQUIRY is called or if it fails.
> 

Right. Will be doing so.

Cheers,

Hannes
Hannes Reinecke May 4, 2021, 1:12 p.m. | #14
On 5/4/21 12:55 PM, John Garry wrote:
> On 04/05/2021 07:17, Hannes Reinecke wrote:
>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>> These commands are set aside before allocating the block-mq tag bitmap,
>>>> so they'll never show up as busy in the tag map.
>>>
>>> That doesn't sound correct to me. Should the above perhaps be changed
>>> into "blk_mq_start_request() is never called for internal commands so
>>> they'll never show up as busy in the tag map"?
>>>
>> Yes, will do.
>
> So why don't these - or shouldn't these - turn up in the busy tag map?
>
> One of the motivations to use these block requests for internal commands
> is that we can take advantage of the block layer handling for CPU
> hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight, then
> they would be missed in blk_mq_hctx_notify_offline() ->
> blk_mq_hctx_has_requests(), right? And who knows what else...
> 

Oh, but of course it's possible to call 'start' on these requests to 
have them counted in the busy map.
I just didn't see the need for it until now, that's all.

Cheers,

Hannes
Bart Van Assche May 4, 2021, 4:59 p.m. | #15
On 5/4/21 6:12 AM, Hannes Reinecke wrote:
> On 5/4/21 12:55 PM, John Garry wrote:
>> On 04/05/2021 07:17, Hannes Reinecke wrote:
>>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>>> These commands are set aside before allocating the block-mq tag bitmap,
>>>>> so they'll never show up as busy in the tag map.
>>>>
>>>> That doesn't sound correct to me. Should the above perhaps be changed
>>>> into "blk_mq_start_request() is never called for internal commands so
>>>> they'll never show up as busy in the tag map"?
>>>>
>>> Yes, will do.
>>
>> So why don't these - or shouldn't these - turn up in the busy tag map?
>>
>> One of the motivations to use these block requests for internal
>> commands is that we can take advantage of the block layer handling for
>> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight,
>> then they would be missed in blk_mq_hctx_notify_offline() ->
>> blk_mq_hctx_has_requests(), right? And who knows what else...
>>
> Oh, but of course it's possible to call 'start' on these requests to
> have them counted in the busy map.
> I just didn't see the need for it until now, that's all.

This is possible but this will require careful review of at least the
following code paths such that nothing unexpected happens for internal
commands:
* The SCSI timeout code.
* All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an
example, scsi_host_busy() must not include LLD-internal commands.

Thanks,

Bart.
Hannes Reinecke May 4, 2021, 6:09 p.m. | #16
On 5/4/21 6:59 PM, Bart Van Assche wrote:
> On 5/4/21 6:12 AM, Hannes Reinecke wrote:
>> On 5/4/21 12:55 PM, John Garry wrote:
>>> On 04/05/2021 07:17, Hannes Reinecke wrote:
>>>> On 5/4/21 5:20 AM, Bart Van Assche wrote:
>>>>> On 5/3/21 8:03 AM, Hannes Reinecke wrote:
>>>>>> These commands are set aside before allocating the block-mq tag bitmap,
>>>>>> so they'll never show up as busy in the tag map.
>>>>>
>>>>> That doesn't sound correct to me. Should the above perhaps be changed
>>>>> into "blk_mq_start_request() is never called for internal commands so
>>>>> they'll never show up as busy in the tag map"?
>>>>>
>>>> Yes, will do.
>>>
>>> So why don't these - or shouldn't these - turn up in the busy tag map?
>>>
>>> One of the motivations to use these block requests for internal
>>> commands is that we can take advantage of the block layer handling for
>>> CPU hotplug for MQ hosts, i.e. if blk-mq can't see these are inflight,
>>> then they would be missed in blk_mq_hctx_notify_offline() ->
>>> blk_mq_hctx_has_requests(), right? And who knows what else...
>>>
>> Oh, but of course it's possible to call 'start' on these requests to
>> have them counted in the busy map.
>> I just didn't see the need for it until now, that's all.
>
> This is possible but this will require careful review of at least the
> following code paths such that nothing unexpected happens for internal
> commands:
> * The SCSI timeout code.
> * All blk_mq_tagset_busy_iter() and scsi_host_busy_iter() callers. As an
> example, scsi_host_busy() must not include LLD-internal commands.
> 

Oh, _that_ is easy. These are reserved commands, which will have the
last bool argument to the iter functions set to 'true'.

 bool (*fn)(struct scsi_cmnd *, void *, bool)

So we just need to return from the iter if the last argument is true.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
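Hannes's proposal can be modelled as follows. The callback type mirrors the `bool (*fn)(struct scsi_cmnd *, void *, bool)` signature quoted in his reply; `struct toy_cmnd`, `toy_host_busy_iter()` and `count_io_cmd()` are hypothetical stand-ins, not midlayer code. It assumes the blk-mq convention that a callback returning `false` stops the iteration, so a callback that wants to ignore reserved commands should return `true` early rather than `false`; otherwise it would also skip every command after the first reserved one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy command; 'reserved' marks an LLD-internal command (hypothetical). */
struct toy_cmnd {
	int tag;
	bool reserved;
};

typedef bool (*busy_iter_fn)(struct toy_cmnd *cmd, void *priv, bool reserved);

/*
 * Model of a busy iterator: call fn() for each outstanding command and
 * stop early as soon as fn() returns false.
 */
static void toy_host_busy_iter(struct toy_cmnd *cmds, int nr,
			       busy_iter_fn fn, void *priv)
{
	for (int i = 0; i < nr; i++)
		if (!fn(&cmds[i], priv, cmds[i].reserved))
			break;
}

/* Count only normal I/O, the way scsi_host_busy() is expected to. */
static bool count_io_cmd(struct toy_cmnd *cmd, void *priv, bool reserved)
{
	(void)cmd;
	if (reserved)
		return true;	/* skip internal commands, keep iterating */
	(*(int *)priv)++;
	return true;
}
```

With one reserved and two normal commands outstanding, the count comes out as 2.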