[RFC,v3,00/22] blk-mq/libata/scsi: SCSI driver tagging improvements Part I

Message ID 1666693096-180008-1-git-send-email-john.garry@huawei.com

Message

John Garry Oct. 25, 2022, 10:17 a.m. UTC
Currently SCSI low-level drivers are required to manage tags for commands
which do not come via the block layer - libata internal commands would be
an example of one of these. We want to make blk-mq manage these tags also.
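
To make the tag ownership concrete, here is a minimal toy model of a tag set with a reserved region. It is a sketch only, not blk-mq code: toy_tags, toy_get_tag(), toy_put_tag() and TOY_DEPTH are hypothetical names, and placing the reserved tags at the low end of the range is an assumption made for the illustration. The point is that normal I/O can never consume the reserved tags, so an internal command can still get a tag when the queue is saturated.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DEPTH 8 /* hypothetical total tag count, reserved tags included */

/* Toy tag set: tags [0, nr_reserved) are reserved, the rest serve normal I/O. */
struct toy_tags {
	unsigned int nr_reserved;
	bool busy[TOY_DEPTH];
};

/* Return a free tag from the requested pool, or -1 if that pool is exhausted. */
static int toy_get_tag(struct toy_tags *t, bool reserved)
{
	unsigned int start = reserved ? 0 : t->nr_reserved;
	unsigned int end = reserved ? t->nr_reserved : TOY_DEPTH;

	for (unsigned int i = start; i < end; i++) {
		if (!t->busy[i]) {
			t->busy[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Release a tag previously returned by toy_get_tag(). */
static void toy_put_tag(struct toy_tags *t, int tag)
{
	assert(tag >= 0 && tag < TOY_DEPTH);
	t->busy[tag] = false;
}
```

With nr_reserved set to 2, six normal allocations return tags 2 to 7, a seventh fails with -1, and a reserved allocation still succeeds with tag 0.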

There was earlier work to provide "reserved commands" support, in series
such as https://lore.kernel.org/linux-scsi/20211125151048.103910-1-hare@suse.de/

This was based on allocating a request for the lifetime of the "internal"
command.

This series tries to solve that problem by not just allocating the request
but also sending it as a request through the block layer. Reasons to do
this:
- Internal commands follow the normal request flow and share the lifetime
  handling of regular SCSI commands
- We don't leave request and scsi_cmnd fields dangling, as happens when we
  just allocate and free the request for the lifetime of the "internal"
  command
- Polling is only possible in the block layer, so without this we could not
  send internal commands on poll mode queues, which is a problem
- We can get rid of duplicated code, such as the libsas internal command
  timeout handling

Part I of this series contains core SCSI midlayer, libata, and libsas
changes to queue libsas "slow" tasks as requests.

Part II of this series focuses on changing libata to queue internal
commands as requests.

Testing:
QEMU with AHCI with disk and cdrom attached, hisi_sas, pm8001.

Branch containing all patches is at:
https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-6.1-block

v2 was here:
https://lore.kernel.org/linux-scsi/1654770559-101375-1-git-send-email-john.garry@huawei.com/

Hannes Reinecke (1):
  scsi: core: Implement reserved command handling

John Garry (21):
  blk-mq: Don't get budget for reserved requests
  scsi: core: Add scsi_get_dev()
  scsi: core: Add support to send reserved commands
  scsi: core: Add support for reserved command timeout handling
  scsi: libsas: Improve sas_ex_discover_expander() error handling
  scsi: libsas: Notify LLDD expander found before calling sas_rphy_add()
  scsi: scsi_transport_sas: Alloc sdev for expander
  scsi: libsas: Add sas_alloc_slow_task_rq()
  scsi: libsas: Add sas_queuecommand_internal()
  scsi: libsas: Add sas_internal_timeout()
  scsi: core: Use SCSI_SCAN_RESCAN in __scsi_add_device()
  scsi: scsi_transport_sas: Allocate end device target id in the rphy
    alloc
  ata: libata-scsi: Add ata_scsi_setup_sdev()
  scsi: libsas: Add sas_ata_setup_device()
  ata: libata-scsi: Allocate sdev early in port probe
  scsi: libsas drivers: Reserve tags
  scsi: libsas: Queue SMP commands as requests
  scsi: libsas: Queue TMF commands as requests
  scsi: core: Add scsi_alloc_request_hwq()
  scsi: libsas: Queue internal abort commands as requests
  scsi: libsas: Delete sas_task_slow.timer

 block/blk-mq.c                         |   4 +-
 drivers/ata/libata-eh.c                |   1 +
 drivers/ata/libata-scsi.c              |  49 ++++++++----
 drivers/ata/libata.h                   |   1 +
 drivers/scsi/aic94xx/aic94xx_init.c    |   3 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  40 +++++-----
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   3 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   3 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   7 ++
 drivers/scsi/hosts.c                   |  16 ++++
 drivers/scsi/isci/init.c               |   3 +
 drivers/scsi/libsas/sas_ata.c          |  20 +++++
 drivers/scsi/libsas/sas_expander.c     | 101 ++++++++++++++-----------
 drivers/scsi/libsas/sas_init.c         |  61 ++++++++++++++-
 drivers/scsi/libsas/sas_internal.h     |   5 ++
 drivers/scsi/libsas/sas_scsi_host.c    |  93 ++++++++++++-----------
 drivers/scsi/mvsas/mv_init.c           |   7 ++
 drivers/scsi/pm8001/pm8001_init.c      |   8 +-
 drivers/scsi/scsi_error.c              |   3 +
 drivers/scsi/scsi_lib.c                |  42 +++++++++-
 drivers/scsi/scsi_scan.c               |  29 ++++++-
 drivers/scsi/scsi_transport_sas.c      |  34 ++++++---
 include/linux/libata.h                 |   2 +
 include/scsi/libsas.h                  |   8 +-
 include/scsi/scsi_cmnd.h               |   3 +
 include/scsi/scsi_host.h               |  21 ++++-
 26 files changed, 424 insertions(+), 143 deletions(-)

Comments

John Garry Oct. 25, 2022, 10:11 a.m. UTC | #1
On 25/10/2022 11:17, John Garry wrote:

Hi all,

I meant to say that this is just an update on where I have got to with 
this work. I am actually changing employer soon, but will continue in the 
upstream Linux storage domain. So I don't want people to think that I am 
just throwing some stuff over the wall for the community to deal with. I 
would still like people to check this.

Thanks,
John

Damien Le Moal Oct. 27, 2022, 1:18 a.m. UTC | #2
On 10/25/22 19:17, John Garry wrote:
> From: Hannes Reinecke <hare@suse.de>
> 
> Quite some drivers are using management commands internally, which
> typically use the same hardware tag pool (ie they are being allocated
> from the same hardware resources) as the 'normal' I/O commands.
> These commands are set aside before allocating the block-mq tag bitmap,
> so they'll never show up as busy in the tag map.
> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
> this situation.
> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
> template to instruct the block layer to set aside a tag space for these
> management commands by using reserved tags.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
> = shost->can_queue + shost->nr_reserved_cmds;
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hosts.c     |  3 +++
>  drivers/scsi/scsi_lib.c  |  2 ++
>  include/scsi/scsi_host.h | 15 ++++++++++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 12346e2297fd..db89afc37bc9 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  	if (sht->virt_boundary_mask)
>  		shost->virt_boundary_mask = sht->virt_boundary_mask;
>  
> +	if (sht->nr_reserved_cmds)
> +		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
> +

Nit: the if is not really necessary I think. But it does not hurt.

>  	device_initialize(&shost->shost_gendev);
>  	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>  	shost->shost_gendev.bus = &scsi_bus_type;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 39d4fd124375..a8c4e7c037ae 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1978,6 +1978,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>  	tag_set->nr_maps = shost->nr_maps ? : 1;
>  	tag_set->queue_depth = shost->can_queue;
> +	tag_set->reserved_tags = shost->nr_reserved_cmds;
> +

Why the blank line?

>  	tag_set->cmd_size = cmd_size;
>  	tag_set->numa_node = dev_to_node(shost->dma_dev);
>  	tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 750ccf126377..91678c77398e 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -360,10 +360,17 @@ struct scsi_host_template {
>  	/*
>  	 * This determines if we will use a non-interrupt driven
>  	 * or an interrupt driven scheme.  It is set to the maximum number
> -	 * of simultaneous commands a single hw queue in HBA will accept.
> +	 * of simultaneous commands a single hw queue in HBA will accept
> +	 * including reserved commands.
>  	 */
>  	int can_queue;
>  
> +	/*
> +	 * This determines how many commands the HBA will set aside
> +	 * for reserved commands.
> +	 */
> +	int nr_reserved_cmds;
> +
>  	/*
>  	 * In many instances, especially where disconnect / reconnect are
>  	 * supported, our host also has an ID on the SCSI bus.  If this is
> @@ -611,6 +618,12 @@ struct Scsi_Host {
>  	 */
>  	unsigned nr_hw_queues;
>  	unsigned nr_maps;
> +
> +	/*
> +	 * Number of reserved commands to allocate, if any.
> +	 */
> +	unsigned int nr_reserved_cmds;
> +
>  	unsigned active_mode:2;
>  
>  	/*
Hannes Reinecke Oct. 27, 2022, 7:51 a.m. UTC | #3
On 10/27/22 03:18, Damien Le Moal wrote:
> On 10/25/22 19:17, John Garry wrote:
>> From: Hannes Reinecke <hare@suse.de>
>>
>> Quite some drivers are using management commands internally, which
>> typically use the same hardware tag pool (ie they are being allocated
>> from the same hardware resources) as the 'normal' I/O commands.
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
>> this situation.
>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
>> template to instruct the block layer to set aside a tag space for these
>> management commands by using reserved tags.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
>> = shost->can_queue + shost->nr_reserved_cmds;
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hosts.c     |  3 +++
>>   drivers/scsi/scsi_lib.c  |  2 ++
>>   include/scsi/scsi_host.h | 15 ++++++++++++++-
>>   3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 12346e2297fd..db89afc37bc9 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>   	if (sht->virt_boundary_mask)
>>   		shost->virt_boundary_mask = sht->virt_boundary_mask;
>>   
>> +	if (sht->nr_reserved_cmds)
>> +		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>> +
> 
> Nit: the if is not really necessary I think. But it does not hurt.
> 
Yes, it is needed.
Not all HBAs are able to figure out the number of reserved commands 
upfront; some modify it based on the PCI device used, etc.
So I'd keep it for now.

Cheers,

Hannes
John Garry Oct. 27, 2022, 8:16 a.m. UTC | #4
On 27/10/2022 08:51, Hannes Reinecke wrote:
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
>>> = shost->can_queue + shost->nr_reserved_cmds;
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/scsi/hosts.c     |  3 +++
>>>   drivers/scsi/scsi_lib.c  |  2 ++
>>>   include/scsi/scsi_host.h | 15 ++++++++++++++-
>>>   3 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>>> index 12346e2297fd..db89afc37bc9 100644
>>> --- a/drivers/scsi/hosts.c
>>> +++ b/drivers/scsi/hosts.c
>>> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>>       if (sht->virt_boundary_mask)
>>>           shost->virt_boundary_mask = sht->virt_boundary_mask;
>>> +    if (sht->nr_reserved_cmds)
>>> +        shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>>> +
>>
>> Nit: the if is not really necessary I think. But it does not hurt.
>>
> Yes, it is needed.
> Not all HBAs are able to figure out the number of reserved commands 
> upfront; some modify that based on the PCI device used etc.
> So I'd keep it for now.

I think that logically Damien is right: in the shost alloc, 
shost->nr_reserved_cmds is initially zero, so:

	if (sht->nr_reserved_cmds)
		shost->nr_reserved_cmds = sht->nr_reserved_cmds;

is the same as simply:

	shost->nr_reserved_cmds = sht->nr_reserved_cmds;

However, I am just copying the coding style.

Thanks,
John
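
The equivalence argued above can be checked with a small userspace sketch. The structures and functions below (toy_template, toy_host, toy_alloc_guarded(), toy_alloc_plain()) are hypothetical stand-ins that keep only the field under discussion, and calloc() plays the role of the zeroing allocation that the thread says leaves the host field at zero. In both variants a driver could still overwrite the host value after allocation, so the guard does not affect that case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: only the field under discussion is kept. */
struct toy_template { int nr_reserved_cmds; };
struct toy_host { unsigned int nr_reserved_cmds; };

/* Variant from the patch: copy only when the template value is non-zero. */
static struct toy_host *toy_alloc_guarded(const struct toy_template *sht)
{
	/* Zeroed allocation, so the host field starts at zero. */
	struct toy_host *shost = calloc(1, sizeof(*shost));

	if (!shost)
		return NULL;
	if (sht->nr_reserved_cmds)
		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
	return shost;
}

/* Variant raised in review: unconditional copy. */
static struct toy_host *toy_alloc_plain(const struct toy_template *sht)
{
	struct toy_host *shost = calloc(1, sizeof(*shost));

	if (!shost)
		return NULL;
	shost->nr_reserved_cmds = sht->nr_reserved_cmds;
	return shost;
}
```

For a template value of 0 both variants leave the host field at 0, and for a non-zero value both copy it, so the two only differ if the host field could be non-zero before the copy.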
John Garry Oct. 27, 2022, 9:09 a.m. UTC | #5
On 27/10/2022 02:16, Damien Le Moal wrote:
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   block/blk-mq.c          | 4 +++-
>>   drivers/scsi/scsi_lib.c | 3 ++-
>>   2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 260adeb2e455..d8baabb32ea4 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1955,11 +1955,13 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>   	errors = queued = 0;
>>   	do {
>>   		struct blk_mq_queue_data bd;
>> +		bool need_budget;
>>   
>>   		rq = list_first_entry(list, struct request, queuelist);
>>   
>>   		WARN_ON_ONCE(hctx != rq->mq_hctx);
>> -		prep = blk_mq_prep_dispatch_rq(rq, !nr_budgets);
>> +		need_budget = !nr_budgets && !blk_mq_is_reserved_rq(rq);
>> +		prep = blk_mq_prep_dispatch_rq(rq, need_budget);
>>   		if (prep != PREP_DISPATCH_OK)
>>   			break;
> Below this code, there is:
> 
> 		if (nr_budgets)
> 			nr_budgets--;
> 
> Don't you need to change that to:
> 
> 		if (need_budget && nr_budgets)
> 			nr_budgets--;
> 
> ? Otherwise, the accounting will be off.
> 

Ah, yes, I think that you are right. I actually need to check the 
nr_budgets usage further, as the initial value of nr_budgets would depend 
on whether any reserved request requires a budget (which we don't get).

Thanks,
John
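
The accounting concern can be illustrated with a toy model of the loop. This is a sketch under stated assumptions, not the blk-mq code: toy_rq and toy_dispatch() are hypothetical, and nr_budgets is assumed to count budgets the caller already holds for the non-reserved requests only, which is exactly the open question in the reply above. Note also that with need_budget defined as in the quoted hunk, need_budget and a non-zero nr_budgets cannot both be true, so the sketch keys the alternative decrement rule on the reserved flag instead; that choice is an assumption, not something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request: only the reserved flag matters here. */
struct toy_rq { bool reserved; };

/*
 * Walk the list like the quoted loop and return how many budgets are
 * acquired inside the loop. nr_budgets is the number of budgets the caller
 * is assumed to hold already, for non-reserved requests only.
 * skip_reserved selects the decrement rule: false decrements for every
 * request (as in the posted code), true leaves the count alone for
 * reserved requests.
 */
static unsigned int toy_dispatch(const struct toy_rq *rqs, size_t n,
				 unsigned int nr_budgets, bool skip_reserved)
{
	unsigned int acquired = 0;

	for (size_t i = 0; i < n; i++) {
		bool need_budget = !nr_budgets && !rqs[i].reserved;

		if (need_budget)
			acquired++; /* stands in for taking a budget at dispatch */

		if (skip_reserved && rqs[i].reserved)
			continue; /* reserved request holds no budget */
		if (nr_budgets)
			nr_budgets--;
	}
	return acquired;
}
```

For a list of one reserved request followed by two normal requests, with two budgets already held, the unconditional decrement acquires one extra budget for the last normal request, while skipping reserved requests acquires none.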
John Garry Oct. 27, 2022, 9:11 a.m. UTC | #6
> 
>>   	device_initialize(&shost->shost_gendev);
>>   	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>>   	shost->shost_gendev.bus = &scsi_bus_type;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 39d4fd124375..a8c4e7c037ae 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1978,6 +1978,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>   	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>>   	tag_set->nr_maps = shost->nr_maps ? : 1;
>>   	tag_set->queue_depth = shost->can_queue;
>> +	tag_set->reserved_tags = shost->nr_reserved_cmds;
>> +
> Why the blank line?
> 

I don't think that it is required; I can fix that.

>>   	tag_set->cmd_size = cmd_size;

Thanks,
John