
[V3,0/14] blk-mq: fix wrong queue mapping for kdump kernel

Message ID 20230808104239.146085-1-ming.lei@redhat.com
Series blk-mq: fix wrong queue mapping for kdump kernel

Message

Ming Lei Aug. 8, 2023, 10:42 a.m. UTC
Hi,

Fix the wrong queue mapping in the kdump kernel. blk-mq updates
nr_hw_queues to 1 there, so the driver and blk-mq may end up with
different queue topologies.


V3:
	- cover more drivers
	- clean up blk-mq a bit, as suggested by Christoph

V2:
	- add helper scsi_max_nr_hw_queues() to avoid potential build
	failures, because SCSI drivers often don't deal with blk-mq directly
	- apply scsi_max_nr_hw_queues() to all SCSI changes
	- move lpfc's change into the managed IRQ code path


Ming Lei (14):
  blk-mq: add blk_mq_max_nr_hw_queues()
  nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues
  ublk: limit max allowed nr_hw_queues
  virtio-blk: limit max allowed submit queues
  scsi: core: add helper of scsi_max_nr_hw_queues()
  scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: megaraid: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: mpt3sas: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: hisi: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: ufs: limit max allowed nr_hw_queues
  scsi: storvsc: limit max allowed nr_hw_queues
  blk-mq: add helpers for treating kdump kernel

 block/blk-mq.c                            | 55 ++++++++++++++++++-----
 drivers/block/ublk_drv.c                  |  2 +-
 drivers/block/virtio_blk.c                |  3 +-
 drivers/nvme/host/pci.c                   |  2 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    |  3 ++
 drivers/scsi/lpfc/lpfc_init.c             |  2 +
 drivers/scsi/megaraid/megaraid_sas_base.c |  6 ++-
 drivers/scsi/mpi3mr/mpi3mr_fw.c           |  3 ++
 drivers/scsi/mpt3sas/mpt3sas_base.c       |  4 +-
 drivers/scsi/pm8001/pm8001_init.c         |  4 +-
 drivers/scsi/storvsc_drv.c                |  3 ++
 drivers/ufs/core/ufs-mcq.c                |  2 +-
 include/linux/blk-mq.h                    |  1 +
 include/scsi/scsi_host.h                  |  5 +++
 14 files changed, 75 insertions(+), 20 deletions(-)

Comments

Christoph Hellwig Aug. 9, 2023, 1:44 p.m. UTC | #1
I'm starting to sound like a broken record, but we can't just do random
is_kdump checks, and it's not going to get better by resending it again and
again.  If kdump kernels limit the number of possible CPUs, it needs to be
reflected in cpu_possible_map and we need to use that information.
Ming Lei Aug. 10, 2023, 12:09 a.m. UTC | #2
On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> I'm starting to sound like a broken record, but we can't just do random
> is_kdump checks, and it's not going to get better by resending it again and
> again.  If kdump kernels limit the number of possible CPUs, it needs to be
> reflected in cpu_possible_map and we need to use that information.
> 

Can you look at the earlier comments from the kdump/arch folks about kdump
usage and num_possible_cpus?

    https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
    https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/

The point is that kdump kernels do not limit the number of possible CPUs.

1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
num_possible_cpus becomes 1.

2) some archs do not support 'nr_cpus=1', and have to rely on
'maxcpus=1', so num_possible_cpus isn't changed, and the kernel just boots
with a single online cpu. That causes trouble because blk-mq limits the
kdump kernel to a single queue.

Documentation/admin-guide/kdump/kdump.rst

Thanks, 
Ming
Baoquan He Aug. 10, 2023, 1:18 a.m. UTC | #3
On 08/10/23 at 08:09am, Ming Lei wrote:
> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > I'm starting to sound like a broken record, but we can't just do random
> > is_kdump checks, and it's not going to get better by resending it again and
> > again.  If kdump kernels limit the number of possible CPUs, it needs to be
> > reflected in cpu_possible_map and we need to use that information.
> > 
> 
> Can you look at the earlier comments from the kdump/arch folks about kdump
> usage and num_possible_cpus?
> 
>     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
>     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> 
> The point is that kdump kernels do not limit the number of possible CPUs.
> 
> 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> num_possible_cpus becomes 1.

Yes, "nr_cpus=" is strongly recommended for the kdump kernel because
"nr_cpus=" limits the number of possible CPUs, while "maxcpus=" only
limits the number of CPUs brought up during boot. We noticed this
difference because a large number of possible CPUs costs more memory in
the kdump kernel, e.g. for percpu initialization, even when the kdump
kernel has set "maxcpus=1".

Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
effort on patches adding "nr_cpus=" support to ppc64, but it seems the
ppc64 developers and maintainers did not care about it. In the end the
patches were not accepted, and the work was not continued.

Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
arch. Can we reconsider adding 'nr_cpus=' to the power arch, since a real
issue has occurred in the kdump kernel?

As for this patchset, could it be accepted so that the kdump kernel no
longer fails on ARCHes without "nr_cpus=" support? Just my personal opinion.

> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus isn't changed, and the kernel just boots
> with a single online cpu. That causes trouble because blk-mq limits the
> kdump kernel to a single queue.
> 
> Documentation/admin-guide/kdump/kdump.rst
> 
> Thanks, 
> Ming
>
Ming Lei Aug. 10, 2023, 2:06 a.m. UTC | #4
On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> On 08/10/23 at 08:09am, Ming Lei wrote:
> > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > I'm starting to sound like a broken record, but we can't just do random
> > > is_kdump checks, and it's not going to get better by resending it again and
> > > again.  If kdump kernels limit the number of possible CPUs, it needs to be
> > > reflected in cpu_possible_map and we need to use that information.
> > > 
> > 
> > Can you look at the earlier comments from the kdump/arch folks about kdump
> > usage and num_possible_cpus?
> > 
> >     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> >     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > 
> > The point is that kdump kernels do not limit the number of possible CPUs.
> > 
> > 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> > num_possible_cpus becomes 1.
> 
> Yes, "nr_cpus=" is strongly recommended for the kdump kernel because
> "nr_cpus=" limits the number of possible CPUs, while "maxcpus=" only
> limits the number of CPUs brought up during boot. We noticed this
> difference because a large number of possible CPUs costs more memory in
> the kdump kernel, e.g. for percpu initialization, even when the kdump
> kernel has set "maxcpus=1".
> 
> Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
> effort on patches adding "nr_cpus=" support to ppc64, but it seems the
> ppc64 developers and maintainers did not care about it. In the end the
> patches were not accepted, and the work was not continued.
> 
> Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
> arch. Can we reconsider adding 'nr_cpus=' to the power arch, since a real
> issue has occurred in the kdump kernel?

If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.

> 
> As for this patchset, could it be accepted so that the kdump kernel no
> longer fails on ARCHes without "nr_cpus=" support? Just my personal opinion.

IMO 'nr_cpus=' support should be preferred, given that it is annoying to
maintain two kinds of implementation for the kdump kernel from the
driver's viewpoint. I guess kdump handling could be simplified too by
supporting 'nr_cpus=' only.

thanks,
Ming
Baoquan He Aug. 10, 2023, 3:01 a.m. UTC | #5
On 08/10/23 at 10:06am, Ming Lei wrote:
> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > I'm starting to sound like a broken record, but we can't just do random
> > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > again.  If kdump kernels limit the number of possible CPUs, it needs to be
> > > > reflected in cpu_possible_map and we need to use that information.
> > > > 
> > > 
> > > Can you look at the earlier comments from the kdump/arch folks about kdump
> > > usage and num_possible_cpus?
> > > 
> > >     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > >     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > 
> > > The point is that kdump kernels do not limit the number of possible CPUs.
> > > 
> > > 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> > > num_possible_cpus becomes 1.
> > 
> > Yes, "nr_cpus=" is strongly recommended for the kdump kernel because
> > "nr_cpus=" limits the number of possible CPUs, while "maxcpus=" only
> > limits the number of CPUs brought up during boot. We noticed this
> > difference because a large number of possible CPUs costs more memory in
> > the kdump kernel, e.g. for percpu initialization, even when the kdump
> > kernel has set "maxcpus=1".
> > 
> > Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
> > effort on patches adding "nr_cpus=" support to ppc64, but it seems the
> > ppc64 developers and maintainers did not care about it. In the end the
> > patches were not accepted, and the work was not continued.
> > 
> > Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
> > arch. Can we reconsider adding 'nr_cpus=' to the power arch, since a real
> > issue has occurred in the kdump kernel?
> 
> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> 
> > 
> > As for this patchset, could it be accepted so that the kdump kernel no
> > longer fails on ARCHes without "nr_cpus=" support? Just my personal opinion.
> 
> IMO 'nr_cpus=' support should be preferred, given that it is annoying to
> maintain two kinds of implementation for the kdump kernel from the
> driver's viewpoint. I guess kdump handling could be simplified too by
> supporting 'nr_cpus=' only.

Yes, 'nr_cpus=' is ideal. I'm not sure whether there are underlying
concerns that made the power people decide not to support it.
Michael S. Tsirkin Aug. 10, 2023, 7:23 p.m. UTC | #6
On Tue, Aug 08, 2023 at 06:42:29PM +0800, Ming Lei wrote:
> Take blk-mq's knowledge into account for calculating io queues.
> 
> Fix wrong queue mapping in case of kdump kernel.
> 
> On arm and ppc64, 'maxcpus=1' is passed on the kdump command line, see
> `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> still returns all CPUs because 'maxcpus=1' just brings up a single
> cpu core during boot.
> 
> blk-mq sees a single queue in the kdump kernel, while from the driver's
> viewpoint there are still multiple queues. This inconsistency causes the
> driver to apply the wrong queue mapping when handling IO, and an IO
> timeout is triggered.
> 
> Meanwhile, a single queue uses far fewer resources and reduces the
> risk of kernel failure.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

superficially:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

But this patch only makes sense if the rest of the patchset is merged.
Feel free to merge directly.

> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 1fe011676d07..4ba79fe2a1b4 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1047,7 +1047,8 @@ static int init_vq(struct virtio_blk *vblk)
>  
>  	num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
>  
> -	vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
> +	vblk->io_queues[HCTX_TYPE_DEFAULT] = min_t(unsigned,
> +			num_vqs - num_poll_vqs, blk_mq_max_nr_hw_queues());
>  	vblk->io_queues[HCTX_TYPE_READ] = 0;
>  	vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
>  
> -- 
> 2.40.1
Hari Bathini Aug. 11, 2023, 7:53 a.m. UTC | #7
On 10/08/23 8:31 am, Baoquan He wrote:
> On 08/10/23 at 10:06am, Ming Lei wrote:
>> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
>>> On 08/10/23 at 08:09am, Ming Lei wrote:
>>>> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
>>>>> I'm starting to sound like a broken record, but we can't just do random
>>>>> is_kdump checks, and it's not going to get better by resending it again and
>>>>> again.  If kdump kernels limit the number of possible CPUs, it needs to be
>>>>> reflected in cpu_possible_map and we need to use that information.
>>>>>
>>>>
>>>> Can you look at the earlier comments from the kdump/arch folks about kdump
>>>> usage and num_possible_cpus?
>>>>
>>>>      https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
>>>>      https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
>>>>
>>>> The point is that kdump kernels do not limit the number of possible CPUs.
>>>>
>>>> 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
>>>> num_possible_cpus becomes 1.
>>>
>>> Yes, "nr_cpus=" is strongly recommended for the kdump kernel because
>>> "nr_cpus=" limits the number of possible CPUs, while "maxcpus=" only
>>> limits the number of CPUs brought up during boot. We noticed this
>>> difference because a large number of possible CPUs costs more memory in
>>> the kdump kernel, e.g. for percpu initialization, even when the kdump
>>> kernel has set "maxcpus=1".
>>>
>>> Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
>>> effort on patches adding "nr_cpus=" support to ppc64, but it seems the
>>> ppc64 developers and maintainers did not care about it. In the end the
>>> patches were not accepted, and the work was not continued.
>>>
>>> Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
>>> arch. Can we reconsider adding 'nr_cpus=' to the power arch, since a real
>>> issue has occurred in the kdump kernel?
>>
>> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
>>
>>>
>>> As for this patchset, could it be accepted so that the kdump kernel no
>>> longer fails on ARCHes without "nr_cpus=" support? Just my personal opinion.
>>
>> IMO 'nr_cpus=' support should be preferred, given that it is annoying to
>> maintain two kinds of implementation for the kdump kernel from the
>> driver's viewpoint. I guess kdump handling could be simplified too by
>> supporting 'nr_cpus=' only.
> 
> Yes, 'nr_cpus=' is ideal. I'm not sure whether there are underlying
> concerns that made the power people decide not to support it.

Though "nr_cpus=1" is an ideal solution, the maintainer was not happy with
the patch, as the code changes affect the regular boot path and are likely
to cause breakages. So, even if "nr_cpus=1" support for ppc64 is revived,
the change is going to take time to be accepted upstream.

Also, I see is_kdump_kernel() already being used in drivers, irrespective
of "nr_cpus=1" support, for other optimizations specific to the special
dump capture environment that kdump is.

If there is no downside to using is_kdump_kernel() in driver code other
than the maintainability aspect, I think the above changes are worth
considering.

Thanks
Hari
Christoph Hellwig Aug. 11, 2023, 1:10 p.m. UTC | #8
On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> num_possible_cpus becomes 1.
> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus isn't changed, and the kernel just boots
> with a single online cpu. That causes trouble because blk-mq limits the
> kdump kernel to a single queue.

And we need to fix case 2.  We need to drop the is_kdump support, and
if they want to force fewer CPUs they need to make nr_cpus=1 work.
Baoquan He Sept. 5, 2023, 5:03 a.m. UTC | #9
Hi Hari, Michael

On 08/11/23 at 01:23pm, Hari Bathini wrote:
> 
> 
> On 10/08/23 8:31 am, Baoquan He wrote:
> > On 08/10/23 at 10:06am, Ming Lei wrote:
> > > On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > > > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > > > I'm starting to sound like a broken record, but we can't just do random
> > > > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > > > again.  If kdump kernels limit the number of possible CPUs, it needs to be
> > > > > > reflected in cpu_possible_map and we need to use that information.
> > > > > > 
> > > > > 
> > > > > Can you look at the earlier comments from the kdump/arch folks about kdump
> > > > > usage and num_possible_cpus?
> > > > > 
> > > > >      https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > > > >      https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > > > 
> > > > > The point is that kdump kernels do not limit the number of possible CPUs.
> > > > > 
> > > > > 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> > > > > num_possible_cpus becomes 1.
> > > > 
> > > > Yes, "nr_cpus=" is strongly recommended for the kdump kernel because
> > > > "nr_cpus=" limits the number of possible CPUs, while "maxcpus=" only
> > > > limits the number of CPUs brought up during boot. We noticed this
> > > > difference because a large number of possible CPUs costs more memory in
> > > > the kdump kernel, e.g. for percpu initialization, even when the kdump
> > > > kernel has set "maxcpus=1".
> > > > 
> > > > Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
> > > > effort on patches adding "nr_cpus=" support to ppc64, but it seems the
> > > > ppc64 developers and maintainers did not care about it. In the end the
> > > > patches were not accepted, and the work was not continued.
> > > > 
> > > > Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
> > > > arch. Can we reconsider adding 'nr_cpus=' to the power arch, since a real
> > > > issue has occurred in the kdump kernel?
> > > 
> > > If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> > > 
> > > > 
> > > > As for this patchset, could it be accepted so that the kdump kernel no
> > > > longer fails on ARCHes without "nr_cpus=" support? Just my personal opinion.
> > > 
> > > IMO 'nr_cpus=' support should be preferred, given that it is annoying to
> > > maintain two kinds of implementation for the kdump kernel from the
> > > driver's viewpoint. I guess kdump handling could be simplified too by
> > > supporting 'nr_cpus=' only.
> > 
> > Yes, 'nr_cpus=' is ideal. I'm not sure whether there are underlying
> > concerns that made the power people decide not to support it.
> 
> Though "nr_cpus=1" is an ideal solution, the maintainer was not happy with
> the patch, as the code changes affect the regular boot path and are likely
> to cause breakages. So, even if "nr_cpus=1" support for ppc64 is revived,
> the change is going to take time to be accepted upstream.

I talked to Pingfan recently. He said he posted patches to add 'nr_cpus='
support to powerpc in order to reduce the amount of memory needed by the
kdump kernel. His patches were rejected because the maintainer thought
the reason was not sufficient. So, as of now, among the architectures for
which Fedora/RHEL provide a default crashkernel reservation value, powerpc
costs the most. Now, with this emerging issue, can we reconsider
supporting 'nr_cpus=' in powerpc?

> 
> Also, I see is_kdump_kernel() already being used in drivers, irrespective
> of "nr_cpus=1" support, for other optimizations specific to the special
> dump capture environment that kdump is.
> 
> If there is no downside to using is_kdump_kernel() in driver code other
> than the maintainability aspect, I think the above changes are worth
> considering.

Hi Hari,

By the way, will you use the ppc-specific is_kdump_kernel() and
is_crashdump_kernel() in your patches to fix this issue?

Thanks
Baoquan