Message ID: 20250424-isolcpus-io-queues-v6-0-9a53a870ca1f@kernel.org
Series: blk: honor isolcpus configuration
On Thu, Apr 24, 2025 at 08:19:39PM +0200, Daniel Wagner wrote:
> I've added back the isolcpus io_queue argument. This avoids any semantic
> changes of managed_irq.

IMO, this is the correct thing to do.

> I don't like it but I haven't found a
> better way to deal with it. Ming clearly stated managed_irq should not
> change.

Precisely, we can't cause IO hangs and break existing managed_irq
applications, especially since there is no kernel-side solution for that;
the same holds for v5, v6 or whatever.

I will look at v6 this week.

Thanks,
Ming
I've added back the isolcpus io_queue argument. This avoids any semantic
changes of managed_irq. I don't like it but I haven't found a better way
to deal with it. Ming clearly stated managed_irq should not change.

Another change is to prevent offlining a housekeeping CPU which is still
serving an isolated CPU, instead of just warning. That seems a much saner
way to handle this situation. Thanks Mathieu!

Here are the details on the difference between managed_irq and io_queue.

* nr cpus <= nr hardware queues (e.g. 8 CPUs, 8 hardware queues)

  managed_irq works nicely when the hardware has at least as many
  hardware queues as CPUs, e.g. enterprise nvme-pci devices. managed_irq
  will assign each CPU its own hardware queue and ensures that no
  unbound IO is scheduled on an isolated CPU. As long as the isolated
  CPU is not issuing any IO there will be no block layer 'noise' on the
  isolated CPU.

  - irqaffinity=0 isolcpus=managed_irq,2-3,6-7

    queue mapping for /dev/nvme0n1
      hctx0: default 0
      hctx1: default 1
      hctx2: default 2
      hctx3: default 3
      hctx4: default 4
      hctx5: default 5
      hctx6: default 6
      hctx7: default 7

    IRQ mapping for /dev/nvme0n1
      irq 40 affinity 0 effective 0 nvme0q0
      irq 41 affinity 0 effective 0 nvme0q1
      irq 42 affinity 1 effective 1 nvme0q2
      irq 43 affinity 2 effective 2 nvme0q3
      irq 44 affinity 3 effective 3 nvme0q4
      irq 45 affinity 4 effective 4 nvme0q5
      irq 46 affinity 5 effective 5 nvme0q6
      irq 47 affinity 6 effective 6 nvme0q7
      irq 48 affinity 7 effective 7 nvme0q8

  With this configuration io_queue will create four hctxs for the four
  housekeeping CPUs:

  - irqaffinity=0 isolcpus=io_queue,2-3,6-7

    queue mapping for /dev/nvme0n1
      hctx0: default 0 2
      hctx1: default 1 3
      hctx2: default 4 6
      hctx3: default 5 7

    IRQ mapping for /dev/nvme0n1
      irq 36 affinity 0 effective 0 nvme0q0
      irq 37 affinity 0 effective 0 nvme0q1
      irq 38 affinity 1 effective 1 nvme0q2
      irq 39 affinity 4 effective 4 nvme0q3
      irq 40 affinity 5 effective 5 nvme0q4

* nr cpus > nr hardware queues (e.g. 8 CPUs, 2 hardware queues)

  managed_irq creates two hctxs and all CPUs can handle IRQs. In this
  case an isolated CPU can end up handling all IRQs for a given hctx:

  - irqaffinity=0 isolcpus=managed_irq,2-3,6-7

    queue mapping for /dev/nvme0n1
      hctx0: default 0 1 2 3
      hctx1: default 4 5 6 7

    IRQ mapping for /dev/nvme0n1
      irq 40 affinity 0 effective 0 nvme0q0
      irq 41 affinity 0-3 effective 3 nvme0q1
      irq 42 affinity 4-7 effective 7 nvme0q2

  io_queue also creates two hctxs but only assigns housekeeping CPUs to
  handle the IRQs:

  - irqaffinity=0 isolcpus=io_queue,2-3,6-7

    queue mapping for /dev/nvme0n1
      hctx0: default 0 1 2 6
      hctx1: default 3 4 5 7

    IRQ mapping for /dev/nvme0n1
      irq 36 affinity 0 effective 0 nvme0q0
      irq 37 affinity 0-1 effective 1 nvme0q1
      irq 38 affinity 4-5 effective 5 nvme0q2

The case of fewer hardware queues than CPUs is more common with SCSI
HBAs, so the io_queue approach supports more than just nvme-pci.

Something completely different: we got several bug reports for kdump and
SCSI HBAs. The issue is that the SCSI drivers allocate too many resources
when running in a kdump kernel. This series fixes this as well, because
the number of queues will be limited by blk_mq_num_possible_queues()
instead of num_possible_cpus(). This avoids sprinkling is_kdump_kernel()
checks around.
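To illustrate the kdump point, here is a minimal sketch of the intended
driver-side conversion. The struct and field names ("struct my_hba",
"max_msix_vectors") are made up for illustration, and the sketch assumes
the new helper has the form blk_mq_num_possible_queues(max), returning
the hardware limit capped to the number of CPUs that can actually be
mapped to a queue:

/*
 * Illustrative only: "struct my_hba" and "max_msix_vectors" are
 * hypothetical names, not taken from any driver touched by this series.
 */
static unsigned int my_hba_nr_hw_queues(struct my_hba *hba)
{
	/*
	 * Before: open-coded cap plus a kdump special case, e.g.
	 *
	 *	if (is_kdump_kernel())
	 *		return 1;
	 *	return min_t(unsigned int, hba->max_msix_vectors,
	 *		     num_possible_cpus());
	 *
	 * After: the block layer helper already reflects the CPUs
	 * eligible for queue mapping (kdump, isolcpus=io_queue), so the
	 * driver no longer needs is_kdump_kernel().
	 */
	return blk_mq_num_possible_queues(hba->max_msix_vectors);
}

Whether a driver caps against possible or online CPUs is driver specific;
the point is only that the cap comes from a block layer helper instead of
num_possible_cpus().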
Signed-off-by: Daniel Wagner <wagi@kernel.org>
---
Changes in v6:
- added io_queue isolcpus type back
- prevent offlining a hk cpu if an isol cpu is still present, instead of
  just warning
- Link to v5: https://lore.kernel.org/r/20250110-isolcpus-io-queues-v5-0-0e4f118680b0@kernel.org

Changes in v5:
- rebased on latest for-6.14/block
- updated documentation on managed_irq
- updated commit message "blk-mq: issue warning when offlining hctx with
  online isolcpus"
- split input/output parameter in "lib/group_cpus: let group_cpu_evenly
  return number of groups"
- dropped "sched/isolation: document HK_TYPE housekeeping option"
- Link to v4: https://lore.kernel.org/r/20241217-isolcpus-io-queues-v4-0-5d355fbb1e14@kernel.org

Changes in v4:
- added "blk-mq: issue warning when offlining hctx with online isolcpus"
- fixed check in group_cpus_evenly: the if condition needs to use
  housekeeping_enabled() and not cpumask_weight(housekeeping_masks),
  because the latter will always return a valid mask.
- dropped the Fixes tag from "lib/group_cpus.c: honor housekeeping config
  when grouping CPUs"
- fixed overlong line in "scsi: use block layer helpers to calculate num
  of queues"
- dropped "sched/isolation: Add io_queue housekeeping option", just
  document the housekeeping enum hk_type
- added "lib/group_cpus: let group_cpu_evenly return number of groups"
- collected tags
- split series into a preparation series:
  https://lore.kernel.org/linux-nvme/20241202-refactor-blk-affinity-helpers-v6-0-27211e9c2cd5@kernel.org/
- Link to v3: https://lore.kernel.org/r/20240806-isolcpus-io-queues-v3-0-da0eecfeaf8b@suse.de

Changes in v3:
- lifted a couple of patches from
  https://lore.kernel.org/all/20210709081005.421340-1-ming.lei@redhat.com/
    "virtio: add APIs for retrieving vq affinity"
    "blk-mq: introduce blk_mq_dev_map_queues"
- replaced all users of blk_mq_[pci|virtio]_map_queues with
  blk_mq_dev_map_queues
- updated/extended the number of queue calculation helpers
- added isolcpus=io_queue CPU-hctx mapping function
- documented enum hk_type and isolcpus=io_queue
- added "scsi: pm8001: do not overwrite PCI queue mapping"
- Link to v2: https://lore.kernel.org/r/20240627-isolcpus-io-queues-v2-0-26a32e3c4f75@suse.de

Changes in v2:
- updated documentation
- split blk/nvme-pci patch
- dropped HK_TYPE_IO_QUEUE, use HK_TYPE_MANAGED_IRQ
- Link to v1: https://lore.kernel.org/r/20240621-isolcpus-io-queues-v1-0-8b169bf41083@suse.de

---
Daniel Wagner (9):
      lib/group_cpus: let group_cpu_evenly return number of initialized masks
      blk-mq: add number of queue calc helper
      nvme-pci: use block layer helpers to calculate num of queues
      scsi: use block layer helpers to calculate num of queues
      virtio: blk/scsi: use block layer helpers to calculate num of queues
      isolation: introduce io_queue isolcpus type
      lib/group_cpus: honor housekeeping config when grouping CPUs
      blk-mq: use hk cpus only when isolcpus=io_queue is enabled
      blk-mq: prevent offlining hk CPU with associated online isolated CPUs

 block/blk-mq-cpumap.c                     | 116 +++++++++++++++++++++++++++++-
 block/blk-mq.c                            |  46 +++++++++++-
 drivers/block/virtio_blk.c                |   5 +-
 drivers/nvme/host/pci.c                   |   5 +-
 drivers/scsi/megaraid/megaraid_sas_base.c |  15 ++--
 drivers/scsi/qla2xxx/qla_isr.c            |  10 +--
 drivers/scsi/smartpqi/smartpqi_init.c     |   5 +-
 drivers/scsi/virtio_scsi.c                |   1 +
 drivers/virtio/virtio_vdpa.c              |   9 +--
 fs/fuse/virtio_fs.c                       |   6 +-
 include/linux/blk-mq.h                    |   2 +
 include/linux/group_cpus.h                |   3 +-
 include/linux/sched/isolation.h           |   1 +
 kernel/irq/affinity.c                     |   9 +--
 kernel/sched/isolation.c                  |   7 ++
 lib/group_cpus.c                          |  90 +++++++++++++++++++++--
 16 files changed, 290 insertions(+), 40 deletions(-)
---
base-commit: 3b607b75a345b1d808031bf1bb1038e4dac8d521
change-id: 20240620-isolcpus-io-queues-1a88eb47ff8b

Best regards,