Message ID | 20230913163823.7880-1-james.morse@arm.com |
---|---|
Headers | show |
Series | ACPI/arm64: add support for virtual cpuhotplug | expand |
On Wed, Sep 13, 2023 at 04:37:49PM +0000, James Morse wrote: > diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h > index 48b9f7168bcc..7afe8cbb844e 100644 > --- a/arch/loongarch/include/asm/cpu.h > +++ b/arch/loongarch/include/asm/cpu.h > @@ -128,4 +128,11 @@ enum cpu_type_enum { > #define LOONGARCH_CPU_HYPERVISOR BIT_ULL(CPU_FEATURE_HYPERVISOR) > #define LOONGARCH_CPU_PTW BIT_ULL(CPU_FEATURE_PTW) > > +#if !defined(__ASSEMBLY__) > +#ifdef CONFIG_HOTPLUG_CPU > +int arch_register_cpu(int num); > +void arch_unregister_cpu(int cpu); > +#endif > +#endif /* ! __ASSEMBLY__ */ So, for loongarch: grep arch_.*register_cpu arch/loongarch/ -r arch/loongarch/kernel/topology.c:int arch_register_cpu(int cpu) arch/loongarch/kernel/topology.c:EXPORT_SYMBOL(arch_register_cpu); arch/loongarch/kernel/topology.c:void arch_unregister_cpu(int cpu) arch/loongarch/kernel/topology.c:EXPORT_SYMBOL(arch_unregister_cpu); So really this is a fix (since these functions should have prototypes) and thus should probably be a separate patch. However, I also wonder whether these prototypes should be added to linux/cpu.h and be done with it (rather than have every arch prototype these - it's not like the prototype can be different from this because of the generic code. I know in subsequent patches you do that, but it's rather piecemeal, and I think this is a change that could be submitted now as both a fix and clean up.
On Wed, Sep 13, 2023 at 04:37:53PM +0000, James Morse wrote: > loongarch, mips, parisc, riscv and sh all print a warning if > register_cpu() returns an error. Architectures that use > GENERIC_CPU_DEVICES call panic() instead. > > Errors in this path indicate something is wrong with the firmware > description of the platform, but the kernel is able to keep running. > > Downgrade this to a warning to make it easier to debug this issue. > > This will allow architectures that switching over to GENERIC_CPU_DEVICES > to drop their warning, but keep the existing behaviour. > > Signed-off-by: James Morse <james.morse@arm.com> Assuming other architectures do similar to x86 (which only return the error code from register_cpu()), the only error that would occur here is if device_register() fails, which would be catastophic, and I suspect the system would fail to boot anyway. Downgrading the panic to a warning at least gives us a chance that the system may come up sufficiently to examine what happened, so I think this makes sense: Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
On Wed, Sep 13, 2023 at 04:37:57PM +0000, James Morse wrote: > Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be > overridden by the arch code, switch over to this to allow common code > to choose when the register_cpu() call is made. > > This allows topology_init() to be removed. > > This is an intermediate step to the logic being moved to drivers/acpi, > where GENERIC_CPU_DEVICES will do the work when booting with acpi=off. > > Signed-off-by: James Morse <james.morse@arm.com> Same comment as x86 (moving the point at which cpus are registered ought to be mentioned in the commit message.)
On Wed, 13 Sep 2023 16:37:52 +0000 James Morse <james.morse@arm.com> wrote: > NUMA systems require the node descriptions to be ready before CPUs are > registered. This is so that the node symlinks can be created in sysfs. > > Currently no NUMA platform uses GENERIC_CPU_DEVICES, meaning that CPUs > are registered by arch code, instead of cpu_dev_init(). Worth saying why this matters I think. I wrote a nice note on that being a possible problem path as node_dev_init() uses the results of cpu_dev_init() if CONFIG_GENERIC_CPU_DEVICES before seeing this comment and realizing you had it covered (sort of anyway). > > Move cpu_dev_init() after node_dev_init() so that NUMA architectures > can use GENERIC_CPU_DEVICES. > > Signed-off-by: James Morse <james.morse@arm.com> > --- > drivers/base/init.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/base/init.c b/drivers/base/init.c > index 397eb9880cec..c4954835128c 100644 > --- a/drivers/base/init.c > +++ b/drivers/base/init.c > @@ -35,8 +35,8 @@ void __init driver_init(void) > of_core_init(); > platform_bus_init(); > auxiliary_bus_init(); > - cpu_dev_init(); > memory_dev_init(); > node_dev_init(); > + cpu_dev_init(); > container_dev_init(); > }
On Wed, 13 Sep 2023 16:37:57 +0000 James Morse <james.morse@arm.com> wrote: > Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be > overridden by the arch code, switch over to this to allow common code > to choose when the register_cpu() call is made. > > This allows topology_init() to be removed. > > This is an intermediate step to the logic being moved to drivers/acpi, > where GENERIC_CPU_DEVICES will do the work when booting with acpi=off. > > Signed-off-by: James Morse <james.morse@arm.com> > --- > arch/loongarch/Kconfig | 1 + > arch/loongarch/kernel/topology.c | 29 ++--------------------------- > 2 files changed, 3 insertions(+), 27 deletions(-) > > diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig > index 2bddd202470e..5bed51adc68c 100644 > --- a/arch/loongarch/Kconfig > +++ b/arch/loongarch/Kconfig > @@ -72,6 +72,7 @@ config LOONGARCH > select GENERIC_CLOCKEVENTS > select GENERIC_CMOS_UPDATE > select GENERIC_CPU_AUTOPROBE > + select GENERIC_CPU_DEVICES > select GENERIC_ENTRY > select GENERIC_GETTIMEOFDAY > select GENERIC_IOREMAP if !ARCH_IOREMAP > diff --git a/arch/loongarch/kernel/topology.c b/arch/loongarch/kernel/topology.c > index caa7cd859078..8e4441c1ff39 100644 > --- a/arch/loongarch/kernel/topology.c > +++ b/arch/loongarch/kernel/topology.c > @@ -7,20 +7,13 @@ > #include <linux/percpu.h> > #include <asm/bootinfo.h> > > -static DEFINE_PER_CPU(struct cpu, cpu_devices); > - > #ifdef CONFIG_HOTPLUG_CPU > int arch_register_cpu(int cpu) > { > - int ret; > struct cpu *c = &per_cpu(cpu_devices, cpu); > > - c->hotpluggable = 1; This is a bit subtle. Can loongarch hotplug a CPU that is also io_master(cpu)? I have no idea if there is a subtle difference between. 1) CPUs present at boot where if they are an io_master they are not allowed to be hot removed. 2) CPUs that turn up (hotplugged) later which are an io_master and by original code can be removed. My guess is that no io_master CPU can be hotplugged in making this irrelevant and your code correct as the =1 is just a micro optimizatoin. If we can confirm that, a one line addition to the patch description would be great. Otherwise LGTM > - ret = register_cpu(c, cpu); > - if (ret < 0) > - pr_warn("register_cpu %d failed (%d)\n", cpu, ret); > - > - return ret; > + c->hotpluggable = !io_master(cpu); > + return register_cpu(c, cpu); > } > EXPORT_SYMBOL(arch_register_cpu); > > @@ -33,21 +26,3 @@ void arch_unregister_cpu(int cpu) > } > EXPORT_SYMBOL(arch_unregister_cpu); > #endif > - > -static int __init topology_init(void) > -{ > - int i, ret; > - > - for_each_present_cpu(i) { > - struct cpu *c = &per_cpu(cpu_devices, i); > - > - c->hotpluggable = !io_master(i); > - ret = register_cpu(c, i); > - if (ret < 0) > - pr_warn("topology_init: register_cpu %d failed (%d)\n", i, ret); > - } > - > - return 0; > -} > - > -subsys_initcall(topology_init);
On Wed, 13 Sep 2023 16:38:03 +0000 James Morse <james.morse@arm.com> wrote: > ACPI has two ways of describing processors in the DSDT. Either as a device > object with HID ACPI0007, or as a type 'C' package inside a Processor > Container. The ACPI processor driver probes CPUs described as devices, but > not those described as packages. > Specification reference needed... Terminology wise, I'd just refer to Processor() objects as I think they are named objects rather than data terms like a package (Which include a PkgLength etc) > Duplicate descriptions are not allowed, the ACPI processor driver already > parses the UID from both devices and containers. acpi_processor_get_info() > returns an error if the UID exists twice in the DSDT. > > The missing probe for CPUs described as packages creates a problem for > moving the cpu_register() calls into the acpi_processor driver, as CPUs > described like this don't get registered, leading to errors from other > subsystems when they try to add new sysfs entries to the CPU node. > (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp) > > To fix this, parse the processor container and call acpi_processor_add() > for each processor that is discovered like this. The processor container > handler is added with acpi_scan_add_handler(), so no detach call will > arrive. > > Qemu TCG describes CPUs using packages in a processor container. processor terms in a processor container. > > Signed-off-by: James Morse <james.morse@arm.com> Otherwise looks fine to me. Jonathan > --- > drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > index c0839bcf78c1..b4bde78121bb 100644 > --- a/drivers/acpi/acpi_processor.c > +++ b/drivers/acpi/acpi_processor.c > @@ -625,9 +625,31 @@ static struct acpi_scan_handler processor_handler = { > }, > }; > > +static acpi_status acpi_processor_container_walk(acpi_handle handle, > + u32 lvl, > + void *context, > + void **rv) > +{ > + struct acpi_device *adev; > + acpi_status status; > + > + adev = acpi_get_acpi_dev(handle); > + if (!adev) > + return AE_ERROR; > + > + status = acpi_processor_add(adev, &processor_device_ids[0]); > + acpi_put_acpi_dev(adev); > + > + return status; > +} > + > static int acpi_processor_container_attach(struct acpi_device *dev, > const struct acpi_device_id *id) > { > + acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle, > + ACPI_UINT32_MAX, acpi_processor_container_walk, > + NULL, NULL, NULL); > + > return 1; > } >
On Wed, 13 Sep 2023 16:38:07 +0000 James Morse <james.morse@arm.com> wrote: > A subsequent patch will change acpi_scan_hot_remove() to call > acpi_bus_trim_one() instead of acpi_bus_trim(), meaning it can no longer > rely on the prototype in the header file. > > Move these functions further up the file. > No change in behaviour. > > Signed-off-by: James Morse <james.morse@arm.com> FWIW Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/acpi/scan.c | 76 ++++++++++++++++++++++----------------------- > 1 file changed, 38 insertions(+), 38 deletions(-) > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index f898591ce05f..a675333618ae 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -244,6 +244,44 @@ static int acpi_scan_try_to_offline(struct acpi_device *device) > return 0; > } > > +static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used) > +{ > + struct acpi_scan_handler *handler = adev->handler; > + > + acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL); > + > + adev->flags.match_driver = false; > + if (handler) { > + if (handler->detach) > + handler->detach(adev); > + > + adev->handler = NULL; > + } else { > + device_release_driver(&adev->dev); > + } > + /* > + * Most likely, the device is going away, so put it into D3cold before > + * that. > + */ > + acpi_device_set_power(adev, ACPI_STATE_D3_COLD); > + adev->flags.initialized = false; > + acpi_device_clear_enumerated(adev); > + > + return 0; > +} > + > +/** > + * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects. > + * @adev: Root of the ACPI namespace scope to walk. > + * > + * Must be called under acpi_scan_lock. > + */ > +void acpi_bus_trim(struct acpi_device *adev) > +{ > + acpi_bus_trim_one(adev, NULL); > +} > +EXPORT_SYMBOL_GPL(acpi_bus_trim); > + > static int acpi_scan_hot_remove(struct acpi_device *device) > { > acpi_handle handle = device->handle; > @@ -2506,44 +2544,6 @@ int acpi_bus_scan(acpi_handle handle) > } > EXPORT_SYMBOL(acpi_bus_scan); > > -static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used) > -{ > - struct acpi_scan_handler *handler = adev->handler; > - > - acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL); > - > - adev->flags.match_driver = false; > - if (handler) { > - if (handler->detach) > - handler->detach(adev); > - > - adev->handler = NULL; > - } else { > - device_release_driver(&adev->dev); > - } > - /* > - * Most likely, the device is going away, so put it into D3cold before > - * that. > - */ > - acpi_device_set_power(adev, ACPI_STATE_D3_COLD); > - adev->flags.initialized = false; > - acpi_device_clear_enumerated(adev); > - > - return 0; > -} > - > -/** > - * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects. > - * @adev: Root of the ACPI namespace scope to walk. > - * > - * Must be called under acpi_scan_lock. > - */ > -void acpi_bus_trim(struct acpi_device *adev) > -{ > - acpi_bus_trim_one(adev, NULL); > -} > -EXPORT_SYMBOL_GPL(acpi_bus_trim); > - > int acpi_bus_register_early_device(int type) > { > struct acpi_device *device = NULL;
On Wed, 13 Sep 2023 16:38:09 +0000 James Morse <james.morse@arm.com> wrote: > struct acpi_scan_handler has a detach callback that is used to remove > a driver when a bus is changed. When interacting with an eject-request, > the detach callback is called before _EJ0. > > This means the ACPI processor driver can't use _STA to determine if a > CPU has been made not-present, or some of the other _STA bits have been > changed. acpi_processor_remove() needs to know the value of _STA after > _EJ0 has been called. Why hasn't it been a problem before? > > Add a post_eject callback to struct acpi_scan_handler. This is called > after acpi_scan_hot_remove() has successfully called _EJ0. Because > acpi_bus_trim_one() also clears the handler pointer, it needs to be > told if the caller will go on to call acpi_bus_post_eject(), so > that acpi_device_clear_enumerated() and clearing the handler pointer > can be deferred. The existing not-used pointer is used for this. > > Signed-off-by: James Morse <james.morse@arm.com> I briefly wondered if an alternative model where you always call the post walk was cleaner as the handler clear etc would always be in same place. However, couldn't make it work that nicely because you still need to indicate that it's an eject post handler or not which just moves the messy code. As such this LGTM Reviewed-by: Joanthan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/acpi/acpi_processor.c | 4 +-- > drivers/acpi/scan.c | 52 ++++++++++++++++++++++++++++++----- > include/acpi/acpi_bus.h | 1 + > 3 files changed, 48 insertions(+), 9 deletions(-) > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c > index 22a15a614f95..00dcc23d49a8 100644 > --- a/drivers/acpi/acpi_processor.c > +++ b/drivers/acpi/acpi_processor.c > @@ -459,7 +459,7 @@ static int acpi_processor_add(struct acpi_device *device, > > #ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU > /* Removal */ > -static void acpi_processor_remove(struct acpi_device *device) > +static void acpi_processor_post_eject(struct acpi_device *device) > { > struct acpi_processor *pr; > > @@ -627,7 +627,7 @@ static struct acpi_scan_handler processor_handler = { > .ids = processor_device_ids, > .attach = acpi_processor_add, > #ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU > - .detach = acpi_processor_remove, > + .post_eject = acpi_processor_post_eject, > #endif > .hotplug = { > .enabled = true, > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index a675333618ae..b6d2f01640a9 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -244,18 +244,28 @@ static int acpi_scan_try_to_offline(struct acpi_device *device) > return 0; > } > > -static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used) > +/** > + * acpi_bus_trim_one() - Detach scan handlers and drivers from ACPI device > + * objects. > + * @adev: Root of the ACPI namespace scope to walk. > + * @eject: Pointer to a bool that indicates if this was due to an > + * eject-request. > + * > + * Must be called under acpi_scan_lock. > + * If @eject points to true, clearing the device enumeration is deferred until > + * acpi_bus_post_eject() is called. > + */ > +static int acpi_bus_trim_one(struct acpi_device *adev, void *eject) > { > struct acpi_scan_handler *handler = adev->handler; > + bool is_eject = *(bool *)eject; > > - acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL); > + acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, eject); > > adev->flags.match_driver = false; > if (handler) { > if (handler->detach) > handler->detach(adev); > - > - adev->handler = NULL; > } else { > device_release_driver(&adev->dev); > } > @@ -265,7 +275,12 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used) > */ > acpi_device_set_power(adev, ACPI_STATE_D3_COLD); > adev->flags.initialized = false; > - acpi_device_clear_enumerated(adev); > + > + /* For eject this is deferred to acpi_bus_post_eject() */ > + if (!is_eject) { > + adev->handler = NULL; > + acpi_device_clear_enumerated(adev); > + } > > return 0; > } > @@ -278,15 +293,36 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used) > */ > void acpi_bus_trim(struct acpi_device *adev) > { > - acpi_bus_trim_one(adev, NULL); > + bool eject = false; > + > + acpi_bus_trim_one(adev, &eject); > } > EXPORT_SYMBOL_GPL(acpi_bus_trim); > > +static int acpi_bus_post_eject(struct acpi_device *adev, void *not_used) > +{ > + struct acpi_scan_handler *handler = adev->handler; > + > + acpi_dev_for_each_child_reverse(adev, acpi_bus_post_eject, NULL); > + > + if (handler) { > + if (handler->post_eject) > + handler->post_eject(adev); > + > + adev->handler = NULL; > + } > + > + acpi_device_clear_enumerated(adev); > + > + return 0; > +} > + > static int acpi_scan_hot_remove(struct acpi_device *device) > { > acpi_handle handle = device->handle; > unsigned long long sta; > acpi_status status; > + bool eject = true; > > if (device->handler && device->handler->hotplug.demand_offline) { > if (!acpi_scan_is_offline(device, true)) > @@ -299,7 +335,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device) > > acpi_handle_debug(handle, "Ejecting\n"); > > - acpi_bus_trim(device); > + acpi_bus_trim_one(device, &eject); > > acpi_evaluate_lck(handle, 0); > /* > @@ -322,6 +358,8 @@ static int acpi_scan_hot_remove(struct acpi_device *device) > } else if (sta & ACPI_STA_DEVICE_ENABLED) { > acpi_handle_warn(handle, > "Eject incomplete - status 0x%llx\n", sta); > + } else { > + acpi_bus_post_eject(device, NULL); > } > > return 0; > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h > index 254685085c82..1b7e1acf925b 100644 > --- a/include/acpi/acpi_bus.h > +++ b/include/acpi/acpi_bus.h > @@ -127,6 +127,7 @@ struct acpi_scan_handler { > bool (*match)(const char *idstr, const struct acpi_device_id **matchid); > int (*attach)(struct acpi_device *dev, const struct acpi_device_id *id); > void (*detach)(struct acpi_device *dev); > + void (*post_eject)(struct acpi_device *dev); > void (*bind)(struct device *phys_dev); > void (*unbind)(struct device *phys_dev); > struct acpi_hotplug_profile hotplug;
On Wed, Sep 13, 2023 at 04:37:48PM +0000, James Morse wrote: > This series is based on v6.6-rc1, and can be retrieved from: > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2 Hi James, FYI, this doesn't seem to be based upon v6.6-rc1, but v6.4-rc5. virtual_cpu_hotplug/rfc/v2 seems to have a hash of 505859b05e15. Thanks.
> From: James Morse <james.morse@arm.com> > Sent: Wednesday, September 13, 2023 5:38 PM [...] > > Hello! > > Changes since RFC-v1: > * riscv is new, ia64 is gone > * The KVM support is different, and upstream - no need to patch the host. > > --- > > This series adds what looks like cpuhotplug support to arm64 for use in > virtual machines. It does this by moving the cpu_register() calls for > architectures that support ACPI out of the arch code by using > GENERIC_CPU_DEVICES, then into the ACPI processor driver. > > The kubernetes folk really want to be able to add CPUs to an existing VM, > in exactly the same way they do on x86. The use-case is pre-booting guests > with one CPU, then adding the number that were actually needed when the > workload is provisioned. > [...] > > I had a go at switching the remaining architectures over to > GENERIC_CPU_DEVICES, > so that the Kconfig symbol can be removed, but I got stuck with powerpc > and s390. > > I've only build tested Loongarch and riscv. I've removed the ia64 specific > patches, but left the changes in other patches to make git-grep review of > renames easier. > > If folk want to play along at home, you'll need a copy of Qemu that > supports this. > https://github.com/salil-mehta/qemu.git salil/virt-cpuhp-armv8/rfc-v2-rc6 Please use the latest pushed RFC V2 instead: https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/T/#m523b37819c4811c7827333982004e07a1ef03879 Repository: https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2 Thanks Salil. [...] > Why is this still an RFC? I'm still looking for confirmation from the > kubernetes/kata folk that this works for them. Because of this I've culled > the CC list... > > > This series is based on v6.6-rc1, and can be retrieved from: > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/virtual_cpu_hotplug/rfc/v2