[RFC,v2,00/35] ACPI/arm64: add support for virtual cpuhotplug

Message ID 20230913163823.7880-1-james.morse@arm.com
Series ACPI/arm64: add support for virtual cpuhotplug

Message

James Morse Sept. 13, 2023, 4:37 p.m. UTC
Hello!

Changes since RFC-v1:
 * riscv is new, ia64 is gone
 * The KVM support is different, and upstream - no need to patch the host.

---

This series adds what looks like cpuhotplug support to arm64 for use in
virtual machines. It does this by moving the register_cpu() calls for
architectures that support ACPI out of the arch code by using
GENERIC_CPU_DEVICES, then into the ACPI processor driver.

The kubernetes folk really want to be able to add CPUs to an existing VM,
in exactly the same way they do on x86. The use-case is pre-booting guests
with one CPU, then adding the number that were actually needed when the
workload is provisioned.

Wait? Doesn't arm64 support cpuhotplug already!?
In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
has the additional step of physically removing the CPU, so that it isn't
present anymore.

Arm64 doesn't support this, and can't support it: CPUs are really a slice
of the SoC, and there is not enough information in the existing ACPI tables
to describe which bits of the slice also got removed. Without a reference
machine, adding this support to the spec is a wild goose chase.

Critically: everything described in the firmware tables must remain present.

For a virtual machine this is easy as all the other bits of 'virtual SoC'
are emulated, so they can (and do) remain present when a vCPU is 'removed'.

On a system that supports cpuhotplug the MADT has to describe every possible
CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
the guest is started.
With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
policy about which CPUs can be brought online.

This series adds support for virtual-cpuhotplug as exactly that: firmware
policy. This may even work on a physical machine too; for a guest the part of
firmware is played by the VMM (typically Qemu).

PSCI support is modified to return 'DENIED' if the CPU can't be brought
online/enabled yet. The CPU object's _STA method's enabled bit is used to
indicate firmware's current disposition. If the CPU has its enabled bit clear,
it will not be registered with sysfs, and attempts to bring it online will
fail. The notifications that _STA has changed its value then work in the same
way as physical hotplug, and firmware can cause the CPU to be registered some
time later, allowing it to be brought online.
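
As a rough illustration of the policy described above, here is a minimal
user-space sketch (not kernel code). The struct and function names are
hypothetical; only the _STA bit positions (present = bit 0, enabled = bit 1)
and the PSCI return values (SUCCESS = 0, DENIED = -3) come from the
respective specifications. It assumes the present bit stays set throughout
and glosses over the offline step that precedes a real removal.

```c
#include <assert.h>
#include <stdbool.h>

/* _STA bit positions as defined by the ACPI specification. */
#define STA_PRESENT (1u << 0)
#define STA_ENABLED (1u << 1)

/* PSCI return codes as defined by the PSCI specification. */
#define PSCI_SUCCESS 0
#define PSCI_DENIED  (-3)

/* Hypothetical model of one vCPU; not a kernel structure. */
struct vcpu_model {
	unsigned int sta;	/* value firmware would return from _STA */
	bool registered;	/* guest has created the sysfs device */
	bool online;
};

/* Firmware/VMM side: refuse CPU_ON until policy has enabled the vCPU. */
static int fw_cpu_on(const struct vcpu_model *v)
{
	return (v->sta & STA_ENABLED) ? PSCI_SUCCESS : PSCI_DENIED;
}

/* Guest side: react to a notification that _STA changed its value. */
static void guest_sta_changed(struct vcpu_model *v)
{
	/* In this model the vCPU is always present; only 'enabled' moves. */
	assert(v->sta & STA_PRESENT);

	if (v->sta & STA_ENABLED) {
		v->registered = true;
	} else {
		v->online = false;
		v->registered = false;
	}
}

/* Guest side: try to bring the CPU online. Returns 0 or a negative value. */
static int guest_cpu_up(struct vcpu_model *v)
{
	int ret;

	if (!v->registered)
		return -1;	/* no sysfs device, so nothing to write to */

	ret = fw_cpu_on(v);
	if (ret == PSCI_SUCCESS)
		v->online = true;

	return ret;
}
```

In this model a vCPU that starts with only STA_PRESENT set cannot be brought
online; once STA_ENABLED is set and guest_sta_changed() has run,
guest_cpu_up() succeeds.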

This creates something that looks like cpuhotplug to user-space, as the sysfs
files appear and disappear, and the udev notifications look the same.

One notable difference is the CPU present mask, which is exposed via sysfs.
Because the CPUs remain present throughout, they can still be seen in that mask.
This value does get used by web browsers to estimate the number of CPUs
as the CPU online mask is constantly changed on mobile phones.
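
To make the present/online distinction concrete, here is a small sketch of
how user-space might count CPUs from the "cpulist" strings found in
/sys/devices/system/cpu/present and /sys/devices/system/cpu/online. The
helper name is hypothetical, and the example strings in the note below are
an assumption about what a guest booted with cpus=1,maxcpus=3 would show.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named by a kernel cpulist string such as "0-2,4".
 * A trailing newline (as read from sysfs) ends the list.
 * Returns -1 if the string is malformed.
 */
static int cpulist_count(const char *s)
{
	int count = 0;

	assert(s != NULL);

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1;	/* no digits where a CPU number belongs */

		if (*end == '-') {
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}

		count += (int)(hi - lo + 1);
		s = end;
		if (*s == ',')
			s++;
	}

	return count;
}
```

With this series, a tool that counts the present list (for example "0-2")
would report three CPUs even while the online list (for example "0") names
only one.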

Linux is tolerant of PSCI returning errors, as it's always been allowed to do
that. To avoid confusing OSes that can't tolerate this, we needed an additional
bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
has a different bit position in the GICC.
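
A sketch of how an OS might classify a GICC entry using the two flags. The
names below are hypothetical, and the bit positions are an assumption taken
from later revisions of the ACPI specification (Enabled in bit 0, Online
Capable in bit 3), not from this series. Treating "both set" as a firmware
bug is likewise my reading of the spec wording, not something this cover
letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GICC_FLAG_ENABLED        (UINT32_C(1) << 0)
#define GICC_FLAG_ONLINE_CAPABLE (UINT32_C(1) << 3)	/* assumed position */

/* Hypothetical classification of a MADT GICC entry. */
enum gicc_state {
	GICC_UNUSABLE,		/* neither flag set: ignore this CPU */
	GICC_BOOT_ENABLED,	/* usable straight away */
	GICC_ENABLE_LATER,	/* may be enabled by firmware after boot */
	GICC_FIRMWARE_BUG,	/* both flags set: contradictory description */
};

static enum gicc_state gicc_classify(uint32_t flags)
{
	bool enabled = flags & GICC_FLAG_ENABLED;
	bool capable = flags & GICC_FLAG_ONLINE_CAPABLE;

	if (enabled && capable)
		return GICC_FIRMWARE_BUG;
	if (enabled)
		return GICC_BOOT_ENABLED;
	if (capable)
		return GICC_ENABLE_LATER;

	return GICC_UNUSABLE;
}
```

An OS that does not know about the new bit sees only "not enabled" and
ignores the CPU, which is what keeps it from being confused by PSCI errors.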

This code is unconditionally enabled for all ACPI architectures.
If there are problems with firmware tables on some devices, the CPUs will
already be online by the time the acpi_processor_make_enabled() is called.
A mismatch here causes a firmware-bug message and kernel taint. This should
only affect people with broken firmware who also boot with maxcpus=1, and
bring CPUs online later.

I had a go at switching the remaining architectures over to GENERIC_CPU_DEVICES,
so that the Kconfig symbol can be removed, but I got stuck with powerpc
and s390.

I've only build tested Loongarch and riscv. I've removed the ia64 specific
patches, but left the changes in other patches to make git-grep review of
renames easier.

If folk want to play along at home, you'll need a copy of Qemu that supports this.
https://github.com/salil-mehta/qemu.git salil/virt-cpuhp-armv8/rfc-v2-rc6

Replace your '-smp' argument with something like:
| -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1

then feed the following to the Qemu monitor:
| (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
| (qemu) device_del cpu1


Why is this still an RFC? I'm still looking for confirmation from the
kubernetes/kata folk that this works for them. Because of this I've culled
the CC list...


This series is based on v6.6-rc1, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2


Thanks,

James Morse (34):
  ACPI: Move ACPI_HOTPLUG_CPU to be disabled on arm64 and riscv
  drivers: base: Use present CPUs in GENERIC_CPU_DEVICES
  drivers: base: Allow parts of GENERIC_CPU_DEVICES to be overridden
  drivers: base: Move cpu_dev_init() after node_dev_init()
  drivers: base: Print a warning instead of panic() when register_cpu()
    fails
  arm64: setup: Switch over to GENERIC_CPU_DEVICES using
    arch_register_cpu()
  x86: intel_epb: Don't rely on link order
  x86/topology: Switch over to GENERIC_CPU_DEVICES
  LoongArch: Switch over to GENERIC_CPU_DEVICES
  riscv: Switch over to GENERIC_CPU_DEVICES
  arch_topology: Make register_cpu_capacity_sysctl() tolerant to late
    CPUs
  ACPI: Use the acpi_device_is_present() helper in more places
  ACPI: Rename acpi_scan_device_not_present() to be about enumeration
  ACPI: Only enumerate enabled (or functional) devices
  ACPI: processor: Add support for processors described as container
    packages
  ACPI: processor: Register CPUs that are online, but not described in
    the DSDT
  ACPI: processor: Register all CPUs from acpi_processor_get_info()
  ACPI: Rename ACPI_HOTPLUG_CPU to include 'present'
  ACPI: Move acpi_bus_trim_one() before acpi_scan_hot_remove()
  ACPI: Rename acpi_processor_hotadd_init and remove pre-processor
    guards
  ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
  ACPI: Check _STA present bit before making CPUs not present
  ACPI: Warn when the present bit changes but the feature is not enabled
  drivers: base: Implement weak arch_unregister_cpu()
  LoongArch: Use the __weak version of arch_unregister_cpu()
  arm64: acpi: Move get_cpu_for_acpi_id() to a header
  ACPICA: Add new MADT GICC flags fields [code first?]
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a
    helper
  irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
  irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
    CPUs
  ACPI: add support to register CPUs based on the _STA enabled bit
  arm64: document virtual CPU hotplug's expectations
  ACPI: Add _OSC bits to advertise OS support for toggling CPU
    present/enabled
  cpumask: Add enabled cpumask for present CPUs that can be brought
    online

Jean-Philippe Brucker (1):
  arm64: psci: Ignore DENIED CPUs

 Documentation/arch/arm64/cpu-hotplug.rst   |  79 ++++++++++
 Documentation/arch/arm64/index.rst         |   1 +
 arch/arm64/Kconfig                         |   1 +
 arch/arm64/include/asm/acpi.h              |  11 ++
 arch/arm64/include/asm/cpu.h               |   1 -
 arch/arm64/kernel/acpi_numa.c              |  11 --
 arch/arm64/kernel/psci.c                   |   2 +-
 arch/arm64/kernel/setup.c                  |  13 +-
 arch/arm64/kernel/smp.c                    |   5 +-
 arch/ia64/Kconfig                          |   2 +
 arch/ia64/include/asm/acpi.h               |   2 +-
 arch/ia64/include/asm/cpu.h                |   5 -
 arch/ia64/kernel/acpi.c                    |   6 +-
 arch/ia64/kernel/setup.c                   |   2 +-
 arch/ia64/kernel/topology.c                |   2 +-
 arch/loongarch/Kconfig                     |   2 +
 arch/loongarch/configs/loongson3_defconfig |   2 +-
 arch/loongarch/kernel/acpi.c               |   4 +-
 arch/loongarch/kernel/topology.c           |  38 +----
 arch/riscv/Kconfig                         |   1 +
 arch/riscv/kernel/setup.c                  |  19 +--
 arch/x86/Kconfig                           |   3 +
 arch/x86/include/asm/cpu.h                 |   6 -
 arch/x86/kernel/acpi/boot.c                |   4 +-
 arch/x86/kernel/cpu/intel_epb.c            |   2 +-
 arch/x86/kernel/topology.c                 |  25 +---
 drivers/acpi/Kconfig                       |  14 +-
 drivers/acpi/acpi_processor.c              | 160 ++++++++++++++++-----
 drivers/acpi/bus.c                         |  16 +++
 drivers/acpi/device_pm.c                   |   2 +-
 drivers/acpi/device_sysfs.c                |   2 +-
 drivers/acpi/internal.h                    |   1 -
 drivers/acpi/processor_core.c              |   2 +-
 drivers/acpi/property.c                    |   2 +-
 drivers/acpi/scan.c                        | 147 ++++++++++++-------
 drivers/base/arch_topology.c               |  38 +++--
 drivers/base/cpu.c                         |  40 ++++--
 drivers/base/init.c                        |   2 +-
 drivers/firmware/psci/psci.c               |   2 +
 drivers/irqchip/irq-gic-v3.c               |  38 ++---
 include/acpi/acpi_bus.h                    |   1 +
 include/acpi/actbl2.h                      |   1 +
 include/acpi/processor.h                   |   2 +-
 include/linux/acpi.h                       |  14 +-
 include/linux/cpu.h                        |   6 +
 include/linux/cpumask.h                    |  25 ++++
 kernel/cpu.c                               |   3 +
 47 files changed, 516 insertions(+), 251 deletions(-)
 create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

Comments

Russell King (Oracle) Sept. 14, 2023, 8:54 a.m. UTC | #1
On Wed, Sep 13, 2023 at 04:37:49PM +0000, James Morse wrote:
> diff --git a/arch/loongarch/include/asm/cpu.h b/arch/loongarch/include/asm/cpu.h
> index 48b9f7168bcc..7afe8cbb844e 100644
> --- a/arch/loongarch/include/asm/cpu.h
> +++ b/arch/loongarch/include/asm/cpu.h
> @@ -128,4 +128,11 @@ enum cpu_type_enum {
>  #define LOONGARCH_CPU_HYPERVISOR	BIT_ULL(CPU_FEATURE_HYPERVISOR)
>  #define LOONGARCH_CPU_PTW		BIT_ULL(CPU_FEATURE_PTW)
>  
> +#if !defined(__ASSEMBLY__)
> +#ifdef CONFIG_HOTPLUG_CPU
> +int arch_register_cpu(int num);
> +void arch_unregister_cpu(int cpu);
> +#endif
> +#endif /* ! __ASSEMBLY__ */

So, for loongarch:

grep arch_.*register_cpu arch/loongarch/ -r
arch/loongarch/kernel/topology.c:int arch_register_cpu(int cpu)
arch/loongarch/kernel/topology.c:EXPORT_SYMBOL(arch_register_cpu);
arch/loongarch/kernel/topology.c:void arch_unregister_cpu(int cpu)
arch/loongarch/kernel/topology.c:EXPORT_SYMBOL(arch_unregister_cpu);

So really this is a fix (since these functions should have prototypes)
and thus should probably be a separate patch.

However, I also wonder whether these prototypes should be added to
linux/cpu.h and be done with it (rather than have every arch prototype
these - it's not like the prototype can be different from this because
of the generic code).

I know in subsequent patches you do that, but it's rather piecemeal,
and I think this is a change that could be submitted now as both a
fix and clean up.
Russell King (Oracle) Sept. 14, 2023, 9:52 a.m. UTC | #2
On Wed, Sep 13, 2023 at 04:37:53PM +0000, James Morse wrote:
> loongarch, mips, parisc, riscv and sh all print a warning if
> register_cpu() returns an error. Architectures that use
> GENERIC_CPU_DEVICES call panic() instead.
> 
> Errors in this path indicate something is wrong with the firmware
> description of the platform, but the kernel is able to keep running.
> 
> Downgrade this to a warning to make it easier to debug this issue.
> 
> This will allow architectures that are switching over to GENERIC_CPU_DEVICES
> to drop their warning, but keep the existing behaviour.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Assuming other architectures do similar to x86 (which only return the
error code from register_cpu()), the only error that would occur here
is if device_register() fails, which would be catastrophic, and I
suspect the system would fail to boot anyway.

Downgrading the panic to a warning at least gives us a chance that
the system may come up sufficiently to examine what happened, so I
think this makes sense:

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Russell King (Oracle) Sept. 14, 2023, 10:03 a.m. UTC | #3
On Wed, Sep 13, 2023 at 04:37:57PM +0000, James Morse wrote:
> Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be
> overridden by the arch code, switch over to this to allow common code
> to choose when the register_cpu() call is made.
> 
> This allows topology_init() to be removed.
> 
> This is an intermediate step to the logic being moved to drivers/acpi,
> where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Same comment as x86 (moving the point at which cpus are registered
ought to be mentioned in the commit message.)
Jonathan Cameron Sept. 14, 2023, 11:16 a.m. UTC | #4
On Wed, 13 Sep 2023 16:37:52 +0000
James Morse <james.morse@arm.com> wrote:

> NUMA systems require the node descriptions to be ready before CPUs are
> registered. This is so that the node symlinks can be created in sysfs.
> 
> Currently no NUMA platform uses GENERIC_CPU_DEVICES, meaning that CPUs
> are registered by arch code, instead of cpu_dev_init().

Worth saying why this matters I think.  I wrote a nice note on that being a possible
problem path as node_dev_init() uses the results of cpu_dev_init() if
CONFIG_GENERIC_CPU_DEVICES before seeing this comment and realizing you
had it covered (sort of anyway).

> 
> Move cpu_dev_init() after node_dev_init() so that NUMA architectures
> can use GENERIC_CPU_DEVICES.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/base/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/init.c b/drivers/base/init.c
> index 397eb9880cec..c4954835128c 100644
> --- a/drivers/base/init.c
> +++ b/drivers/base/init.c
> @@ -35,8 +35,8 @@ void __init driver_init(void)
>  	of_core_init();
>  	platform_bus_init();
>  	auxiliary_bus_init();
> -	cpu_dev_init();
>  	memory_dev_init();
>  	node_dev_init();
> +	cpu_dev_init();
>  	container_dev_init();
>  }
Jonathan Cameron Sept. 14, 2023, 11:47 a.m. UTC | #5
On Wed, 13 Sep 2023 16:37:57 +0000
James Morse <james.morse@arm.com> wrote:

> Now that GENERIC_CPU_DEVICES calls arch_register_cpu(), which can be
> overridden by the arch code, switch over to this to allow common code
> to choose when the register_cpu() call is made.
> 
> This allows topology_init() to be removed.
> 
> This is an intermediate step to the logic being moved to drivers/acpi,
> where GENERIC_CPU_DEVICES will do the work when booting with acpi=off.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/loongarch/Kconfig           |  1 +
>  arch/loongarch/kernel/topology.c | 29 ++---------------------------
>  2 files changed, 3 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
> index 2bddd202470e..5bed51adc68c 100644
> --- a/arch/loongarch/Kconfig
> +++ b/arch/loongarch/Kconfig
> @@ -72,6 +72,7 @@ config LOONGARCH
>  	select GENERIC_CLOCKEVENTS
>  	select GENERIC_CMOS_UPDATE
>  	select GENERIC_CPU_AUTOPROBE
> +	select GENERIC_CPU_DEVICES
>  	select GENERIC_ENTRY
>  	select GENERIC_GETTIMEOFDAY
>  	select GENERIC_IOREMAP if !ARCH_IOREMAP
> diff --git a/arch/loongarch/kernel/topology.c b/arch/loongarch/kernel/topology.c
> index caa7cd859078..8e4441c1ff39 100644
> --- a/arch/loongarch/kernel/topology.c
> +++ b/arch/loongarch/kernel/topology.c
> @@ -7,20 +7,13 @@
>  #include <linux/percpu.h>
>  #include <asm/bootinfo.h>
>  
> -static DEFINE_PER_CPU(struct cpu, cpu_devices);
> -
>  #ifdef CONFIG_HOTPLUG_CPU
>  int arch_register_cpu(int cpu)
>  {
> -	int ret;
>  	struct cpu *c = &per_cpu(cpu_devices, cpu);
>  
> -	c->hotpluggable = 1;

This is a bit subtle.  Can loongarch hotplug a CPU that
is also io_master(cpu)?  I have no idea if there is a subtle difference
between:

1) CPUs present at boot where if they are an io_master they are not allowed
   to be hot removed.
2) CPUs that turn up (hotplugged) later which are an io_master and by original code
   can be removed.

My guess is that no io_master CPU can be hotplugged in, making this irrelevant
and your code correct as the =1 is just a micro optimization.

If we can confirm that, a one line addition to the patch description would be
great. 

Otherwise LGTM

> -	ret = register_cpu(c, cpu);
> -	if (ret < 0)
> -		pr_warn("register_cpu %d failed (%d)\n", cpu, ret);
> -
> -	return ret;
> +	c->hotpluggable = !io_master(cpu);
> +	return register_cpu(c, cpu);
>  }
>  EXPORT_SYMBOL(arch_register_cpu);
>  
> @@ -33,21 +26,3 @@ void arch_unregister_cpu(int cpu)
>  }
>  EXPORT_SYMBOL(arch_unregister_cpu);
>  #endif
> -
> -static int __init topology_init(void)
> -{
> -	int i, ret;
> -
> -	for_each_present_cpu(i) {
> -		struct cpu *c = &per_cpu(cpu_devices, i);
> -
> -		c->hotpluggable = !io_master(i);
> -		ret = register_cpu(c, i);
> -		if (ret < 0)
> -			pr_warn("topology_init: register_cpu %d failed (%d)\n", i, ret);
> -	}
> -
> -	return 0;
> -}
> -
> -subsys_initcall(topology_init);
Jonathan Cameron Sept. 14, 2023, 1:53 p.m. UTC | #6
On Wed, 13 Sep 2023 16:38:03 +0000
James Morse <james.morse@arm.com> wrote:

> ACPI has two ways of describing processors in the DSDT. Either as a device
> object with HID ACPI0007, or as a type 'C' package inside a Processor
> Container. The ACPI processor driver probes CPUs described as devices, but
> not those described as packages.
> 

Specification reference needed...

Terminology-wise, I'd just refer to Processor() objects, as I think they
are named objects rather than data terms like a package (which includes
a PkgLength etc.)



> Duplicate descriptions are not allowed, the ACPI processor driver already
> parses the UID from both devices and containers. acpi_processor_get_info()
> returns an error if the UID exists twice in the DSDT.
> 
> The missing probe for CPUs described as packages creates a problem for
> moving the register_cpu() calls into the acpi_processor driver, as CPUs
> described like this don't get registered, leading to errors from other
> subsystems when they try to add new sysfs entries to the CPU node.
> (e.g. topology_sysfs_init()'s use of topology_add_dev() via cpuhp)
> 
> To fix this, parse the processor container and call acpi_processor_add()
> for each processor that is discovered like this. The processor container
> handler is added with acpi_scan_add_handler(), so no detach call will
> arrive.
> 
> Qemu TCG describes CPUs using packages in a processor container.

processor terms in a processor container. 
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Otherwise looks fine to me.

Jonathan
> ---
>  drivers/acpi/acpi_processor.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c0839bcf78c1..b4bde78121bb 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -625,9 +625,31 @@ static struct acpi_scan_handler processor_handler = {
>  	},
>  };
>  
> +static acpi_status acpi_processor_container_walk(acpi_handle handle,
> +						 u32 lvl,
> +						 void *context,
> +						 void **rv)
> +{
> +	struct acpi_device *adev;
> +	acpi_status status;
> +
> +	adev = acpi_get_acpi_dev(handle);
> +	if (!adev)
> +		return AE_ERROR;
> +
> +	status = acpi_processor_add(adev, &processor_device_ids[0]);
> +	acpi_put_acpi_dev(adev);
> +
> +	return status;
> +}
> +
>  static int acpi_processor_container_attach(struct acpi_device *dev,
>  					   const struct acpi_device_id *id)
>  {
> +	acpi_walk_namespace(ACPI_TYPE_PROCESSOR, dev->handle,
> +			    ACPI_UINT32_MAX, acpi_processor_container_walk,
> +			    NULL, NULL, NULL);
> +
>  	return 1;
>  }
>
Jonathan Cameron Sept. 14, 2023, 2:10 p.m. UTC | #7
On Wed, 13 Sep 2023 16:38:07 +0000
James Morse <james.morse@arm.com> wrote:

> A subsequent patch will change acpi_scan_hot_remove() to call
> acpi_bus_trim_one() instead of acpi_bus_trim(), meaning it can no longer
> rely on the prototype in the header file.
> 
> Move these functions further up the file.
> No change in behaviour.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/acpi/scan.c | 76 ++++++++++++++++++++++-----------------------
>  1 file changed, 38 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index f898591ce05f..a675333618ae 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -244,6 +244,44 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
>  	return 0;
>  }
>  
> +static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
> +{
> +	struct acpi_scan_handler *handler = adev->handler;
> +
> +	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
> +
> +	adev->flags.match_driver = false;
> +	if (handler) {
> +		if (handler->detach)
> +			handler->detach(adev);
> +
> +		adev->handler = NULL;
> +	} else {
> +		device_release_driver(&adev->dev);
> +	}
> +	/*
> +	 * Most likely, the device is going away, so put it into D3cold before
> +	 * that.
> +	 */
> +	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
> +	adev->flags.initialized = false;
> +	acpi_device_clear_enumerated(adev);
> +
> +	return 0;
> +}
> +
> +/**
> + * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
> + * @adev: Root of the ACPI namespace scope to walk.
> + *
> + * Must be called under acpi_scan_lock.
> + */
> +void acpi_bus_trim(struct acpi_device *adev)
> +{
> +	acpi_bus_trim_one(adev, NULL);
> +}
> +EXPORT_SYMBOL_GPL(acpi_bus_trim);
> +
>  static int acpi_scan_hot_remove(struct acpi_device *device)
>  {
>  	acpi_handle handle = device->handle;
> @@ -2506,44 +2544,6 @@ int acpi_bus_scan(acpi_handle handle)
>  }
>  EXPORT_SYMBOL(acpi_bus_scan);
>  
> -static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
> -{
> -	struct acpi_scan_handler *handler = adev->handler;
> -
> -	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
> -
> -	adev->flags.match_driver = false;
> -	if (handler) {
> -		if (handler->detach)
> -			handler->detach(adev);
> -
> -		adev->handler = NULL;
> -	} else {
> -		device_release_driver(&adev->dev);
> -	}
> -	/*
> -	 * Most likely, the device is going away, so put it into D3cold before
> -	 * that.
> -	 */
> -	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
> -	adev->flags.initialized = false;
> -	acpi_device_clear_enumerated(adev);
> -
> -	return 0;
> -}
> -
> -/**
> - * acpi_bus_trim - Detach scan handlers and drivers from ACPI device objects.
> - * @adev: Root of the ACPI namespace scope to walk.
> - *
> - * Must be called under acpi_scan_lock.
> - */
> -void acpi_bus_trim(struct acpi_device *adev)
> -{
> -	acpi_bus_trim_one(adev, NULL);
> -}
> -EXPORT_SYMBOL_GPL(acpi_bus_trim);
> -
>  int acpi_bus_register_early_device(int type)
>  {
>  	struct acpi_device *device = NULL;
Jonathan Cameron Sept. 14, 2023, 2:28 p.m. UTC | #8
On Wed, 13 Sep 2023 16:38:09 +0000
James Morse <james.morse@arm.com> wrote:

> struct acpi_scan_handler has a detach callback that is used to remove
> a driver when a bus is changed. When interacting with an eject-request,
> the detach callback is called before _EJ0.
> 
> This means the ACPI processor driver can't use _STA to determine if a
> CPU has been made not-present, or some of the other _STA bits have been
> changed. acpi_processor_remove() needs to know the value of _STA after
> _EJ0 has been called.

Why hasn't it been a problem before?

> 
> Add a post_eject callback to struct acpi_scan_handler. This is called
> after acpi_scan_hot_remove() has successfully called _EJ0. Because
> acpi_bus_trim_one() also clears the handler pointer, it needs to be
> told if the caller will go on to call acpi_bus_post_eject(), so
> that acpi_device_clear_enumerated() and clearing the handler pointer
> can be deferred. The existing not-used pointer is used for this.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

I briefly wondered if an alternative model where you always call the
post walk was cleaner as the handler clear etc would always be in same place.
However, couldn't make it work that nicely because you still need to indicate
that it's an eject post handler or not which just moves the messy code.

As such this LGTM

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---
>  drivers/acpi/acpi_processor.c |  4 +--
>  drivers/acpi/scan.c           | 52 ++++++++++++++++++++++++++++++-----
>  include/acpi/acpi_bus.h       |  1 +
>  3 files changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 22a15a614f95..00dcc23d49a8 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -459,7 +459,7 @@ static int acpi_processor_add(struct acpi_device *device,
>  
>  #ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
>  /* Removal */
> -static void acpi_processor_remove(struct acpi_device *device)
> +static void acpi_processor_post_eject(struct acpi_device *device)
>  {
>  	struct acpi_processor *pr;
>  
> @@ -627,7 +627,7 @@ static struct acpi_scan_handler processor_handler = {
>  	.ids = processor_device_ids,
>  	.attach = acpi_processor_add,
>  #ifdef CONFIG_ACPI_HOTPLUG_PRESENT_CPU
> -	.detach = acpi_processor_remove,
> +	.post_eject = acpi_processor_post_eject,
>  #endif
>  	.hotplug = {
>  		.enabled = true,
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index a675333618ae..b6d2f01640a9 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -244,18 +244,28 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
>  	return 0;
>  }
>  
> -static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
> +/**
> + * acpi_bus_trim_one() - Detach scan handlers and drivers from ACPI device
> + *                       objects.
> + * @adev:       Root of the ACPI namespace scope to walk.
> + * @eject:      Pointer to a bool that indicates if this was due to an
> + *              eject-request.
> + *
> + * Must be called under acpi_scan_lock.
> + * If @eject points to true, clearing the device enumeration is deferred until
> + * acpi_bus_post_eject() is called.
> + */
> +static int acpi_bus_trim_one(struct acpi_device *adev, void *eject)
>  {
>  	struct acpi_scan_handler *handler = adev->handler;
> +	bool is_eject = *(bool *)eject;
>  
> -	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, NULL);
> +	acpi_dev_for_each_child_reverse(adev, acpi_bus_trim_one, eject);
>  
>  	adev->flags.match_driver = false;
>  	if (handler) {
>  		if (handler->detach)
>  			handler->detach(adev);
> -
> -		adev->handler = NULL;
>  	} else {
>  		device_release_driver(&adev->dev);
>  	}
> @@ -265,7 +275,12 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
>  	 */
>  	acpi_device_set_power(adev, ACPI_STATE_D3_COLD);
>  	adev->flags.initialized = false;
> -	acpi_device_clear_enumerated(adev);
> +
> +	/* For eject this is deferred to acpi_bus_post_eject() */
> +	if (!is_eject) {
> +		adev->handler = NULL;
> +		acpi_device_clear_enumerated(adev);
> +	}
>  
>  	return 0;
>  }
> @@ -278,15 +293,36 @@ static int acpi_bus_trim_one(struct acpi_device *adev, void *not_used)
>   */
>  void acpi_bus_trim(struct acpi_device *adev)
>  {
> -	acpi_bus_trim_one(adev, NULL);
> +	bool eject = false;
> +
> +	acpi_bus_trim_one(adev, &eject);
>  }
>  EXPORT_SYMBOL_GPL(acpi_bus_trim);
>  
> +static int acpi_bus_post_eject(struct acpi_device *adev, void *not_used)
> +{
> +	struct acpi_scan_handler *handler = adev->handler;
> +
> +	acpi_dev_for_each_child_reverse(adev, acpi_bus_post_eject, NULL);
> +
> +	if (handler) {
> +		if (handler->post_eject)
> +			handler->post_eject(adev);
> +
> +		adev->handler = NULL;
> +	}
> +
> +	acpi_device_clear_enumerated(adev);
> +
> +	return 0;
> +}
> +
>  static int acpi_scan_hot_remove(struct acpi_device *device)
>  {
>  	acpi_handle handle = device->handle;
>  	unsigned long long sta;
>  	acpi_status status;
> +	bool eject = true;
>  
>  	if (device->handler && device->handler->hotplug.demand_offline) {
>  		if (!acpi_scan_is_offline(device, true))
> @@ -299,7 +335,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
>  
>  	acpi_handle_debug(handle, "Ejecting\n");
>  
> -	acpi_bus_trim(device);
> +	acpi_bus_trim_one(device, &eject);
>  
>  	acpi_evaluate_lck(handle, 0);
>  	/*
> @@ -322,6 +358,8 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
>  	} else if (sta & ACPI_STA_DEVICE_ENABLED) {
>  		acpi_handle_warn(handle,
>  			"Eject incomplete - status 0x%llx\n", sta);
> +	} else {
> +		acpi_bus_post_eject(device, NULL);
>  	}
>  
>  	return 0;
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 254685085c82..1b7e1acf925b 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -127,6 +127,7 @@ struct acpi_scan_handler {
>  	bool (*match)(const char *idstr, const struct acpi_device_id **matchid);
>  	int (*attach)(struct acpi_device *dev, const struct acpi_device_id *id);
>  	void (*detach)(struct acpi_device *dev);
> +	void (*post_eject)(struct acpi_device *dev);
>  	void (*bind)(struct device *phys_dev);
>  	void (*unbind)(struct device *phys_dev);
>  	struct acpi_hotplug_profile hotplug;
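
The two-phase flow in the patch above can be modelled in user-space roughly
as follows. All names here are hypothetical stand-ins (this is not the
kernel's struct acpi_scan_handler), the walk over child devices is omitted,
and _EJ0 itself is not modelled; the sketch only shows why the handler
pointer has to survive the first phase of an eject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical, cut-down stand-in for a scan handler. */
struct model_handler {
	void (*detach)(struct model_dev *dev);
	void (*post_eject)(struct model_dev *dev);
};

struct model_dev {
	const struct model_handler *handler;
	bool enumerated;
	int post_eject_calls;
};

/* Phase 1: runs before _EJ0. For an eject the handler pointer is kept. */
static void model_trim_one(struct model_dev *dev, bool is_eject)
{
	assert(dev != NULL);

	if (dev->handler && dev->handler->detach)
		dev->handler->detach(dev);

	/* For an eject this is deferred to model_post_eject(). */
	if (!is_eject) {
		dev->handler = NULL;
		dev->enumerated = false;
	}
}

/* Phase 2: runs only after _EJ0 succeeded, when _STA is up to date. */
static void model_post_eject(struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->handler) {
		if (dev->handler->post_eject)
			dev->handler->post_eject(dev);
		dev->handler = NULL;
	}

	dev->enumerated = false;
}

static void count_post_eject(struct model_dev *dev)
{
	dev->post_eject_calls++;
}

/* Like the processor handler after this patch: post_eject, no detach. */
static const struct model_handler processor_like = {
	.detach = NULL,
	.post_eject = count_post_eject,
};
```

If model_trim_one() cleared the handler pointer during an eject, there would
be nothing left for model_post_eject() to call, which is the reason the
patch defers that clearing.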
Russell King (Oracle) Sept. 18, 2023, 10:27 a.m. UTC | #9
On Wed, Sep 13, 2023 at 04:37:48PM +0000, James Morse wrote:
> This series is based on v6.6-rc1, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2

Hi James,

FYI, this doesn't seem to be based upon v6.6-rc1, but v6.4-rc5.
virtual_cpu_hotplug/rfc/v2 seems to have a hash of 505859b05e15.

Thanks.
Salil Mehta Sept. 26, 2023, 1:16 p.m. UTC | #10
> From: James Morse <james.morse@arm.com>
> Sent: Wednesday, September 13, 2023 5:38 PM

[...]

> 
> Hello!
> 
> Changes since RFC-v1:
>  * riscv is new, ia64 is gone
>  * The KVM support is different, and upstream - no need to patch the host.
> 
> ---
> 
> This series adds what looks like cpuhotplug support to arm64 for use in
> virtual machines. It does this by moving the register_cpu() calls for
> architectures that support ACPI out of the arch code by using
> GENERIC_CPU_DEVICES, then into the ACPI processor driver.
> 
> The kubernetes folk really want to be able to add CPUs to an existing VM,
> in exactly the same way they do on x86. The use-case is pre-booting guests
> with one CPU, then adding the number that were actually needed when the
> workload is provisioned.
> 

[...]

> 
> I had a go at switching the remaining architectures over to
> GENERIC_CPU_DEVICES,
> so that the Kconfig symbol can be removed, but I got stuck with powerpc
> and s390.
> 
> I've only build tested Loongarch and riscv. I've removed the ia64 specific
> patches, but left the changes in other patches to make git-grep review of
> renames easier.
> 
> If folk want to play along at home, you'll need a copy of Qemu that
> supports this.
> https://github.com/salil-mehta/qemu.git salil/virt-cpuhp-armv8/rfc-v2-rc6


Please use the latest pushed RFC V2 instead:
https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/T/#m523b37819c4811c7827333982004e07a1ef03879

Repository:
https://github.com/salil-mehta/qemu.git  virt-cpuhp-armv8/rfc-v2


Thanks
Salil.


[...]

> Why is this still an RFC? I'm still looking for confirmation from the
> kubernetes/kata folk that this works for them. Because of this I've culled
> the CC list...
> 
> 
> This series is based on v6.6-rc1, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/ virtual_cpu_hotplug/rfc/v2