
[v5,00/18] ACPI/arm64: add support for virtual cpu hotplug

Message ID 20240412143719.11398-1-Jonathan.Cameron@huawei.com
Series ACPI/arm64: add support for virtual cpu hotplug

Message

Jonathan Cameron April 12, 2024, 2:37 p.m. UTC
This patch set changes hands again in an attempt to set a new record for
most people who have worked on a single problem.

Miguel has been working on a patch set that renames and factors out arch
specific code, which will clash with this one:
https://lore.kernel.org/linux-acpi/20240409150536.9933-1-miguel.luis@oracle.com/
[RFC PATCH 0/4] ACPI: processor: refactor acpi_processor_{get_info|remove}

v5 changes:
- Rebase on Rafael's acpi_scan_check_and_detach() rework series, which
  superseded the original first patch.
  https://lore.kernel.org/linux-acpi/6021126.lOV4Wx5bFT@kreacher/
  That dealt with what I thought was the most controversial part of the
  series - checking the enabled bit of ACPI _STA for CPUs.
- Change the overall handling so that arch_register_cpu() returns
  -EPROBE_DEFER if the particular architecture is not yet ready to
  answer the question of whether a particular CPU may be used.
  (A rough sketch of this flow follows this change list.)
  This occurs for ARM64 + ACPI in 2 cases:
  1) At the initial call site early in boot, before the AML interpreter
     is available, so the code can't query _STA yet.
  2) When _STA is queried and a particular CPU is present but not enabled.
     Those are the ones we are going to hotplug later.
  For all other architectures, and for ARM64 DT boots, this deferred
  flow is not used.
- Make the _make_enabled() and _make_not_enabled() flows more similar
  to the _make_present() and _make_not_present() ones. There are still
  sufficient differences that I don't think it makes sense to combine
  the code, but aligning the locking and NUMA handling brings them
  closer together.  Note that an additional series will address the
  question of onlining and offlining the NUMA node; for now it
  will always be present (that series is not necessary for initial
  merge of this feature).
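
For anyone who wants a feel for the deferral flow mentioned in the second
bullet above, here is a rough, illustrative sketch. It is not the actual
patch: cpu_acpi_sta_enabled() is a made-up stand-in for the _STA query added
by the series, and the exact placement differs in the real code.

#include <linux/acpi.h>
#include <linux/cpu.h>
#include <linux/errno.h>
#include <linux/percpu.h>

/* Made-up stand-in for the _STA enabled-bit query added by this series. */
static bool cpu_acpi_sta_enabled(struct acpi_device *adev)
{
        unsigned long long sta;

        if (ACPI_FAILURE(acpi_evaluate_integer(adev->handle, "_STA", NULL, &sta)))
                return false;

        return sta & ACPI_STA_DEVICE_ENABLED;
}

int arch_register_cpu(int cpu)
{
        struct cpu *c = &per_cpu(cpu_devices, cpu);
        struct acpi_device *adev = ACPI_COMPANION(&c->dev);

        if (!acpi_disabled) {
                /* Case 1: called before the AML interpreter is up, so _STA
                 * cannot be evaluated yet - ask to be retried later. */
                if (!adev)
                        return -EPROBE_DEFER;

                /* Case 2: present but not yet enabled - a CPU that will be
                 * hotplugged (enabled) later. */
                if (!cpu_acpi_sta_enabled(adev))
                        return -EPROBE_DEFER;
        }

        return register_cpu(c, cpu);
}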

Dropped RFC because I think this is getting close to ready for merging
and now we are interested in normal review rather than calling out
significant remaining questions.

Updated version of James' original introduction.

This series adds what looks like cpuhotplug support to arm64 for use in
virtual machines. It does this by moving the cpu_register() calls for
architectures that support ACPI into an arch specific call made from
the ACPI processor driver.
 
The kubernetes folk really want to be able to add CPUs to an existing VM,
in exactly the same way they do on x86. The use-case is pre-booting guests
with one CPU, then adding the number that were actually needed when the
workload is provisioned.

Wait? Doesn't arm64 support cpuhotplug already!?
In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
has the additional step of physically removing the CPU, so that it isn't
present anymore.
 
Arm64 doesn't support this, and can't support it: CPUs are really a slice
of the SoC, and there is not enough information in the existing ACPI tables
to describe which bits of the slice also got removed. Without a reference
machine, adding this support to the spec is a wild goose chase.
 
Critically: everything described in the firmware tables must remain present.
 
For a virtual machine this is easy as all the other bits of 'virtual SoC'
are emulated, so they can (and do) remain present when a vCPU is 'removed'.

On a system that supports cpuhotplug the MADT has to describe every possible
CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
the guest is started.
With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
policy about which CPUs can be brought online.
 
This series adds support for virtual-cpuhotplug as exactly that: firmware
policy. This may even work on a physical machine too; for a guest the part of
firmware is played by the VMM. (typically Qemu).
 
PSCI support is modified to return 'DENIED' if the CPU can't be brought
online/enabled yet. The CPU object's _STA method's enabled bit is used to
indicate firmware's current disposition. If the CPU has its enabled bit clear,
it will not be registered with sysfs, and attempts to bring it online will
fail. The notifications that _STA has changed its value then work in the same
way as physical hotplug, and firmware can cause the CPU to be registered some
time later, allowing it to be brought online.
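
The PSCI side of this is small. Roughly (a sketch based on the existing
arm64 cpu_psci_cpu_boot(), with DENIED - which the PSCI code already maps to
-EPERM - treated as "can't be enabled yet" rather than as a boot failure):

static int cpu_psci_cpu_boot(unsigned int cpu)
{
        phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
        int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);

        /* DENIED (-EPERM) just means firmware won't enable this CPU yet. */
        if (err && err != -EPERM)
                pr_err("failed to boot CPU%d (%d)\n", cpu, err);

        return err;
}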
 
This creates something that looks like cpuhotplug to user-space, as the sysfs
files appear and disappear, and the udev notifications look the same.
 
One notable difference is the CPU present mask, which is exposed via sysfs.
Because the CPUs remain present throughout, they can still be seen in that mask.
This value does get used by web browsers to estimate the number of CPUs,
as the CPU online mask is constantly changing on mobile phones.
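
Hence the new mask added by the final patch in the series. Its shape is
roughly as below (a sketch; see the cpumask patch for the real definition):

#include <linux/cpumask.h>

/* CPUs that are present and registered, i.e. can actually be onlined. */
extern struct cpumask __cpu_enabled_mask;
#define cpu_enabled_mask   ((const struct cpumask *)&__cpu_enabled_mask)

static inline bool cpu_enabled(unsigned int cpu)
{
        return cpumask_test_cpu(cpu, cpu_enabled_mask);
}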
 
Linux is tolerant of PSCI returning errors, as it's always been allowed to do
that. To avoid confusing OSes that can't tolerate this, we needed an additional
bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
has a different bit position in the GICC.
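
Illustratively, the GICC parsing then treats an entry as usable if either
flag is set. The bit position below is an assumption based on the
description above; the real definition lives in the irqchip patches:

#include <linux/acpi.h>

#define ACPI_MADT_GICC_CPU_CAPABLE      (1 << 3)        /* per this series */

static bool gicc_is_usable(struct acpi_madt_generic_interrupt *gicc)
{
        return gicc->flags & (ACPI_MADT_ENABLED | ACPI_MADT_GICC_CPU_CAPABLE);
}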
 
This code is unconditionally enabled for all ACPI architectures, though for
now only arm64 defers the cpu_register() calls.

If there are problems with firmware tables on some devices, the CPUs will
already be online by the time acpi_processor_make_enabled() is called.
A mismatch here causes a firmware-bug message and kernel taint. This should
only affect people with broken firmware who also boot with maxcpus=1, and
bring CPUs online later.
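
The mismatch check is roughly of this shape (a sketch, not the exact code
from the series; the function name is illustrative):

#include <linux/cpu.h>
#include <linux/kernel.h>
#include <linux/printk.h>
#include <acpi/processor.h>

static void check_enabled_mismatch(struct acpi_processor *pr, bool sta_enabled)
{
        /* The CPU is already online but firmware now claims it is not
         * enabled: the firmware tables are inconsistent. */
        if (cpu_online(pr->id) && !sta_enabled) {
                pr_err_once(FW_BUG "CPU %u online but not described as enabled\n",
                            pr->id);
                add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
        }
}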
 
If you want to play along at home, you'll need a copy of Qemu that supports this.
https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2

Replace your '-smp' argument with something like:
 | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1
 
 then feed the following to the Qemu monitor:
 | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
 | (qemu) device_del cpu1

James Morse (11):
  ACPI: processor: Register deferred CPUs from acpi_processor_get_info()
  ACPI: Rename acpi_processor_hotadd_init and  remove pre-processor
    guards
  ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
  ACPI: Check _STA present bit before making CPUs not present
  ACPI: Warn when the present bit changes but the feature is not enabled
  arm64: acpi: Move get_cpu_for_acpi_id() to a header
  irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
  irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
    CPUs
  ACPI: add support to (un)register CPUs based on the _STA enabled bit
  arm64: document virtual CPU hotplug's expectations
  cpumask: Add enabled cpumask for present CPUs that can be brought
    online

Jean-Philippe Brucker (1):
  arm64: psci: Ignore DENIED CPUs

Jonathan Cameron (5):
  cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER
  ACPI: processor: Set the ACPI_COMPANION for the struct cpu instance
  ACPI: utils: Add an acpi_sta_enabled() helper and use it in
    acpi_processor_make_present()
  ACPI: scan: Add parameter to allow defering some actions in
    acpi_scan_check_and_detach.
  arm64: arch_register_cpu() variant to allow checking of ACPI _STA

Russell King (1):
  ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()

 .../ABI/testing/sysfs-devices-system-cpu      |   6 +
 Documentation/arch/arm64/cpu-hotplug.rst      |  79 ++++++++++++
 Documentation/arch/arm64/index.rst            |   1 +
 arch/arm64/include/asm/acpi.h                 |  11 ++
 arch/arm64/kernel/acpi_numa.c                 |  11 --
 arch/arm64/kernel/psci.c                      |   2 +-
 arch/arm64/kernel/smp.c                       |  23 +++-
 drivers/acpi/acpi_processor.c                 | 112 +++++++++++++++---
 drivers/acpi/scan.c                           |  57 +++++++--
 drivers/acpi/utils.c                          |  21 ++++
 drivers/base/cpu.c                            |  12 +-
 drivers/irqchip/irq-gic-v3.c                  |  32 +++--
 include/acpi/acpi_bus.h                       |   2 +
 include/linux/acpi.h                          |   5 +-
 include/linux/cpumask.h                       |  25 ++++
 kernel/cpu.c                                  |   3 +
 16 files changed, 346 insertions(+), 56 deletions(-)
 create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

Comments

Russell King (Oracle) April 12, 2024, 9:52 p.m. UTC | #1
On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
> > On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
> >> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> >> What's the difference then?  The locking, which should be fine if I'm
> >> not mistaken and need_hotplug_init that needs to be set if this code
> >> runs after the processor driver has loaded AFAICS.
> >
> > It is over this that I walked away from progressing this code, because
> > I don't think it's quite as simple as you make it out to be.
> >
> > Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch implemented
> > functions, so Arm64 can easily provide stubs for these that do nothing.
> > That never caused me any concern.
> >
> > What does cause me great concern though are the finer details. For
> > example, above you seem to drop the evaluation of _STA for the
> > "make_present" case - I've no idea whether that is something that
> > should be deleted or not (if it is something that can be deleted,
> > then why not delete it now?)
> >
> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > being taken - so I've no idea why the "make_present" case takes these
> > locks.
> 
> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> boot must hold the appropriate write locks. Otherwise it would be
> possible to online a CPU which just got marked present, but the
> registration has not completed yet.

Yes. As far as I've been able to determine, arch_register_cpu()
doesn't manipulate any of the CPU masks. All it seems to be doing
is initialising the struct cpu, registering the embedded struct
device, and setting up the sysfs links to its NUMA node.

There is nothing obvious in there which manipulates any CPU masks, and
this is rather my fundamental point when I said "I couldn't find
anything in arch_register_cpu() that depends on ...".

If there is something, then comments in the code would be a useful aid
because it's highly non-obvious where such a manipulation is located,
and hence why the locks are necessary.

> > Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
> > obvious that this is required - remember that with Arm64's "enabled"
> > toggling, the "processor" is a slice of the system and doesn't
> > actually go away - it's just "not enabled" for use.
> >
> > Again, as "processors" in Arm64 are slices of the system, they have
> > to be fully described in ACPI before the OS boots, and they will be
> > marked as being "present", which means they will be enumerated, and
> > the driver will be probed. Any processor that is not to be used will
> > not have its enabled bit set. It is my understanding that every
> > processor will result in the ACPI processor driver being bound to it
> > whether its enabled or not.
> >
> > The difference between real hotplug and Arm64 hotplug is that real
> > hotplug makes stuff not-present (and thus unenumerable). Arm64 hotplug
> > makes stuff not-enabled which is still enumerable.
> 
> Define "real hotplug" :)
> 
> Real physical hotplug does not really exist. That's at least true for
> x86, where the physical hotplug support was chased for a while, but
> never ended up in production.
> 
> Though virtualization happily jumped on it to hot add/remove CPUs
> to/from a guest.
> 
> There are limitations to this and we learned it the hard way on X86. At
> the end we came up with the following restrictions:
> 
>     1) All possible CPUs have to be advertised at boot time via firmware
>        (ACPI/DT/whatever) independent of them being present at boot time
>        or not.
> 
>        That guarantees proper sizing and ensures that associations
>        between hardware entities and software representations and the
>        resulting topology are stable for the lifetime of a system.
> 
>        It is really required to know the full topology of the system at
>        boot time especially with hybrid CPUs where some of the cores
>        have hyperthreading and the others do not.
> 
> 
>     2) Hot add can only mark an already registered (possible) CPU
>        present. Adding non-registered CPUs after boot is not possible.
> 
>        The CPU must have been registered in #1 already to ensure that
>        the system topology does not suddenly change in an incompatible
>        way at run-time.
> 
> The same restriction would apply to real physical hotplug. I don't think
> that's any different for ARM64 or any other architecture.

This makes me wonder whether the Arm64 has been barking up the wrong
tree then, and whether the whole "present" vs "enabled" thing comes
from a misunderstanding as far as a CPU goes.

However, there is a big difference between the two. On x86, a processor
is just a processor. On Arm64, a "processor" is a slice of the system
(includes the interrupt controller, PMUs etc) and we must enumerate
those even when the processor itself is not enabled. This is the whole
reason there's a difference between "present" and "enabled" and why
there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
The processor never actually goes away in arm64, it's just prevented
from being used.
Jonathan Cameron April 15, 2024, 9:16 a.m. UTC | #2
On Mon, 15 Apr 2024 09:45:52 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Sat, 13 Apr 2024 01:23:48 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > Russell!
> > 
> > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:  
> > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:    
> > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > >> > being taken - so I've no idea why the "make_present" case takes these
> > >> > locks.    
> > >> 
> > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > >> boot must hold the appropriate write locks. Otherwise it would be
> > >> possible to online a CPU which just got marked present, but the
> > >> registration has not completed yet.    
> > >
> > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > is initialising the struct cpu, registering the embedded struct
> > > device, and setting up the sysfs links to its NUMA node.
> > >
> > > There is nothing obvious in there which manipulates any CPU masks, and
> > > this is rather my fundamental point when I said "I couldn't find
> > > anything in arch_register_cpu() that depends on ...".
> > >
> > > If there is something, then comments in the code would be a useful aid
> > > because it's highly non-obvious where such a manipulation is located,
> > > and hence why the locks are necessary.    
> > 
> > acpi_processor_hotadd_init()
> > ...
> >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > 
> > That ends up in fiddling with cpu_present_mask.
> > 
> > I grant you that arch_register_cpu() is not, but it might rely on the
> > external locking too. I could not be bothered to figure that out.
> >   
> > >> Define "real hotplug" :)
> > >> 
> > >> Real physical hotplug does not really exist. That's at least true for
> > >> x86, where the physical hotplug support was chased for a while, but
> > >> never ended up in production.
> > >> 
> > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > >> to/from a guest.
> > >> 
> > >> There are limitations to this and we learned it the hard way on X86. At
> > >> the end we came up with the following restrictions:
> > >> 
> > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > >>        or not.
> > >> 
> > >>        That guarantees proper sizing and ensures that associations
> > >>        between hardware entities and software representations and the
> > >>        resulting topology are stable for the lifetime of a system.
> > >> 
> > >>        It is really required to know the full topology of the system at
> > >>        boot time especially with hybrid CPUs where some of the cores
> > >>        have hyperthreading and the others do not.
> > >> 
> > >> 
> > >>     2) Hot add can only mark an already registered (possible) CPU
> > >>        present. Adding non-registered CPUs after boot is not possible.
> > >> 
> > >>        The CPU must have been registered in #1 already to ensure that
> > >>        the system topology does not suddenly change in an incompatible
> > >>        way at run-time.
> > >> 
> > >> The same restriction would apply to real physical hotplug. I don't think
> > >> that's any different for ARM64 or any other architecture.    
> > >
> > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > from a misunderstanding as far as a CPU goes.
> > >
> > > However, there is a big difference between the two. On x86, a processor
> > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > those even when the processor itself is not enabled. This is the whole
> > > reason there's a difference between "present" and "enabled" and why
> > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > The processor never actually goes away in arm64, it's just prevented
> > > from being used.    
> > 
> > It's the same on X86 at least in the physical world.  
> 
> There were public calls on this via the Linaro Open Discussions group,
> so I can talk a little about how we ended up here.  Note that (in my
> opinion) there is zero chance of this changing - it took us well over
> a year to get to this conclusion.  So if we ever want ARM vCPU HP
> we need to work within these constraints. 
> 
> The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> specs etc, not the kernel maintainers) are determined that they want
> to retain the option to do real physical CPU hotplug in the future
> with all the necessary work around dynamic interrupt controller
> initialization, debug and many other messy corners.
> 
> Thus anything defined had to be structured in a way that was 'different'
> from that.
> 
> I don't mind the proposed flattening of the 2 paths if the ARM kernel
> maintainers are fine with it but it will remove the distinctions and
> we will need to be very careful with the CPU masks - we can't handle
> them the same as x86 does.
> 
> I'll get on with doing that, but do need input from Will / Catalin / James.
> There are some quirks that need calling out as it's not quite a simple
> as it appears from a high level.
> 
> Another part of that long discussion established that there is userspace
> (Android IIRC) in which the CPU present mask must include all CPUs
> at boot. To change that would be userspace ABI breakage so we can't
> do that.  Hence the dance around adding yet another mask to allow the
> OS to understand which CPUs are 'present' but not possible to online.
> 
> Flattening the two paths removes any distinction between calls that
> are for real hotplug and those that are for this online capable path.
> As a side note, the indicating bit for these flows is defined in ACPI
> for x86 from ACPI 6.3 as a flag in Processor Local APIC
> (the ARM64 definition is a cut and paste of that text).  So someone
> is interested in this distinction on x86. I can't say who but if
> you have a mantis account you can easily follow the history and it
> might be instructive to not everyone considering the current x86
> flow the right way to do it.

Would a higher level check to catch that we are hitting undefined
territory on arm64 be acceptable? That might satisfy the constraint
that we should not have any software for arm64 that would run if
physical CPU HP is added to the arch in future.  Something like:

@@ -331,6 +331,13 @@ static int acpi_processor_get_info(struct acpi_device *device)

        c = &per_cpu(cpu_devices, pr->id);
        ACPI_COMPANION_SET(&c->dev, device);
+
+       if (!IS_ENABLED(CONFIG_ACPI_CPU_HOTPLUG_CPU) &&
+           (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))) {
+               pr_err_once("Changing CPU present bit is not supported\n");
+               return -ENODEV;
+       }
+

This is basically lifting the check out of the acpi_processor_make_present()
call in this patch set.

With that in place before the new shared call I think we should be fine
wrt the ARM Architecture requirements.

Jonathan


> 
> Jonathan
> 
> 
> > 
> > Thanks,
> > 
> >         tglx
> >   
> 
> 
Rafael J. Wysocki April 15, 2024, 11:37 a.m. UTC | #3
On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Sat, 13 Apr 2024 01:23:48 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Russell!
> >
> > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
> > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
> > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > >> > being taken - so I've no idea why the "make_present" case takes these
> > >> > locks.
> > >>
> > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > >> boot must hold the appropriate write locks. Otherwise it would be
> > >> possible to online a CPU which just got marked present, but the
> > >> registration has not completed yet.
> > >
> > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > is initialising the struct cpu, registering the embedded struct
> > > device, and setting up the sysfs links to its NUMA node.
> > >
> > > There is nothing obvious in there which manipulates any CPU masks, and
> > > this is rather my fundamental point when I said "I couldn't find
> > > anything in arch_register_cpu() that depends on ...".
> > >
> > > If there is something, then comments in the code would be a useful aid
> > > because it's highly non-obvious where such a manipulation is located,
> > > and hence why the locks are necessary.
> >
> > acpi_processor_hotadd_init()
> > ...
> >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> >
> > That ends up in fiddling with cpu_present_mask.
> >
> > I grant you that arch_register_cpu() is not, but it might rely on the
> > external locking too. I could not be bothered to figure that out.
> >
> > >> Define "real hotplug" :)
> > >>
> > >> Real physical hotplug does not really exist. That's at least true for
> > >> x86, where the physical hotplug support was chased for a while, but
> > >> never ended up in production.
> > >>
> > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > >> to/from a guest.
> > >>
> > >> There are limitations to this and we learned it the hard way on X86. At
> > >> the end we came up with the following restrictions:
> > >>
> > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > >>        or not.
> > >>
> > >>        That guarantees proper sizing and ensures that associations
> > >>        between hardware entities and software representations and the
> > >>        resulting topology are stable for the lifetime of a system.
> > >>
> > >>        It is really required to know the full topology of the system at
> > >>        boot time especially with hybrid CPUs where some of the cores
> > >>        have hyperthreading and the others do not.
> > >>
> > >>
> > >>     2) Hot add can only mark an already registered (possible) CPU
> > >>        present. Adding non-registered CPUs after boot is not possible.
> > >>
> > >>        The CPU must have been registered in #1 already to ensure that
> > >>        the system topology does not suddenly change in an incompatible
> > >>        way at run-time.
> > >>
> > >> The same restriction would apply to real physical hotplug. I don't think
> > >> that's any different for ARM64 or any other architecture.
> > >
> > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > from a misunderstanding as far as a CPU goes.
> > >
> > > However, there is a big difference between the two. On x86, a processor
> > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > those even when the processor itself is not enabled. This is the whole
> > > reason there's a difference between "present" and "enabled" and why
> > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > The processor never actually goes away in arm64, it's just prevented
> > > from being used.
> >
> > It's the same on X86 at least in the physical world.
>
> There were public calls on this via the Linaro Open Discussions group,
> so I can talk a little about how we ended up here.  Note that (in my
> opinion) there is zero chance of this changing - it took us well over
> a year to get to this conclusion.  So if we ever want ARM vCPU HP
> we need to work within these constraints.
>
> The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> specs etc, not the kernel maintainers) are determined that they want
> to retain the option to do real physical CPU hotplug in the future
> with all the necessary work around dynamic interrupt controller
> initialization, debug and many other messy corners.

That's OK, but the difference is not in the ACPI CPU enumeration/removal code.

> Thus anything defined had to be structured in a way that was 'different'
> from that.

Apparently, that's where things got confused.

> I don't mind the proposed flattening of the 2 paths if the ARM kernel
> maintainers are fine with it but it will remove the distinctions and
> we will need to be very careful with the CPU masks - we can't handle
> them the same as x86 does.

At the ACPI code level, there is no distinction.

A CPU that was not available before has just become available.  The
platform firmware has notified the kernel about it and now
acpi_processor_add() runs.  Why would it need to use different code
paths depending on what _STA bits were clear before?

Yes, there is some arch stuff to be called and that arch stuff should
figure out what to do to make things actually work.

> I'll get on with doing that, but do need input from Will / Catalin / James.
> There are some quirks that need calling out as it's not quite a simple
> as it appears from a high level.
>
> Another part of that long discussion established that there is userspace
> (Android IIRC) in which the CPU present mask must include all CPUs
> at boot. To change that would be userspace ABI breakage so we can't
> do that.  Hence the dance around adding yet another mask to allow the
> OS to understand which CPUs are 'present' but not possible to online.
>
> Flattening the two paths removes any distinction between calls that
> are for real hotplug and those that are for this online capable path.

Which calls exactly do you mean?

> As a side note, the indicating bit for these flows is defined in ACPI
> for x86 from ACPI 6.3 as a flag in Processor Local APIC
> (the ARM64 definition is a cut and paste of that text).  So someone
> is interested in this distinction on x86. I can't say who but if
> you have a mantis account you can easily follow the history and it
> might be instructive to not everyone considering the current x86
> flow the right way to do it.

So a physically absent processor is different from a physically
present processor that has not been disabled.  No doubt about this.

That said, I'm still unsure why these two cases require two different
code paths in acpi_processor_add().
Jonathan Cameron April 15, 2024, 11:57 a.m. UTC | #4
On Mon, 15 Apr 2024 10:16:37 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 15 Apr 2024 09:45:52 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Sat, 13 Apr 2024 01:23:48 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >   
> > > Russell!
> > > 
> > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:    
> > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:      
> > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > >> > locks.      
> > > >> 
> > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > >> possible to online a CPU which just got marked present, but the
> > > >> registration has not completed yet.      
> > > >
> > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > is initialising the struct cpu, registering the embedded struct
> > > > device, and setting up the sysfs links to its NUMA node.
> > > >
> > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > this is rather my fundamental point when I said "I couldn't find
> > > > anything in arch_register_cpu() that depends on ...".
> > > >
> > > > If there is something, then comments in the code would be a useful aid
> > > > because it's highly non-obvious where such a manipulation is located,
> > > > and hence why the locks are necessary.      
> > > 
> > > acpi_processor_hotadd_init()
> > > ...
> > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > 
> > > That ends up in fiddling with cpu_present_mask.
> > > 
> > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > external locking too. I could not be bothered to figure that out.
> > >     
> > > >> Define "real hotplug" :)
> > > >> 
> > > >> Real physical hotplug does not really exist. That's at least true for
> > > >> x86, where the physical hotplug support was chased for a while, but
> > > >> never ended up in production.
> > > >> 
> > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > >> to/from a guest.
> > > >> 
> > > >> There are limitations to this and we learned it the hard way on X86. At
> > > >> the end we came up with the following restrictions:
> > > >> 
> > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > >>        or not.
> > > >> 
> > > >>        That guarantees proper sizing and ensures that associations
> > > >>        between hardware entities and software representations and the
> > > >>        resulting topology are stable for the lifetime of a system.
> > > >> 
> > > >>        It is really required to know the full topology of the system at
> > > >>        boot time especially with hybrid CPUs where some of the cores
> > > >>        have hyperthreading and the others do not.
> > > >> 
> > > >> 
> > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > >> 
> > > >>        The CPU must have been registered in #1 already to ensure that
> > > >>        the system topology does not suddenly change in an incompatible
> > > >>        way at run-time.
> > > >> 
> > > >> The same restriction would apply to real physical hotplug. I don't think
> > > >> that's any different for ARM64 or any other architecture.      
> > > >
> > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > from a misunderstanding as far as a CPU goes.
> > > >
> > > > However, there is a big difference between the two. On x86, a processor
> > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > those even when the processor itself is not enabled. This is the whole
> > > > reason there's a difference between "present" and "enabled" and why
> > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > The processor never actually goes away in arm64, it's just prevented
> > > > from being used.      
> > > 
> > > It's the same on X86 at least in the physical world.    
> > 
> > There were public calls on this via the Linaro Open Discussions group,
> > so I can talk a little about how we ended up here.  Note that (in my
> > opinion) there is zero chance of this changing - it took us well over
> > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > we need to work within these constraints. 
> > 
> > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > specs etc, not the kernel maintainers) are determined that they want
> > to retain the option to do real physical CPU hotplug in the future
> > with all the necessary work around dynamic interrupt controller
> > initialization, debug and many other messy corners.
> > 
> > Thus anything defined had to be structured in a way that was 'different'
> > from that.
> > 
> > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > maintainers are fine with it but it will remove the distinctions and
> > we will need to be very careful with the CPU masks - we can't handle
> > them the same as x86 does.
> > 
> > I'll get on with doing that, but do need input from Will / Catalin / James.
> > There are some quirks that need calling out as it's not quite a simple
> > as it appears from a high level.
> > 
> > Another part of that long discussion established that there is userspace
> > (Android IIRC) in which the CPU present mask must include all CPUs
> > at boot. To change that would be userspace ABI breakage so we can't
> > do that.  Hence the dance around adding yet another mask to allow the
> > OS to understand which CPUs are 'present' but not possible to online.
> > 
> > Flattening the two paths removes any distinction between calls that
> > are for real hotplug and those that are for this online capable path.
> > As a side note, the indicating bit for these flows is defined in ACPI
> > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > (the ARM64 definition is a cut and paste of that text).  So someone
> > is interested in this distinction on x86. I can't say who but if
> > you have a mantis account you can easily follow the history and it
> > might be instructive to not everyone considering the current x86
> > flow the right way to do it.  
> 
> Would a higher level check to catch that we are hitting undefined
> territory on arm64 be acceptable? That might satisfy the constraint
> that we should not have any software for arm64 that would run if
> physical CPU HP is added to the arch in future.  Something like:
> 
> @@ -331,6 +331,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> 
>         c = &per_cpu(cpu_devices, pr->id);
>         ACPI_COMPANION_SET(&c->dev, device);
> +
> +       if (!IS_ENABLED(CONFIG_ACPI_CPU_HOTPLUG_CPU) &&
> +           (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))) {
> +               pr_err_once("Changing CPU present bit is not supported\n");
> +               return -ENODEV;
> +       }
> +
> 
> This is basically lifting the check out of the acpi_processor_make_present()
> call in this patch set.
> 
> With that in place before the new shared call I think we should be fine
> wrt to the ARM Architecture requirements.

As discussed elsewhere in this thread, I'll push this into the arm64
specific arch_register_cpu() definition.

> 
> Jonathan
> 
> 
>         /*
> > 
> > Jonathan
> > 
> >   
> > > 
> > > Thanks,
> > > 
> > >         tglx
> > >     
> > 
> > 
> 
> 
Jonathan Cameron April 15, 2024, 12:23 p.m. UTC | #5
On Mon, 15 Apr 2024 14:04:26 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Mon, 15 Apr 2024 13:37:08 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >  
> > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:  
> > > >
> > > > On Sat, 13 Apr 2024 01:23:48 +0200
> > > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >  
> > > > > Russell!
> > > > >
> > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:  
> > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:  
> > > > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > > > >> > locks.  
> > > > > >>
> > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > > > >> possible to online a CPU which just got marked present, but the
> > > > > >> registration has not completed yet.  
> > > > > >
> > > > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > > > is initialising the struct cpu, registering the embedded struct
> > > > > > device, and setting up the sysfs links to its NUMA node.
> > > > > >
> > > > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > > > this is rather my fundamental point when I said "I couldn't find
> > > > > > anything in arch_register_cpu() that depends on ...".
> > > > > >
> > > > > > If there is something, then comments in the code would be a useful aid
> > > > > > because it's highly non-obvious where such a manipulation is located,
> > > > > > and hence why the locks are necessary.  
> > > > >
> > > > > acpi_processor_hotadd_init()
> > > > > ...
> > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > > >
> > > > > That ends up in fiddling with cpu_present_mask.
> > > > >
> > > > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > > > external locking too. I could not be bothered to figure that out.
> > > > >  
> > > > > >> Define "real hotplug" :)
> > > > > >>
> > > > > >> Real physical hotplug does not really exist. That's at least true for
> > > > > >> x86, where the physical hotplug support was chased for a while, but
> > > > > >> never ended up in production.
> > > > > >>
> > > > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > > > >> to/from a guest.
> > > > > >>
> > > > > >> There are limitations to this and we learned it the hard way on X86. At
> > > > > >> the end we came up with the following restrictions:
> > > > > >>
> > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > > > >>        or not.
> > > > > >>
> > > > > >>        That guarantees proper sizing and ensures that associations
> > > > > >>        between hardware entities and software representations and the
> > > > > >>        resulting topology are stable for the lifetime of a system.
> > > > > >>
> > > > > >>        It is really required to know the full topology of the system at
> > > > > >>        boot time especially with hybrid CPUs where some of the cores
> > > > > >>        have hyperthreading and the others do not.
> > > > > >>
> > > > > >>
> > > > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > > > >>
> > > > > >>        The CPU must have been registered in #1 already to ensure that
> > > > > >>        the system topology does not suddenly change in an incompatible
> > > > > >>        way at run-time.
> > > > > >>
> > > > > >> The same restriction would apply to real physical hotplug. I don't think
> > > > > >> that's any different for ARM64 or any other architecture.  
> > > > > >
> > > > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > > > from a misunderstanding as far as a CPU goes.
> > > > > >
> > > > > > However, there is a big difference between the two. On x86, a processor
> > > > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > > > those even when the processor itself is not enabled. This is the whole
> > > > > > reason there's a difference between "present" and "enabled" and why
> > > > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > > > The processor never actually goes away in arm64, it's just prevented
> > > > > > from being used.  
> > > > >
> > > > > It's the same on X86 at least in the physical world.  
> > > >
> > > > There were public calls on this via the Linaro Open Discussions group,
> > > > so I can talk a little about how we ended up here.  Note that (in my
> > > > opinion) there is zero chance of this changing - it took us well over
> > > > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > > > we need to work within these constraints.
> > > >
> > > > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > > > specs etc, not the kernel maintainers) are determined that they want
> > > > to retain the option to do real physical CPU hotplug in the future
> > > > with all the necessary work around dynamic interrupt controller
> > > > initialization, debug and many other messy corners.  
> > >
> > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> > >  
> > > > Thus anything defined had to be structured in a way that was 'different'
> > > > from that.  
> > >
> > > Apparently, that's where things got confused.
> > >  
> > > > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > > > maintainers are fine with it but it will remove the distinctions and
> > > > we will need to be very careful with the CPU masks - we can't handle
> > > > them the same as x86 does.  
> > >
> > > At the ACPI code level, there is no distinction.
> > >
> > > A CPU that was not available before has just become available.  The
> > > platform firmware has notified the kernel about it and now
> > > acpi_processor_add() runs.  Why would it need to use different code
> > > paths depending on what _STA bits were clear before?  
> >
> > I think we will continue to disagree on this.  To my mind and from the
> > ACPI specification, they are two different state transitions with different
> > required actions.  
> 
> Well, please be specific: What exactly do you mean here and which
> parts of the spec are you talking about?

Given we are moving on with your suggestion, let's leave this for now - too many
other things to do! :)

> 
> > Those state transitions are an ACPI level thing not
> > an arch level one.  However, I want a solution that moves things forwards
> > so I'll give pushing that entirely into the arch code a try.  
> 
> Thanks!
> 
> Though I think that there is a disconnect between us that needs to be
> clarified first.

I'm fine with accepting your approach if it works and is acceptable
to the arm kernel folk. They are getting a non-trivial arch_register_cpu()
with a bunch of ACPI-specific handling in it that may come as a surprise.

> 
> > >
> > > Yes, there is some arch stuff to be called and that arch stuff should
> > > figure out what to do to make things actually work.
> > >  
> > > > I'll get on with doing that, but do need input from Will / Catalin / James.
> > > > There are some quirks that need calling out as it's not quite a simple
> > > > as it appears from a high level.
> > > >
> > > > Another part of that long discussion established that there is userspace
> > > > (Android IIRC) in which the CPU present mask must include all CPUs
> > > > at boot. To change that would be userspace ABI breakage so we can't
> > > > do that.  Hence the dance around adding yet another mask to allow the
> > > > OS to understand which CPUs are 'present' but not possible to online.
> > > >
> > > > Flattening the two paths removes any distinction between calls that
> > > > are for real hotplug and those that are for this online capable path.  
> > >
> > > Which calls exactly do you mean?  
> >
> > At the moment he distinction does not exist (because x86 only supports
> > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
> > the architecture is defined for arm64 physical hotplug in the future
> > we would need to do interrupt controller bring up + a lot of other stuff.
> >
> > It may be possible to do that in the arch code - will be hard to verify
> > that until that arch is defined  Today all I need to do is ensure that
> > any attempt to do present bit setting for ARM64 returns an error.
> > That looks to be straight forward.  
> 
> OK
> 
> >  
> > >  
> > > > As a side note, the indicating bit for these flows is defined in ACPI
> > > > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > > > (the ARM64 definition is a cut and paste of that text).  So someone
> > > > is interested in this distinction on x86. I can't say who but if
> > > > you have a mantis account you can easily follow the history and it
> > > > might be instructive to not everyone considering the current x86
> > > > flow the right way to do it.  
> > >
> > > So a physically absent processor is different from a physically
> > > present processor that has not been disabled.  No doubt about this.
> > >
> > > That said, I'm still unsure why these two cases require two different
> > > code paths in acpi_processor_add().  
> >
> > It might be possible to push the checking down into arch_register_cpu()
> > and have that for now reject any attempt to do physical CPU HP on arm64.
> > It is that gate that is vital to getting this accepted by ARM.
> >
> > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > on that would be very welcome!  
> 
> I need to do some investigation which will take some time I suppose.

I'll do so as well once I've gotten the rest sorted out.  That whole
structure seems overly complex and liable to race, though maybe sufficient
locking happens to be held that it's not a problem.

Jonathan
Rafael J. Wysocki April 15, 2024, 12:41 p.m. UTC | #6
On Mon, Apr 15, 2024 at 2:37 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Rafael,
>
> >  From: Rafael J. Wysocki <rafael@kernel.org>
> >  Sent: Monday, April 15, 2024 1:04 PM
> >

[cut]

> >
> >  I need to do some investigation which will take some time I suppose.
>
>
> You might find below cover letter and links to the presentations useful:
>
> 1. https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
> 2. https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
> 3. https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> 4. https://sched.co/eE4m

Thanks, I'll go through this, but I kind of doubt if it helps me with
finding out what to do with the hotadd_init flag.
Jonathan Cameron April 16, 2024, 5:41 p.m. UTC | #7
On Mon, 15 Apr 2024 13:23:51 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 15 Apr 2024 14:04:26 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> > On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > On Mon, 15 Apr 2024 13:37:08 +0200
> > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> > >    
> > > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:    
> > > > >
> > > > > On Sat, 13 Apr 2024 01:23:48 +0200
> > > > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >    
> > > > > > Russell!
> > > > > >
> > > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:    
> > > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:    
> > > > > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > > > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > > > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > > > > >> > locks.    
> > > > > > >>
> > > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > > > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > > > > >> possible to online a CPU which just got marked present, but the
> > > > > > >> registration has not completed yet.    
> > > > > > >
> > > > > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > > > > is initialising the struct cpu, registering the embedded struct
> > > > > > > device, and setting up the sysfs links to its NUMA node.
> > > > > > >
> > > > > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > > > > this is rather my fundamental point when I said "I couldn't find
> > > > > > > anything in arch_register_cpu() that depends on ...".
> > > > > > >
> > > > > > > If there is something, then comments in the code would be a useful aid
> > > > > > > because it's highly non-obvious where such a manipulation is located,
> > > > > > > and hence why the locks are necessary.    
> > > > > >
> > > > > > acpi_processor_hotadd_init()
> > > > > > ...
> > > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > > > >
> > > > > > That ends up in fiddling with cpu_present_mask.
> > > > > >
> > > > > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > > > > external locking too. I could not be bothered to figure that out.
> > > > > >    
> > > > > > >> Define "real hotplug" :)
> > > > > > >>
> > > > > > >> Real physical hotplug does not really exist. That's at least true for
> > > > > > >> x86, where the physical hotplug support was chased for a while, but
> > > > > > >> never ended up in production.
> > > > > > >>
> > > > > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > > > > >> to/from a guest.
> > > > > > >>
> > > > > > >> There are limitations to this and we learned it the hard way on X86. At
> > > > > > >> the end we came up with the following restrictions:
> > > > > > >>
> > > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > > > > >>        or not.
> > > > > > >>
> > > > > > >>        That guarantees proper sizing and ensures that associations
> > > > > > >>        between hardware entities and software representations and the
> > > > > > >>        resulting topology are stable for the lifetime of a system.
> > > > > > >>
> > > > > > >>        It is really required to know the full topology of the system at
> > > > > > >>        boot time especially with hybrid CPUs where some of the cores
> > > > > > >>        have hyperthreading and the others do not.
> > > > > > >>
> > > > > > >>
> > > > > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > > > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > > > > >>
> > > > > > >>        The CPU must have been registered in #1 already to ensure that
> > > > > > >>        the system topology does not suddenly change in an incompatible
> > > > > > >>        way at run-time.
> > > > > > >>
> > > > > > >> The same restriction would apply to real physical hotplug. I don't think
> > > > > > >> that's any different for ARM64 or any other architecture.    
> > > > > > >
> > > > > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > > > > from a misunderstanding as far as a CPU goes.
> > > > > > >
> > > > > > > However, there is a big difference between the two. On x86, a processor
> > > > > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > > > > those even when the processor itself is not enabled. This is the whole
> > > > > > > reason there's a difference between "present" and "enabled" and why
> > > > > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > > > > The processor never actually goes away in arm64, it's just prevented
> > > > > > > from being used.    
> > > > > >
> > > > > > It's the same on X86 at least in the physical world.    
> > > > >
> > > > > There were public calls on this via the Linaro Open Discussions group,
> > > > > so I can talk a little about how we ended up here.  Note that (in my
> > > > > opinion) there is zero chance of this changing - it took us well over
> > > > > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > > > > we need to work within these constraints.
> > > > >
> > > > > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > > > > specs etc, not the kernel maintainers) are determined that they want
> > > > > to retain the option to do real physical CPU hotplug in the future
> > > > > with all the necessary work around dynamic interrupt controller
> > > > > initialization, debug and many other messy corners.    
> > > >
> > > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> > > >    
> > > > > Thus anything defined had to be structured in a way that was 'different'
> > > > > from that.    
> > > >
> > > > Apparently, that's where things got confused.
> > > >    
> > > > > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > > > > maintainers are fine with it but it will remove the distinctions and
> > > > > we will need to be very careful with the CPU masks - we can't handle
> > > > > them the same as x86 does.    
> > > >
> > > > At the ACPI code level, there is no distinction.
> > > >
> > > > A CPU that was not available before has just become available.  The
> > > > platform firmware has notified the kernel about it and now
> > > > acpi_processor_add() runs.  Why would it need to use different code
> > > > paths depending on what _STA bits were clear before?    
> > >
> > > I think we will continue to disagree on this.  To my mind and from the
> > > ACPI specification, they are two different state transitions with different
> > > required actions.    
> > 
> > Well, please be specific: What exactly do you mean here and which
> > parts of the spec are you talking about?  
> 
> Given we are moving on with your suggestion, lets leave this for now - too many
> other things to do! :)
> 
> >   
> > > Those state transitions are an ACPI level thing not
> > > an arch level one.  However, I want a solution that moves things forwards
> > > so I'll give pushing that entirely into the arch code a try.    
> > 
> > Thanks!
> > 
> > Though I think that there is a disconnect between us that needs to be
> > clarified first.  
> 
> I'm fine with accepting your approach if it works and is acceptable
> to the arm kernel folk. They are getting a non-trivial arch_register_cpu()
> with a bunch of ACPI-specific handling in it that may come as a surprise.
> 
> >   
> > > >
> > > > Yes, there is some arch stuff to be called and that arch stuff should
> > > > figure out what to do to make things actually work.
> > > >    
> > > > > I'll get on with doing that, but do need input from Will / Catalin / James.
> > > > > There are some quirks that need calling out as it's not quite as simple
> > > > > as it appears from a high level.
> > > > >
> > > > > Another part of that long discussion established that there is userspace
> > > > > (Android IIRC) in which the CPU present mask must include all CPUs
> > > > > at boot. To change that would be userspace ABI breakage so we can't
> > > > > do that.  Hence the dance around adding yet another mask to allow the
> > > > > OS to understand which CPUs are 'present' but not possible to online.
> > > > >
> > > > > Flattening the two paths removes any distinction between calls that
> > > > > are for real hotplug and those that are for this online capable path.    
> > > >
> > > > Which calls exactly do you mean?    
> > >
> > > At the moment the distinction does not exist (because x86 only supports
> > > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
> > > the architecture is defined for arm64 physical hotplug in the future
> > > we would need to do interrupt controller bring up + a lot of other stuff.
> > >
> > > It may be possible to do that in the arch code - will be hard to verify
> > > that until that arch is defined.  Today all I need to do is ensure that
> > > any attempt to do present bit setting for ARM64 returns an error.
> > > That looks to be straightforward.
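(To make that concrete, the gate could look roughly like the sketch below.
acpi_cpu_was_present_at_boot() is a hypothetical helper and the details are
assumptions for illustration, not the code actually posted in the series.)

#include <linux/cpu.h>

static struct cpu arm64_cpu_dev[NR_CPUS];	/* illustration only */

/*
 * Sketch only: an arm64 arch_register_cpu() that refuses physical
 * hot-add by registering only CPUs already described at boot.
 */
int arch_register_cpu(int cpu)
{
	struct cpu *c = &arm64_cpu_dev[cpu];

	/* A CPU that was not present at boot cannot appear later. */
	if (!acpi_cpu_was_present_at_boot(cpu))	/* hypothetical helper */
		return -ENODEV;

	c->hotpluggable = 0;	/* and no physical hot-remove either */
	return register_cpu(c, cpu);
}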
> > 
> > OK
> >   
> > >    
> > > >    
> > > > > As a side note, the indicating bit for these flows is defined in ACPI
> > > > > for x86 from ACPI 6.3 as a flag in the Processor Local APIC structure
> > > > > (the ARM64 definition is a cut-and-paste of that text).  So someone
> > > > > is interested in this distinction on x86. I can't say who, but if
> > > > > you have a mantis account you can easily follow the history, and it
> > > > > might be instructive to note that not everyone considers the current
> > > > > x86 flow the right way to do it.
> > > >
> > > > So a physically absent processor is different from a physically
> > > > present processor that has not been disabled.  No doubt about this.
> > > >
> > > > That said, I'm still unsure why these two cases require two different
> > > > code paths in acpi_processor_add().    
> > >
> > > It might be possible to push the checking down into arch_register_cpu()
> > > and have that for now reject any attempt to do physical CPU HP on arm64.
> > > It is that gate that is vital to getting this accepted by ARM.
> > >
> > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > on that would be very welcome!    
> > 
> > I need to do some investigation which will take some time I suppose.  
> 
> I'll do so as well once I've gotten the rest sorted out.  That whole
> structure seems overly complex and liable to race, though maybe sufficient
> locking happens to be held that it's not a problem.

Back to this a (maybe) last outstanding problem.

Superficially I think we might be able to get around this by always
doing the setup in the initial online.  In brief that looks something like the
code below, relying on the CPU hotplug callback registration calling
acpi_soft_cpu_online() for all instances that are already online.

Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
However this is all in emulation and I don't have access to any significant
x86 test farms :( So help will be needed if it's not immediately obvious why
we can't do this.

Of course, I'm open to other suggestions!

For now I'll put a tidied version of this one in as an RFC with the rest of v6.
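(The property being relied on: cpuhp_setup_state(), unlike the _nocalls
variant, invokes the startup callback for every CPU that is already online
at registration time, so boot-time and hotplugged CPUs share one path.
A self-contained sketch of just that mechanism, with an invented callback
standing in for acpi_soft_cpu_online():)

#include <linux/cpuhotplug.h>
#include <linux/init.h>
#include <linux/printk.h>

static int example_cpu_online(unsigned int cpu)
{
	/* Would do the per-CPU setup that used to live in _start(). */
	pr_info("online callback ran for CPU%u\n", cpu);
	return 0;
}

static int __init example_init(void)
{
	int ret;

	/* Fires immediately for already-online CPUs, and again on hotplug. */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "example:online",
				example_cpu_online, NULL);
	return ret < 0 ? ret : 0;
}
device_initcall(example_init);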

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 06e718b650e5..97ca53b516d0 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
         */
        per_cpu(processor_device_array, pr->id) = device;
        per_cpu(processors, pr->id) = pr;
-
+       pr->flags.need_hotplug_init = 1;
        /*
         *  Extra Processor objects may be enumerated on MP systems with
         *  less than the max # of CPUs. They should be ignored _iff
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 67db60eda370..930f911fc435 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)

        /* Protect against concurrent CPU hotplug operations */
        cpu_hotplug_disable();
-       ret = __acpi_processor_start(device);
+       //      ret = __acpi_processor_start(device);
        cpu_hotplug_enable();
        return ret;
 }
@@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
        if (result < 0)
                return result;

-       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                           "acpi/cpu-drv:online",
                                           acpi_soft_cpu_online, NULL);
        if (result < 0)
> 
> Jonathan
> 
> 
> 
> 
Rafael J. Wysocki April 16, 2024, 7:02 p.m. UTC | #8
On Tue, Apr 16, 2024 at 7:41 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Mon, 15 Apr 2024 13:23:51 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > On Mon, 15 Apr 2024 14:04:26 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:

[cut]

> > > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > > on that would be very welcome!
> > >
> > > I need to do some investigation which will take some time I suppose.
> >
> > I'll do so as well once I've gotten the rest sorted out.  That whole
> > structure seems overly complex and liable to race, though maybe sufficient
> > locking happens to be held that it's not a problem.
>
> Back to this a (maybe) last outstanding problem.
>
> Superficially I think we might be able to get around this by always
> doing the setup in the initial online.  In brief that looks something like the
> code below, relying on the CPU hotplug callback registration calling
> acpi_soft_cpu_online() for all instances that are already online.
>
> Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
> However this is all in emulation and I don't have access to any significant
> x86 test farms :( So help will be needed if it's not immediately obvious why
> we can't do this.

AFAICS, this should work.  At least I don't see why it wouldn't.

> Of course, I'm open to other suggestions!
>
> For now I'll put a tidied version of this one in as an RFC with the rest of v6.
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 06e718b650e5..97ca53b516d0 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
>          */
>         per_cpu(processor_device_array, pr->id) = device;
>         per_cpu(processors, pr->id) = pr;
> -
> +       pr->flags.need_hotplug_init = 1;
>         /*
>          *  Extra Processor objects may be enumerated on MP systems with
>          *  less than the max # of CPUs. They should be ignored _iff
> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> index 67db60eda370..930f911fc435 100644
> --- a/drivers/acpi/processor_driver.c
> +++ b/drivers/acpi/processor_driver.c
> @@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)
>
>         /* Protect against concurrent CPU hotplug operations */
>         cpu_hotplug_disable();
> -       ret = __acpi_processor_start(device);
> +       //      ret = __acpi_processor_start(device);
>         cpu_hotplug_enable();
>         return ret;
>  }

So it looks like acpi_processor_start() is not necessary any more, is it?

> @@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
>         if (result < 0)
>                 return result;
>
> -       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
>                                            "acpi/cpu-drv:online",
>                                            acpi_soft_cpu_online, NULL);
>         if (result < 0)
> >
> > Jonathan

Thanks!
Jonathan Cameron April 17, 2024, 10:39 a.m. UTC | #9
On Tue, 16 Apr 2024 21:02:02 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Tue, Apr 16, 2024 at 7:41 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Mon, 15 Apr 2024 13:23:51 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >  
> > > On Mon, 15 Apr 2024 14:04:26 +0200
> > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:  
> 
> [cut]
> 
> > > > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > > > on that would be very welcome!  
> > > >
> > > > I need to do some investigation which will take some time I suppose.  
> > >
> > > I'll do so as well once I've gotten the rest sorted out.  That whole
> > > structure seems overly complex and liable to race, though maybe sufficient
> > > locking happens to be held that it's not a problem.  
> >
> > Back to this a (maybe) last outstanding problem.
> >
> > Superficially I think we might be able to get around this by always
> > doing the setup in the initial online.  In brief that looks something like the
> > code below, relying on the CPU hotplug callback registration calling
> > acpi_soft_cpu_online() for all instances that are already online.
> >
> > Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
> > However this is all in emulation and I don't have access to any significant
> > x86 test farms :( So help will be needed if it's not immediately obvious why
> > we can't do this.  
> 
> AFAICS, this should work.  At least I don't see why it wouldn't.
> 
> > Of course, I'm open to other suggestions!
> >
> > For now I'll put a tidied version of this one in as an RFC with the rest of v6.
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 06e718b650e5..97ca53b516d0 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >          */
> >         per_cpu(processor_device_array, pr->id) = device;
> >         per_cpu(processors, pr->id) = pr;
> > -
> > +       pr->flags.need_hotplug_init = 1;
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff
> > diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> > index 67db60eda370..930f911fc435 100644
> > --- a/drivers/acpi/processor_driver.c
> > +++ b/drivers/acpi/processor_driver.c
> > @@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)
> >
> >         /* Protect against concurrent CPU hotplug operations */
> >         cpu_hotplug_disable();
> > -       ret = __acpi_processor_start(device);
> > +       //      ret = __acpi_processor_start(device);
> >         cpu_hotplug_enable();
> >         return ret;
> >  }  
> 
> So it looks like acpi_processor_start() is not necessary any more, is it?

Absolutely.  This needs cleaning up beyond this hack.

Given pr is zero-initialized, flipping the sense of the flag to something
like 'initialized' and having the driver set it on first online, rather than
in acpi_processor.c, will clean it up further.
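
Something along these lines perhaps (untested sketch; 'initialized' is an
invented field replacing need_hotplug_init with the opposite sense, and the
normal online-time work is elided):

static int acpi_soft_cpu_online(unsigned int cpu)
{
	struct acpi_processor *pr = per_cpu(processors, cpu);
	struct acpi_device *device;

	if (!pr || !pr->handle)
		return 0;

	device = acpi_fetch_acpi_dev(pr->handle);
	if (!device)
		return 0;

	/* First online does the setup __acpi_processor_start() used to do. */
	if (!pr->flags.initialized) {
		int ret = __acpi_processor_start(device);

		if (ret)
			return ret;
		pr->flags.initialized = 1;
	}

	return 0;
}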

Jonathan
> 
> > @@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
> >         if (result < 0)
> >                 return result;
> >
> > -       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> > +       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >                                            "acpi/cpu-drv:online",
> >                                            acpi_soft_cpu_online, NULL);
> >         if (result < 0)  
> > >
> > > Jonathan  
> 
> Thanks!