mbox series

[v5,00/18] ACPI/arm64: add support for virtual cpu hotplug

Message ID 20240412143719.11398-1-Jonathan.Cameron@huawei.com
Headers show
Series ACPI/arm64: add support for virtual cpu hotplug | expand

Message

Jonathan Cameron April 12, 2024, 2:37 p.m. UTC
This patch set changes hands again in an attempt to set a new record for
most people who have worked on a single problem.

Miguel has been working on a rename and factoring out of arch
specific code patch set that will clash with this.
https://lore.kernel.org/linux-acpi/20240409150536.9933-1-miguel.luis@oracle.com/
[RFC PATCH 0/4] ACPI: processor: refactor acpi_processor_{get_info|remove}

v5 changes:
- Rebase on Rafael's rework of acpi_scan_check_and_detach() series that
  superceeded the original first patch.
  https://lore.kernel.org/linux-acpi/6021126.lOV4Wx5bFT@kreacher/
  That dealt with what I thought was the most controversial part of the
  series - checking the enabled bit ACPI _STA for CPUS.
- Change the overall handling so that arch_register_cpu() returns
  -EPROBE_DEFER if the particular architecture is not yet ready to
  answer the question of whether a particular CPU maybe used.
  This occurs for ARM64 + ACPI in 2 cases.
  1) At the initial callsite early in boot, before the AML interpreter
     is available and so the code can't query _STA.
  2) If _STA is queried but a particular CPU is present but not enabled.
     Those are the ones we are going to hotplug later.
  For all other architectures and ARM64 DT boots the this deferred
  flow is not used.
- Make the _make_enabled() and _make_not_enabled() flows more similar
  to the _make_present() and _make_not_present(). There are still
  sufficient differences that I don't think it makes sense to combine
  the code, but ensuring the locking and NUMA handling brings them
  closer together.  Note than an additional series will address the
  question of onlining and offlining the NUMA node as for now it
  will always be present (that series is not necessary for initial
  merge of this feature).

Dropped RFC because I think this is getting close to ready for merging
and now we are interested in normal review rather than calling out
significant remaining questions.

Updated version of James' original introduction.

This series adds what looks like cpuhotplug support to arm64 for use in
virtual machines. It does this by moving the cpu_register() calls for
architectures that support ACPI into an arch specific call made from
the ACPI processor driver.
 
The kubernetes folk really want to be able to add CPUs to an existing VM,
in exactly the same way they do on x86. The use-case is pre-booting guests
with one CPU, then adding the number that were actually needed when the
workload is provisioned.

Wait? Doesn't arm64 support cpuhotplug already!?
In the arm world, cpuhotplug gets used to mean removing the power from a CPU.
The CPU is offline, and remains present. For x86, and ACPI, cpuhotplug
has the additional step of physically removing the CPU, so that it isn't
present anymore.
 
Arm64 doesn't support this, and can't support it: CPUs are really a slice
of the SoC, and there is not enough information in the existing ACPI tables
to describe which bits of the slice also got removed. Without a reference
machine: adding this support to the spec is a wild goose chase.
 
Critically: everything described in the firmware tables must remain present.
 
For a virtual machine this is easy as all the other bits of 'virtual SoC'
are emulated, so they can (and do) remain present when a vCPU is 'removed'.

On a system that supports cpuhotplug the MADT has to describe every possible
CPU at boot. Under KVM, the vGIC needs to know about every possible vCPU before
the guest is started.
With these constraints, virtual-cpuhotplug is really just a hypervisor/firmware
policy about which CPUs can be brought online.
 
This series adds support for virtual-cpuhotplug as exactly that: firmware
policy. This may even work on a physical machine too; for a guest the part of
firmware is played by the VMM. (typically Qemu).
 
PSCI support is modified to return 'DENIED' if the CPU can't be brought
online/enabled yet. The CPU object's _STA method's enabled bit is used to
indicate firmware's current disposition. If the CPU has its enabled bit clear,
it will not be registered with sysfs, and attempts to bring it online will
fail. The notifications that _STA has changed its value then work in the same
way as physical hotplug, and firmware can cause the CPU to be registered some
time later, allowing it to be brought online.
 
This creates something that looks like cpuhotplug to user-space, as the sysfs
files appear and disappear, and the udev notifications look the same.
 
One notable difference is the CPU present mask, which is exposed via sysfs.
Because the CPUs remain present throughout, they can still be seen in that mask.
This value does get used by webbrowsers to estimate the number of CPUs
as the CPU online mask is constantly changed on mobile phones.
 
Linux is tolerant of PSCI returning errors, as its always been allowed to do
that. To avoid confusing OS that can't tolerate this, we needed an additional
bit in the MADT GICC flags. This series copies ACPI_MADT_ONLINE_CAPABLE, which
appears to be for this purpose, but calls it ACPI_MADT_GICC_CPU_CAPABLE as it
has a different bit position in the GICC.
 
This code is unconditionally enabled for all ACPI architectures, though for
now only arm64 will have deferred the cpu_register() calls.

If there are problems with firmware tables on some devices, the CPUs will
already be online by the time the acpi_processor_make_enabled() is called.
A mismatch here causes a firmware-bug message and kernel taint. This should
only affect people with broken firmware who also boot with maxcpus=1, and
bring CPUs online later.
 
If folk want to play along at home, you'll need a copy of Qemu that supports this.
https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v2

Replace your '-smp' argument with something like:
 | -smp cpus=1,maxcpus=3,cores=3,threads=1,sockets=1
 
 then feed the following to the Qemu montior;
 | (qemu) device_add driver=host-arm-cpu,core-id=1,id=cpu1
 | (qemu) device_del cpu1

James Morse (11):
  ACPI: processor: Register deferred CPUs from acpi_processor_get_info()
  ACPI: Rename acpi_processor_hotadd_init and  remove pre-processor
    guards
  ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
  ACPI: Check _STA present bit before making CPUs not present
  ACPI: Warn when the present bit changes but the feature is not enabled
  arm64: acpi: Move get_cpu_for_acpi_id() to a header
  irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
  irqchip/gic-v3: Add support for ACPI's disabled but 'online capable'
    CPUs
  ACPI: add support to (un)register CPUs based on the _STA enabled bit
  arm64: document virtual CPU hotplug's expectations
  cpumask: Add enabled cpumask for present CPUs that can be brought
    online

Jean-Philippe Brucker (1):
  arm64: psci: Ignore DENIED CPUs

Jonathan Cameron (5):
  cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER
  ACPI: processor: Set the ACPI_COMPANION for the struct cpu instance
  ACPI: utils: Add an acpi_sta_enabled() helper and use it in
    acpi_processor_make_present()
  ACPI: scan: Add parameter to allow defering some actions in
    acpi_scan_check_and_detach.
  arm64: arch_register_cpu() variant to allow checking of ACPI _STA

Russell King (1):
  ACPI: convert acpi_processor_post_eject() to use IS_ENABLED()

 .../ABI/testing/sysfs-devices-system-cpu      |   6 +
 Documentation/arch/arm64/cpu-hotplug.rst      |  79 ++++++++++++
 Documentation/arch/arm64/index.rst            |   1 +
 arch/arm64/include/asm/acpi.h                 |  11 ++
 arch/arm64/kernel/acpi_numa.c                 |  11 --
 arch/arm64/kernel/psci.c                      |   2 +-
 arch/arm64/kernel/smp.c                       |  23 +++-
 drivers/acpi/acpi_processor.c                 | 112 +++++++++++++++---
 drivers/acpi/scan.c                           |  57 +++++++--
 drivers/acpi/utils.c                          |  21 ++++
 drivers/base/cpu.c                            |  12 +-
 drivers/irqchip/irq-gic-v3.c                  |  32 +++--
 include/acpi/acpi_bus.h                       |   2 +
 include/linux/acpi.h                          |   5 +-
 include/linux/cpumask.h                       |  25 ++++
 kernel/cpu.c                                  |   3 +
 16 files changed, 346 insertions(+), 56 deletions(-)
 create mode 100644 Documentation/arch/arm64/cpu-hotplug.rst

Comments

Rafael J. Wysocki April 12, 2024, 5:42 p.m. UTC | #1
On Fri, Apr 12, 2024 at 4:37 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> For arm64 the CPU registration cannot complete until the ACPI intepretter
> us up and running so in those cases the arch specific
> arch_register_cpu() will return -EPROBE_DEFER at this stage and the
> registration will be attempted later.
>
> Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
> v5: New patch.
>     Note that for now no arch_register_cpu() calls return -EPROBE_DEFER
>     so it has no impact until the arm64 one is added later in this series.
> ---
>  drivers/base/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 56fba44ba391..b9d0d14e5960 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -558,7 +558,7 @@ static void __init cpu_dev_register_generic(void)
>
>         for_each_present_cpu(i) {
>                 ret = arch_register_cpu(i);
> -               if (ret)
> +               if (ret != -EPROBE_DEFER)
>                         pr_warn("register_cpu %d failed (%d)\n", i, ret);
>         }
>  }
> --
> 2.39.2
>
Rafael J. Wysocki April 12, 2024, 6:30 p.m. UTC | #2
On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> The arm64 specific arch_register_cpu() call may defer CPU registration
> until the ACPI interpreter is available and the _STA method can
> be evaluated.
>
> If this occurs, then a second attempt is made in
> acpi_processor_get_info(). Note that the arm64 specific call has
> not yet been added so for now this will never be successfully
> called.
>
> Systems can still be booted with 'acpi=off', or not include an
> ACPI description at all as in these cases arch_register_cpu()
> will not have deferred registration when first called.
>
> This moves the CPU register logic back to a subsys_initcall(),
> while the memory nodes will have been registered earlier.
> Note this is where the call was prior to the cleanup series so
> there should be no side effects of moving it back again for this
> specific case.
>
> [PATCH 00/21] Initial cleanups for vCPU HP.
> https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
>
> e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
> ---
> v5: Update commit message to make it clear this is moving the
>     init back to where it was until very recently.
>
>     No longer change the condition in the earlier registration point
>     as that will be handled by the arm64 registration routine
>     deferring until called again here.
> ---
>  drivers/acpi/acpi_processor.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 93e029403d05..c78398cdd060 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
>
>         c = &per_cpu(cpu_devices, pr->id);
>         ACPI_COMPANION_SET(&c->dev, device);
> +       /*
> +        * Register CPUs that are present. get_cpu_device() is used to skip
> +        * duplicate CPU descriptions from firmware.
> +        */
> +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> +           !get_cpu_device(pr->id)) {
> +               int ret = arch_register_cpu(pr->id);
> +
> +               if (ret)
> +                       return ret;
> +       }
> +
>         /*
>          *  Extra Processor objects may be enumerated on MP systems with
>          *  less than the max # of CPUs. They should be ignored _iff
> --

I am still unsure why there need to be two paths calling
arch_register_cpu() in acpi_processor_get_info().

Just below the comment partially pulled into the patch context above,
there is this code:

if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
         int ret = acpi_processor_hotadd_init(pr);

        if (ret)
                return ret;
}

For the sake of the argument, fold acpi_processor_hotadd_init() into
it and drop the redundant _STA check from it:

if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
        if (invalid_phys_cpuid(pr->phys_id))
                return -ENODEV;

        cpu_maps_update_begin();
        cpus_write_lock();

       ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
       if (ret) {
                cpus_write_unlock();
                cpu_maps_update_done();
                return ret;
       }
       ret = arch_register_cpu(pr->id);
       if (ret) {
                acpi_unmap_cpu(pr->id);

                cpus_write_unlock();
                cpu_maps_update_done();
                return ret;
       }
      pr_info("CPU%d has been hot-added\n", pr->id);
      pr->flags.need_hotplug_init = 1;

      cpus_write_unlock();
      cpu_maps_update_done();
}

so I'm not sure why this cannot be combined with the new code.

Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
What's the difference then?  The locking, which should be fine if I'm
not mistaken and need_hotplug_init that needs to be set if this code
runs after the processor driver has loaded AFAICS.
Russell King (Oracle) April 12, 2024, 8:16 p.m. UTC | #3
On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > The arm64 specific arch_register_cpu() call may defer CPU registration
> > until the ACPI interpreter is available and the _STA method can
> > be evaluated.
> >
> > If this occurs, then a second attempt is made in
> > acpi_processor_get_info(). Note that the arm64 specific call has
> > not yet been added so for now this will never be successfully
> > called.
> >
> > Systems can still be booted with 'acpi=off', or not include an
> > ACPI description at all as in these cases arch_register_cpu()
> > will not have deferred registration when first called.
> >
> > This moves the CPU register logic back to a subsys_initcall(),
> > while the memory nodes will have been registered earlier.
> > Note this is where the call was prior to the cleanup series so
> > there should be no side effects of moving it back again for this
> > specific case.
> >
> > [PATCH 00/21] Initial cleanups for vCPU HP.
> > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
> >
> > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > v5: Update commit message to make it clear this is moving the
> >     init back to where it was until very recently.
> >
> >     No longer change the condition in the earlier registration point
> >     as that will be handled by the arm64 registration routine
> >     deferring until called again here.
> > ---
> >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 93e029403d05..c78398cdd060 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >
> >         c = &per_cpu(cpu_devices, pr->id);
> >         ACPI_COMPANION_SET(&c->dev, device);
> > +       /*
> > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > +        * duplicate CPU descriptions from firmware.
> > +        */
> > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > +           !get_cpu_device(pr->id)) {
> > +               int ret = arch_register_cpu(pr->id);
> > +
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff
> > --
> 
> I am still unsure why there need to be two paths calling
> arch_register_cpu() in acpi_processor_get_info().
> 
> Just below the comment partially pulled into the patch context above,
> there is this code:
> 
> if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>          int ret = acpi_processor_hotadd_init(pr);
> 
>         if (ret)
>                 return ret;
> }
> 
> For the sake of the argument, fold acpi_processor_hotadd_init() into
> it and drop the redundant _STA check from it:
> 
> if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>         if (invalid_phys_cpuid(pr->phys_id))
>                 return -ENODEV;
> 
>         cpu_maps_update_begin();
>         cpus_write_lock();
> 
>        ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
>        if (ret) {
>                 cpus_write_unlock();
>                 cpu_maps_update_done();
>                 return ret;
>        }
>        ret = arch_register_cpu(pr->id);
>        if (ret) {
>                 acpi_unmap_cpu(pr->id);
> 
>                 cpus_write_unlock();
>                 cpu_maps_update_done();
>                 return ret;
>        }
>       pr_info("CPU%d has been hot-added\n", pr->id);
>       pr->flags.need_hotplug_init = 1;
> 
>       cpus_write_unlock();
>       cpu_maps_update_done();
> }
> 
> so I'm not sure why this cannot be combined with the new code.
> 
> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> What's the difference then?  The locking, which should be fine if I'm
> not mistaken and need_hotplug_init that needs to be set if this code
> runs after the processor driver has loaded AFAICS.

It is over this that I walked away from progressing this code, because
I don't think it's quite as simple as you make it out to be.

Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch implemented
functions, so Arm64 can easily provide stubs for these that do nothing.
That never caused me any concern.

What does cause me great concern though are the finer details. For
example, above you seem to drop the evaluation of _STA for the
"make_present" case - I've no idea whether that is something that
should be deleted or not (if it is something that can be deleted,
then why not delete it now?)

As for the cpu locking, I couldn't find anything in arch_register_cpu()
that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
being taken - so I've no idea why the "make_present" case takes these
locks.

Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
obvious that this is required - remember that with Arm64's "enabled"
toggling, the "processor" is a slice of the system and doesn't
actually go away - it's just "not enabled" for use.

Again, as "processors" in Arm64 are slices of the system, they have
to be fully described in ACPI before the OS boots, and they will be
marked as being "present", which means they will be enumerated, and
the driver will be probed. Any processor that is not to be used will
not have its enabled bit set. It is my understanding that every
processor will result in the ACPI processor driver being bound to it
whether its enabled or not.

The difference between real hotplug and Arm64 hotplug is that real
hotplug makes stuff not-present (and thus unenumerable). Arm64 hotplug
makes stuff not-enabled which is still enumerable.

... or at least that is my understanding which may not be entirely
correct (which is why I stepped down because I feel totally out of
my depth with ACPI stuff.)
Thomas Gleixner April 12, 2024, 8:54 p.m. UTC | #4
On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
> On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
>> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
>> What's the difference then?  The locking, which should be fine if I'm
>> not mistaken and need_hotplug_init that needs to be set if this code
>> runs after the processor driver has loaded AFAICS.
>
> It is over this that I walked away from progressing this code, because
> I don't think it's quite as simple as you make it out to be.
>
> Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch implemented
> functions, so Arm64 can easily provide stubs for these that do nothing.
> That never caused me any concern.
>
> What does cause me great concern though are the finer details. For
> example, above you seem to drop the evaluation of _STA for the
> "make_present" case - I've no idea whether that is something that
> should be deleted or not (if it is something that can be deleted,
> then why not delete it now?)
>
> As for the cpu locking, I couldn't find anything in arch_register_cpu()
> that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> being taken - so I've no idea why the "make_present" case takes these
> locks.

Anything which updates a CPU mask, e.g. cpu_present_mask, after early
boot must hold the appropriate write locks. Otherwise it would be
possible to online a CPU which just got marked present, but the
registration has not completed yet.

> Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
> obvious that this is required - remember that with Arm64's "enabled"
> toggling, the "processor" is a slice of the system and doesn't
> actually go away - it's just "not enabled" for use.
>
> Again, as "processors" in Arm64 are slices of the system, they have
> to be fully described in ACPI before the OS boots, and they will be
> marked as being "present", which means they will be enumerated, and
> the driver will be probed. Any processor that is not to be used will
> not have its enabled bit set. It is my understanding that every
> processor will result in the ACPI processor driver being bound to it
> whether its enabled or not.
>
> The difference between real hotplug and Arm64 hotplug is that real
> hotplug makes stuff not-present (and thus unenumerable). Arm64 hotplug
> makes stuff not-enabled which is still enumerable.

Define "real hotplug" :)

Real physical hotplug does not really exist. That's at least true for
x86, where the physical hotplug support was chased for a while, but
never ended up in production.

Though virtualization happily jumped on it to hot add/remove CPUs
to/from a guest.

There are limitations to this and we learned it the hard way on X86. At
the end we came up with the following restrictions:

    1) All possible CPUs have to be advertised at boot time via firmware
       (ACPI/DT/whatever) independent of them being present at boot time
       or not.

       That guarantees proper sizing and ensures that associations
       between hardware entities and software representations and the
       resulting topology are stable for the lifetime of a system.

       It is really required to know the full topology of the system at
       boot time especially with hybrid CPUs where some of the cores
       have hyperthreading and the others do not.


    2) Hot add can only mark an already registered (possible) CPU
       present. Adding non-registered CPUs after boot is not possible.

       The CPU must have been registered in #1 already to ensure that
       the system topology does not suddenly change in an incompatible
       way at run-time.

The same restriction would apply to real physical hotplug. I don't think
that's any different for ARM64 or any other architecture.

Hope that helps.

Thanks,

        tglx
Russell King (Oracle) April 12, 2024, 9:52 p.m. UTC | #5
On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
> On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
> > On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
> >> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> >> What's the difference then?  The locking, which should be fine if I'm
> >> not mistaken and need_hotplug_init that needs to be set if this code
> >> runs after the processor driver has loaded AFAICS.
> >
> > It is over this that I walked away from progressing this code, because
> > I don't think it's quite as simple as you make it out to be.
> >
> > Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch implemented
> > functions, so Arm64 can easily provide stubs for these that do nothing.
> > That never caused me any concern.
> >
> > What does cause me great concern though are the finer details. For
> > example, above you seem to drop the evaluation of _STA for the
> > "make_present" case - I've no idea whether that is something that
> > should be deleted or not (if it is something that can be deleted,
> > then why not delete it now?)
> >
> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > being taken - so I've no idea why the "make_present" case takes these
> > locks.
> 
> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> boot must hold the appropriate write locks. Otherwise it would be
> possible to online a CPU which just got marked present, but the
> registration has not completed yet.

Yes. As far as I've been able to determine, arch_register_cpu()
doesn't manipulate any of the CPU masks. All it seems to be doing
is initialising the struct cpu, registering the embedded struct
device, and setting up the sysfs links to its NUMA node.

There is nothing obvious in there which manipulates any CPU masks, and
this is rather my fundamental point when I said "I couldn't find
anything in arch_register_cpu() that depends on ...".

If there is something, then comments in the code would be a useful aid
because it's highly non-obvious where such a manipulation is located,
and hence why the locks are necessary.

> > Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
> > obvious that this is required - remember that with Arm64's "enabled"
> > toggling, the "processor" is a slice of the system and doesn't
> > actually go away - it's just "not enabled" for use.
> >
> > Again, as "processors" in Arm64 are slices of the system, they have
> > to be fully described in ACPI before the OS boots, and they will be
> > marked as being "present", which means they will be enumerated, and
> > the driver will be probed. Any processor that is not to be used will
> > not have its enabled bit set. It is my understanding that every
> > processor will result in the ACPI processor driver being bound to it
> > whether its enabled or not.
> >
> > The difference between real hotplug and Arm64 hotplug is that real
> > hotplug makes stuff not-present (and thus unenumerable). Arm64 hotplug
> > makes stuff not-enabled which is still enumerable.
> 
> Define "real hotplug" :)
> 
> Real physical hotplug does not really exist. That's at least true for
> x86, where the physical hotplug support was chased for a while, but
> never ended up in production.
> 
> Though virtualization happily jumped on it to hot add/remove CPUs
> to/from a guest.
> 
> There are limitations to this and we learned it the hard way on X86. At
> the end we came up with the following restrictions:
> 
>     1) All possible CPUs have to be advertised at boot time via firmware
>        (ACPI/DT/whatever) independent of them being present at boot time
>        or not.
> 
>        That guarantees proper sizing and ensures that associations
>        between hardware entities and software representations and the
>        resulting topology are stable for the lifetime of a system.
> 
>        It is really required to know the full topology of the system at
>        boot time especially with hybrid CPUs where some of the cores
>        have hyperthreading and the others do not.
> 
> 
>     2) Hot add can only mark an already registered (possible) CPU
>        present. Adding non-registered CPUs after boot is not possible.
> 
>        The CPU must have been registered in #1 already to ensure that
>        the system topology does not suddenly change in an incompatible
>        way at run-time.
> 
> The same restriction would apply to real physical hotplug. I don't think
> that's any different for ARM64 or any other architecture.

This makes me wonder whether the Arm64 has been barking up the wrong
tree then, and whether the whole "present" vs "enabled" thing comes
from a misunderstanding as far as a CPU goes.

However, there is a big difference between the two. On x86, a processor
is just a processor. On Arm64, a "processor" is a slice of the system
(includes the interrupt controller, PMUs etc) and we must enumerate
those even when the processor itself is not enabled. This is the whole
reason there's a difference between "present" and "enabled" and why
there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
The processor never actually goes away in arm64, it's just prevented
from being used.
Thomas Gleixner April 12, 2024, 11:23 p.m. UTC | #6
Russell!

On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
> On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
>> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
>> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
>> > being taken - so I've no idea why the "make_present" case takes these
>> > locks.
>> 
>> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
>> boot must hold the appropriate write locks. Otherwise it would be
>> possible to online a CPU which just got marked present, but the
>> registration has not completed yet.
>
> Yes. As far as I've been able to determine, arch_register_cpu()
> doesn't manipulate any of the CPU masks. All it seems to be doing
> is initialising the struct cpu, registering the embedded struct
> device, and setting up the sysfs links to its NUMA node.
>
> There is nothing obvious in there which manipulates any CPU masks, and
> this is rather my fundamental point when I said "I couldn't find
> anything in arch_register_cpu() that depends on ...".
>
> If there is something, then comments in the code would be a useful aid
> because it's highly non-obvious where such a manipulation is located,
> and hence why the locks are necessary.

acpi_processor_hotadd_init()
...
         acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);

That ends up in fiddling with cpu_present_mask.

I grant you that arch_register_cpu() is not, but it might rely on the
external locking too. I could not be bothered to figure that out.

>> Define "real hotplug" :)
>> 
>> Real physical hotplug does not really exist. That's at least true for
>> x86, where the physical hotplug support was chased for a while, but
>> never ended up in production.
>> 
>> Though virtualization happily jumped on it to hot add/remove CPUs
>> to/from a guest.
>> 
>> There are limitations to this and we learned it the hard way on X86. At
>> the end we came up with the following restrictions:
>> 
>>     1) All possible CPUs have to be advertised at boot time via firmware
>>        (ACPI/DT/whatever) independent of them being present at boot time
>>        or not.
>> 
>>        That guarantees proper sizing and ensures that associations
>>        between hardware entities and software representations and the
>>        resulting topology are stable for the lifetime of a system.
>> 
>>        It is really required to know the full topology of the system at
>>        boot time especially with hybrid CPUs where some of the cores
>>        have hyperthreading and the others do not.
>> 
>> 
>>     2) Hot add can only mark an already registered (possible) CPU
>>        present. Adding non-registered CPUs after boot is not possible.
>> 
>>        The CPU must have been registered in #1 already to ensure that
>>        the system topology does not suddenly change in an incompatible
>>        way at run-time.
>> 
>> The same restriction would apply to real physical hotplug. I don't think
>> that's any different for ARM64 or any other architecture.
>
> This makes me wonder whether the Arm64 has been barking up the wrong
> tree then, and whether the whole "present" vs "enabled" thing comes
> from a misunderstanding as far as a CPU goes.
>
> However, there is a big difference between the two. On x86, a processor
> is just a processor. On Arm64, a "processor" is a slice of the system
> (includes the interrupt controller, PMUs etc) and we must enumerate
> those even when the processor itself is not enabled. This is the whole
> reason there's a difference between "present" and "enabled" and why
> there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> The processor never actually goes away in arm64, it's just prevented
> from being used.

It's the same on X86 at least in the physical world.

Thanks,

        tglx
Jonathan Cameron April 15, 2024, 8:45 a.m. UTC | #7
On Sat, 13 Apr 2024 01:23:48 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Russell!
> 
> On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
> > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:  
> >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> >> > being taken - so I've no idea why the "make_present" case takes these
> >> > locks.  
> >> 
> >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> >> boot must hold the appropriate write locks. Otherwise it would be
> >> possible to online a CPU which just got marked present, but the
> >> registration has not completed yet.  
> >
> > Yes. As far as I've been able to determine, arch_register_cpu()
> > doesn't manipulate any of the CPU masks. All it seems to be doing
> > is initialising the struct cpu, registering the embedded struct
> > device, and setting up the sysfs links to its NUMA node.
> >
> > There is nothing obvious in there which manipulates any CPU masks, and
> > this is rather my fundamental point when I said "I couldn't find
> > anything in arch_register_cpu() that depends on ...".
> >
> > If there is something, then comments in the code would be a useful aid
> > because it's highly non-obvious where such a manipulation is located,
> > and hence why the locks are necessary.  
> 
> acpi_processor_hotadd_init()
> ...
>          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> 
> That ends up in fiddling with cpu_present_mask.
> 
> I grant you that arch_register_cpu() is not, but it might rely on the
> external locking too. I could not be bothered to figure that out.
> 
> >> Define "real hotplug" :)
> >> 
> >> Real physical hotplug does not really exist. That's at least true for
> >> x86, where the physical hotplug support was chased for a while, but
> >> never ended up in production.
> >> 
> >> Though virtualization happily jumped on it to hot add/remove CPUs
> >> to/from a guest.
> >> 
> >> There are limitations to this and we learned it the hard way on X86. At
> >> the end we came up with the following restrictions:
> >> 
> >>     1) All possible CPUs have to be advertised at boot time via firmware
> >>        (ACPI/DT/whatever) independent of them being present at boot time
> >>        or not.
> >> 
> >>        That guarantees proper sizing and ensures that associations
> >>        between hardware entities and software representations and the
> >>        resulting topology are stable for the lifetime of a system.
> >> 
> >>        It is really required to know the full topology of the system at
> >>        boot time especially with hybrid CPUs where some of the cores
> >>        have hyperthreading and the others do not.
> >> 
> >> 
> >>     2) Hot add can only mark an already registered (possible) CPU
> >>        present. Adding non-registered CPUs after boot is not possible.
> >> 
> >>        The CPU must have been registered in #1 already to ensure that
> >>        the system topology does not suddenly change in an incompatible
> >>        way at run-time.
> >> 
> >> The same restriction would apply to real physical hotplug. I don't think
> >> that's any different for ARM64 or any other architecture.  
> >
> > This makes me wonder whether the Arm64 has been barking up the wrong
> > tree then, and whether the whole "present" vs "enabled" thing comes
> > from a misunderstanding as far as a CPU goes.
> >
> > However, there is a big difference between the two. On x86, a processor
> > is just a processor. On Arm64, a "processor" is a slice of the system
> > (includes the interrupt controller, PMUs etc) and we must enumerate
> > those even when the processor itself is not enabled. This is the whole
> > reason there's a difference between "present" and "enabled" and why
> > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > The processor never actually goes away in arm64, it's just prevented
> > from being used.  
> 
> It's the same on X86 at least in the physical world.

There were public calls on this via the Linaro Open Discussions group,
so I can talk a little about how we ended up here.  Note that (in my
opinion) there is zero chance of this changing - it took us well over
a year to get to this conclusion.  So if we ever want ARM vCPU HP
we need to work within these constraints. 

The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
specs etc, not the kernel maintainers) are determined that they want
to retain the option to do real physical CPU hotplug in the future
with all the necessary work around dynamic interrupt controller
initialization, debug and many other messy corners.

Thus anything defined had to be structured in a way that was 'different'
from that.

I don't mind the proposed flattening of the 2 paths if the ARM kernel
maintainers are fine with it but it will remove the distinctions and
we will need to be very careful with the CPU masks - we can't handle
them the same as x86 does.

I'll get on with doing that, but do need input from Will / Catalin / James.
There are some quirks that need calling out as it's not quite a simple
as it appears from a high level.

Another part of that long discussion established that there is userspace
(Android IIRC) in which the CPU present mask must include all CPUs
at boot. To change that would be userspace ABI breakage so we can't
do that.  Hence the dance around adding yet another mask to allow the
OS to understand which CPUs are 'present' but not possible to online.

Flattening the two paths removes any distinction between calls that
are for real hotplug and those that are for this online capable path.
As a side note, the indicating bit for these flows is defined in ACPI
for x86 from ACPI 6.3 as a flag in Processor Local APIC
(the ARM64 definition is a cut and paste of that text).  So someone
is interested in this distinction on x86. I can't say who but if
you have a mantis account you can easily follow the history and it
might be instructive to not everyone considering the current x86
flow the right way to do it.

Jonathan


> 
> Thanks,
> 
>         tglx
>
Jonathan Cameron April 15, 2024, 9:16 a.m. UTC | #8
On Mon, 15 Apr 2024 09:45:52 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Sat, 13 Apr 2024 01:23:48 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > Russell!
> > 
> > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:  
> > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:    
> > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > >> > being taken - so I've no idea why the "make_present" case takes these
> > >> > locks.    
> > >> 
> > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > >> boot must hold the appropriate write locks. Otherwise it would be
> > >> possible to online a CPU which just got marked present, but the
> > >> registration has not completed yet.    
> > >
> > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > is initialising the struct cpu, registering the embedded struct
> > > device, and setting up the sysfs links to its NUMA node.
> > >
> > > There is nothing obvious in there which manipulates any CPU masks, and
> > > this is rather my fundamental point when I said "I couldn't find
> > > anything in arch_register_cpu() that depends on ...".
> > >
> > > If there is something, then comments in the code would be a useful aid
> > > because it's highly non-obvious where such a manipulation is located,
> > > and hence why the locks are necessary.    
> > 
> > acpi_processor_hotadd_init()
> > ...
> >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > 
> > That ends up in fiddling with cpu_present_mask.
> > 
> > I grant you that arch_register_cpu() is not, but it might rely on the
> > external locking too. I could not be bothered to figure that out.
> >   
> > >> Define "real hotplug" :)
> > >> 
> > >> Real physical hotplug does not really exist. That's at least true for
> > >> x86, where the physical hotplug support was chased for a while, but
> > >> never ended up in production.
> > >> 
> > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > >> to/from a guest.
> > >> 
> > >> There are limitations to this and we learned it the hard way on X86. At
> > >> the end we came up with the following restrictions:
> > >> 
> > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > >>        or not.
> > >> 
> > >>        That guarantees proper sizing and ensures that associations
> > >>        between hardware entities and software representations and the
> > >>        resulting topology are stable for the lifetime of a system.
> > >> 
> > >>        It is really required to know the full topology of the system at
> > >>        boot time especially with hybrid CPUs where some of the cores
> > >>        have hyperthreading and the others do not.
> > >> 
> > >> 
> > >>     2) Hot add can only mark an already registered (possible) CPU
> > >>        present. Adding non-registered CPUs after boot is not possible.
> > >> 
> > >>        The CPU must have been registered in #1 already to ensure that
> > >>        the system topology does not suddenly change in an incompatible
> > >>        way at run-time.
> > >> 
> > >> The same restriction would apply to real physical hotplug. I don't think
> > >> that's any different for ARM64 or any other architecture.    
> > >
> > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > from a misunderstanding as far as a CPU goes.
> > >
> > > However, there is a big difference between the two. On x86, a processor
> > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > those even when the processor itself is not enabled. This is the whole
> > > reason there's a difference between "present" and "enabled" and why
> > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > The processor never actually goes away in arm64, it's just prevented
> > > from being used.    
> > 
> > It's the same on X86 at least in the physical world.  
> 
> There were public calls on this via the Linaro Open Discussions group,
> so I can talk a little about how we ended up here.  Note that (in my
> opinion) there is zero chance of this changing - it took us well over
> a year to get to this conclusion.  So if we ever want ARM vCPU HP
> we need to work within these constraints. 
> 
> The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> specs etc, not the kernel maintainers) are determined that they want
> to retain the option to do real physical CPU hotplug in the future
> with all the necessary work around dynamic interrupt controller
> initialization, debug and many other messy corners.
> 
> Thus anything defined had to be structured in a way that was 'different'
> from that.
> 
> I don't mind the proposed flattening of the 2 paths if the ARM kernel
> maintainers are fine with it but it will remove the distinctions and
> we will need to be very careful with the CPU masks - we can't handle
> them the same as x86 does.
> 
> I'll get on with doing that, but do need input from Will / Catalin / James.
> There are some quirks that need calling out as it's not quite a simple
> as it appears from a high level.
> 
> Another part of that long discussion established that there is userspace
> (Android IIRC) in which the CPU present mask must include all CPUs
> at boot. To change that would be userspace ABI breakage so we can't
> do that.  Hence the dance around adding yet another mask to allow the
> OS to understand which CPUs are 'present' but not possible to online.
> 
> Flattening the two paths removes any distinction between calls that
> are for real hotplug and those that are for this online capable path.
> As a side note, the indicating bit for these flows is defined in ACPI
> for x86 from ACPI 6.3 as a flag in Processor Local APIC
> (the ARM64 definition is a cut and paste of that text).  So someone
> is interested in this distinction on x86. I can't say who but if
> you have a mantis account you can easily follow the history and it
> might be instructive to not everyone considering the current x86
> flow the right way to do it.

Would a higher level check to catch that we are hitting undefined
territory on arm64 be acceptable? That might satisfy the constraint
that we should not have any software for arm64 that would run if
physical CPU HP is added to the arch in future.  Something like:

@@ -331,6 +331,13 @@ static int acpi_processor_get_info(struct acpi_device *device)

        c = &per_cpu(cpu_devices, pr->id);
        ACPI_COMPANION_SET(&c->dev, device);
+
+       if (!IS_ENABLED(CONFIG_ACPI_CPU_HOTPLUG_CPU) &&
+           (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))) {
+               pr_err_once("Changing CPU present bit is not supported\n");
+               return -ENODEV;
+       }
+

This is basically lifting the check out of the acpi_processor_make_present()
call in this patch set.

With that in place before the new shared call I think we should be fine
wrt to the ARM Architecture requirements.

Jonathan


        /*
> 
> Jonathan
> 
> 
> > 
> > Thanks,
> > 
> >         tglx
> >   
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Jonathan Cameron April 15, 2024, 9:31 a.m. UTC | #9
On Mon, 15 Apr 2024 10:16:37 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 15 Apr 2024 09:45:52 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Sat, 13 Apr 2024 01:23:48 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >   
> > > Russell!
> > > 
> > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:    
> > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:      
> > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > >> > locks.      
> > > >> 
> > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > >> possible to online a CPU which just got marked present, but the
> > > >> registration has not completed yet.      
> > > >
> > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > is initialising the struct cpu, registering the embedded struct
> > > > device, and setting up the sysfs links to its NUMA node.
> > > >
> > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > this is rather my fundamental point when I said "I couldn't find
> > > > anything in arch_register_cpu() that depends on ...".
> > > >
> > > > If there is something, then comments in the code would be a useful aid
> > > > because it's highly non-obvious where such a manipulation is located,
> > > > and hence why the locks are necessary.      
> > > 
> > > acpi_processor_hotadd_init()
> > > ...
> > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > 
> > > That ends up in fiddling with cpu_present_mask.
> > > 
> > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > external locking too. I could not be bothered to figure that out.
> > >     
> > > >> Define "real hotplug" :)
> > > >> 
> > > >> Real physical hotplug does not really exist. That's at least true for
> > > >> x86, where the physical hotplug support was chased for a while, but
> > > >> never ended up in production.
> > > >> 
> > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > >> to/from a guest.
> > > >> 
> > > >> There are limitations to this and we learned it the hard way on X86. At
> > > >> the end we came up with the following restrictions:
> > > >> 
> > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > >>        or not.
> > > >> 
> > > >>        That guarantees proper sizing and ensures that associations
> > > >>        between hardware entities and software representations and the
> > > >>        resulting topology are stable for the lifetime of a system.
> > > >> 
> > > >>        It is really required to know the full topology of the system at
> > > >>        boot time especially with hybrid CPUs where some of the cores
> > > >>        have hyperthreading and the others do not.
> > > >> 
> > > >> 
> > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > >> 
> > > >>        The CPU must have been registered in #1 already to ensure that
> > > >>        the system topology does not suddenly change in an incompatible
> > > >>        way at run-time.
> > > >> 
> > > >> The same restriction would apply to real physical hotplug. I don't think
> > > >> that's any different for ARM64 or any other architecture.      
> > > >
> > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > from a misunderstanding as far as a CPU goes.
> > > >
> > > > However, there is a big difference between the two. On x86, a processor
> > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > those even when the processor itself is not enabled. This is the whole
> > > > reason there's a difference between "present" and "enabled" and why
> > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > The processor never actually goes away in arm64, it's just prevented
> > > > from being used.      
> > > 
> > > It's the same on X86 at least in the physical world.    
> > 
> > There were public calls on this via the Linaro Open Discussions group,
> > so I can talk a little about how we ended up here.  Note that (in my
> > opinion) there is zero chance of this changing - it took us well over
> > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > we need to work within these constraints. 
> > 
> > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > specs etc, not the kernel maintainers) are determined that they want
> > to retain the option to do real physical CPU hotplug in the future
> > with all the necessary work around dynamic interrupt controller
> > initialization, debug and many other messy corners.
> > 
> > Thus anything defined had to be structured in a way that was 'different'
> > from that.
> > 
> > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > maintainers are fine with it but it will remove the distinctions and
> > we will need to be very careful with the CPU masks - we can't handle
> > them the same as x86 does.
> > 
> > I'll get on with doing that, but do need input from Will / Catalin / James.
> > There are some quirks that need calling out as it's not quite a simple
> > as it appears from a high level.
> > 
> > Another part of that long discussion established that there is userspace
> > (Android IIRC) in which the CPU present mask must include all CPUs
> > at boot. To change that would be userspace ABI breakage so we can't
> > do that.  Hence the dance around adding yet another mask to allow the
> > OS to understand which CPUs are 'present' but not possible to online.
> > 
> > Flattening the two paths removes any distinction between calls that
> > are for real hotplug and those that are for this online capable path.
> > As a side note, the indicating bit for these flows is defined in ACPI
> > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > (the ARM64 definition is a cut and paste of that text).  So someone
> > is interested in this distinction on x86. I can't say who but if
> > you have a mantis account you can easily follow the history and it
> > might be instructive to not everyone considering the current x86
> > flow the right way to do it.  
> 
> Would a higher level check to catch that we are hitting undefined
> territory on arm64 be acceptable? That might satisfy the constraint
> that we should not have any software for arm64 that would run if
> physical CPU HP is added to the arch in future.  Something like:
> 
> @@ -331,6 +331,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> 
>         c = &per_cpu(cpu_devices, pr->id);
>         ACPI_COMPANION_SET(&c->dev, device);
> +
> +       if (!IS_ENABLED(CONFIG_ACPI_CPU_HOTPLUG_CPU) &&
> +           (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))) {
> +               pr_err_once("Changing CPU present bit is not supported\n");
> +               return -ENODEV;
> +       }
> +
> 
> This is basically lifting the check out of the acpi_processor_make_present()
> call in this patch set.
> 
> With that in place before the new shared call I think we should be fine
> wrt to the ARM Architecture requirements.
> 

Thomas, one result of using the same code in both paths is that we 
end up calling acpi_map_cpu() in paths on x86 that aren't under CONFIG_ACPI_HOTPLUG_CPU
any more.   If anyone ever implements the x86 version of online capable, this
might be valid.

For now I've dropped the CONFIG_ACPI_HOTPLUG_CPU guard in arch/x86/kernel/acpi/boot.c
but is that the right thing to do or should we stub out with an error return for
now?

> Jonathan
> 
> 
>         /*
> > 
> > Jonathan
> > 
> >   
> > > 
> > > Thanks,
> > > 
> > >         tglx
> > >     
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel  
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Jonathan Cameron April 15, 2024, 10:52 a.m. UTC | #10
On Fri, 12 Apr 2024 20:30:40 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > From: James Morse <james.morse@arm.com>
> >
> > The arm64 specific arch_register_cpu() call may defer CPU registration
> > until the ACPI interpreter is available and the _STA method can
> > be evaluated.
> >
> > If this occurs, then a second attempt is made in
> > acpi_processor_get_info(). Note that the arm64 specific call has
> > not yet been added so for now this will never be successfully
> > called.
> >
> > Systems can still be booted with 'acpi=off', or not include an
> > ACPI description at all as in these cases arch_register_cpu()
> > will not have deferred registration when first called.
> >
> > This moves the CPU register logic back to a subsys_initcall(),
> > while the memory nodes will have been registered earlier.
> > Note this is where the call was prior to the cleanup series so
> > there should be no side effects of moving it back again for this
> > specific case.
> >
> > [PATCH 00/21] Initial cleanups for vCPU HP.
> > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
> >
> > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> >
> > Signed-off-by: James Morse <james.morse@arm.com>
> > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > v5: Update commit message to make it clear this is moving the
> >     init back to where it was until very recently.
> >
> >     No longer change the condition in the earlier registration point
> >     as that will be handled by the arm64 registration routine
> >     deferring until called again here.
> > ---
> >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 93e029403d05..c78398cdd060 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >
> >         c = &per_cpu(cpu_devices, pr->id);
> >         ACPI_COMPANION_SET(&c->dev, device);
> > +       /*
> > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > +        * duplicate CPU descriptions from firmware.
> > +        */
> > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > +           !get_cpu_device(pr->id)) {
> > +               int ret = arch_register_cpu(pr->id);
> > +
> > +               if (ret)
> > +                       return ret;
> > +       }
> > +
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff
> > --  
> 
> I am still unsure why there need to be two paths calling
> arch_register_cpu() in acpi_processor_get_info().

I replied further down the thread, but the key point was to maintain
the strong distinction between 'what' was done in a real hotplug
path vs one where onlining was all.  We can relax that but it goes
contrary to the careful dance that was needed to get any agreement
to the ARM architecture aspects of this.

> 
> Just below the comment partially pulled into the patch context above,
> there is this code:
> 
> if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>          int ret = acpi_processor_hotadd_init(pr);
> 
>         if (ret)
>                 return ret;
> }
> 
> For the sake of the argument, fold acpi_processor_hotadd_init() into
> it and drop the redundant _STA check from it:

If we combine these, the _STA check is necessary because we will call this
path for delayed onlining of ARM64 CPUs (if the earlier registration code
call or arch_register_cpu() returned -EPROBE defer). That's the only way
we know that a given CPU is online capable but firmware is saying we can't
bring it online yet (it may be be vHP later).

> 
> if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>         if (invalid_phys_cpuid(pr->phys_id))
>                 return -ENODEV;
> 
>         cpu_maps_update_begin();
>         cpus_write_lock();
> 
>        ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);

I read that call as
	acpi_map_cpu_for_physical_cpu_hotplug()
but we could make it equivalent of.
	acpi_map_cpu_for_whatever_cpu_hotplug()
(I'm not proposing those names though ;)

in which case it is fine to just stub it out on ARM64.
>        if (ret) {
>                 cpus_write_unlock();
>                 cpu_maps_update_done();
>                 return ret;
>        }
>        ret = arch_register_cpu(pr->id);
>        if (ret) {
>                 acpi_unmap_cpu(pr->id);
> 
>                 cpus_write_unlock();
>                 cpu_maps_update_done();
>                 return ret;
>        }
>       pr_info("CPU%d has been hot-added\n", pr->id);
>       pr->flags.need_hotplug_init = 1;
This one needs more careful handling because we are calling this
for non hotplug cases on arm64 in which case we end up setting this
for initially online CPUs - thus if we offline and online them
again via sysfs /sys/bus/cpu/device/cpuX/online it goes through the
hotplug path and should not.

So I need a way to detect if we are hotplugging the cpu or not.
Is there a standard way to do this?  I haven't figured out how
to use flags in drivers to communicate this state.

> 
>       cpus_write_unlock();
>       cpu_maps_update_done();
> }
> 
> so I'm not sure why this cannot be combined with the new code.
> 
> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> What's the difference then?  The locking, which should be fine if I'm
> not mistaken and need_hotplug_init that needs to be set if this code
> runs after the processor driver has loaded AFAICS.

That's the bit that I'm currently finding a challenge. Is there a clean
way to detect that?

Jonathan
Salil Mehta April 15, 2024, 11:07 a.m. UTC | #11
Hello,

Engaging after a long time (I've been almost off grid due to very challenging personal circumstances).
Please find my response inline.

Thanks
Salil

>  From: Rafael J. Wysocki <rafael@kernel.org>
>  Sent: Friday, April 12, 2024 7:31 PM
>  To: Jonathan Cameron <jonathan.cameron@huawei.com>
>  Cc: linux-pm@vger.kernel.org; loongarch@lists.linux.dev; linux-
>  acpi@vger.kernel.org; linux-arch@vger.kernel.org; linux-
>  kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
>  kvmarm@lists.linux.dev; x86@kernel.org; Russell King
>  <linux@armlinux.org.uk>; Rafael J . Wysocki <rafael@kernel.org>; Miguel
>  Luis <miguel.luis@oracle.com>; James Morse <james.morse@arm.com>;
>  Salil Mehta <salil.mehta@huawei.com>; Jean-Philippe Brucker <jean-
>  philippe@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>; Will
>  Deacon <will@kernel.org>; Linuxarm <linuxarm@huawei.com>;
>  justin.he@arm.com; jianyong.wu@arm.com
>  Subject: Re: [PATCH v5 03/18] ACPI: processor: Register deferred CPUs from
>  acpi_processor_get_info()
>  
>  On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
>  <Jonathan.Cameron@huawei.com> wrote:
>  >
>  > From: James Morse <james.morse@arm.com>
>  >
>  > The arm64 specific arch_register_cpu() call may defer CPU registration
>  > until the ACPI interpreter is available and the _STA method can be
>  > evaluated.
>  >
>  > If this occurs, then a second attempt is made in
>  > acpi_processor_get_info(). Note that the arm64 specific call has not
>  > yet been added so for now this will never be successfully called.
>  >
>  > Systems can still be booted with 'acpi=off', or not include an ACPI
>  > description at all as in these cases arch_register_cpu() will not have
>  > deferred registration when first called.
>  >
>  > This moves the CPU register logic back to a subsys_initcall(), while
>  > the memory nodes will have been registered earlier.
>  > Note this is where the call was prior to the cleanup series so there
>  > should be no side effects of moving it back again for this specific
>  > case.
>  >
>  > [PATCH 00/21] Initial cleanups for vCPU HP.
>  >
>  https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
>  >
>  > e.g. 5b95f94c3b9f ("x86/topology: Switch over to
>  GENERIC_CPU_DEVICES")
>  >
>  > Signed-off-by: James Morse <james.morse@arm.com>
>  > Reviewed-by: Gavin Shan <gshan@redhat.com>
>  > Tested-by: Miguel Luis <miguel.luis@oracle.com>
>  > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
>  > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
>  > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>  > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>  > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
>  > ---
>  > v5: Update commit message to make it clear this is moving the
>  >     init back to where it was until very recently.
>  >
>  >     No longer change the condition in the earlier registration point
>  >     as that will be handled by the arm64 registration routine
>  >     deferring until called again here.
>  > ---
>  >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
>  >  1 file changed, 12 insertions(+)
>  >
>  > diff --git a/drivers/acpi/acpi_processor.c
>  > b/drivers/acpi/acpi_processor.c index 93e029403d05..c78398cdd060
>  > 100644
>  > --- a/drivers/acpi/acpi_processor.c
>  > +++ b/drivers/acpi/acpi_processor.c
>  > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct
>  > acpi_device *device)
>  >
>  >         c = &per_cpu(cpu_devices, pr->id);
>  >         ACPI_COMPANION_SET(&c->dev, device);
>  > +       /*
>  > +        * Register CPUs that are present. get_cpu_device() is used to skip
>  > +        * duplicate CPU descriptions from firmware.
>  > +        */
>  > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
>  > +           !get_cpu_device(pr->id)) {
>  > +               int ret = arch_register_cpu(pr->id);
>  > +
>  > +               if (ret)
>  > +                       return ret;
>  > +       }
>  > +
>  >         /*
>  >          *  Extra Processor objects may be enumerated on MP systems with
>  >          *  less than the max # of CPUs. They should be ignored _iff
>  > --
>  
>  I am still unsure why there need to be two paths calling
>  arch_register_cpu() in acpi_processor_get_info().


This is because all CPUs are expected to be 'present' during the boot time for ARM64 arch.
This is not true for x86 world i.e. the logical_cpuid could  be invalid (and present mask not
set) for the x86 arch during the boot time.  Faking the 'present' behavior at the virtualizer
level for ARM is like interfering with the architecture and then tweaking the kernel to fit
that unauthorized hack. This has a potential to break the existing and future version of the
ARM arch. (Between, I'm one of the initial offender of doing that but later corrected the
approach after many discussions and KVM Forum presentations along with ARM )

Therefore, in ARM we keep all the processor as present and just use _STA enabled bit to
decide the online'ing of the processor and this requires a separate handling.


>  
>  Just below the comment partially pulled into the patch context above, there
>  is this code:
>  
>  if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>           int ret = acpi_processor_hotadd_init(pr);
>  
>          if (ret)
>                  return ret;
>  }
>  
>  For the sake of the argument, fold acpi_processor_hotadd_init() into it and
>  drop the redundant _STA check from it:
>  
>  if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>          if (invalid_phys_cpuid(pr->phys_id))
>                  return -ENODEV;
>  
>          cpu_maps_update_begin();
>          cpus_write_lock();
>  
>         ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
>         if (ret) {
>                  cpus_write_unlock();
>                  cpu_maps_update_done();
>                  return ret;
>         }
>         ret = arch_register_cpu(pr->id);
>         if (ret) {
>                  acpi_unmap_cpu(pr->id);
>  
>                  cpus_write_unlock();
>                  cpu_maps_update_done();
>                  return ret;
>         }
>        pr_info("CPU%d has been hot-added\n", pr->id);
>        pr->flags.need_hotplug_init = 1;
>  
>        cpus_write_unlock();
>        cpu_maps_update_done();
>  }
>  
>  so I'm not sure why this cannot be combined with the new code.
>  
>  Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.


We cannot because logical cpu-id can never be invalid and cpus can
never be in NOT present state on ARM arch.


>  What's the difference then?  


Above is the precise difference. Changing the behavior of 'presence' in
the ARM architecture after boot is not allowed. With the latest efforts, we
have added the concept of 'online-capable' bit which can help in defer
online'ing the CPUs but then this is not same as not being present at the
boot time. 


The locking, which should be fine if I'm not
>  mistaken and need_hotplug_init that needs to be set if this code runs after
>  the processor driver has loaded AFAICS.

AFAICS, Locking looks to be okay to me as well.

Best regards
Salil.
Jonathan Cameron April 15, 2024, 11:11 a.m. UTC | #12
On Mon, 15 Apr 2024 11:52:03 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Fri, 12 Apr 2024 20:30:40 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> > On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > From: James Morse <james.morse@arm.com>
> > >
> > > The arm64 specific arch_register_cpu() call may defer CPU registration
> > > until the ACPI interpreter is available and the _STA method can
> > > be evaluated.
> > >
> > > If this occurs, then a second attempt is made in
> > > acpi_processor_get_info(). Note that the arm64 specific call has
> > > not yet been added so for now this will never be successfully
> > > called.
> > >
> > > Systems can still be booted with 'acpi=off', or not include an
> > > ACPI description at all as in these cases arch_register_cpu()
> > > will not have deferred registration when first called.
> > >
> > > This moves the CPU register logic back to a subsys_initcall(),
> > > while the memory nodes will have been registered earlier.
> > > Note this is where the call was prior to the cleanup series so
> > > there should be no side effects of moving it back again for this
> > > specific case.
> > >
> > > [PATCH 00/21] Initial cleanups for vCPU HP.
> > > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
> > >
> > > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> > >
> > > Signed-off-by: James Morse <james.morse@arm.com>
> > > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
> > > ---
> > > v5: Update commit message to make it clear this is moving the
> > >     init back to where it was until very recently.
> > >
> > >     No longer change the condition in the earlier registration point
> > >     as that will be handled by the arm64 registration routine
> > >     deferring until called again here.
> > > ---
> > >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 93e029403d05..c78398cdd060 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >
> > >         c = &per_cpu(cpu_devices, pr->id);
> > >         ACPI_COMPANION_SET(&c->dev, device);
> > > +       /*
> > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > +        * duplicate CPU descriptions from firmware.
> > > +        */
> > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > +           !get_cpu_device(pr->id)) {
> > > +               int ret = arch_register_cpu(pr->id);
> > > +
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > >         /*
> > >          *  Extra Processor objects may be enumerated on MP systems with
> > >          *  less than the max # of CPUs. They should be ignored _iff
> > > --    
> > 
> > I am still unsure why there need to be two paths calling
> > arch_register_cpu() in acpi_processor_get_info().  
> 
> I replied further down the thread, but the key point was to maintain
> the strong distinction between 'what' was done in a real hotplug
> path vs one where onlining was all.  We can relax that but it goes
> contrary to the careful dance that was needed to get any agreement
> to the ARM architecture aspects of this.
> 
> > 
> > Just below the comment partially pulled into the patch context above,
> > there is this code:
> > 
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >          int ret = acpi_processor_hotadd_init(pr);
> > 
> >         if (ret)
> >                 return ret;
> > }
> > 
> > For the sake of the argument, fold acpi_processor_hotadd_init() into
> > it and drop the redundant _STA check from it:  
> 
> If we combine these, the _STA check is necessary because we will call this
> path for delayed onlining of ARM64 CPUs (if the earlier registration code
> call or arch_register_cpu() returned -EPROBE defer). That's the only way
> we know that a given CPU is online capable but firmware is saying we can't
> bring it online yet (it may be be vHP later).

For arm64 ignore this comment. I'd forgotten we moved it into the arch
specific code last week.  May need similar on x86 I'm not 100% sure.

> 
> > 
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >         if (invalid_phys_cpuid(pr->phys_id))
> >                 return -ENODEV;
> > 
> >         cpu_maps_update_begin();
> >         cpus_write_lock();
> > 
> >        ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);  
> 
> I read that call as
> 	acpi_map_cpu_for_physical_cpu_hotplug()
> but we could make it equivalent of.
> 	acpi_map_cpu_for_whatever_cpu_hotplug()
> (I'm not proposing those names though ;)
> 
> in which case it is fine to just stub it out on ARM64.
> >        if (ret) {
> >                 cpus_write_unlock();
> >                 cpu_maps_update_done();
> >                 return ret;
> >        }
> >        ret = arch_register_cpu(pr->id);
> >        if (ret) {
> >                 acpi_unmap_cpu(pr->id);
> > 
> >                 cpus_write_unlock();
> >                 cpu_maps_update_done();
> >                 return ret;
> >        }
> >       pr_info("CPU%d has been hot-added\n", pr->id);
> >       pr->flags.need_hotplug_init = 1;  
> This one needs more careful handling because we are calling this
> for non hotplug cases on arm64 in which case we end up setting this
> for initially online CPUs - thus if we offline and online them
> again via sysfs /sys/bus/cpu/device/cpuX/online it goes through the
> hotplug path and should not.
> 
> So I need a way to detect if we are hotplugging the cpu or not.
> Is there a standard way to do this?  I haven't figured out how
> to use flags in drivers to communicate this state.
> 
> > 
> >       cpus_write_unlock();
> >       cpu_maps_update_done();
> > }
> > 
> > so I'm not sure why this cannot be combined with the new code.
> > 
> > Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> > What's the difference then?  The locking, which should be fine if I'm
> > not mistaken and need_hotplug_init that needs to be set if this code
> > runs after the processor driver has loaded AFAICS.  
> 
> That's the bit that I'm currently finding a challenge. Is there a clean
> way to detect that?
> 
> Jonathan
> 
> 
> 
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Rafael J. Wysocki April 15, 2024, 11:37 a.m. UTC | #13
On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Sat, 13 Apr 2024 01:23:48 +0200
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > Russell!
> >
> > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
> > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
> > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > >> > being taken - so I've no idea why the "make_present" case takes these
> > >> > locks.
> > >>
> > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > >> boot must hold the appropriate write locks. Otherwise it would be
> > >> possible to online a CPU which just got marked present, but the
> > >> registration has not completed yet.
> > >
> > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > is initialising the struct cpu, registering the embedded struct
> > > device, and setting up the sysfs links to its NUMA node.
> > >
> > > There is nothing obvious in there which manipulates any CPU masks, and
> > > this is rather my fundamental point when I said "I couldn't find
> > > anything in arch_register_cpu() that depends on ...".
> > >
> > > If there is something, then comments in the code would be a useful aid
> > > because it's highly non-obvious where such a manipulation is located,
> > > and hence why the locks are necessary.
> >
> > acpi_processor_hotadd_init()
> > ...
> >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> >
> > That ends up in fiddling with cpu_present_mask.
> >
> > I grant you that arch_register_cpu() is not, but it might rely on the
> > external locking too. I could not be bothered to figure that out.
> >
> > >> Define "real hotplug" :)
> > >>
> > >> Real physical hotplug does not really exist. That's at least true for
> > >> x86, where the physical hotplug support was chased for a while, but
> > >> never ended up in production.
> > >>
> > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > >> to/from a guest.
> > >>
> > >> There are limitations to this and we learned it the hard way on X86. At
> > >> the end we came up with the following restrictions:
> > >>
> > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > >>        or not.
> > >>
> > >>        That guarantees proper sizing and ensures that associations
> > >>        between hardware entities and software representations and the
> > >>        resulting topology are stable for the lifetime of a system.
> > >>
> > >>        It is really required to know the full topology of the system at
> > >>        boot time especially with hybrid CPUs where some of the cores
> > >>        have hyperthreading and the others do not.
> > >>
> > >>
> > >>     2) Hot add can only mark an already registered (possible) CPU
> > >>        present. Adding non-registered CPUs after boot is not possible.
> > >>
> > >>        The CPU must have been registered in #1 already to ensure that
> > >>        the system topology does not suddenly change in an incompatible
> > >>        way at run-time.
> > >>
> > >> The same restriction would apply to real physical hotplug. I don't think
> > >> that's any different for ARM64 or any other architecture.
> > >
> > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > from a misunderstanding as far as a CPU goes.
> > >
> > > However, there is a big difference between the two. On x86, a processor
> > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > those even when the processor itself is not enabled. This is the whole
> > > reason there's a difference between "present" and "enabled" and why
> > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > The processor never actually goes away in arm64, it's just prevented
> > > from being used.
> >
> > It's the same on X86 at least in the physical world.
>
> There were public calls on this via the Linaro Open Discussions group,
> so I can talk a little about how we ended up here.  Note that (in my
> opinion) there is zero chance of this changing - it took us well over
> a year to get to this conclusion.  So if we ever want ARM vCPU HP
> we need to work within these constraints.
>
> The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> specs etc, not the kernel maintainers) are determined that they want
> to retain the option to do real physical CPU hotplug in the future
> with all the necessary work around dynamic interrupt controller
> initialization, debug and many other messy corners.

That's OK, but the difference is not in the ACPi CPU enumeration/removal code.

> Thus anything defined had to be structured in a way that was 'different'
> from that.

Apparently, that's where things got confused.

> I don't mind the proposed flattening of the 2 paths if the ARM kernel
> maintainers are fine with it but it will remove the distinctions and
> we will need to be very careful with the CPU masks - we can't handle
> them the same as x86 does.

At the ACPI code level, there is no distinction.

A CPU that was not available before has just become available.  The
platform firmware has notified the kernel about it and now
acpi_processor_add() runs.  Why would it need to use different code
paths depending on what _STA bits were clear before?

Yes, there is some arch stuff to be called and that arch stuff should
figure out what to do to make things actually work.

> I'll get on with doing that, but do need input from Will / Catalin / James.
> There are some quirks that need calling out as it's not quite a simple
> as it appears from a high level.
>
> Another part of that long discussion established that there is userspace
> (Android IIRC) in which the CPU present mask must include all CPUs
> at boot. To change that would be userspace ABI breakage so we can't
> do that.  Hence the dance around adding yet another mask to allow the
> OS to understand which CPUs are 'present' but not possible to online.
>
> Flattening the two paths removes any distinction between calls that
> are for real hotplug and those that are for this online capable path.

Which calls exactly do you mean?

> As a side note, the indicating bit for these flows is defined in ACPI
> for x86 from ACPI 6.3 as a flag in Processor Local APIC
> (the ARM64 definition is a cut and paste of that text).  So someone
> is interested in this distinction on x86. I can't say who but if
> you have a mantis account you can easily follow the history and it
> might be instructive to not everyone considering the current x86
> flow the right way to do it.

So a physically absent processor is different from a physically
present processor that has not been disabled.  No doubt about this.

That said, I'm still unsure why these two cases require two different
code paths in acpi_processor_add().
Salil Mehta April 15, 2024, 11:51 a.m. UTC | #14
Hello,

>  From: Thomas Gleixner <tglx@linutronix.de>
>  Sent: Friday, April 12, 2024 9:55 PM
>  
>  On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
>  > On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
>  >> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
>  >> What's the difference then?  The locking, which should be fine if I'm
>  >> not mistaken and need_hotplug_init that needs to be set if this code
>  >> runs after the processor driver has loaded AFAICS.
>  >
>  > It is over this that I walked away from progressing this code, because
>  > I don't think it's quite as simple as you make it out to be.
>  >
>  > Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch
>  implemented
>  > functions, so Arm64 can easily provide stubs for these that do nothing.
>  > That never caused me any concern.
>  >
>  > What does cause me great concern though are the finer details. For
>  > example, above you seem to drop the evaluation of _STA for the
>  > "make_present" case - I've no idea whether that is something that
>  > should be deleted or not (if it is something that can be deleted, then
>  > why not delete it now?)
>  >
>  > As for the cpu locking, I couldn't find anything in
>  > arch_register_cpu() that depends on the cpu_maps_update stuff nor
>  > needs the cpus_write_lock being taken - so I've no idea why the
>  > "make_present" case takes these locks.
>  
>  Anything which updates a CPU mask, e.g. cpu_present_mask, after early
>  boot must hold the appropriate write locks. Otherwise it would be possible
>  to online a CPU which just got marked present, but the registration has not
>  completed yet.
>  
>  > Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
>  > obvious that this is required - remember that with Arm64's "enabled"
>  > toggling, the "processor" is a slice of the system and doesn't
>  > actually go away - it's just "not enabled" for use.
>  >
>  > Again, as "processors" in Arm64 are slices of the system, they have to
>  > be fully described in ACPI before the OS boots, and they will be
>  > marked as being "present", which means they will be enumerated, and
>  > the driver will be probed. Any processor that is not to be used will
>  > not have its enabled bit set. It is my understanding that every
>  > processor will result in the ACPI processor driver being bound to it
>  > whether its enabled or not.
>  >
>  > The difference between real hotplug and Arm64 hotplug is that real
>  > hotplug makes stuff not-present (and thus unenumerable). Arm64
>  hotplug
>  > makes stuff not-enabled which is still enumerable.
>  
>  Define "real hotplug" :)
>  
>  Real physical hotplug does not really exist. That's at least true for x86, where
>  the physical hotplug support was chased for a while, but never ended up in
>  production.
>  
>  Though virtualization happily jumped on it to hot add/remove CPUs to/from
>  a guest.
>  
>  There are limitations to this and we learned it the hard way on X86. At the
>  end we came up with the following restrictions:
>  
>      1) All possible CPUs have to be advertised at boot time via firmware
>         (ACPI/DT/whatever) independent of them being present at boot time
>         or not.
>  
>         That guarantees proper sizing and ensures that associations
>         between hardware entities and software representations and the
>         resulting topology are stable for the lifetime of a system.
>  
>         It is really required to know the full topology of the system at
>         boot time especially with hybrid CPUs where some of the cores
>         have hyperthreading and the others do not.
>  
>  
>      2) Hot add can only mark an already registered (possible) CPU
>         present. Adding non-registered CPUs after boot is not possible.
>  
>         The CPU must have been registered in #1 already to ensure that
>         the system topology does not suddenly change in an incompatible
>         way at run-time.
>  
>  The same restriction would apply to real physical hotplug. I don't think that's
>  any different for ARM64 or any other architecture.


There is a difference:

1.   ARM arch does not allows for any processor to be NOT present. Hence, because of
this restriction any of its related per-cpu components must be present and enumerated
at the boot time as well (exposed by firmware and ACPI). This means all the enumerated
processors will be marked as 'present' but they might exist in NOT enabled (_STA.enabled=0)
state.
 
There was one clear difference and please correct me if I'm wrong here,  for x86, the LAPIC
associated with the x86 core can be brought online later even after boot?

But for ARM Arch, processors and its corresponding per-cpu components like redistributors
all need to be present and enumerated during the boot time. Redistributors are part of
ALWAYS-ON power domain. 

2.  Agreed regarding the topology. Are you suggesting that we must call arch_register_cpu()
during boot time for all the 'present' CPUs? Even if that's the case, we might still want to defer
registration of the cpu device (register_cpu() API) with the Linux device model. Later is what
we are doing to hide/unhide the CPUs from the user while STA.Enabled Bit is toggled due to
CPU (un)plug action.


Best regards
Salil.


>  
>  Hope that helps.
>  
>  Thanks,
>  
>          tglx
Rafael J. Wysocki April 15, 2024, 11:52 a.m. UTC | #15
On Mon, Apr 15, 2024 at 12:52 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Fri, 12 Apr 2024 20:30:40 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
> > On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > From: James Morse <james.morse@arm.com>
> > >
> > > The arm64 specific arch_register_cpu() call may defer CPU registration
> > > until the ACPI interpreter is available and the _STA method can
> > > be evaluated.
> > >
> > > If this occurs, then a second attempt is made in
> > > acpi_processor_get_info(). Note that the arm64 specific call has
> > > not yet been added so for now this will never be successfully
> > > called.
> > >
> > > Systems can still be booted with 'acpi=off', or not include an
> > > ACPI description at all as in these cases arch_register_cpu()
> > > will not have deferred registration when first called.
> > >
> > > This moves the CPU register logic back to a subsys_initcall(),
> > > while the memory nodes will have been registered earlier.
> > > Note this is where the call was prior to the cleanup series so
> > > there should be no side effects of moving it back again for this
> > > specific case.
> > >
> > > [PATCH 00/21] Initial cleanups for vCPU HP.
> > > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@shell.armlinux.org.uk/
> > >
> > > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> > >
> > > Signed-off-by: James Morse <james.morse@arm.com>
> > > Reviewed-by: Gavin Shan <gshan@redhat.com>
> > > Tested-by: Miguel Luis <miguel.luis@oracle.com>
> > > Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
> > > Tested-by: Jianyong Wu <jianyong.wu@arm.com>
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@huawei.com>
> > > ---
> > > v5: Update commit message to make it clear this is moving the
> > >     init back to where it was until very recently.
> > >
> > >     No longer change the condition in the earlier registration point
> > >     as that will be handled by the arm64 registration routine
> > >     deferring until called again here.
> > > ---
> > >  drivers/acpi/acpi_processor.c | 12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 93e029403d05..c78398cdd060 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >
> > >         c = &per_cpu(cpu_devices, pr->id);
> > >         ACPI_COMPANION_SET(&c->dev, device);
> > > +       /*
> > > +        * Register CPUs that are present. get_cpu_device() is used to skip
> > > +        * duplicate CPU descriptions from firmware.
> > > +        */
> > > +       if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > +           !get_cpu_device(pr->id)) {
> > > +               int ret = arch_register_cpu(pr->id);
> > > +
> > > +               if (ret)
> > > +                       return ret;
> > > +       }
> > > +
> > >         /*
> > >          *  Extra Processor objects may be enumerated on MP systems with
> > >          *  less than the max # of CPUs. They should be ignored _iff
> > > --
> >
> > I am still unsure why there need to be two paths calling
> > arch_register_cpu() in acpi_processor_get_info().
>
> I replied further down the thread, but the key point was to maintain
> the strong distinction between 'what' was done in a real hotplug
> path vs one where onlining was all.  We can relax that but it goes
> contrary to the careful dance that was needed to get any agreement
> to the ARM architecture aspects of this.

This seems to go beyond technical territory.

As a general rule, we tend to combine code paths that look similar
instead of making them separate on purpose.  Especially with a little
to no explanation of the technical reason.

> >
> > Just below the comment partially pulled into the patch context above,
> > there is this code:
> >
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >          int ret = acpi_processor_hotadd_init(pr);
> >
> >         if (ret)
> >                 return ret;
> > }
> >
> > For the sake of the argument, fold acpi_processor_hotadd_init() into
> > it and drop the redundant _STA check from it:
>
> If we combine these, the _STA check is necessary because we will call this
> path for delayed onlining of ARM64 CPUs (if the earlier registration code
> call or arch_register_cpu() returned -EPROBE defer). That's the only way
> we know that a given CPU is online capable but firmware is saying we can't
> bring it online yet (it may be be vHP later).

Ignoring the above as per the other message.

> >
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> >         if (invalid_phys_cpuid(pr->phys_id))
> >                 return -ENODEV;
> >
> >         cpu_maps_update_begin();
> >         cpus_write_lock();
> >
> >        ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
>
> I read that call as
>         acpi_map_cpu_for_physical_cpu_hotplug()
> but we could make it equivalent of.
>         acpi_map_cpu_for_whatever_cpu_hotplug()

Yes.

> (I'm not proposing those names though ;)

Sure.

> in which case it is fine to just stub it out on ARM64.
> >        if (ret) {
> >                 cpus_write_unlock();
> >                 cpu_maps_update_done();
> >                 return ret;
> >        }
> >        ret = arch_register_cpu(pr->id);
> >        if (ret) {
> >                 acpi_unmap_cpu(pr->id);
> >
> >                 cpus_write_unlock();
> >                 cpu_maps_update_done();
> >                 return ret;
> >        }
> >       pr_info("CPU%d has been hot-added\n", pr->id);
> >       pr->flags.need_hotplug_init = 1;
> This one needs more careful handling because we are calling this
> for non hotplug cases on arm64 in which case we end up setting this
> for initially online CPUs - thus if we offline and online them
> again via sysfs /sys/bus/cpu/device/cpuX/online it goes through the
> hotplug path and should not.
>
> So I need a way to detect if we are hotplugging the cpu or not.
> Is there a standard way to do this?

If you mean physical hot-add, I don't think so.

What exactly do you need to do differently in the two cases?

>  I haven't figured out how to use flags in drivers to communicate this state.
>
> >
> >       cpus_write_unlock();
> >       cpu_maps_update_done();
> > }
> >
> > so I'm not sure why this cannot be combined with the new code.
> >
> > Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> > What's the difference then?  The locking, which should be fine if I'm
> > not mistaken and need_hotplug_init that needs to be set if this code
> > runs after the processor driver has loaded AFAICS.
>
> That's the bit that I'm currently finding a challenge. Is there a clean
> way to detect that?

If you mean the need_hotplug_init flag, I'm currently a bit struggling
to convince myself that it is really needed.
Jonathan Cameron April 15, 2024, 11:56 a.m. UTC | #16
On Mon, 15 Apr 2024 13:37:08 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Sat, 13 Apr 2024 01:23:48 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >  
> > > Russell!
> > >
> > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:  
> > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:  
> > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > >> > locks.  
> > > >>
> > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > >> possible to online a CPU which just got marked present, but the
> > > >> registration has not completed yet.  
> > > >
> > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > is initialising the struct cpu, registering the embedded struct
> > > > device, and setting up the sysfs links to its NUMA node.
> > > >
> > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > this is rather my fundamental point when I said "I couldn't find
> > > > anything in arch_register_cpu() that depends on ...".
> > > >
> > > > If there is something, then comments in the code would be a useful aid
> > > > because it's highly non-obvious where such a manipulation is located,
> > > > and hence why the locks are necessary.  
> > >
> > > acpi_processor_hotadd_init()
> > > ...
> > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > >
> > > That ends up in fiddling with cpu_present_mask.
> > >
> > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > external locking too. I could not be bothered to figure that out.
> > >  
> > > >> Define "real hotplug" :)
> > > >>
> > > >> Real physical hotplug does not really exist. That's at least true for
> > > >> x86, where the physical hotplug support was chased for a while, but
> > > >> never ended up in production.
> > > >>
> > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > >> to/from a guest.
> > > >>
> > > >> There are limitations to this and we learned it the hard way on X86. At
> > > >> the end we came up with the following restrictions:
> > > >>
> > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > >>        or not.
> > > >>
> > > >>        That guarantees proper sizing and ensures that associations
> > > >>        between hardware entities and software representations and the
> > > >>        resulting topology are stable for the lifetime of a system.
> > > >>
> > > >>        It is really required to know the full topology of the system at
> > > >>        boot time especially with hybrid CPUs where some of the cores
> > > >>        have hyperthreading and the others do not.
> > > >>
> > > >>
> > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > >>
> > > >>        The CPU must have been registered in #1 already to ensure that
> > > >>        the system topology does not suddenly change in an incompatible
> > > >>        way at run-time.
> > > >>
> > > >> The same restriction would apply to real physical hotplug. I don't think
> > > >> that's any different for ARM64 or any other architecture.  
> > > >
> > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > from a misunderstanding as far as a CPU goes.
> > > >
> > > > However, there is a big difference between the two. On x86, a processor
> > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > those even when the processor itself is not enabled. This is the whole
> > > > reason there's a difference between "present" and "enabled" and why
> > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > The processor never actually goes away in arm64, it's just prevented
> > > > from being used.  
> > >
> > > It's the same on X86 at least in the physical world.  
> >
> > There were public calls on this via the Linaro Open Discussions group,
> > so I can talk a little about how we ended up here.  Note that (in my
> > opinion) there is zero chance of this changing - it took us well over
> > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > we need to work within these constraints.
> >
> > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > specs etc, not the kernel maintainers) are determined that they want
> > to retain the option to do real physical CPU hotplug in the future
> > with all the necessary work around dynamic interrupt controller
> > initialization, debug and many other messy corners.  
> 
> That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> 
> > Thus anything defined had to be structured in a way that was 'different'
> > from that.  
> 
> Apparently, that's where things got confused.
> 
> > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > maintainers are fine with it but it will remove the distinctions and
> > we will need to be very careful with the CPU masks - we can't handle
> > them the same as x86 does.  
> 
> At the ACPI code level, there is no distinction.
> 
> A CPU that was not available before has just become available.  The
> platform firmware has notified the kernel about it and now
> acpi_processor_add() runs.  Why would it need to use different code
> paths depending on what _STA bits were clear before?

I think we will continue to disagree on this.  To my mind and from the
ACPI specification, they are two different state transitions with different
required actions.   Those state transitions are an ACPI level thing not
an arch level one.  However, I want a solution that moves things forwards
so I'll give pushing that entirely into the arch code a try.

> 
> Yes, there is some arch stuff to be called and that arch stuff should
> figure out what to do to make things actually work.
> 
> > I'll get on with doing that, but do need input from Will / Catalin / James.
> > There are some quirks that need calling out as it's not quite a simple
> > as it appears from a high level.
> >
> > Another part of that long discussion established that there is userspace
> > (Android IIRC) in which the CPU present mask must include all CPUs
> > at boot. To change that would be userspace ABI breakage so we can't
> > do that.  Hence the dance around adding yet another mask to allow the
> > OS to understand which CPUs are 'present' but not possible to online.
> >
> > Flattening the two paths removes any distinction between calls that
> > are for real hotplug and those that are for this online capable path.  
> 
> Which calls exactly do you mean?

At the moment he distinction does not exist (because x86 only supports
fake physical CPU HP and arm64 only vCPU HP / online capable), but if
the architecture is defined for arm64 physical hotplug in the future
we would need to do interrupt controller bring up + a lot of other stuff.

It may be possible to do that in the arch code - will be hard to verify
that until that arch is defined  Today all I need to do is ensure that
any attempt to do present bit setting for ARM64 returns an error.
That looks to be straight forward.


> 
> > As a side note, the indicating bit for these flows is defined in ACPI
> > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > (the ARM64 definition is a cut and paste of that text).  So someone
> > is interested in this distinction on x86. I can't say who but if
> > you have a mantis account you can easily follow the history and it
> > might be instructive to not everyone considering the current x86
> > flow the right way to do it.  
> 
> So a physically absent processor is different from a physically
> present processor that has not been disabled.  No doubt about this.
> 
> That said, I'm still unsure why these two cases require two different
> code paths in acpi_processor_add().

It might be possible to push the checking down into arch_register_cpu()
and have that for now reject any attempt to do physical CPU HP on arm64.
It is that gate that is vital to getting this accepted by ARM.

I'm still very much stuck on the hotadd_init flag however, so any suggestions
on that would be very welcome! 

Jonathan
Jonathan Cameron April 15, 2024, 11:57 a.m. UTC | #17
On Mon, 15 Apr 2024 10:16:37 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 15 Apr 2024 09:45:52 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Sat, 13 Apr 2024 01:23:48 +0200
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >   
> > > Russell!
> > > 
> > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:    
> > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:      
> > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > >> > locks.      
> > > >> 
> > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > >> possible to online a CPU which just got marked present, but the
> > > >> registration has not completed yet.      
> > > >
> > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > is initialising the struct cpu, registering the embedded struct
> > > > device, and setting up the sysfs links to its NUMA node.
> > > >
> > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > this is rather my fundamental point when I said "I couldn't find
> > > > anything in arch_register_cpu() that depends on ...".
> > > >
> > > > If there is something, then comments in the code would be a useful aid
> > > > because it's highly non-obvious where such a manipulation is located,
> > > > and hence why the locks are necessary.      
> > > 
> > > acpi_processor_hotadd_init()
> > > ...
> > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > 
> > > That ends up in fiddling with cpu_present_mask.
> > > 
> > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > external locking too. I could not be bothered to figure that out.
> > >     
> > > >> Define "real hotplug" :)
> > > >> 
> > > >> Real physical hotplug does not really exist. That's at least true for
> > > >> x86, where the physical hotplug support was chased for a while, but
> > > >> never ended up in production.
> > > >> 
> > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > >> to/from a guest.
> > > >> 
> > > >> There are limitations to this and we learned it the hard way on X86. At
> > > >> the end we came up with the following restrictions:
> > > >> 
> > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > >>        or not.
> > > >> 
> > > >>        That guarantees proper sizing and ensures that associations
> > > >>        between hardware entities and software representations and the
> > > >>        resulting topology are stable for the lifetime of a system.
> > > >> 
> > > >>        It is really required to know the full topology of the system at
> > > >>        boot time especially with hybrid CPUs where some of the cores
> > > >>        have hyperthreading and the others do not.
> > > >> 
> > > >> 
> > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > >> 
> > > >>        The CPU must have been registered in #1 already to ensure that
> > > >>        the system topology does not suddenly change in an incompatible
> > > >>        way at run-time.
> > > >> 
> > > >> The same restriction would apply to real physical hotplug. I don't think
> > > >> that's any different for ARM64 or any other architecture.      
> > > >
> > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > from a misunderstanding as far as a CPU goes.
> > > >
> > > > However, there is a big difference between the two. On x86, a processor
> > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > those even when the processor itself is not enabled. This is the whole
> > > > reason there's a difference between "present" and "enabled" and why
> > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > The processor never actually goes away in arm64, it's just prevented
> > > > from being used.      
> > > 
> > > It's the same on X86 at least in the physical world.    
> > 
> > There were public calls on this via the Linaro Open Discussions group,
> > so I can talk a little about how we ended up here.  Note that (in my
> > opinion) there is zero chance of this changing - it took us well over
> > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > we need to work within these constraints. 
> > 
> > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > specs etc, not the kernel maintainers) are determined that they want
> > to retain the option to do real physical CPU hotplug in the future
> > with all the necessary work around dynamic interrupt controller
> > initialization, debug and many other messy corners.
> > 
> > Thus anything defined had to be structured in a way that was 'different'
> > from that.
> > 
> > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > maintainers are fine with it but it will remove the distinctions and
> > we will need to be very careful with the CPU masks - we can't handle
> > them the same as x86 does.
> > 
> > I'll get on with doing that, but do need input from Will / Catalin / James.
> > There are some quirks that need calling out as it's not quite a simple
> > as it appears from a high level.
> > 
> > Another part of that long discussion established that there is userspace
> > (Android IIRC) in which the CPU present mask must include all CPUs
> > at boot. To change that would be userspace ABI breakage so we can't
> > do that.  Hence the dance around adding yet another mask to allow the
> > OS to understand which CPUs are 'present' but not possible to online.
> > 
> > Flattening the two paths removes any distinction between calls that
> > are for real hotplug and those that are for this online capable path.
> > As a side note, the indicating bit for these flows is defined in ACPI
> > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > (the ARM64 definition is a cut and paste of that text).  So someone
> > is interested in this distinction on x86. I can't say who but if
> > you have a mantis account you can easily follow the history and it
> > might be instructive to not everyone considering the current x86
> > flow the right way to do it.  
> 
> Would a higher level check to catch that we are hitting undefined
> territory on arm64 be acceptable? That might satisfy the constraint
> that we should not have any software for arm64 that would run if
> physical CPU HP is added to the arch in future.  Something like:
> 
> @@ -331,6 +331,13 @@ static int acpi_processor_get_info(struct acpi_device *device)
> 
>         c = &per_cpu(cpu_devices, pr->id);
>         ACPI_COMPANION_SET(&c->dev, device);
> +
> +       if (!IS_ENABLED(CONFIG_ACPI_CPU_HOTPLUG_CPU) &&
> +           (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id))) {
> +               pr_err_once("Changing CPU present bit is not supported\n");
> +               return -ENODEV;
> +       }
> +
> 
> This is basically lifting the check out of the acpi_processor_make_present()
> call in this patch set.
> 
> With that in place before the new shared call I think we should be fine
> wrt to the ARM Architecture requirements.

As discussed elsewhere in this thread, I'll push this into the arm64
specific arch_register_cpu() definition.

> 
> Jonathan
> 
> 
>         /*
> > 
> > Jonathan
> > 
> >   
> > > 
> > > Thanks,
> > > 
> > >         tglx
> > >     
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel  
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Rafael J. Wysocki April 15, 2024, 12:04 p.m. UTC | #18
On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Mon, 15 Apr 2024 13:37:08 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>
> > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:
> > >
> > > On Sat, 13 Apr 2024 01:23:48 +0200
> > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > > Russell!
> > > >
> > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
> > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
> > > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > > >> > locks.
> > > > >>
> > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > > >> possible to online a CPU which just got marked present, but the
> > > > >> registration has not completed yet.
> > > > >
> > > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > > is initialising the struct cpu, registering the embedded struct
> > > > > device, and setting up the sysfs links to its NUMA node.
> > > > >
> > > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > > this is rather my fundamental point when I said "I couldn't find
> > > > > anything in arch_register_cpu() that depends on ...".
> > > > >
> > > > > If there is something, then comments in the code would be a useful aid
> > > > > because it's highly non-obvious where such a manipulation is located,
> > > > > and hence why the locks are necessary.
> > > >
> > > > acpi_processor_hotadd_init()
> > > > ...
> > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > >
> > > > That ends up in fiddling with cpu_present_mask.
> > > >
> > > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > > external locking too. I could not be bothered to figure that out.
> > > >
> > > > >> Define "real hotplug" :)
> > > > >>
> > > > >> Real physical hotplug does not really exist. That's at least true for
> > > > >> x86, where the physical hotplug support was chased for a while, but
> > > > >> never ended up in production.
> > > > >>
> > > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > > >> to/from a guest.
> > > > >>
> > > > >> There are limitations to this and we learned it the hard way on X86. At
> > > > >> the end we came up with the following restrictions:
> > > > >>
> > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > > >>        or not.
> > > > >>
> > > > >>        That guarantees proper sizing and ensures that associations
> > > > >>        between hardware entities and software representations and the
> > > > >>        resulting topology are stable for the lifetime of a system.
> > > > >>
> > > > >>        It is really required to know the full topology of the system at
> > > > >>        boot time especially with hybrid CPUs where some of the cores
> > > > >>        have hyperthreading and the others do not.
> > > > >>
> > > > >>
> > > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > > >>
> > > > >>        The CPU must have been registered in #1 already to ensure that
> > > > >>        the system topology does not suddenly change in an incompatible
> > > > >>        way at run-time.
> > > > >>
> > > > >> The same restriction would apply to real physical hotplug. I don't think
> > > > >> that's any different for ARM64 or any other architecture.
> > > > >
> > > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > > from a misunderstanding as far as a CPU goes.
> > > > >
> > > > > However, there is a big difference between the two. On x86, a processor
> > > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > > those even when the processor itself is not enabled. This is the whole
> > > > > reason there's a difference between "present" and "enabled" and why
> > > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > > The processor never actually goes away in arm64, it's just prevented
> > > > > from being used.
> > > >
> > > > It's the same on X86 at least in the physical world.
> > >
> > > There were public calls on this via the Linaro Open Discussions group,
> > > so I can talk a little about how we ended up here.  Note that (in my
> > > opinion) there is zero chance of this changing - it took us well over
> > > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > > we need to work within these constraints.
> > >
> > > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > > specs etc, not the kernel maintainers) are determined that they want
> > > to retain the option to do real physical CPU hotplug in the future
> > > with all the necessary work around dynamic interrupt controller
> > > initialization, debug and many other messy corners.
> >
> > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> >
> > > Thus anything defined had to be structured in a way that was 'different'
> > > from that.
> >
> > Apparently, that's where things got confused.
> >
> > > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > > maintainers are fine with it but it will remove the distinctions and
> > > we will need to be very careful with the CPU masks - we can't handle
> > > them the same as x86 does.
> >
> > At the ACPI code level, there is no distinction.
> >
> > A CPU that was not available before has just become available.  The
> > platform firmware has notified the kernel about it and now
> > acpi_processor_add() runs.  Why would it need to use different code
> > paths depending on what _STA bits were clear before?
>
> I think we will continue to disagree on this.  To my mind and from the
> ACPI specification, they are two different state transitions with different
> required actions.

Well, please be specific: What exactly do you mean here and which
parts of the spec are you talking about?

> Those state transitions are an ACPI level thing not
> an arch level one.  However, I want a solution that moves things forwards
> so I'll give pushing that entirely into the arch code a try.

Thanks!

Though I think that there is a disconnect between us that needs to be
clarified first.

> >
> > Yes, there is some arch stuff to be called and that arch stuff should
> > figure out what to do to make things actually work.
> >
> > > I'll get on with doing that, but do need input from Will / Catalin / James.
> > > There are some quirks that need calling out as it's not quite a simple
> > > as it appears from a high level.
> > >
> > > Another part of that long discussion established that there is userspace
> > > (Android IIRC) in which the CPU present mask must include all CPUs
> > > at boot. To change that would be userspace ABI breakage so we can't
> > > do that.  Hence the dance around adding yet another mask to allow the
> > > OS to understand which CPUs are 'present' but not possible to online.
> > >
> > > Flattening the two paths removes any distinction between calls that
> > > are for real hotplug and those that are for this online capable path.
> >
> > Which calls exactly do you mean?
>
> At the moment he distinction does not exist (because x86 only supports
> fake physical CPU HP and arm64 only vCPU HP / online capable), but if
> the architecture is defined for arm64 physical hotplug in the future
> we would need to do interrupt controller bring up + a lot of other stuff.
>
> It may be possible to do that in the arch code - will be hard to verify
> that until that arch is defined  Today all I need to do is ensure that
> any attempt to do present bit setting for ARM64 returns an error.
> That looks to be straight forward.

OK

>
> >
> > > As a side note, the indicating bit for these flows is defined in ACPI
> > > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > > (the ARM64 definition is a cut and paste of that text).  So someone
> > > is interested in this distinction on x86. I can't say who but if
> > > you have a mantis account you can easily follow the history and it
> > > might be instructive to not everyone considering the current x86
> > > flow the right way to do it.
> >
> > So a physically absent processor is different from a physically
> > present processor that has not been disabled.  No doubt about this.
> >
> > That said, I'm still unsure why these two cases require two different
> > code paths in acpi_processor_add().
>
> It might be possible to push the checking down into arch_register_cpu()
> and have that for now reject any attempt to do physical CPU HP on arm64.
> It is that gate that is vital to getting this accepted by ARM.
>
> I'm still very much stuck on the hotadd_init flag however, so any suggestions
> on that would be very welcome!

I need to do some investigation which will take some time I suppose.
Jonathan Cameron April 15, 2024, 12:23 p.m. UTC | #19
On Mon, 15 Apr 2024 14:04:26 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Mon, 15 Apr 2024 13:37:08 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> >  
> > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:  
> > > >
> > > > On Sat, 13 Apr 2024 01:23:48 +0200
> > > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >  
> > > > > Russell!
> > > > >
> > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:  
> > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:  
> > > > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > > > >> > locks.  
> > > > > >>
> > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > > > >> possible to online a CPU which just got marked present, but the
> > > > > >> registration has not completed yet.  
> > > > > >
> > > > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > > > is initialising the struct cpu, registering the embedded struct
> > > > > > device, and setting up the sysfs links to its NUMA node.
> > > > > >
> > > > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > > > this is rather my fundamental point when I said "I couldn't find
> > > > > > anything in arch_register_cpu() that depends on ...".
> > > > > >
> > > > > > If there is something, then comments in the code would be a useful aid
> > > > > > because it's highly non-obvious where such a manipulation is located,
> > > > > > and hence why the locks are necessary.  
> > > > >
> > > > > acpi_processor_hotadd_init()
> > > > > ...
> > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > > >
> > > > > That ends up in fiddling with cpu_present_mask.
> > > > >
> > > > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > > > external locking too. I could not be bothered to figure that out.
> > > > >  
> > > > > >> Define "real hotplug" :)
> > > > > >>
> > > > > >> Real physical hotplug does not really exist. That's at least true for
> > > > > >> x86, where the physical hotplug support was chased for a while, but
> > > > > >> never ended up in production.
> > > > > >>
> > > > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > > > >> to/from a guest.
> > > > > >>
> > > > > >> There are limitations to this and we learned it the hard way on X86. At
> > > > > >> the end we came up with the following restrictions:
> > > > > >>
> > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > > > >>        or not.
> > > > > >>
> > > > > >>        That guarantees proper sizing and ensures that associations
> > > > > >>        between hardware entities and software representations and the
> > > > > >>        resulting topology are stable for the lifetime of a system.
> > > > > >>
> > > > > >>        It is really required to know the full topology of the system at
> > > > > >>        boot time especially with hybrid CPUs where some of the cores
> > > > > >>        have hyperthreading and the others do not.
> > > > > >>
> > > > > >>
> > > > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > > > >>
> > > > > >>        The CPU must have been registered in #1 already to ensure that
> > > > > >>        the system topology does not suddenly change in an incompatible
> > > > > >>        way at run-time.
> > > > > >>
> > > > > >> The same restriction would apply to real physical hotplug. I don't think
> > > > > >> that's any different for ARM64 or any other architecture.  
> > > > > >
> > > > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > > > from a misunderstanding as far as a CPU goes.
> > > > > >
> > > > > > However, there is a big difference between the two. On x86, a processor
> > > > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > > > those even when the processor itself is not enabled. This is the whole
> > > > > > reason there's a difference between "present" and "enabled" and why
> > > > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > > > The processor never actually goes away in arm64, it's just prevented
> > > > > > from being used.  
> > > > >
> > > > > It's the same on X86 at least in the physical world.  
> > > >
> > > > There were public calls on this via the Linaro Open Discussions group,
> > > > so I can talk a little about how we ended up here.  Note that (in my
> > > > opinion) there is zero chance of this changing - it took us well over
> > > > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > > > we need to work within these constraints.
> > > >
> > > > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > > > specs etc, not the kernel maintainers) are determined that they want
> > > > to retain the option to do real physical CPU hotplug in the future
> > > > with all the necessary work around dynamic interrupt controller
> > > > initialization, debug and many other messy corners.  
> > >
> > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> > >  
> > > > Thus anything defined had to be structured in a way that was 'different'
> > > > from that.  
> > >
> > > Apparently, that's where things got confused.
> > >  
> > > > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > > > maintainers are fine with it but it will remove the distinctions and
> > > > we will need to be very careful with the CPU masks - we can't handle
> > > > them the same as x86 does.  
> > >
> > > At the ACPI code level, there is no distinction.
> > >
> > > A CPU that was not available before has just become available.  The
> > > platform firmware has notified the kernel about it and now
> > > acpi_processor_add() runs.  Why would it need to use different code
> > > paths depending on what _STA bits were clear before?  
> >
> > I think we will continue to disagree on this.  To my mind and from the
> > ACPI specification, they are two different state transitions with different
> > required actions.  
> 
> Well, please be specific: What exactly do you mean here and which
> parts of the spec are you talking about?

Given we are moving on with your suggestion, lets leave this for now - too many
other things to do! :)

> 
> > Those state transitions are an ACPI level thing not
> > an arch level one.  However, I want a solution that moves things forwards
> > so I'll give pushing that entirely into the arch code a try.  
> 
> Thanks!
> 
> Though I think that there is a disconnect between us that needs to be
> clarified first.

I'm fine with accepting your approach if it works and is acceptable
to the arm kernel folk. They are getting a non trivial arch_register_cpu()
with a bunch of ACPI specific handling in it that may come as a surprise.

> 
> > >
> > > Yes, there is some arch stuff to be called and that arch stuff should
> > > figure out what to do to make things actually work.
> > >  
> > > > I'll get on with doing that, but do need input from Will / Catalin / James.
> > > > There are some quirks that need calling out as it's not quite a simple
> > > > as it appears from a high level.
> > > >
> > > > Another part of that long discussion established that there is userspace
> > > > (Android IIRC) in which the CPU present mask must include all CPUs
> > > > at boot. To change that would be userspace ABI breakage so we can't
> > > > do that.  Hence the dance around adding yet another mask to allow the
> > > > OS to understand which CPUs are 'present' but not possible to online.
> > > >
> > > > Flattening the two paths removes any distinction between calls that
> > > > are for real hotplug and those that are for this online capable path.  
> > >
> > > Which calls exactly do you mean?  
> >
> > At the moment he distinction does not exist (because x86 only supports
> > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
> > the architecture is defined for arm64 physical hotplug in the future
> > we would need to do interrupt controller bring up + a lot of other stuff.
> >
> > It may be possible to do that in the arch code - will be hard to verify
> > that until that arch is defined  Today all I need to do is ensure that
> > any attempt to do present bit setting for ARM64 returns an error.
> > That looks to be straight forward.  
> 
> OK
> 
> >  
> > >  
> > > > As a side note, the indicating bit for these flows is defined in ACPI
> > > > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > > > (the ARM64 definition is a cut and paste of that text).  So someone
> > > > is interested in this distinction on x86. I can't say who but if
> > > > you have a mantis account you can easily follow the history and it
> > > > might be instructive to not everyone considering the current x86
> > > > flow the right way to do it.  
> > >
> > > So a physically absent processor is different from a physically
> > > present processor that has not been disabled.  No doubt about this.
> > >
> > > That said, I'm still unsure why these two cases require two different
> > > code paths in acpi_processor_add().  
> >
> > It might be possible to push the checking down into arch_register_cpu()
> > and have that for now reject any attempt to do physical CPU HP on arm64.
> > It is that gate that is vital to getting this accepted by ARM.
> >
> > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > on that would be very welcome!  
> 
> I need to do some investigation which will take some time I suppose.

I'll do so as well once I've gotten the rest sorted out.  That whole
structure seems overly complex and liable to race, though maybe sufficient
locking happens to be held that it's not a problem.

Jonathan
Salil Mehta April 15, 2024, 12:37 p.m. UTC | #20
Hi Rafael,

>  From: Rafael J. Wysocki <rafael@kernel.org>
>  Sent: Monday, April 15, 2024 1:04 PM
>  
>  On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
>  <Jonathan.Cameron@huawei.com> wrote:
>  >
>  > On Mon, 15 Apr 2024 13:37:08 +0200
>  > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
>  >
>  > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
>  > > <Jonathan.Cameron@huawei.com> wrote:
>  > > >
>  > > > On Sat, 13 Apr 2024 01:23:48 +0200 Thomas Gleixner
>  > > > <tglx@linutronix.de> wrote:
>  > > >
>  > > > > Russell!
>  > > > >
>  > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:
>  > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:
>  > > > > >> > As for the cpu locking, I couldn't find anything in
>  > > > > >> > arch_register_cpu() that depends on the cpu_maps_update
>  > > > > >> > stuff nor needs the cpus_write_lock being taken - so I've
>  > > > > >> > no idea why the "make_present" case takes these locks.
>  > > > > >>
>  > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask,
>  > > > > >> after early boot must hold the appropriate write locks.
>  > > > > >> Otherwise it would be possible to online a CPU which just got
>  > > > > >> marked present, but the registration has not completed yet.
>  > > > > >
>  > > > > > Yes. As far as I've been able to determine,
>  > > > > > arch_register_cpu() doesn't manipulate any of the CPU masks.
>  > > > > > All it seems to be doing is initialising the struct cpu,
>  > > > > > registering the embedded struct device, and setting up the sysfs  links to its NUMA node.
>  > > > > >
>  > > > > > There is nothing obvious in there which manipulates any CPU
>  > > > > > masks, and this is rather my fundamental point when I said "I
>  > > > > > couldn't find anything in arch_register_cpu() that depends on ...".
>  > > > > >
>  > > > > > If there is something, then comments in the code would be a
>  > > > > > useful aid because it's highly non-obvious where such a
>  > > > > > manipulation is located, and hence why the locks are necessary.
>  > > > >
>  > > > > acpi_processor_hotadd_init()
>  > > > > ...
>  > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id,
>  > > > > &pr->id);
>  > > > >
>  > > > > That ends up in fiddling with cpu_present_mask.
>  > > > >
>  > > > > I grant you that arch_register_cpu() is not, but it might rely
>  > > > > on the external locking too. I could not be bothered to figure that  out.
>  > > > >
>  > > > > >> Define "real hotplug" :)
>  > > > > >>
>  > > > > >> Real physical hotplug does not really exist. That's at least
>  > > > > >> true for x86, where the physical hotplug support was chased
>  > > > > >> for a while, but never ended up in production.
>  > > > > >>
>  > > > > >> Though virtualization happily jumped on it to hot add/remove
>  > > > > >> CPUs to/from a guest.
>  > > > > >>
>  > > > > >> There are limitations to this and we learned it the hard way
>  > > > > >> on X86. At the end we came up with the following restrictions:
>  > > > > >>
>  > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
>  > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
>  > > > > >>        or not.
>  > > > > >>
>  > > > > >>        That guarantees proper sizing and ensures that associations
>  > > > > >>        between hardware entities and software representations and the
>  > > > > >>        resulting topology are stable for the lifetime of a system.
>  > > > > >>
>  > > > > >>        It is really required to know the full topology of the system at
>  > > > > >>        boot time especially with hybrid CPUs where some of the cores
>  > > > > >>        have hyperthreading and the others do not.
>  > > > > >>
>  > > > > >>
>  > > > > >>     2) Hot add can only mark an already registered (possible) CPU
>  > > > > >>        present. Adding non-registered CPUs after boot is not  possible.
>  > > > > >>
>  > > > > >>        The CPU must have been registered in #1 already to ensure  that
>  > > > > >>        the system topology does not suddenly change in an incompatible
>  > > > > >>        way at run-time.
>  > > > > >>
>  > > > > >> The same restriction would apply to real physical hotplug. I
>  > > > > >> don't think that's any different for ARM64 or any other architecture.
>  > > > > >
>  > > > > > This makes me wonder whether the Arm64 has been barking up the
>  > > > > > wrong tree then, and whether the whole "present" vs "enabled"
>  > > > > > thing comes from a misunderstanding as far as a CPU goes.
>  > > > > >
>  > > > > > However, there is a big difference between the two. On x86, a
>  > > > > > processor is just a processor. On Arm64, a "processor" is a
>  > > > > > slice of the system (includes the interrupt controller, PMUs
>  > > > > > etc) and we must enumerate those even when the processor
>  > > > > > itself is not enabled. This is the whole reason there's a
>  > > > > > difference between "present" and "enabled" and why there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
>  > > > > > The processor never actually goes away in arm64, it's just
>  > > > > > prevented from being used.
>  > > > >
>  > > > > It's the same on X86 at least in the physical world.
>  > > >
>  > > > There were public calls on this via the Linaro Open Discussions
>  > > > group, so I can talk a little about how we ended up here.  Note
>  > > > that (in my
>  > > > opinion) there is zero chance of this changing - it took us well
>  > > > over a year to get to this conclusion.  So if we ever want ARM
>  > > > vCPU HP we need to work within these constraints.
>  > > >
>  > > > The ARM architecture folk (the ones defining the ARM ARM, relevant
>  > > > ACPI specs etc, not the kernel maintainers) are determined that
>  > > > they want to retain the option to do real physical CPU hotplug in
>  > > > the future with all the necessary work around dynamic interrupt
>  > > > controller initialization, debug and many other messy corners.
>  > >
>  > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
>  > >
>  > > > Thus anything defined had to be structured in a way that was  'different'
>  > > > from that.
>  > >
>  > > Apparently, that's where things got confused.
>  > >
>  > > > I don't mind the proposed flattening of the 2 paths if the ARM
>  > > > kernel maintainers are fine with it but it will remove the
>  > > > distinctions and we will need to be very careful with the CPU
>  > > > masks - we can't handle them the same as x86 does.
>  > >
>  > > At the ACPI code level, there is no distinction.
>  > >
>  > > A CPU that was not available before has just become available.  The
>  > > platform firmware has notified the kernel about it and now
>  > > acpi_processor_add() runs.  Why would it need to use different code
>  > > paths depending on what _STA bits were clear before?
>  >
>  > I think we will continue to disagree on this.  To my mind and from the
>  > ACPI specification, they are two different state transitions with
>  > different required actions.
>  
>  Well, please be specific: What exactly do you mean here and which parts of
>  the spec are you talking about?
>  
>  > Those state transitions are an ACPI level thing not an arch level one.
>  > However, I want a solution that moves things forwards so I'll give
>  > pushing that entirely into the arch code a try.
>  
>  Thanks!
>  
>  Though I think that there is a disconnect between us that needs to be
>  clarified first.
>  
>  > >
>  > > Yes, there is some arch stuff to be called and that arch stuff
>  > > should figure out what to do to make things actually work.
>  > >
>  > > > I'll get on with doing that, but do need input from Will / Catalin / James.
>  > > > There are some quirks that need calling out as it's not quite a
>  > > > simple as it appears from a high level.
>  > > >
>  > > > Another part of that long discussion established that there is
>  > > > userspace (Android IIRC) in which the CPU present mask must
>  > > > include all CPUs at boot. To change that would be userspace ABI
>  > > > breakage so we can't do that.  Hence the dance around adding yet
>  > > > another mask to allow the OS to understand which CPUs are 'present'
>  but not possible to online.
>  > > >
>  > > > Flattening the two paths removes any distinction between calls
>  > > > that are for real hotplug and those that are for this online capable path.
>  > >
>  > > Which calls exactly do you mean?
>  >
>  > At the moment he distinction does not exist (because x86 only supports
>  > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
>  > the architecture is defined for arm64 physical hotplug in the future
>  > we would need to do interrupt controller bring up + a lot of other stuff.
>  >
>  > It may be possible to do that in the arch code - will be hard to
>  > verify that until that arch is defined  Today all I need to do is
>  > ensure that any attempt to do present bit setting for ARM64 returns an error.
>  > That looks to be straight forward.
>  
>  OK
>  
>  >
>  > >
>  > > > As a side note, the indicating bit for these flows is defined in
>  > > > ACPI for x86 from ACPI 6.3 as a flag in Processor Local APIC (the
>  > > > ARM64 definition is a cut and paste of that text).  So someone is
>  > > > interested in this distinction on x86. I can't say who but if you
>  > > > have a mantis account you can easily follow the history and it
>  > > > might be instructive to not everyone considering the current x86
>  > > > flow the right way to do it.
>  > >
>  > > So a physically absent processor is different from a physically
>  > > present processor that has not been disabled.  No doubt about this.
>  > >
>  > > That said, I'm still unsure why these two cases require two
>  > > different code paths in acpi_processor_add().
>  >
>  > It might be possible to push the checking down into
>  > arch_register_cpu() and have that for now reject any attempt to do
>  physical CPU HP on arm64.
>  > It is that gate that is vital to getting this accepted by ARM.
>  >
>  > I'm still very much stuck on the hotadd_init flag however, so any
>  > suggestions on that would be very welcome!
>  
>  I need to do some investigation which will take some time I suppose.


You might find below cover letter and links to the presentations useful:

1. https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
2. https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
3. https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
4. https://sched.co/eE4m


Best regards
Salil.
Rafael J. Wysocki April 15, 2024, 12:41 p.m. UTC | #21
On Mon, Apr 15, 2024 at 2:37 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hi Rafael,
>
> >  From: Rafael J. Wysocki <rafael@kernel.org>
> >  Sent: Monday, April 15, 2024 1:04 PM
> >

[cut]

> >
> >  I need to do some investigation which will take some time I suppose.
>
>
> You might find below cover letter and links to the presentations useful:
>
> 1. https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.mehta@huawei.com/
> 2. https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
> 3. https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
> 4. https://sched.co/eE4m

Thanks, I'll go through this, but I kind of doubt if it helps me with
finding out what to do with the hotadd_init flag.
Rafael J. Wysocki April 15, 2024, 12:51 p.m. UTC | #22
On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> Hello,
>
> >  From: Thomas Gleixner <tglx@linutronix.de>
> >  Sent: Friday, April 12, 2024 9:55 PM
> >
> >  On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
> >  > On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
> >  >> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> >  >> What's the difference then?  The locking, which should be fine if I'm
> >  >> not mistaken and need_hotplug_init that needs to be set if this code
> >  >> runs after the processor driver has loaded AFAICS.
> >  >
> >  > It is over this that I walked away from progressing this code, because
> >  > I don't think it's quite as simple as you make it out to be.
> >  >
> >  > Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch
> >  implemented
> >  > functions, so Arm64 can easily provide stubs for these that do nothing.
> >  > That never caused me any concern.
> >  >
> >  > What does cause me great concern though are the finer details. For
> >  > example, above you seem to drop the evaluation of _STA for the
> >  > "make_present" case - I've no idea whether that is something that
> >  > should be deleted or not (if it is something that can be deleted, then
> >  > why not delete it now?)
> >  >
> >  > As for the cpu locking, I couldn't find anything in
> >  > arch_register_cpu() that depends on the cpu_maps_update stuff nor
> >  > needs the cpus_write_lock being taken - so I've no idea why the
> >  > "make_present" case takes these locks.
> >
> >  Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> >  boot must hold the appropriate write locks. Otherwise it would be possible
> >  to online a CPU which just got marked present, but the registration has not
> >  completed yet.
> >
> >  > Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
> >  > obvious that this is required - remember that with Arm64's "enabled"
> >  > toggling, the "processor" is a slice of the system and doesn't
> >  > actually go away - it's just "not enabled" for use.
> >  >
> >  > Again, as "processors" in Arm64 are slices of the system, they have to
> >  > be fully described in ACPI before the OS boots, and they will be
> >  > marked as being "present", which means they will be enumerated, and
> >  > the driver will be probed. Any processor that is not to be used will
> >  > not have its enabled bit set. It is my understanding that every
> >  > processor will result in the ACPI processor driver being bound to it
> >  > whether its enabled or not.
> >  >
> >  > The difference between real hotplug and Arm64 hotplug is that real
> >  > hotplug makes stuff not-present (and thus unenumerable). Arm64
> >  hotplug
> >  > makes stuff not-enabled which is still enumerable.
> >
> >  Define "real hotplug" :)
> >
> >  Real physical hotplug does not really exist. That's at least true for x86, where
> >  the physical hotplug support was chased for a while, but never ended up in
> >  production.
> >
> >  Though virtualization happily jumped on it to hot add/remove CPUs to/from
> >  a guest.
> >
> >  There are limitations to this and we learned it the hard way on X86. At the
> >  end we came up with the following restrictions:
> >
> >      1) All possible CPUs have to be advertised at boot time via firmware
> >         (ACPI/DT/whatever) independent of them being present at boot time
> >         or not.
> >
> >         That guarantees proper sizing and ensures that associations
> >         between hardware entities and software representations and the
> >         resulting topology are stable for the lifetime of a system.
> >
> >         It is really required to know the full topology of the system at
> >         boot time especially with hybrid CPUs where some of the cores
> >         have hyperthreading and the others do not.
> >
> >
> >      2) Hot add can only mark an already registered (possible) CPU
> >         present. Adding non-registered CPUs after boot is not possible.
> >
> >         The CPU must have been registered in #1 already to ensure that
> >         the system topology does not suddenly change in an incompatible
> >         way at run-time.
> >
> >  The same restriction would apply to real physical hotplug. I don't think that's
> >  any different for ARM64 or any other architecture.
>
>
> There is a difference:
>
> 1.   ARM arch does not allows for any processor to be NOT present. Hence, because of
> this restriction any of its related per-cpu components must be present and enumerated
> at the boot time as well (exposed by firmware and ACPI). This means all the enumerated
> processors will be marked as 'present' but they might exist in NOT enabled (_STA.enabled=0)
> state.
>
> There was one clear difference and please correct me if I'm wrong here,  for x86, the LAPIC
> associated with the x86 core can be brought online later even after boot?
>
> But for ARM Arch, processors and its corresponding per-cpu components like redistributors
> all need to be present and enumerated during the boot time. Redistributors are part of
> ALWAYS-ON power domain.

OK

So what exactly is the problem with this and what does
acpi_processor_add() have to do with it?

Do you want the per-CPU structures etc. to be created from the
acpi_processor_add() path?

This plain won't work because acpi_processor_add(), as defined in the
mainline kernel today (and the Jonathan's patches don't change that
AFAICS), returns an error for processor devices with the "enabled" bit
clear in _STA (it returns an error if the "present" bit is clear too,
but that's obvious), so it only gets to calling arch_register_cpu() if
*both* "present" and "enabled" _STA bits are set for the given
processor device.

That, BTW, is why I keep saying that from the ACPI CPU enumeration
code perspective, there is no difference between "present" and
"enabled".

> 2.  Agreed regarding the topology. Are you suggesting that we must call arch_register_cpu()
> during boot time for all the 'present' CPUs? Even if that's the case, we might still want to defer
> registration of the cpu device (register_cpu() API) with the Linux device model. Later is what
> we are doing to hide/unhide the CPUs from the user while STA.Enabled Bit is toggled due to
> CPU (un)plug action.

There are two ways to approach this IMV, and both seem to be valid in principle.

One would be to treat CPUs with the "enabled" bit clear as not present
and create all of the requisite data structures for them when they
become available (in analogy with the "real hot-add" case).

The alternative one is to create all of the requisite data structures
for the CPUs that you find during boot, but register CPU devices for
those having the "enabled" _STA bit set.

It looks like you have chosen the second approach, which is fine with
me (although personally, I would rather choose the first one), but
then your arch code needs to arrange for the requisite CPU data
structures etc. to be set up before acpi_processor_add() gets called
because, as per the above, the latter just rejects CPUs with the
"enabled" _STA bit clear.

Thanks!
Salil Mehta April 15, 2024, 3:31 p.m. UTC | #23
>  From: Rafael J. Wysocki <rafael@kernel.org>
>  Sent: Monday, April 15, 2024 1:51 PM
>  
>  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta <salil.mehta@huawei.com>
>  wrote:
>  >
>  > Hello,
>  >
>  > >  From: Thomas Gleixner <tglx@linutronix.de>
>  > >  Sent: Friday, April 12, 2024 9:55 PM
>  > >
>  > >  On Fri, Apr 12 2024 at 21:16, Russell King (Oracle) wrote:
>  > >  > On Fri, Apr 12, 2024 at 08:30:40PM +0200, Rafael J. Wysocki wrote:
>  > >  >> Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
>  > >  >> What's the difference then?  The locking, which should be fine
>  > > if I'm  >> not mistaken and need_hotplug_init that needs to be set
>  > > if this code  >> runs after the processor driver has loaded AFAICS.
>  > >  >
>  > >  > It is over this that I walked away from progressing this code,
>  > > because  > I don't think it's quite as simple as you make it out to be.
>  > >  >
>  > >  > Yes, acpi_map_cpu() and acpi_unmap_cpu() are already arch
>  > > implemented  > functions, so Arm64 can easily provide stubs for
>  > > these that do nothing.
>  > >  > That never caused me any concern.
>  > >  >
>  > >  > What does cause me great concern though are the finer details.
>  > > For  > example, above you seem to drop the evaluation of _STA for
>  > > the  > "make_present" case - I've no idea whether that is something
>  > > that  > should be deleted or not (if it is something that can be
>  > > deleted, then  > why not delete it now?)  >  > As for the cpu
>  > > locking, I couldn't find anything in  > arch_register_cpu() that
>  > > depends on the cpu_maps_update stuff nor  > needs the
>  > > cpus_write_lock being taken - so I've no idea why the  >
>  > > "make_present" case takes these locks.
>  > >
>  > >  Anything which updates a CPU mask, e.g. cpu_present_mask, after
>  > > early  boot must hold the appropriate write locks. Otherwise it
>  > > would be possible  to online a CPU which just got marked present,
>  > > but the registration has not  completed yet.
>  > >
>  > >  > Finally, the "pr->flags.need_hotplug_init = 1" thing... it's not
>  > > > obvious that this is required - remember that with Arm64's "enabled"
>  > >  > toggling, the "processor" is a slice of the system and doesn't  >
>  > > actually go away - it's just "not enabled" for use.
>  > >  >
>  > >  > Again, as "processors" in Arm64 are slices of the system, they
>  > > have to  > be fully described in ACPI before the OS boots, and they
>  > > will be  > marked as being "present", which means they will be
>  > > enumerated, and  > the driver will be probed. Any processor that is
>  > > not to be used will  > not have its enabled bit set. It is my
>  > > understanding that every  > processor will result in the ACPI
>  > > processor driver being bound to it  > whether its enabled or not.
>  > >  >
>  > >  > The difference between real hotplug and Arm64 hotplug is that
>  > > real  > hotplug makes stuff not-present (and thus unenumerable).
>  > > Arm64  hotplug  > makes stuff not-enabled which is still enumerable.
>  > >
>  > >  Define "real hotplug" :)
>  > >
>  > >  Real physical hotplug does not really exist. That's at least true
>  > > for x86, where  the physical hotplug support was chased for a while,
>  > > but never ended up in  production.
>  > >
>  > >  Though virtualization happily jumped on it to hot add/remove CPUs
>  > > to/from  a guest.
>  > >
>  > >  There are limitations to this and we learned it the hard way on
>  > > X86. At the  end we came up with the following restrictions:
>  > >
>  > >      1) All possible CPUs have to be advertised at boot time via firmware
>  > >         (ACPI/DT/whatever) independent of them being present at boot time
>  > >         or not.
>  > >
>  > >         That guarantees proper sizing and ensures that associations
>  > >         between hardware entities and software representations and the
>  > >         resulting topology are stable for the lifetime of a system.
>  > >
>  > >         It is really required to know the full topology of the system at
>  > >         boot time especially with hybrid CPUs where some of the cores
>  > >         have hyperthreading and the others do not.
>  > >
>  > >
>  > >      2) Hot add can only mark an already registered (possible) CPU
>  > >         present. Adding non-registered CPUs after boot is not possible.
>  > >
>  > >         The CPU must have been registered in #1 already to ensure that
>  > >         the system topology does not suddenly change in an incompatible
>  > >         way at run-time.
>  > >
>  > >  The same restriction would apply to real physical hotplug. I don't
>  > > think that's  any different for ARM64 or any other architecture.
>  >
>  >
>  > There is a difference:
>  >
>  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
>  > this restriction any of its related per-cpu components must be present
>  > and enumerated at the boot time as well (exposed by firmware and
>  > ACPI). This means all the enumerated processors will be marked as
>  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
>  >
>  > There was one clear difference and please correct me if I'm wrong
>  > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
>  >
>  > But for ARM Arch, processors and its corresponding per-cpu components
>  > like redistributors all need to be present and enumerated during the
>  > boot time. Redistributors are part of ALWAYS-ON power domain.
>  
>  OK
>  
>  So what exactly is the problem with this and what does
>  acpi_processor_add() have to do with it?


For ARM Arch, during boot time, it should add processor as if no hotplug exists. But later,
in context to the (fake) hotplug trigger from the virtualizer as a result of the CPU (un)plug
action  it should just end up in registering the already present CPU with the Linux Driver Model. 


>  
>  Do you want the per-CPU structures etc. to be created from the
>  acpi_processor_add() path?


I referred to the components related to ARM CPU Arch like redistributors etc.
which will get initialized in context to Arch specific _init code not here. This
i.e. acpi_processor_add() is arch agnostic code common to all architectures.

[ A digression: You do have _weak functions which can be overridden to arch specific
 handling like  arch_(un)map_cpu() etc. but we can't use those to defer initialize
 the CPU related components - ARM Arch constraint!]


>  
>  This plain won't work because acpi_processor_add(), as defined in the
>  mainline kernel today (and the Jonathan's patches don't change that
>  AFAICS), returns an error for processor devices with the "enabled" bit clear
>  in _STA (it returns an error if the "present" bit is clear too, but that's
>  obvious), so it only gets to calling arch_register_cpu() if
>  *both* "present" and "enabled" _STA bits are set for the given processor
>  device.


If you are referring to the _STA check in the XX_hot_add_init() then in the current
kernel code it only checks for the ACPI_STA_DEVICE_PRESENT flag and not
the ACPI_STA_DEVICE_ENABLED flag(?). The code being reviewed has changed
exactly that behavior for 2 legs i.e. make-present and make-enabled legs.

I'm open to further address your point clearly.

>  
>  That, BTW, is why I keep saying that from the ACPI CPU enumeration code
>  perspective, there is no difference between "present" and "enabled".


Agreed but there is still a subtle difference.  Enumeration happens once and
for all the processors during the boot time. And as confirmed by yourself and
Thomas as well that even in x86 arch all the processors will be discovered and
their topology fixed during the boot time which is effectively the same behavior
as in the ARM Arch. But ARM assumes those 'present' bits in the present masks
to be set during the boot time which is not like x86(?).  Hence, 'present cpu' Bits
will always be equal to 'possible cpu' Bits. This is a constraint put by the ARM
maintainers and looks unshakable. 


>  
>  > 2.  Agreed regarding the topology. Are you suggesting that we must
>  > call arch_register_cpu() during boot time for all the 'present' CPUs?
>  > Even if that's the case, we might still want to defer registration of
>  > the cpu device (register_cpu() API) with the Linux device model. Later
>  > is what we are doing to hide/unhide the CPUs from the user while
>  STA.Enabled Bit is toggled due to CPU (un)plug action.
>  
>  There are two ways to approach this IMV, and both seem to be valid in
>  principle.
>  
>  One would be to treat CPUs with the "enabled" bit clear as not present and
>  create all of the requisite data structures for them when they become
>  available (in analogy with the "real hot-add" case).


Right. This one is out-of-scope for ARM Arch as we cannot defer any Arch
specific sizing and initializations after boot i.e. when processor_add() gets
called again later as a trigger of CPU plug action at the Virtualizer.


>  
>  The alternative one is to create all of the requisite data structures for the
>  CPUs that you find during boot, but register CPU devices for those having
>  the "enabled" _STA bit set.


Correct. and we defer the registration for CPUs with online-capable Bit
set in the ACPI MADT/GICC data structure. These CPUs basically form
set of hot-pluggable CPUs on ARM. 


>  
>  It looks like you have chosen the second approach, which is fine with me
>  (although personally, I would rather choose the first one), but then your
>  arch code needs to arrange for the requisite CPU data structures etc. to be
>  set up before acpi_processor_add() gets called because, as per the above,
>  the latter just rejects CPUs with the "enabled" _STA bit clear.

Yes, correct. First one is not possible - at least for now and to have that it will
require lot of rework especially at GIC. But there are many other arch components
(like timers, PMUs, etc.) whose behavior needs to be specified somewhere in the
architecture as well. All these are closely coupled with the ARM CPU architecture.
(it's beyond this discussion and lets leave that to ARM folks).

This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.


Best regards
Salil.

>  
>  Thanks!
Rafael J. Wysocki April 15, 2024, 4:38 p.m. UTC | #24
On Mon, Apr 15, 2024 at 5:31 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> >  From: Rafael J. Wysocki <rafael@kernel.org>
> >  Sent: Monday, April 15, 2024 1:51 PM
> >
> >  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta <salil.mehta@huawei.com>
> >  wrote:
> >  >

[cut]

> >  > >  Though virtualization happily jumped on it to hot add/remove CPUs
> >  > > to/from  a guest.
> >  > >
> >  > >  There are limitations to this and we learned it the hard way on
> >  > > X86. At the  end we came up with the following restrictions:
> >  > >
> >  > >      1) All possible CPUs have to be advertised at boot time via firmware
> >  > >         (ACPI/DT/whatever) independent of them being present at boot time
> >  > >         or not.
> >  > >
> >  > >         That guarantees proper sizing and ensures that associations
> >  > >         between hardware entities and software representations and the
> >  > >         resulting topology are stable for the lifetime of a system.
> >  > >
> >  > >         It is really required to know the full topology of the system at
> >  > >         boot time especially with hybrid CPUs where some of the cores
> >  > >         have hyperthreading and the others do not.
> >  > >
> >  > >
> >  > >      2) Hot add can only mark an already registered (possible) CPU
> >  > >         present. Adding non-registered CPUs after boot is not possible.
> >  > >
> >  > >         The CPU must have been registered in #1 already to ensure that
> >  > >         the system topology does not suddenly change in an incompatible
> >  > >         way at run-time.
> >  > >
> >  > >  The same restriction would apply to real physical hotplug. I don't
> >  > > think that's  any different for ARM64 or any other architecture.
> >  >
> >  >
> >  > There is a difference:
> >  >
> >  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
> >  > this restriction any of its related per-cpu components must be present
> >  > and enumerated at the boot time as well (exposed by firmware and
> >  > ACPI). This means all the enumerated processors will be marked as
> >  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
> >  >
> >  > There was one clear difference and please correct me if I'm wrong
> >  > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
> >  >
> >  > But for ARM Arch, processors and its corresponding per-cpu components
> >  > like redistributors all need to be present and enumerated during the
> >  > boot time. Redistributors are part of ALWAYS-ON power domain.
> >
> >  OK
> >
> >  So what exactly is the problem with this and what does
> >  acpi_processor_add() have to do with it?
>
>
> For ARM Arch, during boot time, it should add processor as if no hotplug exists. But later,
> in context to the (fake) hotplug trigger from the virtualizer as a result of the CPU (un)plug
> action  it should just end up in registering the already present CPU with the Linux Driver Model.

So let me repeat this last time: acpi_processor_add() cannot do that,
because (as defined today) it rejects CPUs with the "enabled" bit
clear in _STA.

> >
> >  Do you want the per-CPU structures etc. to be created from the
> >  acpi_processor_add() path?
>
>
> I referred to the components related to ARM CPU Arch like redistributors etc.
> which will get initialized in context to Arch specific _init code not here. This
> i.e. acpi_processor_add() is arch agnostic code common to all architectures.
>
> [ A digression: You do have _weak functions which can be overridden to arch specific
>  handling like  arch_(un)map_cpu() etc. but we can't use those to defer initialize
>  the CPU related components - ARM Arch constraint!]

Not right now, but they can be added I suppose.

>
> >
> >  This plain won't work because acpi_processor_add(), as defined in the
> >  mainline kernel today (and the Jonathan's patches don't change that
> >  AFAICS), returns an error for processor devices with the "enabled" bit clear
> >  in _STA (it returns an error if the "present" bit is clear too, but that's
> >  obvious), so it only gets to calling arch_register_cpu() if
> >  *both* "present" and "enabled" _STA bits are set for the given processor
> >  device.
>
>
> If you are referring to the _STA check in the XX_hot_add_init() then in the current
> kernel code it only checks for the ACPI_STA_DEVICE_PRESENT flag and not
> the ACPI_STA_DEVICE_ENABLED flag(?).

No, I am not.  I'm referring to this code in 6.9-rc4:

static int acpi_processor_add(struct acpi_device *device,
                    const struct acpi_device_id *id)
{
    struct acpi_processor *pr;
    struct device *dev;
    int result = 0;

    if (!acpi_device_is_enabled(device))
        return -ENODEV;

    ...
}

where acpi_device_is_enabled() is defined as follows:

bool acpi_device_is_enabled(const struct acpi_device *adev)
{
    return adev->status.present && adev->status.enabled;
}

> The code being reviewed has changed
> exactly that behavior for 2 legs i.e. make-present and make-enabled legs.

I'm not sure what you mean here, but the code above means that
acpi_processor_add) does not distinguish between CPU with the
"present" bit clear (in which case the "enabled" bit must also be
clear as per the spec) and CPUs with the "present" bit set and the
"enabled" bit clear.  These two cases are handled in the same way.

> I'm open to further address your point clearly.

I hope that the above is clear enough.

> >
> >  That, BTW, is why I keep saying that from the ACPI CPU enumeration code
> >  perspective, there is no difference between "present" and "enabled".
>
>
> Agreed but there is still a subtle difference.  Enumeration happens once and
> for all the processors during the boot time. And as confirmed by yourself and
> Thomas as well that even in x86 arch all the processors will be discovered and
> their topology fixed during the boot time which is effectively the same behavior
> as in the ARM Arch. But ARM assumes those 'present' bits in the present masks
> to be set during the boot time which is not like x86(?).  Hence, 'present cpu' Bits
> will always be equal to 'possible cpu' Bits. This is a constraint put by the ARM
> maintainers and looks unshakable.

Yes, there are differences between architectures, but the ACPI code is
(or at least should be) architecture-agnostic (as you said somewhere
above).  So why does this matter for the ACPI code?

> >
> >  > 2.  Agreed regarding the topology. Are you suggesting that we must
> >  > call arch_register_cpu() during boot time for all the 'present' CPUs?
> >  > Even if that's the case, we might still want to defer registration of
> >  > the cpu device (register_cpu() API) with the Linux device model. Later
> >  > is what we are doing to hide/unhide the CPUs from the user while
> >  STA.Enabled Bit is toggled due to CPU (un)plug action.
> >
> >  There are two ways to approach this IMV, and both seem to be valid in
> >  principle.
> >
> >  One would be to treat CPUs with the "enabled" bit clear as not present and
> >  create all of the requisite data structures for them when they become
> >  available (in analogy with the "real hot-add" case).
>
>
> Right. This one is out-of-scope for ARM Arch as we cannot defer any Arch
> specific sizing and initializations after boot i.e. when processor_add() gets
> called again later as a trigger of CPU plug action at the Virtualizer.
>
>
> >
> >  The alternative one is to create all of the requisite data structures for the
> >  CPUs that you find during boot, but register CPU devices for those having
> >  the "enabled" _STA bit set.
>
>
> Correct. and we defer the registration for CPUs with online-capable Bit
> set in the ACPI MADT/GICC data structure. These CPUs basically form
> set of hot-pluggable CPUs on ARM.
>
>
> >
> >  It looks like you have chosen the second approach, which is fine with me
> >  (although personally, I would rather choose the first one), but then your
> >  arch code needs to arrange for the requisite CPU data structures etc. to be
> >  set up before acpi_processor_add() gets called because, as per the above,
> >  the latter just rejects CPUs with the "enabled" _STA bit clear.
>
> Yes, correct. First one is not possible - at least for now and to have that it will
> require lot of rework especially at GIC. But there are many other arch components
> (like timers, PMUs, etc.) whose behavior needs to be specified somewhere in the
> architecture as well. All these are closely coupled with the ARM CPU architecture.
> (it's beyond this discussion and lets leave that to ARM folks).
>
> This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.

Well, I'm having a hard time with this.

As far as CPU enumeration goes, if the "enabled" bit is clear in _STA,
it does not happen at all.  Both on ARM and on x86.

Now tell me why there need to be two separate code paths calling
arch_register_cpu() in acpi_processor_add()?

I see no reason whatsoever.

Moreover, I see reasons why there needs to be only one such code path.

Please feel free to prove me wrong.

Thanks!
Jonathan Cameron April 16, 2024, 5:41 p.m. UTC | #25
On Mon, 15 Apr 2024 13:23:51 +0100
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Mon, 15 Apr 2024 14:04:26 +0200
> "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> 
> > On Mon, Apr 15, 2024 at 1:56 PM Jonathan Cameron
> > <Jonathan.Cameron@huawei.com> wrote:  
> > >
> > > On Mon, 15 Apr 2024 13:37:08 +0200
> > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:
> > >    
> > > > On Mon, Apr 15, 2024 at 10:46 AM Jonathan Cameron
> > > > <Jonathan.Cameron@huawei.com> wrote:    
> > > > >
> > > > > On Sat, 13 Apr 2024 01:23:48 +0200
> > > > > Thomas Gleixner <tglx@linutronix.de> wrote:
> > > > >    
> > > > > > Russell!
> > > > > >
> > > > > > On Fri, Apr 12 2024 at 22:52, Russell King (Oracle) wrote:    
> > > > > > > On Fri, Apr 12, 2024 at 10:54:32PM +0200, Thomas Gleixner wrote:    
> > > > > > >> > As for the cpu locking, I couldn't find anything in arch_register_cpu()
> > > > > > >> > that depends on the cpu_maps_update stuff nor needs the cpus_write_lock
> > > > > > >> > being taken - so I've no idea why the "make_present" case takes these
> > > > > > >> > locks.    
> > > > > > >>
> > > > > > >> Anything which updates a CPU mask, e.g. cpu_present_mask, after early
> > > > > > >> boot must hold the appropriate write locks. Otherwise it would be
> > > > > > >> possible to online a CPU which just got marked present, but the
> > > > > > >> registration has not completed yet.    
> > > > > > >
> > > > > > > Yes. As far as I've been able to determine, arch_register_cpu()
> > > > > > > doesn't manipulate any of the CPU masks. All it seems to be doing
> > > > > > > is initialising the struct cpu, registering the embedded struct
> > > > > > > device, and setting up the sysfs links to its NUMA node.
> > > > > > >
> > > > > > > There is nothing obvious in there which manipulates any CPU masks, and
> > > > > > > this is rather my fundamental point when I said "I couldn't find
> > > > > > > anything in arch_register_cpu() that depends on ...".
> > > > > > >
> > > > > > > If there is something, then comments in the code would be a useful aid
> > > > > > > because it's highly non-obvious where such a manipulation is located,
> > > > > > > and hence why the locks are necessary.    
> > > > > >
> > > > > > acpi_processor_hotadd_init()
> > > > > > ...
> > > > > >          acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
> > > > > >
> > > > > > That ends up in fiddling with cpu_present_mask.
> > > > > >
> > > > > > I grant you that arch_register_cpu() is not, but it might rely on the
> > > > > > external locking too. I could not be bothered to figure that out.
> > > > > >    
> > > > > > >> Define "real hotplug" :)
> > > > > > >>
> > > > > > >> Real physical hotplug does not really exist. That's at least true for
> > > > > > >> x86, where the physical hotplug support was chased for a while, but
> > > > > > >> never ended up in production.
> > > > > > >>
> > > > > > >> Though virtualization happily jumped on it to hot add/remove CPUs
> > > > > > >> to/from a guest.
> > > > > > >>
> > > > > > >> There are limitations to this and we learned it the hard way on X86. At
> > > > > > >> the end we came up with the following restrictions:
> > > > > > >>
> > > > > > >>     1) All possible CPUs have to be advertised at boot time via firmware
> > > > > > >>        (ACPI/DT/whatever) independent of them being present at boot time
> > > > > > >>        or not.
> > > > > > >>
> > > > > > >>        That guarantees proper sizing and ensures that associations
> > > > > > >>        between hardware entities and software representations and the
> > > > > > >>        resulting topology are stable for the lifetime of a system.
> > > > > > >>
> > > > > > >>        It is really required to know the full topology of the system at
> > > > > > >>        boot time especially with hybrid CPUs where some of the cores
> > > > > > >>        have hyperthreading and the others do not.
> > > > > > >>
> > > > > > >>
> > > > > > >>     2) Hot add can only mark an already registered (possible) CPU
> > > > > > >>        present. Adding non-registered CPUs after boot is not possible.
> > > > > > >>
> > > > > > >>        The CPU must have been registered in #1 already to ensure that
> > > > > > >>        the system topology does not suddenly change in an incompatible
> > > > > > >>        way at run-time.
> > > > > > >>
> > > > > > >> The same restriction would apply to real physical hotplug. I don't think
> > > > > > >> that's any different for ARM64 or any other architecture.    
> > > > > > >
> > > > > > > This makes me wonder whether the Arm64 has been barking up the wrong
> > > > > > > tree then, and whether the whole "present" vs "enabled" thing comes
> > > > > > > from a misunderstanding as far as a CPU goes.
> > > > > > >
> > > > > > > However, there is a big difference between the two. On x86, a processor
> > > > > > > is just a processor. On Arm64, a "processor" is a slice of the system
> > > > > > > (includes the interrupt controller, PMUs etc) and we must enumerate
> > > > > > > those even when the processor itself is not enabled. This is the whole
> > > > > > > reason there's a difference between "present" and "enabled" and why
> > > > > > > there's a difference between x86 cpu hotplug and arm64 cpu hotplug.
> > > > > > > The processor never actually goes away in arm64, it's just prevented
> > > > > > > from being used.    
> > > > > >
> > > > > > It's the same on X86 at least in the physical world.    
> > > > >
> > > > > There were public calls on this via the Linaro Open Discussions group,
> > > > > so I can talk a little about how we ended up here.  Note that (in my
> > > > > opinion) there is zero chance of this changing - it took us well over
> > > > > a year to get to this conclusion.  So if we ever want ARM vCPU HP
> > > > > we need to work within these constraints.
> > > > >
> > > > > The ARM architecture folk (the ones defining the ARM ARM, relevant ACPI
> > > > > specs etc, not the kernel maintainers) are determined that they want
> > > > > to retain the option to do real physical CPU hotplug in the future
> > > > > with all the necessary work around dynamic interrupt controller
> > > > > initialization, debug and many other messy corners.    
> > > >
> > > > That's OK, but the difference is not in the ACPi CPU enumeration/removal code.
> > > >    
> > > > > Thus anything defined had to be structured in a way that was 'different'
> > > > > from that.    
> > > >
> > > > Apparently, that's where things got confused.
> > > >    
> > > > > I don't mind the proposed flattening of the 2 paths if the ARM kernel
> > > > > maintainers are fine with it but it will remove the distinctions and
> > > > > we will need to be very careful with the CPU masks - we can't handle
> > > > > them the same as x86 does.    
> > > >
> > > > At the ACPI code level, there is no distinction.
> > > >
> > > > A CPU that was not available before has just become available.  The
> > > > platform firmware has notified the kernel about it and now
> > > > acpi_processor_add() runs.  Why would it need to use different code
> > > > paths depending on what _STA bits were clear before?    
> > >
> > > I think we will continue to disagree on this.  To my mind and from the
> > > ACPI specification, they are two different state transitions with different
> > > required actions.    
> > 
> > Well, please be specific: What exactly do you mean here and which
> > parts of the spec are you talking about?  
> 
> Given we are moving on with your suggestion, lets leave this for now - too many
> other things to do! :)
> 
> >   
> > > Those state transitions are an ACPI level thing not
> > > an arch level one.  However, I want a solution that moves things forwards
> > > so I'll give pushing that entirely into the arch code a try.    
> > 
> > Thanks!
> > 
> > Though I think that there is a disconnect between us that needs to be
> > clarified first.  
> 
> I'm fine with accepting your approach if it works and is acceptable
> to the arm kernel folk. They are getting a non trivial arch_register_cpu()
> with a bunch of ACPI specific handling in it that may come as a surprise.
> 
> >   
> > > >
> > > > Yes, there is some arch stuff to be called and that arch stuff should
> > > > figure out what to do to make things actually work.
> > > >    
> > > > > I'll get on with doing that, but do need input from Will / Catalin / James.
> > > > > There are some quirks that need calling out as it's not quite a simple
> > > > > as it appears from a high level.
> > > > >
> > > > > Another part of that long discussion established that there is userspace
> > > > > (Android IIRC) in which the CPU present mask must include all CPUs
> > > > > at boot. To change that would be userspace ABI breakage so we can't
> > > > > do that.  Hence the dance around adding yet another mask to allow the
> > > > > OS to understand which CPUs are 'present' but not possible to online.
> > > > >
> > > > > Flattening the two paths removes any distinction between calls that
> > > > > are for real hotplug and those that are for this online capable path.    
> > > >
> > > > Which calls exactly do you mean?    
> > >
> > > At the moment he distinction does not exist (because x86 only supports
> > > fake physical CPU HP and arm64 only vCPU HP / online capable), but if
> > > the architecture is defined for arm64 physical hotplug in the future
> > > we would need to do interrupt controller bring up + a lot of other stuff.
> > >
> > > It may be possible to do that in the arch code - will be hard to verify
> > > that until that arch is defined  Today all I need to do is ensure that
> > > any attempt to do present bit setting for ARM64 returns an error.
> > > That looks to be straight forward.    
> > 
> > OK
> >   
> > >    
> > > >    
> > > > > As a side note, the indicating bit for these flows is defined in ACPI
> > > > > for x86 from ACPI 6.3 as a flag in Processor Local APIC
> > > > > (the ARM64 definition is a cut and paste of that text).  So someone
> > > > > is interested in this distinction on x86. I can't say who but if
> > > > > you have a mantis account you can easily follow the history and it
> > > > > might be instructive to not everyone considering the current x86
> > > > > flow the right way to do it.    
> > > >
> > > > So a physically absent processor is different from a physically
> > > > present processor that has not been disabled.  No doubt about this.
> > > >
> > > > That said, I'm still unsure why these two cases require two different
> > > > code paths in acpi_processor_add().    
> > >
> > > It might be possible to push the checking down into arch_register_cpu()
> > > and have that for now reject any attempt to do physical CPU HP on arm64.
> > > It is that gate that is vital to getting this accepted by ARM.
> > >
> > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > on that would be very welcome!    
> > 
> > I need to do some investigation which will take some time I suppose.  
> 
> I'll do so as well once I've gotten the rest sorted out.  That whole
> structure seems overly complex and liable to race, though maybe sufficient
> locking happens to be held that it's not a problem.

Back to this a (maybe) last outstanding problem.

Superficially I think we might be able to get around this by always
doing the setup in the initial online. In brief that looks something the
below code.  Relying on the cpu hotplug callback registration calling
the acpi_soft_cpu_online for all instances that are already online.

Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
However this is all in emulation and I don't have access to any significant
x86 test farms :( So help will be needed if it's not immediately obvious why
we can't do this.

Of course, I'm open to other suggestions!

For now I'll put a tidied version of this one is as an RFC with the rest of v6.

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 06e718b650e5..97ca53b516d0 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
         */
        per_cpu(processor_device_array, pr->id) = device;
        per_cpu(processors, pr->id) = pr;
-
+       pr->flags.need_hotplug_init = 1;
        /*
         *  Extra Processor objects may be enumerated on MP systems with
         *  less than the max # of CPUs. They should be ignored _iff
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 67db60eda370..930f911fc435 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)

        /* Protect against concurrent CPU hotplug operations */
        cpu_hotplug_disable();
-       ret = __acpi_processor_start(device);
+       //      ret = __acpi_processor_start(device);
        cpu_hotplug_enable();
        return ret;
 }
@@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
        if (result < 0)
                return result;

-       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
                                           "acpi/cpu-drv:online",
                                           acpi_soft_cpu_online, NULL);
        if (result < 0)
> 
> Jonathan
> 
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Rafael J. Wysocki April 16, 2024, 7:02 p.m. UTC | #26
On Tue, Apr 16, 2024 at 7:41 PM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Mon, 15 Apr 2024 13:23:51 +0100
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > On Mon, 15 Apr 2024 14:04:26 +0200
> > "Rafael J. Wysocki" <rafael@kernel.org> wrote:

[cut]

> > > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > > on that would be very welcome!
> > >
> > > I need to do some investigation which will take some time I suppose.
> >
> > I'll do so as well once I've gotten the rest sorted out.  That whole
> > structure seems overly complex and liable to race, though maybe sufficient
> > locking happens to be held that it's not a problem.
>
> Back to this a (maybe) last outstanding problem.
>
> Superficially I think we might be able to get around this by always
> doing the setup in the initial online. In brief that looks something the
> below code.  Relying on the cpu hotplug callback registration calling
> the acpi_soft_cpu_online for all instances that are already online.
>
> Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
> However this is all in emulation and I don't have access to any significant
> x86 test farms :( So help will be needed if it's not immediately obvious why
> we can't do this.

AFAICS, this should work.  At least I don't see why it wouldn't.

> Of course, I'm open to other suggestions!
>
> For now I'll put a tidied version of this one is as an RFC with the rest of v6.
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index 06e718b650e5..97ca53b516d0 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
>          */
>         per_cpu(processor_device_array, pr->id) = device;
>         per_cpu(processors, pr->id) = pr;
> -
> +       pr->flags.need_hotplug_init = 1;
>         /*
>          *  Extra Processor objects may be enumerated on MP systems with
>          *  less than the max # of CPUs. They should be ignored _iff
> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> index 67db60eda370..930f911fc435 100644
> --- a/drivers/acpi/processor_driver.c
> +++ b/drivers/acpi/processor_driver.c
> @@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)
>
>         /* Protect against concurrent CPU hotplug operations */
>         cpu_hotplug_disable();
> -       ret = __acpi_processor_start(device);
> +       //      ret = __acpi_processor_start(device);
>         cpu_hotplug_enable();
>         return ret;
>  }

So it looks like acpi_processor_start() is not necessary any more, is it?

> @@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
>         if (result < 0)
>                 return result;
>
> -       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
>                                            "acpi/cpu-drv:online",
>                                            acpi_soft_cpu_online, NULL);
>         if (result < 0)
> >
> > Jonathan

Thanks!
Jonathan Cameron April 17, 2024, 10:39 a.m. UTC | #27
On Tue, 16 Apr 2024 21:02:02 +0200
"Rafael J. Wysocki" <rafael@kernel.org> wrote:

> On Tue, Apr 16, 2024 at 7:41 PM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Mon, 15 Apr 2024 13:23:51 +0100
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >  
> > > On Mon, 15 Apr 2024 14:04:26 +0200
> > > "Rafael J. Wysocki" <rafael@kernel.org> wrote:  
> 
> [cut]
> 
> > > > > I'm still very much stuck on the hotadd_init flag however, so any suggestions
> > > > > on that would be very welcome!  
> > > >
> > > > I need to do some investigation which will take some time I suppose.  
> > >
> > > I'll do so as well once I've gotten the rest sorted out.  That whole
> > > structure seems overly complex and liable to race, though maybe sufficient
> > > locking happens to be held that it's not a problem.  
> >
> > Back to this a (maybe) last outstanding problem.
> >
> > Superficially I think we might be able to get around this by always
> > doing the setup in the initial online. In brief that looks something the
> > below code.  Relying on the cpu hotplug callback registration calling
> > the acpi_soft_cpu_online for all instances that are already online.
> >
> > Very lightly tested on arm64 and x86 with cold and hotplugged CPUs.
> > However this is all in emulation and I don't have access to any significant
> > x86 test farms :( So help will be needed if it's not immediately obvious why
> > we can't do this.  
> 
> AFAICS, this should work.  At least I don't see why it wouldn't.
> 
> > Of course, I'm open to other suggestions!
> >
> > For now I'll put a tidied version of this one is as an RFC with the rest of v6.
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 06e718b650e5..97ca53b516d0 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -340,7 +340,7 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >          */
> >         per_cpu(processor_device_array, pr->id) = device;
> >         per_cpu(processors, pr->id) = pr;
> > -
> > +       pr->flags.need_hotplug_init = 1;
> >         /*
> >          *  Extra Processor objects may be enumerated on MP systems with
> >          *  less than the max # of CPUs. They should be ignored _iff
> > diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> > index 67db60eda370..930f911fc435 100644
> > --- a/drivers/acpi/processor_driver.c
> > +++ b/drivers/acpi/processor_driver.c
> > @@ -206,7 +206,7 @@ static int acpi_processor_start(struct device *dev)
> >
> >         /* Protect against concurrent CPU hotplug operations */
> >         cpu_hotplug_disable();
> > -       ret = __acpi_processor_start(device);
> > +       //      ret = __acpi_processor_start(device);
> >         cpu_hotplug_enable();
> >         return ret;
> >  }  
> 
> So it looks like acpi_processor_start() is not necessary any more, is it?

Absolutely.  This needs cleaning up beyond this hack.

Given pr has been initialized to 0, flipping the flag to be something
like 'initialized' and having the driver set it on first online rather than
in acpi_processor.c will clean it up further.

Jonathan
> 
> > @@ -279,7 +279,7 @@ static int __init acpi_processor_driver_init(void)
> >         if (result < 0)
> >                 return result;
> >
> > -       result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> > +       result = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
> >                                            "acpi/cpu-drv:online",
> >                                            acpi_soft_cpu_online, NULL);
> >         if (result < 0)  
> > >
> > > Jonathan  
> 
> Thanks!
Salil Mehta April 17, 2024, 3:01 p.m. UTC | #28
HI Rafael,

>  From: Rafael J. Wysocki <rafael@kernel.org>
>  Sent: Monday, April 15, 2024 5:39 PM
>  
>  On Mon, Apr 15, 2024 at 5:31 PM Salil Mehta <salil.mehta@huawei.com>  wrote:
>  >
>  > >  From: Rafael J. Wysocki <rafael@kernel.org>
>  > >  Sent: Monday, April 15, 2024 1:51 PM
>  > >
>  > >  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta
>  > > <salil.mehta@huawei.com>
>  > >  wrote:
>  > >  >
>  
>  [cut]
>  
>  > >  > >  Though virtualization happily jumped on it to hot add/remove
>  > > CPUs  > > to/from  a guest.
>  > >  > >
>  > >  > >  There are limitations to this and we learned it the hard way
>  > > on  > > X86. At the  end we came up with the following restrictions:
>  > >  > >
>  > >  > >      1) All possible CPUs have to be advertised at boot time via  firmware
>  > >  > >         (ACPI/DT/whatever) independent of them being present at boot time
>  > >  > >         or not.
>  > >  > >
>  > >  > >         That guarantees proper sizing and ensures that associations
>  > >  > >         between hardware entities and software representations and the
>  > >  > >         resulting topology are stable for the lifetime of a system.
>  > >  > >
>  > >  > >         It is really required to know the full topology of the system at
>  > >  > >         boot time especially with hybrid CPUs where some of the cores
>  > >  > >         have hyperthreading and the others do not.
>  > >  > >
>  > >  > >
>  > >  > >      2) Hot add can only mark an already registered (possible) CPU
>  > >  > >         present. Adding non-registered CPUs after boot is not possible.
>  > >  > >
>  > >  > >         The CPU must have been registered in #1 already to ensure that
>  > >  > >         the system topology does not suddenly change in an incompatible
>  > >  > >         way at run-time.
>  > >  > >
>  > >  > >  The same restriction would apply to real physical hotplug. I
>  > > don't  > > think that's  any different for ARM64 or any other architecture.
>  > >  >
>  > >  >
>  > >  > There is a difference:
>  > >  >
>  > >  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
>  > >  > this restriction any of its related per-cpu components must be
>  > > present  > and enumerated at the boot time as well (exposed by
>  > > firmware and  > ACPI). This means all the enumerated processors will
>  > > be marked as  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
>  > >  >
>  > >  > There was one clear difference and please correct me if I'm wrong
>  > > > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
>  > >  >
>  > >  > But for ARM Arch, processors and its corresponding per-cpu
>  > > components  > like redistributors all need to be present and
>  > > enumerated during the  > boot time. Redistributors are part of ALWAYS-ON power domain.
>  > >
>  > >  OK
>  > >
>  > >  So what exactly is the problem with this and what does
>  > >  acpi_processor_add() have to do with it?
>  >
>  >
>  > For ARM Arch, during boot time, it should add processor as if no
>  > hotplug exists. But later, in context to the (fake) hotplug trigger
>  > from the virtualizer as a result of the CPU (un)plug action  it should just
>  end up in registering the already present CPU with the Linux Driver Model.
>  
>  So let me repeat this last time: acpi_processor_add() cannot do that,
>  because (as defined today) it rejects CPUs with the "enabled" bit clear in  _STA.


I understand that now because you have placed a check recently. sorry for stretching
it a bit but I wanted to clearly understand the reason for this behavior. Is it because,

1. It does not makes sense to add a disabled but present/functional processor or
    perhaps there are repercussions to support such a behavior?

Or

2. None of the existing processors need such a behavior?



>  > >  Do you want the per-CPU structures etc. to be created from the
>  > >  acpi_processor_add() path?
>  >
>  >
>  > I referred to the components related to ARM CPU Arch like redistributors etc.
>  > which will get initialized in context to Arch specific _init code not
>  > here. This i.e. acpi_processor_add() is arch agnostic code common to all architectures.
>  >
>  > [ A digression: You do have _weak functions which can be overridden to
>  > arch specific  handling like  arch_(un)map_cpu() etc. but we can't use
>  > those to defer initialize  the CPU related components - ARM Arch
>  > constraint!]
>  
>  Not right now, but they can be added I suppose.
>  
>  >
>  > >
>  > >  This plain won't work because acpi_processor_add(), as defined in
>  > > the  mainline kernel today (and the Jonathan's patches don't change
>  > > that  AFAICS), returns an error for processor devices with the
>  > > "enabled" bit clear  in _STA (it returns an error if the "present"
>  > > bit is clear too, but that's  obvious), so it only gets to calling
>  > > arch_register_cpu() if
>  > >  *both* "present" and "enabled" _STA bits are set for the given
>  > > processor  device.
>  >
>  >
>  > If you are referring to the _STA check in the XX_hot_add_init() then
>  > in the current kernel code it only checks for the
>  > ACPI_STA_DEVICE_PRESENT flag and not the ACPI_STA_DEVICE_ENABLED flag(?).
>  
>  No, I am not.  I'm referring to this code in 6.9-rc4:
>  
>  static int acpi_processor_add(struct acpi_device *device,
>                      const struct acpi_device_id *id) {
>      struct acpi_processor *pr;
>      struct device *dev;
>      int result = 0;
>  
>      if (!acpi_device_is_enabled(device))
>          return -ENODEV;


Ahh, sorry, I missed this check as this has been added recently. Yes, now your
logic of having common legs makes more sense.


>  
>      ...
>  }
>  
>  where acpi_device_is_enabled() is defined as follows:
>  
>  bool acpi_device_is_enabled(const struct acpi_device *adev) {
>      return adev->status.present && adev->status.enabled; }


Got it. 


[digression note:]
BTW, I'm wondering why we are checking adev->status.present
as having adev->status.enabled as set and adev->status.present
as unset would mean firmware has a BUG. If we really want to
check this then we should rather flag a warning on detection
of this condition? 


Either this:
 bool acpi_device_is_enabled(const struct acpi_device *adev) {
      
     if (!acpi_device_is_present(adev)) {
            if (adev->status.enabled)
                       pr_debug("Device [%s] status inconsistent: Enabled but not Present\n",
                                          device->pnp.bus_id);
            return false;
     }
      return  true;
}

Ideally this inconsistency should have been checked in acpi_bus_get_status()
and above function should have been just,

file: drivers/acpi/scan.c
bool acpi_device_is_enabled(const struct acpi_device *adev) {
      return !!adev->status.enabled; }


file: drivers/acpi/bus.c
int acpi_bus_get_status(struct acpi_device *device)
{
       [...]

	status = acpi_bus_get_status_handle(device->handle, &sta);
	if (ACPI_FAILURE(status))
		return -ENODEV;

	acpi_set_device_status(device, sta);

	if (device->status.functional && !device->status.present) {
		pr_debug("Device [%s] status [%08x]: functional but not present\n",
			 device->pnp.bus_id, (u32)sta);
	}

+	if (device->status.enabled && !device->status.present) {
+		pr_debug("BUG: Device [%s] status [%08x]: enabled but not present\n",
+			 device->pnp.bus_id, (u32)sta);
+                         /* any specific handling here? */
+	}

	pr_debug("Device [%s] status [%08x]\n", device->pnp.bus_id, (u32)sta);
	return 0;
}

>  
>  > The code being reviewed has changed
>  > exactly that behavior for 2 legs i.e. make-present and make-enabled legs.
>  
>  I'm not sure what you mean here, but the code above means that
>  acpi_processor_add) does not distinguish between CPU with the "present"
>  bit clear (in which case the "enabled" bit must also be clear as per the spec)
>  and CPUs with the "present" bit set and the "enabled" bit clear.  These two
>  cases are handled in the same way.
>  
>  > I'm open to further address your point clearly.
>  
>  I hope that the above is clear enough.


Yes, clear now. I missed the new check.

>  
>  > >
>  > >  That, BTW, is why I keep saying that from the ACPI CPU enumeration
>  > > code  perspective, there is no difference between "present" and
>  "enabled".
>  >
>  >
>  > Agreed but there is still a subtle difference.  Enumeration happens
>  > once and for all the processors during the boot time. And as confirmed
>  > by yourself and Thomas as well that even in x86 arch all the
>  > processors will be discovered and their topology fixed during the boot
>  > time which is effectively the same behavior as in the ARM Arch. But
>  > ARM assumes those 'present' bits in the present masks to be set during
>  > the boot time which is not like x86(?).  Hence, 'present cpu' Bits
>  > will always be equal to 'possible cpu' Bits. This is a constraint put by the
>  ARM maintainers and looks unshakable.
>  
>  Yes, there are differences between architectures, but the ACPI code is (or
>  at least should be) architecture-agnostic (as you said somewhere above).
>  So why does this matter for the ACPI code?


It should not. There were few bits like overriding of arch_register_cpu() which
was not allowed by ARM folks in 2020 when I floated the first prototype.


>  > >  > 2.  Agreed regarding the topology. Are you suggesting that we
>  > > must  > call arch_register_cpu() during boot time for all the 'present'  CPUs?
>  > >  > Even if that's the case, we might still want to defer
>  > > registration of  > the cpu device (register_cpu() API) with the
>  > > Linux device model. Later  > is what we are doing to hide/unhide the
>  > > CPUs from the user while  STA.Enabled Bit is toggled due to CPU  (un)plug action.
>  > >
>  > >  There are two ways to approach this IMV, and both seem to be valid
>  > > in  principle.
>  > >
>  > >  One would be to treat CPUs with the "enabled" bit clear as not
>  > > present and  create all of the requisite data structures for them
>  > > when they become  available (in analogy with the "real hot-add" case).
>  >
>  >
>  > Right. This one is out-of-scope for ARM Arch as we cannot defer any
>  > Arch specific sizing and initializations after boot i.e. when
>  > processor_add() gets called again later as a trigger of CPU plug action at the Virtualizer.
>  >
>  >
>  > >
>  > >  The alternative one is to create all of the requisite data
>  > > structures for the  CPUs that you find during boot, but register CPU
>  > > devices for those having  the "enabled" _STA bit set.
>  >
>  >
>  > Correct. and we defer the registration for CPUs with online-capable
>  > Bit set in the ACPI MADT/GICC data structure. These CPUs basically
>  > form set of hot-pluggable CPUs on ARM.
>  >
>  >
>  > >
>  > >  It looks like you have chosen the second approach, which is fine
>  > > with me  (although personally, I would rather choose the first one),
>  > > but then your  arch code needs to arrange for the requisite CPU data
>  > > structures etc. to be  set up before acpi_processor_add() gets
>  > > called because, as per the above,  the latter just rejects CPUs with the  "enabled" _STA bit clear.
>  >
>  > Yes, correct. First one is not possible - at least for now and to have
>  > that it will require lot of rework especially at GIC. But there are
>  > many other arch components (like timers, PMUs, etc.) whose behavior
>  > needs to be specified somewhere in the architecture as well. All these are closely coupled with the ARM CPU architecture.
>  > (it's beyond this discussion and lets leave that to ARM folks).
>  >
>  > This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.
>  
>  Well, I'm having a hard time with this.
>  
>  As far as CPU enumeration goes, if the "enabled" bit is clear in _STA, it does
>  not happen at all.  Both on ARM and on x86.

sure, I can see that now.

>  
>  Now tell me why there need to be two separate code paths calling
>  arch_register_cpu() in acpi_processor_add()?


As mentioned above, in the first prototype I floated in the year 2020 any attempts
to override the __weak call of arch_register_cpu() for ARM64 was discouraged. 
Though, the reasons might have changed now as some code has been moved.

Once we are allowed to override the calls then there are many more possibilities
which open up to simplify the code further.


>  
>  I see no reason whatsoever.
>  
>  Moreover, I see reasons why there needs to be only one such code path.
>  
>  Please feel free to prove me wrong.
>  
>  Thanks!
Rafael J. Wysocki April 17, 2024, 4:19 p.m. UTC | #29
On Wed, Apr 17, 2024 at 5:01 PM Salil Mehta <salil.mehta@huawei.com> wrote:
>
> HI Rafael,
>
> >  From: Rafael J. Wysocki <rafael@kernel.org>
> >  Sent: Monday, April 15, 2024 5:39 PM
> >
> >  On Mon, Apr 15, 2024 at 5:31 PM Salil Mehta <salil.mehta@huawei.com>  wrote:
> >  >
> >  > >  From: Rafael J. Wysocki <rafael@kernel.org>
> >  > >  Sent: Monday, April 15, 2024 1:51 PM
> >  > >
> >  > >  On Mon, Apr 15, 2024 at 1:51 PM Salil Mehta
> >  > > <salil.mehta@huawei.com>
> >  > >  wrote:
> >  > >  >
> >
> >  [cut]
> >
> >  > >  > >  Though virtualization happily jumped on it to hot add/remove
> >  > > CPUs  > > to/from  a guest.
> >  > >  > >
> >  > >  > >  There are limitations to this and we learned it the hard way
> >  > > on  > > X86. At the  end we came up with the following restrictions:
> >  > >  > >
> >  > >  > >      1) All possible CPUs have to be advertised at boot time via  firmware
> >  > >  > >         (ACPI/DT/whatever) independent of them being present at boot time
> >  > >  > >         or not.
> >  > >  > >
> >  > >  > >         That guarantees proper sizing and ensures that associations
> >  > >  > >         between hardware entities and software representations and the
> >  > >  > >         resulting topology are stable for the lifetime of a system.
> >  > >  > >
> >  > >  > >         It is really required to know the full topology of the system at
> >  > >  > >         boot time especially with hybrid CPUs where some of the cores
> >  > >  > >         have hyperthreading and the others do not.
> >  > >  > >
> >  > >  > >
> >  > >  > >      2) Hot add can only mark an already registered (possible) CPU
> >  > >  > >         present. Adding non-registered CPUs after boot is not possible.
> >  > >  > >
> >  > >  > >         The CPU must have been registered in #1 already to ensure that
> >  > >  > >         the system topology does not suddenly change in an incompatible
> >  > >  > >         way at run-time.
> >  > >  > >
> >  > >  > >  The same restriction would apply to real physical hotplug. I
> >  > > don't  > > think that's  any different for ARM64 or any other architecture.
> >  > >  >
> >  > >  >
> >  > >  > There is a difference:
> >  > >  >
> >  > >  > 1.   ARM arch does not allows for any processor to be NOT present. Hence,  because of
> >  > >  > this restriction any of its related per-cpu components must be
> >  > > present  > and enumerated at the boot time as well (exposed by
> >  > > firmware and  > ACPI). This means all the enumerated processors will
> >  > > be marked as  > 'present' but they might exist in NOT enabled (_STA.enabled=0) state.
> >  > >  >
> >  > >  > There was one clear difference and please correct me if I'm wrong
> >  > > > here,  for x86, the LAPIC associated with the x86 core can be brought online later even after boot?
> >  > >  >
> >  > >  > But for ARM Arch, processors and its corresponding per-cpu
> >  > > components  > like redistributors all need to be present and
> >  > > enumerated during the  > boot time. Redistributors are part of ALWAYS-ON power domain.
> >  > >
> >  > >  OK
> >  > >
> >  > >  So what exactly is the problem with this and what does
> >  > >  acpi_processor_add() have to do with it?
> >  >
> >  >
> >  > For ARM Arch, during boot time, it should add processor as if no
> >  > hotplug exists. But later, in context to the (fake) hotplug trigger
> >  > from the virtualizer as a result of the CPU (un)plug action  it should just
> >  end up in registering the already present CPU with the Linux Driver Model.
> >
> >  So let me repeat this last time: acpi_processor_add() cannot do that,
> >  because (as defined today) it rejects CPUs with the "enabled" bit clear in  _STA.
>
>
> I understand that now because you have placed a check recently. sorry for stretching
> it a bit but I wanted to clearly understand the reason for this behavior. Is it because,
>
> 1. It does not makes sense to add a disabled but present/functional processor or
>     perhaps there are repercussions to support such a behavior?

Yes because it is unusable.

> Or
>
> 2. None of the existing processors need such a behavior?
>
>
>
> >  > >  Do you want the per-CPU structures etc. to be created from the
> >  > >  acpi_processor_add() path?
> >  >
> >  >
> >  > I referred to the components related to ARM CPU Arch like redistributors etc.
> >  > which will get initialized in context to Arch specific _init code not
> >  > here. This i.e. acpi_processor_add() is arch agnostic code common to all architectures.
> >  >
> >  > [ A digression: You do have _weak functions which can be overridden to
> >  > arch specific  handling like  arch_(un)map_cpu() etc. but we can't use
> >  > those to defer initialize  the CPU related components - ARM Arch
> >  > constraint!]
> >
> >  Not right now, but they can be added I suppose.
> >
> >  >
> >  > >
> >  > >  This plain won't work because acpi_processor_add(), as defined in
> >  > > the  mainline kernel today (and the Jonathan's patches don't change
> >  > > that  AFAICS), returns an error for processor devices with the
> >  > > "enabled" bit clear  in _STA (it returns an error if the "present"
> >  > > bit is clear too, but that's  obvious), so it only gets to calling
> >  > > arch_register_cpu() if
> >  > >  *both* "present" and "enabled" _STA bits are set for the given
> >  > > processor  device.
> >  >
> >  >
> >  > If you are referring to the _STA check in the XX_hot_add_init() then
> >  > in the current kernel code it only checks for the
> >  > ACPI_STA_DEVICE_PRESENT flag and not the ACPI_STA_DEVICE_ENABLED flag(?).
> >
> >  No, I am not.  I'm referring to this code in 6.9-rc4:
> >
> >  static int acpi_processor_add(struct acpi_device *device,
> >                      const struct acpi_device_id *id) {
> >      struct acpi_processor *pr;
> >      struct device *dev;
> >      int result = 0;
> >
> >      if (!acpi_device_is_enabled(device))
> >          return -ENODEV;
>
>
> Ahh, sorry, I missed this check as this has been added recently. Yes, now your
> logic of having common legs makes more sense.
>
>
> >
> >      ...
> >  }
> >
> >  where acpi_device_is_enabled() is defined as follows:
> >
> >  bool acpi_device_is_enabled(const struct acpi_device *adev) {
> >      return adev->status.present && adev->status.enabled; }
>
>
> Got it.
>
>
> [digression note:]
> BTW, I'm wondering why we are checking adev->status.present
> as having adev->status.enabled as set and adev->status.present
> as unset would mean firmware has a BUG. If we really want to
> check this then we should rather flag a warning on detection
> of this condition?

Adding a warning would be fine with me.

> Either this:
>  bool acpi_device_is_enabled(const struct acpi_device *adev) {
>
>      if (!acpi_device_is_present(adev)) {
>             if (adev->status.enabled)
>                        pr_debug("Device [%s] status inconsistent: Enabled but not Present\n",
>                                           device->pnp.bus_id);
>             return false;
>      }
>       return  true;
> }
>
> Ideally this inconsistency should have been checked in acpi_bus_get_status()
> and above function should have been just,

Yes, it can be added there.  It can even clear 'enabled' if 'present' is clear.

> file: drivers/acpi/scan.c
> bool acpi_device_is_enabled(const struct acpi_device *adev) {
>       return !!adev->status.enabled; }

Sure.

> file: drivers/acpi/bus.c
> int acpi_bus_get_status(struct acpi_device *device)
> {
>        [...]
>
>         status = acpi_bus_get_status_handle(device->handle, &sta);
>         if (ACPI_FAILURE(status))
>                 return -ENODEV;
>
>         acpi_set_device_status(device, sta);
>
>         if (device->status.functional && !device->status.present) {
>                 pr_debug("Device [%s] status [%08x]: functional but not present\n",
>                          device->pnp.bus_id, (u32)sta);
>         }
>
> +       if (device->status.enabled && !device->status.present) {
> +               pr_debug("BUG: Device [%s] status [%08x]: enabled but not present\n",
> +                        device->pnp.bus_id, (u32)sta);
> +                         /* any specific handling here? */
> +       }
>
>         pr_debug("Device [%s] status [%08x]\n", device->pnp.bus_id, (u32)sta);
>         return 0;
> }
>
> >
> >  > The code being reviewed has changed
> >  > exactly that behavior for 2 legs i.e. make-present and make-enabled legs.
> >
> >  I'm not sure what you mean here, but the code above means that
> >  acpi_processor_add) does not distinguish between CPU with the "present"
> >  bit clear (in which case the "enabled" bit must also be clear as per the spec)
> >  and CPUs with the "present" bit set and the "enabled" bit clear.  These two
> >  cases are handled in the same way.
> >
> >  > I'm open to further address your point clearly.
> >
> >  I hope that the above is clear enough.
>
>
> Yes, clear now. I missed the new check.
>
> >
> >  > >
> >  > >  That, BTW, is why I keep saying that from the ACPI CPU enumeration
> >  > > code  perspective, there is no difference between "present" and
> >  "enabled".
> >  >
> >  >
> >  > Agreed but there is still a subtle difference.  Enumeration happens
> >  > once and for all the processors during the boot time. And as confirmed
> >  > by yourself and Thomas as well that even in x86 arch all the
> >  > processors will be discovered and their topology fixed during the boot
> >  > time which is effectively the same behavior as in the ARM Arch. But
> >  > ARM assumes those 'present' bits in the present masks to be set during
> >  > the boot time which is not like x86(?).  Hence, 'present cpu' Bits
> >  > will always be equal to 'possible cpu' Bits. This is a constraint put by the
> >  ARM maintainers and looks unshakable.
> >
> >  Yes, there are differences between architectures, but the ACPI code is (or
> >  at least should be) architecture-agnostic (as you said somewhere above).
> >  So why does this matter for the ACPI code?
>
>
> It should not. There were few bits like overriding of arch_register_cpu() which
> was not allowed by ARM folks in 2020 when I floated the first prototype.
>
>
> >  > >  > 2.  Agreed regarding the topology. Are you suggesting that we
> >  > > must  > call arch_register_cpu() during boot time for all the 'present'  CPUs?
> >  > >  > Even if that's the case, we might still want to defer
> >  > > registration of  > the cpu device (register_cpu() API) with the
> >  > > Linux device model. Later  > is what we are doing to hide/unhide the
> >  > > CPUs from the user while  STA.Enabled Bit is toggled due to CPU  (un)plug action.
> >  > >
> >  > >  There are two ways to approach this IMV, and both seem to be valid
> >  > > in  principle.
> >  > >
> >  > >  One would be to treat CPUs with the "enabled" bit clear as not
> >  > > present and  create all of the requisite data structures for them
> >  > > when they become  available (in analogy with the "real hot-add" case).
> >  >
> >  >
> >  > Right. This one is out-of-scope for ARM Arch as we cannot defer any
> >  > Arch specific sizing and initializations after boot i.e. when
> >  > processor_add() gets called again later as a trigger of CPU plug action at the Virtualizer.
> >  >
> >  >
> >  > >
> >  > >  The alternative one is to create all of the requisite data
> >  > > structures for the  CPUs that you find during boot, but register CPU
> >  > > devices for those having  the "enabled" _STA bit set.
> >  >
> >  >
> >  > Correct. and we defer the registration for CPUs with online-capable
> >  > Bit set in the ACPI MADT/GICC data structure. These CPUs basically
> >  > form set of hot-pluggable CPUs on ARM.
> >  >
> >  >
> >  > >
> >  > >  It looks like you have chosen the second approach, which is fine
> >  > > with me  (although personally, I would rather choose the first one),
> >  > > but then your  arch code needs to arrange for the requisite CPU data
> >  > > structures etc. to be  set up before acpi_processor_add() gets
> >  > > called because, as per the above,  the latter just rejects CPUs with the  "enabled" _STA bit clear.
> >  >
> >  > Yes, correct. First one is not possible - at least for now and to have
> >  > that it will require lot of rework especially at GIC. But there are
> >  > many other arch components (like timers, PMUs, etc.) whose behavior
> >  > needs to be specified somewhere in the architecture as well. All these are closely coupled with the ARM CPU architecture.
> >  > (it's beyond this discussion and lets leave that to ARM folks).
> >  >
> >  > This patch-set has a change to deal with ACPI _STA.Enabled Bit accordingly.
> >
> >  Well, I'm having a hard time with this.
> >
> >  As far as CPU enumeration goes, if the "enabled" bit is clear in _STA, it does
> >  not happen at all.  Both on ARM and on x86.
>
> sure, I can see that now.
>
> >
> >  Now tell me why there need to be two separate code paths calling
> >  arch_register_cpu() in acpi_processor_add()?
>
>
> As mentioned above, in the first prototype I floated in the year 2020 any attempts
> to override the __weak call of arch_register_cpu() for ARM64 was discouraged.
> Though, the reasons might have changed now as some code has been moved.
>
> Once we are allowed to override the calls then there are many more possibilities
> which open up to simplify the code further.

Well, IMV this should just be an arch function with no __weak
defaults, because the default would probably be unusable in practice
anyway.