mbox series

[00/29] x86/cpu: Rework the topology evaluation

Message ID 20230724155329.474037902@linutronix.de
Headers show
Series x86/cpu: Rework the topology evaluation | expand

Message

Thomas Gleixner July 24, 2023, 5:43 p.m. UTC
Hi!

A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at
the way how topology is evaluated. That "fix" is just yet another cure the
sypmtom hack which completely ignores the underlying disaster.

The way how topology evaluation works is to overwrite the relevant
variables as often as possible. E.g. smp_num_siblings gets overwritten a
gazillion times, which is wrong to begin with. The boot CPU writes it 3
times, each AP two times.

What's worse is that this just works by chance on hybrid systems due to the
fact that the existing ones all seem to boot on a P-Core which has
SMT. Would it boot on a E-Core which has no SMT, then parts of the early
topology evaluation including the primary thread mask which is required for
parallel CPU bringup would be completely wrong. Overwriting it later on
with the correct value does not help at all.

What's wrong today with hybrid already is the number of cores per package.
On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per
package is evaluated to be 12. Which is not further surprising because the
CPUID 0xb/0x1f parser looks at the number of logical processors at core
level and divides them by the number of SMP siblings.

   24 / 2 = 12

Just that this CPU has obviously 16 cores not 12.

It's is even clearly documented in the SDM that this is wrong.

 "Bits 15-00: The number of logical processors across all instances of this
  domain within the next higher- scoped domain relative to this current
  logical processor. (For example, in a processor socket/package comprising
  "M" dies of "N" cores each, where each core has "L" processors, the
  "die" domain subleaf value of this field would be M*N*L. In an asymmetric
  topology this would be the summation of the value across the lower domain
  level instances to create each upper domain level instance.) This number
  reflects configuration as shipped by Intel. Note, software must not use
  this field to enumerate processor topology.

  Software must not use the value of EBX[15:0] to enumerate processor topology
  of the system. The value is only intended for display and diagnostic purposes.
  The actual number of logical processors available to BIOS/OS/Applications
  may be different from the value of EBX[15:0], depending on software and
  platform hardware configurations."

This "_NOT_ to use for topology evaluation" sentence existed even before
hybrid came along and got ignored. The code worked by chance, but with
hybrid all bets are off. The code completely falls apart once CPUID leaf
0x1f enumerates any topology level between CORE and DIE, but that's not a
suprise.

The proper thing to do is to actually evaluate the full topology including
the non-present (hotpluggable) CPUs based on the APICIDs which are provided
by the firmware and a proper topology domain level parser. This can exactly
tell the number of physical packages, logical packages etc. _before_ even
booting a single AP. All of that can be evaluated upfront.

Aside of that there are too many places which do their own topology
evaluation, but there is absolutely no central point which can actually
provide all of that information in a consistent way. This needs to change.

This series implements another piece towards this: sane CPUID evaluation,
which is done at _one_ place in a proper well defined order instead of
having it sprinkled all over the CPUID evaluation code.

At the end of this series this is pretty much bug compatible with the
current CPUID evaluation code in respect to the cores per package
evaluation, but it gets rid of overwriting things like smp_num_siblings,
which is now written once, but is still not capable to work correctly on a
hybrid machine which boots from a non SMT core. These things can only be
fixed up in the next step(s).

When I tried to go further with this I ran into yet another pile of
historical layers of duct tape and haywire with a gazillion of random
variables sprinkled all over the place. That's still work in progress.  It
actually works, but the last step which switches over is not yet in a shape
that can be easily reviewed. Stay tuned.

The series is based on the APIC cleanup series:

  https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de

and also available on top of that from git:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1

Thanks,

	tglx
---
 arch/x86/kernel/cpu/topology.c              |  168 --------------------
 b/Documentation/arch/x86/topology.rst       |   12 -
 b/arch/x86/events/amd/core.c                |    2 
 b/arch/x86/events/amd/uncore.c              |    2 
 b/arch/x86/events/intel/uncore.c            |    2 
 b/arch/x86/include/asm/apic.h               |    1 
 b/arch/x86/include/asm/cacheinfo.h          |    3 
 b/arch/x86/include/asm/cpuid.h              |   32 +++
 b/arch/x86/include/asm/processor.h          |   58 ++++--
 b/arch/x86/include/asm/smp.h                |    2 
 b/arch/x86/include/asm/topology.h           |   51 +++++-
 b/arch/x86/include/asm/x86_init.h           |    2 
 b/arch/x86/kernel/amd_nb.c                  |    8 
 b/arch/x86/kernel/apic/apic_flat_64.c       |    7 
 b/arch/x86/kernel/apic/apic_noop.c          |    3 
 b/arch/x86/kernel/apic/apic_numachip.c      |   11 -
 b/arch/x86/kernel/apic/bigsmp_32.c          |    6 
 b/arch/x86/kernel/apic/local.h              |    1 
 b/arch/x86/kernel/apic/probe_32.c           |    6 
 b/arch/x86/kernel/apic/x2apic_cluster.c     |    1 
 b/arch/x86/kernel/apic/x2apic_phys.c        |    6 
 b/arch/x86/kernel/apic/x2apic_uv_x.c        |   63 +------
 b/arch/x86/kernel/cpu/Makefile              |    5 
 b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
 b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++----
 b/arch/x86/kernel/cpu/centaur.c             |    4 
 b/arch/x86/kernel/cpu/common.c              |  108 +-----------
 b/arch/x86/kernel/cpu/cpu.h                 |   14 +
 b/arch/x86/kernel/cpu/debugfs.c             |   98 +++++++++++
 b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
 b/arch/x86/kernel/cpu/intel.c               |   38 ----
 b/arch/x86/kernel/cpu/mce/amd.c             |    4 
 b/arch/x86/kernel/cpu/mce/apei.c            |    4 
 b/arch/x86/kernel/cpu/mce/core.c            |    4 
 b/arch/x86/kernel/cpu/mce/inject.c          |    7 
 b/arch/x86/kernel/cpu/proc.c                |    8 
 b/arch/x86/kernel/cpu/topology.h            |   51 ++++++
 b/arch/x86/kernel/cpu/topology_amd.c        |  179 +++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_common.c     |  233 ++++++++++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_ext.c        |  136 ++++++++++++++++
 b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
 b/arch/x86/kernel/smpboot.c                 |   64 ++++---
 b/arch/x86/kernel/vsmp_64.c                 |   13 -
 b/arch/x86/mm/amdtopology.c                 |   35 +---
 b/arch/x86/xen/apic.c                       |    8 
 b/drivers/edac/amd64_edac.c                 |    4 
 b/drivers/edac/mce_amd.c                    |    4 
 b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 
 b/drivers/hwmon/fam15h_power.c              |    7 
 b/drivers/scsi/lpfc/lpfc_init.c             |    8 
 b/drivers/virt/acrn/hsm.c                   |    2 
 51 files changed, 964 insertions(+), 881 deletions(-)

Comments

Peter Zijlstra July 24, 2023, 9:02 p.m. UTC | #1
On Mon, Jul 24, 2023 at 10:49:42PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 24, 2023 at 07:44:17PM +0200, Thomas Gleixner wrote:
> 
> > +static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
> > +{
> > +	unsigned int dom, maxtype = leaf == 0xb ? CORE_TYPE + 1 : MAX_TYPE;
> > +	struct {
> > +		// eax
> > +		u32	x2apic_shift	:  5, // Number of bits to shift APIC ID right
> > +					      // for the topology ID at the next level
> > +			__rsvd0		: 27; // Reserved
> > +					      // ebx
> > +		u32	num_processors	: 16, // Number of processors at current level
> > +			__rsvd1		: 16; // Reserved
> > +					      // ecx
> > +		u32	level		:  8, // Current topology level. Same as sub leaf number
> > +			type		:  8, // Level type. If 0, invalid
> > +			__rsvd2		: 16; // Reserved
> > +					      // edx
> > +		u32	x2apic_id	: 32; // X2APIC ID of the current logical processor
> 
> That comment seems inconsistent, either have then all aligned or move
> all register names left.

AMD code seems to have the reg names left aligned -- perhaps do the same
here.
Thomas Gleixner July 25, 2023, 6:51 a.m. UTC | #2
On Mon, Jul 24 2023 at 22:49, Peter Zijlstra wrote:
> On Mon, Jul 24, 2023 at 07:44:17PM +0200, Thomas Gleixner wrote:
>
>> +static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
>> +{
>> +	unsigned int dom, maxtype = leaf == 0xb ? CORE_TYPE + 1 : MAX_TYPE;
>> +	struct {
>> +		// eax
>> +		u32	x2apic_shift	:  5, // Number of bits to shift APIC ID right
>> +					      // for the topology ID at the next level
>> +			__rsvd0		: 27; // Reserved
>> +					      // ebx
>> +		u32	num_processors	: 16, // Number of processors at current level
>> +			__rsvd1		: 16; // Reserved
>> +					      // ecx
>> +		u32	level		:  8, // Current topology level. Same as sub leaf number
>> +			type		:  8, // Level type. If 0, invalid
>> +			__rsvd2		: 16; // Reserved
>> +					      // edx
>> +		u32	x2apic_id	: 32; // X2APIC ID of the current logical processor
>
> That comment seems inconsistent, either have then all aligned or move
> all register names left.

Bah. I had all the register names left at some point. No idea how I lost
that again. Probably when I rolled back to some earlier version after
screwing up :)

>> +
>> +	/* Read all available subleafs and populate the levels */
>> +	for (subleaf = 0; topo_subleaf(tscan, leaf, subleaf); subleaf++);
>
> Personally I prefer:
>
> 	for (;;)
> 		;
>
> that is, have the semicolon on it's own line, but meh.

:)
Thomas Gleixner July 25, 2023, 7:46 a.m. UTC | #3
On Mon, Jul 24 2023 at 19:43, Thomas Gleixner wrote:
> +struct cpuinfo_topology {
> +	u16			apicid;
> +	u16			initial_apicid;

There was an offlist question whether these should be u32 because with
X2APIC the APIC ID is 32bit wide.

The answer is yes, no, maybe. Why?

In practice there are limitations, both on the hardware side and on the
kernel side.

The kernel limits the max. APIC ID to 32768 and the maximum number of
CPUs to 8192 right now. Increasing the maximum APIC ID is possible, but
that needs some deep thoughts as we have one array which is
MAX_LOCAL_APIC sized and a bitmap of that size too. Even the bitmap
would require (1 << 32)/8 = 5.36871e+08 B = 512MB of memory. With a limit
of 32768 it's a reasonable 4KB. :)

On the hardware side the topology information is in the APIC ID:

      [PKGID][DIEID]...[COREID][THREADID]

where everything below the PKGID is relative to the package. Right now
the vendors have that space packed, i.e. the number of bits below PKGID
is sized that its the next power of 2 which allows to fit the actual
number of logical processors.

There have been systems where the PKGID shift was larger than that which
caused us to do the logical package mapping because we ended up with
package ID gaps. That was caused by incorrect information in leaf
0xB/0x1F, i.e. the package shift enumerated was smaller than the actual
one.

So with an upper limit of 8192 CPUs the limitation to 32K APIC IDs
should be really sufficient. The largest package shift I've seen so far
is 8, i.e. 256 logical processors per package. That means 32 packages
max. That should be sufficient for a while, right? The HPE/UV people
might have a word to say here though.

So no, u16 is fine, but yes, we can make it u32 just for simplicity
sake, which still does not allow you to have an APIC ID >= 32k, but
makes it easy enough to expand that to e.g. 64K or 128K if the need ever
arises. Let me rework that accordingly.

Thanks,

        tglx
Dimitri Sivanich July 25, 2023, 8:17 p.m. UTC | #4
We have successfully booted a 32-socket, 3840 cpu HPE Sapphire Rapids system
with your patchset applied, as found in:
  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1

# cat /sys/kernel/debug/x86/topo/cpus/3839
initial_apicid: ff7
apicid: ff7
pkg_id: 31
die_id: 31
cu_id: 255
core_id: 59
logical_pkg_id: 31
logical_die_id: 31
llc_id: 3968
l2c_id: 4086
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 60
max_die_per_pkg: 1
smp_num_siblings: 2


We were also able to boot a 60-socket, 2160 cpu HPE SuperDome Flex (SkyLake)
with the patchset.

# cat /sys/kernel/debug/x86/topo/cpus/2159
initial_apicid: ef7
apicid: ef7
pkg_id: 59
die_id: 59
cu_id: 255
core_id: 27
logical_pkg_id: 59
logical_die_id: 59
llc_id: 3776
l2c_id: 3830
amd_node_id: 0
amd_nodes_per_pkg: 0
max_cores: 18
max_die_per_pkg: 1
smp_num_siblings: 2


On Mon, Jul 24, 2023 at 07:43:50PM +0200, Thomas Gleixner wrote:
> Hi!
> 
> A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at
> the way how topology is evaluated. That "fix" is just yet another cure the
> sypmtom hack which completely ignores the underlying disaster.
> 
> The way how topology evaluation works is to overwrite the relevant
> variables as often as possible. E.g. smp_num_siblings gets overwritten a
> gazillion times, which is wrong to begin with. The boot CPU writes it 3
> times, each AP two times.
> 
> What's worse is that this just works by chance on hybrid systems due to the
> fact that the existing ones all seem to boot on a P-Core which has
> SMT. Would it boot on a E-Core which has no SMT, then parts of the early
> topology evaluation including the primary thread mask which is required for
> parallel CPU bringup would be completely wrong. Overwriting it later on
> with the correct value does not help at all.
> 
> What's wrong today with hybrid already is the number of cores per package.
> On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per
> package is evaluated to be 12. Which is not further surprising because the
> CPUID 0xb/0x1f parser looks at the number of logical processors at core
> level and divides them by the number of SMP siblings.
> 
>    24 / 2 = 12
> 
> Just that this CPU has obviously 16 cores not 12.
> 
> It's is even clearly documented in the SDM that this is wrong.
> 
>  "Bits 15-00: The number of logical processors across all instances of this
>   domain within the next higher- scoped domain relative to this current
>   logical processor. (For example, in a processor socket/package comprising
>   "M" dies of "N" cores each, where each core has "L" processors, the
>   "die" domain subleaf value of this field would be M*N*L. In an asymmetric
>   topology this would be the summation of the value across the lower domain
>   level instances to create each upper domain level instance.) This number
>   reflects configuration as shipped by Intel. Note, software must not use
>   this field to enumerate processor topology.
> 
>   Software must not use the value of EBX[15:0] to enumerate processor topology
>   of the system. The value is only intended for display and diagnostic purposes.
>   The actual number of logical processors available to BIOS/OS/Applications
>   may be different from the value of EBX[15:0], depending on software and
>   platform hardware configurations."
> 
> This "_NOT_ to use for topology evaluation" sentence existed even before
> hybrid came along and got ignored. The code worked by chance, but with
> hybrid all bets are off. The code completely falls apart once CPUID leaf
> 0x1f enumerates any topology level between CORE and DIE, but that's not a
> suprise.
> 
> The proper thing to do is to actually evaluate the full topology including
> the non-present (hotpluggable) CPUs based on the APICIDs which are provided
> by the firmware and a proper topology domain level parser. This can exactly
> tell the number of physical packages, logical packages etc. _before_ even
> booting a single AP. All of that can be evaluated upfront.
> 
> Aside of that there are too many places which do their own topology
> evaluation, but there is absolutely no central point which can actually
> provide all of that information in a consistent way. This needs to change.
> 
> This series implements another piece towards this: sane CPUID evaluation,
> which is done at _one_ place in a proper well defined order instead of
> having it sprinkled all over the CPUID evaluation code.
> 
> At the end of this series this is pretty much bug compatible with the
> current CPUID evaluation code in respect to the cores per package
> evaluation, but it gets rid of overwriting things like smp_num_siblings,
> which is now written once, but is still not capable to work correctly on a
> hybrid machine which boots from a non SMT core. These things can only be
> fixed up in the next step(s).
> 
> When I tried to go further with this I ran into yet another pile of
> historical layers of duct tape and haywire with a gazillion of random
> variables sprinkled all over the place. That's still work in progress.  It
> actually works, but the last step which switches over is not yet in a shape
> that can be easily reviewed. Stay tuned.
> 
> The series is based on the APIC cleanup series:
> 
>   https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de
> 
> and also available on top of that from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
> 
> Thanks,
> 
> 	tglx
> ---
>  arch/x86/kernel/cpu/topology.c              |  168 --------------------
>  b/Documentation/arch/x86/topology.rst       |   12 -
>  b/arch/x86/events/amd/core.c                |    2 
>  b/arch/x86/events/amd/uncore.c              |    2 
>  b/arch/x86/events/intel/uncore.c            |    2 
>  b/arch/x86/include/asm/apic.h               |    1 
>  b/arch/x86/include/asm/cacheinfo.h          |    3 
>  b/arch/x86/include/asm/cpuid.h              |   32 +++
>  b/arch/x86/include/asm/processor.h          |   58 ++++--
>  b/arch/x86/include/asm/smp.h                |    2 
>  b/arch/x86/include/asm/topology.h           |   51 +++++-
>  b/arch/x86/include/asm/x86_init.h           |    2 
>  b/arch/x86/kernel/amd_nb.c                  |    8 
>  b/arch/x86/kernel/apic/apic_flat_64.c       |    7 
>  b/arch/x86/kernel/apic/apic_noop.c          |    3 
>  b/arch/x86/kernel/apic/apic_numachip.c      |   11 -
>  b/arch/x86/kernel/apic/bigsmp_32.c          |    6 
>  b/arch/x86/kernel/apic/local.h              |    1 
>  b/arch/x86/kernel/apic/probe_32.c           |    6 
>  b/arch/x86/kernel/apic/x2apic_cluster.c     |    1 
>  b/arch/x86/kernel/apic/x2apic_phys.c        |    6 
>  b/arch/x86/kernel/apic/x2apic_uv_x.c        |   63 +------
>  b/arch/x86/kernel/cpu/Makefile              |    5 
>  b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
>  b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++----
>  b/arch/x86/kernel/cpu/centaur.c             |    4 
>  b/arch/x86/kernel/cpu/common.c              |  108 +-----------
>  b/arch/x86/kernel/cpu/cpu.h                 |   14 +
>  b/arch/x86/kernel/cpu/debugfs.c             |   98 +++++++++++
>  b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
>  b/arch/x86/kernel/cpu/intel.c               |   38 ----
>  b/arch/x86/kernel/cpu/mce/amd.c             |    4 
>  b/arch/x86/kernel/cpu/mce/apei.c            |    4 
>  b/arch/x86/kernel/cpu/mce/core.c            |    4 
>  b/arch/x86/kernel/cpu/mce/inject.c          |    7 
>  b/arch/x86/kernel/cpu/proc.c                |    8 
>  b/arch/x86/kernel/cpu/topology.h            |   51 ++++++
>  b/arch/x86/kernel/cpu/topology_amd.c        |  179 +++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_common.c     |  233 ++++++++++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_ext.c        |  136 ++++++++++++++++
>  b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
>  b/arch/x86/kernel/smpboot.c                 |   64 ++++---
>  b/arch/x86/kernel/vsmp_64.c                 |   13 -
>  b/arch/x86/mm/amdtopology.c                 |   35 +---
>  b/arch/x86/xen/apic.c                       |    8 
>  b/drivers/edac/amd64_edac.c                 |    4 
>  b/drivers/edac/mce_amd.c                    |    4 
>  b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 
>  b/drivers/hwmon/fam15h_power.c              |    7 
>  b/drivers/scsi/lpfc/lpfc_init.c             |    8 
>  b/drivers/virt/acrn/hsm.c                   |    2 
>  51 files changed, 964 insertions(+), 881 deletions(-)
> 
> 
>
Sohil Mehta July 26, 2023, 10:15 p.m. UTC | #5
On 7/24/2023 10:43 AM, Thomas Gleixner wrote:
> The series is based on the APIC cleanup series:
> 
>   https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de
> 
> and also available on top of that from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
> 

The series boots fine on an old Sandy Bridge 2S system. There is a print
from topology_update_die_map() which is missing from dmesg but it seems
mostly harmless.

> [    0.085686] smpboot: x86: Booting SMP configuration:> [    0.085690] .... node  #0, CPUs:          #1   #2   #3   #4   #5
#6   #7   #8   #9
> [    0.089253] .... node  #1, CPUs:    #10  #11  #12  #13  #14  #15  #16  #17  #18  #19
> [    0.000000] smpboot: CPU 10 Converting physical 0 to logical die 1

 ^^ The "Converting physical..." line doesn't show up with the patches
applied.

> [    0.134035] .... node  #0, CPUs:    #20  #21  #22  #23  #24  #25  #26  #27  #28  #29
> [    0.140239] .... node  #1, CPUs:    #30  #31  #32  #33  #34  #35  #36  #37  #38  #39

Please let me know if you need more information.

Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Thomas Gleixner July 26, 2023, 10:38 p.m. UTC | #6
On Wed, Jul 26 2023 at 15:15, Sohil Mehta wrote:
> On 7/24/2023 10:43 AM, Thomas Gleixner wrote:
>> The series is based on the APIC cleanup series:
>> 
>>   https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de
>> 
>> and also available on top of that from git:
>> 
>>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1
>> 
>
> The series boots fine on an old Sandy Bridge 2S system. There is a print
> from topology_update_die_map() which is missing from dmesg but it seems
> mostly harmless.
>
>> [    0.085686] smpboot: x86: Booting SMP configuration:> [    0.085690] .... node  #0, CPUs:          #1   #2   #3   #4   #5
> #6   #7   #8   #9
>> [    0.089253] .... node  #1, CPUs:    #10  #11  #12  #13  #14  #15  #16  #17  #18  #19
>> [    0.000000] smpboot: CPU 10 Converting physical 0 to logical die 1
>
>  ^^ The "Converting physical..." line doesn't show up with the patches
> applied.

That message comes from the complete nonsense in the current upstream
code that cpuinfo::die_id is made relative to the package. That's just
wrong. My rework uses the physical die id which is unique by definition
and therefore does not need this conversion. The logical ID is the same
as the physical id in that case.

>> [    0.134035] .... node  #0, CPUs:    #20  #21  #22  #23  #24  #25  #26  #27  #28  #29
>> [    0.140239] .... node  #1, CPUs:    #30  #31  #32  #33  #34  #35  #36  #37  #38  #39
>
> Please let me know if you need more information.
>
> Tested-by: Sohil Mehta <sohil.mehta@intel.com>

There is a real and unrelated snafu vs. that logical package and logical
die management which I discovered today. I missed the fact that this
cruft abuses cpuinfo as permanent storage, which breaks CPU
offline/online as the online operation reinitializes the topology
information.

I pushed out a fixed version to

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/topology

earlier. It doesn't have a tag yet.

Thanks,

        tglx
Sohil Mehta July 26, 2023, 11:27 p.m. UTC | #7
On 7/26/2023 3:38 PM, Thomas Gleixner wrote:
> There is a real and unrelated snafu vs. that logical package and logical
> die management which I discovered today. I missed the fact that this
> cruft abuses cpuinfo as permanent storage, which breaks CPU
> offline/online as the online operation reinitializes the topology
> information.
> 
> I pushed out a fixed version to
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/topology
> 
> earlier. It doesn't have a tag yet.
> 

Thanks, I'll try this out and experiment with CPU hotplug as well.

-Sohil