mbox series

[RFC,0/4] Parse ACPI/PPTT for cache information

Message ID 20170805001159.12769-1-jeremy.linton@arm.com
Headers show
Series Parse ACPI/PPTT for cache information | expand

Message

Jeremy Linton Aug. 5, 2017, 12:11 a.m. UTC
ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is
used to describe the processor and cache topologies. Ideally it is
used to extend/override information provided by the hardware, but
right now ARM64 is entirely dependent on firmware provided tables.

This patch parses the table for the cache topology only. Its quite
trivial to add processor/cluster/???/socket level parsing as well,
but that information isn't as useful as the already provided NUMA
SRAT/SLIT information which provides relative distances. The one
useful thing, is the number of physical sockets but due to the
way arm64 considers "clusters" to be sockets, a larger discussion
is required here.

An example of lstopo with this patch:

[root@mammon-juno-rh ~]# lstopo-no-graphics
Machine (8072MB)
  Package L#0 + L2 L#0 (1024KB)
    L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
    L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
    L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
  Package L#1 + L2 L#1 (2048KB)
    L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)
    L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)
  HostBridge L#0
    PCIBridge
      PCIBridge
        PCIBridge
          PCI 1095:3132
            Block(Disk) L#0 "sda"
        PCIBridge
          PCI 11ab:4380
            Net L#1 "enp8s0"

Jeremy Linton (4):
  drivers: base: cacheinfo: Add support for ACPI based firmware tables
  arm64: cacheinfo: Add support for ACPI/PPTT generated topology
  ACPI/PPTT: Add Processor Properties Topology Table parsing
  ACPI: Enable PPTT support on ARM64

 arch/arm64/kernel/cacheinfo.c |  23 ++-
 drivers/acpi/arm64/Kconfig    |   3 +
 drivers/acpi/arm64/Makefile   |   1 +
 drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++
 drivers/base/cacheinfo.c      |  15 +-
 include/linux/cacheinfo.h     |   1 +
 6 files changed, 422 insertions(+), 10 deletions(-)
 create mode 100644 drivers/acpi/arm64/pptt.c

-- 
2.9.4

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hanjun Guo Aug. 7, 2017, 10:20 a.m. UTC | #1
+Cc Xiongfeng (who is also working on the PPTT but focusing on
CPU topology)

Hi Jeremy,

On 2017/8/5 8:11, Jeremy Linton wrote:
> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is

> used to describe the processor and cache topologies. Ideally it is

> used to extend/override information provided by the hardware, but

> right now ARM64 is entirely dependent on firmware provided tables.

> 

> This patch parses the table for the cache topology only. Its quite

> trivial to add processor/cluster/???/socket level parsing as well,

> but that information isn't as useful as the already provided NUMA

> SRAT/SLIT information which provides relative distances. The one

> useful thing, is the number of physical sockets but due to the

> way arm64 considers "clusters" to be sockets, a larger discussion

> is required here.


I think we need the socket to represent the true topology of
the SoC, which means that considering clusters to be sockets is
wrong on ARM64 server platforms, a "socket" needs to be a memory
controller attached I think.

Take D05 for example, there are two physical SoC sockets on
the board but with two CPU DIE (with memory controller) on each
physical socket, and 4 clusters on each CPU DIE.

When considering clusters as sockets (that's the code for now),
there are 16 "sockets" to represent to OS for schedule input,
but only 4 NUMA nodes, which are confusing the scheduler a lot...

Xiongfeng was working on the CPU topology based on PPTT, and the code
is under internal review, if it's OK for you, we can send them out
for review comments to see if we can join our effort together, or
we can work on top of your patches, as you like :)

> 

> An example of lstopo with this patch:

> 

> [root@mammon-juno-rh ~]# lstopo-no-graphics

> Machine (8072MB)

>    Package L#0 + L2 L#0 (1024KB)

>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)

>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)

>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)

>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)

>    Package L#1 + L2 L#1 (2048KB)

>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)

>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)

>    HostBridge L#0

>      PCIBridge

>        PCIBridge

>          PCIBridge

>            PCI 1095:3132

>              Block(Disk) L#0 "sda"

>          PCIBridge

>            PCI 11ab:4380

>              Net L#1 "enp8s0"

> 

> Jeremy Linton (4):

>    drivers: base: cacheinfo: Add support for ACPI based firmware tables

>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology

>    ACPI/PPTT: Add Processor Properties Topology Table parsing

>    ACPI: Enable PPTT support on ARM64

> 

>   arch/arm64/kernel/cacheinfo.c |  23 ++-

>   drivers/acpi/arm64/Kconfig    |   3 +

>   drivers/acpi/arm64/Makefile   |   1 +

>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++


I think PPTT is not ARM64 only, can be used for x86 too,
shall we locate them on drivers/acpi?

Rafael was working a lot on the PPTT proposal for the
spec, I think he can comment on this :)

Rafael, what do you think?

Thanks
Hanjun
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeffrey Hugo Aug. 7, 2017, 5:10 p.m. UTC | #2
On 8/7/2017 4:20 AM, Hanjun Guo wrote:
> +Cc Xiongfeng (who is also working on the PPTT but focusing on

> CPU topology)

> 

> Hi Jeremy,

> 

> On 2017/8/5 8:11, Jeremy Linton wrote:

>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is

>> used to describe the processor and cache topologies. Ideally it is

>> used to extend/override information provided by the hardware, but

>> right now ARM64 is entirely dependent on firmware provided tables.

>>

>> This patch parses the table for the cache topology only. Its quite

>> trivial to add processor/cluster/???/socket level parsing as well,

>> but that information isn't as useful as the already provided NUMA

>> SRAT/SLIT information which provides relative distances. The one

>> useful thing, is the number of physical sockets but due to the

>> way arm64 considers "clusters" to be sockets, a larger discussion

>> is required here.

> 

> I think we need the socket to represent the true topology of

> the SoC, which means that considering clusters to be sockets is

> wrong on ARM64 server platforms, a "socket" needs to be a memory

> controller attached I think.

> 

> Take D05 for example, there are two physical SoC sockets on

> the board but with two CPU DIE (with memory controller) on each

> physical socket, and 4 clusters on each CPU DIE.

> 

> When considering clusters as sockets (that's the code for now),

> there are 16 "sockets" to represent to OS for schedule input,

> but only 4 NUMA nodes, which are confusing the scheduler a lot...

> 

> Xiongfeng was working on the CPU topology based on PPTT, and the code

> is under internal review, if it's OK for you, we can send them out

> for review comments to see if we can join our effort together, or

> we can work on top of your patches, as you like :)

> 

>>

>> An example of lstopo with this patch:

>>

>> [root@mammon-juno-rh ~]# lstopo-no-graphics

>> Machine (8072MB)

>>    Package L#0 + L2 L#0 (1024KB)

>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)

>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)

>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)

>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)

>>    Package L#1 + L2 L#1 (2048KB)

>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)

>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)

>>    HostBridge L#0

>>      PCIBridge

>>        PCIBridge

>>          PCIBridge

>>            PCI 1095:3132

>>              Block(Disk) L#0 "sda"

>>          PCIBridge

>>            PCI 11ab:4380

>>              Net L#1 "enp8s0"

>>

>> Jeremy Linton (4):

>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables

>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology

>>    ACPI/PPTT: Add Processor Properties Topology Table parsing

>>    ACPI: Enable PPTT support on ARM64

>>

>>   arch/arm64/kernel/cacheinfo.c |  23 ++-

>>   drivers/acpi/arm64/Kconfig    |   3 +

>>   drivers/acpi/arm64/Makefile   |   1 +

>>   drivers/acpi/arm64/pptt.c     | 389 

>> ++++++++++++++++++++++++++++++++++++++++++

> 

> I think PPTT is not ARM64 only, can be used for x86 too,

> shall we locate them on drivers/acpi?


Austin and I have been working on the CPU topology.  Sounds like we have 
some overlap with your work.  We have a working prototype at 
https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but 
are still doing some cleanup and fixes for our needs.

Drivers/acpi makes sense to us, and was our working assumption.

> 

> Rafael was working a lot on the PPTT proposal for the

> spec, I think he can comment on this :)

> 

> Rafael, what do you think?

> 

> Thanks

> Hanjun

> 

> _______________________________________________

> linux-arm-kernel mailing list

> linux-arm-kernel@lists.infradead.org

> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeremy Linton Aug. 7, 2017, 5:33 p.m. UTC | #3
Hi,

On 08/07/2017 05:20 AM, Hanjun Guo wrote:
> +Cc Xiongfeng (who is also working on the PPTT but focusing on

> CPU topology)

> 

> Hi Jeremy,

> 

> On 2017/8/5 8:11, Jeremy Linton wrote:

>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is

>> used to describe the processor and cache topologies. Ideally it is

>> used to extend/override information provided by the hardware, but

>> right now ARM64 is entirely dependent on firmware provided tables.

>>

>> This patch parses the table for the cache topology only. Its quite

>> trivial to add processor/cluster/???/socket level parsing as well,

>> but that information isn't as useful as the already provided NUMA

>> SRAT/SLIT information which provides relative distances. The one

>> useful thing, is the number of physical sockets but due to the

>> way arm64 considers "clusters" to be sockets, a larger discussion

>> is required here.

> 

> I think we need the socket to represent the true topology of

> the SoC, which means that considering clusters to be sockets is

> wrong on ARM64 server platforms, a "socket" needs to be a memory

> controller attached I think.


If I understand correctly, your suggesting that the socket isn't really 
the physical socket, but a grouping at the memory controller level?

My general take was that thread/core/socket is now insufficient as even 
x86 designs now have more levels of hierarchy than that. Mapping those 
layers, and the numa weighting into something meaningful for linux, well 
that was more than I wanted to start with. Particularly since the PPTT 
spec is silent about memory controller attachments at particular node 
levels, as well as a few other possible short comings.



> 

> Take D05 for example, there are two physical SoC sockets on

> the board but with two CPU DIE (with memory controller) on each

> physical socket, and 4 clusters on each CPU DIE.

> 

> When considering clusters as sockets (that's the code for now),

> there are 16 "sockets" to represent to OS for schedule input,

> but only 4 NUMA nodes, which are confusing the scheduler a lot...

> 

> Xiongfeng was working on the CPU topology based on PPTT, and the code

> is under internal review, if it's OK for you, we can send them out

> for review comments to see if we can join our effort together, or

> we can work on top of your patches, as you like :)

> 

>>

>> An example of lstopo with this patch:

>>

>> [root@mammon-juno-rh ~]# lstopo-no-graphics

>> Machine (8072MB)

>>    Package L#0 + L2 L#0 (1024KB)

>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)

>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)

>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)

>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)

>>    Package L#1 + L2 L#1 (2048KB)

>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)

>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)

>>    HostBridge L#0

>>      PCIBridge

>>        PCIBridge

>>          PCIBridge

>>            PCI 1095:3132

>>              Block(Disk) L#0 "sda"

>>          PCIBridge

>>            PCI 11ab:4380

>>              Net L#1 "enp8s0"

>>

>> Jeremy Linton (4):

>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables

>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology

>>    ACPI/PPTT: Add Processor Properties Topology Table parsing

>>    ACPI: Enable PPTT support on ARM64

>>

>>   arch/arm64/kernel/cacheinfo.c |  23 ++-

>>   drivers/acpi/arm64/Kconfig    |   3 +

>>   drivers/acpi/arm64/Makefile   |   1 +

>>   drivers/acpi/arm64/pptt.c     | 389 

>> ++++++++++++++++++++++++++++++++++++++++++

> 

> I think PPTT is not ARM64 only, can be used for x86 too,

> shall we locate them on drivers/acpi?


Sure.. But, I was using the assumption that the table would only really 
be useful on arm64. On x86 the table is unnecessary and generally would 
have to be dynamically generated by the firmware (as it should be on 
arm) anyway. Put another way, does anyone want to use it on another 
platform?


> 

> Rafael was working a lot on the PPTT proposal for the

> spec, I think he can comment on this :) >

> Rafael, what do you think?

> 

> Thanks

> Hanjun


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiongfeng Wang Aug. 8, 2017, 4:19 a.m. UTC | #4
+Cc John Gary (who is working on the PPTT with me)

Hi Jeffrey,

On 2017/8/8 1:10, Jeffrey Hugo wrote:
> On 8/7/2017 4:20 AM, Hanjun Guo wrote:

>> +Cc Xiongfeng (who is also working on the PPTT but focusing on

>> CPU topology)

>>

>> Hi Jeremy,

>>

>> On 2017/8/5 8:11, Jeremy Linton wrote:

>>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is

>>> used to describe the processor and cache topologies. Ideally it is

>>> used to extend/override information provided by the hardware, but

>>> right now ARM64 is entirely dependent on firmware provided tables.

>>>

>>> This patch parses the table for the cache topology only. Its quite

>>> trivial to add processor/cluster/???/socket level parsing as well,

>>> but that information isn't as useful as the already provided NUMA

>>> SRAT/SLIT information which provides relative distances. The one

>>> useful thing, is the number of physical sockets but due to the

>>> way arm64 considers "clusters" to be sockets, a larger discussion

>>> is required here.

>>

>> I think we need the socket to represent the true topology of

>> the SoC, which means that considering clusters to be sockets is

>> wrong on ARM64 server platforms, a "socket" needs to be a memory

>> controller attached I think.

>>

>> Take D05 for example, there are two physical SoC sockets on

>> the board but with two CPU DIE (with memory controller) on each

>> physical socket, and 4 clusters on each CPU DIE.

>>

>> When considering clusters as sockets (that's the code for now),

>> there are 16 "sockets" to represent to OS for schedule input,

>> but only 4 NUMA nodes, which are confusing the scheduler a lot...

>>

>> Xiongfeng was working on the CPU topology based on PPTT, and the code

>> is under internal review, if it's OK for you, we can send them out

>> for review comments to see if we can join our effort together, or

>> we can work on top of your patches, as you like :)

>>

>>>

>>> An example of lstopo with this patch:

>>>

>>> [root@mammon-juno-rh ~]# lstopo-no-graphics

>>> Machine (8072MB)

>>>    Package L#0 + L2 L#0 (1024KB)

>>>      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)

>>>      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)

>>>      L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)

>>>      L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)

>>>    Package L#1 + L2 L#1 (2048KB)

>>>      L1d L#4 (32KB) + L1i L#4 (48KB) + Core L#4 + PU L#4 (P#4)

>>>      L1d L#5 (32KB) + L1i L#5 (48KB) + Core L#5 + PU L#5 (P#5)

>>>    HostBridge L#0

>>>      PCIBridge

>>>        PCIBridge

>>>          PCIBridge

>>>            PCI 1095:3132

>>>              Block(Disk) L#0 "sda"

>>>          PCIBridge

>>>            PCI 11ab:4380

>>>              Net L#1 "enp8s0"

>>>

>>> Jeremy Linton (4):

>>>    drivers: base: cacheinfo: Add support for ACPI based firmware tables

>>>    arm64: cacheinfo: Add support for ACPI/PPTT generated topology

>>>    ACPI/PPTT: Add Processor Properties Topology Table parsing

>>>    ACPI: Enable PPTT support on ARM64

>>>

>>>   arch/arm64/kernel/cacheinfo.c |  23 ++-

>>>   drivers/acpi/arm64/Kconfig    |   3 +

>>>   drivers/acpi/arm64/Makefile   |   1 +

>>>   drivers/acpi/arm64/pptt.c     | 389 ++++++++++++++++++++++++++++++++++++++++++

>>

>> I think PPTT is not ARM64 only, can be used for x86 too,

>> shall we locate them on drivers/acpi?

> 

> Austin and I have been working on the CPU topology.  Sounds like we have some overlap with your work.  We have a working prototype at https://source.codeaurora.org/quic/server/kernel/log/?h=jhugo/pptt but are still doing some cleanup and fixes for our needs.

> 

> Drivers/acpi makes sense to us, and was our working assumption.

> 

John and I have also been working on the CPU topology. Our code can be got at https://github.com/fenghusthu/acpi_pptt .

I have tested it on the D05 board and it is OK.

Maybe we can work together, combine our codes, and make a better code structure.


>>

>> Rafael was working a lot on the PPTT proposal for the

>> spec, I think he can comment on this :)

>>

>> Rafael, what do you think?

>>

>> Thanks

>> Hanjun

>>

>> _______________________________________________

>> linux-arm-kernel mailing list

>> linux-arm-kernel@lists.infradead.org

>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

> 

> 


--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiongfeng Wang Aug. 21, 2017, 3:15 a.m. UTC | #5
Hi Hanjun

On 2017/8/9 22:08, Hanjun Guo wrote:
> Hi Jeremy,

> 

> On 8 August 2017 at 01:33, Jeremy Linton <jeremy.linton@arm.com <mailto:jeremy.linton@arm.com>> wrote:

>>

>> Hi,

>>

>> On 08/07/2017 05:20 AM, Hanjun Guo wrote:

>>>

>>> +Cc Xiongfeng (who is also working on the PPTT but focusing on

>>> CPU topology)

>>>

>>> Hi Jeremy,

>>>

>>> On 2017/8/5 8:11, Jeremy Linton wrote:

>>>>

>>>> ACPI 6.2 adds the Processor Properties Topology Table (PPTT), which is

>>>> used to describe the processor and cache topologies. Ideally it is

>>>> used to extend/override information provided by the hardware, but

>>>> right now ARM64 is entirely dependent on firmware provided tables.

>>>>

>>>> This patch parses the table for the cache topology only. Its quite

>>>> trivial to add processor/cluster/???/socket level parsing as well,

>>>> but that information isn't as useful as the already provided NUMA

>>>> SRAT/SLIT information which provides relative distances. The one

>>>> useful thing, is the number of physical sockets but due to the

>>>> way arm64 considers "clusters" to be sockets, a larger discussion

>>>> is required here.

>>>

>>>

>>> I think we need the socket to represent the true topology of

>>> the SoC, which means that considering clusters to be sockets is

>>> wrong on ARM64 server platforms, a "socket" needs to be a memory

>>> controller attached I think.

>>

I agree with you that clusters should not be considered as sockets. Cores in a
cluster may share a L2 cache and should stay in a same local sched_domain for
better performance. This is done in function cpu_coregroup_mask(), which use
cpu_topology[cpu].core_sibling to build a sched_domain. The core_sibling information
is also used in sysfs and considered as cores in a socket by "lscpu".

I think we may need to add members 'physical_package_id' and 'cluster_sibling' in
struct cpu_topology. So that cores in a cluster represented by 'core_sibling' are composed
in a same local sched_domain, and cores in a package represented by 'cluster_sibling'
are considered as a socket by 'lscpu'.


Thanks,
Xiongfeng Wang

--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html