diff mbox

[4/6] arm64: topology: add MPIDR-based detection

Message ID 1399063112-16917-5-git-send-email-broonie@kernel.org
State New
Headers show

Commit Message

Mark Brown May 2, 2014, 8:38 p.m. UTC
From: Zi Shen Lim <zlim@broadcom.com>

Create cpu topology based on MPIDR. When hardware sets MPIDR to sane
values, this method will always work. Therefore it should also work well
as the fallback method. [1]

When we have multiple processing elements in the system, we create
the cpu topology by mapping each affinity level (from lowest to highest)
to threads (if they exist), cores, and clusters.

We combine data from all higher affinity levels into cluster_id
so we don't lose any information from MPIDR. [2]

[1] http://www.spinics.net/lists/arm-kernel/msg317445.html
[2] https://lkml.org/lkml/2014/4/23/703

[Raise the priority of the error message if we don't discover topology
now that we can read it from MPIDIR -- broonie]

Signed-off-by: Zi Shen Lim <zlim@broadcom.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
---
 arch/arm64/include/asm/cputype.h |  5 +++++
 arch/arm64/kernel/topology.c     | 46 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 4 deletions(-)

Comments

Mark Brown May 16, 2014, 6:39 p.m. UTC | #1
On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote:

> This is broken, IIRC Lorenzo commented on this in the previous version of the
> patch.

Could you please be specific?  Lorenzo was concerned about overflow but
that ought to be addressed here.

> Take a simple example of system with 2 Quad core clusters.
> The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent
> aff[0]. So you will end up with second cluster with id = 4 instead of 1.

This isn't a problem, the clusters can have any numbers so long as they
are distinct.  There is no other requirement.

> I am not sure if we need this serialization, but even if we need it you can't
> simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly
> as is for serializing parts of it.

Ah, now I look at what the hash is doing that is indeed directly useful
- we can mask or shift out the bits we don't want.  Equally well it just
looks like a preference?

This does seem like something that could be dealt with incrementally.
Sudeep Holla May 19, 2014, 9:57 a.m. UTC | #2
On 16/05/14 19:39, Mark Brown wrote:
> On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote:
>
>> This is broken, IIRC Lorenzo commented on this in the previous version of the
>> patch.
>
> Could you please be specific?  Lorenzo was concerned about overflow but
> that ought to be addressed here.
>

Ah, my bad. You are right, I took his comment on the shift to be different from
overflow which is not the case.

>> Take a simple example of system with 2 Quad core clusters.
>> The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent
>> aff[0]. So you will end up with second cluster with id = 4 instead of 1.
>
> This isn't a problem, the clusters can have any numbers so long as they
> are distinct.  There is no other requirement.
>

IIUC these topology information is exposed via sysfs. So it's good to have
uniformity though they can have any number. As mentioned in the example, if the
linearisation depend on aff[0], then this factor will not be uniform.

>> I am not sure if we need this serialization, but even if we need it you can't
>> simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly
>> as is for serializing parts of it.
>
> Ah, now I look at what the hash is doing that is indeed directly useful
> - we can mask or shift out the bits we don't want.  Equally well it just
> looks like a preference?

Yes we can use the hash bits, but the way it's done in this patch needs fixing
so that we can be more uniform(as its exposed via sysfs)

>
> This does seem like something that could be dealt with incrementally.
>

Sorry, I didn't mean to block this patch, I am just mentioning the possible
issue with this patch.

Regards,
Sudeep
Lorenzo Pieralisi May 19, 2014, 10:54 a.m. UTC | #3
On Mon, May 19, 2014 at 10:57:40AM +0100, Sudeep Holla wrote:
> 
> 
> On 16/05/14 19:39, Mark Brown wrote:
> > On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote:
> >
> >> This is broken, IIRC Lorenzo commented on this in the previous version of the
> >> patch.
> >
> > Could you please be specific?  Lorenzo was concerned about overflow but
> > that ought to be addressed here.
> >
> 
> Ah, my bad. You are right, I took his comment on the shift to be different from
> overflow which is not the case.
> 
> >> Take a simple example of system with 2 Quad core clusters.
> >> The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent
> >> aff[0]. So you will end up with second cluster with id = 4 instead of 1.
> >
> > This isn't a problem, the clusters can have any numbers so long as they
> > are distinct.  There is no other requirement.
> >
> 
> IIUC these topology information is exposed via sysfs. So it's good to have
> uniformity though they can have any number. As mentioned in the example, if the
> linearisation depend on aff[0], then this factor will not be uniform.
> 
> >> I am not sure if we need this serialization, but even if we need it you can't
> >> simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly
> >> as is for serializing parts of it.
> >
> > Ah, now I look at what the hash is doing that is indeed directly useful
> > - we can mask or shift out the bits we don't want.  Equally well it just
> > looks like a preference?
> 
> Yes we can use the hash bits, but the way it's done in this patch needs fixing
> so that we can be more uniform(as its exposed via sysfs)
> 
> >
> > This does seem like something that could be dealt with incrementally.
> >
> 
> Sorry, I didn't mean to block this patch, I am just mentioning the possible
> issue with this patch.

Hash is used to save memory, if all we need is a unique identifier
we do not need the hash at all.

As to what should we use as cluster id, honestly I do not know.

I am quite tempted to use just three affinity levels for the moment
and fix it when need arises, after all on ARM we have aff2 completely
unused at the moment for the non-SMT systems (ie all ARM SMP systems in
the mainline) and we are not coalescing affinity 2 into affinity 1 in
any way.

So either you ignore aff3, or can do something like that (non-SMT):

cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8]

Also please remove the warning on the missing topology information, if
we fall back to MPIDR_EL1 we will always have a topology of sorts, as
borked as it might be, so that warning becomes useless, ie it is never
triggered.

Lorenzo
Mark Brown May 19, 2014, 12:14 p.m. UTC | #4
On Mon, May 19, 2014 at 10:57:40AM +0100, Sudeep Holla wrote:
> On 16/05/14 19:39, Mark Brown wrote:
> >On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote:

> >>Take a simple example of system with 2 Quad core clusters.
> >>The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent
> >>aff[0]. So you will end up with second cluster with id = 4 instead of 1.

> >This isn't a problem, the clusters can have any numbers so long as they
> >are distinct.  There is no other requirement.

> IIUC these topology information is exposed via sysfs. So it's good to have
> uniformity though they can have any number. As mentioned in the example, if the
> linearisation depend on aff[0], then this factor will not be uniform.

So your concern is that the width of aff[0] will vary?  That's
reasonable.
Mark Brown May 19, 2014, 12:33 p.m. UTC | #5
On Mon, May 19, 2014 at 11:54:05AM +0100, Lorenzo Pieralisi wrote:

> As to what should we use as cluster id, honestly I do not know.

> I am quite tempted to use just three affinity levels for the moment
> and fix it when need arises, after all on ARM we have aff2 completely
> unused at the moment for the non-SMT systems (ie all ARM SMP systems in
> the mainline) and we are not coalescing affinity 2 into affinity 1 in
> any way.

> So either you ignore aff3, or can do something like that (non-SMT):

> cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8]

Which is roughly what the original code that you were worried about did
IIRC?  It seems silly to ignore the higher affinity levels since it's
trivial to look at them and means the kernel will at least split
everything into clusters if it does encounter some hardware that uses
them as opposed to merging those distinguished only by the higher
affinity levels.

> Also please remove the warning on the missing topology information, if
> we fall back to MPIDR_EL1 we will always have a topology of sorts, as
> borked as it might be, so that warning becomes useless, ie it is never
> triggered.

I'll add something to this patch - the warning is needed if the DT code
gets merged without this and it seems this one still going round the
loop.  :/
Lorenzo Pieralisi May 19, 2014, 2:13 p.m. UTC | #6
On Mon, May 19, 2014 at 01:33:14PM +0100, Mark Brown wrote:
> On Mon, May 19, 2014 at 11:54:05AM +0100, Lorenzo Pieralisi wrote:
> 
> > As to what should we use as cluster id, honestly I do not know.
> 
> > I am quite tempted to use just three affinity levels for the moment
> > and fix it when need arises, after all on ARM we have aff2 completely
> > unused at the moment for the non-SMT systems (ie all ARM SMP systems in
> > the mainline) and we are not coalescing affinity 2 into affinity 1 in
> > any way.
> 
> > So either you ignore aff3, or can do something like that (non-SMT):
> 
> > cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8]
> 
> Which is roughly what the original code that you were worried about did
> IIRC?  It seems silly to ignore the higher affinity levels since it's
> trivial to look at them and means the kernel will at least split
> everything into clusters if it does encounter some hardware that uses
> them as opposed to merging those distinguished only by the higher
> affinity levels.

No, it is not, at all, you do not remember correctly I am afraid.

Using the pseudo code above is as useful as using the hashing algorithm,
they both provide you with a unique id and that's all we need for the
topology.

The original code was using the hash shifts the wrong way, which could
lead to incorrect behaviour.

If ignoring affinity levels is silly, then we have to fix ARM 32 bit
code too, since there on ALL SMP ARM systems in the mainline, affinity
level 2 *is* ignored (since they are not SMT systems).

Having said that, I really think that for the time being we should use three
affinity levels (for topology purposes) as ARM 32 does and be done with this
stuff.

To be 100% precise, I think the best way to do it is to add another
topology level (ie "book" in the kernel or whatever you want to call it for
ARM, which I think is unused) and update the parsing code and data structure
accordingly so that if two clusters have the same id but belong to different
"books" (aff2 or aff3 - depending on SMT) the sibling masks are still correct.

Documentation/cputopology.txt

We will cross that bridge when we come to it instead of lumping affinity
levels together, that's my opinion.

> > Also please remove the warning on the missing topology information, if
> > we fall back to MPIDR_EL1 we will always have a topology of sorts, as
> > borked as it might be, so that warning becomes useless, ie it is never
> > triggered.
> 
> I'll add something to this patch - the warning is needed if the DT code
> gets merged without this and it seems this one still going round the
> loop.  :/

Yes but *this* patch is bumping the log level not the other ones and what I
am saying is that when this code patch gets merged that warning is useless (ie
never triggered - cluster id can't be == -1) unless I am missing something
here.

Do not bother, use three affinity levels and be done with that, we will
deal with aff3 (and aff2) when time comes.

Thanks,
Lorenzo
Mark Brown May 19, 2014, 4:12 p.m. UTC | #7
On Mon, May 19, 2014 at 03:13:24PM +0100, Lorenzo Pieralisi wrote:

> Using the pseudo code above is as useful as using the hashing algorithm,
> they both provide you with a unique id and that's all we need for the
> topology.

> The original code was using the hash shifts the wrong way, which could
> lead to incorrect behaviour.

Ah, OK.  I misremembered.

> If ignoring affinity levels is silly, then we have to fix ARM 32 bit
> code too, since there on ALL SMP ARM systems in the mainline, affinity
> level 2 *is* ignored (since they are not SMT systems).

I think someone should, yes, though I'd be a bit surprised if anyone
ever actually makes 32 bit ARM hardware where it makes any difference.

> Having said that, I really think that for the time being we should use three
> affinity levels (for topology purposes) as ARM 32 does and be done with this
> stuff.

> To be 100% precise, I think the best way to do it is to add another
> topology level (ie "book" in the kernel or whatever you want to call it for
> ARM, which I think is unused) and update the parsing code and data structure
> accordingly so that if two clusters have the same id but belong to different
> "books" (aff2 or aff3 - depending on SMT) the sibling masks are still correct.

> Documentation/cputopology.txt

> We will cross that bridge when we come to it instead of lumping affinity
> levels together, that's my opinion.

Right, that's about what I'm saying - I don't think we should actually
do anything to describe a multi-level topology to the scheduler since
it's not clear to me if the physical realities of such systems system
will map on to what the scheduler thinks it's modelling well, or for
that matter if the scheduler won't have a better model or be capable of
automatically tuning this stuff without explicit information from the
architecture by the time anyone gets around to making such hardware.

The only reason for paying attention at all at present is to ensure that
we don't do something actively broken and squash clusters together
should we encounter such a system - we do the same thing that the DT
code, we just ignore the heirachy and report everything as a flat
topology.

> > I'll add something to this patch - the warning is needed if the DT code
> > gets merged without this and it seems this one still going round the
> > loop.  :/

> Yes but *this* patch is bumping the log level not the other ones and what I
> am saying is that when this code patch gets merged that warning is useless (ie
> never triggered - cluster id can't be == -1) unless I am missing something
> here.

Right, if we have the MPIDR support then it should be removed.  It's
useful if we get the DT without MPIDR but not otherwise - once we can
parse MPIDRs for most SoCs there should be very little reason to ever
bother with the DT binding.

> Do not bother, use three affinity levels and be done with that, we will
> deal with aff3 (and aff2) when time comes.

This would leave the MPIDR support worse than DT which seems retrograde
especially given that it's so simple to differentiate the clusters.  The
only issue with that appears to be about precisely how to make up the
cluster numbers which is a cosmetic one.
Lorenzo Pieralisi May 19, 2014, 5:15 p.m. UTC | #8
On Mon, May 19, 2014 at 05:12:17PM +0100, Mark Brown wrote:

[...]

> > Do not bother, use three affinity levels and be done with that, we will
> > deal with aff3 (and aff2) when time comes.
> 
> This would leave the MPIDR support worse than DT which seems retrograde
> especially given that it's so simple to differentiate the clusters.  The
> only issue with that appears to be about precisely how to make up the
> cluster numbers which is a cosmetic one.

That's the reason why I said you should not bother. If you want to use
4 affinity levels, I do not see the point in using the hash for that though,
since all we need is a unique id which can be easily created without
resorting to hashing.

Hashing compresses the cluster index, but that index is not representative
of HW anyway. If you go for simple shifting we might end up with huge cluster
ids, which is fine but a bit weird.

So either (1) you use three affinity levels or (2) the simplest way to combine
the affinity levels.

I personally do not like squashing affinity levels, but I can live with
that as long as we are done with this, functionality is ready and it is
time we get that in.

Thanks,
Lorenzo
Mark Brown May 19, 2014, 6:39 p.m. UTC | #9
On Mon, May 19, 2014 at 06:15:11PM +0100, Lorenzo Pieralisi wrote:
> On Mon, May 19, 2014 at 05:12:17PM +0100, Mark Brown wrote:

> > This would leave the MPIDR support worse than DT which seems retrograde
> > especially given that it's so simple to differentiate the clusters.  The
> > only issue with that appears to be about precisely how to make up the
> > cluster numbers which is a cosmetic one.

> That's the reason why I said you should not bother. If you want to use
> 4 affinity levels, I do not see the point in using the hash for that though,
> since all we need is a unique id which can be easily created without
> resorting to hashing.

Right, OK.  I personally wouldn't have used anything non-trivial either
but equally well I didn't see any strong reason not to do it either and
it's what Zi Shen sent.  I suspect this is based on the review comments
the first time around.
diff mbox

Patch

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index c404fb0..b3b3287 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -18,6 +18,8 @@ 
 
 #define INVALID_HWID		ULONG_MAX
 
+#define MPIDR_UP_BITMASK	(0x1 << 30)
+#define MPIDR_MT_BITMASK	(0x1 << 24)
 #define MPIDR_HWID_BITMASK	0xff00ffffff
 
 #define MPIDR_LEVEL_BITS_SHIFT	3
@@ -30,6 +32,9 @@ 
 #define MPIDR_AFFINITY_LEVEL(mpidr, level) \
 	((mpidr >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK)
 
+#define MPIDR_AFF_MASK(level) \
+	((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(level))
+
 #define read_cpuid(reg) ({						\
 	u64 __val;							\
 	asm("mrs	%0, " #reg : "=r" (__val));			\
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 43514f9..3b2479c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -20,6 +20,8 @@ 
 #include <linux/of.h>
 #include <linux/sched.h>
 
+#include <asm/cputype.h>
+#include <asm/smp_plat.h>
 #include <asm/topology.h>
 
 static int __init get_cpu_for_node(struct device_node *node)
@@ -220,10 +222,8 @@  static void update_siblings_masks(unsigned int cpuid)
 	int cpu;
 
 	if (cpuid_topo->cluster_id == -1) {
-		/*
-		 * DT does not contain topology information for this cpu.
-		 */
-		pr_debug("CPU%u: No topology information configured\n", cpuid);
+		/* No topology information for this cpu ?! */
+		pr_err("CPU%u: No topology information configured\n", cpuid);
 		return;
 	}
 
@@ -249,6 +249,44 @@  static void update_siblings_masks(unsigned int cpuid)
 
 void store_cpu_topology(unsigned int cpuid)
 {
+	struct cpu_topology *cpuid_topo = &cpu_topology[cpuid];
+	u64 mpidr;
+
+	if (cpuid_topo->cluster_id != -1)
+		goto topology_populated;
+
+	mpidr = read_cpuid_mpidr();
+
+	/* Create cpu topology mapping based on MPIDR. */
+	if (mpidr & MPIDR_UP_BITMASK) {
+		/* Uniprocessor system */
+		cpuid_topo->thread_id  = -1;
+		cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+		cpuid_topo->cluster_id = 0;
+	} else if (mpidr & MPIDR_MT_BITMASK) {
+		/* Multiprocessor system : Multi-threads per core */
+		cpuid_topo->thread_id  = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+		cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+		cpuid_topo->cluster_id =
+			((mpidr & MPIDR_AFF_MASK(2)) >> mpidr_hash.shift_aff[2] |
+			 (mpidr & MPIDR_AFF_MASK(3)) >> mpidr_hash.shift_aff[3])
+			>> mpidr_hash.shift_aff[1] >> mpidr_hash.shift_aff[0];
+	} else {
+		/* Multiprocessor system : Single-thread per core */
+		cpuid_topo->thread_id  = -1;
+		cpuid_topo->core_id    = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+		cpuid_topo->cluster_id =
+			((mpidr & MPIDR_AFF_MASK(1)) >> mpidr_hash.shift_aff[1] |
+			 (mpidr & MPIDR_AFF_MASK(2)) >> mpidr_hash.shift_aff[2] |
+			 (mpidr & MPIDR_AFF_MASK(3)) >> mpidr_hash.shift_aff[3])
+			>> mpidr_hash.shift_aff[0];
+	}
+
+	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %llx\n",
+		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
+		 cpuid_topo->thread_id, mpidr);
+
+topology_populated:
 	update_siblings_masks(cpuid);
 }