Message ID | 1399063112-16917-5-git-send-email-broonie@kernel.org |
---|---|
State | New |
Headers | show |
On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote: > This is broken, IIRC Lorenzo commented on this in the previous version of the > patch. Could you please be specific? Lorenzo was concerned about overflow but that ought to be addressed here. > Take a simple example of system with 2 Quad core clusters. > The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent > aff[0]. So you will end up with second cluster with id = 4 instead of 1. This isn't a problem, the clusters can have any numbers so long as they are distinct. There is no other requirement. > I am not sure if we need this serialization, but even if we need it you can't > simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly > as is for serializing parts of it. Ah, now I look at what the hash is doing that is indeed directly useful - we can mask or shift out the bits we don't want. Equally well it just looks like a preference? This does seem like something that could be dealt with incrementally.
On 16/05/14 19:39, Mark Brown wrote: > On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote: > >> This is broken, IIRC Lorenzo commented on this in the previous version of the >> patch. > > Could you please be specific? Lorenzo was concerned about overflow but > that ought to be addressed here. > Ah, my bad. You are right, I took his comment on the shift to be different from overflow which is not the case. >> Take a simple example of system with 2 Quad core clusters. >> The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent >> aff[0]. So you will end up with second cluster with id = 4 instead of 1. > > This isn't a problem, the clusters can have any numbers so long as they > are distinct. There is no other requirement. > IIUC these topology information is exposed via sysfs. So it's good to have uniformity though they can have any number. As mentioned in the example, if the linearisation depend on aff[0], then this factor will not be uniform. >> I am not sure if we need this serialization, but even if we need it you can't >> simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly >> as is for serializing parts of it. > > Ah, now I look at what the hash is doing that is indeed directly useful > - we can mask or shift out the bits we don't want. Equally well it just > looks like a preference? Yes we can use the hash bits, but the way it's done in this patch needs fixing so that we can be more uniform(as its exposed via sysfs) > > This does seem like something that could be dealt with incrementally. > Sorry, I didn't mean to block this patch, I am just mentioning the possible issue with this patch. Regards, Sudeep
On Mon, May 19, 2014 at 10:57:40AM +0100, Sudeep Holla wrote: > > > On 16/05/14 19:39, Mark Brown wrote: > > On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote: > > > >> This is broken, IIRC Lorenzo commented on this in the previous version of the > >> patch. > > > > Could you please be specific? Lorenzo was concerned about overflow but > > that ought to be addressed here. > > > > Ah, my bad. You are right, I took his comment on the shift to be different from > overflow which is not the case. > > >> Take a simple example of system with 2 Quad core clusters. > >> The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent > >> aff[0]. So you will end up with second cluster with id = 4 instead of 1. > > > > This isn't a problem, the clusters can have any numbers so long as they > > are distinct. There is no other requirement. > > > > IIUC these topology information is exposed via sysfs. So it's good to have > uniformity though they can have any number. As mentioned in the example, if the > linearisation depend on aff[0], then this factor will not be uniform. > > >> I am not sure if we need this serialization, but even if we need it you can't > >> simply use the hash bits generated for MPIDR.Aff{3..0} serialization directly > >> as is for serializing parts of it. > > > > Ah, now I look at what the hash is doing that is indeed directly useful > > - we can mask or shift out the bits we don't want. Equally well it just > > looks like a preference? > > Yes we can use the hash bits, but the way it's done in this patch needs fixing > so that we can be more uniform(as its exposed via sysfs) > > > > > This does seem like something that could be dealt with incrementally. > > > > Sorry, I didn't mean to block this patch, I am just mentioning the possible > issue with this patch. Hash is used to save memory, if all we need is a unique identifier we do not need the hash at all. As to what should we use as cluster id, honestly I do not know. I am quite tempted to use just three affinity levels for the moment and fix it when need arises, after all on ARM we have aff2 completely unused at the moment for the non-SMT systems (ie all ARM SMP systems in the mainline) and we are not coalescing affinity 2 into affinity 1 in any way. So either you ignore aff3, or can do something like that (non-SMT): cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8] Also please remove the warning on the missing topology information, if we fall back to MPIDR_EL1 we will always have a topology of sorts, as borked as it might be, so that warning becomes useless, ie it is never triggered. Lorenzo
On Mon, May 19, 2014 at 10:57:40AM +0100, Sudeep Holla wrote: > On 16/05/14 19:39, Mark Brown wrote: > >On Fri, May 16, 2014 at 05:34:04PM +0100, Sudeep Holla wrote: > >>Take a simple example of system with 2 Quad core clusters. > >>The mpidr_hash.shift_aff[1] will be 6 as you need minimum 2 bits to represent > >>aff[0]. So you will end up with second cluster with id = 4 instead of 1. > >This isn't a problem, the clusters can have any numbers so long as they > >are distinct. There is no other requirement. > IIUC these topology information is exposed via sysfs. So it's good to have > uniformity though they can have any number. As mentioned in the example, if the > linearisation depend on aff[0], then this factor will not be uniform. So your concern is that the width of aff[0] will vary? That's reasonable.
On Mon, May 19, 2014 at 11:54:05AM +0100, Lorenzo Pieralisi wrote: > As to what should we use as cluster id, honestly I do not know. > I am quite tempted to use just three affinity levels for the moment > and fix it when need arises, after all on ARM we have aff2 completely > unused at the moment for the non-SMT systems (ie all ARM SMP systems in > the mainline) and we are not coalescing affinity 2 into affinity 1 in > any way. > So either you ignore aff3, or can do something like that (non-SMT): > cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8] Which is roughly what the original code that you were worried about did IIRC? It seems silly to ignore the higher affinity levels since it's trivial to look at them and means the kernel will at least split everything into clusters if it does encounter some hardware that uses them as opposed to merging those distinguished only by the higher affinity levels. > Also please remove the warning on the missing topology information, if > we fall back to MPIDR_EL1 we will always have a topology of sorts, as > borked as it might be, so that warning becomes useless, ie it is never > triggered. I'll add something to this patch - the warning is needed if the DT code gets merged without this and it seems this one still going round the loop. :/
On Mon, May 19, 2014 at 01:33:14PM +0100, Mark Brown wrote: > On Mon, May 19, 2014 at 11:54:05AM +0100, Lorenzo Pieralisi wrote: > > > As to what should we use as cluster id, honestly I do not know. > > > I am quite tempted to use just three affinity levels for the moment > > and fix it when need arises, after all on ARM we have aff2 completely > > unused at the moment for the non-SMT systems (ie all ARM SMP systems in > > the mainline) and we are not coalescing affinity 2 into affinity 1 in > > any way. > > > So either you ignore aff3, or can do something like that (non-SMT): > > > cluster_id = MPIDR_EL1[39:32] << 16 | MPIDR_EL1[23:16] << 8 | MPIDR_EL1[15:8] > > Which is roughly what the original code that you were worried about did > IIRC? It seems silly to ignore the higher affinity levels since it's > trivial to look at them and means the kernel will at least split > everything into clusters if it does encounter some hardware that uses > them as opposed to merging those distinguished only by the higher > affinity levels. No, it is not, at all, you do not remember correctly I am afraid. Using the pseudo code above is as useful as using the hashing algorithm, they both provide you with a unique id and that's all we need for the topology. The original code was using the hash shifts the wrong way, which could lead to incorrect behaviour. If ignoring affinity levels is silly, then we have to fix ARM 32 bit code too, since there on ALL SMP ARM systems in the mainline, affinity level 2 *is* ignored (since they are not SMT systems). Having said that, I really think that for the time being we should use three affinity levels (for topology purposes) as ARM 32 does and be done with this stuff. To be 100% precise, I think the best way to do it is to add another topology level (ie "book" in the kernel or whatever you want to call it for ARM, which I think is unused) and update the parsing code and data structure accordingly so that if two clusters have the same id but belong to different "books" (aff2 or aff3 - depending on SMT) the sibling masks are still correct. Documentation/cputopology.txt We will cross that bridge when we come to it instead of lumping affinity levels together, that's my opinion. > > Also please remove the warning on the missing topology information, if > > we fall back to MPIDR_EL1 we will always have a topology of sorts, as > > borked as it might be, so that warning becomes useless, ie it is never > > triggered. > > I'll add something to this patch - the warning is needed if the DT code > gets merged without this and it seems this one still going round the > loop. :/ Yes but *this* patch is bumping the log level not the other ones and what I am saying is that when this code patch gets merged that warning is useless (ie never triggered - cluster id can't be == -1) unless I am missing something here. Do not bother, use three affinity levels and be done with that, we will deal with aff3 (and aff2) when time comes. Thanks, Lorenzo
On Mon, May 19, 2014 at 03:13:24PM +0100, Lorenzo Pieralisi wrote: > Using the pseudo code above is as useful as using the hashing algorithm, > they both provide you with a unique id and that's all we need for the > topology. > The original code was using the hash shifts the wrong way, which could > lead to incorrect behaviour. Ah, OK. I misremembered. > If ignoring affinity levels is silly, then we have to fix ARM 32 bit > code too, since there on ALL SMP ARM systems in the mainline, affinity > level 2 *is* ignored (since they are not SMT systems). I think someone should, yes, though I'd be a bit surprised if anyone ever actually makes 32 bit ARM hardware where it makes any difference. > Having said that, I really think that for the time being we should use three > affinity levels (for topology purposes) as ARM 32 does and be done with this > stuff. > To be 100% precise, I think the best way to do it is to add another > topology level (ie "book" in the kernel or whatever you want to call it for > ARM, which I think is unused) and update the parsing code and data structure > accordingly so that if two clusters have the same id but belong to different > "books" (aff2 or aff3 - depending on SMT) the sibling masks are still correct. > Documentation/cputopology.txt > We will cross that bridge when we come to it instead of lumping affinity > levels together, that's my opinion. Right, that's about what I'm saying - I don't think we should actually do anything to describe a multi-level topology to the scheduler since it's not clear to me if the physical realities of such systems system will map on to what the scheduler thinks it's modelling well, or for that matter if the scheduler won't have a better model or be capable of automatically tuning this stuff without explicit information from the architecture by the time anyone gets around to making such hardware. The only reason for paying attention at all at present is to ensure that we don't do something actively broken and squash clusters together should we encounter such a system - we do the same thing that the DT code, we just ignore the heirachy and report everything as a flat topology. > > I'll add something to this patch - the warning is needed if the DT code > > gets merged without this and it seems this one still going round the > > loop. :/ > Yes but *this* patch is bumping the log level not the other ones and what I > am saying is that when this code patch gets merged that warning is useless (ie > never triggered - cluster id can't be == -1) unless I am missing something > here. Right, if we have the MPIDR support then it should be removed. It's useful if we get the DT without MPIDR but not otherwise - once we can parse MPIDRs for most SoCs there should be very little reason to ever bother with the DT binding. > Do not bother, use three affinity levels and be done with that, we will > deal with aff3 (and aff2) when time comes. This would leave the MPIDR support worse than DT which seems retrograde especially given that it's so simple to differentiate the clusters. The only issue with that appears to be about precisely how to make up the cluster numbers which is a cosmetic one.
On Mon, May 19, 2014 at 05:12:17PM +0100, Mark Brown wrote: [...] > > Do not bother, use three affinity levels and be done with that, we will > > deal with aff3 (and aff2) when time comes. > > This would leave the MPIDR support worse than DT which seems retrograde > especially given that it's so simple to differentiate the clusters. The > only issue with that appears to be about precisely how to make up the > cluster numbers which is a cosmetic one. That's the reason why I said you should not bother. If you want to use 4 affinity levels, I do not see the point in using the hash for that though, since all we need is a unique id which can be easily created without resorting to hashing. Hashing compresses the cluster index, but that index is not representative of HW anyway. If you go for simple shifting we might end up with huge cluster ids, which is fine but a bit weird. So either (1) you use three affinity levels or (2) the simplest way to combine the affinity levels. I personally do not like squashing affinity levels, but I can live with that as long as we are done with this, functionality is ready and it is time we get that in. Thanks, Lorenzo
On Mon, May 19, 2014 at 06:15:11PM +0100, Lorenzo Pieralisi wrote: > On Mon, May 19, 2014 at 05:12:17PM +0100, Mark Brown wrote: > > This would leave the MPIDR support worse than DT which seems retrograde > > especially given that it's so simple to differentiate the clusters. The > > only issue with that appears to be about precisely how to make up the > > cluster numbers which is a cosmetic one. > That's the reason why I said you should not bother. If you want to use > 4 affinity levels, I do not see the point in using the hash for that though, > since all we need is a unique id which can be easily created without > resorting to hashing. Right, OK. I personally wouldn't have used anything non-trivial either but equally well I didn't see any strong reason not to do it either and it's what Zi Shen sent. I suspect this is based on the review comments the first time around.
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index c404fb0..b3b3287 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -18,6 +18,8 @@ #define INVALID_HWID ULONG_MAX +#define MPIDR_UP_BITMASK (0x1 << 30) +#define MPIDR_MT_BITMASK (0x1 << 24) #define MPIDR_HWID_BITMASK 0xff00ffffff #define MPIDR_LEVEL_BITS_SHIFT 3 @@ -30,6 +32,9 @@ #define MPIDR_AFFINITY_LEVEL(mpidr, level) \ ((mpidr >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK) +#define MPIDR_AFF_MASK(level) \ + ((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(level)) + #define read_cpuid(reg) ({ \ u64 __val; \ asm("mrs %0, " #reg : "=r" (__val)); \ diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 43514f9..3b2479c 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -20,6 +20,8 @@ #include <linux/of.h> #include <linux/sched.h> +#include <asm/cputype.h> +#include <asm/smp_plat.h> #include <asm/topology.h> static int __init get_cpu_for_node(struct device_node *node) @@ -220,10 +222,8 @@ static void update_siblings_masks(unsigned int cpuid) int cpu; if (cpuid_topo->cluster_id == -1) { - /* - * DT does not contain topology information for this cpu. - */ - pr_debug("CPU%u: No topology information configured\n", cpuid); + /* No topology information for this cpu ?! */ + pr_err("CPU%u: No topology information configured\n", cpuid); return; } @@ -249,6 +249,44 @@ static void update_siblings_masks(unsigned int cpuid) void store_cpu_topology(unsigned int cpuid) { + struct cpu_topology *cpuid_topo = &cpu_topology[cpuid]; + u64 mpidr; + + if (cpuid_topo->cluster_id != -1) + goto topology_populated; + + mpidr = read_cpuid_mpidr(); + + /* Create cpu topology mapping based on MPIDR. */ + if (mpidr & MPIDR_UP_BITMASK) { + /* Uniprocessor system */ + cpuid_topo->thread_id = -1; + cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + cpuid_topo->cluster_id = 0; + } else if (mpidr & MPIDR_MT_BITMASK) { + /* Multiprocessor system : Multi-threads per core */ + cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); + cpuid_topo->cluster_id = + ((mpidr & MPIDR_AFF_MASK(2)) >> mpidr_hash.shift_aff[2] | + (mpidr & MPIDR_AFF_MASK(3)) >> mpidr_hash.shift_aff[3]) + >> mpidr_hash.shift_aff[1] >> mpidr_hash.shift_aff[0]; + } else { + /* Multiprocessor system : Single-thread per core */ + cpuid_topo->thread_id = -1; + cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + cpuid_topo->cluster_id = + ((mpidr & MPIDR_AFF_MASK(1)) >> mpidr_hash.shift_aff[1] | + (mpidr & MPIDR_AFF_MASK(2)) >> mpidr_hash.shift_aff[2] | + (mpidr & MPIDR_AFF_MASK(3)) >> mpidr_hash.shift_aff[3]) + >> mpidr_hash.shift_aff[0]; + } + + pr_debug("CPU%u: cluster %d core %d thread %d mpidr %llx\n", + cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id, + cpuid_topo->thread_id, mpidr); + +topology_populated: update_siblings_masks(cpuid); }