[1/4] ARM: vexpress/dcscb: fix cache disabling sequences

Message ID alpine.LFD.2.03.1307221353360.15022@syhkavp.arg
State New

Commit Message

Nicolas Pitre July 22, 2013, 5:58 p.m.
On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:

> On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > 
> > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > Those are not very intelligible, but that might at least make people
> > > > pause and think before blindly using it.
> > > 
> > > Good point.  It should still embody the architecture name for which it 
> > > is valid though.
> > 
> > Sure, I was assuming something would be pasted on the start of the name.
> 
> v7 :-) with a comment describing the assumptions (in particular related
> as Dave mentioned to the SMP bit behaviour) ?

OK... What about this then:

----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Date: Thu, 18 Jul 2013 13:12:48 -0400
Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
 code

This code is becoming duplicated in many places.  So let's consolidate
it into a handy macro that is known to be right and available for reuse.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

Comments

Dave Martin July 23, 2013, 10:43 a.m. | #1
On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
> 
> > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > > 
> > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > Those are not very intelligible, but that might at least make people
> > > > > pause and think before blindly using it.
> > > > 
> > > > Good point.  It should still embody the architecture name for which it 
> > > > is valid though.
> > > 
> > > Sure, I was assuming something would be pasted on the start of the name.
> > 
> > v7 :-) with a comment describing the assumptions (in particular related
> > as Dave mentioned to the SMP bit behaviour) ?
> 
> OK... What about this then:
> 
> ----- >8
> From: Nicolas Pitre <nicolas.pitre@linaro.org>
> Date: Thu, 18 Jul 2013 13:12:48 -0400
> Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
>  code
> 
> This code is becoming duplicated in many places.  So let's consolidate
> it into a handy macro that is known to be right and available for reuse.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 17d0ae8672..8f4e2297e2 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
>  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
>  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
>  
> +/*
> + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> + * To do so we must:
> + *
> + * - Clear the SCTLR.C bit to prevent further cache allocations
> + * - Flush the desired level of cache
> + * - Clear the ACTLR "SMP" bit to disable local coherency
> + *
> + * ... and so without any intervening memory access in between those steps,
> + * not even to the stack.
> + *
> + * WARNING -- After this has been called:
> + *
> + * - No ldr/str exclusive must be used.

Maybe write ldrex/strex (I misread that the first time around as "no
ldr/str ... must be used", which seemed a bit drastic)

> + * - The CPU is obviously no longer coherent with the other CPUs.
> + *
> + * Further considerations:
> + *
> + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from

Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
not part of ARMv7 at all.  Also, it seems that A9 isn't precisely the
same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
not the same either.

This is why I preferred to treat the whole sequence as specific to a
particular CPU implementation.  The similarity between A7 and A15
might be viewed as a happy coincidence (it also makes life easier in
big.LITTLE land).

(Also, you mix the names "ACTLR" and "AUXCR", but people will generally
be able to guess those are the same thing)

 
> + *   that will need their own procedure.
> + * - This is unlikely to work if Linux is running non-secure.
> + */
> +#define v7_exit_coherency_flush(level) \

... hence:

#define cortex_a15_exit_coherency(level) ...

#define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)


What do you think?

Cheers
---Dave

> +	asm volatile( \
> +	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
> +	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
> +	"isb	\n\t" \
> +	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
> +	"clrex	\n\t" \
> +	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
> +	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
> +	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
> +	"isb	\n\t" \
> +	"dsb	" \
> +	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
> +	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
> +	      "r9","r10","r11","lr","memory" )
> +
>  #endif
> diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
> index 85fffa702f..14d4996887 100644
> --- a/arch/arm/mach-vexpress/dcscb.c
> +++ b/arch/arm/mach-vexpress/dcscb.c
> @@ -133,32 +133,8 @@ static void dcscb_power_down(void)
>  	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
>  		arch_spin_unlock(&dcscb_lock);
>  
> -		/*
> -		 * Flush all cache levels for this cluster.
> -		 *
> -		 * To do so we do:
> -		 * - Clear the SCTLR.C bit to prevent further cache allocations
> -		 * - Flush the whole cache
> -		 * - Clear the ACTLR "SMP" bit to disable local coherency
> -		 *
> -		 * Let's do it in the safest possible way i.e. with
> -		 * no memory access within the following sequence
> -		 * including to the stack.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_all \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Flush all cache levels for this cluster. */
> +		v7_exit_coherency_flush(all);
>  
>  		/*
>  		 * This is a harmless no-op.  On platforms with a real
> @@ -177,24 +153,8 @@ static void dcscb_power_down(void)
>  	} else {
>  		arch_spin_unlock(&dcscb_lock);
>  
> -		/*
> -		 * Flush the local CPU cache.
> -		 * Let's do it in the safest possible way as above.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_louis \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		/* Disable and flush the local CPU cache. */
> +		v7_exit_coherency_flush(louis);
>  	}
>  
>  	__mcpm_cpu_down(cpu, cluster);
> diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
> index dfb55d45b6..5940f1e317 100644
> --- a/arch/arm/mach-vexpress/tc2_pm.c
> +++ b/arch/arm/mach-vexpress/tc2_pm.c
> @@ -134,26 +134,7 @@ static void tc2_pm_down(u64 residency)
>  			: : "r" (0x400) );
>  		}
>  
> -		/*
> -		 * We need to disable and flush the whole (L1 and L2) cache.
> -		 * Let's do it in the safest possible way i.e. with
> -		 * no memory access within the following sequence
> -		 * including the stack.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_all \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		v7_exit_coherency_flush(all);
>  
>  		cci_disable_port_by_cpu(mpidr);
>  
> @@ -169,24 +150,7 @@ static void tc2_pm_down(u64 residency)
>  
>  		arch_spin_unlock(&tc2_pm_lock);
>  
> -		/*
> -		 * We need to disable and flush only the L1 cache.
> -		 * Let's do it in the safest possible way as above.
> -		 */
> -		asm volatile(
> -		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
> -		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
> -		"isb	\n\t"
> -		"bl	v7_flush_dcache_louis \n\t"
> -		"clrex	\n\t"
> -		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
> -		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
> -		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
> -		"isb	\n\t"
> -		"dsb	"
> -		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
> -		      "r9","r10","r11","lr","memory");
> +		v7_exit_coherency_flush(louis);
>  	}
>  
>  	__mcpm_cpu_down(cpu, cluster);
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Nicolas Pitre July 23, 2013, 12:28 p.m. | #2
On Tue, 23 Jul 2013, Dave Martin wrote:

> On Mon, Jul 22, 2013 at 01:58:12PM -0400, Nicolas Pitre wrote:
> > On Fri, 19 Jul 2013, Lorenzo Pieralisi wrote:
> > 
> > > On Fri, Jul 19, 2013 at 11:28:49AM +0100, Dave Martin wrote:
> > > > On Thu, Jul 18, 2013 at 02:59:06PM -0400, Nicolas Pitre wrote:
> > > > > On Thu, 18 Jul 2013, Dave Martin wrote:
> > > > > 
> > > > > > I had other names in mind, like "coherency_exit" or "cache_exit".
> > > > > > Those are not very intelligible, but that might at least make people
> > > > > > pause and think before blindly using it.
> > > > > 
> > > > > Good point.  It should still embody the architecture name for which it 
> > > > > is valid though.
> > > > 
> > > > Sure, I was assuming something would be pasted on the start of the name.
> > > 
> > > v7 :-) with a comment describing the assumptions (in particular related
> > > as Dave mentioned to the SMP bit behaviour) ?
> > 
> > OK... What about this then:
> > 
> > ----- >8
> > From: Nicolas Pitre <nicolas.pitre@linaro.org>
> > Date: Thu, 18 Jul 2013 13:12:48 -0400
> > Subject: [PATCH] ARM: cacheflush: consolidate single-CPU ARMv7 cache disabling
> >  code
> > 
> > This code is becoming duplicated in many places.  So let's consolidate
> > it into a handy macro that is known to be right and available for reuse.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > 
> > diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> > index 17d0ae8672..8f4e2297e2 100644
> > --- a/arch/arm/include/asm/cacheflush.h
> > +++ b/arch/arm/include/asm/cacheflush.h
> > @@ -436,4 +436,44 @@ static inline void __sync_cache_range_r(volatile void *p, size_t size)
> >  #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
> >  #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
> >  
> > +/*
> > + * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
> > + * To do so we must:
> > + *
> > + * - Clear the SCTLR.C bit to prevent further cache allocations
> > + * - Flush the desired level of cache
> > + * - Clear the ACTLR "SMP" bit to disable local coherency
> > + *
> > + * ... and so without any intervening memory access in between those steps,
> > + * not even to the stack.
> > + *
> > + * WARNING -- After this has been called:
> > + *
> > + * - No ldr/str exclusive must be used.
> 
> Maybe write ldrex/strex (I misread that the first time around as "no
> ldr/str ... must be used", which seemed a bit drastic)

OK.

> > + * - The CPU is obviously no longer coherent with the other CPUs.
> > + *
> > + * Further considerations:
> > + *
> > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> 
> Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> not part of ARMv7 at all.

Well, I just copied Lorenzo's words here, trusting he knew more about it 
than I do.

> Also, it seems that A9 isn't precisely the
> same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> not the same either.
> 
> This is why I preferred to treat the whole sequence as specific to a
> particular CPU implementation.  The similarity between A7 and A15
> might be viewed as a happy coincidence (it also makes life easier in
> big.LITTLE land).

Fair enough.

> (Also, you mix the names "ACTLR" and "AUXCR", but people will generally
> be able to guess those are the same thing)

We apparently use both in the kernel already.  Which one is the 
canonical one?

> > + *   that will need their own procedure.
> > + * - This is unlikely to work if Linux is running non-secure.
> > + */
> > +#define v7_exit_coherency_flush(level) \
> 
> ... hence:
> 
> #define cortex_a15_exit_coherency(level) ...
> 
> #define cortex_a7_exit_coherency(level) cortex_a15_exit_coherency(level)
> 
> What do you think?

To be pedantic again, we don't really exit coherency at some cache 
level.  Hence my usage of "flush" indicating that the level argument 
applies only to that and not to the coherency status.


Nicolas
Lorenzo Pieralisi July 23, 2013, 4:33 p.m. | #3
On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:

[...]

> > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > + *
> > > + * Further considerations:
> > > + *
> > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > 
> > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > not part of ARMv7 at all.
> 
> Well, I just copied Lorenzo's words here, trusting he knew more about it 
> than I do.
> 
> > Also, it seems that A9 isn't precisely the
> > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > not the same either.

If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
clear it; clearing the SMP bit is enough.

See, Dave has a point: there is no explicit "unified v7 TRM disable
clean and exit coherency procedure", even though the designers' end goal is
to have one, and that's the one you wrote. The code you posted is perfectly
OK on all v7 implementations in the kernel that I am aware of; I stand to be
corrected, but to the best of my knowledge that's the case.

> > This is why I preferred to treat the whole sequence as specific to a
> > particular CPU implementation.  The similarity between A7 and A15
> > might be viewed as a happy coincidence (it also makes life easier in
> > big.LITTLE land).
> 
> Fair enough.

I disagree on the happy coincidence but the point is taken. I am not
sure about what we should do, but I reiterate my point: the sequence as
it stands is OK on all NS v7 implementations I am aware of. We can add
macros to differentiate processors when we need them, but again that's
just my opinion, as correct as it can be.

Lorenzo
Dave Martin July 25, 2013, 12:04 p.m. | #4
On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> 
> [...]
> 
> > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > + *
> > > > + * Further considerations:
> > > > + *
> > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > 
> > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > not part of ARMv7 at all.
> > 
> > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > than I do.
> > 
> > > Also, it seems that A9 isn't precisely the
> > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > not the same either.
> 
> If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> clear it, clearing the SMP bit is enough.
> 
> See, Dave has a point, there is no explicit "unified v7 TRM disable
> clean and exit coherency procedure" even though the designers end goal is to
> have one and that's the one you wrote. The code you posted is perfectly ok on
> all v7 implementations in the kernel I am aware of, I stand to be corrected
> but to the best of my knowledge that's the case.

What I'm really concerned about here is not Cortex-*, but the vendors
who have their own CPU implementations.  I don't think I've ever seen
a TRM for one of those.

> 
> > > This is why I preferred to treat the whole sequence as specific to a
> > > particular CPU implementation.  The similarity between A7 and A15
> > > might be viewed as a happy coincidence (it also makes life easier in
> > > big.LITTLE land).
> > 
> > Fair enough.
> 
> I disagree on the happy coincidence but the point is taken. I am not

I confess to exaggeration!  There was obviously an effort to align A15
and A7, but we know there are differences at the system level, and there
is always the possibility of b.L pairings which differ on things like
the SMP bit, or b.L pairings of CPUs not originally designed to pair but
which are "similar enough" to be pairable.

> sure about what we should do, but I reiterate my point, the sequence as
> it stands is OK on all NS v7 implementations I am aware of. We can add
> macros to differentiate processors when we need them, but again that's
> just my opinion, as correct as it can be.

I still vote for naming them a15_* a7_* or similar.

For big.LITTLE pairings, I think we should assume that the big CPU's
macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
will be true, and we know it works for A15+A7.

So we would use the a15 macro on TC2, but the a7 macro on an a7-only
platform.


The other way to abstract this stuff would be to add a new per-CPU
proc_fn, but that may be overkill, particularly since we know the whole
mcpm backend may tend to be platform-specific, not just the CPU coherency
exit part.

Cheers
---Dave
Nicolas Pitre July 26, 2013, 2:58 p.m. | #5
On Thu, 25 Jul 2013, Dave Martin wrote:

> On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > 
> > [...]
> > 
> > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > + *
> > > > > + * Further considerations:
> > > > > + *
> > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > 
> > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > not part of ARMv7 at all.
> > > 
> > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > than I do.
> > > 
> > > > Also, it seems that A9 isn't precisely the
> > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > not the same either.
> > 
> > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > clear it, clearing the SMP bit is enough.
> > 
> > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > clean and exit coherency procedure" even though the designers end goal is to
> > have one and that's the one you wrote. The code you posted is perfectly ok on
> > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > but to the best of my knowledge that's the case.
> 
> What I'm really concerned about here is not Cortex-*, but the vendors
> who have their own CPU implementations.  I don't think I've ever seen
> a TRM for one of those.

Could we wait until they materialize before worrying about possible 
incompatibility issues?  After all this is not a user space ABI that 
cannot be changed afterwards.  Remember this is Linux internal code we 
have the ability to change when needed, hence we should resist the 
temptation to over-engineer.

> I confess to exaggeration!  There was obviously an effort to align A15
> and A7, but we know there are differences at the system level, and there
> is always the possibility of b.L pairings which differ on things like
> the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> which are "similar enough" to be pairable.
> 
> > sure about what we should do, but I reiterate my point, the sequence as
> > it stands is OK on all NS v7 implementations I am aware of. We can add
> > macros to differentiate processors when we need them, but again that's
> > just my opinion, as correct as it can be.
> 
> I still vote for naming them a15_* a7_* or similar.
> 
> For big.LITTLE pairings, I think we should assume that the big CPU's
> macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
> will be true, and we know it works for A15+A7.

The MCPM backends don't have to assume any kind of pairing.  The TC2 
backend already has to test whether or not it is running on an A15 in 
order to turn off L2 prefetching, which is an A15-only feature.  This is 
done outside of this macro precisely to keep the macro generic to all 
known (so far) v7 implementations.


Nicolas
Dave Martin July 26, 2013, 4 p.m. | #6
On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> On Thu, 25 Jul 2013, Dave Martin wrote:
> 
> > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > > 
> > > [...]
> > > 
> > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > + *
> > > > > > + * Further considerations:
> > > > > > + *
> > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > > 
> > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > > not part of ARMv7 at all.
> > > > 
> > > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > > than I do.
> > > > 
> > > > > Also, it seems that A9 isn't precisely the
> > > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > > not the same either.
> > > 
> > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > clear it, clearing the SMP bit is enough.
> > > 
> > > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > > clean and exit coherency procedure" even though the designers end goal is to
> > > have one and that's the one you wrote. The code you posted is perfectly ok on
> > > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > > but to the best of my knowledge that's the case.
> > 
> > What I'm really concerned about here is not Cortex-*, but the vendors
> > who have their own CPU implementations.  I don't think I've ever seen
> > a TRM for one of those.
> 
> Could we wait until they materialize before worrying about possible 
> incompatibility issues?  After all this is not a user space ABI that 
> cannot be changed afterwards.  Remember this is Linux internal code we 
> have the ability to change when needed, hence we should resist the 
> temptation to over-engineer.

If I could think of a better name, I would suggest it ... but unless we call
it "cortexa" or something, I don't know what it should be.  I'm not sure
how much I like that either.  Strictly speaking, that's still not correct.

So I guess this comes down to being clear about which CPUs we expect it to
work for.

Right now, we know it works for A7 and A15.

It sounds like we believe it might work for A9.  Lorenzo might know about
A5, A7 etc.  But I'm not sure this exact sequence has been tested on any
of those CPUs(?)

We shouldn't expect it to work for any non-ARM Ltd CPU, but on platforms
where firmware does the power control it's not Linux's responsibility
anyhow.  (That's probably the case for most of those vendors.)

> > I confess to exaggeration!  There was obviously an effort to align A15
> > and A7, but we know there are differences at the system level, and there
> > is always the possibility of b.L pairings which differ on things like
> > the SMP bit, or b.L pairings of CPUs not originally designed to pair but
> > which are "similar enough" to be pairable.
> > 
> > > sure about what we should do, but I reiterate my point, the sequence as
> > > it stands is OK on all NS v7 implementations I am aware of. We can add
> > > macros to differentiate processors when we need them, but again that's
> > > just my opinion, as correct as it can be.
> > 
> > I still vote for naming them a15_* a7_* or similar.
> > 
> > For big.LITTLE pairings, I think we should assume that the big CPU's
> > macro is usable on the LITTLE CPU too.  Assuming sane CPU designs, that
> > will be true, and we know it works for A15+A7.
> 
> The MCPM backend don't have to assume any kind of pairing.  The TC2 
> backend already has to test whether or not it is running on an A15 in 
> order to turn off L2 prefetching which is an A15-only feature.  This is 
> done outside of this macro exactly to keep the macro generic to all 
> known (so far) v7 implementations.

That's true.

Cheers
---Dave
Lorenzo Pieralisi July 26, 2013, 4:24 p.m. | #7
On Fri, Jul 26, 2013 at 05:00:15PM +0100, Dave Martin wrote:
> On Fri, Jul 26, 2013 at 10:58:27AM -0400, Nicolas Pitre wrote:
> > On Thu, 25 Jul 2013, Dave Martin wrote:
> > 
> > > On Tue, Jul 23, 2013 at 05:33:19PM +0100, Lorenzo Pieralisi wrote:
> > > > On Tue, Jul 23, 2013 at 01:28:16PM +0100, Nicolas Pitre wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > > + * - The CPU is obviously no longer coherent with the other CPUs.
> > > > > > > + *
> > > > > > > + * Further considerations:
> > > > > > > + *
> > > > > > > + * - This relies on the presence and behavior of the AUXCR.SMP bit as
> > > > > > > + *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
> > > > > > 
> > > > > > Sorry to be pedantic here, but there is no "ARMv7 TRM".  The SMP bit is
> > > > > > not part of ARMv7 at all.
> > > > > 
> > > > > Well, I just copied Lorenzo's words here, trusting he knew more about it 
> > > > > than I do.
> > > > > 
> > > > > > Also, it seems that A9 isn't precisely the
> > > > > > same: two ACTLR bits need to be twiddled.  R-class CPUs are generally
> > > > > > not the same either.
> > > > 
> > > > If you mean the ACTLR.FW bit in A9, A5, and R7, then, IIRC, we do not need to
> > > > clear it, clearing the SMP bit is enough.
> > > > 
> > > > See, Dave has a point, there is no explicit "unified v7 TRM disable
> > > > clean and exit coherency procedure" even though the designers end goal is to
> > > > have one and that's the one you wrote. The code you posted is perfectly ok on
> > > > all v7 implementations in the kernel I am aware of, I stand to be corrected
> > > > but to the best of my knowledge that's the case.
> > > 
> > > What I'm really concerned about here is not Cortex-*, but the vendors
> > > who have their own CPU implementations.  I don't think I've ever seen
> > > a TRM for one of those.
> > 
> > Could we wait until they materialize before worrying about possible 
> > incompatibility issues?  After all this is not a user space ABI that 
> > cannot be changed afterwards.  Remember this is Linux internal code we 
> > have the ability to change when needed, hence we should resist the 
> > temptation to over-engineer.
> 
> If I could think of a better name, I would suggest it ... but unless we call
> it "cortexa" or something, I don't know what it should be.  I'm not sure
> how much I like that either.  Strictly speaking, that's still not correct.
> 
> So I guess this comes down to being clear about which CPUs we expect it to
> work for.
> 
> Right now, we know it works for A7 and A15.
> 
> It sounds like we believe it might work for A9.  Lorenzo might know about
> A5, A7 etc.  But I'm not sure this exact sequence has been tested on any
> of those CPUs(?)

I tested this sequence on A9 MP a while ago, even though, obviously, it was
not this same macro. I know for certain it has been tested on A5, again not
the same macro but the same sequence.

On R7, honestly, I can't say for certain.

What if we postpone the consolidation until MCPM for some A9 platforms
(e.g. Tegra) is posted (and in the meantime you leave the macro in
mach-vexpress to avoid duplication)?

Thank you,
Lorenzo

Patch

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 17d0ae8672..8f4e2297e2 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -436,4 +436,44 @@  static inline void __sync_cache_range_r(volatile void *p, size_t size)
 #define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
 #define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
 
+/*
+ * Disabling cache access for one CPU in an ARMv7 SMP system is tricky.
+ * To do so we must:
+ *
+ * - Clear the SCTLR.C bit to prevent further cache allocations
+ * - Flush the desired level of cache
+ * - Clear the ACTLR "SMP" bit to disable local coherency
+ *
+ * ... and so without any intervening memory access in between those steps,
+ * not even to the stack.
+ *
+ * WARNING -- After this has been called:
+ *
+ * - No ldr/str exclusive must be used.
+ * - The CPU is obviously no longer coherent with the other CPUs.
+ *
+ * Further considerations:
+ *
+ * - This relies on the presence and behavior of the AUXCR.SMP bit as
+ *   documented in the ARMv7 TRM.  Vendor implementations that deviate from
+ *   that will need their own procedure.
+ * - This is unlikely to work if Linux is running non-secure.
+ */
+#define v7_exit_coherency_flush(level) \
+	asm volatile( \
+	"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t" \
+	"bic	r0, r0, #"__stringify(CR_C)" \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t" \
+	"isb	\n\t" \
+	"bl	v7_flush_dcache_"__stringify(level)" \n\t" \
+	"clrex	\n\t" \
+	"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t" \
+	"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t" \
+	"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t" \
+	"isb	\n\t" \
+	"dsb	" \
+	/* The clobber list is dictated by the call to v7_flush_dcache_* */ \
+	: : : "r0","r1","r2","r3","r4","r5","r6","r7", \
+	      "r9","r10","r11","lr","memory" )
+
 #endif
diff --git a/arch/arm/mach-vexpress/dcscb.c b/arch/arm/mach-vexpress/dcscb.c
index 85fffa702f..14d4996887 100644
--- a/arch/arm/mach-vexpress/dcscb.c
+++ b/arch/arm/mach-vexpress/dcscb.c
@@ -133,32 +133,8 @@  static void dcscb_power_down(void)
 	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush all cache levels for this cluster.
-		 *
-		 * To do so we do:
-		 * - Clear the SCTLR.C bit to prevent further cache allocations
-		 * - Flush the whole cache
-		 * - Clear the ACTLR "SMP" bit to disable local coherency
-		 *
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including to the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Flush all cache levels for this cluster. */
+		v7_exit_coherency_flush(all);
 
 		/*
 		 * This is a harmless no-op.  On platforms with a real
@@ -177,24 +153,8 @@  static void dcscb_power_down(void)
 	} else {
 		arch_spin_unlock(&dcscb_lock);
 
-		/*
-		 * Flush the local CPU cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		/* Disable and flush the local CPU cache. */
+		v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);
diff --git a/arch/arm/mach-vexpress/tc2_pm.c b/arch/arm/mach-vexpress/tc2_pm.c
index dfb55d45b6..5940f1e317 100644
--- a/arch/arm/mach-vexpress/tc2_pm.c
+++ b/arch/arm/mach-vexpress/tc2_pm.c
@@ -134,26 +134,7 @@  static void tc2_pm_down(u64 residency)
 			: : "r" (0x400) );
 		}
 
-		/*
-		 * We need to disable and flush the whole (L1 and L2) cache.
-		 * Let's do it in the safest possible way i.e. with
-		 * no memory access within the following sequence
-		 * including the stack.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_all \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_exit_coherency_flush(all);
 
 		cci_disable_port_by_cpu(mpidr);
 
@@ -169,24 +150,7 @@  static void tc2_pm_down(u64 residency)
 
 		arch_spin_unlock(&tc2_pm_lock);
 
-		/*
-		 * We need to disable and flush only the L1 cache.
-		 * Let's do it in the safest possible way as above.
-		 */
-		asm volatile(
-		"mrc	p15, 0, r0, c1, c0, 0	@ get CR \n\t"
-		"bic	r0, r0, #"__stringify(CR_C)" \n\t"
-		"mcr	p15, 0, r0, c1, c0, 0	@ set CR \n\t"
-		"isb	\n\t"
-		"bl	v7_flush_dcache_louis \n\t"
-		"clrex	\n\t"
-		"mrc	p15, 0, r0, c1, c0, 1	@ get AUXCR \n\t"
-		"bic	r0, r0, #(1 << 6)	@ disable local coherency \n\t"
-		"mcr	p15, 0, r0, c1, c0, 1	@ set AUXCR \n\t"
-		"isb	\n\t"
-		"dsb	"
-		: : : "r0","r1","r2","r3","r4","r5","r6","r7",
-		      "r9","r10","r11","lr","memory");
+		v7_exit_coherency_flush(louis);
 	}
 
 	__mcpm_cpu_down(cpu, cluster);