diff mbox series

[RESEND] arm64: arch_timer: Workaround for Cortex-A73 erratum 858921

Message ID 1504069569-27809-1-git-send-email-leo.yan@linaro.org
State New
Headers show
Series [RESEND] arm64: arch_timer: Workaround for Cortex-A73 erratum 858921 | expand

Commit Message

Leo Yan Aug. 30, 2017, 5:06 a.m. UTC
commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

Cortex-A73 (all versions) counter read can return a wrong value
when the counter crosses a 32bit boundary.

The workaround involves performing the read twice, and to return
one or the other depending on whether a transition has taken place.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Signed-off-by: Leo Yan <leo.yan@linaro.org>

---
 arch/arm64/Kconfig                  | 12 ++++++++++++
 arch/arm64/include/asm/arch_timer.h | 11 +++++++++++
 2 files changed, 23 insertions(+)

-- 
2.7.4

Comments

Greg Kroah-Hartman Aug. 30, 2017, 6:03 a.m. UTC | #1
On Wed, Aug 30, 2017 at 01:06:09PM +0800, Leo Yan wrote:
> commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

> 

> Cortex-A73 (all versions) counter read can return a wrong value

> when the counter crosses a 32bit boundary.

> 

> The workaround involves performing the read twice, and to return

> one or the other depending on whether a transition has taken place.

> 

> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

> Signed-off-by: Leo Yan <leo.yan@linaro.org>

> ---

>  arch/arm64/Kconfig                  | 12 ++++++++++++

>  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++

>  2 files changed, 23 insertions(+)


Why [RESEND] ?

And what stable kernel tree(s) do you want this to be applied to?

thanks,

greg k-h
Leo Yan Aug. 30, 2017, 6:12 a.m. UTC | #2
On Wed, Aug 30, 2017 at 08:03:38AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Aug 30, 2017 at 01:06:09PM +0800, Leo Yan wrote:

> > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

> > 

> > Cortex-A73 (all versions) counter read can return a wrong value

> > when the counter crosses a 32bit boundary.

> > 

> > The workaround involves performing the read twice, and to return

> > one or the other depending on whether a transition has taken place.

> > 

> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

> > Signed-off-by: Leo Yan <leo.yan@linaro.org>

> > ---

> >  arch/arm64/Kconfig                  | 12 ++++++++++++

> >  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++

> >  2 files changed, 23 insertions(+)

> 

> Why [RESEND] ?


In the first email your address and mailing list address are mixed,
I am not sure if the patch has been sent out properly. So resend it
again. Sorry for spamming.

> And what stable kernel tree(s) do you want this to be applied to?


This patch is target for kernel 4.4 and 4.9.

Thanks,
Leo Yan
Marc Zyngier Aug. 30, 2017, 8:02 a.m. UTC | #3
+ Mark

On 30/08/17 06:06, Leo Yan wrote:
> commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

> 

> Cortex-A73 (all versions) counter read can return a wrong value

> when the counter crosses a 32bit boundary.

> 

> The workaround involves performing the read twice, and to return

> one or the other depending on whether a transition has taken place.

> 

> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

> Signed-off-by: Leo Yan <leo.yan@linaro.org>

> ---

>  arch/arm64/Kconfig                  | 12 ++++++++++++

>  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++

>  2 files changed, 23 insertions(+)

> 

> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig

> index 14cdc6d..68e7c98 100644

> --- a/arch/arm64/Kconfig

> +++ b/arch/arm64/Kconfig

> @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419

>  

>  	  If unsure, say Y.

>  

> +config ARM64_ERRATUM_858921

> +	bool "Cortex-A73: 858921: arch timer counter read can return a wrong value"

> +	default y

> +	depends on ARM_ARCH_TIMER && ARM64

> +	help

> +	  This option enables a workaround applicable to Cortex-A73

> +	  (all versions), whose counter may return incorrect values.

> +	  The workaround will be dynamically enabled when an affected

> +	  core is detected.

> +

> +	  If unsure, say Y.

> +

>  config CAVIUM_ERRATUM_22375

>  	bool "Cavium erratum 22375, 24313"

>  	default y

> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h

> index fbe0ca3..9b2b0f5 100644

> --- a/arch/arm64/include/asm/arch_timer.h

> +++ b/arch/arm64/include/asm/arch_timer.h

> @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void)

>  	return 0;

>  }

>  

> +#ifdef CONFIG_ARM64_ERRATUM_858921

> +static inline u64 arch_counter_get_cntvct(void)

> +{

> +	u64 old, new;

> +

> +	asm volatile("mrs %0, cntvct_el0" : "=r" (old));

> +	asm volatile("mrs %0, cntvct_el0" : "=r" (new));

> +	return (((old ^ new) >> 32) & 1) ? old : new;

> +}

> +#else

>  static inline u64 arch_counter_get_cntvct(void)

>  {

>  	u64 cval;

> @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void)

>  

>  	return cval;

>  }

> +#endif

>  

>  static inline int arch_timer_arch_init(void)

>  {

> 


This is completely busted:

- You are only addressing the kernel side, and ignore userspace (which
is just as broken as the kernel).
- You lie in the description of the option (this is in no way dynamic,
since you didn't backport the whole workaround infrastructure).

So I'm afraid I'm NAKing this. Please refrain from blindly backporting
random patches. fa8d815fac96e7c9 only makes sense in the context of the
whole series, and on its own gives you a very false sense of having
properly addressed it.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Leo Yan Aug. 30, 2017, 8:20 a.m. UTC | #4
Hi Marc,

On Wed, Aug 30, 2017 at 09:02:47AM +0100, Marc Zyngier wrote:
> + Mark

> 

> On 30/08/17 06:06, Leo Yan wrote:

> > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

> > 

> > Cortex-A73 (all versions) counter read can return a wrong value

> > when the counter crosses a 32bit boundary.

> > 

> > The workaround involves performing the read twice, and to return

> > one or the other depending on whether a transition has taken place.

> > 

> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

> > Signed-off-by: Leo Yan <leo.yan@linaro.org>

> > ---

> >  arch/arm64/Kconfig                  | 12 ++++++++++++

> >  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++

> >  2 files changed, 23 insertions(+)

> > 

> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig

> > index 14cdc6d..68e7c98 100644

> > --- a/arch/arm64/Kconfig

> > +++ b/arch/arm64/Kconfig

> > @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419

> >  

> >  	  If unsure, say Y.

> >  

> > +config ARM64_ERRATUM_858921

> > +	bool "Cortex-A73: 858921: arch timer counter read can return a wrong value"

> > +	default y

> > +	depends on ARM_ARCH_TIMER && ARM64

> > +	help

> > +	  This option enables a workaround applicable to Cortex-A73

> > +	  (all versions), whose counter may return incorrect values.

> > +	  The workaround will be dynamically enabled when an affected

> > +	  core is detected.

> > +

> > +	  If unsure, say Y.

> > +

> >  config CAVIUM_ERRATUM_22375

> >  	bool "Cavium erratum 22375, 24313"

> >  	default y

> > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h

> > index fbe0ca3..9b2b0f5 100644

> > --- a/arch/arm64/include/asm/arch_timer.h

> > +++ b/arch/arm64/include/asm/arch_timer.h

> > @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void)

> >  	return 0;

> >  }

> >  

> > +#ifdef CONFIG_ARM64_ERRATUM_858921

> > +static inline u64 arch_counter_get_cntvct(void)

> > +{

> > +	u64 old, new;

> > +

> > +	asm volatile("mrs %0, cntvct_el0" : "=r" (old));

> > +	asm volatile("mrs %0, cntvct_el0" : "=r" (new));

> > +	return (((old ^ new) >> 32) & 1) ? old : new;

> > +}

> > +#else

> >  static inline u64 arch_counter_get_cntvct(void)

> >  {

> >  	u64 cval;

> > @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void)

> >  

> >  	return cval;

> >  }

> > +#endif

> >  

> >  static inline int arch_timer_arch_init(void)

> >  {

> > 

> 

> This is completely busted:

> 

> - You are only addressing the kernel side, and ignore userspace (which

> is just as broken as the kernel).


Thanks for quick reviewing.

Yeah, I remembered there have some code directly reads arch timer
virtual counter from userspace. But can you remind for userspace
broken issue, is the code in libc or vdso?

Here I have another question is: after applied the whole workaround
infrastructure, can it also fix userspace broken issue?

> - You lie in the description of the option (this is in no way dynamic,

> since you didn't backport the whole workaround infrastructure).


I should fix it.

> So I'm afraid I'm NAKing this. Please refrain from blindly backporting

> random patches. fa8d815fac96e7c9 only makes sense in the context of the

> whole series, and on its own gives you a very false sense of having

> properly addressed it.


IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like
for sched_clock() roll back issue [1].

So for this issue, are you suggesting we need backport whole workaround
infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working
with these two kernels.

[1] https://lists.linaro.org/pipermail/eas-dev/2017-August/000941.html

Thanks,
Leo Yan
Marc Zyngier Aug. 30, 2017, 8:46 a.m. UTC | #5
On 30/08/17 09:20, Leo Yan wrote:
> Hi Marc,

> 

> On Wed, Aug 30, 2017 at 09:02:47AM +0100, Marc Zyngier wrote:

>> + Mark

>>

>> On 30/08/17 06:06, Leo Yan wrote:

>>> commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream.

>>>

>>> Cortex-A73 (all versions) counter read can return a wrong value

>>> when the counter crosses a 32bit boundary.

>>>

>>> The workaround involves performing the read twice, and to return

>>> one or the other depending on whether a transition has taken place.

>>>

>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>

>>> ---

>>>  arch/arm64/Kconfig                  | 12 ++++++++++++

>>>  arch/arm64/include/asm/arch_timer.h | 11 +++++++++++

>>>  2 files changed, 23 insertions(+)

>>>

>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig

>>> index 14cdc6d..68e7c98 100644

>>> --- a/arch/arm64/Kconfig

>>> +++ b/arch/arm64/Kconfig

>>> @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419

>>>  

>>>  	  If unsure, say Y.

>>>  

>>> +config ARM64_ERRATUM_858921

>>> +	bool "Cortex-A73: 858921: arch timer counter read can return a wrong value"

>>> +	default y

>>> +	depends on ARM_ARCH_TIMER && ARM64

>>> +	help

>>> +	  This option enables a workaround applicable to Cortex-A73

>>> +	  (all versions), whose counter may return incorrect values.

>>> +	  The workaround will be dynamically enabled when an affected

>>> +	  core is detected.

>>> +

>>> +	  If unsure, say Y.

>>> +

>>>  config CAVIUM_ERRATUM_22375

>>>  	bool "Cavium erratum 22375, 24313"

>>>  	default y

>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h

>>> index fbe0ca3..9b2b0f5 100644

>>> --- a/arch/arm64/include/asm/arch_timer.h

>>> +++ b/arch/arm64/include/asm/arch_timer.h

>>> @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void)

>>>  	return 0;

>>>  }

>>>  

>>> +#ifdef CONFIG_ARM64_ERRATUM_858921

>>> +static inline u64 arch_counter_get_cntvct(void)

>>> +{

>>> +	u64 old, new;

>>> +

>>> +	asm volatile("mrs %0, cntvct_el0" : "=r" (old));

>>> +	asm volatile("mrs %0, cntvct_el0" : "=r" (new));

>>> +	return (((old ^ new) >> 32) & 1) ? old : new;

>>> +}

>>> +#else

>>>  static inline u64 arch_counter_get_cntvct(void)

>>>  {

>>>  	u64 cval;

>>> @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void)

>>>  

>>>  	return cval;

>>>  }

>>> +#endif

>>>  

>>>  static inline int arch_timer_arch_init(void)

>>>  {

>>>

>>

>> This is completely busted:

>>

>> - You are only addressing the kernel side, and ignore userspace (which

>> is just as broken as the kernel).

> 

> Thanks for quick reviewing.

> 

> Yeah, I remembered there have some code directly reads arch timer

> virtual counter from userspace. But can you remind for userspace

> broken issue, is the code in libc or vdso?


You're missing the point. The virtual counter is freely available to
userspace to play with (this is a de-facto ABI). Since it can return bad
values, it needs to be trapped to be correctly emulated (together with
cntfrq_el0), and the VDSO disabled.

> Here I have another question is: after applied the whole workaround

> infrastructure, can it also fix userspace broken issue?


Yup. That's why I ended-up with a 18 patches series, and not just this
single one. Trust me, I'm lazy. There is nothing I hate more than doing
useless work.

>> - You lie in the description of the option (this is in no way dynamic,

>> since you didn't backport the whole workaround infrastructure).

> 

> I should fix it.

> 

>> So I'm afraid I'm NAKing this. Please refrain from blindly backporting

>> random patches. fa8d815fac96e7c9 only makes sense in the context of the

>> whole series, and on its own gives you a very false sense of having

>> properly addressed it.

> 

> IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like

> for sched_clock() roll back issue [1].


What's the point of fixing the kernel if userspace is just as likely to
fail?

> So for this issue, are you suggesting we need backport whole workaround

> infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working

> with these two kernels.


Then these systems are completely broken if they use a Cortex-A73.
Either they run mainline (which will be just fine), or they get a fully
backported workaround infrastructure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Leo Yan Aug. 31, 2017, 12:30 a.m. UTC | #6
Hi Marc,

On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote:

[...]

> >> This is completely busted:

> >>

> >> - You are only addressing the kernel side, and ignore userspace (which

> >> is just as broken as the kernel).

> > 

> > Thanks for quick reviewing.

> > 

> > Yeah, I remembered there have some code directly reads arch timer

> > virtual counter from userspace. But can you remind for userspace

> > broken issue, is the code in libc or vdso?

> 

> You're missing the point. The virtual counter is freely available to

> userspace to play with (this is a de-facto ABI). Since it can return bad

> values, it needs to be trapped to be correctly emulated (together with

> cntfrq_el0), and the VDSO disabled.

> 

> > Here I have another question is: after applied the whole workaround

> > infrastructure, can it also fix userspace broken issue?

> 

> Yup. That's why I ended-up with a 18 patches series, and not just this

> single one. Trust me, I'm lazy. There is nothing I hate more than doing

> useless work.


Thanks for detailed explaination. Now it's much clear for me.

> >> - You lie in the description of the option (this is in no way dynamic,

> >> since you didn't backport the whole workaround infrastructure).

> > 

> > I should fix it.

> > 

> >> So I'm afraid I'm NAKing this. Please refrain from blindly backporting

> >> random patches. fa8d815fac96e7c9 only makes sense in the context of the

> >> whole series, and on its own gives you a very false sense of having

> >> properly addressed it.

> > 

> > IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like

> > for sched_clock() roll back issue [1].

> 

> What's the point of fixing the kernel if userspace is just as likely to

> fail?


Understand now.

> > So for this issue, are you suggesting we need backport whole workaround

> > infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working

> > with these two kernels.

> 

> Then these systems are completely broken if they use a Cortex-A73.

> Either they run mainline (which will be just fine), or they get a fully

> backported workaround infrastructure.


Yeah, we should do right thing.

Have ARM kernel team ported this patch series (or is in planning)? I
also will check with Linaro kernel team as well, I just want to avoid
duplicate efforts if these patches have been back ported. Otherwise, I
will backport the patch series.

Thanks,
Leo Yan
Marc Zyngier Aug. 31, 2017, 8:30 a.m. UTC | #7
Hi Leo,

On 31/08/17 01:30, Leo Yan wrote:
> Hi Marc,

> 

> On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote:

> 

> [...]

> 

>> Then these systems are completely broken if they use a Cortex-A73.

>> Either they run mainline (which will be just fine), or they get a fully

>> backported workaround infrastructure.

> 

> Yeah, we should do right thing.

> 

> Have ARM kernel team ported this patch series (or is in planning)? I

> also will check with Linaro kernel team as well, I just want to avoid

> duplicate efforts if these patches have been back ported. Otherwise, I

> will backport the patch series.

I have no plan to backport this to older kernels. This is new HW that
didn't exist (or wasn't publicly available) when 4.4/4.9 were released,
so 4.12 is effectively the first kernel I intend to support this erratum on.

If you have an interest in running older kernels on this HW, you'll have
to do the backport yourself. Please make sure you Cc both Mark and
myself on the patches.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Leo Yan Sept. 1, 2017, 2:01 a.m. UTC | #8
On Thu, Aug 31, 2017 at 09:30:29AM +0100, Marc Zyngier wrote:
> Hi Leo,

> 

> On 31/08/17 01:30, Leo Yan wrote:

> > Hi Marc,

> > 

> > On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote:

> > 

> > [...]

> > 

> >> Then these systems are completely broken if they use a Cortex-A73.

> >> Either they run mainline (which will be just fine), or they get a fully

> >> backported workaround infrastructure.

> > 

> > Yeah, we should do right thing.

> > 

> > Have ARM kernel team ported this patch series (or is in planning)? I

> > also will check with Linaro kernel team as well, I just want to avoid

> > duplicate efforts if these patches have been back ported. Otherwise, I

> > will backport the patch series.

> I have no plan to backport this to older kernels. This is new HW that

> didn't exist (or wasn't publicly available) when 4.4/4.9 were released,

> so 4.12 is effectively the first kernel I intend to support this erratum on.

> 

> If you have an interest in running older kernels on this HW, you'll have

> to do the backport yourself. Please make sure you Cc both Mark and

> myself on the patches.


Sure, will do this.

Thanks,
Leo Yan
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 14cdc6d..68e7c98 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -374,6 +374,18 @@  config ARM64_ERRATUM_843419
 
 	  If unsure, say Y.
 
+config ARM64_ERRATUM_858921
+	bool "Cortex-A73: 858921: arch timer counter read can return a wrong value"
+	default y
+	depends on ARM_ARCH_TIMER && ARM64
+	help
+	  This option enables a workaround applicable to Cortex-A73
+	  (all versions), whose counter may return incorrect values.
+	  The workaround will be dynamically enabled when an affected
+	  core is detected.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y
diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index fbe0ca3..9b2b0f5 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -114,6 +114,16 @@  static inline u64 arch_counter_get_cntpct(void)
 	return 0;
 }
 
+#ifdef CONFIG_ARM64_ERRATUM_858921
+static inline u64 arch_counter_get_cntvct(void)
+{
+	u64 old, new;
+
+	asm volatile("mrs %0, cntvct_el0" : "=r" (old));
+	asm volatile("mrs %0, cntvct_el0" : "=r" (new));
+	return (((old ^ new) >> 32) & 1) ? old : new;
+}
+#else
 static inline u64 arch_counter_get_cntvct(void)
 {
 	u64 cval;
@@ -123,6 +133,7 @@  static inline u64 arch_counter_get_cntvct(void)
 
 	return cval;
 }
+#endif
 
 static inline int arch_timer_arch_init(void)
 {