Message ID | 1504069569-27809-1-git-send-email-leo.yan@linaro.org |
---|---|
State | New |
Headers | show |
Series | [RESEND] arm64: arch_timer: Workaround for Cortex-A73 erratum 858921 | expand |
On Wed, Aug 30, 2017 at 01:06:09PM +0800, Leo Yan wrote: > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream. > > Cortex-A73 (all versions) counter read can return a wrong value > when the counter crosses a 32bit boundary. > > The workaround involves performing the read twice, and to return > one or the other depending on whether a transition has taken place. > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > arch/arm64/Kconfig | 12 ++++++++++++ > arch/arm64/include/asm/arch_timer.h | 11 +++++++++++ > 2 files changed, 23 insertions(+) Why [RESEND] ? And what stable kernel tree(s) do you want this to be applied to? thanks, greg k-h
On Wed, Aug 30, 2017 at 08:03:38AM +0200, Greg Kroah-Hartman wrote: > On Wed, Aug 30, 2017 at 01:06:09PM +0800, Leo Yan wrote: > > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream. > > > > Cortex-A73 (all versions) counter read can return a wrong value > > when the counter crosses a 32bit boundary. > > > > The workaround involves performing the read twice, and to return > > one or the other depending on whether a transition has taken place. > > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > > Signed-off-by: Leo Yan <leo.yan@linaro.org> > > --- > > arch/arm64/Kconfig | 12 ++++++++++++ > > arch/arm64/include/asm/arch_timer.h | 11 +++++++++++ > > 2 files changed, 23 insertions(+) > > Why [RESEND] ? In the first email your address and mailing list address are mixed, I am not sure if the patch has been sent out properly. So resend it again. Sorry for spamming. > And what stable kernel tree(s) do you want this to be applied to? This patch is target for kernel 4.4 and 4.9. Thanks, Leo Yan
+ Mark On 30/08/17 06:06, Leo Yan wrote: > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream. > > Cortex-A73 (all versions) counter read can return a wrong value > when the counter crosses a 32bit boundary. > > The workaround involves performing the read twice, and to return > one or the other depending on whether a transition has taken place. > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > Signed-off-by: Leo Yan <leo.yan@linaro.org> > --- > arch/arm64/Kconfig | 12 ++++++++++++ > arch/arm64/include/asm/arch_timer.h | 11 +++++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 14cdc6d..68e7c98 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419 > > If unsure, say Y. > > +config ARM64_ERRATUM_858921 > + bool "Cortex-A73: 858921: arch timer counter read can return a wrong value" > + default y > + depends on ARM_ARCH_TIMER && ARM64 > + help > + This option enables a workaround applicable to Cortex-A73 > + (all versions), whose counter may return incorrect values. > + The workaround will be dynamically enabled when an affected > + core is detected. > + > + If unsure, say Y. > + > config CAVIUM_ERRATUM_22375 > bool "Cavium erratum 22375, 24313" > default y > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h > index fbe0ca3..9b2b0f5 100644 > --- a/arch/arm64/include/asm/arch_timer.h > +++ b/arch/arm64/include/asm/arch_timer.h > @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void) > return 0; > } > > +#ifdef CONFIG_ARM64_ERRATUM_858921 > +static inline u64 arch_counter_get_cntvct(void) > +{ > + u64 old, new; > + > + asm volatile("mrs %0, cntvct_el0" : "=r" (old)); > + asm volatile("mrs %0, cntvct_el0" : "=r" (new)); > + return (((old ^ new) >> 32) & 1) ? old : new; > +} > +#else > static inline u64 arch_counter_get_cntvct(void) > { > u64 cval; > @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void) > > return cval; > } > +#endif > > static inline int arch_timer_arch_init(void) > { > This is completely busted: - You are only addressing the kernel side, and ignore userspace (which is just as broken as the kernel). - You lie in the description of the option (this is in no way dynamic, since you didn't backport the whole workaround infrastructure). So I'm afraid I'm NAKing this. Please refrain from blindly backporting random patches. fa8d815fac96e7c9 only makes sense in the context of the whole series, and on its own gives you a very false sense of having properly addressed it. Thanks, M. -- Jazz is not dead. It just smells funny...
Hi Marc, On Wed, Aug 30, 2017 at 09:02:47AM +0100, Marc Zyngier wrote: > + Mark > > On 30/08/17 06:06, Leo Yan wrote: > > commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream. > > > > Cortex-A73 (all versions) counter read can return a wrong value > > when the counter crosses a 32bit boundary. > > > > The workaround involves performing the read twice, and to return > > one or the other depending on whether a transition has taken place. > > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > > Signed-off-by: Leo Yan <leo.yan@linaro.org> > > --- > > arch/arm64/Kconfig | 12 ++++++++++++ > > arch/arm64/include/asm/arch_timer.h | 11 +++++++++++ > > 2 files changed, 23 insertions(+) > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > index 14cdc6d..68e7c98 100644 > > --- a/arch/arm64/Kconfig > > +++ b/arch/arm64/Kconfig > > @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419 > > > > If unsure, say Y. > > > > +config ARM64_ERRATUM_858921 > > + bool "Cortex-A73: 858921: arch timer counter read can return a wrong value" > > + default y > > + depends on ARM_ARCH_TIMER && ARM64 > > + help > > + This option enables a workaround applicable to Cortex-A73 > > + (all versions), whose counter may return incorrect values. > > + The workaround will be dynamically enabled when an affected > > + core is detected. > > + > > + If unsure, say Y. > > + > > config CAVIUM_ERRATUM_22375 > > bool "Cavium erratum 22375, 24313" > > default y > > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h > > index fbe0ca3..9b2b0f5 100644 > > --- a/arch/arm64/include/asm/arch_timer.h > > +++ b/arch/arm64/include/asm/arch_timer.h > > @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void) > > return 0; > > } > > > > +#ifdef CONFIG_ARM64_ERRATUM_858921 > > +static inline u64 arch_counter_get_cntvct(void) > > +{ > > + u64 old, new; > > + > > + asm volatile("mrs %0, cntvct_el0" : "=r" (old)); > > + asm volatile("mrs %0, cntvct_el0" : "=r" (new)); > > + return (((old ^ new) >> 32) & 1) ? old : new; > > +} > > +#else > > static inline u64 arch_counter_get_cntvct(void) > > { > > u64 cval; > > @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void) > > > > return cval; > > } > > +#endif > > > > static inline int arch_timer_arch_init(void) > > { > > > > This is completely busted: > > - You are only addressing the kernel side, and ignore userspace (which > is just as broken as the kernel). Thanks for quick reviewing. Yeah, I remembered there have some code directly reads arch timer virtual counter from userspace. But can you remind for userspace broken issue, is the code in libc or vdso? Here I have another question is: after applied the whole workaround infrastructure, can it also fix userspace broken issue? > - You lie in the description of the option (this is in no way dynamic, > since you didn't backport the whole workaround infrastructure). I should fix it. > So I'm afraid I'm NAKing this. Please refrain from blindly backporting > random patches. fa8d815fac96e7c9 only makes sense in the context of the > whole series, and on its own gives you a very false sense of having > properly addressed it. IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like for sched_clock() roll back issue [1]. So for this issue, are you suggesting we need backport whole workaround infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working with these two kernels. [1] https://lists.linaro.org/pipermail/eas-dev/2017-August/000941.html Thanks, Leo Yan
On 30/08/17 09:20, Leo Yan wrote: > Hi Marc, > > On Wed, Aug 30, 2017 at 09:02:47AM +0100, Marc Zyngier wrote: >> + Mark >> >> On 30/08/17 06:06, Leo Yan wrote: >>> commit fa8d815fac96e7c9247783d5a1f8fa4685b3c543 upstream. >>> >>> Cortex-A73 (all versions) counter read can return a wrong value >>> when the counter crosses a 32bit boundary. >>> >>> The workaround involves performing the read twice, and to return >>> one or the other depending on whether a transition has taken place. >>> >>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> >>> Signed-off-by: Leo Yan <leo.yan@linaro.org> >>> --- >>> arch/arm64/Kconfig | 12 ++++++++++++ >>> arch/arm64/include/asm/arch_timer.h | 11 +++++++++++ >>> 2 files changed, 23 insertions(+) >>> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >>> index 14cdc6d..68e7c98 100644 >>> --- a/arch/arm64/Kconfig >>> +++ b/arch/arm64/Kconfig >>> @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419 >>> >>> If unsure, say Y. >>> >>> +config ARM64_ERRATUM_858921 >>> + bool "Cortex-A73: 858921: arch timer counter read can return a wrong value" >>> + default y >>> + depends on ARM_ARCH_TIMER && ARM64 >>> + help >>> + This option enables a workaround applicable to Cortex-A73 >>> + (all versions), whose counter may return incorrect values. >>> + The workaround will be dynamically enabled when an affected >>> + core is detected. >>> + >>> + If unsure, say Y. >>> + >>> config CAVIUM_ERRATUM_22375 >>> bool "Cavium erratum 22375, 24313" >>> default y >>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h >>> index fbe0ca3..9b2b0f5 100644 >>> --- a/arch/arm64/include/asm/arch_timer.h >>> +++ b/arch/arm64/include/asm/arch_timer.h >>> @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void) >>> return 0; >>> } >>> >>> +#ifdef CONFIG_ARM64_ERRATUM_858921 >>> +static inline u64 arch_counter_get_cntvct(void) >>> +{ >>> + u64 old, new; >>> + >>> + asm volatile("mrs %0, cntvct_el0" : "=r" (old)); >>> + asm volatile("mrs %0, cntvct_el0" : "=r" (new)); >>> + return (((old ^ new) >> 32) & 1) ? old : new; >>> +} >>> +#else >>> static inline u64 arch_counter_get_cntvct(void) >>> { >>> u64 cval; >>> @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void) >>> >>> return cval; >>> } >>> +#endif >>> >>> static inline int arch_timer_arch_init(void) >>> { >>> >> >> This is completely busted: >> >> - You are only addressing the kernel side, and ignore userspace (which >> is just as broken as the kernel). > > Thanks for quick reviewing. > > Yeah, I remembered there have some code directly reads arch timer > virtual counter from userspace. But can you remind for userspace > broken issue, is the code in libc or vdso? You're missing the point. The virtual counter is freely available to userspace to play with (this is a de-facto ABI). Since it can return bad values, it needs to be trapped to be correctly emulated (together with cntfrq_el0), and the VDSO disabled. > Here I have another question is: after applied the whole workaround > infrastructure, can it also fix userspace broken issue? Yup. That's why I ended-up with a 18 patches series, and not just this single one. Trust me, I'm lazy. There is nothing I hate more than doing useless work. >> - You lie in the description of the option (this is in no way dynamic, >> since you didn't backport the whole workaround infrastructure). > > I should fix it. > >> So I'm afraid I'm NAKing this. Please refrain from blindly backporting >> random patches. fa8d815fac96e7c9 only makes sense in the context of the >> whole series, and on its own gives you a very false sense of having >> properly addressed it. > > IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like > for sched_clock() roll back issue [1]. What's the point of fixing the kernel if userspace is just as likely to fail? > So for this issue, are you suggesting we need backport whole workaround > infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working > with these two kernels. Then these systems are completely broken if they use a Cortex-A73. Either they run mainline (which will be just fine), or they get a fully backported workaround infrastructure. Thanks, M. -- Jazz is not dead. It just smells funny...
Hi Marc, On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote: [...] > >> This is completely busted: > >> > >> - You are only addressing the kernel side, and ignore userspace (which > >> is just as broken as the kernel). > > > > Thanks for quick reviewing. > > > > Yeah, I remembered there have some code directly reads arch timer > > virtual counter from userspace. But can you remind for userspace > > broken issue, is the code in libc or vdso? > > You're missing the point. The virtual counter is freely available to > userspace to play with (this is a de-facto ABI). Since it can return bad > values, it needs to be trapped to be correctly emulated (together with > cntfrq_el0), and the VDSO disabled. > > > Here I have another question is: after applied the whole workaround > > infrastructure, can it also fix userspace broken issue? > > Yup. That's why I ended-up with a 18 patches series, and not just this > single one. Trust me, I'm lazy. There is nothing I hate more than doing > useless work. Thanks for detailed explaination. Now it's much clear for me. > >> - You lie in the description of the option (this is in no way dynamic, > >> since you didn't backport the whole workaround infrastructure). > > > > I should fix it. > > > >> So I'm afraid I'm NAKing this. Please refrain from blindly backporting > >> random patches. fa8d815fac96e7c9 only makes sense in the context of the > >> whole series, and on its own gives you a very false sense of having > >> properly addressed it. > > > > IIUC, at least fa8d815fac96e7c9 can fix issue in kernel side, such like > > for sched_clock() roll back issue [1]. > > What's the point of fixing the kernel if userspace is just as likely to > fail? Understand now. > > So for this issue, are you suggesting we need backport whole workaround > > infrastructure onto kernel 4.4 and 4.9? Many ARM devices are working > > with these two kernels. > > Then these systems are completely broken if they use a Cortex-A73. > Either they run mainline (which will be just fine), or they get a fully > backported workaround infrastructure. Yeah, we should do right thing. Have ARM kernel team ported this patch series (or is in planning)? I also will check with Linaro kernel team as well, I just want to avoid duplicate efforts if these patches have been back ported. Otherwise, I will backport the patch series. Thanks, Leo Yan
Hi Leo, On 31/08/17 01:30, Leo Yan wrote: > Hi Marc, > > On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote: > > [...] > >> Then these systems are completely broken if they use a Cortex-A73. >> Either they run mainline (which will be just fine), or they get a fully >> backported workaround infrastructure. > > Yeah, we should do right thing. > > Have ARM kernel team ported this patch series (or is in planning)? I > also will check with Linaro kernel team as well, I just want to avoid > duplicate efforts if these patches have been back ported. Otherwise, I > will backport the patch series. I have no plan to backport this to older kernels. This is new HW that didn't exist (or wasn't publicly available) when 4.4/4.9 were released, so 4.12 is effectively the first kernel I intend to support this erratum on. If you have an interest in running older kernels on this HW, you'll have to do the backport yourself. Please make sure you Cc both Mark and myself on the patches. Thanks, M. -- Jazz is not dead. It just smells funny...
On Thu, Aug 31, 2017 at 09:30:29AM +0100, Marc Zyngier wrote: > Hi Leo, > > On 31/08/17 01:30, Leo Yan wrote: > > Hi Marc, > > > > On Wed, Aug 30, 2017 at 09:46:26AM +0100, Marc Zyngier wrote: > > > > [...] > > > >> Then these systems are completely broken if they use a Cortex-A73. > >> Either they run mainline (which will be just fine), or they get a fully > >> backported workaround infrastructure. > > > > Yeah, we should do right thing. > > > > Have ARM kernel team ported this patch series (or is in planning)? I > > also will check with Linaro kernel team as well, I just want to avoid > > duplicate efforts if these patches have been back ported. Otherwise, I > > will backport the patch series. > I have no plan to backport this to older kernels. This is new HW that > didn't exist (or wasn't publicly available) when 4.4/4.9 were released, > so 4.12 is effectively the first kernel I intend to support this erratum on. > > If you have an interest in running older kernels on this HW, you'll have > to do the backport yourself. Please make sure you Cc both Mark and > myself on the patches. Sure, will do this. Thanks, Leo Yan
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 14cdc6d..68e7c98 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -374,6 +374,18 @@ config ARM64_ERRATUM_843419 If unsure, say Y. +config ARM64_ERRATUM_858921 + bool "Cortex-A73: 858921: arch timer counter read can return a wrong value" + default y + depends on ARM_ARCH_TIMER && ARM64 + help + This option enables a workaround applicable to Cortex-A73 + (all versions), whose counter may return incorrect values. + The workaround will be dynamically enabled when an affected + core is detected. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h index fbe0ca3..9b2b0f5 100644 --- a/arch/arm64/include/asm/arch_timer.h +++ b/arch/arm64/include/asm/arch_timer.h @@ -114,6 +114,16 @@ static inline u64 arch_counter_get_cntpct(void) return 0; } +#ifdef CONFIG_ARM64_ERRATUM_858921 +static inline u64 arch_counter_get_cntvct(void) +{ + u64 old, new; + + asm volatile("mrs %0, cntvct_el0" : "=r" (old)); + asm volatile("mrs %0, cntvct_el0" : "=r" (new)); + return (((old ^ new) >> 32) & 1) ? old : new; +} +#else static inline u64 arch_counter_get_cntvct(void) { u64 cval; @@ -123,6 +133,7 @@ static inline u64 arch_counter_get_cntvct(void) return cval; } +#endif static inline int arch_timer_arch_init(void) {