diff mbox series

tick: prefer a lower rating device only if it's CPU local device

Message ID 1525881728-4858-1-git-send-email-sudeep.holla@arm.com
State Accepted
Commit 1332a90558013ae4242e3dd7934bdcdeafb06c0d
Headers show
Series tick: prefer a lower rating device only if it's CPU local device | expand

Commit Message

Sudeep Holla May 9, 2018, 4:02 p.m. UTC
Checking the equality of cpumask for both new and old tick device doesn't
ensure that it's CPU local device. This will cause issue if a low rating
clockevent tick device is registered first followed by the registration
of higher rating clockevent tick device.

In such case, clockevents_released list will never get emptied as both
the devices get selected as preferred one and we will loop forever in
clockevents_notify_released.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

---
 kernel/time/tick-common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Hi Thomas,

I am seeing this issue on my Juno devboard, where system wide timers
with rating 300 and 400 are registered in same order and we get stuck in
a loop in clockevents_notify_released. Let me know if this looks sane or
you have any suggestions that I can try out.

Regards,
Sudeep

--
2.7.4

Comments

Kevin Hilman July 2, 2018, 11:44 p.m. UTC | #1
Hi Sudeep,

On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>

> Checking the equality of cpumask for both new and old tick device doesn't

> ensure that it's CPU local device. This will cause issue if a low rating

> clockevent tick device is registered first followed by the registration

> of higher rating clockevent tick device.

>

> In such case, clockevents_released list will never get emptied as both

> the devices get selected as preferred one and we will loop forever in

> clockevents_notify_released.

>

> Cc: Frederic Weisbecker <fweisbec@gmail.com>

> Cc: Thomas Gleixner <tglx@linutronix.de>

> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>


I've got a arm32 board (meson8b-odroidc1) that's been failing in
kernelCI.org since the merge window (boot log[1]), and I finally got
around to bisecting it[2].  Unfortunately, the bisect pointed at a
merge commit, but with some trial and error (and a suggestion by Arnd)
I was able to test that revering $SUBJECT commit[3], my problem goes
away.

Another interesting data point is that disabling SMP (either by
"nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go
away, without needing to revert this patch.

AFAICT, this platform, is using a single timer as a clocksource
("amlogic,meson6-timer") which is not a per-CPU timer.

I ran out of time to keep digging on this issue, and I'm still not
sure exactly what's going on, but I wanted to report it in case anyone
else has any ideas, and so we can hopefully get it fixed during the
-rc cycle.

Kevin

[1] https://storage.kernelci.org/mainline/master/v4.18-rc2-357-gd3bc0e67f852/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-meson8b-odroidc1.html
[2] http://termbin.com/mk07
[3] in mainline as: 1332a9055801 tick: Prefer a lower rating device
only if it's CPU local device

> ---

>  kernel/time/tick-common.c | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

>

> Hi Thomas,

>

> I am seeing this issue on my Juno devboard, where system wide timers

> with rating 300 and 400 are registered in same order and we get stuck in

> a loop in clockevents_notify_released. Let me know if this looks sane or

> you have any suggestions that I can try out.

>

> Regards,

> Sudeep

>

> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c

> index 49edc1c4f3e6..78e598334007 100644

> --- a/kernel/time/tick-common.c

> +++ b/kernel/time/tick-common.c

> @@ -277,7 +277,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,

>          */

>         return !curdev ||

>                 newdev->rating > curdev->rating ||

> -              !cpumask_equal(curdev->cpumask, newdev->cpumask);

> +              (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&

> +               !tick_check_percpu(curdev, newdev, smp_processor_id()));

>  }

>

>  /*

> --

> 2.7.4

>
Sudeep Holla July 3, 2018, 10:53 a.m. UTC | #2
On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:
> Hi Sudeep,

> 

> On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:

> >

> > Checking the equality of cpumask for both new and old tick device doesn't

> > ensure that it's CPU local device. This will cause issue if a low rating

> > clockevent tick device is registered first followed by the registration

> > of higher rating clockevent tick device.

> >

> > In such case, clockevents_released list will never get emptied as both

> > the devices get selected as preferred one and we will loop forever in

> > clockevents_notify_released.

> >

> > Cc: Frederic Weisbecker <fweisbec@gmail.com>

> > Cc: Thomas Gleixner <tglx@linutronix.de>

> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

> 

> I've got a arm32 board (meson8b-odroidc1) that's been failing in

> kernelCI.org since the merge window (boot log[1]), and I finally got

> around to bisecting it[2].  Unfortunately, the bisect pointed at a

> merge commit, but with some trial and error (and a suggestion by Arnd)

> I was able to test that revering $SUBJECT commit[3], my problem goes

> away.

>


Interesting. Sorry for causing the regression.

> Another interesting data point is that disabling SMP (either by

> "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go

> away, without needing to revert this patch.

>


I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets
disabled. dummy_timer won't be registered I assume.

I am not sure if dummy_timer is selected as it's per_cpu but the rating
is low anyways.

> AFAICT, this platform, is using a single timer as a clocksource

> ("amlogic,meson6-timer") which is not a per-CPU timer.

>


Yes that's what I could gather from DT. But this is A5 right ? It may
have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be
fine.

> I ran out of time to keep digging on this issue, and I'm still not

> sure exactly what's going on, but I wanted to report it in case anyone

> else has any ideas, and so we can hopefully get it fixed during the

> -rc cycle.

>


From the log, it looks like the platform has booted to userspace. Any chance
we can have a look at:
$ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*

--
Regards,
Sudeep
Kevin Hilman July 3, 2018, 3:04 p.m. UTC | #3
On Tue, Jul 3, 2018 at 3:54 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>

> On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:

> > Hi Sudeep,

> >

> > On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:

> > >

> > > Checking the equality of cpumask for both new and old tick device doesn't

> > > ensure that it's CPU local device. This will cause issue if a low rating

> > > clockevent tick device is registered first followed by the registration

> > > of higher rating clockevent tick device.

> > >

> > > In such case, clockevents_released list will never get emptied as both

> > > the devices get selected as preferred one and we will loop forever in

> > > clockevents_notify_released.

> > >

> > > Cc: Frederic Weisbecker <fweisbec@gmail.com>

> > > Cc: Thomas Gleixner <tglx@linutronix.de>

> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

> >

> > I've got a arm32 board (meson8b-odroidc1) that's been failing in

> > kernelCI.org since the merge window (boot log[1]), and I finally got

> > around to bisecting it[2].  Unfortunately, the bisect pointed at a

> > merge commit, but with some trial and error (and a suggestion by Arnd)

> > I was able to test that revering $SUBJECT commit[3], my problem goes

> > away.

> >

>

> Interesting. Sorry for causing the regression.

>

> > Another interesting data point is that disabling SMP (either by

> > "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go

> > away, without needing to revert this patch.

> >

>

> I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets

> disabled. dummy_timer won't be registered I assume.

>

> I am not sure if dummy_timer is selected as it's per_cpu but the rating

> is low anyways.


> > AFAICT, this platform, is using a single timer as a clocksource

> > ("amlogic,meson6-timer") which is not a per-CPU timer.

> >

>

> Yes that's what I could gather from DT. But this is A5 right ? It may

> have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be

> fine.

>

> > I ran out of time to keep digging on this issue, and I'm still not

> > sure exactly what's going on, but I wanted to report it in case anyone

> > else has any ideas, and so we can hopefully get it fixed during the

> > -rc cycle.

> >

>

> From the log, it looks like the platform has booted to userspace. Any chance

> we can have a look at:

> $ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*


In the failing case, it doesn't boot to a shell, so I can't do that,
but after I revert the patch, I have this:

/ # ls -l /sys/devices/system/clocksource
total 0
drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0
drwxr-xr-x    2 root     root             0 Jan  1 00:00 power
-rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent
/ # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
timer jiffies
/ # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
timer

/ # cat /sys/devices/system/clockevents/broadcast/current_device
meson6_tick
/ # cat /sys/devices/system/clockevents/clockevent0/current_device
dummy_timer
/ # cat /sys/devices/system/clockevents/clockevent1/current_device
dummy_timer
/ # cat /sys/devices/system/clockevents/clockevent2/current_device
dummy_timer

Kevin
Sudeep Holla July 3, 2018, 3:44 p.m. UTC | #4
On Tue, Jul 03, 2018 at 08:04:37AM -0700, Kevin Hilman wrote:
> On Tue, Jul 3, 2018 at 3:54 AM Sudeep Holla <sudeep.holla@arm.com> wrote:

> >

> > On Mon, Jul 02, 2018 at 04:44:33PM -0700, Kevin Hilman wrote:

> > > Hi Sudeep,

> > >

> > > On Wed, May 9, 2018 at 9:02 AM Sudeep Holla <sudeep.holla@arm.com> wrote:

> > > >

> > > > Checking the equality of cpumask for both new and old tick device doesn't

> > > > ensure that it's CPU local device. This will cause issue if a low rating

> > > > clockevent tick device is registered first followed by the registration

> > > > of higher rating clockevent tick device.

> > > >

> > > > In such case, clockevents_released list will never get emptied as both

> > > > the devices get selected as preferred one and we will loop forever in

> > > > clockevents_notify_released.

> > > >

> > > > Cc: Frederic Weisbecker <fweisbec@gmail.com>

> > > > Cc: Thomas Gleixner <tglx@linutronix.de>

> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

> > >

> > > I've got a arm32 board (meson8b-odroidc1) that's been failing in

> > > kernelCI.org since the merge window (boot log[1]), and I finally got

> > > around to bisecting it[2].  Unfortunately, the bisect pointed at a

> > > merge commit, but with some trial and error (and a suggestion by Arnd)

> > > I was able to test that revering $SUBJECT commit[3], my problem goes

> > > away.

> > >

> >

> > Interesting. Sorry for causing the regression.

> >

> > > Another interesting data point is that disabling SMP (either by

> > > "nosmp" on the command-line or CONFIG_SMP=n) also makes the problem go

> > > away, without needing to revert this patch.

> > >

> >

> > I am not sure of nosmp, but with CONFIG_SMP=n, TICK_BROADCAST also gets

> > disabled. dummy_timer won't be registered I assume.

> >

> > I am not sure if dummy_timer is selected as it's per_cpu but the rating

> > is low anyways.

> 

> > > AFAICT, this platform, is using a single timer as a clocksource

> > > ("amlogic,meson6-timer") which is not a per-CPU timer.

> > >

> >

> > Yes that's what I could gather from DT. But this is A5 right ? It may

> > have per CPU TWD(watchdof timer) but DT doesn't specify it, so should be

> > fine.

> >

> > > I ran out of time to keep digging on this issue, and I'm still not

> > > sure exactly what's going on, but I wanted to report it in case anyone

> > > else has any ideas, and so we can hopefully get it fixed during the

> > > -rc cycle.

> > >

> >

> > From the log, it looks like the platform has booted to userspace. Any chance

> > we can have a look at:

> > $ grep "" /sys/devices/system/clock*/{broadcast,clock*}/{available,current}_*

> 

> In the failing case, it doesn't boot to a shell, so I can't do that,

> but after I revert the patch, I have this:

>


Ah ok, does it hang when it registers clockevents ?

> / # ls -l /sys/devices/system/clocksource

> total 0

> drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0

> drwxr-xr-x    2 root     root             0 Jan  1 00:00 power

> -rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent

> / # cat /sys/devices/system/clocksource/clocksource0/available_clocksource

> timer jiffies


Looks good.

> / # cat /sys/devices/system/clocksource/clocksource0/current_clocksource

> timer

>


OK, meson6 clocksource is active

> / # cat /sys/devices/system/clockevents/broadcast/current_device

> meson6_tick


OK, it can support broadcast

> / # cat /sys/devices/system/clockevents/clockevent0/current_device

> dummy_timer

> / # cat /sys/devices/system/clockevents/clockevent1/current_device

> dummy_timer

> / # cat /sys/devices/system/clockevents/clockevent2/current_device

> dummy_timer


But I can't understand why is dummy_timer the active event source and
not meson6_tick. And you say this is working case ? Looks suspicious.

If dummy_timer was getting used, I think meson6_tick was never utilised
before as I see this platform doesn't have cpuidle(at-least from DT)

--
Regards,
Sudeep
Thomas Gleixner July 3, 2018, 4:08 p.m. UTC | #5
On Tue, 3 Jul 2018, Sudeep Holla wrote:
> On Tue, Jul 03, 2018 at 08:04:37AM -0700, Kevin Hilman wrote:

> > / # ls -l /sys/devices/system/clocksource

> > total 0

> > drwxr-xr-x    3 root     root             0 Jan  1 00:00 clocksource0

> > drwxr-xr-x    2 root     root             0 Jan  1 00:00 power

> > -rw-r--r--    1 root     root          4096 Jan  1 00:00 uevent

> > / # cat /sys/devices/system/clocksource/clocksource0/available_clocksource

> > timer jiffies

> 

> Looks good.

> 

> > / # cat /sys/devices/system/clocksource/clocksource0/current_clocksource

> > timer

> >

> 

> OK, meson6 clocksource is active

> 

> > / # cat /sys/devices/system/clockevents/broadcast/current_device

> > meson6_tick

> 

> OK, it can support broadcast

> 

> > / # cat /sys/devices/system/clockevents/clockevent0/current_device

> > dummy_timer

> > / # cat /sys/devices/system/clockevents/clockevent1/current_device

> > dummy_timer

> > / # cat /sys/devices/system/clockevents/clockevent2/current_device

> > dummy_timer

> 

> But I can't understand why is dummy_timer the active event source and

> not meson6_tick. And you say this is working case ? Looks suspicious.


Because if it switches to broadcast mode then the meson timer cannot longer
be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.

Thanks,

	tglx
Sudeep Holla July 3, 2018, 4:48 p.m. UTC | #6
Hi Thomas,

On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:

[...]

> > > / # cat /sys/devices/system/clockevents/broadcast/current_device

> > > meson6_tick

> >

> > OK, it can support broadcast

> >

> > > / # cat /sys/devices/system/clockevents/clockevent0/current_device

> > > dummy_timer

> > > / # cat /sys/devices/system/clockevents/clockevent1/current_device

> > > dummy_timer

> > > / # cat /sys/devices/system/clockevents/clockevent2/current_device

> > > dummy_timer

> >

> > But I can't understand why is dummy_timer the active event source and

> > not meson6_tick. And you say this is working case ? Looks suspicious.

>

> Because if it switches to broadcast mode then the meson timer cannot longer

> be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.


Thanks for the explanation. I completely misread the sysfs entry and
assume clockevent_register failed for meson6 and hence regarded as
suspicious which is complete non-sense, my bad. Sorry for that.
I think I now understand the issue.

1. Juno usecase for which $subject was added as fix:

Two system wide timers(cpumask=possible cpus) with rating 300 and 400.
When second one with 400 is added, timer with rating 300 is added to
released list and again added back to main one. In this case both were
chosen as preferred and that resulted in deadlock.

2. Meson6 usecase:

When meson6_tick is added, it's set as preferred and dummy_timer is released.
When it's being added back from the released list, it will be chosen as
preferred as it's per_cpu resulting in deadlock.

I am not sure how to fix this. Should the fix to my original problem have
checks for both old and new for per-cpu  to prevent the issue reported on
Meson6

--
Regards,
Sudeep
Martin Blumenstingl July 8, 2018, 8:59 p.m. UTC | #7
Hi Thomas,

On Tue, Jul 3, 2018 at 6:48 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>

> Hi Thomas,

>

> On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:

>

> [...]

>

> > > > / # cat /sys/devices/system/clockevents/broadcast/current_device

> > > > meson6_tick

> > >

> > > OK, it can support broadcast

> > >

> > > > / # cat /sys/devices/system/clockevents/clockevent0/current_device

> > > > dummy_timer

> > > > / # cat /sys/devices/system/clockevents/clockevent1/current_device

> > > > dummy_timer

> > > > / # cat /sys/devices/system/clockevents/clockevent2/current_device

> > > > dummy_timer

> > >

> > > But I can't understand why is dummy_timer the active event source and

> > > not meson6_tick. And you say this is working case ? Looks suspicious.

> >

> > Because if it switches to broadcast mode then the meson timer cannot longer

> > be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.

>

> Thanks for the explanation. I completely misread the sysfs entry and

> assume clockevent_register failed for meson6 and hence regarded as

> suspicious which is complete non-sense, my bad. Sorry for that.

> I think I now understand the issue.

>

> 1. Juno usecase for which $subject was added as fix:

>

> Two system wide timers(cpumask=possible cpus) with rating 300 and 400.

> When second one with 400 is added, timer with rating 300 is added to

> released list and again added back to main one. In this case both were

> chosen as preferred and that resulted in deadlock.

>

> 2. Meson6 usecase:

>

> When meson6_tick is added, it's set as preferred and dummy_timer is released.

> When it's being added back from the released list, it will be chosen as

> preferred as it's per_cpu resulting in deadlock.

>

> I am not sure how to fix this. Should the fix to my original problem have

> checks for both old and new for per-cpu  to prevent the issue reported on

> Meson6

could you please answer Sudeep's question?

in the meantime I checked Sudpeed's suggestion regarding the TWD
timer: it seems that the Meson8 (Cortex-A9) and Meson8b (Cortex-A5)
SoCs have the ARM TWD (timer watchdog) built-in - Carlo sent a patch
for this a long time ago: [0].
however, I figured out that this doesn't work out-of-the-box anymore:
the TWD seems to be supplied (according to my own tests) by CPU_CLK
div 16 and the interrupt should be IRQ_TYPE_EDGE_RISING
unfortunately we cannot simply add the TWD timer to Meson8 or Meson8b
because this would first require changes to the clock controller
driver (the are currently registered as platform driver, which is too
late for the TWD timer driver)

in other words: we cannot use the TWD timer on the Meson platform in
the v4.18 cycle, so I would prefer a fix of the timer/tick code


Regards
Martin


[0] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-December/391928.html
Sudeep Holla July 9, 2018, 3:12 p.m. UTC | #8
On 08/07/18 21:59, Martin Blumenstingl wrote:
> Hi Thomas,

> 

> On Tue, Jul 3, 2018 at 6:48 PM Sudeep Holla <sudeep.holla@arm.com> wrote:

>>

>> Hi Thomas,

>>

>> On Tue, Jul 03, 2018 at 06:08:19PM +0200, Thomas Gleixner wrote:

>>

>> [...]

>>

>>>>> / # cat /sys/devices/system/clockevents/broadcast/current_device

>>>>> meson6_tick

>>>>

>>>> OK, it can support broadcast

>>>>

>>>>> / # cat /sys/devices/system/clockevents/clockevent0/current_device

>>>>> dummy_timer

>>>>> / # cat /sys/devices/system/clockevents/clockevent1/current_device

>>>>> dummy_timer

>>>>> / # cat /sys/devices/system/clockevents/clockevent2/current_device

>>>>> dummy_timer

>>>>

>>>> But I can't understand why is dummy_timer the active event source and

>>>> not meson6_tick. And you say this is working case ? Looks suspicious.

>>>

>>> Because if it switches to broadcast mode then the meson timer cannot longer

>>> be used as per cpu timer. It's broadcasting to all CPUs via the dummy timer.

>>

>> Thanks for the explanation. I completely misread the sysfs entry and

>> assume clockevent_register failed for meson6 and hence regarded as

>> suspicious which is complete non-sense, my bad. Sorry for that.

>> I think I now understand the issue.

>>

>> 1. Juno usecase for which $subject was added as fix:

>>

>> Two system wide timers(cpumask=possible cpus) with rating 300 and 400.

>> When second one with 400 is added, timer with rating 300 is added to

>> released list and again added back to main one. In this case both were

>> chosen as preferred and that resulted in deadlock.

>>

>> 2. Meson6 usecase:

>>

>> When meson6_tick is added, it's set as preferred and dummy_timer is released.

>> When it's being added back from the released list, it will be chosen as

>> preferred as it's per_cpu resulting in deadlock.

>>

>> I am not sure how to fix this. Should the fix to my original problem have

>> checks for both old and new for per-cpu  to prevent the issue reported on

>> Meson6

> could you please answer Sudeep's question?

> 


OK, I think I have misunderstood my original problem because of the
cpumask_equal for nr_bits <= BITS_PER_LONG. It uses nr_cpumask_bits
which is NR_CPUS when CPUMASK_OFFSTACK is not enabled.

Enabling it fixes my original problem(reverting $subject patch). So it
should be reverted. And also pointed out the issue with ARM mem timer.

I am sorry for the mess, I will post the revert and along with the fix
to my issue. Once again sorry for not understanding the root cause
correctly.
-- 
Regards,
Sudeep
diff mbox series

Patch

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 49edc1c4f3e6..78e598334007 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -277,7 +277,8 @@  static bool tick_check_preferred(struct clock_event_device *curdev,
 	 */
 	return !curdev ||
 		newdev->rating > curdev->rating ||
-	       !cpumask_equal(curdev->cpumask, newdev->cpumask);
+	       (!cpumask_equal(curdev->cpumask, newdev->cpumask) &&
+	        !tick_check_percpu(curdev, newdev, smp_processor_id()));
 }

 /*