PM-runtime: fix deadlock with ktime

Message ID 1548836194-15264-1-git-send-email-vincent.guittot@linaro.org
State Accepted
Commit 15efb47dc560849d0c07db96fdad5121f2cf736e
Series PM-runtime: fix deadlock with ktime

Commit Message

Vincent Guittot Jan. 30, 2019, 8:16 a.m. UTC
A deadlock has been seen when switching clocksources that use PM runtime.
The call path is:
change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
        ...
        sh_cmt_clocksource_enable
            ...
            rpm_resume
                pm_runtime_mark_last_busy
                    ktime_get
                        do
                            read_seqcount_begin
                        while read_seqcount_retry
    ....
    write_seqcount_end

Although this should be safe, because the clocksource has not actually
been changed at that point, the seqcount protection makes ktime_get()
spin until the write section is closed, which never happens here, so
the CPU deadlocks.

Use ktime_get_mono_fast_ns() instead, which is NMI safe and does not
spin on the timekeeping seqcount, so it can be called safely in this
context.
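
To make the failure concrete: a seqcount reader retries until it sees an
even sequence count, so a reader invoked from inside an open write section
on the same CPU can never complete. Below is a minimal userspace mock of
that pattern (illustration only, not the kernel implementation; the names
merely mirror the kernel API):

/*
 * Minimal userspace mock of the seqcount read/retry pattern
 * (illustration only -- not the kernel implementation).
 */
#include <stdio.h>
#include <stdlib.h>

static unsigned int seq;			/* odd = write in progress */

static void write_seqcount_begin(void) { seq++; }
static void write_seqcount_end(void)   { seq++; }

static unsigned int read_seqcount_begin(void)
{
	unsigned long spins = 0;

	/* A real reader spins here until the writer closes the section. */
	while (seq & 1) {
		if (++spins > 1000000) {
			puts("reader cannot make progress: the writer is this very context");
			exit(1);
		}
	}
	return seq;
}

static int read_seqcount_retry(unsigned int start) { return seq != start; }

static unsigned long long tk_mono_ns = 1000;	/* fake timekeeper state */

/* Rough shape of ktime_get(): read loop protected by the seqcount. */
static unsigned long long ktime_get_mock(void)
{
	unsigned int start;
	unsigned long long now;

	do {
		start = read_seqcount_begin();
		now = tk_mono_ns;
	} while (read_seqcount_retry(start));

	return now;
}

int main(void)
{
	write_seqcount_begin();	/* change_clocksource() opens the write section */
	ktime_get_mock();	/* ...and then reaches ktime_get() via rpm_resume() */
	write_seqcount_end();	/* never reached on this path */
	return 0;
}

In this mock the reader gives up after a bounded number of spins; the real
ktime_get() has no such bound and simply hangs.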

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 drivers/base/power/runtime.c | 10 +++++-----
 include/linux/pm_runtime.h   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

-- 
2.7.4

Comments

Vincent Guittot Jan. 30, 2019, 9:14 a.m. UTC | #1
Hi Geert,

On Wed, 30 Jan 2019 at 09:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Vincent,
>
> On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > The call path is:
> > change_clocksource
> >     ...
> >     write_seqcount_begin
> >     ...
> >     timekeeping_update
> >         ...
> >         sh_cmt_clocksource_enable
> >             ...
> >             rpm_resume
> >                 pm_runtime_mark_last_busy
> >                     ktime_get
> >                         do
> >                             read_seqcount_begin
> >                         while read_seqcount_retry
> >     ....
> >     write_seqcount_end
> >
> > Although we should be safe because we haven't yet changed the clocksource
> > at that time, we can't because of seqcount protection.
> >
> > Use ktime_get_mono_fast_ns instead which is lock safe for such case
> >
> > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> Thanks for your patch!
>
> /**
>  * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
>  *
>  * This timestamp is not guaranteed to be monotonic across an update.
>  * The timestamp is calculated by:
>  *
>  *      now = base_mono + clock_delta * slope
>  *
>  * So if the update lowers the slope, readers who are forced to the
>  * not yet updated second array are still using the old steeper slope.
>  *
>  * tmono
>  * ^
>  * |    o  n
>  * |   o n
>  * |  u
>  * | o
>  * |o
>  * |12345678---> reader order
>  *
>  * o = old slope
>  * u = update
>  * n = new slope
>  *
>  * So reader 6 will observe time going backwards versus reader 5.
>  *
>  * While other CPUs are likely to be able observe that, the only way
>  * for a CPU local observation is when an NMI hits in the middle of
>  * the update. Timestamps taken from that NMI context might be ahead
>  * of the following timestamps. Callers need to be aware of that and
>  * deal with it.
>  */
>
> As this function is not guaranteed to be monotonic, have you checked how
> the Runtime PM code behaves if time goes backwards? Does it just make
> a suboptimal decision or does it crash?


As a worst case this will generate a suboptimal decision around the update

Regards,
Vincent
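
Concretely, the worst case can be pictured with a simplified standalone
model of the check in pm_runtime_autosuspend_expiration() (illustration
only, not the kernel code): if the fast accessor lags by a few nanoseconds
across an update, the expiration test is off by that error and the
autosuspend is merely deferred by that amount; nothing underflows or
crashes.

/*
 * Simplified model of the expiration check in
 * pm_runtime_autosuspend_expiration() -- illustration only.
 */
#include <stdio.h>

#define NSEC_PER_MSEC 1000000ULL

static unsigned long long expiration(unsigned long long last_busy,
				     unsigned long long delay_ms,
				     unsigned long long now)
{
	unsigned long long expires = last_busy + delay_ms * NSEC_PER_MSEC;

	return expires > now ? expires : 0;	/* 0 means "already expired" */
}

int main(void)
{
	unsigned long long last_busy = 5000 * NSEC_PER_MSEC;
	unsigned long long now = last_busy + 10 * NSEC_PER_MSEC;	/* delay elapsed */
	unsigned long long stale = now - 200;	/* fast clock lagging by 200 ns */

	/* Accurate "now": the delay has expired, the suspend can go ahead. */
	printf("expires (accurate now): %llu\n", expiration(last_busy, 10, now));
	/* Lagging "now": the suspend is deferred by roughly 200 ns, nothing breaks. */
	printf("expires (lagging now):  %llu\n", expiration(last_busy, 10, stale));
	return 0;
}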

>
> Thanks!
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Vincent Guittot Jan. 30, 2019, 9:41 a.m. UTC | #2
On Wed, 30 Jan 2019 at 10:39, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Jan 30, 2019 at 10:14 AM Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > Hi Geert,
> >
> > On Wed, 30 Jan 2019 at 09:21, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > >
> > > Hi Vincent,
> > >
> > > On Wed, Jan 30, 2019 at 9:16 AM Vincent Guittot
> > > <vincent.guittot@linaro.org> wrote:
> > > > A deadlock has been seen when swicthing clocksources which use PM runtime.
> > > > The call path is:
> > > > change_clocksource
> > > >     ...
> > > >     write_seqcount_begin
> > > >     ...
> > > >     timekeeping_update
> > > >         ...
> > > >         sh_cmt_clocksource_enable
> > > >             ...
> > > >             rpm_resume
> > > >                 pm_runtime_mark_last_busy
> > > >                     ktime_get
> > > >                         do
> > > >                             read_seqcount_begin
> > > >                         while read_seqcount_retry
> > > >     ....
> > > >     write_seqcount_end
> > > >
> > > > Although we should be safe because we haven't yet changed the clocksource
> > > > at that time, we can't because of seqcount protection.
> > > >
> > > > Use ktime_get_mono_fast_ns instead which is lock safe for such case
> > > >
> > > > Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
> > > > Reported-by: Biju Das <biju.das@bp.renesas.com>
> > > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > >
> > > Thanks for your patch!
> > >
> > > /**
> > >  * ktime_get_mono_fast_ns - Fast NMI safe access to clock monotonic
> > >  *
> > >  * This timestamp is not guaranteed to be monotonic across an update.
> > >  * The timestamp is calculated by:
> > >  *
> > >  *      now = base_mono + clock_delta * slope
> > >  *
> > >  * So if the update lowers the slope, readers who are forced to the
> > >  * not yet updated second array are still using the old steeper slope.
> > >  *
> > >  * tmono
> > >  * ^
> > >  * |    o  n
> > >  * |   o n
> > >  * |  u
> > >  * | o
> > >  * |o
> > >  * |12345678---> reader order
> > >  *
> > >  * o = old slope
> > >  * u = update
> > >  * n = new slope
> > >  *
> > >  * So reader 6 will observe time going backwards versus reader 5.
> > >  *
> > >  * While other CPUs are likely to be able observe that, the only way
> > >  * for a CPU local observation is when an NMI hits in the middle of
> > >  * the update. Timestamps taken from that NMI context might be ahead
> > >  * of the following timestamps. Callers need to be aware of that and
> > >  * deal with it.
> > >  */
> > >
> > > As this function is not guaranteed to be monotonic, have you checked how
> > > the Runtime PM code behaves if time goes backwards? Does it just make
> > > a suboptimal decision or does it crash?
> >
> > As a worst case this will generate a suboptimal decision around the update
>
> So that should be explained in the changelog of the patch.  In detail,
> if poss, please.


Ok, I'm going to update the commit message

Patch

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 457be03..708a13f 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -130,7 +130,7 @@  u64 pm_runtime_autosuspend_expiration(struct device *dev)
 {
 	int autosuspend_delay;
 	u64 last_busy, expires = 0;
-	u64 now = ktime_to_ns(ktime_get());
+	u64 now = ktime_get_mono_fast_ns();
 
 	if (!dev->power.use_autosuspend)
 		goto out;
@@ -909,7 +909,7 @@  static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 	 * If 'expires' is after the current time, we've been called
 	 * too early.
 	 */
-	if (expires > 0 && expires < ktime_to_ns(ktime_get())) {
+	if (expires > 0 && expires < ktime_get_mono_fast_ns()) {
 		dev->power.timer_expires = 0;
 		rpm_suspend(dev, dev->power.timer_autosuspends ?
 		    (RPM_ASYNC | RPM_AUTO) : RPM_ASYNC);
@@ -928,7 +928,7 @@  static enum hrtimer_restart  pm_suspend_timer_fn(struct hrtimer *timer)
 int pm_schedule_suspend(struct device *dev, unsigned int delay)
 {
 	unsigned long flags;
-	ktime_t expires;
+	u64 expires;
 	int retval;
 
 	spin_lock_irqsave(&dev->power.lock, flags);
@@ -945,8 +945,8 @@  int pm_schedule_suspend(struct device *dev, unsigned int delay)
 	/* Other scheduled or pending requests need to be canceled. */
 	pm_runtime_cancel_pending(dev);
 
-	expires = ktime_add(ktime_get(), ms_to_ktime(delay));
-	dev->power.timer_expires = ktime_to_ns(expires);
+	expires = ktime_get_mono_fast_ns() + (u64)delay * NSEC_PER_MSEC;
+	dev->power.timer_expires = expires;
 	dev->power.timer_autosuspends = 0;
 	hrtimer_start(&dev->power.suspend_timer, expires, HRTIMER_MODE_ABS);
 
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 54af4ee..fed5be7 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -105,7 +105,7 @@  static inline bool pm_runtime_callbacks_present(struct device *dev)
 
 static inline void pm_runtime_mark_last_busy(struct device *dev)
 {
-	WRITE_ONCE(dev->power.last_busy, ktime_to_ns(ktime_get()));
+	WRITE_ONCE(dev->power.last_busy, ktime_get_mono_fast_ns());
 }
 
 static inline bool pm_runtime_is_irq_safe(struct device *dev)
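
For readers unfamiliar with the hrtimer usage touched by the
pm_schedule_suspend() hunk above: the patched code computes an absolute
expiry in raw monotonic nanoseconds and arms the timer in absolute mode.
A rough userspace analogue of that arithmetic (illustration only; it
substitutes POSIX clock_nanosleep() for hrtimer_start()):

/*
 * Userspace analogue of the pm_schedule_suspend() change above
 * (illustration only; POSIX timers instead of hrtimers).
 */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

#define NSEC_PER_MSEC 1000000ULL
#define NSEC_PER_SEC  1000000000ULL

static unsigned long long monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long long)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

int main(void)
{
	unsigned int delay = 250;	/* "autosuspend" delay in ms */
	/* Same arithmetic as the patch: absolute expiry in raw monotonic ns. */
	unsigned long long expires = monotonic_ns() + (unsigned long long)delay * NSEC_PER_MSEC;
	struct timespec until = {
		.tv_sec  = expires / NSEC_PER_SEC,
		.tv_nsec = expires % NSEC_PER_SEC,
	};

	/* Counterpart of hrtimer_start(..., expires, HRTIMER_MODE_ABS). */
	clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &until, NULL);
	printf("expired at %llu ns (target %llu ns)\n", monotonic_ns(), expires);
	return 0;
}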