
[1/2] hrtimer: reprogram event for expires=KTIME_MAX in hrtimer_force_reprogram()

Message ID 189ea71f724d00995a73e63c8ed9ff5b7857a69d.1399623699.git.viresh.kumar@linaro.org
State New

Commit Message

Viresh Kumar May 9, 2014, 8:40 a.m. UTC
In hrtimer_force_reprogram(), we reprogram the event device only if the next
timer event is before KTIME_MAX. But what if it is equal to KTIME_MAX? Since we
don't reprogram the device in that case, it stays set to its last programmed
value, probably the tick interval, i.e. a few milliseconds.

We then get an interrupt at that time, find no hrtimers to service, and return
without doing much. The event device's driver may make this worse. For example,
drivers/clocksource/arm_arch_timer.c disables the event device only on
SHUTDOWN/UNUSED requests in set-mode. Otherwise, it keeps raising interrupts at
the tick interval even if hrtimer_interrupt() didn't reprogram the tick.

To fix this, let's reprogram the event device even for KTIME_MAX, so that the
timer is scheduled far enough in the future.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/hrtimer.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
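
To illustrate the behaviour change described in the commit message, here is a
minimal user-space sketch. It is not kernel code: struct mock_event_dev,
mock_program_event() and the two force_reprogram_*() helpers are hypothetical
stand-ins for the clock event device, tick_program_event() and
hrtimer_force_reprogram(). Only the KTIME_MAX check mirrors the patch.

```c
#include <stdint.h>

/* "No timer pending" sentinel; the largest signed 64-bit value. */
#define KTIME_MAX INT64_MAX

/* Hypothetical stand-in for a clock event device: it only remembers the
 * last expiry that was programmed into it. */
struct mock_event_dev {
	int64_t programmed_expiry;
	unsigned int program_calls;
};

/* Hypothetical stand-in for tick_program_event(). */
static void mock_program_event(struct mock_event_dev *dev, int64_t expires)
{
	dev->programmed_expiry = expires;
	dev->program_calls++;
}

/* Before the patch: nothing is programmed when no timer is pending, so the
 * device keeps whatever expiry it had (e.g. the old tick). */
static void force_reprogram_old(struct mock_event_dev *dev, int64_t expires_next)
{
	if (expires_next != KTIME_MAX)
		mock_program_event(dev, expires_next);
}

/* After the patch: the device is always reprogrammed, even for KTIME_MAX. */
static void force_reprogram_new(struct mock_event_dev *dev, int64_t expires_next)
{
	mock_program_event(dev, expires_next);
}
```

With a device left at a stale expiry, force_reprogram_old(&dev, KTIME_MAX)
leaves that stale value in place, while force_reprogram_new(&dev, KTIME_MAX)
overwrites it with KTIME_MAX.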

Comments

Preeti U Murthy May 9, 2014, 10:34 a.m. UTC | #1
Hi Viresh,

On 05/09/2014 02:10 PM, Viresh Kumar wrote:
> In hrtimer_force_reprogram(), we are reprogramming event device only if the next
> timer event is before KTIME_MAX. But what if it is equal to KTIME_MAX? As we
> aren't reprogramming it again, it will be set to the last value it was, probably
> tick interval, i.e. few milliseconds.
> 
> And we will get a interrupt due to that, wouldn't have any hrtimers to service
> and return without doing much. But the implementation of event device's driver
> may make it more stupid. For example: drivers/clocksource/arm_arch_timer.c
> disables the event device only on SHUTDOWN/UNUSED requests in set-mode.
> Otherwise, it will keep giving interrupts at tick interval even if
> hrtimer_interrupt() didn't reprogram tick..
> 
> To get this fixed, lets reprogram event device even for KTIME_MAX, so that the
> timer is scheduled for long enough.

I looked through the code in arm_arch_timer.c and I think the more
fundamental problem lies in the timer handler there. Ideally even before
calling the tick event handler the timer handler must be programming the
tick device to fire at some __MAX__ time.
Then irrespective of whether the core kernel deems it appropriate to
program it or not, the max time by which a timer interrupt will get
deferred is __MAX__ and one will not find anomalies like what you saw.

The reason this got exposed in NOHZ_FULL config is because in a normal
NOHZ scenario when the cpu goes idle, and there are no pending timers in
timer_list, even then tick_sched_timer gets cancelled. Precisely the
scenario that you have described.
   But we don't get continuous interrupts then because the first time we
get an interrupt, we queue the tick_sched_timer and program the tick
device to the time of its expiry and therefore *push* the time at which
your tick device should fire further.
  In case of NOHZ_FULL however I am presuming you will not queue the
tick_sched_timer again unless there is more than one process in the
runqueue. Therefore the tick device keeps firing since its counter
remains in the expired state and is not pushed up.

Moreover from the core kernel's perspective also this does not look like
the right thing to do. The core timer code cannot *shutdown* a clock
device simply because there are no pending timers. Some arch may change
their notion of *shutdown* to rendering the tick device unusable. Some
archs may already do that.
Hence I don't think we should take so drastic a measure as to shut down
the clock device when there are no pending timers.

My suggestion is, as pointed out above, to set the tick device to a KTIME_MAX
equivalent before calling the timer interrupt event handler.

Regards
Preeti U Murthy
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/hrtimer.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 6b715c0..b21085c 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -591,8 +591,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
>  	if (cpu_base->hang_detected)
>  		return;
> 
> -	if (cpu_base->expires_next.tv64 != KTIME_MAX)
> -		tick_program_event(cpu_base->expires_next, 1);
> +	tick_program_event(cpu_base->expires_next, 1);
>  }
> 
>  /*
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
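
The suggestion in comment #1 (push the device out to some maximum expiry in
the driver's interrupt handler, before the event handler runs) can be sketched
as follows. This is an illustration only: struct mock_timer, MOCK_MAX_DELTA and
the handlers are hypothetical names, and nothing here is taken from
arm_arch_timer.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical: the largest delta this mock device can be programmed with. */
#define MOCK_MAX_DELTA UINT32_MAX

struct mock_timer {
	uint32_t compare_delta;	/* ticks until the next interrupt */
	void (*event_handler)(struct mock_timer *t);
};

/* Mock interrupt handler: defer the next interrupt as far as possible
 * first, then let the core handler reprogram it if it has work queued. */
static void mock_timer_irq(struct mock_timer *t)
{
	t->compare_delta = MOCK_MAX_DELTA;
	if (t->event_handler != NULL)
		t->event_handler(t);
}

/* Core handler with nothing queued: leaves the device untouched. */
static void handler_no_timers(struct mock_timer *t)
{
	(void)t;
}

/* Core handler with one timer queued 1000 ticks ahead. */
static void handler_one_timer(struct mock_timer *t)
{
	t->compare_delta = 1000;
}
```

If the core code does not reprogram the device, the next interrupt is deferred
by MOCK_MAX_DELTA rather than by the stale tick interval. As comment #2 points
out, the device still fires eventually.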
Viresh Kumar May 9, 2014, 10:57 a.m. UTC | #2
On 9 May 2014 16:04, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> On 05/09/2014 02:10 PM, Viresh Kumar wrote:

> I looked through the code in arm_arch_timer.c and I think the more
> fundamental problem lies in the timer handler there. Ideally even before
> calling the tick event handler the timer handler must be programming the
> tick device to fire at some __MAX__ time.

Ideally, the device should have stopped events as we programmed it in
ONESHOT mode. And should have waited for kernel to set it again..

But probably that device doesn't have a ONESHOT mode and is firing
again and again. Anyway the real problem I was trying to solve wasn't
infinite interrupts coming from event dev, but the first extra event that
we should have got rid of .. It just happened that we got more problems
on this particular board.

> Then irrespective of whether the core kernel deems it appropriate to
> program it or not, the max time by which a timer interrupt will get
> deferred is __MAX__ and one will not find anomalies like what you saw.

We will still get an interrupt once the counter overflows. And that is bad too.

> The reason this got exposed in NOHZ_FULL config is because in a normal
> NOHZ scenario when the cpu goes idle, and there are no pending timers in
> timer_list, even then tick_sched_timer gets cancelled. Precisely the
> scenario that you have described.

I haven't tried but it looks like this problem will exist there as well.. Who is
disabling the event device in that case when tick_sched timer goes off ?
The same question that is applicable in this case as well..

>    But we don't get continuous interrupts then because the first time we
> get an interrupt, we queue the tick_sched_timer and program the tick
> device to the time of its expiry and therefore *push* the time at which
> your tick device should fire further.

Probably not.. We don't get continuous interrupts because that's a special
case for my platform. But I am quite sure you would be getting one extra
interrupt after the tick period; since we didn't have anything to service, the
hrtimer_interrupt() routine just returned and the CPU went into idle.

> Moreover from the core kernel's perspective also this does not look like
> the right thing to do. The core timer code cannot *shutdown* a clock
> device simply because there are no pending timers.

Why? To me it looks like the right thing to do..

> Some arch may change
> their notion of *shutdown* to rendering the tick device unusable. Some
> archs may already do that.

There is only one definition of 'Shutdown' for me, which every platform
must implement: stop the event device from giving any new events. That's it.

>    Hence I don't think we should take a drastic measure as to shutdown
> the clock device in case of no pending timers,

Sorry, I still don't agree :) .. We don't know when we will next need to use
the service, so free it. What will we gain by pushing it out to a long, long
time?
What would we lose if we SHUTDOWN it now?

> My suggestion is as pointed above to set the tick device to a KTIME_MAX
> equivalent before calling the timer interrupt event handler.

This would still interrupt on overflow, so it isn't the right idea..
Not currently, as there are limitations, but later on with NO_HZ_FULL a core
should be allowed to go into infinite isolation unless the application running
on it wants otherwise.. And this pushing to KTIME_MAX wouldn't work in that case..

Thanks for your review and the long chats we had about this problem since
yesterday on IRC..

--
viresh
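
The "interrupt on overflow" objection in comment #2 comes down to simple
arithmetic: a hardware counter of finite width can only be programmed so far
ahead. The helper below is a sketch under assumed numbers. The counter widths
and frequencies used with it are illustrative and are not taken from the
thread or from any particular board.

```c
#include <stdint.h>

/* Longest delay, in whole seconds, that a counter of the given width can
 * represent when it ticks at freq_hz. Returns 0 for a zero frequency. */
static uint64_t max_delay_seconds(unsigned int bits, uint64_t freq_hz)
{
	uint64_t max_ticks;

	if (freq_hz == 0)
		return 0;

	/* Avoid shifting by the full width of the type. */
	max_ticks = (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);

	return max_ticks / freq_hz;
}
```

For example, a 32-bit counter at an assumed 1 MHz gives
max_delay_seconds(32, 1000000) == 4294, i.e. a bit over 71 minutes. However
far the expiry is pushed out, such a device still interrupts at least that
often unless it is actually stopped.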
Preeti U Murthy May 10, 2014, 4:17 p.m. UTC | #3
On 05/09/2014 04:27 PM, Viresh Kumar wrote:
> On 9 May 2014 16:04, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> On 05/09/2014 02:10 PM, Viresh Kumar wrote:
> 
>> I looked through the code in arm_arch_timer.c and I think the more
>> fundamental problem lies in the timer handler there. Ideally even before
>> calling the tick event handler the timer handler must be programming the
>> tick device to fire at some __MAX__ time.
> 
> Ideally, the device should have stopped events as we programmed it in
> ONESHOT mode. And should have waited for kernel to set it again..
> 
> But probably that device doesn't have a ONESHOT mode and is firing
> again and again. Anyway the real problem I was trying to solve wasn't
> infinite interrupts coming from event dev, but the first extra event that
> we should have got rid of .. It just happened that we got more problems
> on this particular board.

So on a timer interrupt the tick device, irrespective of if it is in
ONESHOT mode or not, is in an expired state. Thus it will continue to
fire. What has ONESHOT mode got to do with this?

> 
>> Then irrespective of whether the core kernel deems it appropriate to
>> program it or not, the max time by which a timer interrupt will get
>> deferred is __MAX__ and one will not find anomalies like what you saw.
> 
> We will still get a interrupt once the counter overflows. And that is bad too.

Hmm true.
> 
>> The reason this got exposed in NOHZ_FULL config is because in a normal
>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>> scenario that you have described.
> 
> I haven't tried but it looks like this problem will exist there as well.. Who is
> disabling the event device in that case when tick_sched timer goes off ?
> The same question that is applicable in this case as well..
> 
>>    But we don't get continuous interrupts then because the first time we
>> get an interrupt, we queue the tick_sched_timer and program the tick
>> device to the time of its expiry and therefore *push* the time at which
>> your tick device should fire further.
> 
> Probably not.. We don't get continuous interrupts because that's a special
> case for my platform. But I am quite sure you would be getting one extra
> interrupt after tick period, but because we didn't had anything to service

Hmm? I didn't get this. Why would we?  We ensure that if there are no
pending timers in timer_list the tick_sched_timer is cancelled. We
cannot get spurious interrupts when there are no pending timers in NOHZ
mode.

> hrtimer_interrupt() routine just returned and CPU went into idle.
> 
>> Moreover from the core kernel's perspective also this does not look like
>> the right thing to do. The core timer code cannot *shutdown* a clock
>> device simply because there are no pending timers.
> 
> Why? To me it looks the right thing to do..
> 
>> Some arch may change
>> their notion of *shutdown* to rendering the tick device unusable. Some
>> archs may already do that.
> 
> There is only one definition of 'Shutdown' for me which every platform
> must implement. Stop the event device to give any new events. that's it.
> 
>>    Hence I don't think we should take a drastic measure as to shutdown
>> the clock device in case of no pending timers,
> 
> Sorry, I still don't agree :) .. We don't know when is the next time we need
> to use a service, so free it. What will we get by pushing it to a long
> long time ?
> What would we loose if we SHUTDOWN it now ?
> 
>> My suggestion is as pointed above to set the tick device to a KTIME_MAX
>> equivalent before calling the timer interrupt event handler.
> 
> This would still interrupt on overflow, so isn't the right idea..
> Not currently as there are limitations, but later on with NO_HZ_FULL a core
> should be allowed to go into infinite isolation, unless the application running
> on it wants.. And this pushing to KTIME_MAX wouldn't work in that case..

Hmm yeah looking at the problem that you are trying to solve, that being
completely disabling timer interrupts on cpus that are running just one
process, it appears to me that setting the tick device in SHUTDOWN mode
is the only way to do so. And you are right. We use SHUTDOWN mode to
imply that the device can be switched off. It's up to the arch to react to
it appropriately.

My concern is that on powerpc today, when we set the device to SHUTDOWN mode,
we set the decrementer to a MAX value. That means we still get interrupts,
only spaced out more widely in time. But in NOHZ_FULL mode, if you are
looking at completely disabling tick_sched_timer as long as a single
process runs, then we might need to change the semantics here.

Regards
Preeti U Murthy
> 
> Thanks for your review and the long chats we had about this problem since
> yesterday on IRC..
> 
> --
> viresh
> 

Viresh Kumar May 12, 2014, 5:53 a.m. UTC | #4
On 10 May 2014 21:47, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> On 05/09/2014 04:27 PM, Viresh Kumar wrote:
>> On 9 May 2014 16:04, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:

>> Ideally, the device should have stopped events as we programmed it in
>> ONESHOT mode. And should have waited for kernel to set it again..
>>
>> But probably that device doesn't have a ONESHOT mode and is firing
>> again and again. Anyway the real problem I was trying to solve wasn't
>> infinite interrupts coming from event dev, but the first extra event that
>> we should have got rid of .. It just happened that we got more problems
>> on this particular board.
>
> So on a timer interrupt the tick device, irrespective of if it is in
> ONESHOT mode or not, is in an expired state. Thus it will continue to
> fire. What has ONESHOT mode got to do with this?

So, the arch specific timer handler must be clearing it I suppose and it
shouldn't have fired again after 5 ms as it is not reprogrammed.

That's probably implementation-specific.. I have seen timers which have two
modes: periodic, where they fire continuously, and oneshot, where they get
disabled after firing and have to be reprogrammed.

>>> The reason this got exposed in NOHZ_FULL config is because in a normal
>>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>>> scenario that you have described.
>>
>> I haven't tried but it looks like this problem will exist there as well.. Who is
>> disabling the event device in that case when tick_sched timer goes off ?
>> The same question that is applicable in this case as well..
>>
>>>    But we don't get continuous interrupts then because the first time we
>>> get an interrupt, we queue the tick_sched_timer and program the tick
>>> device to the time of its expiry and therefore *push* the time at which
>>> your tick device should fire further.
>>
>> Probably not.. We don't get continuous interrupts because that's a special
>> case for my platform. But I am quite sure you would be getting one extra
>> interrupt after tick period, but because we didn't had anything to service
>
> Hmm? I didn't get this. Why would we?  We ensure that if there are no
> pending timers in timer_list the tick_sched_timer is cancelled. We
> cannot get spurious interrupts when there are no pending timers in NOHZ
> mode.

Okay, there are no pending timers to fire and we have even disabled
tick_sched_timer.. But the event dev isn't SHUTDOWN or reprogrammed.
So it must fire after the tick interval? That is exactly the same issue we are
seeing here with NO_HZ_FULL..

And the worst part is that we aren't seeing these interrupts in traces either.
Somebody probably needs to revisit the trace_irq_handler_entry part as well
to catch such problems.

> Hmm yeah looking at the problem that you are trying to solve, that being
> completely disabling timer interrupts on cpus that are running just one
> process, it appears to me that setting the tick device in SHUTDOWN mode
> is the only way to do so. And you are right. We use SHUTDOWN mode to
> imply that the device can be switched off. Its upto the arch to react to
> it appropriately.

So, from the mail where tglx blasted me off, we have a better solution to
implement now :)

> My concern is on powerpc today when we set the device to SHUTDOWN mode
> we set the decrementer to a MAX value. Which means we will get
> interrupts only spaced out more widely in time. But on NOHZ_FULL mode if
> you are looking at completely disabling tick_sched_timer as long as a
> single process runs then we might need to change the semantics here.

Let's see if we can do some nice stuff with a ONESHOT_STOPPED state..
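
A rough sketch of what a ONESHOT_STOPPED state, as floated at the end of
comment #4, could look like. This is a guess at the idea rather than the
design being discussed: the enum, struct mock_dev and
mock_tick_program_event() are hypothetical names, and the thread itself does
not spell out any of these details.

```c
#include <stdint.h>

#define KTIME_MAX INT64_MAX

/* Hypothetical device states for this sketch. */
enum mock_state {
	MOCK_ONESHOT,		/* armed; fires once at next_event */
	MOCK_ONESHOT_STOPPED	/* oneshot mode, but no event is armed */
};

struct mock_dev {
	enum mock_state state;
	int64_t next_event;
};

/* Program the next event. An expiry of KTIME_MAX means "nothing pending",
 * so the device is stopped instead of being armed for a far-off time. */
static void mock_tick_program_event(struct mock_dev *dev, int64_t expires)
{
	if (expires == KTIME_MAX) {
		dev->state = MOCK_ONESHOT_STOPPED;
		dev->next_event = KTIME_MAX;
		return;
	}

	/* A real expiry re-arms a previously stopped device. */
	if (dev->state == MOCK_ONESHOT_STOPPED)
		dev->state = MOCK_ONESHOT;

	dev->next_event = expires;
}
```

The point of a separate state is that "no timer pending" becomes an explicit
request to the driver, which avoids both the stale tick interrupt and the
eventual overflow interrupt without overloading SHUTDOWN.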
Preeti U Murthy May 13, 2014, 4:57 a.m. UTC | #5
On 05/12/2014 11:23 AM, Viresh Kumar wrote:
> On 10 May 2014 21:47, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> On 05/09/2014 04:27 PM, Viresh Kumar wrote:
>>> On 9 May 2014 16:04, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> 
>>> Ideally, the device should have stopped events as we programmed it in
>>> ONESHOT mode. And should have waited for kernel to set it again..
>>>
>>> But probably that device doesn't have a ONESHOT mode and is firing
>>> again and again. Anyway the real problem I was trying to solve wasn't
>>> infinite interrupts coming from event dev, but the first extra event that
>>> we should have got rid of .. It just happened that we got more problems
>>> on this particular board.
>>
>> So on a timer interrupt the tick device, irrespective of if it is in
>> ONESHOT mode or not, is in an expired state. Thus it will continue to
>> fire. What has ONESHOT mode got to do with this?
> 
> So, the arch specific timer handler must be clearing it I suppose and it
> shouldn't have fired again after 5 ms as it is not reprogrammed.
> 
> Probably that's an implementation specific stuff.. I have seen timers which
> have two modes, periodic: they fire continuously and oneshot: they get
> disabled after firing and have to be reprogrammed.

Yes that is implementation specific. From a core kernel code's
perspective, periodic mode is where the clock device will fire
periodically; it resets itself on every expiry. Oneshot mode is where
the clock device should explicitly be programmed to fire and can be
programmed for any point in time unlike periodic where the clock device
should be programmed only at periodic intervals. That is it. Beyond that
the core kernel cannot assume anything more.

It's possible that in oneshot mode some implementations disable the clock
device on expiry. Some might not. Hence the core code must make
decisions which *will not break either of these implementations*. For
example on PowerPC we do not disable the clock device. Instead we reset
the clock device to fire after 4 seconds by default on expiry unless
there is some timer queued which sets the clock device to fire
appropriately.

The point is the implementations have their reasons for implementing
these modes and the core kernel code cannot be based on "Ideally the
device should have stopped" assumptions.

> 
>>>> The reason this got exposed in NOHZ_FULL config is because in a normal
>>>> NOHZ scenario when the cpu goes idle, and there are no pending timers in
>>>> timer_list, even then tick_sched_timer gets cancelled. Precisely the
>>>> scenario that you have described.
>>>
>>> I haven't tried but it looks like this problem will exist there as well.. Who is
>>> disabling the event device in that case when tick_sched timer goes off ?
>>> The same question that is applicable in this case as well..
>>>
>>>>    But we don't get continuous interrupts then because the first time we
>>>> get an interrupt, we queue the tick_sched_timer and program the tick
>>>> device to the time of its expiry and therefore *push* the time at which
>>>> your tick device should fire further.
>>>
>>> Probably not.. We don't get continuous interrupts because that's a special
>>> case for my platform. But I am quite sure you would be getting one extra
>>> interrupt after tick period, but because we didn't had anything to service
>>
>> Hmm? I didn't get this. Why would we?  We ensure that if there are no
>> pending timers in timer_list the tick_sched_timer is cancelled. We
>> cannot get spurious interrupts when there are no pending timers in NOHZ
>> mode.
> 
> Okay, there are no pending timers to fire and even we have disabled
> tick_sched_timer as well.. But the event dev isn't SHUTDOWN or reprogrammed.
> And so it must fire after tick interval? Exactly the same issue we are getting
> here in NO_HZ_FULL..

Not after the tick interval. If there is a pending hrtimer, the event dev
will fire at its expiry time. If there are no pending hrtimers and no
timers in timer_list either, then it's up to the arch to decide how it will
handle "no pending timer events in the future". It can set the clock
device to expire at some far-off time, for example.

> 
> And the worst part is we aren't getting these interrupts in traces as well.
> Somebody probably need to revisit the trace_irq_handler_entry part as well
> to catch such problems.
> 
>> Hmm yeah looking at the problem that you are trying to solve, that being
>> completely disabling timer interrupts on cpus that are running just one
>> process, it appears to me that setting the tick device in SHUTDOWN mode
>> is the only way to do so. And you are right. We use SHUTDOWN mode to
>> imply that the device can be switched off. Its upto the arch to react to
>> it appropriately.
> 
> So, from the mail where tglx blasted me off, we have a better solution to
> implement now :)
> 
>> My concern is on powerpc today when we set the device to SHUTDOWN mode
>> we set the decrementer to a MAX value. Which means we will get
>> interrupts only spaced out more widely in time. But on NOHZ_FULL mode if
>> you are looking at completely disabling tick_sched_timer as long as a
>> single process runs then we might need to change the semantics here.
> 
> Lets see if we can do some nice stuff with ONESHOT_STOPPED state..

Indeed. That is a very clean way of getting this done. We were trying to
stretch the existing code to solve this problem. That was bound to
snap :)

Regards
Preeti U Murthy
> 

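
The oneshot behaviour described for PowerPC in comment #5 (on expiry, re-arm
for a default of 4 seconds unless a queued timer asks for something sooner)
can be sketched like this. It follows only the description given in the
thread, not the actual powerpc decrementer code; rearm_on_expiry() and
DEFAULT_REARM_NS are hypothetical names.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define KTIME_MAX INT64_MAX

/* Default re-arm distance: "4 seconds", per the description in comment #5. */
#define DEFAULT_REARM_NS (4 * NSEC_PER_SEC)

/* Decide when the device should fire next after it has just expired.
 * next_timer_ns == KTIME_MAX means no timer is queued. */
static int64_t rearm_on_expiry(int64_t now_ns, int64_t next_timer_ns)
{
	if (next_timer_ns == KTIME_MAX)
		return now_ns + DEFAULT_REARM_NS;

	return next_timer_ns;
}
```

A device modelled this way is never disabled on expiry. With nothing queued it
simply fires again 4 seconds later, which is why the core code cannot assume
that a oneshot device stops by itself.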
Patch

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 6b715c0..b21085c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -591,8 +591,7 @@  hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
 	if (cpu_base->hang_detected)
 		return;
 
-	if (cpu_base->expires_next.tv64 != KTIME_MAX)
-		tick_program_event(cpu_base->expires_next, 1);
+	tick_program_event(cpu_base->expires_next, 1);
 }
 
 /*