diff mbox series

[RFC] watchdog: base rate-limiting on get_ticks() rather than get_timer()

Message ID 20200605111657.28773-1-rasmus.villemoes@prevas.dk
State New
Headers show
Series [RFC] watchdog: base rate-limiting on get_ticks() rather than get_timer() | expand

Commit Message

Rasmus Villemoes June 5, 2020, 11:16 a.m. UTC
On powerpc, get_timer() is implemented using a volatile variable that
gets incremented from the decrementer interrupt handler. Hence, when
interrupts are disabled, time as measured by get_timer() ceases to
pass.

Interrupts are necessarily disabled during bootm (see the comment in
bootm_disable_interrupts() for why). But after interrupts are
disabled, there's still lots of work to do - e.g. decompressing the
kernel image to the right load address. Unfortunately, the
rate-limiting logic in wdt_uclass.c's watchdog_reset function means
that WATCHDOG_RESET() becomes a no-op, since get_timer() never
progresses past the next_reset.

This, combined with an external gpio-triggered watchdog that must be
petted at least every 800ms, means our board gets reset before booting
into linux.

Now, at least on powerpc, get_ticks() continues to return sensible
results whether or not interrupts are enabled. So this fixes the above
problem for our board - but I don't know if get_ticks() can be assumed
to always work.

Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
---
This is what I had in mind. I also considered making it a config knob
(possibly just auto-selected based on $ARCH) whether to use
get_timer() or get_ticks(), but that becomes quite ugly.

 drivers/watchdog/wdt-uclass.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

Comments

Stefan Roese June 5, 2020, 11:48 a.m. UTC | #1
On 05.06.20 13:16, Rasmus Villemoes wrote:
> On powerpc, get_timer() is implemented using a volatile variable that
> gets incremented from the decrementer interrupt handler. Hence, when
> interrupts are disabled, time as measured by get_timer() ceases to
> pass.
> 
> Interrupts are necessarily disabled during bootm (see the comment in
> bootm_disable_interrupts() for why). But after interrupts are
> disabled, there's still lots of work to do - e.g. decompressing the
> kernel image to the right load address. Unfortunately, the
> rate-limiting logic in wdt_uclass.c's watchdog_reset function means
> that WATCHDOG_RESET() becomes a no-op, since get_timer() never
> progresses past the next_reset.
> 
> This, combined with an external gpio-triggered watchdog that must be
> petted at least every 800ms, means our board gets reset before booting
> into linux.
> 
> Now, at least on powerpc, get_ticks() continues to return sensible
> results whether or not interrupts are enabled. So this fixes the above
> problem for our board - but I don't know if get_ticks() can be assumed
> to always work.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
> ---
> This is what I had in mind. I also considered making it a config knob
> (possibly just auto-selected based on $ARCH) whether to use
> get_timer() or get_ticks(), but that becomes quite ugly.

I hesitate a bit, moving with this generic code from get_timer()
to get_ticks() for all boards. Did you test this patch on other
platforms, like some ARM boards?

Please note that I don't reject it - just asking.

Thanks,
Stefan

>   drivers/watchdog/wdt-uclass.c | 22 ++++++++++++----------
>   1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 4cdb7bd64c..7be4e9b5bc 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -16,14 +16,15 @@ DECLARE_GLOBAL_DATA_PTR;
>   
>   #define WATCHDOG_TIMEOUT_SECS	(CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
>   
> -/*
> - * Reset every 1000ms, or however often is required as indicated by a
> - * hw_margin_ms property.
> - */
> -static ulong reset_period = 1000;
> +static u64 ratelimit_ticks;
>   
>   int initr_watchdog(void)
>   {
> +	/*
> +	 * Reset every 1000ms, or however often is required as indicated by a
> +	 * hw_margin_ms property.
> +	 */
> +	u32 reset_period = 1000;
>   	u32 timeout = WATCHDOG_TIMEOUT_SECS;
>   
>   	/*
> @@ -48,6 +49,7 @@ int initr_watchdog(void)
>   						    4 * reset_period) / 4;
>   	}
>   
> +	ratelimit_ticks = usec_to_tick(reset_period * 1000);
>   	wdt_start(gd->watchdog_dev, timeout * 1000, 0);
>   	gd->flags |= GD_FLG_WDT_READY;
>   	printf("WDT:   Started with%s servicing (%ds timeout)\n",
> @@ -117,17 +119,17 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
>    */
>   void watchdog_reset(void)
>   {
> -	static ulong next_reset;
> -	ulong now;
> +	static u64 last_reset;
> +	u64 now;
>   
>   	/* Exit if GD is not ready or watchdog is not initialized yet */
>   	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
>   		return;
>   
>   	/* Do not reset the watchdog too often */
> -	now = get_timer(0);
> -	if (time_after(now, next_reset)) {
> -		next_reset = now + reset_period;
> +	now = get_ticks();
> +	if (now - last_reset >= ratelimit_ticks) {
> +		last_reset = now;
>   		wdt_reset(gd->watchdog_dev);
>   	}
>   }
> 


Viele Gr??e,
Stefan
Rasmus Villemoes June 5, 2020, 12:11 p.m. UTC | #2
On 05/06/2020 13.48, Stefan Roese wrote:
> On 05.06.20 13:16, Rasmus Villemoes wrote:
>> This is what I had in mind. I also considered making it a config knob
>> (possibly just auto-selected based on $ARCH) whether to use
>> get_timer() or get_ticks(), but that becomes quite ugly.
> 
> I hesitate a bit, moving with this generic code from get_timer()
> to get_ticks() for all boards. Did you test this patch on other
> platforms, like some ARM boards?
> 
> Please note that I don't reject it - just asking.

Yeah, I'm not really too happy about it myself, exactly because it
affects all arches/platforms. And no, I don't have other hardware handy
unfortunately. So it's very much an RFC where I hope someone with
knowledge of the various arches can say whether one can expect
get_ticks() to be at least as widely available as get_timer().

Rasmus
Stefan Roese June 5, 2020, 12:13 p.m. UTC | #3
On 05.06.20 14:11, Rasmus Villemoes wrote:
> On 05/06/2020 13.48, Stefan Roese wrote:
>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>> This is what I had in mind. I also considered making it a config knob
>>> (possibly just auto-selected based on $ARCH) whether to use
>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>
>> I hesitate a bit, moving with this generic code from get_timer()
>> to get_ticks() for all boards. Did you test this patch on other
>> platforms, like some ARM boards?
>>
>> Please note that I don't reject it - just asking.
> 
> Yeah, I'm not really too happy about it myself, exactly because it
> affects all arches/platforms. And no, I don't have other hardware handy
> unfortunately. So it's very much an RFC where I hope someone with
> knowledge of the various arches can say whether one can expect
> get_ticks() to be at least as widely available as get_timer().

I'll try to test it on a non-powerpc platform soon'ish.

Thanks,
Stefan
Stefan Roese June 5, 2020, 1:37 p.m. UTC | #4
On 05.06.20 14:13, Stefan Roese wrote:
> On 05.06.20 14:11, Rasmus Villemoes wrote:
>> On 05/06/2020 13.48, Stefan Roese wrote:
>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>> This is what I had in mind. I also considered making it a config knob
>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>
>>> I hesitate a bit, moving with this generic code from get_timer()
>>> to get_ticks() for all boards. Did you test this patch on other
>>> platforms, like some ARM boards?
>>>
>>> Please note that I don't reject it - just asking.
>>
>> Yeah, I'm not really too happy about it myself, exactly because it
>> affects all arches/platforms. And no, I don't have other hardware handy
>> unfortunately. So it's very much an RFC where I hope someone with
>> knowledge of the various arches can say whether one can expect
>> get_ticks() to be at least as widely available as get_timer().
> 
> I'll try to test it on a non-powerpc platform soon'ish.

I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
and it works there without any issues AFAICT.

Two more remarks:

Could you please run a build test with this patch applied for all
boards (Travis, Azure...)?

And do you see an issue, that the timer-wrap fixed by Chris with this
patch [1] will not re-occur with your "ticks" approach?

Thanks,
Stefan

[1]
Author: Chris Packham <judge.packham at gmail.com>  2020-02-24 01:20:33
Committer: Stefan Roese <sr at denx.de>  2020-03-16 11:25:12
Parent: db41d65a97f335167e1fbc0400a83333b5157703 (common: Move hang() to 
the same header as panic())
Child:  b4d9452c442769e6dc25649ac02db2d5ed5ea0c8 (watchdog: move 
initr_watchdog() to wdt-uclass.c)
Branches: master, watchdog-ratelimit-2020-06-05-rfc and many more (84)
Follows: v2020.04-rc3
Precedes: v2020.04-rc4

     watchdog: Handle timer wrap around

     On some platforms/architectures the value from get_timer() can wrap.
     This is particularly problematic when long-running code needs to 
measure
     a time difference as is the case with watchdog_reset() which tries to
     avoid tickling the watchdog too frequently.

     Use time_after() from time.h instead of a plain > comparison to avoid
     any issues with the time wrapping on a system that has been sitting in
     u-boot for a long time.

     Signed-off-by: Chris Packham <judge.packham at gmail.com>
     Reviewed-by: Stefan Roese <sr at denx.de>
Rasmus Villemoes June 5, 2020, 2:18 p.m. UTC | #5
On 05/06/2020 15.37, Stefan Roese wrote:
> On 05.06.20 14:13, Stefan Roese wrote:
>> On 05.06.20 14:11, Rasmus Villemoes wrote:
>>> On 05/06/2020 13.48, Stefan Roese wrote:
>>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>>> This is what I had in mind. I also considered making it a config knob
>>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>>
>>>> I hesitate a bit, moving with this generic code from get_timer()
>>>> to get_ticks() for all boards. Did you test this patch on other
>>>> platforms, like some ARM boards?
>>>>
>>>> Please note that I don't reject it - just asking.
>>>
>>> Yeah, I'm not really too happy about it myself, exactly because it
>>> affects all arches/platforms. And no, I don't have other hardware handy
>>> unfortunately. So it's very much an RFC where I hope someone with
>>> knowledge of the various arches can say whether one can expect
>>> get_ticks() to be at least as widely available as get_timer().
>>
>> I'll try to test it on a non-powerpc platform soon'ish.
> 
> I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
> and it works there without any issues AFAICT.

Thanks.

> Two more remarks:
> 
> Could you please run a build test with this patch applied for all
> boards (Travis, Azure...)?

I may need a few more pointers than that. What am I supposed to do exactly?

> And do you see an issue, that the timer-wrap fixed by Chris with this
> patch [1] will not re-occur with your "ticks" approach?

No, it will not re-occur, because this does the comparison the right
way, namely by subtracting the absolute times and comparing that delta
to the threshold (that's also what the time_after does under the hood).
The wrong way is saying now >= last_reset + threshold (or as I think the
code used to do, computing next_reset = now + threshold and then asking
now >= next_reset).

Rasmus
Stefan Roese June 5, 2020, 2:23 p.m. UTC | #6
On 05.06.20 16:18, Rasmus Villemoes wrote:
> On 05/06/2020 15.37, Stefan Roese wrote:
>> On 05.06.20 14:13, Stefan Roese wrote:
>>> On 05.06.20 14:11, Rasmus Villemoes wrote:
>>>> On 05/06/2020 13.48, Stefan Roese wrote:
>>>>> On 05.06.20 13:16, Rasmus Villemoes wrote:
>>>>>> This is what I had in mind. I also considered making it a config knob
>>>>>> (possibly just auto-selected based on $ARCH) whether to use
>>>>>> get_timer() or get_ticks(), but that becomes quite ugly.
>>>>>
>>>>> I hesitate a bit, moving with this generic code from get_timer()
>>>>> to get_ticks() for all boards. Did you test this patch on other
>>>>> platforms, like some ARM boards?
>>>>>
>>>>> Please note that I don't reject it - just asking.
>>>>
>>>> Yeah, I'm not really too happy about it myself, exactly because it
>>>> affects all arches/platforms. And no, I don't have other hardware handy
>>>> unfortunately. So it's very much an RFC where I hope someone with
>>>> knowledge of the various arches can say whether one can expect
>>>> get_ticks() to be at least as widely available as get_timer().
>>>
>>> I'll try to test it on a non-powerpc platform soon'ish.
>>
>> I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
>> and it works there without any issues AFAICT.
> 
> Thanks.
> 
>> Two more remarks:
>>
>> Could you please run a build test with this patch applied for all
>> boards (Travis, Azure...)?
> 
> I may need a few more pointers than that. What am I supposed to do exactly?

Not sure if there is some documentation on how to use the Travis,
Gitlab or Azure build. It mainly uses buildman [1] internally, so you
can use buildman locally if you don't want to push the build into
the cloud.

>> And do you see an issue, that the timer-wrap fixed by Chris with this
>> patch [1] will not re-occur with your "ticks" approach?
> 
> No, it will not re-occur, because this does the comparison the right
> way, namely by subtracting the absolute times and comparing that delta
> to the threshold (that's also what the time_after does under the hood).
> The wrong way is saying now >= last_reset + threshold (or as I think the
> code used to do, computing next_reset = now + threshold and then asking
> now >= next_reset).

Ah, good to know. Thanks for confirming.

Thanks,
Stefan

[1] tools/buildman/README
Tom Rini June 5, 2020, 2:34 p.m. UTC | #7
On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote:
> On 05.06.20 16:18, Rasmus Villemoes wrote:
> > On 05/06/2020 15.37, Stefan Roese wrote:
> > > On 05.06.20 14:13, Stefan Roese wrote:
> > > > On 05.06.20 14:11, Rasmus Villemoes wrote:
> > > > > On 05/06/2020 13.48, Stefan Roese wrote:
> > > > > > On 05.06.20 13:16, Rasmus Villemoes wrote:
> > > > > > > This is what I had in mind. I also considered making it a config knob
> > > > > > > (possibly just auto-selected based on $ARCH) whether to use
> > > > > > > get_timer() or get_ticks(), but that becomes quite ugly.
> > > > > > 
> > > > > > I hesitate a bit, moving with this generic code from get_timer()
> > > > > > to get_ticks() for all boards. Did you test this patch on other
> > > > > > platforms, like some ARM boards?
> > > > > > 
> > > > > > Please note that I don't reject it - just asking.
> > > > > 
> > > > > Yeah, I'm not really too happy about it myself, exactly because it
> > > > > affects all arches/platforms. And no, I don't have other hardware handy
> > > > > unfortunately. So it's very much an RFC where I hope someone with
> > > > > knowledge of the various arches can say whether one can expect
> > > > > get_ticks() to be at least as widely available as get_timer().
> > > > 
> > > > I'll try to test it on a non-powerpc platform soon'ish.
> > > 
> > > I've tested it on an MIPS based platform (gardena-smart-gateway-mt7688)
> > > and it works there without any issues AFAICT.
> > 
> > Thanks.
> > 
> > > Two more remarks:
> > > 
> > > Could you please run a build test with this patch applied for all
> > > boards (Travis, Azure...)?
> > 
> > I may need a few more pointers than that. What am I supposed to do exactly?
> 
> Not sure if there is some documentation on how to use the Travis,
> Gitlab or Azure build. It mainly uses buildman [1] internally, so you
> can use buildman locally if you don't want to push the build into
> the cloud.

So, here's a recently-I-learned.  If you submit a pull request on GitHub
to https://github.com/u-boot/u-boot that will trigger Travis and Azure
runs:
https://travis-ci.org/github/u-boot/u-boot/pull_requests
and somewhere under:
https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary

Note that sometimes Travis will have connectivity failures and you have
to be me to tell it to re-run the job.  Azure only has failures when we
get those to-be-debugged random failures of sandbox/qemu targets.
Rasmus Villemoes June 5, 2020, 7:56 p.m. UTC | #8
On 05/06/2020 16.34, Tom Rini wrote:
> On Fri, Jun 05, 2020 at 04:23:21PM +0200, Stefan Roese wrote:
>> On 05.06.20 16:18, Rasmus Villemoes wrote:
>>> On 05/06/2020 15.37, Stefan Roese wrote:
>>>> Could you please run a build test with this patch applied for all
>>>> boards (Travis, Azure...)?
>>>
>>> I may need a few more pointers than that. What am I supposed to do exactly?
>>
>> Not sure if there is some documentation on how to use the Travis,
>> Gitlab or Azure build. It mainly uses buildman [1] internally, so you
>> can use buildman locally if you don't want to push the build into
>> the cloud.
> 
> So, here's a recently-I-learned.  If you submit a pull request on GitHub
> to https://github.com/u-boot/u-boot that will trigger Travis and Azure
> runs:
> https://travis-ci.org/github/u-boot/u-boot/pull_requests
> and somewhere under:
> https://dev.azure.com/u-boot/u-boot/_build?definitionId=2&_a=summary
> 
> Note that sometimes Travis will have connectivity failures and you have
> to be me to tell it to re-run the job.  Azure only has failures when we
> get those to-be-debugged random failures of sandbox/qemu targets.
> 

Thanks! FWIW,
https://travis-ci.org/github/u-boot/u-boot/builds/695074829 and
https://dev.azure.com/u-boot/u-boot/_build/results?buildId=713&view=results
are all green - but how many of those actually build wdt-uclass.c I
don't know.

Rasmus
Christophe Leroy April 9, 2021, 2:37 p.m. UTC | #9
Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :
> On powerpc, time as measured by get_timer() ceases to pass when

> interrupts are disabled (since on powerpc get_timer() returns the

> value of a volatile variable that gets updated via a timer

> interrupt). That in turn means the watchdog_reset() function provided

> by CONFIG_WDT ceases to work due to the ratelimiting it imposes.

> 

> Normally, interrupts are just disabled very briefly. However, during

> bootm, they are disabled for good prior to decompressing the kernel

> image, which can be a somewhat time-consuming operation. Even when we

> manage to decompress the kernel and do the other preparation steps and

> hand over control to the kernel, the kernel also takes some time

> before it is ready to assume responsibility for handling the

> watchdog. The end result is that the board gets reset prematurely.

> 

> The ratelimiting isn't really strictly needed (prior to DM WDT, no

> such thing existed), so just disable it when we know that time no

> longer passes and have watchdog_reset() (e.g. called from

> decompression loop) unconditionally reset the watchdog timer.



Do we need to make it that complicated ? I think before the generic implementation, powerpc didn't 
have a rate limitation at all for pinging the watchdog, why not go back this direction, all the time ?

I mean we could simply set reset_period to 0 at all time for powerpc ( and change the test to 
time_after_eq() instead of time_after() ).


> 

> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>

> ---

> 

> I previously sent a patch to change the ratelimiting to be based on

> get_ticks() instead of get_timer(), but that has gone nowhere

> [1]. This is an alternative which only affects powerpc (and only

> boards that have enabled CONFIG_WDT). I hope the watchdog maintainers

> will accept at least one of these, or suggest a third alternative, so

> I don't have to keep some out-of-tree patch applied without knowing if

> that's the direction upstream will take.

> 

> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200605111657.28773-1-rasmus.villemoes@prevas.dk/

> 

>
Rasmus Villemoes April 12, 2021, 7:04 a.m. UTC | #10
On 09/04/2021 16.37, Christophe Leroy wrote:
> 

> 

> Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :


>> The ratelimiting isn't really strictly needed (prior to DM WDT, no

>> such thing existed), so just disable it when we know that time no

>> longer passes and have watchdog_reset() (e.g. called from

>> decompression loop) unconditionally reset the watchdog timer.

> 

> 

> Do we need to make it that complicated ? I think before the generic

> implementation, powerpc didn't have a rate limitation at all for pinging

> the watchdog, why not go back this direction, all the time ?

> 

> I mean we could simply set reset_period to 0 at all time for powerpc (

> and change the test to time_after_eq() instead of time_after() ).


Maybe. I think I'd still prefer the one that changes the rate-limiting
to be based on get_ticks() instead of get_timer(), which did seem to
work everywhere.

IIRC, Stefan previously rejected having a config knob or board hook for
disabling the ratelimiting - changing the code [the s/after/after_eq/]
to effectively allow that by passing hw_margin_ms=0 in DT would probably
work, but seems quite hacky (especially when DT is supposed to describe
hardware).

I've also floated the option of making get_timer() return something
sensible even with interrupts off on ppc - that would also solve the
problem, and would benefit other generic code that uses get_timer()
without knowing its limitation on ppc.
https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .

So, there are lots of ways of solving it. Which direction to take really
depends on which viewpoint one has:

(1) the watchdog code should, if rate-limiting at all, use an
implementation that works on all platforms [the get_ticks option]
(2) the watchdog code should, if rate-limiting at all, provide a way for
a board/platform to override or disable that
(3) get_timer() on ppc is broken, it needs to provide sensible results
at all times, even when interrupts are disabled for tens or hundreds of
milliseconds.

Rasmus
Stefan Roese April 12, 2021, 8:57 a.m. UTC | #11
On 12.04.21 09:04, Rasmus Villemoes wrote:
> On 09/04/2021 16.37, Christophe Leroy wrote:

>>

>>

>> Le 09/04/2021 à 16:12, Rasmus Villemoes a écrit :

> 

>>> The ratelimiting isn't really strictly needed (prior to DM WDT, no

>>> such thing existed), so just disable it when we know that time no

>>> longer passes and have watchdog_reset() (e.g. called from

>>> decompression loop) unconditionally reset the watchdog timer.

>>

>>

>> Do we need to make it that complicated ? I think before the generic

>> implementation, powerpc didn't have a rate limitation at all for pinging

>> the watchdog, why not go back this direction, all the time ?

>>

>> I mean we could simply set reset_period to 0 at all time for powerpc (

>> and change the test to time_after_eq() instead of time_after() ).

> 

> Maybe. I think I'd still prefer the one that changes the rate-limiting

> to be based on get_ticks() instead of get_timer(), which did seem to

> work everywhere.

> 

> IIRC, Stefan previously rejected having a config knob or board hook for

> disabling the ratelimiting - changing the code [the s/after/after_eq/]

> to effectively allow that by passing hw_margin_ms=0 in DT would probably

> work, but seems quite hacky (especially when DT is supposed to describe

> hardware).


I'm not sure if I explicitly rejected any patches, sorry it's been too
long. ;)

But I also find this idea from Christophe quite applealing, as it's
impact on the common code is minimal. Either you set the DT property
to 0. Or if you don't have the possibility to do this (no DT on your
platform?), then add a new Kconfig option to configure the default
value for "reset_period" via Kconfig.

> I've also floated the option of making get_timer() return something

> sensible even with interrupts off on ppc - that would also solve the

> problem, and would benefit other generic code that uses get_timer()

> without knowing its limitation on ppc.

> https://lists.denx.de/pipermail/u-boot/2020-June/414842.html .

> 

> So, there are lots of ways of solving it. Which direction to take really

> depends on which viewpoint one has:

> 

> (1) the watchdog code should, if rate-limiting at all, use an

> implementation that works on all platforms [the get_ticks option]

> (2) the watchdog code should, if rate-limiting at all, provide a way for

> a board/platform to override or disable that

> (3) get_timer() on ppc is broken, it needs to provide sensible results

> at all times, even when interrupts are disabled for tens or hundreds of

> milliseconds.


"hw_margin_ms=0" would be my favorite way right now (see above). If this
does not work for you, I'm also fine with the latest patch you sent to
the list.

Thanks,
Stefan
diff mbox series

Patch

diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 4cdb7bd64c..7be4e9b5bc 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -16,14 +16,15 @@  DECLARE_GLOBAL_DATA_PTR;
 
 #define WATCHDOG_TIMEOUT_SECS	(CONFIG_WATCHDOG_TIMEOUT_MSECS / 1000)
 
-/*
- * Reset every 1000ms, or however often is required as indicated by a
- * hw_margin_ms property.
- */
-static ulong reset_period = 1000;
+static u64 ratelimit_ticks;
 
 int initr_watchdog(void)
 {
+	/*
+	 * Reset every 1000ms, or however often is required as indicated by a
+	 * hw_margin_ms property.
+	 */
+	u32 reset_period = 1000;
 	u32 timeout = WATCHDOG_TIMEOUT_SECS;
 
 	/*
@@ -48,6 +49,7 @@  int initr_watchdog(void)
 						    4 * reset_period) / 4;
 	}
 
+	ratelimit_ticks = usec_to_tick(reset_period * 1000);
 	wdt_start(gd->watchdog_dev, timeout * 1000, 0);
 	gd->flags |= GD_FLG_WDT_READY;
 	printf("WDT:   Started with%s servicing (%ds timeout)\n",
@@ -117,17 +119,17 @@  int wdt_expire_now(struct udevice *dev, ulong flags)
  */
 void watchdog_reset(void)
 {
-	static ulong next_reset;
-	ulong now;
+	static u64 last_reset;
+	u64 now;
 
 	/* Exit if GD is not ready or watchdog is not initialized yet */
 	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
 		return;
 
 	/* Do not reset the watchdog too often */
-	now = get_timer(0);
-	if (time_after(now, next_reset)) {
-		next_reset = now + reset_period;
+	now = get_ticks();
+	if (now - last_reset >= ratelimit_ticks) {
+		last_reset = now;
 		wdt_reset(gd->watchdog_dev);
 	}
 }