diff mbox series

watchdog: allow overriding the rate-limiting logic

Message ID 20200312114034.20016-1-rasmus.villemoes@prevas.dk
State New
Headers show
Series watchdog: allow overriding the rate-limiting logic | expand

Commit Message

Rasmus Villemoes March 12, 2020, 11:40 a.m. UTC
On the MPC8309-derived board I'm currently working on, the current
rate-limiting logic in the generic watchdog_reset function poses two
problems:

First, the hard-coded limit of 1000ms is too large for the
external GPIO-triggered watchdog we have.

Second, in the SPL, the decrementer interrupt is not enabled, so the
get_timer() call calls always returns 0, so wdt_reset() is never
actually called. Enabling that interrupt (i.e. calling
interrupt_init() somewhere in my early board code) is a bit
problematic, since U-Boot proper gets loaded to address 0, hence
overwriting exception vectors - and the reason I even need to care
about the watchdog in the SPL is that the signature verification takes
a rather long time, so just disabling interrupts before loading U-Boot
proper doesn't work.

Both problems can be solved by allowing the board to override the
rate-limiting logic. For example, in my case I will implement the
function in terms of get_ticks() (i.e. the time base registers on
PPC) which do not depend on interrupts, and use a threshold of
gd->bus_clk / (4*16) ticks (~62ms).

Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
---

This is on top of https://patchwork.ozlabs.org/patch/1242772/, but
it's obviously trivial to do on master instead.

 drivers/watchdog/wdt-uclass.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Comments

Stefan Roese March 12, 2020, 11:58 a.m. UTC | #1
Hi Rasmus,

(added Christophe to Cc)

On 12.03.20 12:40, Rasmus Villemoes wrote:
> On the MPC8309-derived board I'm currently working on, the current
> rate-limiting logic in the generic watchdog_reset function poses two
> problems:
> 
> First, the hard-coded limit of 1000ms is too large for the
> external GPIO-triggered watchdog we have.

I remember a discussion a few weeks ago with Christophe, where you
pointed out, that there is already the official "hw_margin_ms" DT
property for this delay here. Why don't you introduce it here so that
all boards can use it as well?

> Second, in the SPL, the decrementer interrupt is not enabled, so the
> get_timer() call calls always returns 0, so wdt_reset() is never
> actually called. Enabling that interrupt (i.e. calling
> interrupt_init() somewhere in my early board code) is a bit
> problematic, since U-Boot proper gets loaded to address 0, hence
> overwriting exception vectors - and the reason I even need to care
> about the watchdog in the SPL is that the signature verification takes
> a rather long time, so just disabling interrupts before loading U-Boot
> proper doesn't work.
> 
> Both problems can be solved by allowing the board to override the
> rate-limiting logic. For example, in my case I will implement the
> function in terms of get_ticks() (i.e. the time base registers on
> PPC) which do not depend on interrupts, and use a threshold of
> gd->bus_clk / (4*16) ticks (~62ms).

I would assume that you might run into multiple issues, when your timer
infrastructure is not working correctly in SPL. Can't you instead "fix"
this by using this get_ticks() option for the get_timer() functions in
SPL so that you can use the common functions here and in all other
places in SPL as well?

> Signed-off-by: Rasmus Villemoes <rasmus.villemoes at prevas.dk>
> ---
> 
> This is on top of https://patchwork.ozlabs.org/patch/1242772/, but
> it's obviously trivial to do on master instead.
> 
>   drivers/watchdog/wdt-uclass.c | 21 ++++++++++++++-------
>   1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
> index 309a0e9c5b..ad53e86b80 100644
> --- a/drivers/watchdog/wdt-uclass.c
> +++ b/drivers/watchdog/wdt-uclass.c
> @@ -67,6 +67,19 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
>   }
>   
>   #if defined(CONFIG_WATCHDOG)
> +__weak int watchdog_reset_ratelimit(void)
> +{
> +	static ulong next_reset;
> +	ulong now;
> +
> +	now = get_timer(0);
> +	if (time_after(now, next_reset)) {
> +		next_reset = now + 1000;	/* reset every 1000ms */
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>   /*
>    * Called by macro WATCHDOG_RESET. This function be called *very* early,
>    * so we need to make sure, that the watchdog driver is ready before using
> @@ -74,19 +87,13 @@ int wdt_expire_now(struct udevice *dev, ulong flags)
>    */
>   void watchdog_reset(void)
>   {
> -	static ulong next_reset;
> -	ulong now;
> -
>   	/* Exit if GD is not ready or watchdog is not initialized yet */
>   	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
>   		return;
>   
>   	/* Do not reset the watchdog too often */
> -	now = get_timer(0);
> -	if (time_after(now, next_reset)) {
> -		next_reset = now + 1000;	/* reset every 1000ms */
> +	if (watchdog_reset_ratelimit())
>   		wdt_reset(gd->watchdog_dev);
> -	}
>   }
>   #endif
>   
> 

Thanks,
Stefan
Rasmus Villemoes March 12, 2020, 3:36 p.m. UTC | #2
On 12/03/2020 12.58, Stefan Roese wrote:
> Hi Rasmus,
> 
> (added Christophe to Cc)
> 
> On 12.03.20 12:40, Rasmus Villemoes wrote:
>> On the MPC8309-derived board I'm currently working on, the current
>> rate-limiting logic in the generic watchdog_reset function poses two
>> problems:
>>
>> First, the hard-coded limit of 1000ms is too large for the
>> external GPIO-triggered watchdog we have.
> 
> I remember a discussion a few weeks ago with Christophe, where you
> pointed out, that there is already the official "hw_margin_ms" DT
> property for this delay here. Why don't you introduce it here so that
> all boards can use it as well?

Well, I considered that. Reading the value is probably just adding a
dev_read_u32_default(..., 1000) next to the one reading the timeout-sec
property in initr_watchdog(). But what is a good place to stash the
result? It really belongs with the device, but as long as only one is
supported I suppose one could move initr_watchdog to wdt-uclass.c and
use a static variable in that file.

I can certainly do that. That leaves the other problem.

>> Second, in the SPL, the decrementer interrupt is not enabled, so the
>> get_timer() call calls always returns 0, so wdt_reset() is never
>> actually called. Enabling that interrupt (i.e. calling
>> interrupt_init() somewhere in my early board code) is a bit
>> problematic, since U-Boot proper gets loaded to address 0, hence
>> overwriting exception vectors - and the reason I even need to care
>> about the watchdog in the SPL is that the signature verification takes
>> a rather long time, so just disabling interrupts before loading U-Boot
>> proper doesn't work.
>>
>> Both problems can be solved by allowing the board to override the
>> rate-limiting logic. For example, in my case I will implement the
>> function in terms of get_ticks() (i.e. the time base registers on
>> PPC) which do not depend on interrupts, and use a threshold of
>> gd->bus_clk / (4*16) ticks (~62ms).
> 
> I would assume that you might run into multiple issues, when your timer
> infrastructure is not working correctly in SPL. 

None so far, except for the ratelimiting in terms of get_timer() not
working.

> Can't you instead "fix"
> this by using this get_ticks() option for the get_timer() functions in
> SPL so that you can use the common functions here and in all other
> places in SPL as well?

I don't understand what you mean. On PPC, get_timer(0) just returns a
static variable which is incremented from the timer_interrupt(). When
that interrupt is not enabled, that variable is always 0.

I can enable the timer interrupt and get the time (in terms of
get_timer) going, but as I wrote, I would have to disable that interrupt
again before actually loading U-Boot [you can probably imagine the fun
of debugging what happens when one overwrites the exception vectors], so
time would stop passing as far as get_timer() is concerned, which means
that again the watchdog would no longer be serviced. So if there is to
be any ratelimiting at all, it really has to be based on a time that
does tick without relying on interrupts.

Rasmus
Rasmus Villemoes March 13, 2020, 4:04 p.m. UTC | #3
Some watchdogs must be reset more often than the once-per-second
ratelimit used by the generic watchdog_reset function in
wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux)
for using a property called hw_margin_ms to let the device tree tell
the driver how often the device needs resetting. So use that
generically. No change in default behaviour.

On top of https://patchwork.ozlabs.org/patch/1242772/ .

Stefan, something like this? That at least solves half my problems and
might be useful to others as well. Then I'll have to figure out the
time-stands-still problem in some other way.

Rasmus Villemoes (3):
  watchdog: remove stale ifndef CONFIG_WATCHDOG_TIMEOUT_MSECS from wdt.h
  watchdog: move initr_watchdog() to wdt-uclass.c
  watchdog: honour hw_margin_ms DT property

 drivers/watchdog/wdt-uclass.c | 43 ++++++++++++++++++++++++++++++++++-
 include/wdt.h                 | 38 +------------------------------
 2 files changed, 43 insertions(+), 38 deletions(-)
Stefan Roese March 14, 2020, 12:04 p.m. UTC | #4
On 13.03.20 17:04, Rasmus Villemoes wrote:
> Some watchdogs must be reset more often than the once-per-second
> ratelimit used by the generic watchdog_reset function in
> wdt-uclass.c. There's precedent (from the gpio-wdt driver in linux)
> for using a property called hw_margin_ms to let the device tree tell
> the driver how often the device needs resetting. So use that
> generically. No change in default behaviour.
> 
> On top of https://patchwork.ozlabs.org/patch/1242772/ .
> 
> Stefan, something like this?

Yes, thanks for looking into this.

> That at least solves half my problems and
> might be useful to others as well. Then I'll have to figure out the
> time-stands-still problem in some other way.

If its too hard to enable interrupts in SPL for you or to provide some
other means of a working get_timer() API, then we needto find another
solution. You started with this weak function, which of course works.
What other options are there? Adding a callback mechanism to register
platform specific callback functions? Even though this might get a
little bit too complicated.

Thanks,
Stefan
Rasmus Villemoes March 16, 2020, 3:52 p.m. UTC | #5
On 14/03/2020 13.04, Stefan Roese wrote:
> On 13.03.20 17:04, Rasmus Villemoes wrote:

>> That at least solves half my problems and
>> might be useful to others as well. Then I'll have to figure out the
>> time-stands-still problem in some other way.
> 
> If its too hard to enable interrupts in SPL for you or to provide some
> other means of a working get_timer() API, then we needto find another
> solution. You started with this weak function, which of course works.
> What other options are there? Adding a callback mechanism to register
> platform specific callback functions? Even though this might get a
> little bit too complicated.

Now that I dig a bit more into this, I seem to remember that we actually
also had problems in U-Boot proper when loading a compressed kernel, so
for now we're using an uncompressed kernel in our FIT images. I will
have to re-investigate, but it now occurs to me that it might be due to
the fact that interrupts get disabled during bootm (which makes sense,
the same reason I stated previously of interrupt vectors about to be
overwritten), so even in U-Boot proper, time as measured by get_timer()
ceases to pass after that point, so all the WATCHDOG_RESET() calls from
the inflate code effectively get ignored.

So it may be necessary to have some wdt_ratelimit_disable() hook that
can be called from bootm_disable_interrupts() and e.g. some
board-specific SPL code. I'll do some experiments and figure out if I do
indeed need such a hook.

Rasmus
diff mbox series

Patch

diff --git a/drivers/watchdog/wdt-uclass.c b/drivers/watchdog/wdt-uclass.c
index 309a0e9c5b..ad53e86b80 100644
--- a/drivers/watchdog/wdt-uclass.c
+++ b/drivers/watchdog/wdt-uclass.c
@@ -67,6 +67,19 @@  int wdt_expire_now(struct udevice *dev, ulong flags)
 }
 
 #if defined(CONFIG_WATCHDOG)
+__weak int watchdog_reset_ratelimit(void)
+{
+	static ulong next_reset;
+	ulong now;
+
+	now = get_timer(0);
+	if (time_after(now, next_reset)) {
+		next_reset = now + 1000;	/* reset every 1000ms */
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * Called by macro WATCHDOG_RESET. This function be called *very* early,
  * so we need to make sure, that the watchdog driver is ready before using
@@ -74,19 +87,13 @@  int wdt_expire_now(struct udevice *dev, ulong flags)
  */
 void watchdog_reset(void)
 {
-	static ulong next_reset;
-	ulong now;
-
 	/* Exit if GD is not ready or watchdog is not initialized yet */
 	if (!gd || !(gd->flags & GD_FLG_WDT_READY))
 		return;
 
 	/* Do not reset the watchdog too often */
-	now = get_timer(0);
-	if (time_after(now, next_reset)) {
-		next_reset = now + 1000;	/* reset every 1000ms */
+	if (watchdog_reset_ratelimit())
 		wdt_reset(gd->watchdog_dev);
-	}
 }
 #endif