diff mbox series

[RFC] improve it87_wdt (IT8784/IT8786) / keeping WDTCTRL unchanged / deactivate watchdog by setting WDTVALLSB/WDTVALMSB 0

Message ID 87b1c97177c43eeec640483cc2f83f5f4d7b1060.camel@wefi.net
State New
Headers show
Series [RFC] improve it87_wdt (IT8784/IT8786) / keeping WDTCTRL unchanged / deactivate watchdog by setting WDTVALLSB/WDTVALMSB 0 | expand

Commit Message

Werner Fischer Oct. 16, 2023, 1:16 p.m. UTC
Hi Guenter, hi Hanspeter,

I currently testing two devices with IT8784 and IT8786 watchdog timers.
Although the chips are supported by it87_wdt.c after Hanspeter's
patches back in 2020, the watchdog functionality does not work in my
following test:
- Debian 12 using Kernel 6.1.58 or current 6.6-rc
- loading module it87_wdt
- starting wd_keepalive Deamon
- killing wd_keepalive using signal 9
-> system keeps on running even after the configured watchdog timeout

For debugging purposes, I have used the patch below to report the
content of the watchdog registers 0x71 (WDTCTRL), 0x72 (WDTCFG), 0x73
(WDTVALLSB), and 0x74 (WDTVALMSB) to the system log.

It turned out, that 0x71 (WDTCTRL) has initially the following value
set (before the module changes it to 0x00):
- 8 decimal (IT8784 / IT8786)
- 4 decimal (IT8613)

I figured out, that the following code line makes the watchdog of
IT8784 and IT8786 non-functional for me:
  superio_outb(0x00, WDTCTRL);
I have removed this code in my patch below, then the watchdog works for
IT8784 and IT8786.

I'm not sure, why the WDTCTRL register is set to 0x00 in the code. As
it seems, the register can have different meanings for differnt IT8xxx
chips. Accoring to [1] it seems sufficient to set both WDTVALLSB and
WDTVALMSB to 0x00 to deactivate the watchdog timer: "When the WDT Time-
out Value register is set to a non-zero value, the WDT loads the value
and begin counting down from the value."
This happens e.g. also when wd_keepalive is stopped cleanly.

I am open to support to improve the it87_wdt code.

But before I'm writing and sending a patch, I have the following
question:
* What is the reason, why WDTCTRL is set to 0x00 in the code? and
* Could we think about removing this (at least for IT8784/8786)?

Best regards,
Werner

[1] https://www.ite.com.tw/uploads/product_download/IT8781_A_V0.2.2_121712.pdf
---

Comments

Werner Fischer Nov. 7, 2023, 1:18 p.m. UTC | #1
Hi Guenter, hi Wim, hi Hanspeter,

On Mon, 2023-10-16 at 15:16 +0200, Werner Fischer wrote:
> Hi Guenter, hi Hanspeter,
> 
> I currently testing two devices with IT8784 and IT8786 watchdog
> timers.
> Although the chips are supported by it87_wdt.c after Hanspeter's
> patches back in 2020, the watchdog functionality does not work in my
> following test:
> - Debian 12 using Kernel 6.1.58 or current 6.6-rc
> - loading module it87_wdt
> - starting wd_keepalive Deamon
> - killing wd_keepalive using signal 9
> -> system keeps on running even after the configured watchdog timeout
> 
> For debugging purposes, I have used the patch below to report the
> content of the watchdog registers 0x71 (WDTCTRL), 0x72 (WDTCFG), 0x73
> (WDTVALLSB), and 0x74 (WDTVALMSB) to the system log.
> 
> It turned out, that 0x71 (WDTCTRL) has initially the following value
> set (before the module changes it to 0x00):
> - 8 decimal (IT8784 / IT8786)
> - 4 decimal (IT8613)
> 
> I figured out, that the following code line makes the watchdog of
> IT8784 and IT8786 non-functional for me:
>   superio_outb(0x00, WDTCTRL);
> I have removed this code in my patch below, then the watchdog works
> for IT8784 and IT8786.
> 
> I'm not sure, why the WDTCTRL register is set to 0x00 in the code. As
> it seems, the register can have different meanings for differnt
> IT8xxx chips. Accoring to [1] it seems sufficient to set both
> WDTVALLSB and WDTVALMSB to 0x00 to deactivate the watchdog timer:
> "When the WDT Time-out Value register is set to a non-zero value, the
> WDT loads the value and begin counting down from the value."
> This happens e.g. also when wd_keepalive is stopped cleanly.
> 
> I am open to support to improve the it87_wdt code.
> 
> But before I'm writing and sending a patch, I have the following
> question:
> * What is the reason, why WDTCTRL is set to 0x00 in the code? and
> * Could we think about removing this (at least for IT8784/8786)?
It seems to me that setting WDTCTRL to 0x00 has been in the code from
the beginning.

For my test systems with IT8784 and IT8786 I got the following
information from the system vendor:
"71H bit 3 is the mode choice for the clock input of the IT8784/IT8786
chip. This bit is set to 1 (= PCICLK mode) and can not be set to 0."
Setting it to 0 breaks the watchdog functionality.

Unfortunately, ITE does not provide the specifications PDFs publicly
anymore. But the documentation at [2] provides details regarding the
Watchdog Timer Control Register (71h) of an ITE chip, which has the
description "External CLK_IN Select: 1: PCICLK" for bit 3, too.

As it seems system-dependent, removing
  superio_outb(0x00, WDTCTRL);
from the code may lead to problems with other ITE chips, which maybe
could need WDTCTRL set to 0x00.

So my idea to be on the safe side for exiting users of it87_wdt, too:
* What do you think about an optional module parameter to let the user
  choose to leave WDTCTRL untouched? (this would make the watchdog work
  e.g. with my test systems with IT8784 and IT8786, too)

In case you would support such a code change, I would be happy to write
a patch (including some other minor/cosmetic code fixes/addition of
further ITE chips).

Best regards,
Werner

[2] http://files.nexcom.com/Driver/NDiSB425/User_Manual_NDiSB425-SI3_170111.pdf
Guenter Roeck Nov. 7, 2023, 2:34 p.m. UTC | #2
On 11/7/23 05:18, Werner Fischer wrote:
> Hi Guenter, hi Wim, hi Hanspeter,
> 
> On Mon, 2023-10-16 at 15:16 +0200, Werner Fischer wrote:
>> Hi Guenter, hi Hanspeter,
>>
>> I currently testing two devices with IT8784 and IT8786 watchdog
>> timers.
>> Although the chips are supported by it87_wdt.c after Hanspeter's
>> patches back in 2020, the watchdog functionality does not work in my
>> following test:
>> - Debian 12 using Kernel 6.1.58 or current 6.6-rc
>> - loading module it87_wdt
>> - starting wd_keepalive Deamon
>> - killing wd_keepalive using signal 9
>> -> system keeps on running even after the configured watchdog timeout
>>
>> For debugging purposes, I have used the patch below to report the
>> content of the watchdog registers 0x71 (WDTCTRL), 0x72 (WDTCFG), 0x73
>> (WDTVALLSB), and 0x74 (WDTVALMSB) to the system log.
>>
>> It turned out, that 0x71 (WDTCTRL) has initially the following value
>> set (before the module changes it to 0x00):
>> - 8 decimal (IT8784 / IT8786)
>> - 4 decimal (IT8613)
>>
>> I figured out, that the following code line makes the watchdog of
>> IT8784 and IT8786 non-functional for me:
>>    superio_outb(0x00, WDTCTRL);
>> I have removed this code in my patch below, then the watchdog works
>> for IT8784 and IT8786.
>>
>> I'm not sure, why the WDTCTRL register is set to 0x00 in the code. As
>> it seems, the register can have different meanings for differnt
>> IT8xxx chips. Accoring to [1] it seems sufficient to set both
>> WDTVALLSB and WDTVALMSB to 0x00 to deactivate the watchdog timer:
>> "When the WDT Time-out Value register is set to a non-zero value, the
>> WDT loads the value and begin counting down from the value."
>> This happens e.g. also when wd_keepalive is stopped cleanly.
>>
>> I am open to support to improve the it87_wdt code.
>>
>> But before I'm writing and sending a patch, I have the following
>> question:
>> * What is the reason, why WDTCTRL is set to 0x00 in the code? and
>> * Could we think about removing this (at least for IT8784/8786)?
> It seems to me that setting WDTCTRL to 0x00 has been in the code from
> the beginning.
> 
> For my test systems with IT8784 and IT8786 I got the following
> information from the system vendor:
> "71H bit 3 is the mode choice for the clock input of the IT8784/IT8786
> chip. This bit is set to 1 (= PCICLK mode) and can not be set to 0."
> Setting it to 0 breaks the watchdog functionality.
> 
> Unfortunately, ITE does not provide the specifications PDFs publicly
> anymore. But the documentation at [2] provides details regarding the

They really never did, or at least not for a long time. Some board
vendors used to be Linux-friendly and provided datasheets on request,
but that is no longer the case. My recommendation used to be, and still is,
not to use boards with Super-IO chips from ITE to run Linux. This is not
only due to lack of datasheets, but also due to the lack of support from
both chip and board vendors if there are any issues when trying to support
the chips in Linux.

> Watchdog Timer Control Register (71h) of an ITE chip, which has the
> description "External CLK_IN Select: 1: PCICLK" for bit 3, too.
> 
> As it seems system-dependent, removing
>    superio_outb(0x00, WDTCTRL);
> from the code may lead to problems with other ITE chips, which maybe
> could need WDTCTRL set to 0x00.
> 

Bit 4..7 of the register are used to control watchdog timer resets (pings).
Skipping the write entirely is therefore unacceptable even for IT8784/IT8786
because we _don't_ want activity on a (legacy) keyboard, mouse, game,
or infrared port to reset the watchdog timer.

> So my idea to be on the safe side for exiting users of it87_wdt, too:
> * What do you think about an optional module parameter to let the user
>    choose to leave WDTCTRL untouched? (this would make the watchdog work
>    e.g. with my test systems with IT8784 and IT8786, too)
> 

Make it conditional for IT8784/IT8786: On those chips, read the value from
the chip and clear all but bit 3. This is the safest we can do. Future
chips can be added as needed.

Guenter
diff mbox series

Patch

diff --git a/drivers/watchdog/it87_wdt.c b/drivers/watchdog/it87_wdt.c
index bb1122909396..4ea5505b1771 100644
--- a/drivers/watchdog/it87_wdt.c
+++ b/drivers/watchdog/it87_wdt.c
@@ -50,6 +50,7 @@ 
 /* Chip Id numbers */
 #define NO_DEV_ID	0xffff
 #define IT8607_ID	0x8607
+#define IT8613_ID	0x8613
 #define IT8620_ID	0x8620
 #define IT8622_ID	0x8622
 #define IT8625_ID	0x8625
@@ -152,9 +153,26 @@  static inline int superio_inw(int reg)
 	return val;
 }
 
+static void wdt_debug_registers(unsigned int debugstage)
+{
+	unsigned int wdt_ctrl;
+	unsigned int wdt_cfg;
+	unsigned int wdt_vallsb;
+	unsigned int wdt_valmsb;
+	wdt_ctrl = superio_inb(WDTCTRL);
+	wdt_cfg = superio_inb(WDTCFG);
+	wdt_vallsb = superio_inb(WDTVALLSB);
+	wdt_valmsb = superio_inb(WDTVALMSB);
+	pr_info("DEBUG at stage %d: wdt_ctrl=%d; wdt_cfg=%d; wdt_vallsb=%d, wdt_valmsb=%d\n", debugstage, wdt_ctrl, wdt_cfg, wdt_vallsb, wdt_valmsb);
+}
+
 /* Internal function, should be called after superio_select(GPIO) */
 static void _wdt_update_timeout(unsigned int t)
 {
+	wdt_debug_registers(3);
+	/* test werner reset */
+	/* superio_outb(0x00, WDTCTRL); */
+	wdt_debug_registers(9);
 	unsigned char cfg = WDT_KRST;
 
 	if (testmode)
@@ -172,6 +190,7 @@  static void _wdt_update_timeout(unsigned int t)
 	superio_outb(t, WDTVALLSB);
 	if (max_units > 255)
 		superio_outb(t >> 8, WDTVALMSB);
+	wdt_debug_registers(4);
 }
 
 static int wdt_update_timeout(unsigned int t)
@@ -258,11 +277,13 @@  static int __init it87_wdt_init(void)
 	int rc;
 
 	rc = superio_enter();
+	wdt_debug_registers(0);
 	if (rc)
 		return rc;
 
 	chip_type = superio_inw(CHIPID);
 	chip_rev  = superio_inb(CHIPREV) & 0x0f;
+	wdt_debug_registers(1);
 	superio_exit();
 
 	switch (chip_type) {
@@ -274,9 +295,8 @@  static int __init it87_wdt_init(void)
 		break;
 	case IT8716_ID:
 	case IT8726_ID:
-		max_units = 65535;
-		break;
 	case IT8607_ID:
+	case IT8613_ID:
 	case IT8620_ID:
 	case IT8622_ID:
 	case IT8625_ID:
@@ -310,10 +330,12 @@  static int __init it87_wdt_init(void)
 	rc = superio_enter();
 	if (rc)
 		return rc;
-
+	wdt_debug_registers(10);
 	superio_select(GPIO);
+	wdt_debug_registers(11);
 	superio_outb(WDT_TOV1, WDTCFG);
-	superio_outb(0x00, WDTCTRL);
+	/* superio_outb(0x00, WDTCTRL); */
+	wdt_debug_registers(2);
 	superio_exit();
 
 	if (timeout < 1 || timeout > max_units * 60) {