[v3,09/11] i2c: npcm: Handle spurious interrupts

Message ID 20220303083141.8742-10-warp5tw@gmail.com
State Superseded
Series [v3,01/11] arm: dts: add new property for NPCM i2c module

Commit Message

Tyrone Ting March 3, 2022, 8:31 a.m. UTC
From: Tali Perry <tali.perry1@gmail.com>

In order to better handle spurious interrupts:
1. Disable incoming interrupts in master only mode.
2. Clear end of busy (EOB) after every interrupt.
3. Return correct status during interrupt.

Fixes: 56a1485b102e ("i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver")
Signed-off-by: Tali Perry <tali.perry1@gmail.com>
Signed-off-by: Tyrone Ting <kfting@nuvoton.com>
---
 drivers/i2c/busses/i2c-npcm7xx.c | 92 ++++++++++++++++++++++----------
 1 file changed, 63 insertions(+), 29 deletions(-)

Comments

Avi Fishman April 4, 2022, 5:03 p.m. UTC | #1
On Thu, Mar 3, 2022 at 4:14 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Thu, Mar 03, 2022 at 02:48:20PM +0200, Tali Perry wrote:
> > > On Thu, Mar 3, 2022 at 12:37 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > >
> > > > On Thu, Mar 03, 2022 at 04:31:39PM +0800, Tyrone Ting wrote:
> > > > > From: Tali Perry <tali.perry1@gmail.com>
> > > > >
> > > > > In order to better handle spurious interrupts:
> > > > > 1. Disable incoming interrupts in master only mode.
> > > > > 2. Clear end of busy (EOB) after every interrupt.
> > > > > 3. Return correct status during interrupt.
> > > >
> > > > This is bad commit message, it doesn't explain "why" you are doing these.
>
> ...
>
> > BMC users connect a huge tree of i2c devices and muxes.
> > This tree suffers from spikes, noise and double clocks.
> > All these may cause spurious interrupts to the BMC.
> >
> > If the driver gets an IRQ which was not expected and was not handled
> > by the IRQ handler,
> > there is nothing left to do but to clear the interrupt and move on.
>
> Yes, the problem is what "move on" means in your case.
> If you get a spurious interrupts there are possibilities what's wrong:
> 1) HW bug(s)
> 2) FW bug(s)
> 3) Missed IRQ mask in the driver
> 4) Improper IRQ mask in the driver
>
> The below approach seems incorrect to me.
>

Andy, what about this explanation:
In rare cases the i2c module gets a spurious interrupt: we enter the
interrupt handler but find no status bit that indicates why the
interrupt was raised.
This may be a rare HW issue that is still under investigation.
To overcome this we do the following:
1. Disable incoming interrupts in master mode, but only when slave mode
is not enabled.
2. Clear end of busy (EOB) after every interrupt.
3. Clear the other status bits (just in case, since we found them cleared).
4. Return the correct status during the interrupt, which finishes the transaction.
On the next xmit transaction, if the bus is still busy, the master will
run a recovery process before issuing the new transaction.
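
The handling described above can be sketched in user space as follows. This is a simulation for illustration only, not the driver code: struct sim_bus, the SIM_ST_* bit values, and the spurious_count counter are hypothetical and do not reflect the real NPCM register layout. The counter in particular is an assumption added here to show one way of keeping spurious interrupts visible instead of silently hiding them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; NOT the real NPCM_I2CST layout. */
#define SIM_ST_BER    0x20u
#define SIM_ST_NEGACK 0x10u
#define SIM_ST_STASTR 0x08u
#define SIM_ST_MASK   (SIM_ST_BER | SIM_ST_NEGACK | SIM_ST_STASTR)

enum sim_irqreturn { SIM_IRQ_NONE = 0, SIM_IRQ_HANDLED = 1 };

/* Simulated bus state standing in for the memory-mapped registers. */
struct sim_bus {
	uint8_t status;          /* simulated status register */
	bool eob_int_enabled;    /* simulated end-of-busy interrupt enable */
	unsigned int spurious_count; /* hypothetical counter, not in the patch */
};

/* Clear BER, NEGACK and STASTR in the simulated status register. */
static void sim_clear_master_status(struct sim_bus *bus)
{
	bus->status = (uint8_t)(bus->status & ~SIM_ST_MASK);
}

/* Handle the events we know about; report SIM_IRQ_NONE otherwise. */
static enum sim_irqreturn sim_handle_known_events(struct sim_bus *bus)
{
	if (bus->status & SIM_ST_MASK) {
		sim_clear_master_status(bus);
		return SIM_IRQ_HANDLED;
	}
	return SIM_IRQ_NONE;
}

/*
 * Top-level handler: if no known status bit explains the interrupt,
 * disable EOB, clear the status bits, count the event, and still
 * report the interrupt as handled.
 */
static enum sim_irqreturn sim_bus_irq(struct sim_bus *bus)
{
	assert(bus != NULL);

	if (sim_handle_known_events(bus) == SIM_IRQ_HANDLED)
		return SIM_IRQ_HANDLED;

	bus->eob_int_enabled = false;
	sim_clear_master_status(bus);
	bus->spurious_count++;
	return SIM_IRQ_HANDLED;
}
```

Calling sim_bus_irq() on a bus whose status is 0 takes the spurious path and increments the counter; calling it with SIM_ST_NEGACK set takes the normal path and leaves the counter unchanged.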
> > If the transaction failed, driver has a recovery function.
> > After that, user may retry to send the message.
> >
> > Indeed the commit message doesn't explain all this.
> > We will fix and add to the next patchset.
> >
> > > > > +     /*
> > > > > +      * if irq is not one of the above, make sure EOB is disabled and all
> > > > > +      * status bits are cleared.
> > > >
> > > > This does not explain why you hide the spurious interrupt.
> > > >
> > > > > +      */
> > > > > +     if (ret == IRQ_NONE) {
> > > > > +             npcm_i2c_eob_int(bus, false);
> > > > > +             npcm_i2c_clear_master_status(bus);
> > > > > +     }
> > > > > +
> > > > > +     return IRQ_HANDLED;
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
Andy Shevchenko April 5, 2022, 7:13 a.m. UTC | #2
On Mon, Apr 04, 2022 at 08:03:44PM +0300, Avi Fishman wrote:
> On Thu, Mar 3, 2022 at 4:14 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Thu, Mar 03, 2022 at 02:48:20PM +0200, Tali Perry wrote:
> > > > On Thu, Mar 3, 2022 at 12:37 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > > > On Thu, Mar 03, 2022 at 04:31:39PM +0800, Tyrone Ting wrote:
> > > > > > From: Tali Perry <tali.perry1@gmail.com>
> > > > > >
> > > > > > In order to better handle spurious interrupts:
> > > > > > 1. Disable incoming interrupts in master only mode.
> > > > > > 2. Clear end of busy (EOB) after every interrupt.
> > > > > > 3. Return correct status during interrupt.
> > > > >
> > > > > This is bad commit message, it doesn't explain "why" you are doing these.
> >
> > ...
> >
> > > BMC users connect a huge tree of i2c devices and muxes.
> > > This tree suffers from spikes, noise and double clocks.
> > > All these may cause spurious interrupts to the BMC.

(1)

> > > If the driver gets an IRQ which was not expected and was not handled
> > > by the IRQ handler,
> > > there is nothing left to do but to clear the interrupt and move on.
> >
> > Yes, the problem is what "move on" means in your case.
> > If you get a spurious interrupts there are possibilities what's wrong:
> > 1) HW bug(s)
> > 2) FW bug(s)
> > 3) Missed IRQ mask in the driver
> > 4) Improper IRQ mask in the driver
> >
> > The below approach seems incorrect to me.
> 
> Andy, What about this explanation:
> On rare cases the i2c gets a spurious interrupt which means that we
> enter an interrupt but in
> the interrupt handler we don't find any status bit that points to the
> reason we got this interrupt.
> This may be a rare case of HW issue that is still under investigation.
> In order to overcome this we are doing the following:
> 1. Disable incoming interrupts in master mode only when slave mode is
> not enabled.
> 2. Clear end of busy (EOB) after every interrupt.
> 3. Clear other status bits (just in case since we found them cleared)
> 4. Return correct status during the interrupt that will finish the transaction.
> On next xmit transaction if the bus is still busy the master will
> issue a recovery process before issuing the new transaction.

This sounds better, thanks.

One thing to clarify, the (1) states that the HW "issue" is known and becomes a
PCB level one, i.e. noisy environment that has not been properly shielded.
So, if it is known, please put the reason in the commit message.

Also would be good to see numbers of "rare". Is it 0.1%?

> > > If the transaction failed, driver has a recovery function.
> > > After that, user may retry to send the message.
> > >
> > > Indeed the commit message doesn't explain all this.
> > > We will fix and add to the next patchset.
> > >
> > > > > > +     /*
> > > > > > +      * if irq is not one of the above, make sure EOB is disabled and all
> > > > > > +      * status bits are cleared.
> > > > >
> > > > > This does not explain why you hide the spurious interrupt.
> > > > >
> > > > > > +      */
Avi Fishman April 10, 2022, 7:33 a.m. UTC | #3
On Tue, Apr 5, 2022 at 10:13 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Mon, Apr 04, 2022 at 08:03:44PM +0300, Avi Fishman wrote:
> > On Thu, Mar 3, 2022 at 4:14 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Thu, Mar 03, 2022 at 02:48:20PM +0200, Tali Perry wrote:
> > > > > On Thu, Mar 3, 2022 at 12:37 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > > > > > On Thu, Mar 03, 2022 at 04:31:39PM +0800, Tyrone Ting wrote:
> > > > > > > From: Tali Perry <tali.perry1@gmail.com>
> > > > > > >
> > > > > > > In order to better handle spurious interrupts:
> > > > > > > 1. Disable incoming interrupts in master only mode.
> > > > > > > 2. Clear end of busy (EOB) after every interrupt.
> > > > > > > 3. Return correct status during interrupt.
> > > > > >
> > > > > > This is bad commit message, it doesn't explain "why" you are doing these.
> > >
> > > ...
> > >
> > > > BMC users connect a huge tree of i2c devices and muxes.
> > > > This tree suffers from spikes, noise and double clocks.
> > > > All these may cause spurious interrupts to the BMC.
>
> (1)
>
> > > > If the driver gets an IRQ which was not expected and was not handled
> > > > by the IRQ handler,
> > > > there is nothing left to do but to clear the interrupt and move on.
> > >
> > > Yes, the problem is what "move on" means in your case.
> > > If you get a spurious interrupts there are possibilities what's wrong:
> > > 1) HW bug(s)
> > > 2) FW bug(s)
> > > 3) Missed IRQ mask in the driver
> > > 4) Improper IRQ mask in the driver
> > >
> > > The below approach seems incorrect to me.
> >
> > Andy, What about this explanation:
> > On rare cases the i2c gets a spurious interrupt which means that we
> > enter an interrupt but in
> > the interrupt handler we don't find any status bit that points to the
> > reason we got this interrupt.
> > This may be a rare case of HW issue that is still under investigation

About 1 in 100,000 transactions (0.001%)

> > In order to overcome this we are doing the following:
> > 1. Disable incoming interrupts in master mode only when slave mode is
> > not enabled.
> > 2. Clear end of busy (EOB) after every interrupt.
> > 3. Clear other status bits (just in case since we found them cleared)
> > 4. Return correct status during the interrupt that will finish the transaction.
> > On next xmit transaction if the bus is still busy the master will
> > issue a recovery process before issuing the new transaction.
>
> This sounds better, thanks.
>
> One thing to clarify, the (1) states that the HW "issue" is known and becomes a
> PCB level one, i.e. noisy environment that has not been properly shielded.
> So, if it is known, please put the reason in the commit message.
>

The HW issue is not known yet. We see it on a few platforms and not on
others, so a noisy environment was only our first assumption.
We don't want to claim this without proving it.

> Also would be good to see numbers of "rare". Is it 0.1%?

I added above the known statistics.

>
> > > > If the transaction failed, driver has a recovery function.
> > > > After that, user may retry to send the message.
> > > >
> > > > Indeed the commit message doesn't explain all this.
> > > > We will fix and add to the next patchset.
> > > >
> > > > > > > +     /*
> > > > > > > +      * if irq is not one of the above, make sure EOB is disabled and all
> > > > > > > +      * status bits are cleared.
> > > > > >
> > > > > > This does not explain why you hide the spurious interrupt.
> > > > > >
> > > > > > > +      */
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
Patch

diff --git a/drivers/i2c/busses/i2c-npcm7xx.c b/drivers/i2c/busses/i2c-npcm7xx.c
index 66532c680338..73cef76127c9 100644
--- a/drivers/i2c/busses/i2c-npcm7xx.c
+++ b/drivers/i2c/busses/i2c-npcm7xx.c
@@ -564,6 +564,15 @@  static inline void npcm_i2c_nack(struct npcm_i2c *bus)
 	iowrite8(val, bus->reg + NPCM_I2CCTL1);
 }
 
+static inline void npcm_i2c_clear_master_status(struct npcm_i2c *bus)
+{
+	u8 val;
+
+	/* Clear NEGACK, STASTR and BER bits */
+	val = NPCM_I2CST_BER | NPCM_I2CST_NEGACK | NPCM_I2CST_STASTR;
+	iowrite8(val, bus->reg + NPCM_I2CST);
+}
+
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
 static void npcm_i2c_slave_int_enable(struct npcm_i2c *bus, bool enable)
 {
@@ -643,8 +652,8 @@  static void npcm_i2c_reset(struct npcm_i2c *bus)
 	iowrite8(NPCM_I2CCST_BB, bus->reg + NPCM_I2CCST);
 	iowrite8(0xFF, bus->reg + NPCM_I2CST);
 
-	/* Clear EOB bit */
-	iowrite8(NPCM_I2CCST3_EO_BUSY, bus->reg + NPCM_I2CCST3);
+	/* Clear and disable EOB */
+	npcm_i2c_eob_int(bus, false);
 
 	/* Clear all fifo bits: */
 	iowrite8(NPCM_I2CFIF_CTS_CLR_FIFO, bus->reg + NPCM_I2CFIF_CTS);
@@ -656,6 +665,9 @@  static void npcm_i2c_reset(struct npcm_i2c *bus)
 	}
 #endif
 
+	/* clear status bits for spurious interrupts */
+	npcm_i2c_clear_master_status(bus);
+
 	bus->state = I2C_IDLE;
 }
 
@@ -818,15 +830,6 @@  static void npcm_i2c_read_fifo(struct npcm_i2c *bus, u8 bytes_in_fifo)
 	}
 }
 
-static inline void npcm_i2c_clear_master_status(struct npcm_i2c *bus)
-{
-	u8 val;
-
-	/* Clear NEGACK, STASTR and BER bits */
-	val = NPCM_I2CST_BER | NPCM_I2CST_NEGACK | NPCM_I2CST_STASTR;
-	iowrite8(val, bus->reg + NPCM_I2CST);
-}
-
 static void npcm_i2c_master_abort(struct npcm_i2c *bus)
 {
 	/* Only current master is allowed to issue a stop condition */
@@ -1234,7 +1237,16 @@  static irqreturn_t npcm_i2c_int_slave_handler(struct npcm_i2c *bus)
 		ret = IRQ_HANDLED;
 	} /* SDAST */
 
-	return ret;
+	/*
+	 * if irq is not one of the above, make sure EOB is disabled and all
+	 * status bits are cleared.
+	 */
+	if (ret == IRQ_NONE) {
+		npcm_i2c_eob_int(bus, false);
+		npcm_i2c_clear_master_status(bus);
+	}
+
+	return IRQ_HANDLED;
 }
 
 static int npcm_i2c_reg_slave(struct i2c_client *client)
@@ -1470,6 +1482,9 @@  static void npcm_i2c_irq_handle_nack(struct npcm_i2c *bus)
 		npcm_i2c_eob_int(bus, false);
 		npcm_i2c_master_stop(bus);
 
+		/* Clear SDA Status bit (by reading dummy byte) */
+		npcm_i2c_rd_byte(bus);
+
 		/*
 		 * The bus is released from stall only after the SW clears
 		 * NEGACK bit. Then a Stop condition is sent.
@@ -1477,6 +1492,8 @@  static void npcm_i2c_irq_handle_nack(struct npcm_i2c *bus)
 		npcm_i2c_clear_master_status(bus);
 		readx_poll_timeout_atomic(ioread8, bus->reg + NPCM_I2CCST, val,
 					  !(val & NPCM_I2CCST_BUSY), 10, 200);
+		/* verify no status bits are still set after bus is released */
+		npcm_i2c_clear_master_status(bus);
 	}
 	bus->state = I2C_IDLE;
 
@@ -1675,10 +1692,10 @@  static int npcm_i2c_recovery_tgclk(struct i2c_adapter *_adap)
 	int              iter = 27;
 
 	if ((npcm_i2c_get_SDA(_adap) == 1) && (npcm_i2c_get_SCL(_adap) == 1)) {
-		dev_dbg(bus->dev, "bus%d recovery skipped, bus not stuck",
-			bus->num);
+		dev_dbg(bus->dev, "bus%d-0x%x recovery skipped, bus not stuck",
+			bus->num, bus->dest_addr);
 		npcm_i2c_reset(bus);
-		return status;
+		return 0;
 	}
 
 	npcm_i2c_int_enable(bus, false);
@@ -1912,6 +1929,7 @@  static int npcm_i2c_init_module(struct npcm_i2c *bus, enum i2c_mode mode,
 	    bus_freq_hz < I2C_FREQ_MIN_HZ || bus_freq_hz > I2C_FREQ_MAX_HZ)
 		return -EINVAL;
 
+	npcm_i2c_int_enable(bus, false);
 	npcm_i2c_disable(bus);
 
 	/* Configure FIFO mode : */
@@ -1940,10 +1958,18 @@  static int npcm_i2c_init_module(struct npcm_i2c *bus, enum i2c_mode mode,
 	val = (val | NPCM_I2CCTL1_NMINTE) & ~NPCM_I2CCTL1_RWS;
 	iowrite8(val, bus->reg + NPCM_I2CCTL1);
 
-	npcm_i2c_int_enable(bus, true);
-
 	npcm_i2c_reset(bus);
 
+	/* check HW is OK: SDA and SCL should be high at this point. */
+	if ((npcm_i2c_get_SDA(&bus->adap) == 0) ||
+	    (npcm_i2c_get_SCL(&bus->adap) == 0)) {
+		dev_err(bus->dev, "I2C%d init fail: lines are low", bus->num);
+		dev_err(bus->dev, "SDA=%d SCL=%d", npcm_i2c_get_SDA(&bus->adap),
+			npcm_i2c_get_SCL(&bus->adap));
+		return -ENXIO;
+	}
+
+	npcm_i2c_int_enable(bus, true);
 	return 0;
 }
 
@@ -1991,10 +2017,14 @@  static irqreturn_t npcm_i2c_bus_irq(int irq, void *dev_id)
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
 	if (bus->slave) {
 		bus->master_or_slave = I2C_SLAVE;
-		return npcm_i2c_int_slave_handler(bus);
+		if (npcm_i2c_int_slave_handler(bus))
+			return IRQ_HANDLED;
 	}
 #endif
-	return IRQ_NONE;
+	/* clear status bits for spurious interrupts */
+	npcm_i2c_clear_master_status(bus);
+
+	return IRQ_HANDLED;
 }
 
 static bool npcm_i2c_master_start_xmit(struct npcm_i2c *bus,
@@ -2051,7 +2081,6 @@  static int npcm_i2c_master_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 	u8 *write_data, *read_data;
 	u8 slave_addr;
 	unsigned long timeout;
-	int ret = 0;
 	bool read_block = false;
 	bool read_PEC = false;
 	u8 bus_busy;
@@ -2141,12 +2170,12 @@  static int npcm_i2c_master_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 	bus->read_block_use = read_block;
 
 	reinit_completion(&bus->cmd_complete);
-	if (!npcm_i2c_master_start_xmit(bus, slave_addr, nwrite, nread,
-					write_data, read_data, read_PEC,
-					read_block))
-		ret = -EBUSY;
 
-	if (ret != -EBUSY) {
+	npcm_i2c_int_enable(bus, true);
+
+	if (npcm_i2c_master_start_xmit(bus, slave_addr, nwrite, nread,
+				       write_data, read_data, read_PEC,
+				       read_block)) {
 		time_left = wait_for_completion_timeout(&bus->cmd_complete,
 							timeout);
 
@@ -2160,26 +2189,31 @@  static int npcm_i2c_master_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs,
 			}
 		}
 	}
-	ret = bus->cmd_err;
 
 	/* if there was BER, check if need to recover the bus: */
 	if (bus->cmd_err == -EAGAIN)
-		ret = i2c_recover_bus(adap);
+		bus->cmd_err = i2c_recover_bus(adap);
 
 	/*
 	 * After any type of error, check if LAST bit is still set,
 	 * due to a HW issue.
 	 * It cannot be cleared without resetting the module.
 	 */
-	if (bus->cmd_err &&
-	    (NPCM_I2CRXF_CTL_LAST_PEC & ioread8(bus->reg + NPCM_I2CRXF_CTL)))
+	else if (bus->cmd_err &&
+		 (NPCM_I2CRXF_CTL_LAST_PEC & ioread8(bus->reg + NPCM_I2CRXF_CTL)))
 		npcm_i2c_reset(bus);
 
+	/* after any xfer, successful or not, stall and EOB must be disabled */
+	npcm_i2c_stall_after_start(bus, false);
+	npcm_i2c_eob_int(bus, false);
+
 #if IS_ENABLED(CONFIG_I2C_SLAVE)
 	/* reenable slave if it was enabled */
 	if (bus->slave)
 		iowrite8((bus->slave->addr & 0x7F) | NPCM_I2CADDR_SAEN,
 			 bus->reg + NPCM_I2CADDR1);
+#else
+	npcm_i2c_int_enable(bus, false);
 #endif
 	return bus->cmd_err;
 }