i2c: aspeed: Nullify bus messages after timeout


Message ID

20250131222941.798065-1-eajames@linux.ibm.com

State

New

Headers

From: Eddie James <eajames@linux.ibm.com>
To: linux-i2c@vger.kernel.org
Cc: openbmc@lists.ozlabs.org, ryan_chen@aspeedtech.com,
        benh@kernel.crashing.org, joel@jms.id.au, andi.shyti@kernel.org,
        andrew@codeconstruct.com.au, Eddie James <eajames@linux.ibm.com>
Subject: [PATCH] i2c: aspeed: Nullify bus messages after timeout
Date: Fri, 31 Jan 2025 16:29:41 -0600
Message-ID: <20250131222941.798065-1-eajames@linux.ibm.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

i2c: aspeed: Nullify bus messages after timeout

Commit Message

Eddie James Jan. 31, 2025, 10:29 p.m. UTC

For multimaster case, it's conceivable that an interrupt comes
in after the transfer times out and attempts to use bus messages
that have already been freed by the i2c core.

Signed-off-by: Eddie James <eajames@linux.ibm.com>
---
 drivers/i2c/busses/i2c-aspeed.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Andrew Jeffery Feb. 3, 2025, 3:31 a.m. UTC | #1

On Fri, 2025-01-31 at 16:29 -0600, Eddie James wrote:
> For multimaster case, it's conceivable that an interrupt comes
> in after the transfer times out and attempts to use bus messages
> that have already been freed by the i2c core.

This description seems a little vague. Did you hit this case in
practice?

> 
> Signed-off-by: Eddie James <eajames@linux.ibm.com>
> ---
>  drivers/i2c/busses/i2c-aspeed.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/i2c/busses/i2c-aspeed.c
> b/drivers/i2c/busses/i2c-aspeed.c
> index 1550d3d552aed..e344dcc2233fe 100644
> --- a/drivers/i2c/busses/i2c-aspeed.c
> +++ b/drivers/i2c/busses/i2c-aspeed.c
> @@ -731,6 +731,7 @@ static int aspeed_i2c_master_xfer(struct
> i2c_adapter *adap,
>                  * master command.
>                  */
>                 spin_lock_irqsave(&bus->lock, flags);
> +               bus->msgs = NULL;

It feels like there's buggy code elsewhere in the driver that this is
protecting (broken state machine)? I've had a look at the
aspeed_i2c_master_irq() implementation and can't see what though, as we
take an early-exit (before indexing into bus->msgs) if
bus->master_state is INACTIVE or PENDING.

Can you be a bit more specific about where the issue lies?

Andrew


>                 if (bus->master_state == ASPEED_I2C_MASTER_PENDING)
>                         bus->master_state =
> ASPEED_I2C_MASTER_INACTIVE;
>                 spin_unlock_irqrestore(&bus->lock, flags);

Eddie James Feb. 3, 2025, 8:29 p.m. UTC | #2

On 2/2/25 21:31, Andrew Jeffery wrote:
> On Fri, 2025-01-31 at 16:29 -0600, Eddie James wrote:
>> For multimaster case, it's conceivable that an interrupt comes
>> in after the transfer times out and attempts to use bus messages
>> that have already been freed by the i2c core.
> This description seems a little vague. Did you hit this case in
> practice?

Yes. I no longer have the back trace but it's a null pointer access in 
the interrupt handler. We had a certain system that would hit this under 
certain conditions and this patch fixed it.

I can update the commit message with some more detail.

>
>> Signed-off-by: Eddie James <eajames@linux.ibm.com>
>> ---
>>   drivers/i2c/busses/i2c-aspeed.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/i2c/busses/i2c-aspeed.c
>> b/drivers/i2c/busses/i2c-aspeed.c
>> index 1550d3d552aed..e344dcc2233fe 100644
>> --- a/drivers/i2c/busses/i2c-aspeed.c
>> +++ b/drivers/i2c/busses/i2c-aspeed.c
>> @@ -731,6 +731,7 @@ static int aspeed_i2c_master_xfer(struct
>> i2c_adapter *adap,
>>                   * master command.
>>                   */
>>                  spin_lock_irqsave(&bus->lock, flags);
>> +               bus->msgs = NULL;
> It feels like there's buggy code elsewhere in the driver that this is
> protecting (broken state machine)? I've had a look at the
> aspeed_i2c_master_irq() implementation and can't see what though, as we
> take an early-exit (before indexing into bus->msgs) if
> bus->master_state is INACTIVE or PENDING.
> Can you be a bit more specific about where the issue lies?

I'm sure the state machine isn't perfect, yea. The bad access can happen 
like this: if a transfer times out while waiting for an interrupt, the 
aspeed_i2c_master_xfer function will either reset the engine or recover 
the bus, and exit ETIMEDOUT. Resetting the engine will turn off 
interrupts, so we're safe. But recovering the bus doesn't turn off 
interrupts, so after the function exits ETIMEDOUT, I assume what happens 
is we get the interrupt for the previous transfer and try and access the 
messages pointer, which the i2c core has already freed.

Thanks for looking!

Eddie

>
> Andrew
>
>
>>                  if (bus->master_state == ASPEED_I2C_MASTER_PENDING)
>>                          bus->master_state =
>> ASPEED_I2C_MASTER_INACTIVE;
>>                  spin_unlock_irqrestore(&bus->lock, flags);
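
[Editorial illustration, not part of the thread.] The sequence Eddie describes in #2 can be modeled in plain userspace C. This is a minimal sketch under stated assumptions: every `model_*` name is hypothetical, the state list is cut down to three values, and the NULL check in the late-interrupt path is assumed from the discussion rather than copied from the driver. It only shows why clearing the pointer on the timeout path changes what a late interrupt would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's state and bus types. */
enum model_master_state {
	MODEL_INACTIVE,
	MODEL_PENDING,
	MODEL_TX,		/* any state where a transfer is in flight */
};

struct model_msg {
	int len;
};

struct model_bus {
	enum model_master_state master_state;
	struct model_msg *msgs;	/* owned by the caller, not by the bus */
};

/*
 * Timeout path, shaped like the hunk in the patch: optionally clear the
 * message pointer, and only move PENDING back to INACTIVE.
 */
static void model_xfer_timeout(struct model_bus *bus, bool nullify)
{
	assert(bus != NULL);
	if (nullify)
		bus->msgs = NULL;
	if (bus->master_state == MODEL_PENDING)
		bus->master_state = MODEL_INACTIVE;
}

/*
 * Late interrupt: returns true if the handler would go on to use
 * bus->msgs. Assumption: the handler exits early for INACTIVE or
 * PENDING and also bails out when msgs is NULL.
 */
static bool model_late_irq_uses_msgs(const struct model_bus *bus)
{
	assert(bus != NULL);
	if (bus->master_state == MODEL_INACTIVE ||
	    bus->master_state == MODEL_PENDING)
		return false;
	return bus->msgs != NULL;
}
```

In this model, a bus that times out while in MODEL_TX keeps both its state and its stale pointer unless `nullify` is set, so the late interrupt would use memory the caller may already have released.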

Andrew Jeffery Feb. 4, 2025, 4:13 a.m. UTC | #3

On Mon, 2025-02-03 at 14:29 -0600, Eddie James wrote:
> 
> On 2/2/25 21:31, Andrew Jeffery wrote:
> > On Fri, 2025-01-31 at 16:29 -0600, Eddie James wrote:
> > > For multimaster case, it's conceivable that an interrupt comes
> > > in after the transfer times out and attempts to use bus messages
> > > that have already been freed by the i2c core.
> > This description seems a little vague. Did you hit this case in
> > practice?
> 
> 
> Yes. I no longer have the back trace but it's a null pointer access in 
> the interrupt handler. We had a certain system that would hit this under 
> certain conditions and this patch fixed it.
> 
> 
> I can update the commit message with some more detail.

Thanks.

> 
> 
> > 
> > > Signed-off-by: Eddie James <eajames@linux.ibm.com>
> > > ---
> > >   drivers/i2c/busses/i2c-aspeed.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/i2c/busses/i2c-aspeed.c
> > > b/drivers/i2c/busses/i2c-aspeed.c
> > > index 1550d3d552aed..e344dcc2233fe 100644
> > > --- a/drivers/i2c/busses/i2c-aspeed.c
> > > +++ b/drivers/i2c/busses/i2c-aspeed.c
> > > @@ -731,6 +731,7 @@ static int aspeed_i2c_master_xfer(struct
> > > i2c_adapter *adap,
> > >                   * master command.
> > >                   */
> > >                  spin_lock_irqsave(&bus->lock, flags);
> > > +               bus->msgs = NULL;
> > It feels like there's buggy code elsewhere in the driver that this is
> > protecting (broken state machine)? I've had a look at the
> > aspeed_i2c_master_irq() implementation and can't see what though, as we
> > take an early-exit (before indexing into bus->msgs) if
> > bus->master_state is INACTIVE or PENDING.
> > Can you be a bit more specific about where the issue lies?
> 
> 
> I'm sure the state machine isn't perfect, yea.
> 

Right, so I think that's what should be fixed; the explicit states
define possible invariants in the implementation. We shouldn't need to
test `msgs` to know its value (whether its value is correct should be
defined by the current state).

> The bad access can happen 
> like this: if a transfer times out while waiting for an interrupt, the 
> aspeed_i2c_master_xfer function will either reset the engine or recover 
> the bus, and exit ETIMEDOUT. Resetting the engine will turn off 
> interrupts, so we're safe. But recovering the bus doesn't turn off 
> interrupts, so after the function exits ETIMEDOUT, I assume what happens 
> is we get the interrupt for the previous transfer and try and access the 
> messages pointer, which the i2c core has already freed.

So what immediately concerns me is there's no RECOVER state in `enum
aspeed_i2c_master_state` or the rest of the implementation. We do have
the PENDING state, which we don't have in hardware, so there's no
reason for RECOVER to be missing, especially since we have the RECOVER
state in hardware (I2CD14[22:19] = 0b0011).

What do you think of adding that, and testing for it in the interrupt
handler?

Andrew
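
[Editorial illustration, not part of the thread.] One possible reading of Andrew's suggestion in #3 is sketched below. The only facts taken from the thread are the field position and value, I2CD14[22:19] = 0b0011 for the hardware RECOVER state, and the existing early exit for INACTIVE and PENDING. All `SKETCH_*` and `sketch_*` names are hypothetical, the enum is not the driver's real `enum aspeed_i2c_master_state`, and treating RECOVER as another early-exit state is an assumption about what "testing for it in the interrupt handler" would mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 22:19 of I2CD14, per the thread. Macro names are hypothetical. */
#define SKETCH_I2CD14_STATE_SHIFT	19
#define SKETCH_I2CD14_STATE_MASK	0xfu
#define SKETCH_HW_STATE_RECOVER		0x3u	/* 0b0011 */

/* Extract the 4-bit hardware state field from a raw I2CD14 value. */
static inline uint32_t sketch_hw_state(uint32_t i2cd14)
{
	return (i2cd14 >> SKETCH_I2CD14_STATE_SHIFT) & SKETCH_I2CD14_STATE_MASK;
}

/* Cut-down software state list with the proposed RECOVER state added. */
enum sketch_master_state {
	SKETCH_MASTER_INACTIVE,
	SKETCH_MASTER_PENDING,
	SKETCH_MASTER_RECOVER,	/* proposed addition, not in the driver today */
	SKETCH_MASTER_TX,
};

/*
 * Decide whether the interrupt handler should return before touching the
 * message array. Assumption: RECOVER joins INACTIVE and PENDING here.
 */
static bool sketch_irq_exits_early(enum sketch_master_state state)
{
	switch (state) {
	case SKETCH_MASTER_INACTIVE:
	case SKETCH_MASTER_PENDING:
	case SKETCH_MASTER_RECOVER:
		return true;
	default:
		return false;
	}
}
```

With an explicit state like this, the validity of `msgs` would follow from the state alone, which is the invariant Andrew argues for, rather than from a NULL test on the pointer.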

diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index 1550d3d552aed..e344dcc2233fe 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -731,6 +731,7 @@  static int aspeed_i2c_master_xfer(struct i2c_adapter *adap,
 		 * master command.
 		 */
 		spin_lock_irqsave(&bus->lock, flags);
+		bus->msgs = NULL;
 		if (bus->master_state == ASPEED_I2C_MASTER_PENDING)
 			bus->master_state = ASPEED_I2C_MASTER_INACTIVE;
 		spin_unlock_irqrestore(&bus->lock, flags);
