diff mbox series

[v7,2/5] i2c: core: run atomic i2c xfer when !preemptible

Message ID 20230327-tegra-pmic-reboot-v7-2-18699d5dcd76@skidata.com
State New
Headers show
Series mfd: tps6586x: register restart handler | expand

Commit Message

Benjamin Bara July 15, 2023, 7:53 a.m. UTC
From: Benjamin Bara <benjamin.bara@skidata.com>

Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
disabled. However, non-atomic i2c transfers require preemption (e.g. in
wait_for_completion() while waiting for the DMA).

panic() calls preempt_disable_notrace() before calling
emergency_restart(). Therefore, if an i2c device is used for the
restart, the xfer should be atomic. This avoids warnings like:

[   12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
[   12.676926] Voluntary context switch within RCU read-side critical section!
...
[   12.742376]  schedule_timeout from wait_for_completion_timeout+0x90/0x114
[   12.749179]  wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
...
[   12.994527]  atomic_notifier_call_chain from machine_restart+0x34/0x58
[   13.001050]  machine_restart from panic+0x2a8/0x32c

Use !preemptible() instead, which is basically the same check as
pre-v5.2.

Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
Cc: stable@vger.kernel.org # v5.2+
Suggested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Wolfram Sang <wsa@kernel.org>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
---
 drivers/i2c/i2c-core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Morgan Nov. 13, 2023, 1:12 a.m. UTC | #1
On Sat, Jul 15, 2023 at 09:53:24AM +0200, Benjamin Bara wrote:
> From: Benjamin Bara <benjamin.bara@skidata.com>
> 
> Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
> disabled. However, non-atomic i2c transfers require preemption (e.g. in
> wait_for_completion() while waiting for the DMA).
> 
> panic() calls preempt_disable_notrace() before calling
> emergency_restart(). Therefore, if an i2c device is used for the
> restart, the xfer should be atomic. This avoids warnings like:
> 
> [   12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
> [   12.676926] Voluntary context switch within RCU read-side critical section!
> ...
> [   12.742376]  schedule_timeout from wait_for_completion_timeout+0x90/0x114
> [   12.749179]  wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
> ...
> [   12.994527]  atomic_notifier_call_chain from machine_restart+0x34/0x58
> [   13.001050]  machine_restart from panic+0x2a8/0x32c
> 
> Use !preemptible() instead, which is basically the same check as
> pre-v5.2.
> 
> Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
> Cc: stable@vger.kernel.org # v5.2+
> Suggested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Acked-by: Wolfram Sang <wsa@kernel.org>
> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Tested-by: Nishanth Menon <nm@ti.com>
> Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>

For kernel 6.7 I'm having an issue when I shutdown or reboot my
Rockchip RK3326 or Rockchip RK3566 based devices, and I've bisected
the issue down to this specific commit.

When I shutdown or restart the device, I receive messages in the kernel
log like the following:

[   37.121148] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
[   37.122178] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
[   37.123212] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
[   37.124226] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
[   37.125242] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
[   37.126133] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x1

The device will also occasionally freeze instead of rebooting or
shutting down. The i2c errors are consistent, but the freezing
behavior is not.

Thank you.

> ---
>  drivers/i2c/i2c-core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/i2c-core.h b/drivers/i2c/i2c-core.h
> index 1247e6e6e975..05b8b8dfa9bd 100644
> --- a/drivers/i2c/i2c-core.h
> +++ b/drivers/i2c/i2c-core.h
> @@ -29,7 +29,7 @@ int i2c_dev_irq_from_resources(const struct resource *resources,
>   */
>  static inline bool i2c_in_atomic_xfer_mode(void)
>  {
> -	return system_state > SYSTEM_RUNNING && irqs_disabled();
> +	return system_state > SYSTEM_RUNNING && !preemptible();
>  }
>  
>  static inline int __i2c_lock_bus_helper(struct i2c_adapter *adap)
> 
> -- 
> 2.34.1
>
Dmitry Osipenko Nov. 13, 2023, 3:46 a.m. UTC | #2
On 11/13/23 04:12, Chris Morgan wrote:
> On Sat, Jul 15, 2023 at 09:53:24AM +0200, Benjamin Bara wrote:
>> From: Benjamin Bara <benjamin.bara@skidata.com>
>>
>> Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
>> disabled. However, non-atomic i2c transfers require preemption (e.g. in
>> wait_for_completion() while waiting for the DMA).
>>
>> panic() calls preempt_disable_notrace() before calling
>> emergency_restart(). Therefore, if an i2c device is used for the
>> restart, the xfer should be atomic. This avoids warnings like:
>>
>> [   12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
>> [   12.676926] Voluntary context switch within RCU read-side critical section!
>> ...
>> [   12.742376]  schedule_timeout from wait_for_completion_timeout+0x90/0x114
>> [   12.749179]  wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
>> ...
>> [   12.994527]  atomic_notifier_call_chain from machine_restart+0x34/0x58
>> [   13.001050]  machine_restart from panic+0x2a8/0x32c
>>
>> Use !preemptible() instead, which is basically the same check as
>> pre-v5.2.
>>
>> Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
>> Cc: stable@vger.kernel.org # v5.2+
>> Suggested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>> Acked-by: Wolfram Sang <wsa@kernel.org>
>> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>> Tested-by: Nishanth Menon <nm@ti.com>
>> Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
> 
> For kernel 6.7 I'm having an issue when I shutdown or reboot my
> Rockchip RK3326 or Rockchip RK3566 based devices, and I've bisected
> the issue down to this specific commit.
> 
> When I shutdown or restart the device, I receive messages in the kernel
> log like the following:
> 
> [   37.121148] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> [   37.122178] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> [   37.123212] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> [   37.124226] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> [   37.125242] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> [   37.126133] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x1
> 
> The device will also occasionally freeze instead of rebooting or
> shutting down. The i2c errors are consistent, but the freezing
> behavior is not.

I couldn't reproduce your issue with v6.7-rc1 and RK3399 that also uses rk3x-i2c. Though, the rk3x-i2c driver looks suspicious. Please try this patch:

diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
index a044ca0c35a1..aad00e9909cc 100644
--- a/drivers/i2c/busses/i2c-rk3x.c
+++ b/drivers/i2c/busses/i2c-rk3x.c
@@ -219,6 +219,8 @@ struct rk3x_i2c {
 	enum rk3x_i2c_state state;
 	unsigned int processed;
 	int error;
+
+	int irq;
 };
 
 static inline void i2c_writel(struct rk3x_i2c *i2c, u32 value,
@@ -1090,8 +1092,10 @@ static int rk3x_i2c_xfer_common(struct i2c_adapter *adap,
 		rk3x_i2c_start(i2c);
 
 		if (!polling) {
+			enable_irq(i2c->irq);
 			timeout = wait_event_timeout(i2c->wait, !i2c->busy,
 						     msecs_to_jiffies(WAIT_TIMEOUT));
+			disable_irq(i2c->irq);
 		} else {
 			timeout = rk3x_i2c_wait_xfer_poll(i2c);
 		}
@@ -1236,7 +1240,6 @@ static int rk3x_i2c_probe(struct platform_device *pdev)
 	int ret = 0;
 	int bus_nr;
 	u32 value;
-	int irq;
 	unsigned long clk_rate;
 
 	i2c = devm_kzalloc(&pdev->dev, sizeof(struct rk3x_i2c), GFP_KERNEL);
@@ -1299,11 +1302,14 @@ static int rk3x_i2c_probe(struct platform_device *pdev)
 	}
 
 	/* IRQ setup */
-	irq = platform_get_irq(pdev, 0);
-	if (irq < 0)
-		return irq;
+	i2c->irq = platform_get_irq(pdev, 0);
+	if (i2c->irq < 0)
+		return i2c->irq;
+
+	/* interrupt will be enabled during of transfer time */
+	irq_set_status_flags(i2c->irq, IRQ_NOAUTOEN);
 
-	ret = devm_request_irq(&pdev->dev, irq, rk3x_i2c_irq,
+	ret = devm_request_irq(&pdev->dev, i2c->irq, rk3x_i2c_irq,
 			       0, dev_name(&pdev->dev), i2c);
 	if (ret < 0) {
 		dev_err(&pdev->dev, "cannot request IRQ\n");
Chris Morgan Nov. 13, 2023, 2:54 p.m. UTC | #3
On Mon, Nov 13, 2023 at 06:46:30AM +0300, Dmitry Osipenko wrote:
> On 11/13/23 04:12, Chris Morgan wrote:
> > On Sat, Jul 15, 2023 at 09:53:24AM +0200, Benjamin Bara wrote:
> >> From: Benjamin Bara <benjamin.bara@skidata.com>
> >>
> >> Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
> >> disabled. However, non-atomic i2c transfers require preemption (e.g. in
> >> wait_for_completion() while waiting for the DMA).
> >>
> >> panic() calls preempt_disable_notrace() before calling
> >> emergency_restart(). Therefore, if an i2c device is used for the
> >> restart, the xfer should be atomic. This avoids warnings like:
> >>
> >> [   12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
> >> [   12.676926] Voluntary context switch within RCU read-side critical section!
> >> ...
> >> [   12.742376]  schedule_timeout from wait_for_completion_timeout+0x90/0x114
> >> [   12.749179]  wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
> >> ...
> >> [   12.994527]  atomic_notifier_call_chain from machine_restart+0x34/0x58
> >> [   13.001050]  machine_restart from panic+0x2a8/0x32c
> >>
> >> Use !preemptible() instead, which is basically the same check as
> >> pre-v5.2.
> >>
> >> Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
> >> Cc: stable@vger.kernel.org # v5.2+
> >> Suggested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> >> Acked-by: Wolfram Sang <wsa@kernel.org>
> >> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> >> Tested-by: Nishanth Menon <nm@ti.com>
> >> Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
> > 
> > For kernel 6.7 I'm having an issue when I shutdown or reboot my
> > Rockchip RK3326 or Rockchip RK3566 based devices, and I've bisected
> > the issue down to this specific commit.
> > 
> > When I shutdown or restart the device, I receive messages in the kernel
> > log like the following:
> > 
> > [   37.121148] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> > [   37.122178] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> > [   37.123212] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> > [   37.124226] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> > [   37.125242] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x3
> > [   37.126133] rk3x-i2c fdd40000.i2c: irq in STATE_IDLE, ipd = 0x1
> > 
> > The device will also occasionally freeze instead of rebooting or
> > shutting down. The i2c errors are consistent, but the freezing
> > behavior is not.
> 
> I couldn't reproduce your issue with v6.7-rc1 and RK3399 that also uses rk3x-i2c. Though, the rk3x-i2c driver looks suspicious. Please try this patch:
> 
> diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
> index a044ca0c35a1..aad00e9909cc 100644
> --- a/drivers/i2c/busses/i2c-rk3x.c
> +++ b/drivers/i2c/busses/i2c-rk3x.c
> @@ -219,6 +219,8 @@ struct rk3x_i2c {
>  	enum rk3x_i2c_state state;
>  	unsigned int processed;
>  	int error;
> +
> +	int irq;
>  };
>  
>  static inline void i2c_writel(struct rk3x_i2c *i2c, u32 value,
> @@ -1090,8 +1092,10 @@ static int rk3x_i2c_xfer_common(struct i2c_adapter *adap,
>  		rk3x_i2c_start(i2c);
>  
>  		if (!polling) {
> +			enable_irq(i2c->irq);
>  			timeout = wait_event_timeout(i2c->wait, !i2c->busy,
>  						     msecs_to_jiffies(WAIT_TIMEOUT));
> +			disable_irq(i2c->irq);
>  		} else {
>  			timeout = rk3x_i2c_wait_xfer_poll(i2c);
>  		}
> @@ -1236,7 +1240,6 @@ static int rk3x_i2c_probe(struct platform_device *pdev)
>  	int ret = 0;
>  	int bus_nr;
>  	u32 value;
> -	int irq;
>  	unsigned long clk_rate;
>  
>  	i2c = devm_kzalloc(&pdev->dev, sizeof(struct rk3x_i2c), GFP_KERNEL);
> @@ -1299,11 +1302,14 @@ static int rk3x_i2c_probe(struct platform_device *pdev)
>  	}
>  
>  	/* IRQ setup */
> -	irq = platform_get_irq(pdev, 0);
> -	if (irq < 0)
> -		return irq;
> +	i2c->irq = platform_get_irq(pdev, 0);
> +	if (i2c->irq < 0)
> +		return i2c->irq;
> +
> +	/* interrupt will be enabled during of transfer time */
> +	irq_set_status_flags(i2c->irq, IRQ_NOAUTOEN);
>  
> -	ret = devm_request_irq(&pdev->dev, irq, rk3x_i2c_irq,
> +	ret = devm_request_irq(&pdev->dev, i2c->irq, rk3x_i2c_irq,
>  			       0, dev_name(&pdev->dev), i2c);
>  	if (ret < 0) {
>  		dev_err(&pdev->dev, "cannot request IRQ\n");
> 

I can confirm I no longer get any of the errors with this patch. Tested
on both an Anbernic RG353P (RK3566 with an RK817 PMIC) and an Odroid
Go Advance (RK3326 with an RK817 PMIC). The device appears to shut
down consistently again and I no longer see these messages in my dmesg
log when I shut down.

Thank you.

> 
> -- 
> Best regards,
> Dmitry
>
Chris Morgan Nov. 14, 2023, 6:43 p.m. UTC | #4
On Mon, Nov 13, 2023 at 04:48:26PM +0100, Benjamin Bara wrote:
> Hi!
> 
> Thanks for testing and the feedback!
> 
> On Mon, 13 Nov 2023 at 15:54, Chris Morgan <macroalpha82@gmail.com> wrote:
> > I can confirm I no longer get any of the errors with this patch. Tested
> > on both an Anbernic RG353P (RK3566 with an RK817 PMIC) and an Odroid
> > Go Advance (RK3326 with an RK817 PMIC). The device appears to shut
> > down consistently again and I no longer see these messages in my dmesg
> > log when I shut down.
> 
> Just to make sure: Are you compiling with CONFIG_PREEMPTION (and
> therefore CONFIG_PREEMPT_COUNT)?
> 
> If yes, could you please also test the following patch? Because I am not
> sure yet how polling can be false in a "polling required" situation,
> meaning .master_xfer() is called instead of .master_xfer_atomic() (while
> your test shows that irq_disabled() is true, which is basically done
> with !preemptible()). The patch should test the other way round: if the
> situation is found, force an atomic transfer instead.
> 
> Thank you!
> 
> diff --git a/drivers/i2c/busses/i2c-rk3x.c b/drivers/i2c/busses/i2c-rk3x.c
> index a044ca0c35a1..6e3e8433018f 100644
> --- a/drivers/i2c/busses/i2c-rk3x.c
> +++ b/drivers/i2c/busses/i2c-rk3x.c
> @@ -1131,6 +1131,10 @@ static int rk3x_i2c_xfer_common(struct i2c_adapter *adap,
>  static int rk3x_i2c_xfer(struct i2c_adapter *adap,
>                          struct i2c_msg *msgs, int num)
>  {
> +       if (irqs_disabled()) {
> +               WARN_ONCE(1, "Landed in non-atomic handler with disabled IRQs");
> +               return rk3x_i2c_xfer_common(adap, msgs, num, true);
> +       }
>         return rk3x_i2c_xfer_common(adap, msgs, num, false);
>  }
> 

I have CONFIG_PREEMPT_VOLUNTARY=y but CONFIG_PREEMPTION is not set.

Thank you.
Dmitry Osipenko Nov. 15, 2023, 6 a.m. UTC | #5
On 11/13/23 17:54, Chris Morgan wrote:
..
> I can confirm I no longer get any of the errors with this patch. Tested
> on both an Anbernic RG353P (RK3566 with an RK817 PMIC) and an Odroid
> Go Advance (RK3326 with an RK817 PMIC). The device appears to shut
> down consistently again and I no longer see these messages in my dmesg
> log when I shut down.

I'll prepare the proper patch, thanks.
Michael Walle Jan. 2, 2024, 3:03 p.m. UTC | #6
Hi,

> Since bae1d3a05a8b, i2c transfers are non-atomic if preemption is
> disabled. However, non-atomic i2c transfers require preemption (e.g. in
> wait_for_completion() while waiting for the DMA).
> 
> panic() calls preempt_disable_notrace() before calling
> emergency_restart(). Therefore, if an i2c device is used for the
> restart, the xfer should be atomic. This avoids warnings like:
> 
> [   12.667612] WARNING: CPU: 1 PID: 1 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x33c/0x6b0
> [   12.676926] Voluntary context switch within RCU read-side critical section!
> ...
> [   12.742376]  schedule_timeout from wait_for_completion_timeout+0x90/0x114
> [   12.749179]  wait_for_completion_timeout from tegra_i2c_wait_completion+0x40/0x70
> ...
> [   12.994527]  atomic_notifier_call_chain from machine_restart+0x34/0x58
> [   13.001050]  machine_restart from panic+0x2a8/0x32c
> 
> Use !preemptible() instead, which is basically the same check as
> pre-v5.2.
> 
> Fixes: bae1d3a05a8b ("i2c: core: remove use of in_atomic()")
> Cc: stable@vger.kernel.org # v5.2+
> Suggested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Acked-by: Wolfram Sang <wsa@kernel.org>
> Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> Tested-by: Nishanth Menon <nm@ti.com>
> Signed-off-by: Benjamin Bara <benjamin.bara@skidata.com>
> ---
>  drivers/i2c/i2c-core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/i2c/i2c-core.h b/drivers/i2c/i2c-core.h
> index 1247e6e6e975..05b8b8dfa9bd 100644
> --- a/drivers/i2c/i2c-core.h
> +++ b/drivers/i2c/i2c-core.h
> @@ -29,7 +29,7 @@ int i2c_dev_irq_from_resources(const struct resource *resources,
>   */
>  static inline bool i2c_in_atomic_xfer_mode(void)
>  {
> -	return system_state > SYSTEM_RUNNING && irqs_disabled();
> +	return system_state > SYSTEM_RUNNING && !preemptible();

With preemption disabled, this boils down to
  return system_state > SYSTEM_RUNNING (&& !0)

and will then generate a backtrace splash on each reboot on our
board:

# reboot -f
[   12.687169] No atomic I2C transfer handler for 'i2c-0'
[   12.692313] WARNING: CPU: 6 PID: 275 at drivers/i2c/i2c-core.h:40 i2c_smbus_xfer+0x100/0x118
[   12.700745] Modules linked in:
[   12.703788] CPU: 6 PID: 275 Comm: reboot Not tainted 6.7.0-rc6-next-20231222+ #2494
[   12.711431] Hardware name: Kontron 3.5"-SBC-i1200 (DT)
[   12.716555] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   12.723504] pc : i2c_smbus_xfer+0x100/0x118
[   12.727674] lr : i2c_smbus_xfer+0x100/0x118
[   12.731844] sp : ffff80008389b7c0
[   12.735146] x29: ffff80008389b7c0 x28: ffff0000c561b4c0 x27: ffff0000c2b06400
[   12.742268] x26: ffff0000c65aec4a x25: 000000000000000b x24: 0000000000000001
[   12.749389] x23: 0000000000000000 x22: 0000000000000064 x21: 0000000000000008
[   12.756510] x20: ffff80008389b836 x19: ffff0000c2dda080 x18: ffffffffffffffff
[   12.763631] x17: ffff800080813f48 x16: ffff80008081bd38 x15: 0730072d07630732
[   12.770752] x14: 0769072707200772 x13: 0730072d07630732 x12: 0769072707200772
[   12.777873] x11: 0720072007200720 x10: ffff8000827dc828 x9 : ffff80008012e154
[   12.784994] x8 : 00000000ffffefff x7 : ffff8000827dc828 x6 : 80000000fffff000
[   12.792116] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[   12.799236] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c561b4c0
[   12.806359] Call trace:
[   12.808793]  i2c_smbus_xfer+0x100/0x118
[   12.812616]  i2c_smbus_read_i2c_block_data+0x60/0xb0
[   12.817569]  mt6360_regmap_read+0xa4/0x1b8
[   12.821654]  _regmap_raw_read+0xfc/0x270
[   12.825566]  _regmap_bus_read+0x4c/0x90
[   12.829389]  _regmap_read+0x6c/0x168
[   12.832953]  _regmap_update_bits+0xf8/0x150
[   12.837124]  regmap_update_bits_base+0x6c/0xa8
[   12.841555]  regulator_disable_regmap+0x48/0x68
[   12.846073]  _regulator_do_disable+0x130/0x1e8
[   12.850504]  _regulator_disable+0x154/0x1d8
[   12.854676]  regulator_disable+0x44/0x90
[   12.858587]  mmc_regulator_set_ocr+0x8c/0x118
[   12.862933]  msdc_ops_set_ios+0x3f4/0x938
[   12.866932]  mmc_set_initial_state+0x90/0xa8
[   12.871191]  mmc_power_off+0x40/0x68
[   12.874754]  _mmc_sd_suspend+0x5c/0x190
[   12.878577]  mmc_sd_suspend+0x20/0x78
[   12.882227]  mmc_bus_shutdown+0x48/0x90
[   12.886051]  device_shutdown+0x134/0x298
[   12.889961]  kernel_restart+0x48/0xd0
[   12.893613]  __do_sys_reboot+0x1e0/0x278
[   12.897523]  __arm64_sys_reboot+0x2c/0x40
[   12.901519]  invoke_syscall+0x50/0x128
[   12.905257]  el0_svc_common.constprop.0+0x48/0xf0
[   12.909950]  do_el0_svc+0x24/0x38
[   12.913253]  el0_svc+0x38/0xd8
[   12.916296]  el0t_64_sync_handler+0x100/0x130
[   12.920639]  el0t_64_sync+0x1a4/0x1a8
[   12.924289] ---[ end trace 0000000000000000 ]---
...

I'm not sure if this is now the expected behavior or not. There will be
no backtraces, if I build a preemptible kernel, nor will there be
backtraces if I revert this patch.

OTOH, the driver I'm using (drivers/i2c/busses/i2c-mt65xx.c) has no
*_atomic(). So the warning is correct. There is also [1], which seems to
be the same issue I'm facing.

-michael

[1] https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
Benjamin Bara Jan. 2, 2024, 9:02 p.m. UTC | #7
Hi Michael,

On Tue, 2 Jan 2024 at 16:03, Michael Walle <mwalle@kernel.org> wrote:
> With preemption disabled, this boils down to
>   return system_state > SYSTEM_RUNNING (&& !0)
>
> and will then generate a backtrace splash on each reboot on our
> board:
>
> # reboot -f
> [   12.687169] No atomic I2C transfer handler for 'i2c-0'
> ...
> [   12.806359] Call trace:
> [   12.808793]  i2c_smbus_xfer+0x100/0x118
> ...
>
> I'm not sure if this is now the expected behavior or not. There will be
> no backtraces, if I build a preemptible kernel, nor will there be
> backtraces if I revert this patch.


thanks for the report.

In your case, the warning comes from shutting down a regulator during
device_shutdown(), so nothing really problematic here. However, later in
the "restart sequence", IRQs are disabled before the restart handlers
are called. If the reboot handlers would rely on irq-based
("non-atomic") i2c transfer, they might not work properly.

> OTOH, the driver I'm using (drivers/i2c/busses/i2c-mt65xx.c) has no
> *_atomic(). So the warning is correct. There is also [1], which seems to
> be the same issue I'm facing.
>
> -michael
>
> [1] https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/


I tried to implement an atomic handler for the mt65xx, but I don't have
the respective hardware available to test it. I decided to use a similar
approach as done in drivers/i2c/busses/i2c-rk3x.c, which calls the IRQ
handler in a while loop if an atomic xfer is requested. IMHO, this
should work with IRQs enabled and disabled, but I am not sure if this is
the best approach...

diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
index a8b5719c3372..3c18305e6059 100644
--- a/drivers/i2c/busses/i2c-mt65xx.c
+++ b/drivers/i2c/busses/i2c-mt65xx.c
@@ -16,6 +16,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
+#include <linux/jiffies.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/module.h>
@@ -307,6 +308,8 @@ struct mtk_i2c {
     bool ignore_restart_irq;
     struct mtk_i2c_ac_timing ac_timing;
     const struct mtk_i2c_compatible *dev_comp;
+    bool atomic_xfer;
+    bool xfer_complete;
 };

 /**
@@ -994,6 +997,20 @@ static void i2c_dump_register(struct mtk_i2c *i2c)
         readl(i2c->pdmabase + OFFSET_RX_4G_MODE));
 }

+static irqreturn_t mtk_i2c_irq(int irqno, void *dev_id);
+
+static int mtk_i2c_wait_xfer_atomic(struct mtk_i2c *i2c)
+{
+    unsigned long timeout = jiffies + i2c->adap.timeout;
+
+    do {
+        udelay(10);
+        mtk_i2c_irq(0, i2c);
+    } while (!i2c->xfer_complete && time_before(jiffies, timeout));
+
+    return i2c->xfer_complete;
+}
+
 static int mtk_i2c_do_transfer(struct mtk_i2c *i2c, struct i2c_msg *msgs,
                   int num, int left_num)
 {
@@ -1015,7 +1032,10 @@ static int mtk_i2c_do_transfer(struct mtk_i2c
*i2c, struct i2c_msg *msgs,
     if (i2c->auto_restart)
         restart_flag = I2C_RS_TRANSFER;

-    reinit_completion(&i2c->msg_complete);
+    if (i2c->atomic_xfer)
+        i2c->xfer_complete = false;
+    else
+        reinit_completion(&i2c->msg_complete);

     if (i2c->dev_comp->apdma_sync &&
        i2c->op != I2C_MASTER_WRRD && num > 1) {
@@ -1195,8 +1215,12 @@ static int mtk_i2c_do_transfer(struct mtk_i2c
*i2c, struct i2c_msg *msgs,
     }
     mtk_i2c_writew(i2c, start_reg, OFFSET_START);

-    ret = wait_for_completion_timeout(&i2c->msg_complete,
-                     i2c->adap.timeout);
+    if (i2c->atomic_xfer)
+        /* We can't rely on wait_for_completion* calls in atomic mode. */
+        ret = mtk_i2c_wait_xfer_atomic(i2c);
+    else
+        ret = wait_for_completion_timeout(&i2c->msg_complete,
+                         i2c->adap.timeout);

     /* Clear interrupt mask */
     mtk_i2c_writew(i2c, ~(restart_flag | I2C_HS_NACKERR | I2C_ACKERR |
@@ -1238,8 +1262,8 @@ static int mtk_i2c_do_transfer(struct mtk_i2c
*i2c, struct i2c_msg *msgs,
     return 0;
 }

-static int mtk_i2c_transfer(struct i2c_adapter *adap,
-               struct i2c_msg msgs[], int num)
+static int mtk_i2c_transfer_common(struct i2c_adapter *adap,
+                  struct i2c_msg msgs[], int num, bool atomic)
 {
     int ret;
     int left_num = num;
@@ -1249,6 +1273,7 @@ static int mtk_i2c_transfer(struct i2c_adapter *adap,
     if (ret)
         return ret;

+    i2c->atomic_xfer = atomic;
     i2c->auto_restart = i2c->dev_comp->auto_restart;

     /* checking if we can skip restart and optimize using WRRD mode */
@@ -1303,6 +1328,18 @@ static int mtk_i2c_transfer(struct i2c_adapter *adap,
     return ret;
 }

+static int mtk_i2c_transfer(struct i2c_adapter *adap,
+               struct i2c_msg msgs[], int num)
+{
+    return mtk_i2c_transfer_common(adap, msgs, num, false);
+}
+
+static int mtk_i2c_transfer_atomic(struct i2c_adapter *adap,
+                  struct i2c_msg msgs[], int num)
+{
+    return mtk_i2c_transfer_common(adap, msgs, num, true);
+}
+
 static irqreturn_t mtk_i2c_irq(int irqno, void *dev_id)
 {
     struct mtk_i2c *i2c = dev_id;
@@ -1328,8 +1365,12 @@ static irqreturn_t mtk_i2c_irq(int irqno, void *dev_id)
         mtk_i2c_writew(i2c, I2C_RS_MUL_CNFG | I2C_RS_MUL_TRIG |
                    I2C_TRANSAC_START, OFFSET_START);
     } else {
-        if (i2c->irq_stat & (I2C_TRANSAC_COMP | restart_flag))
-            complete(&i2c->msg_complete);
+        if (i2c->irq_stat & (I2C_TRANSAC_COMP | restart_flag)) {
+            if (i2c->atomic_xfer)
+                i2c->xfer_complete = true;
+            else
+                complete(&i2c->msg_complete);
+        }
     }

     return IRQ_HANDLED;
@@ -1346,6 +1387,7 @@ static u32 mtk_i2c_functionality(struct i2c_adapter *adap)

 static const struct i2c_algorithm mtk_i2c_algorithm = {
     .master_xfer = mtk_i2c_transfer,
+    .master_xfer_atomic = mtk_i2c_transfer_atomic,
     .functionality = mtk_i2c_functionality,
 };
Michael Walle Jan. 3, 2024, 9:20 a.m. UTC | #8
Hi Benjamin,

>> With preemption disabled, this boils down to
>>   return system_state > SYSTEM_RUNNING (&& !0)
>> 
>> and will then generate a backtrace splash on each reboot on our
>> board:
>> 
>> # reboot -f
>> [   12.687169] No atomic I2C transfer handler for 'i2c-0'
>> ...
>> [   12.806359] Call trace:
>> [   12.808793]  i2c_smbus_xfer+0x100/0x118
>> ...
>> 
>> I'm not sure if this is now the expected behavior or not. There will 
>> be
>> no backtraces, if I build a preemptible kernel, nor will there be
>> backtraces if I revert this patch.
> 
> 
> thanks for the report.
> 
> In your case, the warning comes from shutting down a regulator during
> device_shutdown(), so nothing really problematic here.

I tend to disagree. Yes it's not problematic. But from a users point of
view, you get a splash of *many* backtraces on every reboot. Btw, one
should really turn this into a WARN_ONCE(). But even in this case you
might scare users which will eventually lead to more bug reports.

> However, later in
> the "restart sequence", IRQs are disabled before the restart handlers
> are called. If the reboot handlers would rely on irq-based
> ("non-atomic") i2c transfer, they might not work properly.

I get this from a technical point of view and agree that the correct
fix is to add the atomic variant to the i2c driver, which begs the
question, if adding the atomic variant to the driver will be considered
as a Fixes patch.

Do I get it correct, that in my case the interrupts are still enabled?
Otherwise I'd have gotten this warning even before your patch, correct?
Excuse my ignorance, but when are the interrupts actually disabled
during shutdown?

>> OTOH, the driver I'm using (drivers/i2c/busses/i2c-mt65xx.c) has no
>> *_atomic(). So the warning is correct. There is also [1], which seems 
>> to
>> be the same issue I'm facing.
>> 
>> -michael
>> 
>> [1] 
>> https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
> 
> 
> I tried to implement an atomic handler for the mt65xx, but I don't have
> the respective hardware available to test it. I decided to use a 
> similar
> approach as done in drivers/i2c/busses/i2c-rk3x.c, which calls the IRQ
> handler in a while loop if an atomic xfer is requested. IMHO, this
> should work with IRQs enabled and disabled, but I am not sure if this 
> is
> the best approach...

Thanks for already looking into that. Do you want to submit it as an
actual patch? If so, you can add

Tested-by: Michael Walle <mwalle@kernel.org>

But again, it would be nice if we somehow can get rid of this huge 
splash
of backtraces on 6.7.x (I guess it's already too late 6.7).

-michael
Benjamin Bara Jan. 3, 2024, 12:49 p.m. UTC | #9
Hi Michael,

On Wed, 3 Jan 2024 at 10:20, Michael Walle <mwalle@kernel.org> wrote:
> >> With preemption disabled, this boils down to
> >>   return system_state > SYSTEM_RUNNING (&& !0)
> >>
> >> and will then generate a backtrace splash on each reboot on our
> >> board:
> >>
> >> # reboot -f
> >> [   12.687169] No atomic I2C transfer handler for 'i2c-0'
> >> ...
> >> [   12.806359] Call trace:
> >> [   12.808793]  i2c_smbus_xfer+0x100/0x118
> >> ...
> >>
> >> I'm not sure if this is now the expected behavior or not. There will
> >> be
> >> no backtraces, if I build a preemptible kernel, nor will there be
> >> backtraces if I revert this patch.
> >
> >
> > thanks for the report.
> >
> > In your case, the warning comes from shutting down a regulator during
> > device_shutdown(), so nothing really problematic here.
>
> I tend to disagree. Yes it's not problematic. But from a users point of
> view, you get a splash of *many* backtraces on every reboot. Btw, one
> should really turn this into a WARN_ONCE(). But even in this case you
> might scare users which will eventually lead to more bug reports.

Sure, but the correct "fix" would be to implement an atomic handler if
the i2c is used during this late stage. I just meant that the
device_shutdown() is less problematic than the actual reboot handler.
Your PMIC seems to not have a reboot handler (registered (yet)), and is
therefore not "affected".

> > However, later in
> > the "restart sequence", IRQs are disabled before the restart handlers
> > are called. If the reboot handlers would rely on irq-based
> > ("non-atomic") i2c transfer, they might not work properly.
>
> I get this from a technical point of view and agree that the correct
> fix is to add the atomic variant to the i2c driver, which begs the
> question, if adding the atomic variant to the driver will be considered
> as a Fixes patch.

I can add a Fixes when I post it. Although the initial patch just makes
the actual problem "noisier".

> Do I get it correct, that in my case the interrupts are still enabled?
> Otherwise I'd have gotten this warning even before your patch, correct?

Yes, device_shutdown() is called during
kernel_{shutdown,restart}_prepare(), before
machine_{power_off,restart}() is called. The interrupts should therefore
still be enabled in your case.

> Excuse my ignorance, but when are the interrupts actually disabled
> during shutdown?

This is usually one of the first things done in machine_restart(),
before the architecture-specific restart handlers are called (which
might use i2c). Same for machine_power_off().

> >> OTOH, the driver I'm using (drivers/i2c/busses/i2c-mt65xx.c) has no
> >> *_atomic(). So the warning is correct. There is also [1], which seems
> >> to
> >> be the same issue I'm facing.
> >>
> >> -michael
> >>
> >> [1]
> >> https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
> >
> >
> > I tried to implement an atomic handler for the mt65xx, but I don't have
> > the respective hardware available to test it. I decided to use a
> > similar
> > approach as done in drivers/i2c/busses/i2c-rk3x.c, which calls the IRQ
> > handler in a while loop if an atomic xfer is requested. IMHO, this
> > should work with IRQs enabled and disabled, but I am not sure if this
> > is
> > the best approach...
>
> Thanks for already looking into that. Do you want to submit it as an
> actual patch? If so, you can add
>
> Tested-by: Michael Walle <mwalle@kernel.org>

Yes, I can do that - thanks for the quick feedback.

> But again, it would be nice if we somehow can get rid of this huge
> splash
> of backtraces on 6.7.x (I guess it's already too late 6.7).

IMHO, converting the error to WARN_ONCE() makes sense to reduce the
noise, but helps having more reliable reboot handling via i2c. Do you
think this is a sufficient "short-term solution" to reduce the noise
before the missing atomic handlers are actually implemented?
Michael Walle Jan. 3, 2024, 3:07 p.m. UTC | #10
Hi Benjamin,

>> >> With preemption disabled, this boils down to
>> >>   return system_state > SYSTEM_RUNNING (&& !0)
>> >>
>> >> and will then generate a backtrace splash on each reboot on our
>> >> board:
>> >>
>> >> # reboot -f
>> >> [   12.687169] No atomic I2C transfer handler for 'i2c-0'
>> >> ...
>> >> [   12.806359] Call trace:
>> >> [   12.808793]  i2c_smbus_xfer+0x100/0x118
>> >> ...
>> >>
>> >> I'm not sure if this is now the expected behavior or not. There will
>> >> be
>> >> no backtraces, if I build a preemptible kernel, nor will there be
>> >> backtraces if I revert this patch.
>> >
>> >
>> > thanks for the report.
>> >
>> > In your case, the warning comes from shutting down a regulator during
>> > device_shutdown(), so nothing really problematic here.
>> 
>> I tend to disagree. Yes it's not problematic. But from a users point 
>> of
>> view, you get a splash of *many* backtraces on every reboot. Btw, one
>> should really turn this into a WARN_ONCE(). But even in this case you
>> might scare users which will eventually lead to more bug reports.
> 
> Sure, but the correct "fix" would be to implement an atomic handler if
> the i2c is used during this late stage. I just meant that the
> device_shutdown() is less problematic than the actual reboot handler.
> Your PMIC seems to not have a reboot handler (registered (yet)), and is
> therefore not "affected".
> 
>> > However, later in
>> > the "restart sequence", IRQs are disabled before the restart handlers
>> > are called. If the reboot handlers would rely on irq-based
>> > ("non-atomic") i2c transfer, they might not work properly.
>> 
>> I get this from a technical point of view and agree that the correct
>> fix is to add the atomic variant to the i2c driver, which begs the
>> question, if adding the atomic variant to the driver will be 
>> considered
>> as a Fixes patch.
> 
> I can add a Fixes when I post it. Although the initial patch just makes
> the actual problem "noisier".

As far as I understand, there was no problem (for me at least),
because the interrupts were still enabled at this time. But now,
there is the problem with getting these backtraces and with that
the user reports.

Don't get me wrong, I'm all for the correct fix here. But at the
same time I fear all the reports we'll be getting. And in the meantime
there was already a new one.

>> Do I get it correct, that in my case the interrupts are still enabled?
>> Otherwise I'd have gotten this warning even before your patch, 
>> correct?
> 
> Yes, device_shutdown() is called during
> kernel_{shutdown,restart}_prepare(), before
> machine_{power_off,restart}() is called. The interrupts should 
> therefore
> still be enabled in your case.
> 
>> Excuse my ignorance, but when are the interrupts actually disabled
>> during shutdown?
> 
> This is usually one of the first things done in machine_restart(),
> before the architecture-specific restart handlers are called (which
> might use i2c). Same for machine_power_off().

Thanks for explaining.

>> >> OTOH, the driver I'm using (drivers/i2c/busses/i2c-mt65xx.c) has no
>> >> *_atomic(). So the warning is correct. There is also [1], which seems
>> >> to
>> >> be the same issue I'm facing.
>> >>
>> >> -michael
>> >>
>> >> [1]
>> >> https://lore.kernel.org/linux-i2c/13271b9b-4132-46ef-abf8-2c311967bb46@mailbox.org/
>> >
>> >
>> > I tried to implement an atomic handler for the mt65xx, but I don't have
>> > the respective hardware available to test it. I decided to use a
>> > similar
>> > approach as done in drivers/i2c/busses/i2c-rk3x.c, which calls the IRQ
>> > handler in a while loop if an atomic xfer is requested. IMHO, this
>> > should work with IRQs enabled and disabled, but I am not sure if this
>> > is
>> > the best approach...
>> 
>> Thanks for already looking into that. Do you want to submit it as an
>> actual patch? If so, you can add
>> 
>> Tested-by: Michael Walle <mwalle@kernel.org>
> 
> Yes, I can do that - thanks for the quick feedback.
> 
>> But again, it would be nice if we somehow can get rid of this huge
>> splash
>> of backtraces on 6.7.x (I guess it's already too late 6.7).
> 
> IMHO, converting the error to WARN_ONCE() makes sense to reduce the
> noise, but helps having more reliable reboot handling via i2c. Do you
> think this is a sufficient "short-term solution" to reduce the noise
> before the missing atomic handlers are actually implemented?

Turning that WARN into a WARN_ONCE is one thing. But it is still odd
that don't I get a warning with preemption enabled. Is that because
preemptible() will still return 1 until interrupts are actually 
disabled?
Can we achieve something similar with kernels without preemption 
support?
IOW, just warn iff there is an actual error, that is if i2c_xfer()
is called with interrupt off?

-michael
diff mbox series

Patch

diff --git a/drivers/i2c/i2c-core.h b/drivers/i2c/i2c-core.h
index 1247e6e6e975..05b8b8dfa9bd 100644
--- a/drivers/i2c/i2c-core.h
+++ b/drivers/i2c/i2c-core.h
@@ -29,7 +29,7 @@  int i2c_dev_irq_from_resources(const struct resource *resources,
  */
 static inline bool i2c_in_atomic_xfer_mode(void)
 {
-	return system_state > SYSTEM_RUNNING && irqs_disabled();
+	return system_state > SYSTEM_RUNNING && !preemptible();
 }
 
 static inline int __i2c_lock_bus_helper(struct i2c_adapter *adap)