MMC card detection may trip watchdog if card powered off

Message ID	BN0PR08MB695133000AF116F04C3A9FFE83212@BN0PR08MB6951.namprd08.prod.outlook.com
State	New
Headers	show Received: from NAM02-DM3-obe.outbound.protection.outlook.com (mail-dm3nam02on2042.outbound.protection.outlook.com [40.107.95.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1445113AA26 for <linux-mmc@vger.kernel.org>; Wed, 20 Nov 2024 16:21:13 +0000 (UTC) From: "Anthony Pighin (Nokia)" <anthony.pighin@nokia.com> To: "linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org> Subject: MMC card detection may trip watchdog if card powered off Thread-Topic: MMC card detection may trip watchdog if card powered off Thread-Index: Ads7Zw4syJtxBkzjQFCdin/8++ZdRQ== Date: Wed, 20 Nov 2024 16:21:10 +0000 Message-ID: <BN0PR08MB695133000AF116F04C3A9FFE83212@BN0PR08MB6951.namprd08.prod.outlook.com> Accept-Language: en-US Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Precedence: bulk MIME-Version: 1.0
Series	MMC card detection may trip watchdog if card powered off \| expand MMC card detection may trip watchdog if card powered off

Anthony Pighin (Nokia) Nov. 20, 2024, 4:21 p.m. UTC

If card detection is done via polling, due to broken-cd (Freescale LX2160, etc.), or for other reasons, then the card will be polled asynchronously and periodically.

If that polling happens after the card has been put in powered off state (i.e. during system shutdown/reboot), then the polling times out. That timeout is of a long duration (10s). And it is repeated multiple times (x3). And that is all done after the watchdogd has been disabled, meaning that system watchdogs are not being kicked.

If the MMC polling exceeds the watchdog trip time, then the system will be ungraciously reset. Or in the case of a pretimeout capable watchdog, the pretimeout will trip unnecessarily.

    [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
    [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
    [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
    [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
    [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
    [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
    [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
    ...
    [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog: Watchdog interrupt received!
    [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
    [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.6.59 #1
    [   71.053207] Hardware name: [Redacted]
    [   71.059897] Call trace:
    [   71.062332]  dump_backtrace+0x9c/0x128
    ...

CMD6 is SWITCH_FUNC and arg 03220301 is POWER_OFF_NOTIFICATION (bits 16:23 = 0x22 = 34).
CMD13 is SEND_STATUS, and when it occurs after the POWER_OFF_NOTIFICATION (as above) bad things happen.

I have made the following change to attempt to work around the issue, and it seems to hold up, but is also quite brute force:

Anthony

Ulf Hansson Nov. 21, 2024, 10:58 a.m. UTC | #1

On Wed, 20 Nov 2024 at 17:21, Anthony Pighin (Nokia)
<anthony.pighin@nokia.com> wrote:
>
> If card detection is done via polling, due to broken-cd (Freescale LX2160, etc.), or for other reasons, then the card will be polled asynchronously and periodically.
>
> If that polling happens after the card has been put in powered off state (i.e. during system shutdown/reboot), then the polling times out. That timeout is of a long duration (10s). And it is repeated multiple times (x3). And that is all done after the watchdogd has been disabled, meaning that system watchdogs are not being kicked.
>
> If the MMC polling exceeds the watchdog trip time, then the system will be ungraciously reset. Or in the case of a pretimeout capable watchdog, the pretimeout will trip unnecessarily.
>
>     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
>     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
>     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
>     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
>     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
>     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
>     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.

Hmm. How is the polling being done? By using MMC_CAP_NEEDS_POLL?

The above certainly looks weird. The mmc_rescan work should not be
allowed to run at this point, as the work is getting punted to the
system_freezable_wq via mmc_schedule_delayed_work().

>     ...
>     [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog: Watchdog interrupt received!
>     [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
>     [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.6.59 #1
>     [   71.053207] Hardware name: [Redacted]
>     [   71.059897] Call trace:
>     [   71.062332]  dump_backtrace+0x9c/0x128
>     ...
>
> CMD6 is SWITCH_FUNC and arg 03220301 is POWER_OFF_NOTIFICATION (bits 16:23 = 0x22 = 34).
> CMD13 is SEND_STATUS, and when it occurs after the POWER_OFF_NOTIFICATION (as above) bad things happen.
>
> I have made the following change to attempt to work around the issue, and it seems to hold up, but is also quite brute force:
>
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -2046,6 +2046,11 @@ static void mmc_remove(struct mmc_host *host)
>   */
>  static int mmc_alive(struct mmc_host *host)
>  {
> +       if (host->card && mmc_card_suspended(host->card)) {
> +               pr_err("%s: Skip card detection: Card suspended\n",
> +                      mmc_hostname(host));
> +               return -ENOMEDIUM;
> +       }
>         return mmc_send_status(host->card, NULL);
>  }

Yeah, the above isn't really the correct solution in my opinion.

We need to prevent the mmc_rescan work from running, which I thought
we already did...

>
> Anthony
>
>

Kind regards
Uffe

Anthony Pighin (Nokia) Nov. 21, 2024, 1:23 p.m. UTC | #2

> >
> > If card detection is done via polling, due to broken-cd (Freescale LX2160,
> etc.), or for other reasons, then the card will be polled asynchronously and
> periodically.
> >
> > If that polling happens after the card has been put in powered off state (i.e.
> during system shutdown/reboot), then the polling times out. That timeout is
> of a long duration (10s). And it is repeated multiple times (x3). And that is all
> done after the watchdogd has been disabled, meaning that system watchdogs
> are not being kicked.
> >
> > If the MMC polling exceeds the watchdog trip time, then the system will be
> ungraciously reset. Or in the case of a pretimeout capable watchdog, the
> pretimeout will trip unnecessarily.
> >
> >     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> 03220301 flags 0000049d
> >     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
> >     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
> >     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0:
> 00000800 00000000 00000000 00000000
> >     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> powermode 0 cs 0 Vdd 0 width 1 timing 0
> >     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> 00010000 flags 00000195
> >     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
> 
> Hmm. How is the polling being done? By using MMC_CAP_NEEDS_POLL?
> 

Correct. (At least in my understanding.) 'broken-cd' in the device tree will trigger
MMC_CAP_NEEDS_POLL to be set.

> The above certainly looks weird. The mmc_rescan work should not be
> allowed to run at this point, as the work is getting punted to the
> system_freezable_wq via mmc_schedule_delayed_work().
> 

This is the backtrace I get when the timeout occurs:

[   46.154348] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
[   46.161917] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
[   46.167743] mmc_request_done:187: mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
[   46.176526] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
[   46.184016] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
[   46.189840] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
[   46.195665] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
[   46.204362] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[   46.219307] CPU: 6 PID: 153 Comm: kworker/6:1 Tainted: G           O       6.6.59 #1
[   46.231988] Hardware name: [Redacted]
[   46.238678] Workqueue: events_freezable mmc_rescan
[   46.243466] Call trace:
[   46.245901]  dump_backtrace+0x9c/0x128
[   46.249643]  show_stack+0x20/0x38
[   46.252948]  dump_stack_lvl+0x48/0x60
[   46.256603]  dump_stack+0x18/0x28
[   46.259909]  mmc_alive+0x74/0x88
[   46.263128]  _mmc_detect_card_removed+0x3c/0x158
[   46.267735]  mmc_detect+0x30/0x98
[   46.271040]  mmc_rescan+0x94/0x238
[   46.274432]  process_one_work+0x178/0x3d8
[   46.278432]  worker_thread+0x2bc/0x3e0
[   46.282171]  kthread+0xf8/0x110
[   46.285303]  ret_from_fork+0x10/0x20
[   46.288878] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
[   56.793379] mmc1: Timeout waiting for hardware cmd interrupt.
[   56.799116] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[   56.805545] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
...

> >     ...
> >     [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog:
> Watchdog interrupt received!
> >     [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
> >     [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O
> 6.6.59 #1
> >     [   71.053207] Hardware name: [Redacted]
> >     [   71.059897] Call trace:
> >     [   71.062332]  dump_backtrace+0x9c/0x128
> >     ...
> >
> > CMD6 is SWITCH_FUNC and arg 03220301 is POWER_OFF_NOTIFICATION
> (bits 16:23 = 0x22 = 34).
> > CMD13 is SEND_STATUS, and when it occurs after the
> POWER_OFF_NOTIFICATION (as above) bad things happen.
> >
> > I have made the following change to attempt to work around the issue, and
> it seems to hold up, but is also quite brute force:
> >
> > --- a/drivers/mmc/core/mmc.c
> > +++ b/drivers/mmc/core/mmc.c
> > @@ -2046,6 +2046,11 @@ static void mmc_remove(struct mmc_host *host)
> >   */
> >  static int mmc_alive(struct mmc_host *host)  {
> > +       if (host->card && mmc_card_suspended(host->card)) {
> > +               pr_err("%s: Skip card detection: Card suspended\n",
> > +                      mmc_hostname(host));
> > +               return -ENOMEDIUM;
> > +       }
> >         return mmc_send_status(host->card, NULL);  }
> 
> Yeah, the above isn't really the correct solution in my opinion.
> 
> We need to prevent the mmc_rescan work from running, which I thought we
> already did...
> 
> >
> > Anthony
> >
> >
> 
> Kind regards
> Uffe

Ulf Hansson Nov. 22, 2024, 3:58 p.m. UTC | #3

On Thu, 21 Nov 2024 at 14:23, Anthony Pighin (Nokia)
<anthony.pighin@nokia.com> wrote:
>
> > >
> > > If card detection is done via polling, due to broken-cd (Freescale LX2160,
> > etc.), or for other reasons, then the card will be polled asynchronously and
> > periodically.
> > >
> > > If that polling happens after the card has been put in powered off state (i.e.
> > during system shutdown/reboot), then the polling times out. That timeout is
> > of a long duration (10s). And it is repeated multiple times (x3). And that is all
> > done after the watchdogd has been disabled, meaning that system watchdogs
> > are not being kicked.
> > >
> > > If the MMC polling exceeds the watchdog trip time, then the system will be
> > ungraciously reset. Or in the case of a pretimeout capable watchdog, the
> > pretimeout will trip unnecessarily.
> > >
> > >     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > 03220301 flags 0000049d
> > >     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
> > >     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
> > >     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0:
> > 00000800 00000000 00000000 00000000
> > >     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> > powermode 0 cs 0 Vdd 0 width 1 timing 0
> > >     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > 00010000 flags 00000195
> > >     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
> >
> > Hmm. How is the polling being done? By using MMC_CAP_NEEDS_POLL?
> >
>
> Correct. (At least in my understanding.) 'broken-cd' in the device tree will trigger
> MMC_CAP_NEEDS_POLL to be set.
>
> > The above certainly looks weird. The mmc_rescan work should not be
> > allowed to run at this point, as the work is getting punted to the
> > system_freezable_wq via mmc_schedule_delayed_work().
> >
>
> This is the backtrace I get when the timeout occurs:
>
> [   46.154348] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
> [   46.161917] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> [   46.167743] mmc_request_done:187: mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [   46.176526] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
> [   46.184016] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> [   46.189840] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> [   46.195665] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> [   46.204362] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [   46.219307] CPU: 6 PID: 153 Comm: kworker/6:1 Tainted: G           O       6.6.59 #1
> [   46.231988] Hardware name: [Redacted]
> [   46.238678] Workqueue: events_freezable mmc_rescan
> [   46.243466] Call trace:
> [   46.245901]  dump_backtrace+0x9c/0x128
> [   46.249643]  show_stack+0x20/0x38
> [   46.252948]  dump_stack_lvl+0x48/0x60
> [   46.256603]  dump_stack+0x18/0x28
> [   46.259909]  mmc_alive+0x74/0x88
> [   46.263128]  _mmc_detect_card_removed+0x3c/0x158
> [   46.267735]  mmc_detect+0x30/0x98
> [   46.271040]  mmc_rescan+0x94/0x238
> [   46.274432]  process_one_work+0x178/0x3d8
> [   46.278432]  worker_thread+0x2bc/0x3e0
> [   46.282171]  kthread+0xf8/0x110
> [   46.285303]  ret_from_fork+0x10/0x20
> [   46.288878] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
> [   56.793379] mmc1: Timeout waiting for hardware cmd interrupt.
> [   56.799116] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [   56.805545] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
> ...
>

Okay. If this is system suspend, it looks like the workqueue didn't
become frozen as exepcted. I don't know why, but this needs to be
investigated.

If this is a shutdown, mmc_host_classdev_shutdown() should be called
to "disable" the mmc_rescan work from running the code causing the
above.

What am I missing?

Kind regards
Uffe

> > >     ...
> > >     [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog:
> > Watchdog interrupt received!
> > >     [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
> > >     [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O
> > 6.6.59 #1
> > >     [   71.053207] Hardware name: [Redacted]
> > >     [   71.059897] Call trace:
> > >     [   71.062332]  dump_backtrace+0x9c/0x128
> > >     ...
> > >
> > > CMD6 is SWITCH_FUNC and arg 03220301 is POWER_OFF_NOTIFICATION
> > (bits 16:23 = 0x22 = 34).
> > > CMD13 is SEND_STATUS, and when it occurs after the
> > POWER_OFF_NOTIFICATION (as above) bad things happen.
> > >
> > > I have made the following change to attempt to work around the issue, and
> > it seems to hold up, but is also quite brute force:
> > >
> > > --- a/drivers/mmc/core/mmc.c
> > > +++ b/drivers/mmc/core/mmc.c
> > > @@ -2046,6 +2046,11 @@ static void mmc_remove(struct mmc_host *host)
> > >   */
> > >  static int mmc_alive(struct mmc_host *host)  {
> > > +       if (host->card && mmc_card_suspended(host->card)) {
> > > +               pr_err("%s: Skip card detection: Card suspended\n",
> > > +                      mmc_hostname(host));
> > > +               return -ENOMEDIUM;
> > > +       }
> > >         return mmc_send_status(host->card, NULL);  }
> >
> > Yeah, the above isn't really the correct solution in my opinion.
> >
> > We need to prevent the mmc_rescan work from running, which I thought we
> > already did...
> >
> > >
> > > Anthony
> > >
> > >
> >
> > Kind regards
> > Uffe

Anthony Pighin (Nokia) Nov. 22, 2024, 5:47 p.m. UTC | #4

> > > >
> > > > If card detection is done via polling, due to broken-cd (Freescale
> > > > LX2160,
> > > etc.), or for other reasons, then the card will be polled
> > > asynchronously and periodically.
> > > >
> > > > If that polling happens after the card has been put in powered off state
> (i.e.
> > > during system shutdown/reboot), then the polling times out. That
> > > timeout is of a long duration (10s). And it is repeated multiple
> > > times (x3). And that is all done after the watchdogd has been
> > > disabled, meaning that system watchdogs are not being kicked.
> > > >
> > > > If the MMC polling exceeds the watchdog trip time, then the system
> > > > will be
> > > ungraciously reset. Or in the case of a pretimeout capable watchdog,
> > > the pretimeout will trip unnecessarily.
> > > >
> > > >     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > > 03220301 flags 0000049d
> > > >     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
> > > >     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
> > > >     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0:
> > > 00000800 00000000 00000000 00000000
> > > >     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> > > powermode 0 cs 0 Vdd 0 width 1 timing 0
> > > >     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > > 00010000 flags 00000195
> > > >     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
> > >
> > > Hmm. How is the polling being done? By using MMC_CAP_NEEDS_POLL?
> > >
> >
> > Correct. (At least in my understanding.) 'broken-cd' in the device
> > tree will trigger MMC_CAP_NEEDS_POLL to be set.
> >
> > > The above certainly looks weird. The mmc_rescan work should not be
> > > allowed to run at this point, as the work is getting punted to the
> > > system_freezable_wq via mmc_schedule_delayed_work().
> > >
> >
> > This is the backtrace I get when the timeout occurs:
> >
> > [   46.154348] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> 00010000 flags 00000195
> > [   46.161917] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > [   46.167743] mmc_request_done:187: mmc1: req done (CMD13): 0:
> 00000900 00000000 00000000 00000000
> > [   46.176526] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> 03220301 flags 0000049d
> > [   46.184016] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > [   46.189840] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> > [   46.195665] mmc_request_done:187: mmc1: req done (CMD6): 0:
> 00000800 00000000 00000000 00000000
> > [   46.204362] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0
> cs 0 Vdd 0 width 1 timing 0
> > [   46.219307] CPU: 6 PID: 153 Comm: kworker/6:1 Tainted: G           O
> 6.6.59 #1
> > [   46.231988] Hardware name: [Redacted]
> > [   46.238678] Workqueue: events_freezable mmc_rescan
> > [   46.243466] Call trace:
> > [   46.245901]  dump_backtrace+0x9c/0x128
> > [   46.249643]  show_stack+0x20/0x38
> > [   46.252948]  dump_stack_lvl+0x48/0x60
> > [   46.256603]  dump_stack+0x18/0x28
> > [   46.259909]  mmc_alive+0x74/0x88
> > [   46.263128]  _mmc_detect_card_removed+0x3c/0x158
> > [   46.267735]  mmc_detect+0x30/0x98
> > [   46.271040]  mmc_rescan+0x94/0x238
> > [   46.274432]  process_one_work+0x178/0x3d8
> > [   46.278432]  worker_thread+0x2bc/0x3e0
> > [   46.282171]  kthread+0xf8/0x110
> > [   46.285303]  ret_from_fork+0x10/0x20
> > [   46.288878] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> 00010000 flags 00000195
> > [   56.793379] mmc1: Timeout waiting for hardware cmd interrupt.
> > [   56.799116] mmc1: sdhci: ============ SDHCI REGISTER DUMP
> ===========
> > [   56.805545] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
> > ...
> >
> 
> Okay. If this is system suspend, it looks like the workqueue didn't become
> frozen as exepcted. I don't know why, but this needs to be investigated.
> 
> If this is a shutdown, mmc_host_classdev_shutdown() should be called to
> "disable" the mmc_rescan work from running the code causing the above.
> 
> What am I missing?
> 
> Kind regards
> Uffe
> 

This occurs in response to a 'reboot' command.

Here is an instance of the failure, with initcall_debug set to 'Y':

Rebooting... 
[   44.500858] caam_jr 8030000.jr: shutdown
[   44.504805] caam_jr 8020000.jr: shutdown
[   44.508748] caam_jr 8010000.jr: shutdown
[   44.515349] tpm tpm0: shutdown_pre
[   44.523165] igb 0000:01:00.0: shutdown
[   44.558695] usb usb2-port1: shutdown
[   44.562268] usb usb1-port1: shutdown
[   44.565864] xhci-hcd xhci-hcd.0.auto: shutdown
[   44.570306] dwc3 3100000.usb: shutdown
[   44.574052] [Redacted] 2030000.i2c:cpld@41:watchdog: shutdown
[   44.580137] tpm_tis_spi spi2.0: shutdown
[   44.584054] spi-nor spi0.1: shutdown
[   44.588066] spi-nor spi0.0: shutdown
[   44.592112] mmcblk mmc0:0001: shutdown
[   44.595902] mmc_mrq_pr_debug:274: mmc0: starting CMD7 arg 00000000 flags 00000000
[   44.603413] sdhci_irq:3546: mmc0: sdhci: IRQ status 0x00000001
[   44.609240] mmc_request_done:187: mmc0: req done (CMD7): 0: 00000000 00000000 00000000 00000000
[   44.617939] mmc_set_ios:892: mmc0: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[   44.627877] mmcblk mmc1:0001: shutdown
[   44.631709] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03200101 flags 00000095
[   44.639200] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
[   44.645027] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
[   44.653727] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
[   44.661296] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
[   44.667123] mmc_request_done:187: mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
[   44.675907] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
[   44.683399] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
[   44.689223] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
[   44.695047] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
[   44.703746] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
[   44.713667] aer 0000:00:00.0:pcie002: shutdown
[   44.713673] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
[   44.718102] pcie_pme 0000:00:00.0:pcie001: shutdown
[   44.730534] pcieport 0000:00:00.0: shutdown
[   44.734898] fsl_mc_dprc dprc.1: shutdown
[   44.740863] fsl_mc_dprc dprc.1: DPRC device shutdown
[   44.745830] mmc_host mmc1: shutdown_pre
[   54.873286] mmc1: Timeout waiting for hardware cmd interrupt.
[   54.879023] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[   54.885453] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202

The trigger CMD13 is in-flight at time 44.713673.
The mmc_host mmc1: shutdown_pre is at time 44.745830.

> > > >     ...
> > > >     [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog:
> > > Watchdog interrupt received!
> > > >     [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
> > > >     [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O
> > > 6.6.59 #1
> > > >     [   71.053207] Hardware name: [Redacted]
> > > >     [   71.059897] Call trace:
> > > >     [   71.062332]  dump_backtrace+0x9c/0x128
> > > >     ...
> > > >
> > > > CMD6 is SWITCH_FUNC and arg 03220301 is
> POWER_OFF_NOTIFICATION
> > > (bits 16:23 = 0x22 = 34).
> > > > CMD13 is SEND_STATUS, and when it occurs after the
> > > POWER_OFF_NOTIFICATION (as above) bad things happen.
> > > >
> > > > I have made the following change to attempt to work around the
> > > > issue, and
> > > it seems to hold up, but is also quite brute force:
> > > >
> > > > --- a/drivers/mmc/core/mmc.c
> > > > +++ b/drivers/mmc/core/mmc.c
> > > > @@ -2046,6 +2046,11 @@ static void mmc_remove(struct mmc_host
> *host)
> > > >   */
> > > >  static int mmc_alive(struct mmc_host *host)  {
> > > > +       if (host->card && mmc_card_suspended(host->card)) {
> > > > +               pr_err("%s: Skip card detection: Card suspended\n",
> > > > +                      mmc_hostname(host));
> > > > +               return -ENOMEDIUM;
> > > > +       }
> > > >         return mmc_send_status(host->card, NULL);  }
> > >
> > > Yeah, the above isn't really the correct solution in my opinion.
> > >
> > > We need to prevent the mmc_rescan work from running, which I thought
> > > we already did...
> > >
> > > >
> > > > Anthony
> > > >
> > > >
> > >
> > > Kind regards
> > > Uffe

Ulf Hansson Nov. 25, 2024, 12:29 p.m. UTC | #5

On Fri, 22 Nov 2024 at 18:47, Anthony Pighin (Nokia)
<anthony.pighin@nokia.com> wrote:
>
> > > > >
> > > > > If card detection is done via polling, due to broken-cd (Freescale
> > > > > LX2160,
> > > > etc.), or for other reasons, then the card will be polled
> > > > asynchronously and periodically.
> > > > >
> > > > > If that polling happens after the card has been put in powered off state
> > (i.e.
> > > > during system shutdown/reboot), then the polling times out. That
> > > > timeout is of a long duration (10s). And it is repeated multiple
> > > > times (x3). And that is all done after the watchdogd has been
> > > > disabled, meaning that system watchdogs are not being kicked.
> > > > >
> > > > > If the MMC polling exceeds the watchdog trip time, then the system
> > > > > will be
> > > > ungraciously reset. Or in the case of a pretimeout capable watchdog,
> > > > the pretimeout will trip unnecessarily.
> > > > >
> > > > >     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > > > 03220301 flags 0000049d
> > > > >     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
> > > > >     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
> > > > >     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0:
> > > > 00000800 00000000 00000000 00000000
> > > > >     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> > > > powermode 0 cs 0 Vdd 0 width 1 timing 0
> > > > >     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > > > 00010000 flags 00000195
> > > > >     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
> > > >
> > > > Hmm. How is the polling being done? By using MMC_CAP_NEEDS_POLL?
> > > >
> > >
> > > Correct. (At least in my understanding.) 'broken-cd' in the device
> > > tree will trigger MMC_CAP_NEEDS_POLL to be set.
> > >
> > > > The above certainly looks weird. The mmc_rescan work should not be
> > > > allowed to run at this point, as the work is getting punted to the
> > > > system_freezable_wq via mmc_schedule_delayed_work().
> > > >
> > >
> > > This is the backtrace I get when the timeout occurs:
> > >
> > > [   46.154348] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > 00010000 flags 00000195
> > > [   46.161917] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > > [   46.167743] mmc_request_done:187: mmc1: req done (CMD13): 0:
> > 00000900 00000000 00000000 00000000
> > > [   46.176526] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > 03220301 flags 0000049d
> > > [   46.184016] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > > [   46.189840] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> > > [   46.195665] mmc_request_done:187: mmc1: req done (CMD6): 0:
> > 00000800 00000000 00000000 00000000
> > > [   46.204362] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0
> > cs 0 Vdd 0 width 1 timing 0
> > > [   46.219307] CPU: 6 PID: 153 Comm: kworker/6:1 Tainted: G           O
> > 6.6.59 #1
> > > [   46.231988] Hardware name: [Redacted]
> > > [   46.238678] Workqueue: events_freezable mmc_rescan
> > > [   46.243466] Call trace:
> > > [   46.245901]  dump_backtrace+0x9c/0x128
> > > [   46.249643]  show_stack+0x20/0x38
> > > [   46.252948]  dump_stack_lvl+0x48/0x60
> > > [   46.256603]  dump_stack+0x18/0x28
> > > [   46.259909]  mmc_alive+0x74/0x88
> > > [   46.263128]  _mmc_detect_card_removed+0x3c/0x158
> > > [   46.267735]  mmc_detect+0x30/0x98
> > > [   46.271040]  mmc_rescan+0x94/0x238
> > > [   46.274432]  process_one_work+0x178/0x3d8
> > > [   46.278432]  worker_thread+0x2bc/0x3e0
> > > [   46.282171]  kthread+0xf8/0x110
> > > [   46.285303]  ret_from_fork+0x10/0x20
> > > [   46.288878] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > 00010000 flags 00000195
> > > [   56.793379] mmc1: Timeout waiting for hardware cmd interrupt.
> > > [   56.799116] mmc1: sdhci: ============ SDHCI REGISTER DUMP
> > ===========
> > > [   56.805545] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
> > > ...
> > >
> >
> > Okay. If this is system suspend, it looks like the workqueue didn't become
> > frozen as exepcted. I don't know why, but this needs to be investigated.
> >
> > If this is a shutdown, mmc_host_classdev_shutdown() should be called to
> > "disable" the mmc_rescan work from running the code causing the above.
> >
> > What am I missing?
> >
> > Kind regards
> > Uffe
> >
>
> This occurs in response to a 'reboot' command.
>
> Here is an instance of the failure, with initcall_debug set to 'Y':
>
> Rebooting...
> [   44.500858] caam_jr 8030000.jr: shutdown
> [   44.504805] caam_jr 8020000.jr: shutdown
> [   44.508748] caam_jr 8010000.jr: shutdown
> [   44.515349] tpm tpm0: shutdown_pre
> [   44.523165] igb 0000:01:00.0: shutdown
> [   44.558695] usb usb2-port1: shutdown
> [   44.562268] usb usb1-port1: shutdown
> [   44.565864] xhci-hcd xhci-hcd.0.auto: shutdown
> [   44.570306] dwc3 3100000.usb: shutdown
> [   44.574052] [Redacted] 2030000.i2c:cpld@41:watchdog: shutdown
> [   44.580137] tpm_tis_spi spi2.0: shutdown
> [   44.584054] spi-nor spi0.1: shutdown
> [   44.588066] spi-nor spi0.0: shutdown
> [   44.592112] mmcblk mmc0:0001: shutdown
> [   44.595902] mmc_mrq_pr_debug:274: mmc0: starting CMD7 arg 00000000 flags 00000000
> [   44.603413] sdhci_irq:3546: mmc0: sdhci: IRQ status 0x00000001
> [   44.609240] mmc_request_done:187: mmc0: req done (CMD7): 0: 00000000 00000000 00000000 00000000
> [   44.617939] mmc_set_ios:892: mmc0: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [   44.627877] mmcblk mmc1:0001: shutdown
> [   44.631709] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03200101 flags 00000095
> [   44.639200] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> [   44.645027] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> [   44.653727] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
> [   44.661296] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> [   44.667123] mmc_request_done:187: mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [   44.675907] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
> [   44.683399] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> [   44.689223] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> [   44.695047] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
> [   44.703746] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
> [   44.713667] aer 0000:00:00.0:pcie002: shutdown
> [   44.713673] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
> [   44.718102] pcie_pme 0000:00:00.0:pcie001: shutdown
> [   44.730534] pcieport 0000:00:00.0: shutdown
> [   44.734898] fsl_mc_dprc dprc.1: shutdown
> [   44.740863] fsl_mc_dprc dprc.1: DPRC device shutdown
> [   44.745830] mmc_host mmc1: shutdown_pre
> [   54.873286] mmc1: Timeout waiting for hardware cmd interrupt.
> [   54.879023] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> [   54.885453] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
>
> The trigger CMD13 is in-flight at time 44.713673.
> The mmc_host mmc1: shutdown_pre is at time 44.745830.

Thanks a lot for providing the details! Yes, this certainly points out
the problem during shutdown/reboot.

I have sent a patch [1] that I think should address the problem. Can
you please test and verify it?

[...]

Kind regards
Uffe

[1]
https://lore.kernel.org/all/20241125122446.18684-1-ulf.hansson@linaro.org/

Anthony Pighin (Nokia) Nov. 25, 2024, 6:07 p.m. UTC | #6

> > > > > >
> > > > > > If card detection is done via polling, due to broken-cd
> > > > > > (Freescale LX2160,
> > > > > etc.), or for other reasons, then the card will be polled
> > > > > asynchronously and periodically.
> > > > > >
> > > > > > If that polling happens after the card has been put in powered
> > > > > > off state
> > > (i.e.
> > > > > during system shutdown/reboot), then the polling times out. That
> > > > > timeout is of a long duration (10s). And it is repeated multiple
> > > > > times (x3). And that is all done after the watchdogd has been
> > > > > disabled, meaning that system watchdogs are not being kicked.
> > > > > >
> > > > > > If the MMC polling exceeds the watchdog trip time, then the
> > > > > > system will be
> > > > > ungraciously reset. Or in the case of a pretimeout capable
> > > > > watchdog, the pretimeout will trip unnecessarily.
> > > > > >
> > > > > >     [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > > > > 03220301 flags 0000049d
> > > > > >     [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
> > > > > >     [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
> > > > > >     [   46.891906] mmc_request_done:187: mmc1: req done (CMD6):
> 0:
> > > > > 00000800 00000000 00000000 00000000
> > > > > >     [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> > > > > powermode 0 cs 0 Vdd 0 width 1 timing 0
> > > > > >     [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13
> arg
> > > > > 00010000 flags 00000195
> > > > > >     [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
> > > > >
> > > > > Hmm. How is the polling being done? By using
> MMC_CAP_NEEDS_POLL?
> > > > >
> > > >
> > > > Correct. (At least in my understanding.) 'broken-cd' in the device
> > > > tree will trigger MMC_CAP_NEEDS_POLL to be set.
> > > >
> > > > > The above certainly looks weird. The mmc_rescan work should not
> > > > > be allowed to run at this point, as the work is getting punted
> > > > > to the system_freezable_wq via mmc_schedule_delayed_work().
> > > > >
> > > >
> > > > This is the backtrace I get when the timeout occurs:
> > > >
> > > > [   46.154348] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > > 00010000 flags 00000195
> > > > [   46.161917] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > > > [   46.167743] mmc_request_done:187: mmc1: req done (CMD13): 0:
> > > 00000900 00000000 00000000 00000000
> > > > [   46.176526] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> > > 03220301 flags 0000049d
> > > > [   46.184016] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > > > [   46.189840] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> > > > [   46.195665] mmc_request_done:187: mmc1: req done (CMD6): 0:
> > > 00000800 00000000 00000000 00000000
> > > > [   46.204362] mmc_set_ios:892: mmc1: clock 0Hz busmode 2
> powermode 0
> > > cs 0 Vdd 0 width 1 timing 0
> > > > [   46.219307] CPU: 6 PID: 153 Comm: kworker/6:1 Tainted: G           O
> > > 6.6.59 #1
> > > > [   46.231988] Hardware name: [Redacted]
> > > > [   46.238678] Workqueue: events_freezable mmc_rescan
> > > > [   46.243466] Call trace:
> > > > [   46.245901]  dump_backtrace+0x9c/0x128
> > > > [   46.249643]  show_stack+0x20/0x38
> > > > [   46.252948]  dump_stack_lvl+0x48/0x60
> > > > [   46.256603]  dump_stack+0x18/0x28
> > > > [   46.259909]  mmc_alive+0x74/0x88
> > > > [   46.263128]  _mmc_detect_card_removed+0x3c/0x158
> > > > [   46.267735]  mmc_detect+0x30/0x98
> > > > [   46.271040]  mmc_rescan+0x94/0x238
> > > > [   46.274432]  process_one_work+0x178/0x3d8
> > > > [   46.278432]  worker_thread+0x2bc/0x3e0
> > > > [   46.282171]  kthread+0xf8/0x110
> > > > [   46.285303]  ret_from_fork+0x10/0x20
> > > > [   46.288878] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> > > 00010000 flags 00000195
> > > > [   56.793379] mmc1: Timeout waiting for hardware cmd interrupt.
> > > > [   56.799116] mmc1: sdhci: ============ SDHCI REGISTER DUMP
> > > ===========
> > > > [   56.805545] mmc1: sdhci: Sys addr:  0x00000000 | Version:
> 0x00002202
> > > > ...
> > > >
> > >
> > > Okay. If this is system suspend, it looks like the workqueue didn't
> > > become frozen as exepcted. I don't know why, but this needs to be
> investigated.
> > >
> > > If this is a shutdown, mmc_host_classdev_shutdown() should be called
> > > to "disable" the mmc_rescan work from running the code causing the
> above.
> > >
> > > What am I missing?
> > >
> > > Kind regards
> > > Uffe
> > >
> >
> > This occurs in response to a 'reboot' command.
> >
> > Here is an instance of the failure, with initcall_debug set to 'Y':
> >
> > Rebooting...
> > [   44.500858] caam_jr 8030000.jr: shutdown
> > [   44.504805] caam_jr 8020000.jr: shutdown
> > [   44.508748] caam_jr 8010000.jr: shutdown
> > [   44.515349] tpm tpm0: shutdown_pre
> > [   44.523165] igb 0000:01:00.0: shutdown
> > [   44.558695] usb usb2-port1: shutdown
> > [   44.562268] usb usb1-port1: shutdown
> > [   44.565864] xhci-hcd xhci-hcd.0.auto: shutdown
> > [   44.570306] dwc3 3100000.usb: shutdown
> > [   44.574052] [Redacted] 2030000.i2c:cpld@41:watchdog: shutdown
> > [   44.580137] tpm_tis_spi spi2.0: shutdown
> > [   44.584054] spi-nor spi0.1: shutdown
> > [   44.588066] spi-nor spi0.0: shutdown
> > [   44.592112] mmcblk mmc0:0001: shutdown
> > [   44.595902] mmc_mrq_pr_debug:274: mmc0: starting CMD7 arg
> 00000000 flags 00000000
> > [   44.603413] sdhci_irq:3546: mmc0: sdhci: IRQ status 0x00000001
> > [   44.609240] mmc_request_done:187: mmc0: req done (CMD7): 0:
> 00000000 00000000 00000000 00000000
> > [   44.617939] mmc_set_ios:892: mmc0: clock 0Hz busmode 2 powermode 0
> cs 0 Vdd 0 width 1 timing 0
> > [   44.627877] mmcblk mmc1:0001: shutdown
> > [   44.631709] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> 03200101 flags 00000095
> > [   44.639200] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > [   44.645027] mmc_request_done:187: mmc1: req done (CMD6): 0:
> 00000800 00000000 00000000 00000000
> > [   44.653727] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> 00010000 flags 00000195
> > [   44.661296] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > [   44.667123] mmc_request_done:187: mmc1: req done (CMD13): 0:
> 00000900 00000000 00000000 00000000
> > [   44.675907] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg
> 03220301 flags 0000049d
> > [   44.683399] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000001
> > [   44.689223] sdhci_irq:3546: mmc1: sdhci: IRQ status 0x00000002
> > [   44.695047] mmc_request_done:187: mmc1: req done (CMD6): 0:
> 00000800 00000000 00000000 00000000
> > [   44.703746] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0
> cs 0 Vdd 0 width 1 timing 0
> > [   44.713667] aer 0000:00:00.0:pcie002: shutdown
> > [   44.713673] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg
> 00010000 flags 00000195
> > [   44.718102] pcie_pme 0000:00:00.0:pcie001: shutdown
> > [   44.730534] pcieport 0000:00:00.0: shutdown
> > [   44.734898] fsl_mc_dprc dprc.1: shutdown
> > [   44.740863] fsl_mc_dprc dprc.1: DPRC device shutdown
> > [   44.745830] mmc_host mmc1: shutdown_pre
> > [   54.873286] mmc1: Timeout waiting for hardware cmd interrupt.
> > [   54.879023] mmc1: sdhci: ============ SDHCI REGISTER DUMP
> ===========
> > [   54.885453] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00002202
> >
> > The trigger CMD13 is in-flight at time 44.713673.
> > The mmc_host mmc1: shutdown_pre is at time 44.745830.
>
> Thanks a lot for providing the details! Yes, this certainly points out the
> problem during shutdown/reboot.
>
> I have sent a patch [1] that I think should address the problem. Can you
> please test and verify it?
>
> [...]
>
> Kind regards
> Uffe
>
> [1]
> https://lore.ke/
> rnel.org%2Fall%2F20241125122446.18684-1-
> ulf.hansson%40linaro.org%2F&data=05%7C02%7Canthony.pighin%40nokia.co
> m%7C2b25985f3d974df1d15e08dd0d4cf275%7C5d4717519675428d917b70f4
> 4f9630b0%7C0%7C0%7C638681346376209654%7CUnknown%7CTWFpbGZsb
> 3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkF
> OIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XQ%2B3SRi8%2B
> F9NZLEuhmekipj5YrjxVhTnlXIP2RUokmg%3D&reserved=0

I confirm that the patch corrects the behaviour.

Thank you!

Anthony

MMC card detection may trip watchdog if card powered off

Commit Message

Comments

Patch