[v3,10/10] mmc: mmci: Add busydetect timeout

Message ID	20230405-pl180-busydetect-fix-v3-10-cd3d5925ae64@linaro.org
State	New
Headers	show Return-Path: <linux-mmc-owner@vger.kernel.org> From: Linus Walleij <linus.walleij@linaro.org> Date: Sun, 11 Jun 2023 21:41:36 +0200 Subject: [PATCH v3 10/10] mmc: mmci: Add busydetect timeout MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20230405-pl180-busydetect-fix-v3-10-cd3d5925ae64@linaro.org> References: <20230405-pl180-busydetect-fix-v3-0-cd3d5925ae64@linaro.org> In-Reply-To: <20230405-pl180-busydetect-fix-v3-0-cd3d5925ae64@linaro.org> To: Yann Gautier <yann.gautier@foss.st.com>, Stefan Hansson <newbyte@disroot.org>, Ulf Hansson <ulf.hansson@linaro.org>, Maxime Coquelin <mcoquelin.stm32@gmail.com>, Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: linux-mmc@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com, Linus Walleij <linus.walleij@linaro.org> Precedence: bulk
Series	Fix busydetect on Ux500 PL180/MMCI \| expand [v3,00/10] Fix busydetect on Ux500 PL180/MMCI [v3,01/10] mmc: mmci: Clear busy_status when starting command [v3,02/10] mmc: mmci: Unwind big if() clause [v3,03/10] mmc: mmci: Stash status while waiting for busy [v3,04/10] mmc: mmci: Break out error check in busy detect [v3,05/10] mmc: mmci: Make busy complete state machine explicit [v3,06/10] mmc: mmci: Retry the busy start condition [v3,07/10] mmc: mmci: Use state machine state as exit condition [v3,08/10] mmc: mmci: Use a switch statement machine [v3,09/10] mmc: mmci: Break out a helper function [v3,10/10] mmc: mmci: Add busydetect timeout

Linus Walleij June 11, 2023, 7:41 p.m. UTC

Add a timeout for busydetect IRQs using a delayed work.
It might happen (and does happen) on Ux500 that the first
busy detect IRQ appears and not the second one. This will
make the host hang indefinitely waiting for the second
IRQ to appear.

Fire a delayed work after 10ms and re-engage the command
IRQ so the transaction finishes: we are certainly done
at this point, or we will catch an error in the status
register.

This makes the eMMC work again on Skomer.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v2->v3:
- Rebased.
ChangeLog v1->v2:
- No changes
---
 drivers/mmc/host/mmci.c | 23 +++++++++++++++++++++++
 drivers/mmc/host/mmci.h |  1 +
 2 files changed, 24 insertions(+)

Ulf Hansson June 13, 2023, 12:07 p.m. UTC | #1

On Sun, 11 Jun 2023 at 21:41, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Add a timeout for busydetect IRQs using a delayed work.
> It might happen (and does happen) on Ux500 that the first
> busy detect IRQ appears and not the second one. This will
> make the host hang indefinitely waiting for the second
> IRQ to appear.
>
> Fire a delayed work after 10ms and re-engage the command
> IRQ so the transaction finishes: we are certainly done
> at this point, or we will catch an error in the status
> register.

A fixed value of 10ms doesn't work. We have lots of commands that are
associated with way longer timeouts than this.

Typically, the cmd->busy_timeout contains the current value of the
timeout that should be used for the commands that have the flags
MMC_RSP_BUSY set for it.

The stm variant already uses cmd->busy_timeout, but also assigns a
default timeout, just to make sure if the core has failed to set
cmd->busy_timeout (it shouldn't but who knows).

A few more more comments to code, see below.

>
> This makes the eMMC work again on Skomer.
>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v2->v3:
> - Rebased.
> ChangeLog v1->v2:
> - No changes
> ---
>  drivers/mmc/host/mmci.c | 23 +++++++++++++++++++++++
>  drivers/mmc/host/mmci.h |  1 +
>  2 files changed, 24 insertions(+)
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index 05b8fad26c10..7e40b8f2dbf3 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -37,6 +37,7 @@
>  #include <linux/pinctrl/consumer.h>
>  #include <linux/reset.h>
>  #include <linux/gpio/consumer.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/div64.h>
>  #include <asm/io.h>
> @@ -741,6 +742,8 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk)
>                         host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND);
>                         writel(host->variant->busy_detect_mask, base + MMCICLEAR);
>                         host->busy_state = MMCI_BUSY_WAITING_FOR_END_IRQ;
> +                       schedule_delayed_work(&host->busy_timeout_work,
> +                                             msecs_to_jiffies(10));
>                 } else {
>                         dev_dbg(mmc_dev(host->mmc),
>                                 "lost busy status when waiting for busy start IRQ\n");
> @@ -752,6 +755,7 @@ static bool ux500_busy_complete(struct mmci_host *host, u32 status, u32 err_msk)
>                 if (status & host->variant->busy_detect_flag) {
>                         host->busy_status |= status & (MCI_CMDSENT | MCI_CMDRESPEND);
>                         writel(host->variant->busy_detect_mask, base + MMCICLEAR);
> +                       cancel_delayed_work_sync(&host->busy_timeout_work);
>                         ux500_busy_clear_mask_done(host);
>                 } else {
>                         dev_dbg(mmc_dev(host->mmc),
> @@ -1487,6 +1491,22 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
>         }
>  }
>
> +/*
> + * This busy timeout worker is used to "kick" the command IRQ if a
> + * busy detect IRQ fails to appear in reasonable time. Only used on
> + * variants with busy detection IRQ delivery.
> + */
> +static void busy_timeout_work(struct work_struct *work)
> +{
> +       struct mmci_host *host =
> +               container_of(work, struct mmci_host, busy_timeout_work.work);
> +       u32 status;
> +
> +       dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n");
> +       status = readl(host->base + MMCISTATUS);
> +       mmci_cmd_irq(host, host->cmd, status);

There's no locking here. I assume that's because we call
cancel_delayed_work_sync() from an atomic context and we don't want
that to hang.

However, can't the call to mmci_cmd_irq() race with a proper irq being
managed in parallel?

> +}
> +
>  static int mmci_get_rx_fifocnt(struct mmci_host *host, u32 status, int remain)
>  {
>         return remain - (readl(host->base + MMCIFIFOCNT) << 2);
> @@ -2300,6 +2320,9 @@ static int mmci_probe(struct amba_device *dev,
>                         goto clk_disable;
>         }
>
> +       if (host->variant->busy_detect && host->ops->busy_complete)
> +               INIT_DELAYED_WORK(&host->busy_timeout_work, busy_timeout_work);
> +
>         writel(MCI_IRQENABLE | variant->start_err, host->base + MMCIMASK0);
>
>         amba_set_drvdata(dev, mmc);
> diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
> index 12a7bbd3ce26..95d3d0a6049b 100644
> --- a/drivers/mmc/host/mmci.h
> +++ b/drivers/mmc/host/mmci.h
> @@ -451,6 +451,7 @@ struct mmci_host {
>         void                    *dma_priv;
>
>         s32                     next_cookie;
> +       struct delayed_work     busy_timeout_work;
>  };
>
>  #define dma_inprogress(host)   ((host)->dma_in_progress)
>

Kind regards
Uffe

Linus Walleij June 13, 2023, 8:32 p.m. UTC | #2

On Tue, Jun 13, 2023 at 2:08 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> Typically, the cmd->busy_timeout contains the current value of the
> timeout that should be used for the commands that have the flags
> MMC_RSP_BUSY set for it.
>
> The stm variant already uses cmd->busy_timeout, but also assigns a
> default timeout, just to make sure if the core has failed to set
> cmd->busy_timeout (it shouldn't but who knows).

I generalized the STM32 solution with the core-specified timeout
and default and used that.

If we know the core will always provide correct timeouts we should
probably delete hacks like this though (but that would be a separate
patch).

> > +static void busy_timeout_work(struct work_struct *work)
> > +{
> > +       struct mmci_host *host =
> > +               container_of(work, struct mmci_host, busy_timeout_work.work);
> > +       u32 status;
> > +
> > +       dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n");
> > +       status = readl(host->base + MMCISTATUS);
> > +       mmci_cmd_irq(host, host->cmd, status);
>
> There's no locking here. I assume that's because we call
> cancel_delayed_work_sync() from an atomic context and we don't want
> that to hang.
>
> However, can't the call to mmci_cmd_irq() race with a proper irq being
> managed in parallel?

It will not be a problem AFAIC: the MMCI is using level IRQs, meaning it
will handle all flags that are set for command or data IRQs before exiting
the IRQ handler, the order of the IRQ handling if several fire at the same
time is actually dependent on the order the IRQ flags are checked in the
code.

When the interrupt handler exits, all IRQs should be handled and the
respective IRQ lines and thus the IRQ from the MMCI should be
de-asserted.

In this case, our problem is that an interrupt is not fired at all, but if
the actual IRQ (that should have been fired) or a totally different IRQ
(such as an error) is fired at the same time doesn't matter: the pending
IRQs will be handled, and if the timer then fires an additional IRQ
the IRQ handler will check if there are any IRQs to handle and then
conclude there isn't and then just return.

At least the way I read it.

I made a v4, sending it out so you can check!

Yours,
Linus Walleij

Ulf Hansson June 14, 2023, 10:04 a.m. UTC | #3

On Tue, 13 Jun 2023 at 22:32, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Tue, Jun 13, 2023 at 2:08 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> > Typically, the cmd->busy_timeout contains the current value of the
> > timeout that should be used for the commands that have the flags
> > MMC_RSP_BUSY set for it.
> >
> > The stm variant already uses cmd->busy_timeout, but also assigns a
> > default timeout, just to make sure if the core has failed to set
> > cmd->busy_timeout (it shouldn't but who knows).
>
> I generalized the STM32 solution with the core-specified timeout
> and default and used that.
>
> If we know the core will always provide correct timeouts we should
> probably delete hacks like this though (but that would be a separate
> patch).
>
> > > +static void busy_timeout_work(struct work_struct *work)
> > > +{
> > > +       struct mmci_host *host =
> > > +               container_of(work, struct mmci_host, busy_timeout_work.work);
> > > +       u32 status;
> > > +
> > > +       dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n");
> > > +       status = readl(host->base + MMCISTATUS);
> > > +       mmci_cmd_irq(host, host->cmd, status);
> >
> > There's no locking here. I assume that's because we call
> > cancel_delayed_work_sync() from an atomic context and we don't want
> > that to hang.
> >
> > However, can't the call to mmci_cmd_irq() race with a proper irq being
> > managed in parallel?
>
> It will not be a problem AFAIC: the MMCI is using level IRQs, meaning it
> will handle all flags that are set for command or data IRQs before exiting
> the IRQ handler, the order of the IRQ handling if several fire at the same
> time is actually dependent on the order the IRQ flags are checked in the
> code.
>
> When the interrupt handler exits, all IRQs should be handled and the
> respective IRQ lines and thus the IRQ from the MMCI should be
> de-asserted.

Right, I think I follow.

>
> In this case, our problem is that an interrupt is not fired at all, but if
> the actual IRQ (that should have been fired) or a totally different IRQ
> (such as an error) is fired at the same time doesn't matter: the pending
> IRQs will be handled, and if the timer then fires an additional IRQ
> the IRQ handler will check if there are any IRQs to handle and then
> conclude there isn't and then just return.

To clarify, I am not worried about the IRQ handling as such.

However, we use the spin_lock to protect some members in the struct
mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to
understand whether there is an active command to manage. When the
command has been completed, we set host->cmd to NULL.

[...]

Kind regards
Uffe

Linus Walleij June 14, 2023, 11:17 a.m. UTC | #4

On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> However, we use the spin_lock to protect some members in the struct
> mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to
> understand whether there is an active command to manage. When the
> command has been completed, we set host->cmd to NULL.

Hm right...

I'm leaning toward some catch-all like:

if (!host->cmd)
  state = MMCI_BUSY_DONE;

as if there is no command going on then surely nothing is busy on the
host controller.

Yours,
Linus Walleij

Ulf Hansson June 14, 2023, 12:21 p.m. UTC | #5

On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> > However, we use the spin_lock to protect some members in the struct
> > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to
> > understand whether there is an active command to manage. When the
> > command has been completed, we set host->cmd to NULL.
>
> Hm right...
>
> I'm leaning toward some catch-all like:
>
> if (!host->cmd)
>   state = MMCI_BUSY_DONE;
>
> as if there is no command going on then surely nothing is busy on the
> host controller.

Right, so at what point do you want to add this check?

Kind regards
Uffe

Linus Walleij June 14, 2023, 7:22 p.m. UTC | #6

On Wed, Jun 14, 2023 at 2:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > > However, we use the spin_lock to protect some members in the struct
> > > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to
> > > understand whether there is an active command to manage. When the
> > > command has been completed, we set host->cmd to NULL.
> >
> > Hm right...
> >
> > I'm leaning toward some catch-all like:
> >
> > if (!host->cmd)
> >   state = MMCI_BUSY_DONE;
> >
> > as if there is no command going on then surely nothing is busy on the
> > host controller.
>
> Right, so at what point do you want to add this check?

I have put it before the calls to the busy_complete() callback, in the
IRQ, where we are already in atomic context. If we are not processing
a command, we should not do any busy detection for sure.

Yours,
Linus Walleij

Linus Walleij June 14, 2023, 7:28 p.m. UTC | #7

On Wed, Jun 14, 2023 at 9:22 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Wed, Jun 14, 2023 at 2:22 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > On Wed, 14 Jun 2023 at 13:17, Linus Walleij <linus.walleij@linaro.org> wrote:
> > >
> > > On Wed, Jun 14, 2023 at 12:05 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > > However, we use the spin_lock to protect some members in the struct
> > > > mmci_host. In this case, the mmci_cmd_irq() is using "host->cmd" to
> > > > understand whether there is an active command to manage. When the
> > > > command has been completed, we set host->cmd to NULL.
> > >
> > > Hm right...
> > >
> > > I'm leaning toward some catch-all like:
> > >
> > > if (!host->cmd)
> > >   state = MMCI_BUSY_DONE;
> > >
> > > as if there is no command going on then surely nothing is busy on the
> > > host controller.
> >
> > Right, so at what point do you want to add this check?
>
> I have put it before the calls to the busy_complete() callback, in the
> IRQ, where we are already in atomic context. If we are not processing
> a command, we should not do any busy detection for sure.

No that's wrong. The mmci_cmd_irq() is actually passed the command as
parameter, so I just augment the busy_complete() prototype to pass this
along down, check out the patch (I talk to much).

Yours,
Linus Walleij

[v3,10/10] mmc: mmci: Add busydetect timeout

Commit Message

Comments

Patch