mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

Message ID 20180611064838.11383-1-m.szyprowski@samsung.com
State New
Headers show
Series
  • mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()
Related show

Commit Message

Marek Szyprowski June 11, 2018, 6:48 a.m.
dw_mci_exynos_resume_noirq() performs DWMMC register access without
ensuring that respective clocks are enabled. This might cause external
abort on some systems (observed on Exynos5433 based boards). Fix this
by adding needed prepare_enable/disable_unprepare calls.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

---
 drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
 1 file changed, 6 insertions(+)

-- 
2.17.1

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ulf Hansson June 11, 2018, 9:35 a.m. | #1
On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> dw_mci_exynos_resume_noirq() performs DWMMC register access without

> ensuring that respective clocks are enabled. This might cause external

> abort on some systems (observed on Exynos5433 based boards). Fix this

> by adding needed prepare_enable/disable_unprepare calls.

>

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---

>  drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>  1 file changed, 6 insertions(+)

>

> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

> index 3164681108ae..6125b68726b0 100644

> --- a/drivers/mmc/host/dw_mmc-exynos.c

> +++ b/drivers/mmc/host/dw_mmc-exynos.c

> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>         struct dw_mci_exynos_priv_data *priv = host->priv;

>         u32 clksel;

>

> +       clk_prepare_enable(host->biu_clk);

> +       clk_prepare_enable(host->ciu_clk);

> +

>         if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>                 priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>                 clksel = mci_readl(host, CLKSEL64);

> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>                         mci_writel(host, CLKSEL, clksel);

>         }

>

> +       clk_disable_unprepare(host->biu_clk);

> +       clk_disable_unprepare(host->ciu_clk);

> +

>         return 0;

>  }

>  #else


I looked a little closer and I am wondering if it wouldn't be possible
to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
SET_SYSTEM_SLEEP_PM_OPS()?

Somelike this:
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
dw_mci_exynos_resume_noirq)

Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

I think it would simplify the code a bit, as you can rely on the
runtime PM callbacks to deal with clk_prepare_enable() and
clk_disable_unprepare(), unless I am mistaken.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marek Szyprowski June 11, 2018, 9:50 a.m. | #2
Hi Ulf,

On 2018-06-11 11:35, Ulf Hansson wrote:
> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>> ensuring that respective clocks are enabled. This might cause external

>> abort on some systems (observed on Exynos5433 based boards). Fix this

>> by adding needed prepare_enable/disable_unprepare calls.

>>

>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>> ---

>>   drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>   1 file changed, 6 insertions(+)

>>

>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>> index 3164681108ae..6125b68726b0 100644

>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>          struct dw_mci_exynos_priv_data *priv = host->priv;

>>          u32 clksel;

>>

>> +       clk_prepare_enable(host->biu_clk);

>> +       clk_prepare_enable(host->ciu_clk);

>> +

>>          if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>                  priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>                  clksel = mci_readl(host, CLKSEL64);

>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>                          mci_writel(host, CLKSEL, clksel);

>>          }

>>

>> +       clk_disable_unprepare(host->biu_clk);

>> +       clk_disable_unprepare(host->ciu_clk);

>> +

>>          return 0;

>>   }

>>   #else

> I looked a little closer and I am wondering if it wouldn't be possible

> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

> SET_SYSTEM_SLEEP_PM_OPS()?

>

> Somelike this:

> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

> dw_mci_exynos_resume_noirq)

>

> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>

> I think it would simplify the code a bit, as you can rely on the

> runtime PM callbacks to deal with clk_prepare_enable() and

> clk_disable_unprepare(), unless I am mistaken.


This will not fix the problem, because mci_writel() calls in
dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
controller's runtime pm state. Since commit 1d9174fbc55e after calling
pm_runtime_force_resume() there is no guarantee that device is in
runtime active state if it was runtime suspended state.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 11, 2018, 12:24 p.m. | #3
On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-11 11:35, Ulf Hansson wrote:

>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>>> ensuring that respective clocks are enabled. This might cause external

>>> abort on some systems (observed on Exynos5433 based boards). Fix this

>>> by adding needed prepare_enable/disable_unprepare calls.

>>>

>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>> ---

>>>   drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>>   1 file changed, 6 insertions(+)

>>>

>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>>> index 3164681108ae..6125b68726b0 100644

>>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>          struct dw_mci_exynos_priv_data *priv = host->priv;

>>>          u32 clksel;

>>>

>>> +       clk_prepare_enable(host->biu_clk);

>>> +       clk_prepare_enable(host->ciu_clk);

>>> +

>>>          if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>>                  priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>>                  clksel = mci_readl(host, CLKSEL64);

>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>                          mci_writel(host, CLKSEL, clksel);

>>>          }

>>>

>>> +       clk_disable_unprepare(host->biu_clk);

>>> +       clk_disable_unprepare(host->ciu_clk);

>>> +

>>>          return 0;

>>>   }

>>>   #else

>> I looked a little closer and I am wondering if it wouldn't be possible

>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

>> SET_SYSTEM_SLEEP_PM_OPS()?

>>

>> Somelike this:

>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

>> dw_mci_exynos_resume_noirq)

>>

>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>>

>> I think it would simplify the code a bit, as you can rely on the

>> runtime PM callbacks to deal with clk_prepare_enable() and

>> clk_disable_unprepare(), unless I am mistaken.

>

> This will not fix the problem, because mci_writel() calls in

> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the

> controller's runtime pm state. Since commit 1d9174fbc55e after calling

> pm_runtime_force_resume() there is no guarantee that device is in

> runtime active state if it was runtime suspended state.


Yes, because the runtime PM usage count is greater than 1.
(pm_runtime_get_noresume() is called during probe).

If you want to make this explicit (not relying on ->probe()), one can
add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 12, 2018, 9:20 a.m. | #4
Hi Marek,

On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-11 14:24, Ulf Hansson wrote:

>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> On 2018-06-11 11:35, Ulf Hansson wrote:

>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>>>>> ensuring that respective clocks are enabled. This might cause external

>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this

>>>>> by adding needed prepare_enable/disable_unprepare calls.

>>>>>

>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>> ---

>>>>>    drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>>>>    1 file changed, 6 insertions(+)

>>>>>

>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>>>>> index 3164681108ae..6125b68726b0 100644

>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>           struct dw_mci_exynos_priv_data *priv = host->priv;

>>>>>           u32 clksel;

>>>>>

>>>>> +       clk_prepare_enable(host->biu_clk);

>>>>> +       clk_prepare_enable(host->ciu_clk);

>>>>> +

>>>>>           if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>>>>                   priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>>>>                   clksel = mci_readl(host, CLKSEL64);

>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>                           mci_writel(host, CLKSEL, clksel);

>>>>>           }

>>>>>

>>>>> +       clk_disable_unprepare(host->biu_clk);

>>>>> +       clk_disable_unprepare(host->ciu_clk);

>>>>> +

>>>>>           return 0;

>>>>>    }

>>>>>    #else

>>>> I looked a little closer and I am wondering if it wouldn't be possible

>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

>>>> SET_SYSTEM_SLEEP_PM_OPS()?

>>>>

>>>> Somelike this:

>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

>>>> dw_mci_exynos_resume_noirq)

>>>>

>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>>>>

>>>> I think it would simplify the code a bit, as you can rely on the

>>>> runtime PM callbacks to deal with clk_prepare_enable() and

>>>> clk_disable_unprepare(), unless I am mistaken.

>>> This will not fix the problem, because mci_writel() calls in

>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the

>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling

>>> pm_runtime_force_resume() there is no guarantee that device is in

>>> runtime active state if it was runtime suspended state.

>> Yes, because the runtime PM usage count is greater than 1.

>> (pm_runtime_get_noresume() is called during probe).

>>

>> If you want to make this explicit (not relying on ->probe()), one can

>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in

>> it.

>

> Sorry, but I don't get how this would work. Exactly the same pattern as

> you have proposed was already used in s3c-64xx SPI driver and it didn't

> work properly (tested on the same SoC as this DW-MMC change). I had to

> move register access to runtime resume callback to fix external abort

> issue:

>

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4


Yep, that is a correct solution.

>

> Here in DW-MMC driver such approach (moving all the code to runtime

> resume callback) is not possible because of the potential interrupt storm

> caused by the hw bug (that's the reason of using noirq resume callback).


I understand. What you need is to run the runtime resume/suspend
callbacks in the resume/suspend noirq phase. Moreover, you need to
make sure that the runtime resume callback, really becomes invoked
during the resume noirq phase, because of the HW bug.

I think the below should work. Can you give it a try?

It relies on the call pm_runtime_get_noresume(), done during
->probe(). Note that, the driver always keeps the RPM usage count
increased, thus preventing runtime suspend during normal execution.

Anyway, if this doesn't work, your suggested approach works fine as well.

---
 drivers/mmc/host/dw_mmc-exynos.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
index a84aa3f1ae85..66132f7fceed 100644
--- a/drivers/mmc/host/dw_mmc-exynos.c
+++ b/drivers/mmc/host/dw_mmc-exynos.c
@@ -175,7 +175,9 @@ static int dw_mci_exynos_runtime_resume(struct device *dev)

        return ret;
 }
+#endif /* CONFIG_PM */

+#ifdef CONFIG_PM_SLEEP
 /**
  * dw_mci_exynos_resume_noirq - Exynos-specific resume code
  *
@@ -193,6 +195,8 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
        struct dw_mci_exynos_priv_data *priv = host->priv;
        u32 clksel;

+       pm_runtime_force_resume(dev);
+
        if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
                priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
                clksel = mci_readl(host, CLKSEL64);
@@ -209,9 +213,7 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

        return 0;
 }
-#else
-#define dw_mci_exynos_resume_noirq     NULL
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */

 static void dw_mci_exynos_config_hs400(struct dw_mci *host, u32 timing)
 {
@@ -553,14 +555,11 @@ static int dw_mci_exynos_remove(struct
platform_device *pdev)
 }
 static const struct dev_pm_ops dw_mci_exynos_pmops = {
-       SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
-                               pm_runtime_force_resume)
+       SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+                               dw_mci_exynos_resume_noirq)
        SET_RUNTIME_PM_OPS(dw_mci_runtime_suspend,
                           dw_mci_exynos_runtime_resume,
                           NULL)
-       .resume_noirq = dw_mci_exynos_resume_noirq,
-       .thaw_noirq = dw_mci_exynos_resume_noirq,
-       .restore_noirq = dw_mci_exynos_resume_noirq,
 };

 static struct platform_driver dw_mci_exynos_pltfm_driver = {
-- 

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 12, 2018, 10:07 a.m. | #5
On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-12 11:20, Ulf Hansson wrote:

>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> On 2018-06-11 14:24, Ulf Hansson wrote:

>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:

>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>>>>>>> ensuring that respective clocks are enabled. This might cause external

>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this

>>>>>>> by adding needed prepare_enable/disable_unprepare calls.

>>>>>>>

>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>>> ---

>>>>>>>     drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>>>>>>     1 file changed, 6 insertions(+)

>>>>>>>

>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>> index 3164681108ae..6125b68726b0 100644

>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>            struct dw_mci_exynos_priv_data *priv = host->priv;

>>>>>>>            u32 clksel;

>>>>>>>

>>>>>>> +       clk_prepare_enable(host->biu_clk);

>>>>>>> +       clk_prepare_enable(host->ciu_clk);

>>>>>>> +

>>>>>>>            if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>>>>>>                    priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>>>>>>                    clksel = mci_readl(host, CLKSEL64);

>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>                            mci_writel(host, CLKSEL, clksel);

>>>>>>>            }

>>>>>>>

>>>>>>> +       clk_disable_unprepare(host->biu_clk);

>>>>>>> +       clk_disable_unprepare(host->ciu_clk);

>>>>>>> +

>>>>>>>            return 0;

>>>>>>>     }

>>>>>>>     #else

>>>>>> I looked a little closer and I am wondering if it wouldn't be possible

>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?

>>>>>>

>>>>>> Somelike this:

>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

>>>>>> dw_mci_exynos_resume_noirq)

>>>>>>

>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>>>>>>

>>>>>> I think it would simplify the code a bit, as you can rely on the

>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and

>>>>>> clk_disable_unprepare(), unless I am mistaken.

>>>>> This will not fix the problem, because mci_writel() calls in

>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the

>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling

>>>>> pm_runtime_force_resume() there is no guarantee that device is in

>>>>> runtime active state if it was runtime suspended state.

>>>> Yes, because the runtime PM usage count is greater than 1.

>>>> (pm_runtime_get_noresume() is called during probe).

>>>>

>>>> If you want to make this explicit (not relying on ->probe()), one can

>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in

>>>> it.

>>> Sorry, but I don't get how this would work. Exactly the same pattern as

>>> you have proposed was already used in s3c-64xx SPI driver and it didn't

>>> work properly (tested on the same SoC as this DW-MMC change). I had to

>>> move register access to runtime resume callback to fix external abort

>>> issue:

>>>

>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4

>> Yep, that is a correct solution.

>>

>>> Here in DW-MMC driver such approach (moving all the code to runtime

>>> resume callback) is not possible because of the potential interrupt storm

>>> caused by the hw bug (that's the reason of using noirq resume callback).

>> I understand. What you need is to run the runtime resume/suspend

>> callbacks in the resume/suspend noirq phase. Moreover, you need to

>> make sure that the runtime resume callback, really becomes invoked

>> during the resume noirq phase, because of the HW bug.

>>

>> I think the below should work. Can you give it a try?

>>

>> It relies on the call pm_runtime_get_noresume(), done during

>> ->probe(). Note that, the driver always keeps the RPM usage count

>> increased, thus preventing runtime suspend during normal execution.

>>

>> Anyway, if this doesn't work, your suggested approach works fine as well.

>

> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device

> runtime active all the time between the driver probe() and remove().

> Right, this will fix this specific case, but it isn't a generic solution,

> so I will also add a comment on that, so one would not need to debug it

> again if he decides to change runtime pm usage scheme in dw_mmc-exynos

> in the future.


Seems reasonable!

If you want the more generic solution, I would add a exynos specific
suspend_noirq() callback, let it call pm_runtime_get_noresume() and
them pm_runtime_force_suspend().

In the corresponding resume_noirq() callback, extend my suggested
changes, with a call to pm_runtime_put_noidle() after all actions has
been done in it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marek Szyprowski June 12, 2018, 10:25 a.m. | #6
Hi Ulf,

On 2018-06-12 12:07, Ulf Hansson wrote:
> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>> On 2018-06-12 11:20, Ulf Hansson wrote:

>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>> On 2018-06-11 14:24, Ulf Hansson wrote:

>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:

>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>>>>>>>> ensuring that respective clocks are enabled. This might cause external

>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this

>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.

>>>>>>>>

>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>>>> ---

>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>>>>>>>      1 file changed, 6 insertions(+)

>>>>>>>>

>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>> index 3164681108ae..6125b68726b0 100644

>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;

>>>>>>>>             u32 clksel;

>>>>>>>>

>>>>>>>> +       clk_prepare_enable(host->biu_clk);

>>>>>>>> +       clk_prepare_enable(host->ciu_clk);

>>>>>>>> +

>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);

>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>>                             mci_writel(host, CLKSEL, clksel);

>>>>>>>>             }

>>>>>>>>

>>>>>>>> +       clk_disable_unprepare(host->biu_clk);

>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);

>>>>>>>> +

>>>>>>>>             return 0;

>>>>>>>>      }

>>>>>>>>      #else

>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible

>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?

>>>>>>>

>>>>>>> Somelike this:

>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

>>>>>>> dw_mci_exynos_resume_noirq)

>>>>>>>

>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>>>>>>>

>>>>>>> I think it would simplify the code a bit, as you can rely on the

>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and

>>>>>>> clk_disable_unprepare(), unless I am mistaken.

>>>>>> This will not fix the problem, because mci_writel() calls in

>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the

>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling

>>>>>> pm_runtime_force_resume() there is no guarantee that device is in

>>>>>> runtime active state if it was runtime suspended state.

>>>>> Yes, because the runtime PM usage count is greater than 1.

>>>>> (pm_runtime_get_noresume() is called during probe).

>>>>>

>>>>> If you want to make this explicit (not relying on ->probe()), one can

>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in

>>>>> it.

>>>> Sorry, but I don't get how this would work. Exactly the same pattern as

>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't

>>>> work properly (tested on the same SoC as this DW-MMC change). I had to

>>>> move register access to runtime resume callback to fix external abort

>>>> issue:

>>>>

>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4

>>> Yep, that is a correct solution.

>>>

>>>> Here in DW-MMC driver such approach (moving all the code to runtime

>>>> resume callback) is not possible because of the potential interrupt storm

>>>> caused by the hw bug (that's the reason of using noirq resume callback).

>>> I understand. What you need is to run the runtime resume/suspend

>>> callbacks in the resume/suspend noirq phase. Moreover, you need to

>>> make sure that the runtime resume callback, really becomes invoked

>>> during the resume noirq phase, because of the HW bug.

>>>

>>> I think the below should work. Can you give it a try?

>>>

>>> It relies on the call pm_runtime_get_noresume(), done during

>>> ->probe(). Note that, the driver always keeps the RPM usage count

>>> increased, thus preventing runtime suspend during normal execution.

>>>

>>> Anyway, if this doesn't work, your suggested approach works fine as well.

>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device

>> runtime active all the time between the driver probe() and remove().

>> Right, this will fix this specific case, but it isn't a generic solution,

>> so I will also add a comment on that, so one would not need to debug it

>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos

>> in the future.

> Seems reasonable!

>

> If you want the more generic solution, I would add a exynos specific

> suspend_noirq() callback, let it call pm_runtime_get_noresume() and

> them pm_runtime_force_suspend().

>

> In the corresponding resume_noirq() callback, extend my suggested

> changes, with a call to pm_runtime_put_noidle() after all actions has

> been done in it.


Okay, this finally looks like a proper and future-proof solution. Just
one more question: why pm_runtime_put_noidle()? Hypothetically, when the
dw_mmc-exynos driver gets proper runtime PM management and system suspend
will happen when the device is in runtime suspended state, this will leave
it in runtime active state with refcount = 0 after the system resume cycle.
Imho simple pm_runtime_put() will be better in this case.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson June 12, 2018, 10:38 a.m. | #7
On 12 June 2018 at 12:25, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-12 12:07, Ulf Hansson wrote:

>> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> On 2018-06-12 11:20, Ulf Hansson wrote:

>>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>> On 2018-06-11 14:24, Ulf Hansson wrote:

>>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:

>>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without

>>>>>>>>> ensuring that respective clocks are enabled. This might cause external

>>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this

>>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.

>>>>>>>>>

>>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>>>>> ---

>>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++

>>>>>>>>>      1 file changed, 6 insertions(+)

>>>>>>>>>

>>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>>> index 3164681108ae..6125b68726b0 100644

>>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c

>>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;

>>>>>>>>>             u32 clksel;

>>>>>>>>>

>>>>>>>>> +       clk_prepare_enable(host->biu_clk);

>>>>>>>>> +       clk_prepare_enable(host->ciu_clk);

>>>>>>>>> +

>>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||

>>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)

>>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);

>>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)

>>>>>>>>>                             mci_writel(host, CLKSEL, clksel);

>>>>>>>>>             }

>>>>>>>>>

>>>>>>>>> +       clk_disable_unprepare(host->biu_clk);

>>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);

>>>>>>>>> +

>>>>>>>>>             return 0;

>>>>>>>>>      }

>>>>>>>>>      #else

>>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible

>>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of

>>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?

>>>>>>>>

>>>>>>>> Somelike this:

>>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,

>>>>>>>> dw_mci_exynos_resume_noirq)

>>>>>>>>

>>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().

>>>>>>>>

>>>>>>>> I think it would simplify the code a bit, as you can rely on the

>>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and

>>>>>>>> clk_disable_unprepare(), unless I am mistaken.

>>>>>>> This will not fix the problem, because mci_writel() calls in

>>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the

>>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling

>>>>>>> pm_runtime_force_resume() there is no guarantee that device is in

>>>>>>> runtime active state if it was runtime suspended state.

>>>>>> Yes, because the runtime PM usage count is greater than 1.

>>>>>> (pm_runtime_get_noresume() is called during probe).

>>>>>>

>>>>>> If you want to make this explicit (not relying on ->probe()), one can

>>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in

>>>>>> it.

>>>>> Sorry, but I don't get how this would work. Exactly the same pattern as

>>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't

>>>>> work properly (tested on the same SoC as this DW-MMC change). I had to

>>>>> move register access to runtime resume callback to fix external abort

>>>>> issue:

>>>>>

>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4

>>>> Yep, that is a correct solution.

>>>>

>>>>> Here in DW-MMC driver such approach (moving all the code to runtime

>>>>> resume callback) is not possible because of the potential interrupt storm

>>>>> caused by the hw bug (that's the reason of using noirq resume callback).

>>>> I understand. What you need is to run the runtime resume/suspend

>>>> callbacks in the resume/suspend noirq phase. Moreover, you need to

>>>> make sure that the runtime resume callback, really becomes invoked

>>>> during the resume noirq phase, because of the HW bug.

>>>>

>>>> I think the below should work. Can you give it a try?

>>>>

>>>> It relies on the call pm_runtime_get_noresume(), done during

>>>> ->probe(). Note that, the driver always keeps the RPM usage count

>>>> increased, thus preventing runtime suspend during normal execution.

>>>>

>>>> Anyway, if this doesn't work, your suggested approach works fine as well.

>>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device

>>> runtime active all the time between the driver probe() and remove().

>>> Right, this will fix this specific case, but it isn't a generic solution,

>>> so I will also add a comment on that, so one would not need to debug it

>>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos

>>> in the future.

>> Seems reasonable!

>>

>> If you want the more generic solution, I would add a exynos specific

>> suspend_noirq() callback, let it call pm_runtime_get_noresume() and

>> them pm_runtime_force_suspend().

>>

>> In the corresponding resume_noirq() callback, extend my suggested

>> changes, with a call to pm_runtime_put_noidle() after all actions has

>> been done in it.

>

> Okay, this finally looks like a proper and future-proof solution. Just

> one more question: why pm_runtime_put_noidle()? Hypothetically, when the

> dw_mmc-exynos driver gets proper runtime PM management and system suspend

> will happen when the device is in runtime suspended state, this will leave

> it in runtime active state with refcount = 0 after the system resume cycle.

> Imho simple pm_runtime_put() will be better in this case.


It doesn't really matter.

Runtime PM is disabled for the device (done by driver core) and the PM
workqueue is suspended, which means pm_runtime_put() will not
immediately runtime suspend the device.

The important part is to restore the usage count, such that the driver
core can potentially runtime suspend the device when device_complete()
is called for it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
index 3164681108ae..6125b68726b0 100644
--- a/drivers/mmc/host/dw_mmc-exynos.c
+++ b/drivers/mmc/host/dw_mmc-exynos.c
@@ -193,6 +193,9 @@  static int dw_mci_exynos_resume_noirq(struct device *dev)
 	struct dw_mci_exynos_priv_data *priv = host->priv;
 	u32 clksel;
 
+	clk_prepare_enable(host->biu_clk);
+	clk_prepare_enable(host->ciu_clk);
+
 	if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
 		priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
 		clksel = mci_readl(host, CLKSEL64);
@@ -207,6 +210,9 @@  static int dw_mci_exynos_resume_noirq(struct device *dev)
 			mci_writel(host, CLKSEL, clksel);
 	}
 
+	clk_disable_unprepare(host->biu_clk);
+	clk_disable_unprepare(host->ciu_clk);
+
 	return 0;
 }
 #else