PM / runtime: Fix state of suppliers of links created during consumer probe

Message ID 20180608084115.19242-1-m.szyprowski@samsung.com
State New
Headers show
Series
  • PM / runtime: Fix state of suppliers of links created during consumer probe
Related show

Commit Message

Marek Szyprowski June 8, 2018, 8:41 a.m.
When device links with DL_FLAG_PM_RUNTIME are created during consumer
probe, the supplier device is unconditionally runtime activated by
pm_runtime_get_sync(). However this get() is not noticed by
rpm_put_suppliers() called at the end of probe(), because the newly
created link lacks a rpm_active state bit. This worked when probe() called
pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch
1e8378619841 ("PM / runtime: Fixup reference counting of device
link suppliers at probe") changed probe() to rely on the generic
rpm_put_suppliers() code. This patch updates the newly created link state
to proper value.

Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

---
This fixes Exynos IOMMU driver being unconditionally set to runtime active
state since the mentioned commit.
---
 drivers/base/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.17.1

Comments

Ulf Hansson June 8, 2018, 9:29 a.m. | #1
On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> When device links with DL_FLAG_PM_RUNTIME are created during consumer

> probe, the supplier device is unconditionally runtime activated by

> pm_runtime_get_sync(). However this get() is not noticed by

> rpm_put_suppliers() called at the end of probe(), because the newly

> created link lacks a rpm_active state bit. This worked when probe() called

> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

> 1e8378619841 ("PM / runtime: Fixup reference counting of device

> link suppliers at probe") changed probe() to rely on the generic

> rpm_put_suppliers() code. This patch updates the newly created link state

> to proper value.

>

> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---

> This fixes Exynos IOMMU driver being unconditionally set to runtime active

> state since the mentioned commit.

> ---

>  drivers/base/core.c | 4 +++-

>  1 file changed, 3 insertions(+), 1 deletion(-)

>

> diff --git a/drivers/base/core.c b/drivers/base/core.c

> index ad7b50897bcc..8ffd353f40ce 100644

> --- a/drivers/base/core.c

> +++ b/drivers/base/core.c

> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>                                  * runtime PM usage counter after consumer probe

>                                  * in driver_probe_device().

>                                  */

> -                               if (flags & DL_FLAG_PM_RUNTIME)

> +                               if (flags & DL_FLAG_PM_RUNTIME) {


A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>                                         pm_runtime_get_sync(supplier);

> +                                       link->rpm_active = true;

> +                               }

>

>                                 link->status = DL_STATE_CONSUMER_PROBE;

>                                 break;

> --

> 2.17.1

>


This seems to correct to me! Thanks for the fix!

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


Kind regards
Uffe
Marek Szyprowski June 12, 2018, 7:10 a.m. | #2
Hi Ulf,

On 2018-06-08 11:29, Ulf Hansson wrote:
> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>> probe, the supplier device is unconditionally runtime activated by

>> pm_runtime_get_sync(). However this get() is not noticed by

>> rpm_put_suppliers() called at the end of probe(), because the newly

>> created link lacks a rpm_active state bit. This worked when probe() called

>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>> link suppliers at probe") changed probe() to rely on the generic

>> rpm_put_suppliers() code. This patch updates the newly created link state

>> to proper value.

>>

>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>> ---

>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>> state since the mentioned commit.

>> ---

>>   drivers/base/core.c | 4 +++-

>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>

>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>> index ad7b50897bcc..8ffd353f40ce 100644

>> --- a/drivers/base/core.c

>> +++ b/drivers/base/core.c

>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>                                   * runtime PM usage counter after consumer probe

>>                                   * in driver_probe_device().

>>                                   */

>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>

>>                                          pm_runtime_get_sync(supplier);

>> +                                       link->rpm_active = true;

>> +                               }

>>

>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>                                  break;

>> --

>> 2.17.1

>>

> This seems to correct to me! Thanks for the fix!

>

> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


Now I've investigated this issue further and there is still a regression
caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device
link suppliers at probe") commit in the following case:

There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer
(with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()
procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function
and does nothing more, because it only registers the device in the system
and doesn't touch any hardware at all. IOMMU gets resumed during
jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept
runtime active, because rpm_idle() returns early with -EAGAIN and doesn't
call __rpm_callback()/rpm_put_suppliers(). This is valid until the first
use of jpeg-codec (so the first transition of jpeg-codec device from
runtime active to runtime suspended state).

Do you have any suggestion how this regression can be solved? I'm really
curious if there was any case which got fixed the the mentioned commit?

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Rafael J. Wysocki June 12, 2018, 8:22 a.m. | #3
On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-08 11:29, Ulf Hansson wrote:

>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>> probe, the supplier device is unconditionally runtime activated by

>>> pm_runtime_get_sync(). However this get() is not noticed by

>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>> created link lacks a rpm_active state bit. This worked when probe() called

>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>> link suppliers at probe") changed probe() to rely on the generic

>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>> to proper value.

>>>

>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>> ---

>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>> state since the mentioned commit.

>>> ---

>>>   drivers/base/core.c | 4 +++-

>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>

>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>> index ad7b50897bcc..8ffd353f40ce 100644

>>> --- a/drivers/base/core.c

>>> +++ b/drivers/base/core.c

>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>                                   * runtime PM usage counter after consumer probe

>>>                                   * in driver_probe_device().

>>>                                   */

>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>

>>>                                          pm_runtime_get_sync(supplier);

>>> +                                       link->rpm_active = true;

>>> +                               }

>>>

>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>                                  break;

>>> --

>>> 2.17.1

>>>

>> This seems to correct to me! Thanks for the fix!

>>

>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>

> Now I've investigated this issue further and there is still a regression

> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

> link suppliers at probe") commit in the following case:

>

> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

> and does nothing more, because it only registers the device in the system

> and doesn't touch any hardware at all. IOMMU gets resumed during

> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

> use of jpeg-codec (so the first transition of jpeg-codec device from

> runtime active to runtime suspended state).

>

> Do you have any suggestion how this regression can be solved? I'm really

> curious if there was any case which got fixed the the mentioned commit?


Having had a deeper look into this code I'm now thinking that I should
revert the Ulf's commit, because the case mentioned in its changelog
should be covered exactly by the condition modified by your patch.

Thanks,
Rafael
Ulf Hansson June 12, 2018, 8:26 a.m. | #4
+ Jon, Todor

On 12 June 2018 at 09:10, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,

>

> On 2018-06-08 11:29, Ulf Hansson wrote:

>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>> probe, the supplier device is unconditionally runtime activated by

>>> pm_runtime_get_sync(). However this get() is not noticed by

>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>> created link lacks a rpm_active state bit. This worked when probe() called

>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>> link suppliers at probe") changed probe() to rely on the generic

>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>> to proper value.

>>>

>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>> ---

>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>> state since the mentioned commit.

>>> ---

>>>   drivers/base/core.c | 4 +++-

>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>

>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>> index ad7b50897bcc..8ffd353f40ce 100644

>>> --- a/drivers/base/core.c

>>> +++ b/drivers/base/core.c

>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>                                   * runtime PM usage counter after consumer probe

>>>                                   * in driver_probe_device().

>>>                                   */

>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>

>>>                                          pm_runtime_get_sync(supplier);

>>> +                                       link->rpm_active = true;

>>> +                               }

>>>

>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>                                  break;

>>> --

>>> 2.17.1

>>>

>> This seems to correct to me! Thanks for the fix!

>>

>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>

> Now I've investigated this issue further and there is still a regression

> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

> link suppliers at probe") commit in the following case:

>

> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

> and does nothing more, because it only registers the device in the system

> and doesn't touch any hardware at all. IOMMU gets resumed during

> jpeg-codec probe().


Right, so the IOMMU is runtime resumed because of the call to
pm_runtime_get_sync(supplier) above. This happened already before.

According to your information, it seems like this is actually
unnecessary and it would be nice if it could be avoided.

> After the mentioned patch (and my fix), IOMMU is kept

> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

> call __rpm_callback()/rpm_put_suppliers().


Yes, this is because the runtime PM usage count is now correctly
balanced. In other words, the earlier call to
pm_runtime_get_sync(supplier), keeps the IOMMU active.

> This is valid until the first

> use of jpeg-codec (so the first transition of jpeg-codec device from

> runtime active to runtime suspended state).


Yep, I get it. Thanks for the detailed description!

>

> Do you have any suggestion how this regression can be solved? I'm really

> curious if there was any case which got fixed by the mentioned commit?


So after a second thought, it seems like we should just drop the call
to pm_runtime_get_sync(supplier) above altogether. It's already been
done in case the the DL_FLAG_PM_RUNTIME and DL_FLAG_RPM_ACTIVE flags
are set.

I assume you don't have DL_FLAG_RPM_ACTIVE?

It may be that the commit introduces regressions, and thanks for
reporting and helping out.

In my tests (just using a test driver), I noticed the problems. And if
I recall correctly, so did Jon and Todor.

However, yes, it's need to be able to correctly manages created device
links during ->probe().

Kind regards
Uffe
Ulf Hansson June 12, 2018, 8:31 a.m. | #5
On 12 June 2018 at 10:22, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski

> <m.szyprowski@samsung.com> wrote:

>> Hi Ulf,

>>

>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>> probe, the supplier device is unconditionally runtime activated by

>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>> link suppliers at probe") changed probe() to rely on the generic

>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>> to proper value.

>>>>

>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>> ---

>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>> state since the mentioned commit.

>>>> ---

>>>>   drivers/base/core.c | 4 +++-

>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>> --- a/drivers/base/core.c

>>>> +++ b/drivers/base/core.c

>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>                                   * runtime PM usage counter after consumer probe

>>>>                                   * in driver_probe_device().

>>>>                                   */

>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>

>>>>                                          pm_runtime_get_sync(supplier);

>>>> +                                       link->rpm_active = true;

>>>> +                               }

>>>>

>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>                                  break;

>>>> --

>>>> 2.17.1

>>>>

>>> This seems to correct to me! Thanks for the fix!

>>>

>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>

>> Now I've investigated this issue further and there is still a regression

>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>> link suppliers at probe") commit in the following case:

>>

>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>> and does nothing more, because it only registers the device in the system

>> and doesn't touch any hardware at all. IOMMU gets resumed during

>> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

>> use of jpeg-codec (so the first transition of jpeg-codec device from

>> runtime active to runtime suspended state).

>>

>> Do you have any suggestion how this regression can be solved? I'm really

>> curious if there was any case which got fixed the the mentioned commit?

>

> Having had a deeper look into this code I'm now thinking that I should

> revert the Ulf's commit, because the case mentioned in its changelog

> should be covered exactly by the condition modified by your patch.


That would break the case for DL_FLAG_STATELESS. Which is the one I
have been focusing on (and clearly messed up the other case, sorry
about that).

Please have a look at my last reply to Marek. Actually, I think we can
improve the situation compared how it were before, as we don't need to
unconditionally runtime resume the supplier, but rather rely on the
DL_FLAG_RPM_ACTIVE flag.

Kind regards
Uffe
Rafael J. Wysocki June 12, 2018, 8:31 a.m. | #6
On Tue, Jun 12, 2018 at 10:26 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> + Jon, Todor

>

> On 12 June 2018 at 09:10, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>> Hi Ulf,

>>

>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>> probe, the supplier device is unconditionally runtime activated by

>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>> link suppliers at probe") changed probe() to rely on the generic

>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>> to proper value.

>>>>

>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>> ---

>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>> state since the mentioned commit.

>>>> ---

>>>>   drivers/base/core.c | 4 +++-

>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>> --- a/drivers/base/core.c

>>>> +++ b/drivers/base/core.c

>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>                                   * runtime PM usage counter after consumer probe

>>>>                                   * in driver_probe_device().

>>>>                                   */

>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>

>>>>                                          pm_runtime_get_sync(supplier);

>>>> +                                       link->rpm_active = true;

>>>> +                               }

>>>>

>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>                                  break;

>>>> --

>>>> 2.17.1

>>>>

>>> This seems to correct to me! Thanks for the fix!

>>>

>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>

>> Now I've investigated this issue further and there is still a regression

>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>> link suppliers at probe") commit in the following case:

>>

>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>> and does nothing more, because it only registers the device in the system

>> and doesn't touch any hardware at all. IOMMU gets resumed during

>> jpeg-codec probe().

>

> Right, so the IOMMU is runtime resumed because of the call to

> pm_runtime_get_sync(supplier) above. This happened already before.

>

> According to your information, it seems like this is actually

> unnecessary and it would be nice if it could be avoided.

>

>> After the mentioned patch (and my fix), IOMMU is kept

>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>> call __rpm_callback()/rpm_put_suppliers().

>

> Yes, this is because the runtime PM usage count is now correctly

> balanced. In other words, the earlier call to

> pm_runtime_get_sync(supplier), keeps the IOMMU active.

>

>> This is valid until the first

>> use of jpeg-codec (so the first transition of jpeg-codec device from

>> runtime active to runtime suspended state).

>

> Yep, I get it. Thanks for the detailed description!

>

>>

>> Do you have any suggestion how this regression can be solved? I'm really

>> curious if there was any case which got fixed by the mentioned commit?

>

> So after a second thought, it seems like we should just drop the call

> to pm_runtime_get_sync(supplier) above altogether. It's already been

> done in case the the DL_FLAG_PM_RUNTIME and DL_FLAG_RPM_ACTIVE flags

> are set.


The problem is that nobody sets DL_FLAG_RPM_ACTIVE and relying on that
to get the refcounts right would be fragile anyway.

> I assume you don't have DL_FLAG_RPM_ACTIVE?

>

> It may be that the commit introduces regressions, and thanks for

> reporting and helping out.

>

> In my tests (just using a test driver), I noticed the problems. And if

> I recall correctly, so did Jon and Todor.


So actually I'd like to hear about them in the first place.

> However, yes, it's need to be able to correctly manages created device

> links during ->probe().


Right.

So I'm reverting commit 1e8378619841 and lets start again.

Thanks,
Rafael
Rafael J. Wysocki June 12, 2018, 8:33 a.m. | #7
On Tue, Jun 12, 2018 at 10:31 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 June 2018 at 10:22, Rafael J. Wysocki <rafael@kernel.org> wrote:

>> On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski

>> <m.szyprowski@samsung.com> wrote:

>>> Hi Ulf,

>>>

>>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>>> probe, the supplier device is unconditionally runtime activated by

>>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>> link suppliers at probe") changed probe() to rely on the generic

>>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>>> to proper value.

>>>>>

>>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>> ---

>>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>>> state since the mentioned commit.

>>>>> ---

>>>>>   drivers/base/core.c | 4 +++-

>>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>>

>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>>> --- a/drivers/base/core.c

>>>>> +++ b/drivers/base/core.c

>>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>>                                   * runtime PM usage counter after consumer probe

>>>>>                                   * in driver_probe_device().

>>>>>                                   */

>>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>>

>>>>>                                          pm_runtime_get_sync(supplier);

>>>>> +                                       link->rpm_active = true;

>>>>> +                               }

>>>>>

>>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>>                                  break;

>>>>> --

>>>>> 2.17.1

>>>>>

>>>> This seems to correct to me! Thanks for the fix!

>>>>

>>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>>

>>> Now I've investigated this issue further and there is still a regression

>>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>> link suppliers at probe") commit in the following case:

>>>

>>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>>> and does nothing more, because it only registers the device in the system

>>> and doesn't touch any hardware at all. IOMMU gets resumed during

>>> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

>>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>>> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

>>> use of jpeg-codec (so the first transition of jpeg-codec device from

>>> runtime active to runtime suspended state).

>>>

>>> Do you have any suggestion how this regression can be solved? I'm really

>>> curious if there was any case which got fixed the the mentioned commit?

>>

>> Having had a deeper look into this code I'm now thinking that I should

>> revert the Ulf's commit, because the case mentioned in its changelog

>> should be covered exactly by the condition modified by your patch.

>

> That would break the case for DL_FLAG_STATELESS. Which is the one I

> have been focusing on (and clearly messed up the other case, sorry

> about that).


Well, so why don't we start over?

This time with a clear problem description, please?

> Please have a look at my last reply to Marek. Actually, I think we can

> improve the situation compared how it were before, as we don't need to

> unconditionally runtime resume the supplier, but rather rely on the

> DL_FLAG_RPM_ACTIVE flag.


I have read it.

Thanks,
Rafael
Ulf Hansson June 12, 2018, 8:37 a.m. | #8
On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 10:31 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> On 12 June 2018 at 10:22, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>> On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski

>>> <m.szyprowski@samsung.com> wrote:

>>>> Hi Ulf,

>>>>

>>>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>>>> probe, the supplier device is unconditionally runtime activated by

>>>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>>> link suppliers at probe") changed probe() to rely on the generic

>>>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>>>> to proper value.

>>>>>>

>>>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>> ---

>>>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>>>> state since the mentioned commit.

>>>>>> ---

>>>>>>   drivers/base/core.c | 4 +++-

>>>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>>>

>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>>>> --- a/drivers/base/core.c

>>>>>> +++ b/drivers/base/core.c

>>>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>>>                                   * runtime PM usage counter after consumer probe

>>>>>>                                   * in driver_probe_device().

>>>>>>                                   */

>>>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>>>

>>>>>>                                          pm_runtime_get_sync(supplier);

>>>>>> +                                       link->rpm_active = true;

>>>>>> +                               }

>>>>>>

>>>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>>>                                  break;

>>>>>> --

>>>>>> 2.17.1

>>>>>>

>>>>> This seems to correct to me! Thanks for the fix!

>>>>>

>>>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>>>

>>>> Now I've investigated this issue further and there is still a regression

>>>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>> link suppliers at probe") commit in the following case:

>>>>

>>>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>>>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>>>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>>>> and does nothing more, because it only registers the device in the system

>>>> and doesn't touch any hardware at all. IOMMU gets resumed during

>>>> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

>>>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>>>> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

>>>> use of jpeg-codec (so the first transition of jpeg-codec device from

>>>> runtime active to runtime suspended state).

>>>>

>>>> Do you have any suggestion how this regression can be solved? I'm really

>>>> curious if there was any case which got fixed the the mentioned commit?

>>>

>>> Having had a deeper look into this code I'm now thinking that I should

>>> revert the Ulf's commit, because the case mentioned in its changelog

>>> should be covered exactly by the condition modified by your patch.

>>

>> That would break the case for DL_FLAG_STATELESS. Which is the one I

>> have been focusing on (and clearly messed up the other case, sorry

>> about that).

>

> Well, so why don't we start over?


I think it's easier with a new fix on top. Reverting it would break
DL_FLAG_STATELESS, so just moving things into another broken state.

But, it's up to you.

[...]

Kind regards
Uffe
Marek Szyprowski June 12, 2018, 8:40 a.m. | #9
Hi Ulf,

On 2018-06-12 10:26, Ulf Hansson wrote:
> + Jon, Todor

>

> On 12 June 2018 at 09:10, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>> probe, the supplier device is unconditionally runtime activated by

>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>> link suppliers at probe") changed probe() to rely on the generic

>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>> to proper value.

>>>>

>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>> ---

>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>> state since the mentioned commit.

>>>> ---

>>>>    drivers/base/core.c | 4 +++-

>>>>    1 file changed, 3 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>> --- a/drivers/base/core.c

>>>> +++ b/drivers/base/core.c

>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>                                    * runtime PM usage counter after consumer probe

>>>>                                    * in driver_probe_device().

>>>>                                    */

>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>

>>>>                                           pm_runtime_get_sync(supplier);

>>>> +                                       link->rpm_active = true;

>>>> +                               }

>>>>

>>>>                                   link->status = DL_STATE_CONSUMER_PROBE;

>>>>                                   break;

>>>> --

>>>> 2.17.1

>>>>

>>> This seems to correct to me! Thanks for the fix!

>>>

>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>> Now I've investigated this issue further and there is still a regression

>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>> link suppliers at probe") commit in the following case:

>>

>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>> and does nothing more, because it only registers the device in the system

>> and doesn't touch any hardware at all. IOMMU gets resumed during

>> jpeg-codec probe().

> Right, so the IOMMU is runtime resumed because of the call to

> pm_runtime_get_sync(supplier) above. This happened already before.

>

> According to your information, it seems like this is actually

> unnecessary and it would be nice if it could be avoided.


Not, it is necessary because the device driver should be able to do anything
in its probe() and until it calls pm_runtime_enable() on itself, the proper
resuming of parent(s) and supplier(s) are handled by PM/device core. That's
my understanding of how this works.

>> After the mentioned patch (and my fix), IOMMU is kept

>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>> call __rpm_callback()/rpm_put_suppliers().

> Yes, this is because the runtime PM usage count is now correctly

> balanced. In other words, the earlier call to

> pm_runtime_get_sync(supplier), keeps the IOMMU active.

>

>> This is valid until the first

>> use of jpeg-codec (so the first transition of jpeg-codec device from

>> runtime active to runtime suspended state).

> Yep, I get it. Thanks for the detailed description!

>

>> Do you have any suggestion how this regression can be solved? I'm really

>> curious if there was any case which got fixed by the mentioned commit?

> So after a second thought, it seems like we should just drop the call

> to pm_runtime_get_sync(supplier) above altogether. It's already been

> done in case the the DL_FLAG_PM_RUNTIME and DL_FLAG_RPM_ACTIVE flags

> are set.

>

> I assume you don't have DL_FLAG_RPM_ACTIVE?


Right, exynos-iommu driver don't set DL_FLAG_RPM_ACTIVE, because all it 
wants
is iommu's device runtime pm state to follow the supplier device runtime pm
state.

> It may be that the commit introduces regressions, and thanks for

> reporting and helping out.

>

> In my tests (just using a test driver), I noticed the problems. And if

> I recall correctly, so did Jon and Todor.

>

> However, yes, it's need to be able to correctly manages created device

> links during ->probe().


Please CC: me if you post a patch related to device links and runtime pm
handling.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Rafael J. Wysocki June 12, 2018, 8:42 a.m. | #10
On Tue, Jun 12, 2018 at 10:37 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:

>> On Tue, Jun 12, 2018 at 10:31 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>> On 12 June 2018 at 10:22, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>>> On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski

>>>> <m.szyprowski@samsung.com> wrote:

>>>>> Hi Ulf,

>>>>>

>>>>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>>>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>>>>> probe, the supplier device is unconditionally runtime activated by

>>>>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>>>> link suppliers at probe") changed probe() to rely on the generic

>>>>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>>>>> to proper value.

>>>>>>>

>>>>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>>> ---

>>>>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>>>>> state since the mentioned commit.

>>>>>>> ---

>>>>>>>   drivers/base/core.c | 4 +++-

>>>>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>>>>

>>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>>>>> --- a/drivers/base/core.c

>>>>>>> +++ b/drivers/base/core.c

>>>>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>>>>                                   * runtime PM usage counter after consumer probe

>>>>>>>                                   * in driver_probe_device().

>>>>>>>                                   */

>>>>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>>>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>>>>

>>>>>>>                                          pm_runtime_get_sync(supplier);

>>>>>>> +                                       link->rpm_active = true;

>>>>>>> +                               }

>>>>>>>

>>>>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>>>>                                  break;

>>>>>>> --

>>>>>>> 2.17.1

>>>>>>>

>>>>>> This seems to correct to me! Thanks for the fix!

>>>>>>

>>>>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>>>>

>>>>> Now I've investigated this issue further and there is still a regression

>>>>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>> link suppliers at probe") commit in the following case:

>>>>>

>>>>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>>>>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>>>>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>>>>> and does nothing more, because it only registers the device in the system

>>>>> and doesn't touch any hardware at all. IOMMU gets resumed during

>>>>> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

>>>>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>>>>> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

>>>>> use of jpeg-codec (so the first transition of jpeg-codec device from

>>>>> runtime active to runtime suspended state).

>>>>>

>>>>> Do you have any suggestion how this regression can be solved? I'm really

>>>>> curious if there was any case which got fixed the the mentioned commit?

>>>>

>>>> Having had a deeper look into this code I'm now thinking that I should

>>>> revert the Ulf's commit, because the case mentioned in its changelog

>>>> should be covered exactly by the condition modified by your patch.

>>>

>>> That would break the case for DL_FLAG_STATELESS. Which is the one I

>>> have been focusing on (and clearly messed up the other case, sorry

>>> about that).

>>

>> Well, so why don't we start over?

>

> I think it's easier with a new fix on top. Reverting it would break

> DL_FLAG_STATELESS, so just moving things into another broken state.


Yes, the DL_FLAG_STATELESS is broken and it needs to be fixed, but I'm
not really sure if the approach in your commit is the best one.

Even if it is, the right way to do it is to revert things to a
previous state and do a proper fix then IMO.

> But, it's up to you.


So as I said before, I'll revert it and let's fix the problem properly.

Thanks,
Rafael
Ulf Hansson June 12, 2018, 8:46 a.m. | #11
[...]

>>> Now I've investigated this issue further and there is still a regression

>>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>> link suppliers at probe") commit in the following case:

>>>

>>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>>> and does nothing more, because it only registers the device in the system

>>> and doesn't touch any hardware at all. IOMMU gets resumed during

>>> jpeg-codec probe().

>> Right, so the IOMMU is runtime resumed because of the call to

>> pm_runtime_get_sync(supplier) above. This happened already before.

>>

>> According to your information, it seems like this is actually

>> unnecessary and it would be nice if it could be avoided.

>

> Not, it is necessary because the device driver should be able to do anything

> in its probe() and until it calls pm_runtime_enable() on itself, the proper

> resuming of parent(s) and supplier(s) are handled by PM/device core. That's

> my understanding of how this works.


I see. So clearly, you would like the driver core to consider
DL_FLAG_RPM_ACTIVE, before it runtime resumed the supplier.

>

>>> After the mentioned patch (and my fix), IOMMU is kept

>>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>>> call __rpm_callback()/rpm_put_suppliers().

>> Yes, this is because the runtime PM usage count is now correctly

>> balanced. In other words, the earlier call to

>> pm_runtime_get_sync(supplier), keeps the IOMMU active.

>>

>>> This is valid until the first

>>> use of jpeg-codec (so the first transition of jpeg-codec device from

>>> runtime active to runtime suspended state).

>> Yep, I get it. Thanks for the detailed description!

>>

>>> Do you have any suggestion how this regression can be solved? I'm really

>>> curious if there was any case which got fixed by the mentioned commit?

>> So after a second thought, it seems like we should just drop the call

>> to pm_runtime_get_sync(supplier) above altogether. It's already been

>> done in case the the DL_FLAG_PM_RUNTIME and DL_FLAG_RPM_ACTIVE flags

>> are set.

>>

>> I assume you don't have DL_FLAG_RPM_ACTIVE?

>

> Right, exynos-iommu driver don't set DL_FLAG_RPM_ACTIVE, because all it

> wants

> is iommu's device runtime pm state to follow the supplier device runtime pm

> state.


Got it, thanks!

>

>> It may be that the commit introduces regressions, and thanks for

>> reporting and helping out.

>>

>> In my tests (just using a test driver), I noticed the problems. And if

>> I recall correctly, so did Jon and Todor.

>>

>> However, yes, it's need to be able to correctly manages created device

>> links during ->probe().

>

> Please CC: me if you post a patch related to device links and runtime pm

> handling.


Yes, count on that!

Kind regards
Uffe
Ulf Hansson June 12, 2018, 8:48 a.m. | #12
On 12 June 2018 at 10:42, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jun 12, 2018 at 10:37 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>> On Tue, Jun 12, 2018 at 10:31 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>>> On 12 June 2018 at 10:22, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>>>> On Tue, Jun 12, 2018 at 9:10 AM, Marek Szyprowski

>>>>> <m.szyprowski@samsung.com> wrote:

>>>>>> Hi Ulf,

>>>>>>

>>>>>> On 2018-06-08 11:29, Ulf Hansson wrote:

>>>>>>> On 8 June 2018 at 10:41, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

>>>>>>>> When device links with DL_FLAG_PM_RUNTIME are created during consumer

>>>>>>>> probe, the supplier device is unconditionally runtime activated by

>>>>>>>> pm_runtime_get_sync(). However this get() is not noticed by

>>>>>>>> rpm_put_suppliers() called at the end of probe(), because the newly

>>>>>>>> created link lacks a rpm_active state bit. This worked when probe() called

>>>>>>>> pm_runtime_(resume,suspend)_suppliers() unconditionally, but patch

>>>>>>>> 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>>>>> link suppliers at probe") changed probe() to rely on the generic

>>>>>>>> rpm_put_suppliers() code. This patch updates the newly created link state

>>>>>>>> to proper value.

>>>>>>>>

>>>>>>>> Fixes: 1e8378619841 ("PM / runtime: Fixup reference counting of device link suppliers at probe")

>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>>>>>>> ---

>>>>>>>> This fixes Exynos IOMMU driver being unconditionally set to runtime active

>>>>>>>> state since the mentioned commit.

>>>>>>>> ---

>>>>>>>>   drivers/base/core.c | 4 +++-

>>>>>>>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>>>>>>>

>>>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c

>>>>>>>> index ad7b50897bcc..8ffd353f40ce 100644

>>>>>>>> --- a/drivers/base/core.c

>>>>>>>> +++ b/drivers/base/core.c

>>>>>>>> @@ -259,8 +259,10 @@ struct device_link *device_link_add(struct device *consumer,

>>>>>>>>                                   * runtime PM usage counter after consumer probe

>>>>>>>>                                   * in driver_probe_device().

>>>>>>>>                                   */

>>>>>>>> -                               if (flags & DL_FLAG_PM_RUNTIME)

>>>>>>>> +                               if (flags & DL_FLAG_PM_RUNTIME) {

>>>>>>> A thought: Why isn't the DL_FLAG_RPM_ACTIVE part of the condition above? Rafael?

>>>>>>>

>>>>>>>>                                          pm_runtime_get_sync(supplier);

>>>>>>>> +                                       link->rpm_active = true;

>>>>>>>> +                               }

>>>>>>>>

>>>>>>>>                                  link->status = DL_STATE_CONSUMER_PROBE;

>>>>>>>>                                  break;

>>>>>>>> --

>>>>>>>> 2.17.1

>>>>>>>>

>>>>>>> This seems to correct to me! Thanks for the fix!

>>>>>>>

>>>>>>> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

>>>>>>

>>>>>> Now I've investigated this issue further and there is still a regression

>>>>>> caused by 1e8378619841 ("PM / runtime: Fixup reference counting of device

>>>>>> link suppliers at probe") commit in the following case:

>>>>>>

>>>>>> There are 2 devices: jpeg-codec and iommu. IOMMU is linked as a consumer

>>>>>> (with DL_FLAG_PM_RUNTIME flag) to jpeg-codec during jpeg-codec probe()

>>>>>> procedure. Jpeg-codec calls pm_runtime_enalbe() in is probe() function

>>>>>> and does nothing more, because it only registers the device in the system

>>>>>> and doesn't touch any hardware at all. IOMMU gets resumed during

>>>>>> jpeg-codec probe(). After the mentioned patch (and my fix), IOMMU is kept

>>>>>> runtime active, because rpm_idle() returns early with -EAGAIN and doesn't

>>>>>> call __rpm_callback()/rpm_put_suppliers(). This is valid until the first

>>>>>> use of jpeg-codec (so the first transition of jpeg-codec device from

>>>>>> runtime active to runtime suspended state).

>>>>>>

>>>>>> Do you have any suggestion how this regression can be solved? I'm really

>>>>>> curious if there was any case which got fixed the the mentioned commit?

>>>>>

>>>>> Having had a deeper look into this code I'm now thinking that I should

>>>>> revert the Ulf's commit, because the case mentioned in its changelog

>>>>> should be covered exactly by the condition modified by your patch.

>>>>

>>>> That would break the case for DL_FLAG_STATELESS. Which is the one I

>>>> have been focusing on (and clearly messed up the other case, sorry

>>>> about that).

>>>

>>> Well, so why don't we start over?

>>

>> I think it's easier with a new fix on top. Reverting it would break

>> DL_FLAG_STATELESS, so just moving things into another broken state.

>

> Yes, the DL_FLAG_STATELESS is broken and it needs to be fixed, but I'm

> not really sure if the approach in your commit is the best one.

>

> Even if it is, the right way to do it is to revert things to a

> previous state and do a proper fix then IMO.

>

>> But, it's up to you.

>

> So as I said before, I'll revert it and let's fix the problem properly.


Got it. I am working on a new fix then, taking also the other cases
but DL_FLAG_STATELESS into account.

Kind regards
Uffe
Rafael J. Wysocki June 12, 2018, 9:10 a.m. | #13
On Tuesday, June 12, 2018 10:48:13 AM CEST Ulf Hansson wrote:
> On 12 June 2018 at 10:42, Rafael J. Wysocki <rafael@kernel.org> wrote:

> > On Tue, Jun 12, 2018 at 10:37 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

> >> On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:


[cut]

> >>>

> >>> Well, so why don't we start over?

> >>

> >> I think it's easier with a new fix on top. Reverting it would break

> >> DL_FLAG_STATELESS, so just moving things into another broken state.

> >

> > Yes, the DL_FLAG_STATELESS is broken and it needs to be fixed, but I'm

> > not really sure if the approach in your commit is the best one.

> >

> > Even if it is, the right way to do it is to revert things to a

> > previous state and do a proper fix then IMO.

> >

> >> But, it's up to you.

> >

> > So as I said before, I'll revert it and let's fix the problem properly.

> 

> Got it. I am working on a new fix then, taking also the other cases

> but DL_FLAG_STATELESS into account.


Actually, wouldn't something as simple as the patch below be sufficient
here?

---
 drivers/base/core.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -229,6 +229,15 @@ struct device_link *device_link_add(stru
 	/* Determine the initial link state. */
 	if (flags & DL_FLAG_STATELESS) {
 		link->status = DL_STATE_NONE;
+		/*
+		 * If the link is being added at probe time, it is necessary to
+		 * bump up the supplier usage counter here even if the link
+		 * itself is stateless to balance the usage counter properly.
+		 */
+		if (supplier->links.status == DL_DEV_DRIVER_BOUND &&
+		    consumer->links.status == DL_DEV_PROBING &&
+		    flags & DL_FLAG_PM_RUNTIME)
+			pm_runtime_get_noresume(supplier);
 	} else {
 		switch (supplier->links.status) {
 		case DL_DEV_DRIVER_BOUND:
Ulf Hansson June 12, 2018, 9:56 a.m. | #14
On 12 June 2018 at 11:10, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, June 12, 2018 10:48:13 AM CEST Ulf Hansson wrote:

>> On 12 June 2018 at 10:42, Rafael J. Wysocki <rafael@kernel.org> wrote:

>> > On Tue, Jun 12, 2018 at 10:37 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> >> On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:

>

> [cut]

>

>> >>>

>> >>> Well, so why don't we start over?

>> >>

>> >> I think it's easier with a new fix on top. Reverting it would break

>> >> DL_FLAG_STATELESS, so just moving things into another broken state.

>> >

>> > Yes, the DL_FLAG_STATELESS is broken and it needs to be fixed, but I'm

>> > not really sure if the approach in your commit is the best one.

>> >

>> > Even if it is, the right way to do it is to revert things to a

>> > previous state and do a proper fix then IMO.

>> >

>> >> But, it's up to you.

>> >

>> > So as I said before, I'll revert it and let's fix the problem properly.

>>

>> Got it. I am working on a new fix then, taking also the other cases

>> but DL_FLAG_STATELESS into account.

>

> Actually, wouldn't something as simple as the patch below be sufficient

> here?

>

> ---

>  drivers/base/core.c |    9 +++++++++

>  1 file changed, 9 insertions(+)

>

> Index: linux-pm/drivers/base/core.c

> ===================================================================

> --- linux-pm.orig/drivers/base/core.c

> +++ linux-pm/drivers/base/core.c

> @@ -229,6 +229,15 @@ struct device_link *device_link_add(stru

>         /* Determine the initial link state. */

>         if (flags & DL_FLAG_STATELESS) {

>                 link->status = DL_STATE_NONE;

> +               /*

> +                * If the link is being added at probe time, it is necessary to

> +                * bump up the supplier usage counter here even if the link

> +                * itself is stateless to balance the usage counter properly.

> +                */

> +               if (supplier->links.status == DL_DEV_DRIVER_BOUND &&

> +                   consumer->links.status == DL_DEV_PROBING &&

> +                   flags & DL_FLAG_PM_RUNTIME)


I don't think checking the supplier->links.status is needed. For
example, does this work if the supplier don't have a driver bound?

Otherwise, yes, this should work. Moreover, it's good that we can keep
calling pm_runtime_put() for all suppliers after probe is done. This
prevents them from being unnecessary runtime resumed, which is the
thing I didn't really like in my approach.

Are going to send a patch, or you want me to pick it up?


> +                       pm_runtime_get_noresume(supplier);

>         } else {

>                 switch (supplier->links.status) {

>                 case DL_DEV_DRIVER_BOUND:

>


Below this else, I think we should also convert the
pm_runtime_get_sync(supplier), to a pm_runtime_get_noresume(supplier).

Callers of device_link_add() should know that they need to explicitly
use DL_FLAG_RPM_ACTIVE, if they want the supplier to be resumed.

Of course, I think that is better done in a separate patch, if revert
is needed. :-)

Kind regards
Uffe
Rafael J. Wysocki June 12, 2018, 10:26 a.m. | #15
On Tue, Jun 12, 2018 at 11:56 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 12 June 2018 at 11:10, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

>> On Tuesday, June 12, 2018 10:48:13 AM CEST Ulf Hansson wrote:

>>> On 12 June 2018 at 10:42, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>> > On Tue, Jun 12, 2018 at 10:37 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>> >> On 12 June 2018 at 10:33, Rafael J. Wysocki <rafael@kernel.org> wrote:

>>

>> [cut]

>>

>>> >>>

>>> >>> Well, so why don't we start over?

>>> >>

>>> >> I think it's easier with a new fix on top. Reverting it would break

>>> >> DL_FLAG_STATELESS, so just moving things into another broken state.

>>> >

>>> > Yes, the DL_FLAG_STATELESS is broken and it needs to be fixed, but I'm

>>> > not really sure if the approach in your commit is the best one.

>>> >

>>> > Even if it is, the right way to do it is to revert things to a

>>> > previous state and do a proper fix then IMO.

>>> >

>>> >> But, it's up to you.

>>> >

>>> > So as I said before, I'll revert it and let's fix the problem properly.

>>>

>>> Got it. I am working on a new fix then, taking also the other cases

>>> but DL_FLAG_STATELESS into account.

>>

>> Actually, wouldn't something as simple as the patch below be sufficient

>> here?

>>

>> ---

>>  drivers/base/core.c |    9 +++++++++

>>  1 file changed, 9 insertions(+)

>>

>> Index: linux-pm/drivers/base/core.c

>> ===================================================================

>> --- linux-pm.orig/drivers/base/core.c

>> +++ linux-pm/drivers/base/core.c

>> @@ -229,6 +229,15 @@ struct device_link *device_link_add(stru

>>         /* Determine the initial link state. */

>>         if (flags & DL_FLAG_STATELESS) {

>>                 link->status = DL_STATE_NONE;

>> +               /*

>> +                * If the link is being added at probe time, it is necessary to

>> +                * bump up the supplier usage counter here even if the link

>> +                * itself is stateless to balance the usage counter properly.

>> +                */

>> +               if (supplier->links.status == DL_DEV_DRIVER_BOUND &&

>> +                   consumer->links.status == DL_DEV_PROBING &&

>> +                   flags & DL_FLAG_PM_RUNTIME)

>

> I don't think checking the supplier->links.status is needed. For

> example, does this work if the supplier don't have a driver bound?


I wanted to follow the stateful branch which does this check, but
since _put_suppliers() doesn't look at the status, you are right that
this should be done in any case, so the stateful branch is missing it
too in some cases.

> Otherwise, yes, this should work. Moreover, it's good that we can keep

> calling pm_runtime_put() for all suppliers after probe is done. This

> prevents them from being unnecessary runtime resumed, which is the

> thing I didn't really like in my approach.

>

> Are going to send a patch, or you want me to pick it up?


I'll send it.

>

>> +                       pm_runtime_get_noresume(supplier);

>>         } else {

>>                 switch (supplier->links.status) {

>>                 case DL_DEV_DRIVER_BOUND:

>>

>

> Below this else, I think we should also convert the

> pm_runtime_get_sync(supplier), to a pm_runtime_get_noresume(supplier).

>

> Callers of device_link_add() should know that they need to explicitly

> use DL_FLAG_RPM_ACTIVE, if they want the supplier to be resumed.

>

> Of course, I think that is better done in a separate patch, if revert

> is needed. :-)


Right, but since the supplied usage counter bumping up is also missing
in some cases in the stateful branch, I think it needs to be done
outside of that big "if ()" and so I will use noresume in it.  And if
someone really wants suppliers to resume, they should use
DL_FLAG_RPM_ACTIVE as documented.
Ulf Hansson June 12, 2018, 10:40 a.m. | #16
[...]

>>>         } else {

>>>                 switch (supplier->links.status) {

>>>                 case DL_DEV_DRIVER_BOUND:

>>>

>>

>> Below this else, I think we should also convert the

>> pm_runtime_get_sync(supplier), to a pm_runtime_get_noresume(supplier).

>>

>> Callers of device_link_add() should know that they need to explicitly

>> use DL_FLAG_RPM_ACTIVE, if they want the supplier to be resumed.

>>

>> Of course, I think that is better done in a separate patch, if revert

>> is needed. :-)

>

> Right, but since the supplied usage counter bumping up is also missing

> in some cases in the stateful branch, I think it needs to be done

> outside of that big "if ()" and so I will use noresume in it.  And if

> someone really wants suppliers to resume, they should use

> DL_FLAG_RPM_ACTIVE as documented.


That makes perfect sense to me as well!

Thanks for helping out and kind regards
Uffe

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index ad7b50897bcc..8ffd353f40ce 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -259,8 +259,10 @@  struct device_link *device_link_add(struct device *consumer,
 				 * runtime PM usage counter after consumer probe
 				 * in driver_probe_device().
 				 */
-				if (flags & DL_FLAG_PM_RUNTIME)
+				if (flags & DL_FLAG_PM_RUNTIME) {
 					pm_runtime_get_sync(supplier);
+					link->rpm_active = true;
+				}
 
 				link->status = DL_STATE_CONSUMER_PROBE;
 				break;