diff mbox series

[RFC,v2,1/2] PM: domains: Skip disabling unused domains if provider has sync_state

Message ID 20230127104054.895129-1-abel.vesa@linaro.org
State New
Headers show
Series [RFC,v2,1/2] PM: domains: Skip disabling unused domains if provider has sync_state | expand

Commit Message

Abel Vesa Jan. 27, 2023, 10:40 a.m. UTC
Currently, there are cases when a domain needs to remain enabled until
the consumer driver probes. Sometimes such consumer drivers may be built
as modules. Since the genpd_power_off_unused is called too early for
such consumer driver modules to get a chance to probe, the domain, since
it is unused, will get disabled. On the other hand, the best time for
an unused domain to be disabled is on the provider's sync_state
callback. So, if the provider has registered a sync_state callback,
assume the unused domains for that provider will be disabled on its
sync_state callback. Also provide a generic sync_state callback which
disables all the domains unused for the provider that registers it.

Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
---

This approach has been applied for unused clocks as well.
With this patch merged in, all the providers that have sync_state
callback registered will leave the domains enabled unless the provider's
sync_state callback explicitly disables them. So those providers will
need to add the disabling part to their sync_state callback. On the
other hand, the platforms that have cases where domains need to remain
enabled (even if unused) until the consumer driver probes, will be able,
with this patch in, to run without the pd_ignore_unused kernel argument,
which seems to be the case for most Qualcomm platforms, at this moment.

The v1 is here:
https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/

Changes since v1:
 * added a generic sync state callback to be registered by providers in
   order to disable the unused domains on their sync state. Also
   mentioned this in the commit message.

 drivers/base/power/domain.c | 17 ++++++++++++++++-
 include/linux/pm_domain.h   |  3 +++
 2 files changed, 19 insertions(+), 1 deletion(-)

Comments

Matthias Kaehlcke Feb. 2, 2023, 6:24 p.m. UTC | #1
Hi Abel,

On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> Currently, there are cases when a domain needs to remain enabled until
> the consumer driver probes. Sometimes such consumer drivers may be built
> as modules. Since the genpd_power_off_unused is called too early for
> such consumer driver modules to get a chance to probe, the domain, since
> it is unused, will get disabled. On the other hand, the best time for
> an unused domain to be disabled is on the provider's sync_state
> callback. So, if the provider has registered a sync_state callback,
> assume the unused domains for that provider will be disabled on its
> sync_state callback. Also provide a generic sync_state callback which
> disables all the domains unused for the provider that registers it.
> 
> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> ---
> 
> This approach has been applied for unused clocks as well.
> With this patch merged in, all the providers that have sync_state
> callback registered will leave the domains enabled unless the provider's
> sync_state callback explicitly disables them. So those providers will
> need to add the disabling part to their sync_state callback. On the
> other hand, the platforms that have cases where domains need to remain
> enabled (even if unused) until the consumer driver probes, will be able,
> with this patch in, to run without the pd_ignore_unused kernel argument,
> which seems to be the case for most Qualcomm platforms, at this moment.

I recently encountered a related issue on a Qualcomm platform with a
v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
highest corner until sync_state"). The issue involves a DT node with a
rpmhpd, the DT node is enabled, however the corresponding device driver
is not enabled in the kernel. In such a scenario the sync_state callback
is never called, because the genpd consumer never probes. As a result
the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
system suspend, which results in a substantially higher power consumption
in S3.

I wonder if genpd (and some other frameworks) needs something like
regulator_init_complete(), which turns off unused regulators 30s after
system boot. That's conceptually similar to the current
genpd_power_off_unused(), but would provide time for modules being loaded.

> The v1 is here:
> https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
> 
> Changes since v1:
>  * added a generic sync state callback to be registered by providers in
>    order to disable the unused domains on their sync state. Also
>    mentioned this in the commit message.
> 
>  drivers/base/power/domain.c | 17 ++++++++++++++++-
>  include/linux/pm_domain.h   |  3 +++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 84662d338188..c2a5f77c01f3 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
>  	mutex_lock(&gpd_list_lock);
>  
>  	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> -		genpd_queue_power_off_work(genpd);
> +		if (!dev_has_sync_state(genpd->provider->dev))
> +			genpd_queue_power_off_work(genpd);
>  
>  	mutex_unlock(&gpd_list_lock);
>  
> @@ -1107,6 +1108,20 @@ static int __init genpd_power_off_unused(void)
>  }
>  late_initcall(genpd_power_off_unused);
>  
> +void genpd_power_off_unused_sync_state(struct device *dev)
> +{
> +	struct generic_pm_domain *genpd;
> +
> +	mutex_lock(&gpd_list_lock);
> +
> +	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> +		if (genpd->provider->dev == dev)
> +			genpd_queue_power_off_work(genpd);
> +
> +	mutex_unlock(&gpd_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);
> +
>  #ifdef CONFIG_PM_SLEEP
>  
>  /**
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index f776fb93eaa0..1fd5aa500c81 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -351,6 +351,7 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
>  					 unsigned int index);
>  struct device *genpd_dev_pm_attach_by_name(struct device *dev,
>  					   const char *name);
> +void genpd_power_off_unused_sync_state(struct device *dev);
>  #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
>  static inline int of_genpd_add_provider_simple(struct device_node *np,
>  					struct generic_pm_domain *genpd)
> @@ -419,6 +420,8 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
>  {
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
> +
> +static inline genpd_power_off_unused_sync_state(struct device *dev) {}
>  #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
>  
>  #ifdef CONFIG_PM
> -- 
> 2.34.1
>
Doug Anderson Feb. 2, 2023, 7:20 p.m. UTC | #2
Hi,

On Thu, Feb 2, 2023 at 10:24 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> Hi Abel,
>
> On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > Currently, there are cases when a domain needs to remain enabled until
> > the consumer driver probes. Sometimes such consumer drivers may be built
> > as modules. Since the genpd_power_off_unused is called too early for
> > such consumer driver modules to get a chance to probe, the domain, since
> > it is unused, will get disabled. On the other hand, the best time for
> > an unused domain to be disabled is on the provider's sync_state
> > callback. So, if the provider has registered a sync_state callback,
> > assume the unused domains for that provider will be disabled on its
> > sync_state callback. Also provide a generic sync_state callback which
> > disables all the domains unused for the provider that registers it.
> >
> > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > ---
> >
> > This approach has been applied for unused clocks as well.
> > With this patch merged in, all the providers that have sync_state
> > callback registered will leave the domains enabled unless the provider's
> > sync_state callback explicitly disables them. So those providers will
> > need to add the disabling part to their sync_state callback. On the
> > other hand, the platforms that have cases where domains need to remain
> > enabled (even if unused) until the consumer driver probes, will be able,
> > with this patch in, to run without the pd_ignore_unused kernel argument,
> > which seems to be the case for most Qualcomm platforms, at this moment.
>
> I recently encountered a related issue on a Qualcomm platform with a
> v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> highest corner until sync_state"). The issue involves a DT node with a
> rpmhpd, the DT node is enabled, however the corresponding device driver
> is not enabled in the kernel. In such a scenario the sync_state callback
> is never called, because the genpd consumer never probes. As a result
> the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> system suspend, which results in a substantially higher power consumption
> in S3.
>
> I wonder if genpd (and some other frameworks) needs something like
> regulator_init_complete(), which turns off unused regulators 30s after
> system boot. That's conceptually similar to the current
> genpd_power_off_unused(), but would provide time for modules being loaded.

Just for completeness, there are at least a few other similar concepts
in the kernel where the kernel needs to decide that it's going to stop
waiting for modules to show up and it just shuts off anything that's
unused. The other one that jumps to the top of my head is related to
"driver_deferred_probe_timeout". There we give 10 seconds (by default)
for userspace to load modules. After that point in time we start
returning errors instead of waiting longer. You can even see that the
default depends on whether "CONFIG_MODULES" is set.

-Doug
Dmitry Baryshkov Feb. 2, 2023, 7:53 p.m. UTC | #3
On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> Hi Abel,
> 
> On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
>> Currently, there are cases when a domain needs to remain enabled until
>> the consumer driver probes. Sometimes such consumer drivers may be built
>> as modules. Since the genpd_power_off_unused is called too early for
>> such consumer driver modules to get a chance to probe, the domain, since
>> it is unused, will get disabled. On the other hand, the best time for
>> an unused domain to be disabled is on the provider's sync_state
>> callback. So, if the provider has registered a sync_state callback,
>> assume the unused domains for that provider will be disabled on its
>> sync_state callback. Also provide a generic sync_state callback which
>> disables all the domains unused for the provider that registers it.
>>
>> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
>> ---
>>
>> This approach has been applied for unused clocks as well.
>> With this patch merged in, all the providers that have sync_state
>> callback registered will leave the domains enabled unless the provider's
>> sync_state callback explicitly disables them. So those providers will
>> need to add the disabling part to their sync_state callback. On the
>> other hand, the platforms that have cases where domains need to remain
>> enabled (even if unused) until the consumer driver probes, will be able,
>> with this patch in, to run without the pd_ignore_unused kernel argument,
>> which seems to be the case for most Qualcomm platforms, at this moment.
> 
> I recently encountered a related issue on a Qualcomm platform with a
> v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> highest corner until sync_state"). The issue involves a DT node with a
> rpmhpd, the DT node is enabled, however the corresponding device driver
> is not enabled in the kernel. In such a scenario the sync_state callback
> is never called, because the genpd consumer never probes. As a result
> the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> system suspend, which results in a substantially higher power consumption
> in S3.
> 
> I wonder if genpd (and some other frameworks) needs something like
> regulator_init_complete(), which turns off unused regulators 30s after
> system boot. That's conceptually similar to the current
> genpd_power_off_unused(), but would provide time for modules being loaded.

I think the overall goal is to move away from ad-hoc implementations 
like clk_disable_unused/genpd_power_off_unused/regulator_init_complete 
towards the sync_state.

So inherently one either has to provide drivers for all devices in 
question or disable unused devices in DT.

> 
>> The v1 is here:
>> https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
>>
>> Changes since v1:
>>   * added a generic sync state callback to be registered by providers in
>>     order to disable the unused domains on their sync state. Also
>>     mentioned this in the commit message.
>>
>>   drivers/base/power/domain.c | 17 ++++++++++++++++-
>>   include/linux/pm_domain.h   |  3 +++
>>   2 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
>> index 84662d338188..c2a5f77c01f3 100644
>> --- a/drivers/base/power/domain.c
>> +++ b/drivers/base/power/domain.c
>> @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
>>   	mutex_lock(&gpd_list_lock);
>>   
>>   	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
>> -		genpd_queue_power_off_work(genpd);
>> +		if (!dev_has_sync_state(genpd->provider->dev))
>> +			genpd_queue_power_off_work(genpd);
>>   
>>   	mutex_unlock(&gpd_list_lock);
>>   
>> @@ -1107,6 +1108,20 @@ static int __init genpd_power_off_unused(void)
>>   }
>>   late_initcall(genpd_power_off_unused);
>>   
>> +void genpd_power_off_unused_sync_state(struct device *dev)
>> +{
>> +	struct generic_pm_domain *genpd;
>> +
>> +	mutex_lock(&gpd_list_lock);
>> +
>> +	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
>> +		if (genpd->provider->dev == dev)
>> +			genpd_queue_power_off_work(genpd);
>> +
>> +	mutex_unlock(&gpd_list_lock);
>> +}
>> +EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);
>> +
>>   #ifdef CONFIG_PM_SLEEP
>>   
>>   /**
>> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
>> index f776fb93eaa0..1fd5aa500c81 100644
>> --- a/include/linux/pm_domain.h
>> +++ b/include/linux/pm_domain.h
>> @@ -351,6 +351,7 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
>>   					 unsigned int index);
>>   struct device *genpd_dev_pm_attach_by_name(struct device *dev,
>>   					   const char *name);
>> +void genpd_power_off_unused_sync_state(struct device *dev);
>>   #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
>>   static inline int of_genpd_add_provider_simple(struct device_node *np,
>>   					struct generic_pm_domain *genpd)
>> @@ -419,6 +420,8 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
>>   {
>>   	return ERR_PTR(-EOPNOTSUPP);
>>   }
>> +
>> +static inline genpd_power_off_unused_sync_state(struct device *dev) {}
>>   #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
>>   
>>   #ifdef CONFIG_PM
>> -- 
>> 2.34.1
>>
Doug Anderson Feb. 2, 2023, 10:20 p.m. UTC | #4
Hi,

On Thu, Feb 2, 2023 at 11:53 AM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > Hi Abel,
> >
> > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> >> Currently, there are cases when a domain needs to remain enabled until
> >> the consumer driver probes. Sometimes such consumer drivers may be built
> >> as modules. Since the genpd_power_off_unused is called too early for
> >> such consumer driver modules to get a chance to probe, the domain, since
> >> it is unused, will get disabled. On the other hand, the best time for
> >> an unused domain to be disabled is on the provider's sync_state
> >> callback. So, if the provider has registered a sync_state callback,
> >> assume the unused domains for that provider will be disabled on its
> >> sync_state callback. Also provide a generic sync_state callback which
> >> disables all the domains unused for the provider that registers it.
> >>
> >> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> >> ---
> >>
> >> This approach has been applied for unused clocks as well.
> >> With this patch merged in, all the providers that have sync_state
> >> callback registered will leave the domains enabled unless the provider's
> >> sync_state callback explicitly disables them. So those providers will
> >> need to add the disabling part to their sync_state callback. On the
> >> other hand, the platforms that have cases where domains need to remain
> >> enabled (even if unused) until the consumer driver probes, will be able,
> >> with this patch in, to run without the pd_ignore_unused kernel argument,
> >> which seems to be the case for most Qualcomm platforms, at this moment.
> >
> > I recently encountered a related issue on a Qualcomm platform with a
> > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > highest corner until sync_state"). The issue involves a DT node with a
> > rpmhpd, the DT node is enabled, however the corresponding device driver
> > is not enabled in the kernel. In such a scenario the sync_state callback
> > is never called, because the genpd consumer never probes. As a result
> > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > system suspend, which results in a substantially higher power consumption
> > in S3.
> >
> > I wonder if genpd (and some other frameworks) needs something like
> > regulator_init_complete(), which turns off unused regulators 30s after
> > system boot. That's conceptually similar to the current
> > genpd_power_off_unused(), but would provide time for modules being loaded.
>
> I think the overall goal is to move away from ad-hoc implementations
> like clk_disable_unused/genpd_power_off_unused/regulator_init_complete
> towards the sync_state.
>
> So inherently one either has to provide drivers for all devices in
> question or disable unused devices in DT.

Hmm. I guess I haven't been involved too much in those discussions,
but overall I thought:

1. The device tree should ideally be describing the hardware. Thus if
the hardware is there / available to use on a given board then the
device should be marked enabled.

2. Users are not actually required to enable drivers for all hardware
on their board. Things should still function OK even if a driver is
disabled. For instance, if the SoC had a crypto accelerator you'd
describe it in the device tree but it would be OK for someone to build
a kernel that didn't enable the crypto accelerator driver.

Am I mistaken? Which point did I get wrong, #1, or #2?

-Doug
Matthias Kaehlcke Feb. 3, 2023, 1:20 a.m. UTC | #5
Hi Dmitry,

On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
> On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > Hi Abel,
> > 
> > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > Currently, there are cases when a domain needs to remain enabled until
> > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > as modules. Since the genpd_power_off_unused is called too early for
> > > such consumer driver modules to get a chance to probe, the domain, since
> > > it is unused, will get disabled. On the other hand, the best time for
> > > an unused domain to be disabled is on the provider's sync_state
> > > callback. So, if the provider has registered a sync_state callback,
> > > assume the unused domains for that provider will be disabled on its
> > > sync_state callback. Also provide a generic sync_state callback which
> > > disables all the domains unused for the provider that registers it.
> > > 
> > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > ---
> > > 
> > > This approach has been applied for unused clocks as well.
> > > With this patch merged in, all the providers that have sync_state
> > > callback registered will leave the domains enabled unless the provider's
> > > sync_state callback explicitly disables them. So those providers will
> > > need to add the disabling part to their sync_state callback. On the
> > > other hand, the platforms that have cases where domains need to remain
> > > enabled (even if unused) until the consumer driver probes, will be able,
> > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > which seems to be the case for most Qualcomm platforms, at this moment.
> > 
> > I recently encountered a related issue on a Qualcomm platform with a
> > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > highest corner until sync_state"). The issue involves a DT node with a
> > rpmhpd, the DT node is enabled, however the corresponding device driver
> > is not enabled in the kernel. In such a scenario the sync_state callback
> > is never called, because the genpd consumer never probes. As a result
> > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > system suspend, which results in a substantially higher power consumption
> > in S3.
> > 
> > I wonder if genpd (and some other frameworks) needs something like
> > regulator_init_complete(), which turns off unused regulators 30s after
> > system boot. That's conceptually similar to the current
> > genpd_power_off_unused(), but would provide time for modules being loaded.
> 
> I think the overall goal is to move away from ad-hoc implementations like
> clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
> the sync_state.

I generally agree with the goal of using common mechanisms whenever possible.

> So inherently one either has to provide drivers for all devices in question
> or disable unused devices in DT.

I don't think that's a great solution, it essentially hands the issue down to
the users or downstream maintainers of the kernel, who might not be aware that
there is an issue, nor know about the specifics of genpd (or interconnects and
clocks which have similar problems).

In general symptoms are probably subtle, like a (potentially substantially)
increased power consumption during system suspend. The issue might have been
introduced by an update to a newer kernel, which now includes a DT node for a
new SoC feature which wasn't supported by the 'old' kernel. It's common
practice to use the 'old' .config, at least as a starting point, which
obviously doesn't enable the new driver. That happend to me with [1] when
testing v6.1. It took me quite some time to track the 'culprit' commit down
and then some debugging to understand what's going on. Shortly after that I
ran into a related issue involving genpds when testing v6.2-rc, which again
took a non-trivial amount of time to track down (and I'm familiar with the SoC
platform and the general nature of the issue). I don't think it's reasonable
to expect every user/downstream maintainer of an impacted system to go through
this, one person at a time.

Maybe there could be a generic solution for drivers with a 'sync_state'
callback, e.g. a the driver (or framework) could have a 'sync_state_timeout'
callback (or similar), which is called by the driver framework if 'sync_state'
wasn't called (for example) 30s after the device was probed. Then the provider
can power off or throttle unclaimed resources.

m.

[1] https://lore.kernel.org/lkml/20220902043511.17130-5-quic_rjendra@quicinc.com/


> > > The v1 is here:
> > > https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
> > > 
> > > Changes since v1:
> > >   * added a generic sync state callback to be registered by providers in
> > >     order to disable the unused domains on their sync state. Also
> > >     mentioned this in the commit message.
> > > 
> > >   drivers/base/power/domain.c | 17 ++++++++++++++++-
> > >   include/linux/pm_domain.h   |  3 +++
> > >   2 files changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > index 84662d338188..c2a5f77c01f3 100644
> > > --- a/drivers/base/power/domain.c
> > > +++ b/drivers/base/power/domain.c
> > > @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
> > >   	mutex_lock(&gpd_list_lock);
> > >   	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > > -		genpd_queue_power_off_work(genpd);
> > > +		if (!dev_has_sync_state(genpd->provider->dev))
> > > +			genpd_queue_power_off_work(genpd);
> > >   	mutex_unlock(&gpd_list_lock);
> > > @@ -1107,6 +1108,20 @@ static int __init genpd_power_off_unused(void)
> > >   }
> > >   late_initcall(genpd_power_off_unused);
> > > +void genpd_power_off_unused_sync_state(struct device *dev)
> > > +{
> > > +	struct generic_pm_domain *genpd;
> > > +
> > > +	mutex_lock(&gpd_list_lock);
> > > +
> > > +	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > > +		if (genpd->provider->dev == dev)
> > > +			genpd_queue_power_off_work(genpd);
> > > +
> > > +	mutex_unlock(&gpd_list_lock);
> > > +}
> > > +EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);
> > > +
> > >   #ifdef CONFIG_PM_SLEEP
> > >   /**
> > > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> > > index f776fb93eaa0..1fd5aa500c81 100644
> > > --- a/include/linux/pm_domain.h
> > > +++ b/include/linux/pm_domain.h
> > > @@ -351,6 +351,7 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
> > >   					 unsigned int index);
> > >   struct device *genpd_dev_pm_attach_by_name(struct device *dev,
> > >   					   const char *name);
> > > +void genpd_power_off_unused_sync_state(struct device *dev);
> > >   #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
> > >   static inline int of_genpd_add_provider_simple(struct device_node *np,
> > >   					struct generic_pm_domain *genpd)
> > > @@ -419,6 +420,8 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
> > >   {
> > >   	return ERR_PTR(-EOPNOTSUPP);
> > >   }
> > > +
> > > +static inline genpd_power_off_unused_sync_state(struct device *dev) {}
> > >   #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
> > >   #ifdef CONFIG_PM
> > > -- 
> > > 2.34.1
> > > 
> 
> -- 
> With best wishes
> Dmitry
>
Dmitry Baryshkov Feb. 3, 2023, 8 p.m. UTC | #6
On 03/02/2023 03:20, Matthias Kaehlcke wrote:
> Hi Dmitry,
> 
> On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
>> On 02/02/2023 20:24, Matthias Kaehlcke wrote:
>>> Hi Abel,
>>>
>>> On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
>>>> Currently, there are cases when a domain needs to remain enabled until
>>>> the consumer driver probes. Sometimes such consumer drivers may be built
>>>> as modules. Since the genpd_power_off_unused is called too early for
>>>> such consumer driver modules to get a chance to probe, the domain, since
>>>> it is unused, will get disabled. On the other hand, the best time for
>>>> an unused domain to be disabled is on the provider's sync_state
>>>> callback. So, if the provider has registered a sync_state callback,
>>>> assume the unused domains for that provider will be disabled on its
>>>> sync_state callback. Also provide a generic sync_state callback which
>>>> disables all the domains unused for the provider that registers it.
>>>>
>>>> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
>>>> ---
>>>>
>>>> This approach has been applied for unused clocks as well.
>>>> With this patch merged in, all the providers that have sync_state
>>>> callback registered will leave the domains enabled unless the provider's
>>>> sync_state callback explicitly disables them. So those providers will
>>>> need to add the disabling part to their sync_state callback. On the
>>>> other hand, the platforms that have cases where domains need to remain
>>>> enabled (even if unused) until the consumer driver probes, will be able,
>>>> with this patch in, to run without the pd_ignore_unused kernel argument,
>>>> which seems to be the case for most Qualcomm platforms, at this moment.
>>>
>>> I recently encountered a related issue on a Qualcomm platform with a
>>> v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
>>> highest corner until sync_state"). The issue involves a DT node with a
>>> rpmhpd, the DT node is enabled, however the corresponding device driver
>>> is not enabled in the kernel. In such a scenario the sync_state callback
>>> is never called, because the genpd consumer never probes. As a result
>>> the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
>>> system suspend, which results in a substantially higher power consumption
>>> in S3.
>>>
>>> I wonder if genpd (and some other frameworks) needs something like
>>> regulator_init_complete(), which turns off unused regulators 30s after
>>> system boot. That's conceptually similar to the current
>>> genpd_power_off_unused(), but would provide time for modules being loaded.
>>
>> I think the overall goal is to move away from ad-hoc implementations like
>> clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
>> the sync_state.
> 
> I generally agree with the goal of using common mechanisms whenever possible.
> 
>> So inherently one either has to provide drivers for all devices in question
>> or disable unused devices in DT.
> 
> I don't think that's a great solution, it essentially hands the issue down to
> the users or downstream maintainers of the kernel, who might not be aware that
> there is an issue, nor know about the specifics of genpd (or interconnects and
> clocks which have similar problems).

The goal is to move the control down to individual drivers. Previously 
we had issues with clk_disable_unused() disabling mdss/mdp clocks 
incorrectly, which frequently led to broken display output. Other 
clock/genpd/regulator drivers might have other internal dependencies. 
Thus it is not really possible to handle resource shutdown in the common 
  (framework) code.

> 
> In general symptoms are probably subtle, like a (potentially substantially)
> increased power consumption during system suspend. The issue might have been
> introduced by an update to a newer kernel, which now includes a DT node for a
> new SoC feature which wasn't supported by the 'old' kernel. It's common
> practice to use the 'old' .config, at least as a starting point, which
> obviously doesn't enable the new driver. That happend to me with [1] when
> testing v6.1. It took me quite some time to track the 'culprit' commit down
> and then some debugging to understand what's going on. Shortly after that I
> ran into a related issue involving genpds when testing v6.2-rc, which again
> took a non-trivial amount of time to track down (and I'm familiar with the SoC
> platform and the general nature of the issue). I don't think it's reasonable
> to expect every user/downstream maintainer of an impacted system to go through
> this, one person at a time.

I think it would be nice to have some way of 'sync_pending' debug 
available (compare this to debugfs/devices_deferred).

Note, we are trying to make sure that all supported drivers are enabled 
at least as modules (if possible). If we fail, please send a patch 
fixing the defconfig.

> Maybe there could be a generic solution for drivers with a 'sync_state'
> callback, e.g. a the driver (or framework) could have a 'sync_state_timeout'
> callback (or similar), which is called by the driver framework if 'sync_state'
> wasn't called (for example) 30s after the device was probed. Then the provider
> can power off or throttle unclaimed resources.

I might be missing a point somewhere, but for me it looks like a logical 
solution. Please send a proposal.
Matthias Kaehlcke Feb. 3, 2023, 9:18 p.m. UTC | #7
On Fri, Feb 03, 2023 at 10:00:27PM +0200, Dmitry Baryshkov wrote:
> On 03/02/2023 03:20, Matthias Kaehlcke wrote:
> > Hi Dmitry,
> > 
> > On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
> > > On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > > 
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > assume the unused domains for that provider will be disabled on its
> > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > disables all the domains unused for the provider that registers it.
> > > > > 
> > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > ---
> > > > > 
> > > > > This approach has been applied for unused clocks as well.
> > > > > With this patch merged in, all the providers that have sync_state
> > > > > callback registered will leave the domains enabled unless the provider's
> > > > > sync_state callback explicitly disables them. So those providers will
> > > > > need to add the disabling part to their sync_state callback. On the
> > > > > other hand, the platforms that have cases where domains need to remain
> > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > > 
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > > > 
> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > > 
> > > I think the overall goal is to move away from ad-hoc implementations like
> > > clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
> > > the sync_state.
> > 
> > I generally agree with the goal of using common mechanisms whenever possible.
> > 
> > > So inherently one either has to provide drivers for all devices in question
> > > or disable unused devices in DT.
> > 
> > I don't think that's a great solution, it essentially hands the issue down to
> > the users or downstream maintainers of the kernel, who might not be aware that
> > there is an issue, nor know about the specifics of genpd (or interconnects and
> > clocks which have similar problems).
> 
> The goal is to move the control down to individual drivers. Previously we
> had issues with clk_disable_unused() disabling mdss/mdp clocks incorrectly,
> which frequently led to broken display output. Other clock/genpd/regulator
> drivers might have other internal dependencies. Thus it is not really
> possible to handle resource shutdown in the common  (framework) code.
> 
> > 
> > In general symptoms are probably subtle, like a (potentially substantially)
> > increased power consumption during system suspend. The issue might have been
> > introduced by an update to a newer kernel, which now includes a DT node for a
> > new SoC feature which wasn't supported by the 'old' kernel. It's common
> > practice to use the 'old' .config, at least as a starting point, which
> > obviously doesn't enable the new driver. That happend to me with [1] when
> > testing v6.1. It took me quite some time to track the 'culprit' commit down
> > and then some debugging to understand what's going on. Shortly after that I
> > ran into a related issue involving genpds when testing v6.2-rc, which again
> > took a non-trivial amount of time to track down (and I'm familiar with the SoC
> > platform and the general nature of the issue). I don't think it's reasonable
> > to expect every user/downstream maintainer of an impacted system to go through
> > this, one person at a time.
> 
> I think it would be nice to have some way of 'sync_pending' debug available
> (compare this to debugfs/devices_deferred).

Most folks are probably not even aware that they have a 'sync_state' issue and
wouldn't look in debugfs, so I think this would have to be something proactive,
like a warning log that is enabled by default (possibly with the option to
disable it). Something in debugfs could be a nice complement.

> Note, we are trying to make sure that all supported drivers are enabled at
> least as modules (if possible). If we fail, please send a patch fixing the
> defconfig.

That's great, however not everybody uses the defconfig, it's just a default.

> > Maybe there could be a generic solution for drivers with a 'sync_state'
> > callback, e.g. a the driver (or framework) could have a 'sync_state_timeout'
> > callback (or similar), which is called by the driver framework if 'sync_state'
> > wasn't called (for example) 30s after the device was probed. Then the provider
> > can power off or throttle unclaimed resources.
> 
> I might be missing a point somewhere, but for me it looks like a logical
> solution. Please send a proposal.

I started working on a patch, I'll probably send it out next week if I don't
encounter any evident major issues.
Abel Vesa Feb. 6, 2023, 4:21 p.m. UTC | #8
On 23-02-03 22:00:27, Dmitry Baryshkov wrote:
> On 03/02/2023 03:20, Matthias Kaehlcke wrote:
> > Hi Dmitry,
> > 
> > On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
> > > On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > > 
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > assume the unused domains for that provider will be disabled on its
> > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > disables all the domains unused for the provider that registers it.
> > > > > 
> > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > ---
> > > > > 
> > > > > This approach has been applied for unused clocks as well.
> > > > > With this patch merged in, all the providers that have sync_state
> > > > > callback registered will leave the domains enabled unless the provider's
> > > > > sync_state callback explicitly disables them. So those providers will
> > > > > need to add the disabling part to their sync_state callback. On the
> > > > > other hand, the platforms that have cases where domains need to remain
> > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > > 
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > > > 
> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > > 
> > > I think the overall goal is to move away from ad-hoc implementations like
> > > clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
> > > the sync_state.
> > 
> > I generally agree with the goal of using common mechanisms whenever possible.
> > 
> > > So inherently one either has to provide drivers for all devices in question
> > > or disable unused devices in DT.
> > 
> > I don't think that's a great solution, it essentially hands the issue down to
> > the users or downstream maintainers of the kernel, who might not be aware that
> > there is an issue, nor know about the specifics of genpd (or interconnects and
> > clocks which have similar problems).
> 
> The goal is to move the control down to individual drivers. Previously we
> had issues with clk_disable_unused() disabling mdss/mdp clocks incorrectly,
> which frequently led to broken display output. Other clock/genpd/regulator
> drivers might have other internal dependencies. Thus it is not really
> possible to handle resource shutdown in the common  (framework) code.
> 
> > 
> > In general symptoms are probably subtle, like a (potentially substantially)
> > increased power consumption during system suspend. The issue might have been
> > introduced by an update to a newer kernel, which now includes a DT node for a
> > new SoC feature which wasn't supported by the 'old' kernel. It's common
> > practice to use the 'old' .config, at least as a starting point, which
> > obviously doesn't enable the new driver. That happend to me with [1] when
> > testing v6.1. It took me quite some time to track the 'culprit' commit down
> > and then some debugging to understand what's going on. Shortly after that I
> > ran into a related issue involving genpds when testing v6.2-rc, which again
> > took a non-trivial amount of time to track down (and I'm familiar with the SoC
> > platform and the general nature of the issue). I don't think it's reasonable
> > to expect every user/downstream maintainer of an impacted system to go through
> > this, one person at a time.
> 
> I think it would be nice to have some way of 'sync_pending' debug available
> (compare this to debugfs/devices_deferred).

There is actually a 'state_synced' sysfs interface (per device) that
either shows 0, meaning it hasn't reach sync_state yet, or the file is
not available at all, meaning it has reached sync_state.

> 
> Note, we are trying to make sure that all supported drivers are enabled at
> least as modules (if possible). If we fail, please send a patch fixing the
> defconfig.
> 
> > Maybe there could be a generic solution for drivers with a 'sync_state'
> > callback, e.g. a the driver (or framework) could have a 'sync_state_timeout'
> > callback (or similar), which is called by the driver framework if 'sync_state'
> > wasn't called (for example) 30s after the device was probed. Then the provider
> > can power off or throttle unclaimed resources.
> 
> I might be missing a point somewhere, but for me it looks like a logical
> solution. Please send a proposal.
> 
> -- 
> With best wishes
> Dmitry
>
Abel Vesa Feb. 6, 2023, 4:39 p.m. UTC | #9
On 23-02-06 18:21:20, Abel Vesa wrote:
> On 23-02-03 22:00:27, Dmitry Baryshkov wrote:
> > On 03/02/2023 03:20, Matthias Kaehlcke wrote:
> > > Hi Dmitry,
> > > 
> > > On Thu, Feb 02, 2023 at 09:53:41PM +0200, Dmitry Baryshkov wrote:
> > > > On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > > > > Hi Abel,
> > > > > 
> > > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > > assume the unused domains for that provider will be disabled on its
> > > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > > disables all the domains unused for the provider that registers it.
> > > > > > 
> > > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > > ---
> > > > > > 
> > > > > > This approach has been applied for unused clocks as well.
> > > > > > With this patch merged in, all the providers that have sync_state
> > > > > > callback registered will leave the domains enabled unless the provider's
> > > > > > sync_state callback explicitly disables them. So those providers will
> > > > > > need to add the disabling part to their sync_state callback. On the
> > > > > > other hand, the platforms that have cases where domains need to remain
> > > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > > > 
> > > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > > is never called, because the genpd consumer never probes. As a result
> > > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > > system suspend, which results in a substantially higher power consumption
> > > > > in S3.
> > > > > 
> > > > > I wonder if genpd (and some other frameworks) needs something like
> > > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > > system boot. That's conceptually similar to the current
> > > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > > > 
> > > > I think the overall goal is to move away from ad-hoc implementations like
> > > > clk_disable_unused/genpd_power_off_unused/regulator_init_complete towards
> > > > the sync_state.
> > > 
> > > I generally agree with the goal of using common mechanisms whenever possible.
> > > 
> > > > So inherently one either has to provide drivers for all devices in question
> > > > or disable unused devices in DT.
> > > 
> > > I don't think that's a great solution, it essentially hands the issue down to
> > > the users or downstream maintainers of the kernel, who might not be aware that
> > > there is an issue, nor know about the specifics of genpd (or interconnects and
> > > clocks which have similar problems).
> > 
> > The goal is to move the control down to individual drivers. Previously we
> > had issues with clk_disable_unused() disabling mdss/mdp clocks incorrectly,
> > which frequently led to broken display output. Other clock/genpd/regulator
> > drivers might have other internal dependencies. Thus it is not really
> > possible to handle resource shutdown in the common  (framework) code.
> > 
> > > 
> > > In general symptoms are probably subtle, like a (potentially substantially)
> > > increased power consumption during system suspend. The issue might have been
> > > introduced by an update to a newer kernel, which now includes a DT node for a
> > > new SoC feature which wasn't supported by the 'old' kernel. It's common
> > > practice to use the 'old' .config, at least as a starting point, which
> > > obviously doesn't enable the new driver. That happend to me with [1] when
> > > testing v6.1. It took me quite some time to track the 'culprit' commit down
> > > and then some debugging to understand what's going on. Shortly after that I
> > > ran into a related issue involving genpds when testing v6.2-rc, which again
> > > took a non-trivial amount of time to track down (and I'm familiar with the SoC
> > > platform and the general nature of the issue). I don't think it's reasonable
> > > to expect every user/downstream maintainer of an impacted system to go through
> > > this, one person at a time.
> > 
> > I think it would be nice to have some way of 'sync_pending' debug available
> > (compare this to debugfs/devices_deferred).
> 
> There is actually a 'state_synced' sysfs interface (per device) that
> either shows 0, meaning it hasn't reach sync_state yet, or the file is
> not available at all, meaning it has reached sync_state.

For the sake of correctness, drivers that have sync_state callback
registered have the state_synced attribute. 0 means not state synced
yet, 1 means already states synced.

According to:
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-state_synced

> 
> > 
> > Note, we are trying to make sure that all supported drivers are enabled at
> > least as modules (if possible). If we fail, please send a patch fixing the
> > defconfig.
> > 
> > > Maybe there could be a generic solution for drivers with a 'sync_state'
> > > callback, e.g. a the driver (or framework) could have a 'sync_state_timeout'
> > > callback (or similar), which is called by the driver framework if 'sync_state'
> > > wasn't called (for example) 30s after the device was probed. Then the provider
> > > can power off or throttle unclaimed resources.
> > 
> > I might be missing a point somewhere, but for me it looks like a logical
> > solution. Please send a proposal.
> > 
> > -- 
> > With best wishes
> > Dmitry
> >
Matthias Kaehlcke Feb. 6, 2023, 5:22 p.m. UTC | #10
On Mon, Feb 06, 2023 at 06:31:21PM +0200, Abel Vesa wrote:
> On 23-02-02 18:24:15, Matthias Kaehlcke wrote:
> > Hi Abel,
> > 
> > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > Currently, there are cases when a domain needs to remain enabled until
> > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > as modules. Since the genpd_power_off_unused is called too early for
> > > such consumer driver modules to get a chance to probe, the domain, since
> > > it is unused, will get disabled. On the other hand, the best time for
> > > an unused domain to be disabled is on the provider's sync_state
> > > callback. So, if the provider has registered a sync_state callback,
> > > assume the unused domains for that provider will be disabled on its
> > > sync_state callback. Also provide a generic sync_state callback which
> > > disables all the domains unused for the provider that registers it.
> > > 
> > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > ---
> > > 
> > > This approach has been applied for unused clocks as well.
> > > With this patch merged in, all the providers that have sync_state
> > > callback registered will leave the domains enabled unless the provider's
> > > sync_state callback explicitly disables them. So those providers will
> > > need to add the disabling part to their sync_state callback. On the
> > > other hand, the platforms that have cases where domains need to remain
> > > enabled (even if unused) until the consumer driver probes, will be able,
> > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > which seems to be the case for most Qualcomm platforms, at this moment.
> > 
> > I recently encountered a related issue on a Qualcomm platform with a
> > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > highest corner until sync_state"). The issue involves a DT node with a
> > rpmhpd, the DT node is enabled, however the corresponding device driver
> > is not enabled in the kernel. In such a scenario the sync_state callback
> > is never called, because the genpd consumer never probes. As a result
> > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > system suspend, which results in a substantially higher power consumption
> > in S3.
> 
> If I get this correctly, one of the providers is missing (doesn't matter
> the reason), in which case, your kernel needs that driver, period. There
> is no reason why you would expect the consumer to work without the
> provider. Or, you could just remove the property in the devicetree node,
> the property that makes the consumer wait for that provider. Anyway, you
> should never end up with a consumer provider relationship in devicetree
> without providing the provider driver.

I would agree if it was actually a provider that's missing, however it's a
'missing' consumer that prevents the sync_state() call.

> > I wonder if genpd (and some other frameworks) needs something like
> > regulator_init_complete(), which turns off unused regulators 30s after
> > system boot. That's conceptually similar to the current
> > genpd_power_off_unused(), but would provide time for modules being loaded.
> 
> NACK, timeouts are just another hack in this case, specially when we
> have a pretty reliable mechanism like sync_state.

It does not work properly unless all consumers are probed successfully. It
makes sense to wait some time for the consumers to probe, but not eternally,
it's perfectly valid that a driver for a (potential) consumer is not enabled.
Matthias Kaehlcke Feb. 6, 2023, 5:37 p.m. UTC | #11
On Mon, Feb 06, 2023 at 06:24:30PM +0200, Abel Vesa wrote:
> On 23-02-02 14:20:56, Doug Anderson wrote:
> > Hi,
> > 
> > On Thu, Feb 2, 2023 at 11:53 AM Dmitry Baryshkov
> > <dmitry.baryshkov@linaro.org> wrote:
> > >
> > > On 02/02/2023 20:24, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > >
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > >> Currently, there are cases when a domain needs to remain enabled until
> > > >> the consumer driver probes. Sometimes such consumer drivers may be built
> > > >> as modules. Since the genpd_power_off_unused is called too early for
> > > >> such consumer driver modules to get a chance to probe, the domain, since
> > > >> it is unused, will get disabled. On the other hand, the best time for
> > > >> an unused domain to be disabled is on the provider's sync_state
> > > >> callback. So, if the provider has registered a sync_state callback,
> > > >> assume the unused domains for that provider will be disabled on its
> > > >> sync_state callback. Also provide a generic sync_state callback which
> > > >> disables all the domains unused for the provider that registers it.
> > > >>
> > > >> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > >> ---
> > > >>
> > > >> This approach has been applied for unused clocks as well.
> > > >> With this patch merged in, all the providers that have sync_state
> > > >> callback registered will leave the domains enabled unless the provider's
> > > >> sync_state callback explicitly disables them. So those providers will
> > > >> need to add the disabling part to their sync_state callback. On the
> > > >> other hand, the platforms that have cases where domains need to remain
> > > >> enabled (even if unused) until the consumer driver probes, will be able,
> > > >> with this patch in, to run without the pd_ignore_unused kernel argument,
> > > >> which seems to be the case for most Qualcomm platforms, at this moment.
> > > >
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > > >
> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > >
> > > I think the overall goal is to move away from ad-hoc implementations
> > > like clk_disable_unused/genpd_power_off_unused/regulator_init_complete
> > > towards the sync_state.
> > >
> > > So inherently one either has to provide drivers for all devices in
> > > question or disable unused devices in DT.
> > 
> > Hmm. I guess I haven't been involved too much in those discussions,
> > but overall I thought:
> > 
> > 1. The device tree should ideally be describing the hardware. Thus if
> > the hardware is there / available to use on a given board then the
> > device should be marked enabled.
> 
> That is correct.
> 
> > 
> > 2. Users are not actually required to enable drivers for all hardware
> > on their board. Things should still function OK even if a driver is
> > disabled. For instance, if the SoC had a crypto accelerator you'd
> > describe it in the device tree but it would be OK for someone to build
> > a kernel that didn't enable the crypto accelerator driver.
> 
> Right, but sync state is relying on fw_devlinks to decide if there are
> any consumers left that still need to probe. So if one of the consumer
> devicetree nodes needs some provider, the consumer will simply not work
> without the provider. In theory at least.

It is correct that a device which actually depends on a provider won't
work without that provider, however that isn't the case here. In the
scenario above the consumer is the crypto accelerator. It doesn't probe
because it's driver is not enabled, not because it's waiting for a
provider (clk, interconnect, ...).
Matthias Kaehlcke Feb. 6, 2023, 6:36 p.m. UTC | #12
On Mon, Feb 06, 2023 at 07:48:23PM +0200, Abel Vesa wrote:
> 
> CC'ed Saravana
> 
> On 23-02-06 17:22:23, Matthias Kaehlcke wrote:
> > On Mon, Feb 06, 2023 at 06:31:21PM +0200, Abel Vesa wrote:
> > > On 23-02-02 18:24:15, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > > 
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > assume the unused domains for that provider will be disabled on its
> > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > disables all the domains unused for the provider that registers it.
> > > > > 
> > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > ---
> > > > > 
> > > > > This approach has been applied for unused clocks as well.
> > > > > With this patch merged in, all the providers that have sync_state
> > > > > callback registered will leave the domains enabled unless the provider's
> > > > > sync_state callback explicitly disables them. So those providers will
> > > > > need to add the disabling part to their sync_state callback. On the
> > > > > other hand, the platforms that have cases where domains need to remain
> > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > > 
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > > 
> > > If I get this correctly, one of the providers is missing (doesn't matter
> > > the reason), in which case, your kernel needs that driver, period. There
> > > is no reason why you would expect the consumer to work without the
> > > provider. Or, you could just remove the property in the devicetree node,
> > > the property that makes the consumer wait for that provider. Anyway, you
> > > should never end up with a consumer provider relationship in devicetree
> > > without providing the provider driver.
> > 
> > I would agree if it was actually a provider that's missing, however it's a
> > 'missing' consumer that prevents the sync_state() call.
> 
> Oh, my bad.
> 
> Still, why would you keep the consumer node enabled in devicetree if you don't
> intend to allow its driver to ever probe?

As Doug pointed out, the device tree is supposed to describe the hardware, but
that shouldn't impose a user/admin/downstream maintainer to use every single
existing piece of hardware with the potential power implications on battery
powererd devices.

I someone uses an off the shelf board like a Raspberry Pi for a project in
which they only use a subset of the functionality, they would be forced to
use a downstream device tree if they can't just disable the drivers they
are not interested in. Supposedly we want people/companies to use upstream
kernels as much as possible, however you suggest to adapt the device tree
in a way that does not describe the hardware, which effectively forces folks
to user downstream kernels (or at least device trees).

> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > > 
> > > NACK, timeouts are just another hack in this case, specially when we
> > > have a pretty reliable mechanism like sync_state.
> > 
> > It does not work properly unless all consumers are probed successfully. It
> > makes sense to wait some time for the consumers to probe, but not eternally,
> > it's perfectly valid that a driver for a (potential) consumer is not enabled.
> 
> Usually, if you have a consumer devicetree node that you consider it
> should not probe, you should consider disabling that node in your board
> dts, specially if you don't intend to provide its driver.

Nope, the device tree is supposed to describe the hardware and the hardware
doesn't change because a particular use case doesn't require/want the use of
all existing parts.

> Again, timeouts are bad all-around. What happens if rootfs doesn't get
> mounted in time? Will 30 seconds be enough for every scenario? What
> happens if I want to load the driver (module) for a consumer a day after boot?

I am not sure if I have a complete/correct picture here. My understanding is
that sync_state is above all used for handing over critical hardware from the
bootloader to the kernel. For example the CPUs may require clocks and
interconnects to run at a certain speed during boot, before they are 'probed'
and can ask the provider to run the resource (at least) at a certain speed.

I would expect that drivers on the roots aren't critical for the system to
enter the rootfs, all these should be either built-in or as modules on an
initramfs. Let's say we have an audio driver that uses a clock (provider)
and this audio driver lives as a modules on the rootfs. For some reason our
rootfs takes a long time to mount and sync_state() of the clock provider is
called, due to the timeout we introduced. The provider determines that it
hasn't received any requests for the audio clock and disables it. Now
finally the rootfs is mounted and out audio module is loaded. The audio
device probes, gets the clock and asks the provider to run it at certain
speed. The clock provider puts the clock at the requested speed and audio
works as if the sync_state timeout never happened.

Am I missing/misunderstanding anything important?

> IMHO, I think even the regulator_init_complete should be switched to some sync
> state approach.

Maybe, with a timeout :)
Saravana Kannan Feb. 6, 2023, 7:32 p.m. UTC | #13
On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
>
>
> CC'ed Saravana

Thanks. Please do cc me for stuff like this from the start. I skimmed
the series and I think it's doing one of my TODO items. So, thanks for
the patch!

I'll take a closer look within a few days -- trying to get through
some existing fw_devlink stuff.

But long story short, it is the right thing to keep a supplier on
indefinitely if there's a consumer device (that's not disabled in DT)
that never gets probed. It's a pretty common scenario -- for example,
say a display backlight. The default case should be functional
correctness. And then we can add stuff that allows changing this
behavior with command line args or something else that can be done
from userspace.

+1 to what Doug said elsewhere in this thread too. I'm trying to
consolidate the "when do we give up" decision at the driver core level
independent of what framework is being used.

-Saravana

>
> On 23-02-06 17:22:23, Matthias Kaehlcke wrote:
> > On Mon, Feb 06, 2023 at 06:31:21PM +0200, Abel Vesa wrote:
> > > On 23-02-02 18:24:15, Matthias Kaehlcke wrote:
> > > > Hi Abel,
> > > >
> > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > assume the unused domains for that provider will be disabled on its
> > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > disables all the domains unused for the provider that registers it.
> > > > >
> > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > ---
> > > > >
> > > > > This approach has been applied for unused clocks as well.
> > > > > With this patch merged in, all the providers that have sync_state
> > > > > callback registered will leave the domains enabled unless the provider's
> > > > > sync_state callback explicitly disables them. So those providers will
> > > > > need to add the disabling part to their sync_state callback. On the
> > > > > other hand, the platforms that have cases where domains need to remain
> > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > >
> > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > is never called, because the genpd consumer never probes. As a result
> > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > system suspend, which results in a substantially higher power consumption
> > > > in S3.
> > >
> > > If I get this correctly, one of the providers is missing (doesn't matter
> > > the reason), in which case, your kernel needs that driver, period. There
> > > is no reason why you would expect the consumer to work without the
> > > provider. Or, you could just remove the property in the devicetree node,
> > > the property that makes the consumer wait for that provider. Anyway, you
> > > should never end up with a consumer provider relationship in devicetree
> > > without providing the provider driver.
> >
> > I would agree if it was actually a provider that's missing, however it's a
> > 'missing' consumer that prevents the sync_state() call.
>
> Oh, my bad.
>
> Still, why would you keep the consumer node enabled in devicetree if you don't
> intend to allow its driver to ever probe?
>
> >
> > > > I wonder if genpd (and some other frameworks) needs something like
> > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > system boot. That's conceptually similar to the current
> > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > >
> > > NACK, timeouts are just another hack in this case, specially when we
> > > have a pretty reliable mechanism like sync_state.
> >
> > It does not work properly unless all consumers are probed successfully. It
> > makes sense to wait some time for the consumers to probe, but not eternally,
> > it's perfectly valid that a driver for a (potential) consumer is not enabled.
>
> Usually, if you have a consumer devicetree node that you consider it
> should not probe, you should consider disabling that node in your board
> dts, specially if you don't intend to provide its driver.
>
> Again, timeouts are bad all-around. What happens if rootfs doesn't get
> mounted in time? Will 30 seconds be enough for every scenario? What
> happens if I want to load the driver (module) for a consumer a day after boot?
>
> IMHO, I think even the regulator_init_complete should be switched to some sync
> state approach.
Doug Anderson Feb. 6, 2023, 9:10 p.m. UTC | #14
Hi,

On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
> >
> >
> > CC'ed Saravana
>
> Thanks. Please do cc me for stuff like this from the start. I skimmed
> the series and I think it's doing one of my TODO items. So, thanks for
> the patch!
>
> I'll take a closer look within a few days -- trying to get through
> some existing fw_devlink stuff.
>
> But long story short, it is the right thing to keep a supplier on
> indefinitely if there's a consumer device (that's not disabled in DT)
> that never gets probed. It's a pretty common scenario -- for example,
> say a display backlight. The default case should be functional
> correctness. And then we can add stuff that allows changing this
> behavior with command line args or something else that can be done
> from userspace.
>
> +1 to what Doug said elsewhere in this thread too. I'm trying to
> consolidate the "when do we give up" decision at the driver core level
> independent of what framework is being used.

I'm not really sure I agree with the above, at least not without lots
of discussion in the community. It really goes against what the kernel
has been doing for years and years in the regulator and clock
frameworks. Those frameworks both eventually give up and power down
resources that no active drivers are using. Either changing the
regulator/clock frameworks or saying that other frameworks should work
in an opposite way seems like a recipe for confusion.

Now, certainly I won't say that the way that the regulator and clock
frameworks function is perfect nor will I say that they don't cause
any problems. However, going the opposite way where resources are kept
at full power indefinitely will _also_ cause problems.

Specifically, let's look at the case you mentioned of a display
backlight. I think you're saying that if there is no backlight driver
enabled in the kernel that you'd expect the backlight to just be on at
full brightness. Would you expect this even if the firmware didn't
leave the backlight on? In any case, why do you say it's more correct?
I suppose you'd say that the screen is at least usable like this.
...except that you've broken a different feature: suspend/resume.
Without being able to turn the backlight off at suspend time the
device would drain tons of power. It could also overheat when you
stuffed it in your backpack and damage the battery or start a fire.
Even if you argue that in the case of the display backlight you're
better off, what about a keyboard backlight? It's pretty easy to use a
laptop without the keyboard backlight and if you didn't have a driver
for it you'd be in better shape leaving it off instead of leaving it
on 100% of the time, even when the device is suspended.

Overall: if a kernel isn't configured for a given driver we shouldn't
be expecting the hardware controlled by that driver to work. The best
we can hope for is that it's at least in a low power state.

In general I think that having a well-defined way to know it's time to
give up and power off anything for which a driver didn't probe needs
to be an important part of any designs here.


> -Saravana
>
> >
> > On 23-02-06 17:22:23, Matthias Kaehlcke wrote:
> > > On Mon, Feb 06, 2023 at 06:31:21PM +0200, Abel Vesa wrote:
> > > > On 23-02-02 18:24:15, Matthias Kaehlcke wrote:
> > > > > Hi Abel,
> > > > >
> > > > > On Fri, Jan 27, 2023 at 12:40:53PM +0200, Abel Vesa wrote:
> > > > > > Currently, there are cases when a domain needs to remain enabled until
> > > > > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > > > > as modules. Since the genpd_power_off_unused is called too early for
> > > > > > such consumer driver modules to get a chance to probe, the domain, since
> > > > > > it is unused, will get disabled. On the other hand, the best time for
> > > > > > an unused domain to be disabled is on the provider's sync_state
> > > > > > callback. So, if the provider has registered a sync_state callback,
> > > > > > assume the unused domains for that provider will be disabled on its
> > > > > > sync_state callback. Also provide a generic sync_state callback which
> > > > > > disables all the domains unused for the provider that registers it.
> > > > > >
> > > > > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > > > > ---
> > > > > >
> > > > > > This approach has been applied for unused clocks as well.
> > > > > > With this patch merged in, all the providers that have sync_state
> > > > > > callback registered will leave the domains enabled unless the provider's
> > > > > > sync_state callback explicitly disables them. So those providers will
> > > > > > need to add the disabling part to their sync_state callback. On the
> > > > > > other hand, the platforms that have cases where domains need to remain
> > > > > > enabled (even if unused) until the consumer driver probes, will be able,
> > > > > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > > > > which seems to be the case for most Qualcomm platforms, at this moment.
> > > > >
> > > > > I recently encountered a related issue on a Qualcomm platform with a
> > > > > v6.2-rc kernel, which includes 3a39049f88e4 ("soc: qcom: rpmhpd: Use
> > > > > highest corner until sync_state"). The issue involves a DT node with a
> > > > > rpmhpd, the DT node is enabled, however the corresponding device driver
> > > > > is not enabled in the kernel. In such a scenario the sync_state callback
> > > > > is never called, because the genpd consumer never probes. As a result
> > > > > the Always-on subsystem (AOSS) of the SoC doesn't enter sleep mode during
> > > > > system suspend, which results in a substantially higher power consumption
> > > > > in S3.
> > > >
> > > > If I get this correctly, one of the providers is missing (doesn't matter
> > > > the reason), in which case, your kernel needs that driver, period. There
> > > > is no reason why you would expect the consumer to work without the
> > > > provider. Or, you could just remove the property in the devicetree node,
> > > > the property that makes the consumer wait for that provider. Anyway, you
> > > > should never end up with a consumer provider relationship in devicetree
> > > > without providing the provider driver.
> > >
> > > I would agree if it was actually a provider that's missing, however it's a
> > > 'missing' consumer that prevents the sync_state() call.
> >
> > Oh, my bad.
> >
> > Still, why would you keep the consumer node enabled in devicetree if you don't
> > intend to allow its driver to ever probe?
> >
> > >
> > > > > I wonder if genpd (and some other frameworks) needs something like
> > > > > regulator_init_complete(), which turns off unused regulators 30s after
> > > > > system boot. That's conceptually similar to the current
> > > > > genpd_power_off_unused(), but would provide time for modules being loaded.
> > > >
> > > > NACK, timeouts are just another hack in this case, specially when we
> > > > have a pretty reliable mechanism like sync_state.
> > >
> > > It does not work properly unless all consumers are probed successfully. It
> > > makes sense to wait some time for the consumers to probe, but not eternally,
> > > it's perfectly valid that a driver for a (potential) consumer is not enabled.
> >
> > Usually, if you have a consumer devicetree node that you consider it
> > should not probe, you should consider disabling that node in your board
> > dts, specially if you don't intend to provide its driver.
> >
> > Again, timeouts are bad all-around. What happens if rootfs doesn't get
> > mounted in time? Will 30 seconds be enough for every scenario? What
> > happens if I want to load the driver (module) for a consumer a day after boot?
> >
> > IMHO, I think even the regulator_init_complete should be switched to some sync
> > state approach.
Ulf Hansson Feb. 15, 2023, 11:57 a.m. UTC | #15
On Fri, 27 Jan 2023 at 11:40, Abel Vesa <abel.vesa@linaro.org> wrote:
>
> Currently, there are cases when a domain needs to remain enabled until
> the consumer driver probes. Sometimes such consumer drivers may be built
> as modules. Since the genpd_power_off_unused is called too early for
> such consumer driver modules to get a chance to probe, the domain, since
> it is unused, will get disabled. On the other hand, the best time for
> an unused domain to be disabled is on the provider's sync_state
> callback. So, if the provider has registered a sync_state callback,
> assume the unused domains for that provider will be disabled on its
> sync_state callback. Also provide a generic sync_state callback which
> disables all the domains unused for the provider that registers it.
>
> Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> ---
>
> This approach has been applied for unused clocks as well.
> With this patch merged in, all the providers that have sync_state
> callback registered will leave the domains enabled unless the provider's
> sync_state callback explicitly disables them. So those providers will
> need to add the disabling part to their sync_state callback. On the
> other hand, the platforms that have cases where domains need to remain
> enabled (even if unused) until the consumer driver probes, will be able,
> with this patch in, to run without the pd_ignore_unused kernel argument,
> which seems to be the case for most Qualcomm platforms, at this moment.

My apologies for the somewhat late reply. Please see my comments below.

>
> The v1 is here:
> https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
>
> Changes since v1:
>  * added a generic sync state callback to be registered by providers in
>    order to disable the unused domains on their sync state. Also
>    mentioned this in the commit message.
>
>  drivers/base/power/domain.c | 17 ++++++++++++++++-
>  include/linux/pm_domain.h   |  3 +++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 84662d338188..c2a5f77c01f3 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
>         mutex_lock(&gpd_list_lock);
>
>         list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> -               genpd_queue_power_off_work(genpd);
> +               if (!dev_has_sync_state(genpd->provider->dev))

Unfortunately, this doesn't really help, due to the fact that a
genpd's ->power_off() callback may get called anyway. At power off,
the genpd core only cares about those consumers that are currently
attached, not those that might get attached at some point later in
time.

In other words, it's the responsibility for each specific genpd
provider to cope with the condition that its ->sync_state() callback
may *not* have been called, while its ->power_off() callback is being
called.

In these cases, the genpd provider should probably make the
->power_off() callback to return -EBUSY. This is what we do in
psci_pd_power_off(), for example.

> +                       genpd_queue_power_off_work(genpd);
>
>         mutex_unlock(&gpd_list_lock);
>
> @@ -1107,6 +1108,20 @@ static int __init genpd_power_off_unused(void)
>  }
>  late_initcall(genpd_power_off_unused);
>
> +void genpd_power_off_unused_sync_state(struct device *dev)
> +{
> +       struct generic_pm_domain *genpd;
> +
> +       mutex_lock(&gpd_list_lock);
> +
> +       list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> +               if (genpd->provider->dev == dev)
> +                       genpd_queue_power_off_work(genpd);
> +
> +       mutex_unlock(&gpd_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);

I don't think this function is needed at all.

In fact, this part of the problem that you are trying to solve should
already be managed by the driver core, as it calls
dev->pm_domain->sync() (which is assigned to genpd_dev_pm_sync()) , in
really_probe(). Or isn't that taking care of the problem for you?

> +
>  #ifdef CONFIG_PM_SLEEP
>
>  /**
> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> index f776fb93eaa0..1fd5aa500c81 100644
> --- a/include/linux/pm_domain.h
> +++ b/include/linux/pm_domain.h
> @@ -351,6 +351,7 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
>                                          unsigned int index);
>  struct device *genpd_dev_pm_attach_by_name(struct device *dev,
>                                            const char *name);
> +void genpd_power_off_unused_sync_state(struct device *dev);
>  #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
>  static inline int of_genpd_add_provider_simple(struct device_node *np,
>                                         struct generic_pm_domain *genpd)
> @@ -419,6 +420,8 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
>  {
>         return ERR_PTR(-EOPNOTSUPP);
>  }
> +
> +static inline genpd_power_off_unused_sync_state(struct device *dev) {}
>  #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
>
>  #ifdef CONFIG_PM
> --
> 2.34.1
>

Kind regards
Uffe
Abel Vesa Feb. 15, 2023, 12:21 p.m. UTC | #16
On 23-02-15 12:57:54, Ulf Hansson wrote:
> On Fri, 27 Jan 2023 at 11:40, Abel Vesa <abel.vesa@linaro.org> wrote:
> >
> > Currently, there are cases when a domain needs to remain enabled until
> > the consumer driver probes. Sometimes such consumer drivers may be built
> > as modules. Since the genpd_power_off_unused is called too early for
> > such consumer driver modules to get a chance to probe, the domain, since
> > it is unused, will get disabled. On the other hand, the best time for
> > an unused domain to be disabled is on the provider's sync_state
> > callback. So, if the provider has registered a sync_state callback,
> > assume the unused domains for that provider will be disabled on its
> > sync_state callback. Also provide a generic sync_state callback which
> > disables all the domains unused for the provider that registers it.
> >
> > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > ---
> >
> > This approach has been applied for unused clocks as well.
> > With this patch merged in, all the providers that have sync_state
> > callback registered will leave the domains enabled unless the provider's
> > sync_state callback explicitly disables them. So those providers will
> > need to add the disabling part to their sync_state callback. On the
> > other hand, the platforms that have cases where domains need to remain
> > enabled (even if unused) until the consumer driver probes, will be able,
> > with this patch in, to run without the pd_ignore_unused kernel argument,
> > which seems to be the case for most Qualcomm platforms, at this moment.
> 
> My apologies for the somewhat late reply. Please see my comments below.
> 
> >
> > The v1 is here:
> > https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
> >
> > Changes since v1:
> >  * added a generic sync state callback to be registered by providers in
> >    order to disable the unused domains on their sync state. Also
> >    mentioned this in the commit message.
> >
> >  drivers/base/power/domain.c | 17 ++++++++++++++++-
> >  include/linux/pm_domain.h   |  3 +++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 84662d338188..c2a5f77c01f3 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
> >         mutex_lock(&gpd_list_lock);
> >
> >         list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > -               genpd_queue_power_off_work(genpd);
> > +               if (!dev_has_sync_state(genpd->provider->dev))
> 
> Unfortunately, this doesn't really help, due to the fact that a
> genpd's ->power_off() callback may get called anyway. At power off,
> the genpd core only cares about those consumers that are currently
> attached, not those that might get attached at some point later in
> time.
> 
> In other words, it's the responsibility for each specific genpd
> provider to cope with the condition that its ->sync_state() callback
> may *not* have been called, while its ->power_off() callback is being
> called.
> 
> In these cases, the genpd provider should probably make the
> ->power_off() callback to return -EBUSY. This is what we do in
> psci_pd_power_off(), for example.
> 

Hmm, this might actually be a better idea. Bjorn, do you agree?

> > +                       genpd_queue_power_off_work(genpd);
> >
> >         mutex_unlock(&gpd_list_lock);
> >
> > @@ -1107,6 +1108,20 @@ static int __init genpd_power_off_unused(void)
> >  }
> >  late_initcall(genpd_power_off_unused);
> >
> > +void genpd_power_off_unused_sync_state(struct device *dev)
> > +{
> > +       struct generic_pm_domain *genpd;
> > +
> > +       mutex_lock(&gpd_list_lock);
> > +
> > +       list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > +               if (genpd->provider->dev == dev)
> > +                       genpd_queue_power_off_work(genpd);
> > +
> > +       mutex_unlock(&gpd_list_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);
> 
> I don't think this function is needed at all.
> 
> In fact, this part of the problem that you are trying to solve should
> already be managed by the driver core, as it calls
> dev->pm_domain->sync() (which is assigned to genpd_dev_pm_sync()) , in
> really_probe(). Or isn't that taking care of the problem for you?

Hmm, I missed the genpd_dev_pm_sync scenario entirely. Yes, that is
actually what is needed, and yes, this function I added here is useless
in this case.

> 
> > +
> >  #ifdef CONFIG_PM_SLEEP
> >
> >  /**
> > diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
> > index f776fb93eaa0..1fd5aa500c81 100644
> > --- a/include/linux/pm_domain.h
> > +++ b/include/linux/pm_domain.h
> > @@ -351,6 +351,7 @@ struct device *genpd_dev_pm_attach_by_id(struct device *dev,
> >                                          unsigned int index);
> >  struct device *genpd_dev_pm_attach_by_name(struct device *dev,
> >                                            const char *name);
> > +void genpd_power_off_unused_sync_state(struct device *dev);
> >  #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
> >  static inline int of_genpd_add_provider_simple(struct device_node *np,
> >                                         struct generic_pm_domain *genpd)
> > @@ -419,6 +420,8 @@ struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
> >  {
> >         return ERR_PTR(-EOPNOTSUPP);
> >  }
> > +
> > +static inline genpd_power_off_unused_sync_state(struct device *dev) {}
> >  #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
> >
> >  #ifdef CONFIG_PM
> > --
> > 2.34.1
> >
> 
> Kind regards
> Uffe
Bjorn Andersson Feb. 20, 2023, 4:43 p.m. UTC | #17
On Wed, Feb 15, 2023 at 02:21:53PM +0200, Abel Vesa wrote:
> On 23-02-15 12:57:54, Ulf Hansson wrote:
> > On Fri, 27 Jan 2023 at 11:40, Abel Vesa <abel.vesa@linaro.org> wrote:
> > >
> > > Currently, there are cases when a domain needs to remain enabled until
> > > the consumer driver probes. Sometimes such consumer drivers may be built
> > > as modules. Since the genpd_power_off_unused is called too early for
> > > such consumer driver modules to get a chance to probe, the domain, since
> > > it is unused, will get disabled. On the other hand, the best time for
> > > an unused domain to be disabled is on the provider's sync_state
> > > callback. So, if the provider has registered a sync_state callback,
> > > assume the unused domains for that provider will be disabled on its
> > > sync_state callback. Also provide a generic sync_state callback which
> > > disables all the domains unused for the provider that registers it.
> > >
> > > Signed-off-by: Abel Vesa <abel.vesa@linaro.org>
> > > ---
> > >
> > > This approach has been applied for unused clocks as well.
> > > With this patch merged in, all the providers that have sync_state
> > > callback registered will leave the domains enabled unless the provider's
> > > sync_state callback explicitly disables them. So those providers will
> > > need to add the disabling part to their sync_state callback. On the
> > > other hand, the platforms that have cases where domains need to remain
> > > enabled (even if unused) until the consumer driver probes, will be able,
> > > with this patch in, to run without the pd_ignore_unused kernel argument,
> > > which seems to be the case for most Qualcomm platforms, at this moment.
> > 
> > My apologies for the somewhat late reply. Please see my comments below.
> > 
> > >
> > > The v1 is here:
> > > https://lore.kernel.org/all/20230126234013.3638425-1-abel.vesa@linaro.org/
> > >
> > > Changes since v1:
> > >  * added a generic sync state callback to be registered by providers in
> > >    order to disable the unused domains on their sync state. Also
> > >    mentioned this in the commit message.
> > >
> > >  drivers/base/power/domain.c | 17 ++++++++++++++++-
> > >  include/linux/pm_domain.h   |  3 +++
> > >  2 files changed, 19 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > > index 84662d338188..c2a5f77c01f3 100644
> > > --- a/drivers/base/power/domain.c
> > > +++ b/drivers/base/power/domain.c
> > > @@ -1099,7 +1099,8 @@ static int __init genpd_power_off_unused(void)
> > >         mutex_lock(&gpd_list_lock);
> > >
> > >         list_for_each_entry(genpd, &gpd_list, gpd_list_node)
> > > -               genpd_queue_power_off_work(genpd);
> > > +               if (!dev_has_sync_state(genpd->provider->dev))
> > 
> > Unfortunately, this doesn't really help, due to the fact that a
> > genpd's ->power_off() callback may get called anyway. At power off,
> > the genpd core only cares about those consumers that are currently
> > attached, not those that might get attached at some point later in
> > time.
> > 
> > In other words, it's the responsibility for each specific genpd
> > provider to cope with the condition that its ->sync_state() callback
> > may *not* have been called, while its ->power_off() callback is being
> > called.
> > 
> > In these cases, the genpd provider should probably make the
> > ->power_off() callback to return -EBUSY. This is what we do in
> > psci_pd_power_off(), for example.
> > 
> 
> Hmm, this might actually be a better idea. Bjorn, do you agree?
> 

Yes, I agree.

Regards,
Bjorn
Bjorn Andersson Feb. 20, 2023, 5:15 p.m. UTC | #18
On Tue, Feb 07, 2023 at 03:45:35PM -0800, Doug Anderson wrote:
> Hi,
> 
> On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
> > > > >
> > > > >
> > > > > CC'ed Saravana
> > > >
> > > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > > the series and I think it's doing one of my TODO items. So, thanks for
> > > > the patch!
> > > >
> > > > I'll take a closer look within a few days -- trying to get through
> > > > some existing fw_devlink stuff.
> > > >
> > > > But long story short, it is the right thing to keep a supplier on
> > > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > > that never gets probed. It's a pretty common scenario -- for example,
> > > > say a display backlight. The default case should be functional
> > > > correctness. And then we can add stuff that allows changing this
> > > > behavior with command line args or something else that can be done
> > > > from userspace.
> > > >
> > > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > > consolidate the "when do we give up" decision at the driver core level
> > > > independent of what framework is being used.
> > >
> > > I'm not really sure I agree with the above, at least not without lots
> > > of discussion in the community. It really goes against what the kernel
> > > has been doing for years and years in the regulator and clock
> > > frameworks. Those frameworks both eventually give up and power down
> > > resources that no active drivers are using. Either changing the
> > > regulator/clock frameworks or saying that other frameworks should work
> > > in an opposite way seems like a recipe for confusion.
> > >
> > > Now, certainly I won't say that the way that the regulator and clock
> > > frameworks function is perfect nor will I say that they don't cause
> > > any problems. However, going the opposite way where resources are kept
> > > at full power indefinitely will _also_ cause problems.
> > >
> > > Specifically, let's look at the case you mentioned of a display
> > > backlight. I think you're saying that if there is no backlight driver
> > > enabled in the kernel that you'd expect the backlight to just be on at
> > > full brightness.
> >
> > No, I'm not saying that.
> >
> > > Would you expect this even if the firmware didn't
> > > leave the backlight on?
> >
> > sync_state() never turns on anything that wasn't already on at boot.
> > So in your example, if the firmware didn't turn on the backlight, then
> > it'll remain off.
> 
> As per offline discussion, part of the problems are that today this
> _isn't_ true for a few Qualcomm things (like interconnect). The
> interconnect frameway specifically maxes things out for early boot.
> 

The problem being solved here is that the bootloader leaves some vote at
1GB/s, as needed by hardware related to driver B.

Driver A is loaded first and votes for 1kb/s; what should the kernel do
now, without knowledge of the needs from the hardware associated with B,
or the ability to read back the bootloader's votes.

This was the behavior of the initial implementation, and the practical
implications was seen as the UART would typically come along really
early, cast a low vote on the various buses and it would take forever to
get to the probing of the drivers that actually gave us reasonable
votes.


Also consider the case where driver A probes, votes for bandwidth, does
it's initialization and then votes for 0. Without making assumptions
about the needs of B (or a potential B even), we'd turn off critical
resources - possible preventing us from ever attempting to probe B.



As such, the only safe solution is to assume that there might be a later
loaded/probed client that has a large vote and preemptively vote for
some higher bandwidth until then.

> 
> > > In any case, why do you say it's more correct?
> >
> > Because if you turn off the display, the device is unusable. In other
> > circumstances, it can crash a device because the firmware powered it
> > on left it in a "good enough" state, but we'd go turn it off and crash
> > the system.
> >
> > > I suppose you'd say that the screen is at least usable like this.
> > > ...except that you've broken a different feature: suspend/resume.
> >
> > If the display is off and the laptop is unusable, then we have bigger
> > problems than suspend/resume?
> 
> I suspect that here we'll have to agree to disagree. IMO it's a
> non-goal to expect hardware to work for which there is no driver. So
> making the backlight work without a backlight driver isn't really
> something we should strive for.
> 

Without trying to make you agree ;)

How can you differentiate between "the driver wasn't built" and "the
driver isn't yet available"?

Consider the case where I boot my laptop, I have some set of builtin
drivers, some set of drivers in the ramdisk and some set of drivers in
the root filesystem.

In the event that something goes wrong mounting the rootfs, I will now
be in the ramdisk console. Given the current timer-based disabling of
regulators, I have ~25 seconds to solve my problem before the backlight
goes blank.


Obviously this isn't a typical scenario in a consumer device, but it
seems conceivable that your ramdisk would run fsck for some amount of
time before mounting the rootfs and picking up the last tier of drivers.

Regards,
Bjorn
Matthias Kaehlcke Feb. 21, 2023, 6:32 p.m. UTC | #19
On Mon, Feb 20, 2023 at 09:15:50AM -0800, Bjorn Andersson wrote:
> On Tue, Feb 07, 2023 at 03:45:35PM -0800, Doug Anderson wrote:
> > Hi,
> > 
> > On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@chromium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
> > > > > >
> > > > > >
> > > > > > CC'ed Saravana
> > > > >
> > > > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > > > the series and I think it's doing one of my TODO items. So, thanks for
> > > > > the patch!
> > > > >
> > > > > I'll take a closer look within a few days -- trying to get through
> > > > > some existing fw_devlink stuff.
> > > > >
> > > > > But long story short, it is the right thing to keep a supplier on
> > > > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > > > that never gets probed. It's a pretty common scenario -- for example,
> > > > > say a display backlight. The default case should be functional
> > > > > correctness. And then we can add stuff that allows changing this
> > > > > behavior with command line args or something else that can be done
> > > > > from userspace.
> > > > >
> > > > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > > > consolidate the "when do we give up" decision at the driver core level
> > > > > independent of what framework is being used.
> > > >
> > > > I'm not really sure I agree with the above, at least not without lots
> > > > of discussion in the community. It really goes against what the kernel
> > > > has been doing for years and years in the regulator and clock
> > > > frameworks. Those frameworks both eventually give up and power down
> > > > resources that no active drivers are using. Either changing the
> > > > regulator/clock frameworks or saying that other frameworks should work
> > > > in an opposite way seems like a recipe for confusion.
> > > >
> > > > Now, certainly I won't say that the way that the regulator and clock
> > > > frameworks function is perfect nor will I say that they don't cause
> > > > any problems. However, going the opposite way where resources are kept
> > > > at full power indefinitely will _also_ cause problems.
> > > >
> > > > Specifically, let's look at the case you mentioned of a display
> > > > backlight. I think you're saying that if there is no backlight driver
> > > > enabled in the kernel that you'd expect the backlight to just be on at
> > > > full brightness.
> > >
> > > No, I'm not saying that.
> > >
> > > > Would you expect this even if the firmware didn't
> > > > leave the backlight on?
> > >
> > > sync_state() never turns on anything that wasn't already on at boot.
> > > So in your example, if the firmware didn't turn on the backlight, then
> > > it'll remain off.
> > 
> > As per offline discussion, part of the problems are that today this
> > _isn't_ true for a few Qualcomm things (like interconnect). The
> > interconnect frameway specifically maxes things out for early boot.
> > 
> 
> The problem being solved here is that the bootloader leaves some vote at
> 1GB/s, as needed by hardware related to driver B.
> 
> Driver A is loaded first and votes for 1kb/s; what should the kernel do
> now, without knowledge of the needs from the hardware associated with B,
> or the ability to read back the bootloader's votes.
> 
> This was the behavior of the initial implementation, and the practical
> implications was seen as the UART would typically come along really
> early, cast a low vote on the various buses and it would take forever to
> get to the probing of the drivers that actually gave us reasonable
> votes.

I generally understand this problem and agree that it makes sense to bump
the resources *initially*. Doug and I primarily question the 'wait forever'
part of it.

> Also consider the case where driver A probes, votes for bandwidth, does
> it's initialization and then votes for 0. Without making assumptions
> about the needs of B (or a potential B even), we'd turn off critical
> resources - possible preventing us from ever attempting to probe B.

For the most critical devices that are probed during early boot this
would still work if the resources are initially bumped and then turned
off after some timeout.

Could you provide an example for some other type of device that is/would
be probed later? Except for auto-probing buses like USB or PCI the device
should probe regardless of the resources being enabled and then vote
during probe for the required bandwidth, voltage, etc., which should put
the resources into the required state. Am I missing something here?

> As such, the only safe solution is to assume that there might be a later
> loaded/probed client that has a large vote and preemptively vote for
> some higher bandwidth until then.

> > > > In any case, why do you say it's more correct?
> > >
> > > Because if you turn off the display, the device is unusable. In other
> > > circumstances, it can crash a device because the firmware powered it
> > > on left it in a "good enough" state, but we'd go turn it off and crash
> > > the system.
> > >
> > > > I suppose you'd say that the screen is at least usable like this.
> > > > ...except that you've broken a different feature: suspend/resume.
> > >
> > > If the display is off and the laptop is unusable, then we have bigger
> > > problems than suspend/resume?
> > 
> > I suspect that here we'll have to agree to disagree. IMO it's a
> > non-goal to expect hardware to work for which there is no driver. So
> > making the backlight work without a backlight driver isn't really
> > something we should strive for.
> > 
> 
> Without trying to make you agree ;)
> 
> How can you differentiate between "the driver wasn't built" and "the
> driver isn't yet available"?

Unfortunately you can't AFAIK.

> Consider the case where I boot my laptop, I have some set of builtin
> drivers, some set of drivers in the ramdisk and some set of drivers in
> the root filesystem.
> 
> In the event that something goes wrong mounting the rootfs, I will now
> be in the ramdisk console. Given the current timer-based disabling of
> regulators, I have ~25 seconds to solve my problem before the backlight
> goes blank.
> 
> 
> Obviously this isn't a typical scenario in a consumer device, but it
> seems conceivable that your ramdisk would run fsck for some amount of
> time before mounting the rootfs and picking up the last tier of drivers.

If the laptop is running a kernel that is tailored for this device I'd
say the most practial solution would be to either build the backlight
driver into the kernel or have it on the ramdisk as a module.

However the laptop might be running a distribution like Debian or Red Hat
with (I assume) a single kernel+ramdisk for all systems of a given
architecture (e.g. aarch64). In that case it might not be desirable to
have all possible backlight drivers in the kernel image or ramdisk. For
this kind of system 'wait forever' might be a suitable solution.

I have the impression we might want both options, a timeout after which
resources are turned off unless they have been voted for, and 'wait
forever', with a Kconfig option to select the desired behavior.

For most tailored systems the timout is probably a more suitable solution.
The maintainer of the kernel/system can decide to not enable certain
drivers because they aren't needed and include essential drivers into
the kernel image or ramdisk. The timeout ensures that the system doesn't
burn extra power for reasons that aren't evident for the maintainer (who
might not even be aware of the whole sync_state story).

On the other hand general purpose distributions might want to wait
forever, since they have to support a wide range of hardware and enable
most available drivers anyway.

If we end up with such an option I think the timeout should be the
default. Why? There are hundreds of maintainers of tailored kernels
who might run into hard to detect/debug power issues with 'wait
forever'. On the other hand there is a limited number of general purpose
distributions, with kernel teams that probably already know about
'sync_state'. They only have to enable 'wait forever' once and then
carry it forward to future versions.
Bjorn Andersson Feb. 22, 2023, 6:51 p.m. UTC | #20
On Tue, Feb 21, 2023 at 06:32:06PM +0000, Matthias Kaehlcke wrote:
> On Mon, Feb 20, 2023 at 09:15:50AM -0800, Bjorn Andersson wrote:
> > On Tue, Feb 07, 2023 at 03:45:35PM -0800, Doug Anderson wrote:
> > > Hi,
> > > 
> > > On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
> > > > > > >
> > > > > > >
> > > > > > > CC'ed Saravana
> > > > > >
> > > > > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > > > > the series and I think it's doing one of my TODO items. So, thanks for
> > > > > > the patch!
> > > > > >
> > > > > > I'll take a closer look within a few days -- trying to get through
> > > > > > some existing fw_devlink stuff.
> > > > > >
> > > > > > But long story short, it is the right thing to keep a supplier on
> > > > > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > > > > that never gets probed. It's a pretty common scenario -- for example,
> > > > > > say a display backlight. The default case should be functional
> > > > > > correctness. And then we can add stuff that allows changing this
> > > > > > behavior with command line args or something else that can be done
> > > > > > from userspace.
> > > > > >
> > > > > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > > > > consolidate the "when do we give up" decision at the driver core level
> > > > > > independent of what framework is being used.
> > > > >
> > > > > I'm not really sure I agree with the above, at least not without lots
> > > > > of discussion in the community. It really goes against what the kernel
> > > > > has been doing for years and years in the regulator and clock
> > > > > frameworks. Those frameworks both eventually give up and power down
> > > > > resources that no active drivers are using. Either changing the
> > > > > regulator/clock frameworks or saying that other frameworks should work
> > > > > in an opposite way seems like a recipe for confusion.
> > > > >
> > > > > Now, certainly I won't say that the way that the regulator and clock
> > > > > frameworks function is perfect nor will I say that they don't cause
> > > > > any problems. However, going the opposite way where resources are kept
> > > > > at full power indefinitely will _also_ cause problems.
> > > > >
> > > > > Specifically, let's look at the case you mentioned of a display
> > > > > backlight. I think you're saying that if there is no backlight driver
> > > > > enabled in the kernel that you'd expect the backlight to just be on at
> > > > > full brightness.
> > > >
> > > > No, I'm not saying that.
> > > >
> > > > > Would you expect this even if the firmware didn't
> > > > > leave the backlight on?
> > > >
> > > > sync_state() never turns on anything that wasn't already on at boot.
> > > > So in your example, if the firmware didn't turn on the backlight, then
> > > > it'll remain off.
> > > 
> > > As per offline discussion, part of the problems are that today this
> > > _isn't_ true for a few Qualcomm things (like interconnect). The
> > > interconnect frameway specifically maxes things out for early boot.
> > > 
> > 
> > The problem being solved here is that the bootloader leaves some vote at
> > 1GB/s, as needed by hardware related to driver B.
> > 
> > Driver A is loaded first and votes for 1kb/s; what should the kernel do
> > now, without knowledge of the needs from the hardware associated with B,
> > or the ability to read back the bootloader's votes.
> > 
> > This was the behavior of the initial implementation, and the practical
> > implications was seen as the UART would typically come along really
> > early, cast a low vote on the various buses and it would take forever to
> > get to the probing of the drivers that actually gave us reasonable
> > votes.
> 
> I generally understand this problem and agree that it makes sense to bump
> the resources *initially*. Doug and I primarily question the 'wait forever'
> part of it.
> 

The question is when "initially" ends.

> > Also consider the case where driver A probes, votes for bandwidth, does
> > it's initialization and then votes for 0. Without making assumptions
> > about the needs of B (or a potential B even), we'd turn off critical
> > resources - possible preventing us from ever attempting to probe B.
> 
> For the most critical devices that are probed during early boot this
> would still work if the resources are initially bumped and then turned
> off after some timeout.
> 

The resources that needs to be kept on are those which we rely on for
the system to reach said driver 'B'.

The obvious ones are those allowing us to execute code (e.g. some form
of DDR vote) and things such as earlycon - until the console driver is
probed properly and can cast a interconnect vote.

But the typical example here is display, which depending on system
configuration might rely on the hardware being able to fetch data from
DDR until all the relevant display driver components is starting to take
care of the voting.

> Could you provide an example for some other type of device that is/would
> be probed later? Except for auto-probing buses like USB or PCI the device
> should probe regardless of the resources being enabled and then vote
> during probe for the required bandwidth, voltage, etc., which should put
> the resources into the required state. Am I missing something here?
> 

What do you mean with "later"?

We have one consistent definition of "later", and that's
late_initcall(). By that definition it's pretty much everything beyond
gcc, interconnect and UART is or may probe "later" (because starting
userspace without a valid /dev/console is suboptimal)..


Looking at something specific, if you where to try to boot sc7280 using
the upstream defconfig I don't think you have anyone ensuring that the
CPU is allowed to reach DDR at this point.

> > As such, the only safe solution is to assume that there might be a later
> > loaded/probed client that has a large vote and preemptively vote for
> > some higher bandwidth until then.
> 
> > > > > In any case, why do you say it's more correct?
> > > >
> > > > Because if you turn off the display, the device is unusable. In other
> > > > circumstances, it can crash a device because the firmware powered it
> > > > on left it in a "good enough" state, but we'd go turn it off and crash
> > > > the system.
> > > >
> > > > > I suppose you'd say that the screen is at least usable like this.
> > > > > ...except that you've broken a different feature: suspend/resume.
> > > >
> > > > If the display is off and the laptop is unusable, then we have bigger
> > > > problems than suspend/resume?
> > > 
> > > I suspect that here we'll have to agree to disagree. IMO it's a
> > > non-goal to expect hardware to work for which there is no driver. So
> > > making the backlight work without a backlight driver isn't really
> > > something we should strive for.
> > > 
> > 
> > Without trying to make you agree ;)
> > 
> > How can you differentiate between "the driver wasn't built" and "the
> > driver isn't yet available"?
> 
> Unfortunately you can't AFAIK.
> 
> > Consider the case where I boot my laptop, I have some set of builtin
> > drivers, some set of drivers in the ramdisk and some set of drivers in
> > the root filesystem.
> > 
> > In the event that something goes wrong mounting the rootfs, I will now
> > be in the ramdisk console. Given the current timer-based disabling of
> > regulators, I have ~25 seconds to solve my problem before the backlight
> > goes blank.
> > 
> > 
> > Obviously this isn't a typical scenario in a consumer device, but it
> > seems conceivable that your ramdisk would run fsck for some amount of
> > time before mounting the rootfs and picking up the last tier of drivers.
> 
> If the laptop is running a kernel that is tailored for this device I'd
> say the most practial solution would be to either build the backlight
> driver into the kernel or have it on the ramdisk as a module.
> 
> However the laptop might be running a distribution like Debian or Red Hat
> with (I assume) a single kernel+ramdisk for all systems of a given
> architecture (e.g. aarch64). In that case it might not be desirable to
> have all possible backlight drivers in the kernel image or ramdisk. For
> this kind of system 'wait forever' might be a suitable solution.
> 

For most users, most of the time, I don't think "wait forever" is very
good either; e.g. if a distro is lacking one .ko the user would see
significant worse battery performance...

But at the same time, when your distro asks for your crypto key for your
rootfs and the kernel decides that times up, that's pretty bad user
experience as well..

> I have the impression we might want both options, a timeout after which
> resources are turned off unless they have been voted for, and 'wait
> forever', with a Kconfig option to select the desired behavior.
> 
> For most tailored systems the timout is probably a more suitable solution.
> The maintainer of the kernel/system can decide to not enable certain
> drivers because they aren't needed and include essential drivers into
> the kernel image or ramdisk. The timeout ensures that the system doesn't
> burn extra power for reasons that aren't evident for the maintainer (who
> might not even be aware of the whole sync_state story).
> 
> On the other hand general purpose distributions might want to wait
> forever, since they have to support a wide range of hardware and enable
> most available drivers anyway.
> 
> If we end up with such an option I think the timeout should be the
> default. Why? There are hundreds of maintainers of tailored kernels
> who might run into hard to detect/debug power issues with 'wait
> forever'. On the other hand there is a limited number of general purpose
> distributions, with kernel teams that probably already know about
> 'sync_state'. They only have to enable 'wait forever' once and then
> carry it forward to future versions.

In your tailored system you test every hardware and driver combinations,
so the wait forever seems like a very easy thing to handle.

In a general purpose distribution, the distribution doesn't have a way
to test every single system. So it's quite likely that they won't enable
some optional set of drivers (e.g. camera, venus) and wait forever would
be the wrong thing to do. Except when the user needs that backlight, or
the framebuffer etc.

I don't know how to solve this, the global timeout is likely the best
way we have for now - at least it works, for all cases except when
someone needs to stay in their initramfs for more than 30 seconds...

Regards,
Bjorn
Matthias Kaehlcke Feb. 23, 2023, 11:20 p.m. UTC | #21
On Wed, Feb 22, 2023 at 10:51:52AM -0800, Bjorn Andersson wrote:
> On Tue, Feb 21, 2023 at 06:32:06PM +0000, Matthias Kaehlcke wrote:
> > On Mon, Feb 20, 2023 at 09:15:50AM -0800, Bjorn Andersson wrote:
> > > On Tue, Feb 07, 2023 at 03:45:35PM -0800, Doug Anderson wrote:
> > > > Hi,
> > > > 
> > > > On Mon, Feb 6, 2023 at 1:35 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Mon, Feb 6, 2023 at 1:10 PM Doug Anderson <dianders@chromium.org> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Mon, Feb 6, 2023 at 11:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > On Mon, Feb 6, 2023 at 9:48 AM Abel Vesa <abel.vesa@linaro.org> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > CC'ed Saravana
> > > > > > >
> > > > > > > Thanks. Please do cc me for stuff like this from the start. I skimmed
> > > > > > > the series and I think it's doing one of my TODO items. So, thanks for
> > > > > > > the patch!
> > > > > > >
> > > > > > > I'll take a closer look within a few days -- trying to get through
> > > > > > > some existing fw_devlink stuff.
> > > > > > >
> > > > > > > But long story short, it is the right thing to keep a supplier on
> > > > > > > indefinitely if there's a consumer device (that's not disabled in DT)
> > > > > > > that never gets probed. It's a pretty common scenario -- for example,
> > > > > > > say a display backlight. The default case should be functional
> > > > > > > correctness. And then we can add stuff that allows changing this
> > > > > > > behavior with command line args or something else that can be done
> > > > > > > from userspace.
> > > > > > >
> > > > > > > +1 to what Doug said elsewhere in this thread too. I'm trying to
> > > > > > > consolidate the "when do we give up" decision at the driver core level
> > > > > > > independent of what framework is being used.
> > > > > >
> > > > > > I'm not really sure I agree with the above, at least not without lots
> > > > > > of discussion in the community. It really goes against what the kernel
> > > > > > has been doing for years and years in the regulator and clock
> > > > > > frameworks. Those frameworks both eventually give up and power down
> > > > > > resources that no active drivers are using. Either changing the
> > > > > > regulator/clock frameworks or saying that other frameworks should work
> > > > > > in an opposite way seems like a recipe for confusion.
> > > > > >
> > > > > > Now, certainly I won't say that the way that the regulator and clock
> > > > > > frameworks function is perfect nor will I say that they don't cause
> > > > > > any problems. However, going the opposite way where resources are kept
> > > > > > at full power indefinitely will _also_ cause problems.
> > > > > >
> > > > > > Specifically, let's look at the case you mentioned of a display
> > > > > > backlight. I think you're saying that if there is no backlight driver
> > > > > > enabled in the kernel that you'd expect the backlight to just be on at
> > > > > > full brightness.
> > > > >
> > > > > No, I'm not saying that.
> > > > >
> > > > > > Would you expect this even if the firmware didn't
> > > > > > leave the backlight on?
> > > > >
> > > > > sync_state() never turns on anything that wasn't already on at boot.
> > > > > So in your example, if the firmware didn't turn on the backlight, then
> > > > > it'll remain off.
> > > > 
> > > > As per offline discussion, part of the problems are that today this
> > > > _isn't_ true for a few Qualcomm things (like interconnect). The
> > > > interconnect frameway specifically maxes things out for early boot.
> > > > 
> > > 
> > > The problem being solved here is that the bootloader leaves some vote at
> > > 1GB/s, as needed by hardware related to driver B.
> > > 
> > > Driver A is loaded first and votes for 1kb/s; what should the kernel do
> > > now, without knowledge of the needs from the hardware associated with B,
> > > or the ability to read back the bootloader's votes.
> > > 
> > > This was the behavior of the initial implementation, and the practical
> > > implications was seen as the UART would typically come along really
> > > early, cast a low vote on the various buses and it would take forever to
> > > get to the probing of the drivers that actually gave us reasonable
> > > votes.
> > 
> > I generally understand this problem and agree that it makes sense to bump
> > the resources *initially*. Doug and I primarily question the 'wait forever'
> > part of it.
> > 
> 
> The question is when "initially" ends.

ack

> > > Also consider the case where driver A probes, votes for bandwidth, does
> > > it's initialization and then votes for 0. Without making assumptions
> > > about the needs of B (or a potential B even), we'd turn off critical
> > > resources - possible preventing us from ever attempting to probe B.
> > 
> > For the most critical devices that are probed during early boot this
> > would still work if the resources are initially bumped and then turned
> > off after some timeout.
> > 
> 
> The resources that needs to be kept on are those which we rely on for
> the system to reach said driver 'B'.
> 
> The obvious ones are those allowing us to execute code (e.g. some form
> of DDR vote) and things such as earlycon - until the console driver is
> probed properly and can cast a interconnect vote.

Yes, these were the things that came to my mind and I'd expect the
corresponding drivers to be either builtin or on the ramdisk, so they
wouldn't be impacted if the rootfs takes longer to mount.

> But the typical example here is display, which depending on system
> configuration might rely on the hardware being able to fetch data from
> DDR until all the relevant display driver components is starting to take
> care of the voting.

Thanks for the example, it's certainly desirable to keep the display
working even when the rootf can't be mounted (in time).

> > Could you provide an example for some other type of device that is/would
> > be probed later? Except for auto-probing buses like USB or PCI the device
> > should probe regardless of the resources being enabled and then vote
> > during probe for the required bandwidth, voltage, etc., which should put
> > the resources into the required state. Am I missing something here?
> > 
> 
> What do you mean with "later"?

I was thinking beyond 'early boot', you are right, there is no clear
definition.

> We have one consistent definition of "later", and that's
> late_initcall(). By that definition it's pretty much everything beyond
> gcc, interconnect and UART is or may probe "later" (because starting
> userspace without a valid /dev/console is suboptimal)..
> 
> 
> Looking at something specific, if you where to try to boot sc7280 using
> the upstream defconfig I don't think you have anyone ensuring that the
> CPU is allowed to reach DDR at this point.

Why is that the case?

> > > As such, the only safe solution is to assume that there might be a later
> > > loaded/probed client that has a large vote and preemptively vote for
> > > some higher bandwidth until then.
> > 
> > > > > > In any case, why do you say it's more correct?
> > > > >
> > > > > Because if you turn off the display, the device is unusable. In other
> > > > > circumstances, it can crash a device because the firmware powered it
> > > > > on left it in a "good enough" state, but we'd go turn it off and crash
> > > > > the system.
> > > > >
> > > > > > I suppose you'd say that the screen is at least usable like this.
> > > > > > ...except that you've broken a different feature: suspend/resume.
> > > > >
> > > > > If the display is off and the laptop is unusable, then we have bigger
> > > > > problems than suspend/resume?
> > > > 
> > > > I suspect that here we'll have to agree to disagree. IMO it's a
> > > > non-goal to expect hardware to work for which there is no driver. So
> > > > making the backlight work without a backlight driver isn't really
> > > > something we should strive for.
> > > > 
> > > 
> > > Without trying to make you agree ;)
> > > 
> > > How can you differentiate between "the driver wasn't built" and "the
> > > driver isn't yet available"?
> > 
> > Unfortunately you can't AFAIK.
> > 
> > > Consider the case where I boot my laptop, I have some set of builtin
> > > drivers, some set of drivers in the ramdisk and some set of drivers in
> > > the root filesystem.
> > > 
> > > In the event that something goes wrong mounting the rootfs, I will now
> > > be in the ramdisk console. Given the current timer-based disabling of
> > > regulators, I have ~25 seconds to solve my problem before the backlight
> > > goes blank.
> > > 
> > > 
> > > Obviously this isn't a typical scenario in a consumer device, but it
> > > seems conceivable that your ramdisk would run fsck for some amount of
> > > time before mounting the rootfs and picking up the last tier of drivers.
> > 
> > If the laptop is running a kernel that is tailored for this device I'd
> > say the most practial solution would be to either build the backlight
> > driver into the kernel or have it on the ramdisk as a module.
> > 
> > However the laptop might be running a distribution like Debian or Red Hat
> > with (I assume) a single kernel+ramdisk for all systems of a given
> > architecture (e.g. aarch64). In that case it might not be desirable to
> > have all possible backlight drivers in the kernel image or ramdisk. For
> > this kind of system 'wait forever' might be a suitable solution.
> > 
> 
> For most users, most of the time, I don't think "wait forever" is very
> good either; e.g. if a distro is lacking one .ko the user would see
> significant worse battery performance...
> 
> But at the same time, when your distro asks for your crypto key for your
> rootfs and the kernel decides that times up, that's pretty bad user
> experience as well..

Indeed, both options have their significant caveats :(

> > I have the impression we might want both options, a timeout after which
> > resources are turned off unless they have been voted for, and 'wait
> > forever', with a Kconfig option to select the desired behavior.
> > 
> > For most tailored systems the timout is probably a more suitable solution.
> > The maintainer of the kernel/system can decide to not enable certain
> > drivers because they aren't needed and include essential drivers into
> > the kernel image or ramdisk. The timeout ensures that the system doesn't
> > burn extra power for reasons that aren't evident for the maintainer (who
> > might not even be aware of the whole sync_state story).
> > 
> > On the other hand general purpose distributions might want to wait
> > forever, since they have to support a wide range of hardware and enable
> > most available drivers anyway.
> > 
> > If we end up with such an option I think the timeout should be the
> > default. Why? There are hundreds of maintainers of tailored kernels
> > who might run into hard to detect/debug power issues with 'wait
> > forever'. On the other hand there is a limited number of general purpose
> > distributions, with kernel teams that probably already know about
> > 'sync_state'. They only have to enable 'wait forever' once and then
> > carry it forward to future versions.
> 
> In your tailored system you test every hardware and driver combinations,
> so the wait forever seems like a very easy thing to handle.

I disagree

For one you might not want to enable certain drivers, because the system
doesn't make use of the underlying hardware. In theory (probably not as
much in practice) the device tree could describe existing hardware which
isn't supported by the kernel (yet).

More importantly it might not be evident to the maintaner(s) of the system
that they have an issue impacting power. Think of small companies with a
single person or very small kernel team and no vendor support. How do
they even know their system is consuming more power than it would if
only all these drivers they don't really care about were enabled? Some of
them will be aware of the problematic and pro-actively look out for
possible issues, but I expect many aren't. It wouldn't be scalable and
'user'-friendly to put the burden on them and expect them to sort out the
issues individually.

> In a general purpose distribution, the distribution doesn't have a way
> to test every single system. So it's quite likely that they won't enable
> some optional set of drivers (e.g. camera, venus) and wait forever would
> be the wrong thing to do. Except when the user needs that backlight, or
> the framebuffer etc.

True, even a general purpose distribution doesn't necessarily enable
everything.

> I don't know how to solve this, the global timeout is likely the best
> way we have for now - at least it works, for all cases except when
> someone needs to stay in their initramfs for more than 30 seconds...

There is indeed (at least for now) no solution that addresses all the
issues. A timeout sounds good to me for the time being.
Doug Anderson March 1, 2023, 10:40 p.m. UTC | #22
Hi,

On Mon, Feb 20, 2023 at 9:12 AM Bjorn Andersson <andersson@kernel.org> wrote:
>
> > I suspect that here we'll have to agree to disagree. IMO it's a
> > non-goal to expect hardware to work for which there is no driver. So
> > making the backlight work without a backlight driver isn't really
> > something we should strive for.
> >
>
> Without trying to make you agree ;)
>
> How can you differentiate between "the driver wasn't built" and "the
> driver isn't yet available"?

BTW, when I was responding to Saravana's series [1], I realized that
you _can_ differentiate between these two cases, at least from a
practical point of view. Specifically, when the
"deferred_probe_timeout" expires then you should assume that "the
driver wasn't built". Said another way, once the
"deferred_probe_timeout" expires then you should assume that the
driver won't be available in the future. While you still could try
loading it, in general once that timeout has expired the kernel has
made decisions (like making -EPROBE_DEFER non-retriable) that make it
very awkward to load new drivers.

Of course, one could say "hey, let's get rid of the
deferred_probe_timeout". That might be a tough sell unless you can
come up with an equivalent solution for those currently using this
feature.

[1] https://lore.kernel.org/r/CAD=FV=XQnLpD1P8sRBcizTMjCQyHTjaiNvjcPdgyZc5JCzvOtw@mail.gmail.com


> Consider the case where I boot my laptop, I have some set of builtin
> drivers, some set of drivers in the ramdisk and some set of drivers in
> the root filesystem.
>
> In the event that something goes wrong mounting the rootfs, I will now
> be in the ramdisk console. Given the current timer-based disabling of
> regulators, I have ~25 seconds to solve my problem before the backlight
> goes blank.

I personally don't love the timeout. It feels like the kind of thing
that userspace knows and should be able to tell the kernel. I know we
don't like to put impositions on userspace, but userspace is pretty
definitely involved in things like loading modules. It just makes
sense (in my mind) for userspace to say when it's done and all modules
for cold-plugged devices have been loaded...

-Doug
diff mbox series

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 84662d338188..c2a5f77c01f3 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1099,7 +1099,8 @@  static int __init genpd_power_off_unused(void)
 	mutex_lock(&gpd_list_lock);
 
 	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
-		genpd_queue_power_off_work(genpd);
+		if (!dev_has_sync_state(genpd->provider->dev))
+			genpd_queue_power_off_work(genpd);
 
 	mutex_unlock(&gpd_list_lock);
 
@@ -1107,6 +1108,20 @@  static int __init genpd_power_off_unused(void)
 }
 late_initcall(genpd_power_off_unused);
 
+void genpd_power_off_unused_sync_state(struct device *dev)
+{
+	struct generic_pm_domain *genpd;
+
+	mutex_lock(&gpd_list_lock);
+
+	list_for_each_entry(genpd, &gpd_list, gpd_list_node)
+		if (genpd->provider->dev == dev)
+			genpd_queue_power_off_work(genpd);
+
+	mutex_unlock(&gpd_list_lock);
+}
+EXPORT_SYMBOL_GPL(genpd_power_off_unused_sync_state);
+
 #ifdef CONFIG_PM_SLEEP
 
 /**
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index f776fb93eaa0..1fd5aa500c81 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -351,6 +351,7 @@  struct device *genpd_dev_pm_attach_by_id(struct device *dev,
 					 unsigned int index);
 struct device *genpd_dev_pm_attach_by_name(struct device *dev,
 					   const char *name);
+void genpd_power_off_unused_sync_state(struct device *dev);
 #else /* !CONFIG_PM_GENERIC_DOMAINS_OF */
 static inline int of_genpd_add_provider_simple(struct device_node *np,
 					struct generic_pm_domain *genpd)
@@ -419,6 +420,8 @@  struct generic_pm_domain *of_genpd_remove_last(struct device_node *np)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
+
+static inline genpd_power_off_unused_sync_state(struct device *dev) {}
 #endif /* CONFIG_PM_GENERIC_DOMAINS_OF */
 
 #ifdef CONFIG_PM