[V8,1/6] PM / Domains: Add support to select performance-state of domains

Message ID 52daf22d2c9d92b0e61c8c5c5a88516d7a08a31a.1498026827.git.viresh.kumar@linaro.org
State New
Headers show
Series
  • [V8,1/6] PM / Domains: Add support to select performance-state of domains
Related show

Commit Message

Viresh Kumar June 21, 2017, 7:10 a.m.
Some platforms have the capability to configure the performance state of
their Power Domains. The performance levels are identified by positive
integer values, a lower value represents lower performance state.

This patch adds a new genpd API: pm_genpd_update_performance_state().
The caller passes the affected device and the frequency representing its
next DVFS state.

The power domains get two new callbacks:

- get_performance_state(): This is called by the genpd core to retrieve
  the performance state (integer value) corresponding to a target
  frequency for the device. This state is used by the genpd core to find
  the highest requested state by all the devices powered by a domain.

- set_performance_state(): The highest state retrieved from above
  interface is then passed to this callback to finally program the
  performance state of the power domain.

The power domains can avoid supplying these callbacks, if they don't
support setting performance-states.

A power domain may have only get_performance_state() callback, if it
doesn't have the capability of changing the performance state itself but
someone in its parent hierarchy has.

A power domain may have only set_performance_state(), if it doesn't have
any direct devices below it but subdomains. And so the
get_performance_state() will never get called from the core.

The more common case would be to have both the callbacks set.

Another API, pm_genpd_has_performance_state(), is also added to let
other parts of the kernel check if the power domain of a device supports
performance-states or not. This could have been done from
pm_genpd_update_performance_state() as well, but that routine gets
called every time we do DVFS for the device and it wouldn't be optimal
in that case.

Note that, the performance level as returned by
->get_performance_state() for the parent domain of a device is used for
all domains in parent hierarchy.

Tested-by: Rajendra Nayak <rnayak@codeaurora.org>

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 drivers/base/power/domain.c | 223 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pm_domain.h   |  22 +++++
 2 files changed, 245 insertions(+)

-- 
2.13.0.71.gd7076ec9c9cb

Comments

Ulf Hansson July 17, 2017, 12:38 p.m. | #1
On 21 June 2017 at 09:10, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Some platforms have the capability to configure the performance state of

> their Power Domains. The performance levels are identified by positive

> integer values, a lower value represents lower performance state.

>

> This patch adds a new genpd API: pm_genpd_update_performance_state().

> The caller passes the affected device and the frequency representing its

> next DVFS state.

>

> The power domains get two new callbacks:

>

> - get_performance_state(): This is called by the genpd core to retrieve

>   the performance state (integer value) corresponding to a target

>   frequency for the device. This state is used by the genpd core to find

>   the highest requested state by all the devices powered by a domain.


Please clarify this a bit more.

I guess what you want to say is that genpd aggregates all requested
performance state of all its devices and its subdomains, to be able to
set a correct (highest requested) performance state.

Moreover, could you perhaps explain a bit on *when* this callback
becomes invoked.

>

> - set_performance_state(): The highest state retrieved from above

>   interface is then passed to this callback to finally program the

>   performance state of the power domain.


When will this callback be invoked?

What happens when a power domain gets powered off and then on. Is the
performance state restored? Please elaborate a bit on this.

>

> The power domains can avoid supplying these callbacks, if they don't

> support setting performance-states.

>

> A power domain may have only get_performance_state() callback, if it

> doesn't have the capability of changing the performance state itself but

> someone in its parent hierarchy has.

>

> A power domain may have only set_performance_state(), if it doesn't have

> any direct devices below it but subdomains. And so the

> get_performance_state() will never get called from the core.

>


It seems like the ->get_perfomance_state() is a device specific
callback, while the ->set_performance_state() is a genpd domain
callback.

I am wondering whether we should try to improve the names of the
callbacks to reflect this.

> The more common case would be to have both the callbacks set.

>

> Another API, pm_genpd_has_performance_state(), is also added to let

> other parts of the kernel check if the power domain of a device supports

> performance-states or not. This could have been done from


I think a better name of this function is:
dev_pm_genpd_has_performance_state(). What do you think?

We might even want to decide to explicitly stay using the terminology
"DVFS" instead. In such case, perhaps converting the names of the
callbacks/API to use "dvfs" instead. For the API added here, maybe
dev_pm_genpd_can_dvfs().

> pm_genpd_update_performance_state() as well, but that routine gets

> called every time we do DVFS for the device and it wouldn't be optimal

> in that case.


So pm_genpd_update_performance_state() is also a new API added in
$subject patch. But there is no information about it in the changelog,
besides the above. Please add that.

Moreover, perhaps we should rename the function to dev_pm_genpd_set_dvfs()

>

> Note that, the performance level as returned by

> ->get_performance_state() for the parent domain of a device is used for

> all domains in parent hierarchy.


Please clarify a bit on this. What exactly does this mean?

>

> Tested-by: Rajendra Nayak <rnayak@codeaurora.org>

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> ---

>  drivers/base/power/domain.c | 223 ++++++++++++++++++++++++++++++++++++++++++++

>  include/linux/pm_domain.h   |  22 +++++

>  2 files changed, 245 insertions(+)

>

> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c

> index 71c95ad808d5..d506be9ff1f7 100644

> --- a/drivers/base/power/domain.c

> +++ b/drivers/base/power/domain.c

> @@ -466,6 +466,229 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb,

>         return NOTIFY_DONE;

>  }

>

> +/*

> + * Returns true if anyone in genpd's parent hierarchy has

> + * set_performance_state() set.

> + */

> +static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd)

> +{


So this function will be become in-directly called by generic drivers
that supports DVFS of the genpd for their devices.

I think the data you validate here would be better to be pre-validated
at pm_genpd_init() and at pm_genpd_add|remove_subdomain() and the
result stored in a variable in the genpd struct. Especially when a
subdomain is added, that is a point when you can verify the
*_performance_state() callbacks, and thus make sure it's a correct
setup from the topology point of view.

> +       struct gpd_link *link;

> +

> +       if (genpd->set_performance_state)

> +               return true;

> +

> +       list_for_each_entry(link, &genpd->slave_links, slave_node) {

> +               if (genpd_has_set_performance_state(link->master))

> +                       return true;

> +       }

> +

> +       return false;

> +}

> +

> +/**

> + * pm_genpd_has_performance_state - Checks if power domain does performance

> + * state management.

> + *

> + * @dev: Device whose power domain is getting inquired.

> + *

> + * This must be called by the user drivers, before they start calling

> + * pm_genpd_update_performance_state(), to guarantee that all dependencies are

> + * met and the device's genpd supports performance states.

> + *

> + * It is assumed that the user driver guarantees that the genpd wouldn't be

> + * detached while this routine is getting called.

> + *

> + * Returns "true" if device's genpd supports performance states, "false"

> + * otherwise.

> + */

> +bool pm_genpd_has_performance_state(struct device *dev)

> +{

> +       struct generic_pm_domain *genpd = genpd_lookup_dev(dev);

> +

> +       /* The parent domain must have set get_performance_state() */


This comment is wrong. This is not about *parent* domains.

Instead I think it should say: "The genpd must have the
->get_performance_state() assigned." ...

> +       if (!IS_ERR(genpd) && genpd->get_performance_state) {

> +               if (genpd_has_set_performance_state(genpd))

> +                       return true;


... while this is about verifying that some of the domains (genpds) in
domain topology has a ->set_performance_state() callback.

In other words, either the genpd or some of its masters must have a
->set_performance_state() callback.

That makes me wonder: what will happen if there are more than one
master having a ->set_perfomance_state() callback assigned? I guess
that is non-allowed configuration?

> +

> +               /*

> +                * A genpd with ->get_performance_state() callback must also

> +                * allow setting performance state.

> +                */

> +               dev_err(dev, "genpd doesn't support setting performance state\n");

> +       }

> +

> +       return false;

> +}

> +EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);

> +

> +/*

> + * Re-evaluate performance state of a power domain. Finds the highest requested

> + * performance state by the devices and subdomains within the power domain and

> + * then tries to change its performance state. If the power domain doesn't have

> + * a set_performance_state() callback, then we move the request to its parent

> + * power domain.

> + *

> + * Locking: Access (or update) to device's "pd_data->performance_state" field

> + * happens only with parent domain's lock held. Subdomains have their


What is a *parent* domain here?

In genpd we try to use the terminology of master- and sub-domains.
Could you re-phrase this to get some clarity on what you try to
explain from the above?

> + * "genpd->performance_state" protected with their own lock (and they are the

> + * only user of this field) and their per-parent-domain

> + * "link->performance_state" field is protected with individual parent power

> + * domain's lock and is only accessed/updated with that lock held.

> + */


I recall we discussed this off-list, but I am not sure this was the
conclusion on how to solve the locking. :-) More comments below.

> +static int genpd_update_performance_state(struct generic_pm_domain *genpd,

> +                                         int depth)

> +{

> +       struct generic_pm_domain_data *pd_data;

> +       struct generic_pm_domain *master;

> +       struct pm_domain_data *pdd;

> +       unsigned int state = 0, prev;

> +       struct gpd_link *link;

> +       int ret;

> +

> +       /* Traverse all devices within the domain */

> +       list_for_each_entry(pdd, &genpd->dev_list, list_node) {

> +               pd_data = to_gpd_data(pdd);

> +

> +               if (pd_data->performance_state > state)

> +                       state = pd_data->performance_state;

> +       }


I don't think walking the list of devices for each master domain is
necessary, unless the aggregated performance state from the subdomain
was increased.

> +

> +       /* Traverse all subdomains within the domain */

> +       list_for_each_entry(link, &genpd->master_links, master_node) {

> +               if (link->performance_state > state)

> +                       state = link->performance_state;

> +       }

> +


From a locking point of view we always traverse the topology from
bottom an up. In other words, we walk the genpd's ->slave_links, and
lock the masters in the order the are defined via the slave_links
list. The order is important to avoid deadlocks. I don't think you
should walk the master_links as being done above, especially not
without using locks.

> +       if (genpd->performance_state == state)

> +               return 0;

> +

> +       if (genpd->set_performance_state) {

> +               ret = genpd->set_performance_state(genpd, state);

> +               if (!ret)

> +                       genpd->performance_state = state;

> +

> +               return ret;


This looks wrong.

I think you should continue to walk upwards in the domain topology, as
there may be some other master than needs to get its performance state
updated.

> +       }

> +

> +       /*

> +        * Not all domains support updating performance state. Move on to their

> +        * parent domains in that case.


/s/parent/master

> +        */

> +       prev = genpd->performance_state;

> +

> +       list_for_each_entry(link, &genpd->slave_links, slave_node) {

> +               master = link->master;

> +

> +               genpd_lock_nested(master, depth + 1);

> +

> +               link->performance_state = state;

> +               ret = genpd_update_performance_state(master, depth + 1);

> +               if (ret)

> +                       link->performance_state = prev;

> +

> +               genpd_unlock(master);

> +

> +               if (ret)

> +                       goto err;

> +       }

> +


A general comment is that I think you should look more closely in the
code of genpd_power_off|on(). And also how it calls the
->power_on|off() callbacks.

Depending whether you want to update the performance state of the
master domain before the subdomain or the opposite, you will find one
of them being suited for this case as well.

> +       /*

> +        * The parent domains are updated now, lets get genpd performance_state

> +        * in sync with those.

> +        */

> +       genpd->performance_state = state;

> +       return 0;

> +

> +err:

> +       list_for_each_entry_continue_reverse(link, &genpd->slave_links,

> +                                            slave_node) {

> +               master = link->master;

> +

> +               genpd_lock_nested(master, depth + 1);

> +               link->performance_state = prev;

> +               if (genpd_update_performance_state(master, depth + 1))

> +                       pr_err("%s: Failed to roll back to %d performance state\n",

> +                              genpd->name, prev);

> +               genpd_unlock(master);

> +       }

> +

> +       return ret;

> +}

> +

> +static int __dev_update_performance_state(struct device *dev, int state)


Please use the prefix genpd, _genpd_ or __genpd for static functions.

> +{

> +       struct generic_pm_domain_data *gpd_data;

> +       int ret;

> +

> +       spin_lock_irq(&dev->power.lock);


Actually there is no need to use this lock.

Because you hold the genpd lock here, then the device can't be removed
from its genpd and thus there is always a valid gpd_data.

> +

> +       if (!dev->power.subsys_data || !dev->power.subsys_data->domain_data) {

> +               ret = -ENODEV;

> +               goto unlock;

> +       }

> +

> +       gpd_data = to_gpd_data(dev->power.subsys_data->domain_data);

> +

> +       ret = gpd_data->performance_state;

> +       gpd_data->performance_state = state;

> +

> +unlock:

> +       spin_unlock_irq(&dev->power.lock);

> +

> +       return ret;

> +}

> +

> +/**

> + * pm_genpd_update_performance_state - Update performance state of device's

> + * parent power domain for the target frequency for the device.

> + *

> + * @dev: Device for which the performance-state needs to be adjusted.

> + * @rate: Device's next frequency. This can be set as 0 when the device doesn't

> + * have any performance state constraints left (And so the device wouldn't

> + * participate anymore to find the target performance state of the genpd).

> + *

> + * This must be called by the user drivers (as many times as they want) only

> + * after pm_genpd_has_performance_state() is called (only once) and that

> + * returned "true".

> + *

> + * It is assumed that the user driver guarantees that the genpd wouldn't be

> + * detached while this routine is getting called.

> + *

> + * Returns 0 on success and negative error values on failures.

> + */

> +int pm_genpd_update_performance_state(struct device *dev, unsigned long rate)

> +{

> +       struct generic_pm_domain *genpd = dev_to_genpd(dev);

> +       int ret, state;

> +

> +       if (IS_ERR(genpd))

> +               return -ENODEV;

> +

> +       genpd_lock(genpd);

> +

> +       state = genpd->get_performance_state(dev, rate);

> +       if (state < 0) {

> +               ret = state;

> +               goto unlock;

> +       }

> +

> +       state = __dev_update_performance_state(dev, state);

> +       if (state < 0) {

> +               ret = state;

> +               goto unlock;

> +       }

> +

> +       ret = genpd_update_performance_state(genpd, 0);

> +       if (ret)

> +               __dev_update_performance_state(dev, state);

> +

> +unlock:

> +       genpd_unlock(genpd);

> +

> +       return ret;

> +}

> +EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state);

> +

>  /**

>   * genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0.

>   * @work: Work structure used for scheduling the execution of this function.

> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h

> index b7803a251044..bf90177208a2 100644

> --- a/include/linux/pm_domain.h

> +++ b/include/linux/pm_domain.h

> @@ -63,8 +63,12 @@ struct generic_pm_domain {

>         unsigned int device_count;      /* Number of devices */

>         unsigned int suspended_count;   /* System suspend device counter */

>         unsigned int prepared_count;    /* Suspend counter of prepared devices */

> +       unsigned int performance_state; /* Max requested performance state */

>         int (*power_off)(struct generic_pm_domain *domain);

>         int (*power_on)(struct generic_pm_domain *domain);

> +       int (*get_performance_state)(struct device *dev, unsigned long rate);

> +       int (*set_performance_state)(struct generic_pm_domain *domain,

> +                                    unsigned int state);

>         struct gpd_dev_ops dev_ops;

>         s64 max_off_time_ns;    /* Maximum allowed "suspended" time. */

>         bool max_off_time_changed;

> @@ -99,6 +103,9 @@ struct gpd_link {

>         struct list_head master_node;

>         struct generic_pm_domain *slave;

>         struct list_head slave_node;

> +

> +       /* Sub-domain's per-parent domain performance state */

> +       unsigned int performance_state;


How about aggregate_dvfs_state, and move it to the struct generic_pm_domain.

Because I think you only need to keep track one aggregated state per
domain and per each subdomain, right?

>  };

>

>  struct gpd_timing_data {

> @@ -118,6 +125,7 @@ struct generic_pm_domain_data {

>         struct pm_domain_data base;

>         struct gpd_timing_data td;

>         struct notifier_block nb;

> +       unsigned int performance_state;

>         void *data;

>  };

>

> @@ -148,6 +156,9 @@ extern int pm_genpd_remove(struct generic_pm_domain *genpd);

>

>  extern struct dev_power_governor simple_qos_governor;

>  extern struct dev_power_governor pm_domain_always_on_gov;

> +extern bool pm_genpd_has_performance_state(struct device *dev);

> +extern int pm_genpd_update_performance_state(struct device *dev,

> +                                            unsigned long rate);


Please move these above governor defines, as to be consistent with
below changes.

>  #else

>

>  static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)

> @@ -185,6 +196,17 @@ static inline int pm_genpd_remove(struct generic_pm_domain *genpd)

>         return -ENOTSUPP;

>  }

>

> +static inline bool pm_genpd_has_performance_state(struct device *dev)

> +{

> +       return false;

> +}

> +

> +static inline int pm_genpd_update_performance_state(struct device *dev,

> +                                                   unsigned long rate)

> +{

> +       return -ENOTSUPP;

> +}

> +

>  #define simple_qos_governor            (*(struct dev_power_governor *)(NULL))

>  #define pm_domain_always_on_gov                (*(struct dev_power_governor *)(NULL))

>  #endif

> --

> 2.13.0.71.gd7076ec9c9cb

>


Huh, I hope my comments make some sense. It's a bit of tricky code,
when walking the domain topology and I think there are couple
optimization we can do while doing that. Hopefully you can understand
most of my comments at least. :-)

Feel free to ping me on IRC, if further explanations is needed.

Kind regards
Uffe
Ulf Hansson July 21, 2017, 8:35 a.m. | #2
[...]

>

>> What happens when a power domain gets powered off and then on. Is the

>> performance state restored? Please elaborate a bit on this.

>

> Can this happen while the genpd is still in use? If not then we

> wouldn't have a problem here as the users of it would have revoked

> their constraints by now.


This depends on how drivers are dealing with runtime PM in conjunction
with the new pm_genpd_update_performance_state().

In case you don't want to manage some of this in genpd, then each
driver will have to drop their constraints every time they are about
to runtime suspend its device. And restore them at runtime resume.

To me, that's seems like a bad idea. Then it's better to make genpd
deal with this - somehow.

[...]

>>

>> I think a better name of this function is:

>> dev_pm_genpd_has_performance_state(). What do you think?

>

> Sure.

>

>> We might even want to decide to explicitly stay using the terminology

>> "DVFS" instead. In such case, perhaps converting the names of the

>> callbacks/API to use "dvfs" instead. For the API added here, maybe

>> dev_pm_genpd_can_dvfs().

>

> I am not sure about that really. Because in most of the cases genpd

> wouldn't do any freq switching, but only voltage.


Fair enough, let's stick with "performance" then. However, then please
make sure to not mention DVFS in the changelog/comments as it could be
confusing.

>

>> > Note that, the performance level as returned by

>> > ->get_performance_state() for the parent domain of a device is used for

>> > all domains in parent hierarchy.

>>

>> Please clarify a bit on this. What exactly does this mean?

>

> For a hierarchy like this:

>

> PPdomain 0               PPdomain 1

>    |                        |

>    --------------------------

>                 |

>              Pdomain

>                 |

>               device

>

> ->dev_get_performance_state(dev) would be called for the device and it

> will return a single value (X) representing the performance index of

> its parent ("Pdomain"). But the direct parent domain may not support


This is not parent or parent domain, but just "domain" or perhaps "PM
domain" to make it clear.

> setting of performance index and so we need to propagate the call to

> parents of Pdomain. And that would be PPdomain 0 and 1.


Use "master domains" instead.

>

> Now the paragraph in the commit says that the same performance index

> value X will be used for both these PPdomains, as we don't want to

> make things more complex to begin with.


Alright, I get it, thanks!

>

>> > Tested-by: Rajendra Nayak <rnayak@codeaurora.org>

>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

>> > ---

>> >  drivers/base/power/domain.c | 223 ++++++++++++++++++++++++++++++++++++++++++++

>> >  include/linux/pm_domain.h   |  22 +++++

>> >  2 files changed, 245 insertions(+)

>> >

>> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c

>> > index 71c95ad808d5..d506be9ff1f7 100644

>> > --- a/drivers/base/power/domain.c

>> > +++ b/drivers/base/power/domain.c

>> > @@ -466,6 +466,229 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb,

>> >         return NOTIFY_DONE;

>> >  }

>> >

>> > +/*

>> > + * Returns true if anyone in genpd's parent hierarchy has

>> > + * set_performance_state() set.

>> > + */

>> > +static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd)

>> > +{

>>

>> So this function will be become in-directly called by generic drivers

>> that supports DVFS of the genpd for their devices.

>>

>> I think the data you validate here would be better to be pre-validated

>> at pm_genpd_init() and at pm_genpd_add|remove_subdomain() and the

>> result stored in a variable in the genpd struct. Especially when a

>> subdomain is added, that is a point when you can verify the

>> *_performance_state() callbacks, and thus make sure it's a correct

>> setup from the topology point of view.

>

> Something like this ?

>

> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c

> index 4a898e095a1d..182c1911ea9c 100644

> --- a/drivers/base/power/domain.c

> +++ b/drivers/base/power/domain.c

> @@ -466,25 +466,6 @@ static int genpd_dev_pm_qos_notifier(struct notifier_block *nb,

>         return NOTIFY_DONE;

>  }

>

> -/*

> - * Returns true if anyone in genpd's parent hierarchy has

> - * set_performance_state() set.

> - */

> -static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd)

> -{

> -       struct gpd_link *link;

> -

> -       if (genpd->set_performance_state)

> -               return true;

> -

> -       list_for_each_entry(link, &genpd->slave_links, slave_node) {

> -               if (genpd_has_set_performance_state(link->master))

> -                       return true;

> -       }

> -

> -       return false;

> -}

> -

>  /**

>   * pm_genpd_has_performance_state - Checks if power domain does performance

>   * state management.

> @@ -507,7 +488,7 @@ bool pm_genpd_has_performance_state(struct device *dev)

>

>         /* The parent domain must have set get_performance_state() */

>         if (!IS_ERR(genpd) && genpd->get_performance_state) {

> -               if (genpd_has_set_performance_state(genpd))

> +               if (genpd->can_set_performance_state)

>                         return true;

>

>                 /*

> @@ -1594,6 +1575,8 @@ static int genpd_add_subdomain(struct generic_pm_domain *genpd,

>         if (genpd_status_on(subdomain))

>                 genpd_sd_counter_inc(genpd);

>

> +       subdomain->can_set_performance_state += genpd->can_set_performance_state;

> +

>   out:

>         genpd_unlock(genpd);

>         genpd_unlock(subdomain);

> @@ -1654,6 +1637,8 @@ int pm_genpd_remove_subdomain(struct generic_pm_domain *genpd,

>                 if (genpd_status_on(subdomain))

>                         genpd_sd_counter_dec(genpd);

>

> +               subdomain->can_set_performance_state -= genpd->can_set_performance_state;

> +

>                 ret = 0;

>                 break;

>         }

> @@ -1721,6 +1706,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd,

>         genpd->max_off_time_changed = true;

>         genpd->provider = NULL;

>         genpd->has_provider = false;

> +       genpd->can_set_performance_state = !!genpd->set_performance_state;

>         genpd->domain.ops.runtime_suspend = genpd_runtime_suspend;

>         genpd->domain.ops.runtime_resume = genpd_runtime_resume;

>         genpd->domain.ops.prepare = pm_genpd_prepare;

> diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h

> index bf90177208a2..995d0cb1bc14 100644

> --- a/include/linux/pm_domain.h

> +++ b/include/linux/pm_domain.h

> @@ -64,6 +64,7 @@ struct generic_pm_domain {

>         unsigned int suspended_count;   /* System suspend device counter */

>         unsigned int prepared_count;    /* Suspend counter of prepared devices */

>         unsigned int performance_state; /* Max requested performance state */

> +       unsigned int can_set_performance_state; /* Number of parent domains supporting set state */

>         int (*power_off)(struct generic_pm_domain *domain);

>         int (*power_on)(struct generic_pm_domain *domain);

>         int (*get_performance_state)(struct device *dev, unsigned long rate);

>


Yes!

On top of that change, you could also add some validation if the
get/set callbacks is there are any constraints on how they must be
assigned.

[...]

>> That makes me wonder: what will happen if there are more than one

>> master having a ->set_perfomance_state() callback assigned? I guess

>> that is non-allowed configuration?

>

> This patch supports them at least. Device's domain can have multiple

> masters which require this configuration, but the same performance

> index will be used for all of them (for simplicity unless we have a

> real example to serve).


Okay!

[...]

>> What is a *parent* domain here?

>

> Same crappy wording I have been using. Its about device's genpd.

>

>> In genpd we try to use the terminology of master- and sub-domains.

>> Could you re-phrase this to get some clarity on what you try to

>> explain from the above?

>

> Yeah, sure.

>

> So do we call a device's power domain as its master domain? I thought

> that master is only used in context of sub-domains, right ?


Correct. A master domain should be used in context of sub-domain. In
most cases we also use "PM domain" instead of "power domain" is it has
a slightly different meaning.

>>

>> From a locking point of view we always traverse the topology from

>> bottom an up. In other words, we walk the genpd's ->slave_links, and

>> lock the masters in the order the are defined via the slave_links

>> list. The order is important to avoid deadlocks. I don't think you

>> should walk the master_links as being done above, especially not

>> without using locks.

>

> So we need to look at the performance states of the subdomains of a

> master. The way it is done in this patch with help of

> link->performance_state, we don't need that locking while traversing

> the master_links list. Here is how:

>

> - Master's (genpd) master_links list is only updated under master's

>   lock, which we have already taken here. So master_links list can't

>   get updated concurrently.

>

> - The link->performance_state field of a subdomain (or slave) is only

>   updated from within the master's lock. And we are reading it here

>   from the same lock.

>

> AFAIU, there shouldn't be any deadlocks or locking issues here. Can

> you describe some case that may blow up ?


My main concern is the order of how you take the looks. We never take
a masters lock before the current domain lock.

And when walking the topology, we use the slave links and locks the
first master from that list. Continues with that tree, then get back
to slave list and pick the next master.

If you change that order, we could end getting deadlocks.

[...]

>> A general comment is that I think you should look more closely in the

>> code of genpd_power_off|on(). And also how it calls the

>> ->power_on|off() callbacks.

>>

>> Depending whether you want to update the performance state of the

>> master domain before the subdomain or the opposite, you will find one

>> of them being suited for this case as well.

>

> Isn't it very much similar to that already ? The only major difference

> is link->performance_state and I already explained why is it required

> to be done that way to avoid deadlocks.


No, because you walk the master lists. Thus getting a different order or locks.

I did some drawing of this, using the slave links, and I don't see any
issues why you can't use that instead.

[...]

>> > +{

>> > +       struct generic_pm_domain_data *gpd_data;

>> > +       int ret;

>> > +

>> > +       spin_lock_irq(&dev->power.lock);

>>

>> Actually there is no need to use this lock.

>>

>> Because you hold the genpd lock here, then the device can't be removed

>> from its genpd and thus there is always a valid gpd_data.

>

> I am afraid we still need this lock.

>

> genpd_free_dev_data() is called from genpd_remove_device() without

> genpd lock and so it is possible that we reach here after that lock is

> dropped from genpd_remove_device() but before genpd_free_dev_data() is

> called.

>

> Right ?


I must had something else in mind, because you are absolutely correct.

Sorry for the noise.

[...]

Kind regards
Uffe
Viresh Kumar Aug. 2, 2017, 8:21 a.m. | #3
On 31-07-17, 09:44, Viresh Kumar wrote:
> On 29-07-17, 10:24, Ulf Hansson wrote:

> > Let's invent a new genpd flag, GENPD_FLAG_PERF_STATE!

> > 

> > The creator of the genpd then needs to set this before calling

> > pm_genpd_init(). Similar as we are dealing with GENPD_FLAG_PM_CLK.

> > 

> > The requirement for GENPD_FLAG_PERF_STATES, is to have the

> > ->get_performance_state() assigned. This shall be verified during

> > pm_genpd_init().

> > 

> > The pm_genpd_has_performance_state() then only need to return true, in

> > cases the device's genpd has GENPD_FLAG_PERF_STATE set - else false.

> > 

> > Regarding ->set_performance_state(), let's just make it optional - and

> > when trying to set a new performance state, just walk the genpd

> > hierarchy, from bottom to up, then invoke the callback when it's

> > assigned.

> 

> Sounds good.


Actually, I don't think we need this flag at all. The presence of the
get_performance_state() callback itself can be used as a flag here instead of
defining a new one.

As with both, your above solution and my solution, we pretty much don't check
presence of set_performance_state() callbacks in the master hierarchy. If its
present, we call it, else nothing happens.

-- 
viresh

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 71c95ad808d5..d506be9ff1f7 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -466,6 +466,229 @@  static int genpd_dev_pm_qos_notifier(struct notifier_block *nb,
 	return NOTIFY_DONE;
 }
 
+/*
+ * Returns true if anyone in genpd's parent hierarchy has
+ * set_performance_state() set.
+ */
+static bool genpd_has_set_performance_state(struct generic_pm_domain *genpd)
+{
+	struct gpd_link *link;
+
+	if (genpd->set_performance_state)
+		return true;
+
+	list_for_each_entry(link, &genpd->slave_links, slave_node) {
+		if (genpd_has_set_performance_state(link->master))
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * pm_genpd_has_performance_state - Checks if power domain does performance
+ * state management.
+ *
+ * @dev: Device whose power domain is getting inquired.
+ *
+ * This must be called by the user drivers, before they start calling
+ * pm_genpd_update_performance_state(), to guarantee that all dependencies are
+ * met and the device's genpd supports performance states.
+ *
+ * It is assumed that the user driver guarantees that the genpd wouldn't be
+ * detached while this routine is getting called.
+ *
+ * Returns "true" if device's genpd supports performance states, "false"
+ * otherwise.
+ */
+bool pm_genpd_has_performance_state(struct device *dev)
+{
+	struct generic_pm_domain *genpd = genpd_lookup_dev(dev);
+
+	/* The parent domain must have set get_performance_state() */
+	if (!IS_ERR(genpd) && genpd->get_performance_state) {
+		if (genpd_has_set_performance_state(genpd))
+			return true;
+
+		/*
+		 * A genpd with ->get_performance_state() callback must also
+		 * allow setting performance state.
+		 */
+		dev_err(dev, "genpd doesn't support setting performance state\n");
+	}
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(pm_genpd_has_performance_state);
+
+/*
+ * Re-evaluate performance state of a power domain. Finds the highest requested
+ * performance state by the devices and subdomains within the power domain and
+ * then tries to change its performance state. If the power domain doesn't have
+ * a set_performance_state() callback, then we move the request to its parent
+ * power domain.
+ *
+ * Locking: Access (or update) to device's "pd_data->performance_state" field
+ * happens only with parent domain's lock held. Subdomains have their
+ * "genpd->performance_state" protected with their own lock (and they are the
+ * only user of this field) and their per-parent-domain
+ * "link->performance_state" field is protected with individual parent power
+ * domain's lock and is only accessed/updated with that lock held.
+ */
+static int genpd_update_performance_state(struct generic_pm_domain *genpd,
+					  int depth)
+{
+	struct generic_pm_domain_data *pd_data;
+	struct generic_pm_domain *master;
+	struct pm_domain_data *pdd;
+	unsigned int state = 0, prev;
+	struct gpd_link *link;
+	int ret;
+
+	/* Traverse all devices within the domain */
+	list_for_each_entry(pdd, &genpd->dev_list, list_node) {
+		pd_data = to_gpd_data(pdd);
+
+		if (pd_data->performance_state > state)
+			state = pd_data->performance_state;
+	}
+
+	/* Traverse all subdomains within the domain */
+	list_for_each_entry(link, &genpd->master_links, master_node) {
+		if (link->performance_state > state)
+			state = link->performance_state;
+	}
+
+	if (genpd->performance_state == state)
+		return 0;
+
+	if (genpd->set_performance_state) {
+		ret = genpd->set_performance_state(genpd, state);
+		if (!ret)
+			genpd->performance_state = state;
+
+		return ret;
+	}
+
+	/*
+	 * Not all domains support updating performance state. Move on to their
+	 * parent domains in that case.
+	 */
+	prev = genpd->performance_state;
+
+	list_for_each_entry(link, &genpd->slave_links, slave_node) {
+		master = link->master;
+
+		genpd_lock_nested(master, depth + 1);
+
+		link->performance_state = state;
+		ret = genpd_update_performance_state(master, depth + 1);
+		if (ret)
+			link->performance_state = prev;
+
+		genpd_unlock(master);
+
+		if (ret)
+			goto err;
+	}
+
+	/*
+	 * The parent domains are updated now, lets get genpd performance_state
+	 * in sync with those.
+	 */
+	genpd->performance_state = state;
+	return 0;
+
+err:
+	list_for_each_entry_continue_reverse(link, &genpd->slave_links,
+					     slave_node) {
+		master = link->master;
+
+		genpd_lock_nested(master, depth + 1);
+		link->performance_state = prev;
+		if (genpd_update_performance_state(master, depth + 1))
+			pr_err("%s: Failed to roll back to %d performance state\n",
+			       genpd->name, prev);
+		genpd_unlock(master);
+	}
+
+	return ret;
+}
+
+static int __dev_update_performance_state(struct device *dev, int state)
+{
+	struct generic_pm_domain_data *gpd_data;
+	int ret;
+
+	spin_lock_irq(&dev->power.lock);
+
+	if (!dev->power.subsys_data || !dev->power.subsys_data->domain_data) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	gpd_data = to_gpd_data(dev->power.subsys_data->domain_data);
+
+	ret = gpd_data->performance_state;
+	gpd_data->performance_state = state;
+
+unlock:
+	spin_unlock_irq(&dev->power.lock);
+
+	return ret;
+}
+
+/**
+ * pm_genpd_update_performance_state - Update performance state of device's
+ * parent power domain for the target frequency for the device.
+ *
+ * @dev: Device for which the performance-state needs to be adjusted.
+ * @rate: Device's next frequency. This can be set as 0 when the device doesn't
+ * have any performance state constraints left (And so the device wouldn't
+ * participate anymore to find the target performance state of the genpd).
+ *
+ * This must be called by the user drivers (as many times as they want) only
+ * after pm_genpd_has_performance_state() is called (only once) and that
+ * returned "true".
+ *
+ * It is assumed that the user driver guarantees that the genpd wouldn't be
+ * detached while this routine is getting called.
+ *
+ * Returns 0 on success and negative error values on failures.
+ */
+int pm_genpd_update_performance_state(struct device *dev, unsigned long rate)
+{
+	struct generic_pm_domain *genpd = dev_to_genpd(dev);
+	int ret, state;
+
+	if (IS_ERR(genpd))
+		return -ENODEV;
+
+	genpd_lock(genpd);
+
+	state = genpd->get_performance_state(dev, rate);
+	if (state < 0) {
+		ret = state;
+		goto unlock;
+	}
+
+	state = __dev_update_performance_state(dev, state);
+	if (state < 0) {
+		ret = state;
+		goto unlock;
+	}
+
+	ret = genpd_update_performance_state(genpd, 0);
+	if (ret)
+		__dev_update_performance_state(dev, state);
+
+unlock:
+	genpd_unlock(genpd);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(pm_genpd_update_performance_state);
+
 /**
  * genpd_power_off_work_fn - Power off PM domain whose subdomain count is 0.
  * @work: Work structure used for scheduling the execution of this function.
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index b7803a251044..bf90177208a2 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -63,8 +63,12 @@  struct generic_pm_domain {
 	unsigned int device_count;	/* Number of devices */
 	unsigned int suspended_count;	/* System suspend device counter */
 	unsigned int prepared_count;	/* Suspend counter of prepared devices */
+	unsigned int performance_state;	/* Max requested performance state */
 	int (*power_off)(struct generic_pm_domain *domain);
 	int (*power_on)(struct generic_pm_domain *domain);
+	int (*get_performance_state)(struct device *dev, unsigned long rate);
+	int (*set_performance_state)(struct generic_pm_domain *domain,
+				     unsigned int state);
 	struct gpd_dev_ops dev_ops;
 	s64 max_off_time_ns;	/* Maximum allowed "suspended" time. */
 	bool max_off_time_changed;
@@ -99,6 +103,9 @@  struct gpd_link {
 	struct list_head master_node;
 	struct generic_pm_domain *slave;
 	struct list_head slave_node;
+
+	/* Sub-domain's per-parent domain performance state */
+	unsigned int performance_state;
 };
 
 struct gpd_timing_data {
@@ -118,6 +125,7 @@  struct generic_pm_domain_data {
 	struct pm_domain_data base;
 	struct gpd_timing_data td;
 	struct notifier_block nb;
+	unsigned int performance_state;
 	void *data;
 };
 
@@ -148,6 +156,9 @@  extern int pm_genpd_remove(struct generic_pm_domain *genpd);
 
 extern struct dev_power_governor simple_qos_governor;
 extern struct dev_power_governor pm_domain_always_on_gov;
+extern bool pm_genpd_has_performance_state(struct device *dev);
+extern int pm_genpd_update_performance_state(struct device *dev,
+					     unsigned long rate);
 #else
 
 static inline struct generic_pm_domain_data *dev_gpd_data(struct device *dev)
@@ -185,6 +196,17 @@  static inline int pm_genpd_remove(struct generic_pm_domain *genpd)
 	return -ENOTSUPP;
 }
 
+static inline bool pm_genpd_has_performance_state(struct device *dev)
+{
+	return false;
+}
+
+static inline int pm_genpd_update_performance_state(struct device *dev,
+						    unsigned long rate)
+{
+	return -ENOTSUPP;
+}
+
 #define simple_qos_governor		(*(struct dev_power_governor *)(NULL))
 #define pm_domain_always_on_gov		(*(struct dev_power_governor *)(NULL))
 #endif