[V2,5/5] PM / Domains: Propagate performance state updates

Message ID 5a29ceb2704239a8efb4db2e47ca155a422a6e57.1543219386.git.viresh.kumar@linaro.org

Commit Message

Viresh Kumar Nov. 26, 2018, 8:10 a.m.
This commit updates genpd core to start propagating performance state
updates to master domains that have their set_performance_state()
callback set.

A genpd handles two types of performance states now. The first one is the
performance state requirement put on the genpd by the devices and
sub-domains under it, which is already represented by
genpd->performance_state. The second type, introduced in this commit, is
the performance state requirement(s) put by the genpd on its master
domain(s). There is a separate value required for each master that the
genpd has and so a new field is added to the struct gpd_link
(link->performance_state), which represents the link between a genpd and
its master. The struct gpd_link also got another field
prev_performance_state, which is used by genpd core as a temporary
variable during transitions.

We need to propagate the performance state setting while powering on a genpd
as well, as we ignore performance state requirements from sub-domains
which are powered off. For this reason, _genpd_power_on() also receives
an additional parameter, depth, which is used for hierarchical locking
within genpd.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

 drivers/base/power/domain.c | 113 ++++++++++++++++++++++++++++++------
 include/linux/pm_domain.h   |   4 ++
 2 files changed, 98 insertions(+), 19 deletions(-)



Viresh Kumar Nov. 30, 2018, 11:06 a.m. | #1
On 30-11-18, 11:18, Ulf Hansson wrote:
> On Fri, 30 Nov 2018 at 10:59, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > Sure, but the ordering of locks is always subdomain first and then master.
> >
> > Considering the case of Qcom, we have two domains Cx (sub-domain) and Mx (master).
> >
> > On first genpd_power_on(Cx) call, we will first call genpd_power_on(Mx) which
> > will just power it on as none of its masters will have perf-state support. We
> > then call _genpd_power_on(Cx) which will also not do anything with Mx as its own
> > (Cx's) pstate would be 0 at that time. But even if it had a valid value, it will
> > propagate just fine with all proper locking in place.
>
> Can you explain that? It's not super easy to follow the flow.

Sorry, I somehow assumed you would know it already :)

> So what will happen if Cx has a value that needs to be propagated?
> What locks will be taken, and in what order?
>
> Following, what if we had a Bx domain, being the subdomain of Cx, and
> it too had a value that needs to be propagated.

Let's take the worst example: we have Bx (sub-domain of Cx), Cx (sub-domain of
Dx) and Dx (master). Normal power-on/off will always have the value 0, so let's
consider the resume sequence, where all the domains will have a valid pstate
value. And please forgive me for any bugs I have introduced in the following
super-complex sequence :)

genpd_runtime_resume(dev) //domain Bx
    -> genpd_lock(Bx)
    -> genpd_power_on(Bx)
        -> genpd_lock(Cx)
        -> genpd_power_on(Cx)
            -> genpd_lock(Dx)
            -> genpd_power_on(Dx)
                -> _genpd_power_on(Dx)
                    -> _genpd_set_performance_state(Dx, Dxstate) {
                        //Doesn't have any masters
                        -> genpd->set_performance_state(Dx, Dxstate);
            -> genpd_unlock(Dx)
            -> _genpd_power_on(Cx)
                -> _genpd_set_performance_state(Cx, Cxstate) {
                    //have one master, Dx
                    -> genpd_lock(Dx)
                        -> _genpd_set_performance_state(Dx, Dxstate) {
                            //Doesn't have any masters
                            -> genpd->set_performance_state(Dx, Dxstate);
                    -> genpd_unlock(Dx)
                    // Change Cx state
                    -> genpd->set_performance_state(Cx, Cxstate);
        -> genpd_unlock(Cx)
        -> _genpd_power_on(Bx)
            -> _genpd_set_performance_state(Bx, Bxstate) {
                //have one master, Cx
                -> genpd_lock(Cx)
                -> _genpd_set_performance_state(Cx, Cxstate) {
                    //have one master, Dx
                    -> genpd_lock(Dx)
                        -> _genpd_set_performance_state(Dx, Dxstate) {
                            //Doesn't have any masters
                            -> genpd->set_performance_state(Dx, Dxstate);
                    -> genpd_unlock(Dx)
                    // Change Cx state
                    -> genpd->set_performance_state(Cx, Cxstate);
                -> genpd_unlock(Cx)
                -> genpd->set_performance_state(Bx, Bxstate);
    -> genpd_unlock(Bx)

> It sounds like we will do the propagation one time per level. Is that
> really necessary, couldn't we just do it once, after the power on
> sequence has been completed?

That would require a BIG hack somewhere, wouldn't it? How would we know
when the time has come to fire the final sequence of
set_performance_state() calls? And where would we do it?
genpd_runtime_resume()? And then we would have more problems. For
example, Rajendra earlier compared this stuff to the clk framework,
where it is possible to call clk_set_rate() first and only then
clk_enable(); the same should be possible with genpd as well, i.e. set
the performance state first and only then enable the device/domain. And
so we need this right within genpd_power_on().

I know things are repetitive here, but that's the right way of doing it IMHO.
What do you say?

Ulf Hansson Dec. 5, 2018, 5:29 p.m. | #2
On Wed, 5 Dec 2018 at 07:42, Stephen Boyd <sboyd@kernel.org> wrote:

> Quoting Ulf Hansson (2018-12-03 05:38:35)
> > + Stephen, Mike, Graham
> >
> > On Fri, 30 Nov 2018 at 12:06, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > On 30-11-18, 11:18, Ulf Hansson wrote:

> > There is one big difference when comparing with clocks, which makes
> > this more difficult.
> >
> > That is, in dev_pm_genpd_set_performance_state(), we are *not* calling
> > the ->set_performance_state() callback of the genpd, unless the genpd
> > is already powered on. Instead, for that case, we are only aggregating
> > the performance state votes, deferring the invocation of
> > ->set_performance_state() until the genpd becomes powered on. In some
> > way this makes sense, but for clk_set_rate(), the clock's rate can
> > be changed no matter if the clock has been prepared/enabled or not.
> >
> > I recall we discussed this behavior of genpd while introducing the
> > performance states support to it. Reaching this point, introducing the
> > master-domain propagation of performance state votes, we may need to
> > re-consider the behavior, as there is evidently an overhead that grows
> > along with the hierarchy.
> >
> > As a matter of fact, what I think this boils down to is to consider
> > whether we want to temporarily drop the performance state vote for a
> > device from genpd's ->runtime_suspend() callback, and thus also restore
> > the vote from genpd's ->runtime_resume() callback. That's because this
> > is directly related to whether genpd should care about whether it's
> > powered on or off when calling ->set_performance_state(). We have had
> > discussions on LKML already around this topic. It seems like we need
> > to pick them up to reach a consensus before we can move forward with
> > this.
>
> What are we trying to gain consensus on exactly?

Apologies for not being clearer. Let me re-phrase the question.

Should genpd (via genpd's internal runtime suspend callback) reset the
performance state for a device to zero, when that device becomes
runtime suspended?

Of course, if such a reset is done, then genpd should also re-compute
the aggregation of the performance states for the corresponding PM
domain (and its master domains). Consequently, these operations then
need to be reversed when the device in question becomes runtime
resumed.

> I've been thinking that we would want there to be a performance state
> 'target' for the 'devices are runtime suspended' state that we apply
> when the genpd is powered down from runtime PM suspend of all devices in
> the domain. This would then be overwritten with whatever the aggregated
> performance state is when the genpd is powered back on due to some
> device runtime resuming.

From the above, I assume "target" to not always be zero. That confirms
my understanding from earlier discussions.

That is sort of what we already have today. Consumer drivers are free
to change the performance state, via
dev_pm_genpd_set_performance_state() (or indirectly via
dev_pm_opp_set_rate()), for their device at any time. For example, a
driver may want to set a new performance state at runtime suspend.

> Don't we already sort of have support for this right now in genpd with
> commit fc5cbf0c94b6 ("PM / Domains: Support for multiple states")? So I
> would think that we either let that state_idx member be 0 or some
> implementation defined 'off' state index of the genpd that can be
> achieved.

Well, implementation details, but thanks for sharing your ideas.

> I was also thinking that the genpd_power_state was more than just a
> bunch of idle states of a device, so that we could combine
> on/off/idle/active (performance states?) into one genpd_power_state
> space that is affected by idle and on/off runtime PM status.
>
> In the Qualcomm use-case, the state of the clk, either on or off, is
> factored into the decision to consider the voltage constraint that the
> clk has on the CX domain. So when the clk is disabled, the voltage
> constraint on CX can be removed/ignored in the aggregation equation
> because the hardware isn't clocking. This needs to work properly so that
> devices can disable their clks and save power.
>
> I hope that we can move clk on/off/rate control to be implemented with
> clk domains based upon genpds, so that we can generically reason about
> genpds being on/off and at some performance state (i.e. a frequency)
> instead of making something clk specific in the genpd code. If we can do
> that, then this can be stated as a slave genpd (the clk domain) that's
> linked to a master genpd (the voltage/corner domain) can only affect the
> performance state requirement of the master genpd when the slave genpd
> is on. When the slave genpd is off, the performance state requirement is
> dropped and the master domain settles to a lower requirement based on
> the other domains linked to it that are on. The clk domain would turn on
> and off when the set of devices it is attached to are all suspended, and
> that would "do the right thing" by bubbling up the requirements to the
> master domain this way.

This sounds to me, like there are reasons to keep the existing
behavior of genpd, thus leaving the performance state as is during
runtime suspend/resume.

But, on the other hand, it also seems like a common case to reset the
performance state of a device to zero at runtime suspend. This means
that lots of boilerplate code needs to be added to drivers'
->runtime_suspend|resume() callbacks, calling
dev_pm_genpd_set_performance_state() or dev_pm_opp_set_rate().

Maybe we should just wait and see what happens to consumer drivers, as
they start deploying support for this.

Kind regards


diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index d6b389a9f678..206e263abe90 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -239,24 +239,88 @@  static void genpd_update_accounting(struct generic_pm_domain *genpd)
 static inline void genpd_update_accounting(struct generic_pm_domain *genpd) {}
+static int _genpd_reeval_performance_state(struct generic_pm_domain *genpd,
+					   unsigned int state, int depth);
 static int _genpd_set_performance_state(struct generic_pm_domain *genpd,
-					unsigned int state)
+					unsigned int state, int depth)
+	struct generic_pm_domain *master;
+	struct gpd_link *link;
+	unsigned int mstate;
 	int ret;
+	/* Propagate to masters of genpd */
+	list_for_each_entry(link, &genpd->slave_links, slave_node) {
+		master = link->master;
+		if (!master->set_performance_state)
+			continue;
+		if (unlikely(!state)) {
+			mstate = 0;
+		} else {
+			/* Find master's performance state */
+			mstate = dev_pm_opp_xlate_performance_state(genpd->opp_table,
+					master->opp_table, state);
+			if (unlikely(!mstate)) {
+				ret = -EINVAL;
+				goto err;
+			}
+		}
+		genpd_lock_nested(master, depth + 1);
+		link->prev_performance_state = link->performance_state;
+		link->performance_state = mstate;
+		ret = _genpd_reeval_performance_state(master, mstate, depth + 1);
+		if (ret)
+			link->performance_state = link->prev_performance_state;
+		genpd_unlock(master);
+		if (ret)
+			goto err;
+	}
 	ret = genpd->set_performance_state(genpd, state);
 	if (ret)
-		return ret;
+		goto err;
 	genpd->performance_state = state;
 	return 0;
+err:
+	/* Encountered an error, lets rollback */
+	list_for_each_entry_continue_reverse(link, &genpd->slave_links,
+					     slave_node) {
+		master = link->master;
+		if (!master->set_performance_state)
+			continue;
+		genpd_lock_nested(master, depth + 1);
+		mstate = link->prev_performance_state;
+		link->performance_state = mstate;
+		if (_genpd_reeval_performance_state(master, mstate, depth + 1)) {
+			pr_err("%s: Failed to roll back to %d performance state\n",
+			       master->name, mstate);
+		}
+		genpd_unlock(master);
+	}
+	return ret;
 static int _genpd_reeval_performance_state(struct generic_pm_domain *genpd,
-					   unsigned int state)
+					   unsigned int state, int depth)
 	struct generic_pm_domain_data *pd_data;
 	struct pm_domain_data *pdd;
+	struct gpd_link *link;
 	/* New requested state is same as Max requested state */
 	if (state == genpd->performance_state)
@@ -274,18 +338,27 @@  static int _genpd_reeval_performance_state(struct generic_pm_domain *genpd,
 			state = pd_data->performance_state;
-	if (state == genpd->performance_state)
-		return 0;
-	 * We aren't propagating performance state changes of a subdomain to its
-	 * masters as we don't have hardware that needs it. Over that, the
-	 * performance states of subdomain and its masters may not have
-	 * one-to-one mapping and would require additional information. We can
-	 * get back to this once we have hardware that needs it. For that
-	 * reason, we don't have to consider performance state of the subdomains
-	 * of genpd here.
+	 * Traverse all powered-on subdomains within the domain. This can be
+	 * done without any additional locking as the link->performance_state
+	 * field is protected by the master genpd->lock, which is already taken.
+	 *
+	 * Also note that link->performance_state (subdomain's performance state
+	 * requirement to master domain) is different from
+	 * link->slave->performance_state (current performance state requirement
+	 * of the devices/sub-domains of the subdomain) and so can have a
+	 * different value.
+	list_for_each_entry(link, &genpd->master_links, master_node) {
+		if (!genpd_status_on(link->slave))
+			continue;
+		if (link->performance_state > state)
+			state = link->performance_state;
+	}
+	if (state == genpd->performance_state)
+		return 0;
 	if (!genpd_status_on(genpd)) {
@@ -293,7 +366,7 @@  static int _genpd_reeval_performance_state(struct generic_pm_domain *genpd,
 		return 0;
-	return _genpd_set_performance_state(genpd, state);
+	return _genpd_set_performance_state(genpd, state, depth);
@@ -337,7 +410,7 @@  int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state)
 	prev = gpd_data->performance_state;
 	gpd_data->performance_state = state;
-	ret = _genpd_reeval_performance_state(genpd, state);
+	ret = _genpd_reeval_performance_state(genpd, state, 0);
 	if (ret)
 		gpd_data->performance_state = prev;
@@ -347,7 +420,8 @@  int dev_pm_genpd_set_performance_state(struct device *dev, unsigned int state)
-static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed)
+static int
+_genpd_power_on(struct generic_pm_domain *genpd, bool timed, int depth)
 	unsigned int state_idx = genpd->state_idx;
 	ktime_t time_start;
@@ -368,7 +442,8 @@  static int _genpd_power_on(struct generic_pm_domain *genpd, bool timed)
 	elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));
 	if (unlikely(genpd->set_performance_state)) {
-		ret = genpd->set_performance_state(genpd, genpd->performance_state);
+		ret = _genpd_set_performance_state(genpd,
+					genpd->performance_state, depth);
 		if (ret) {
 			pr_warn("%s: Failed to set performance state %d (%d)\n",
 				genpd->name, genpd->performance_state, ret);
@@ -558,7 +633,7 @@  static int genpd_power_on(struct generic_pm_domain *genpd, unsigned int depth)
-	ret = _genpd_power_on(genpd, true);
+	ret = _genpd_power_on(genpd, true, depth);
 	if (ret)
 		goto err;
@@ -963,7 +1038,7 @@  static void genpd_sync_power_on(struct generic_pm_domain *genpd, bool use_lock,
-	_genpd_power_on(genpd, false);
+	_genpd_power_on(genpd, false, depth);
 	genpd->status = GPD_STATE_ACTIVE;
diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h
index 9ad101362aef..dd364abb649a 100644
--- a/include/linux/pm_domain.h
+++ b/include/linux/pm_domain.h
@@ -136,6 +136,10 @@  struct gpd_link {
 	struct list_head master_node;
 	struct generic_pm_domain *slave;
 	struct list_head slave_node;
+	/* Sub-domain's per-master domain performance state */
+	unsigned int performance_state;
+	unsigned int prev_performance_state;
 struct gpd_timing_data {