mbox series

[V4,0/9] PM / Domains: Implement domain performance states

Message ID cover.1490001099.git.viresh.kumar@linaro.org
Headers show
Series PM / Domains: Implement domain performance states | expand

Message

Viresh Kumar March 20, 2017, 9:32 a.m. UTC
Hi,

The main feedback I got for the V3 series came from Kevin, who suggested
that we should reuse OPP tables for genpd devices as well, instead of
creating a new table type. And that's what this version is trying to do.

Some platforms have the capability to configure the performance state of
their power domains. The process of configuring the performance state is
pretty much platform dependent and we may need to work with a wide range
of configurables.  For some platforms, like Qcom, it can be a positive
integer value alone, while in other cases it can be voltage levels, etc.

The power-domain framework until now was only designed for the idle
state management of the device and this needs to change in order to
reuse the power-domain framework for active state management of the
devices.

This series adapts the genpd and OPP frameworks to allow OPP tables to
be used for the genpd devices as well.

The first 2 patches update the DT bindings of the power-domains and OPP
tables. And the other 7 patches implement the details in QoS, genpd and
OPP frameworks.

This is tested currently by hacking the kernel a bit with virtual
power-domains for the dual A15 exynos platform. The earlier version of
patches was also tested by Rajendra Nayak (Qcom) on *real* Qualcomm
hardware for which this work is getting done. And so this version should
work as well.

Here is sample DT and C code we need to write for platforms:

DT:
---

/ {
	domain_opp_table: opp_table0 {
		compatible = "operating-points-v2";

		opp@1 {
			domain-performance-state = <1>;
			opp-microvolt = <975000 970000 985000>;
		};
		opp@2 {
			domain-performance-state = <2>;
			opp-microvolt = <1075000 1000000 1085000>;
		};
	};

	foo_domain: power-controller@12340000 {
		compatible = "foo,power-controller";
		reg = <0x12340000 0x1000>;
		#power-domain-cells = <0>;
		operating-points-v2 = <&domain_opp_table>;
	}

	cpu0_opp_table: opp_table1 {
		compatible = "operating-points-v2";
		opp-shared;

		opp@1000000000 {
			opp-hz = /bits/ 64 <1000000000>;
			domain-performance-state = <1>;
		};
		opp@1100000000 {
			opp-hz = /bits/ 64 <1100000000>;
			domain-performance-state = <2>;
		};
		opp@1200000000 {
			opp-hz = /bits/ 64 <1200000000>;
			domain-performance-state = <2>;
		};
	};

	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		cpu@0 {
			compatible = "arm,cortex-a9";
			reg = <0>;
			clocks = <&clk_controller 0>;
			clock-names = "cpu";
			operating-points-v2 = <&cpu0_opp_table>;
			power-domains = <&foo_domain>;
		};
	};
};

Driver code:
------------

static int pd_performance(struct generic_pm_domain *domain, unsigned int state)
{
	struct dev_pm_opp *opp;

	opp = dev_pm_opp_find_dps(&domain->dev, state, true);

	/* Use OPP and state in platform specific way */

	return 0;
}

static const struct of_device_id pm_domain_of_match[] __initconst = {
       { .compatible = "foo,genpd", },
       { },
};

static int __init genpd_test_init(void)
{
       struct device *dev = get_cpu_device(0);
       struct device_node *np;
       const struct of_device_id *match;
       int n;
       int ret;

       for_each_matching_node_and_match(np, pm_domain_of_match, &match) {
               pd.name = kstrdup_const(strrchr(np->full_name, '/') + 1,
                               GFP_KERNEL);
               if (!pd.name) {
                       of_node_put(np);
                       return -ENOMEM;
               }

               pd.set_performance_state = pd_performance;

               pm_genpd_init(&pd, NULL, false);
               of_genpd_add_provider_simple(np, &pd);
       }

       ret = dev_pm_domain_attach(dev, false);

       return ret;
}

Pushed here as well:

https://git.linaro.org/people/viresh.kumar/linux.git/log/?h=opp/genpd-performance-state

V3->V4:
- Use OPP table for genpd devices as well.
- Add struct device to genpd, in order to reuse OPP infrastructure.
- Based over: https://marc.info/?l=linux-kernel&m=148972988002317&w=2
- Fixed examples in DT document to have voltage in target,min,max order.

V2->V3:
- Based over latest pm/linux-next
- Bindings and code are merged together
- Lots of updates in bindings
  - the performance-states node is present within the power-domain now,
    instead of its phandle.
  - performance-level property is replaced by "reg".
  - domain-performance-state property of the consumers contain an
    integer value now instead of phandle.
- Lots of updates to the code as well
  - Patch "PM / QOS: Add default case to the switch" is merged with
    other patches and the code is changed a bit as well.
  - Don't pass 'type' to dev_pm_qos_add_notifier(), rather handle all
    notifiers with a single list. A new patch is added for that.
  - The OPP framework patch can be applied now and has proper SoB from
    me.
  - Dropped "PM / domain: Save/restore performance state at runtime
    suspend/resume".
  - Drop all WARN().
  - Tested-by Rajendra nayak.

V1->V2:
- Based over latest pm/linux-next
- It is mostly a resend of what is sent earlier as this series hasn't
  got any reviews so far and Rafael suggested that its better I resend
  it.
- Only the 4/6 patch got an update, which was shared earlier as reply to
  V1 as well. It has got several fixes for taking care of power domain
  hierarchy, etc.

--
viresh


Viresh Kumar (9):
  PM / OPP: Allow OPP table to be used for power-domains
  PM / Domains: Use OPP tables for power-domains
  PM / QOS: Keep common notifier list for genpd constraints
  PM / QOS: Add DEV_PM_QOS_PERFORMANCE request
  PM / OPP: Add support to parse OPP table for power-domains
  PM / OPP: Add dev_pm_opp_find_dps() helper
  PM / domain: Register for PM QOS performance notifier
  PM / Domain: Add struct device to genpd
  PM / Domain: Add support to parse domain's OPP table

 Documentation/devicetree/bindings/opp/opp.txt      |  73 ++++++-
 .../devicetree/bindings/power/power_domain.txt     |  42 ++++
 Documentation/power/pm_qos_interface.txt           |   2 +-
 drivers/base/power/domain.c                        | 183 ++++++++++++++--
 drivers/base/power/opp/core.c                      | 229 +++++++++++++++++++--
 drivers/base/power/opp/debugfs.c                   |   9 +-
 drivers/base/power/opp/of.c                        |  80 ++++++-
 drivers/base/power/opp/opp.h                       |  14 ++
 drivers/base/power/qos.c                           |  36 +++-
 include/linux/pm_domain.h                          |   6 +
 include/linux/pm_opp.h                             |   8 +
 include/linux/pm_qos.h                             |  17 ++
 kernel/power/qos.c                                 |   2 +-
 13 files changed, 646 insertions(+), 55 deletions(-)

-- 
2.12.0.432.g71c3a4f4ba37

Comments

Viresh Kumar April 10, 2017, 9:25 a.m. UTC | #1
On 24-03-17, 10:44, Rob Herring wrote:
> On Mon, Mar 20, 2017 at 03:02:13PM +0530, Viresh Kumar wrote:

> > Power-domains need to express their active states in DT and what's

> > better than OPP table for that.

> > 

> > This patch allows power-domains to reuse OPP tables to express their

> > active states. The "opp-hz" property isn't a required property anymore

> > as power-domains may not always use them.

> 

> Then maybe you shouldn't be trying to make OPP table work here. At that 

> point you just need a table of voltage(s) per performance state?


Because that's what Kevin strongly recommended in the previous
versions.

@Kevin: Would you like to reply here ?

> > Add a new property "domain-performance-state", which will contain

> > positive integer values to represent performance levels of the

> > power-domains as described in this patch.

> 

> Why not reference the OPP entries from the domain:

> 

> performance-states = <&opp1>, <&opp2>;


Because that would require additional code in the OPP core to parse
these then. Right now it is quite straight forward with the bindings I
presented.

> Just thinking out loud, not saying that is what you should do. The 

> continual evolution of power (management) domain, idle state, and OPP 

> bindings is getting tiring.


I agree :)

-- 
viresh
Viresh Kumar April 12, 2017, 2:24 p.m. UTC | #2
On 20-03-17, 15:02, Viresh Kumar wrote:
> Hi,

> 

> The main feedback I got for the V3 series came from Kevin, who suggested

> that we should reuse OPP tables for genpd devices as well, instead of

> creating a new table type. And that's what this version is trying to do.

> 

> Some platforms have the capability to configure the performance state of

> their power domains. The process of configuring the performance state is

> pretty much platform dependent and we may need to work with a wide range

> of configurables.  For some platforms, like Qcom, it can be a positive

> integer value alone, while in other cases it can be voltage levels, etc.

> 

> The power-domain framework until now was only designed for the idle

> state management of the device and this needs to change in order to

> reuse the power-domain framework for active state management of the

> devices.


Hi Ulf/Kevin,

Over 3 weeks since the time this version was posted :(
Can we get some reviews of this stuff and decide on how we are
supposed to proceed on this ?

Its getting delayed a lot unnecessarily. Thanks.

-- 
viresh
Sudeep Holla April 12, 2017, 4:49 p.m. UTC | #3
On 20/03/17 09:32, Viresh Kumar wrote:
> Power-domains need to express their active states in DT and what's

> better than OPP table for that.

> 

> This patch allows power-domains to reuse OPP tables to express their

> active states. The "opp-hz" property isn't a required property anymore

> as power-domains may not always use them.

> 

> Add a new property "domain-performance-state", which will contain

> positive integer values to represent performance levels of the

> power-domains as described in this patch.

> 

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> ---

>  Documentation/devicetree/bindings/opp/opp.txt | 73 ++++++++++++++++++++++++++-

>  1 file changed, 71 insertions(+), 2 deletions(-)

> 

> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt

> index 63725498bd20..d0b95c9e1011 100644

> --- a/Documentation/devicetree/bindings/opp/opp.txt

> +++ b/Documentation/devicetree/bindings/opp/opp.txt

> @@ -76,10 +76,9 @@ This describes the OPPs belonging to a device. This node can have following

>  This defines voltage-current-frequency combinations along with other related

>  properties.

>  

> -Required properties:

> +Optional properties:

>  - opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer.

>  

> -Optional properties:

>  - opp-microvolt: voltage in micro Volts.

>  

>    A single regulator's voltage is specified with an array of size one or three.

> @@ -154,6 +153,19 @@ properties.

>  

>  - status: Marks the node enabled/disabled.

>  

> +- domain-performance-state: A positive integer value representing the minimum

> +  power-domain performance level required by the device for the OPP node. The


So the above definition is when this field in in the device node rather
than the OPP table entry, right ? For simplicity why not have the
properties named slightly different or just use phandle to an entry in
the device node for this purpose.

> +  The integer value '0' represents the lowest performance level and the higher

> +  values represent higher performance levels. 


needs to be changed as OPP table entry.

>  When present in the OPP table of a

> + power-domain, it represents the performance level of the domain. When present


again "performance level of the domain corresponding to that OPP entry"
on something similar

> +  in the OPP table of a normal device, it represents the performance level of


what do you mean by normal device ? needs description as that's
something new introduced here.

> +  the parent power-domain. The OPP table can contain the

> +  "domain-performance-state" property, only if the device node contains the

> +  "power-domains" or "#power-domain-cells" property. 


Why such a restriction ?

> The OPP nodes aren't

> +  allowed to contain the "domain-performance-state" property partially, i.e.

> +  Either all OPP nodes in the OPP table have the "domain-performance-state"

> +  property or none of them have it.

> +

>  Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.

>  

>  / {

> @@ -528,3 +540,60 @@ Example 5: opp-supported-hw

>  		};

>  	};

>  };

> +

> +Example 7: domain-Performance-state:

> +(example: For 1GHz require domain state 1 and for 1.1 & 1.2 GHz require state 2)

> +

> +/ {

> +	domain_opp_table: opp_table0 {

> +		compatible = "operating-points-v2";

> +

> +		opp@1 {

> +			domain-performance-state = <1>;

> +			opp-microvolt = <975000 970000 985000>;

> +		};

> +		opp@2 {

> +			domain-performance-state = <2>;

> +			opp-microvolt = <1075000 1000000 1085000>;

> +		};

> +	};

> +

> +	foo_domain: power-controller@12340000 {

> +		compatible = "foo,power-controller";

> +		reg = <0x12340000 0x1000>;

> +		#power-domain-cells = <0>;

> +		operating-points-v2 = <&domain_opp_table>;


How does it scale with power domain providers with multiple power domain ?

> +	}

> +

> +	cpu0_opp_table: opp_table1 {

> +		compatible = "operating-points-v2";

> +		opp-shared;

> +

> +		opp@1000000000 {

> +			opp-hz = /bits/ 64 <1000000000>;

> +			domain-performance-state = <1>;

> +		};

> +		opp@1100000000 {

> +			opp-hz = /bits/ 64 <1100000000>;

> +			domain-performance-state = <2>;

> +		};

> +		opp@1200000000 {

> +			opp-hz = /bits/ 64 <1200000000>;

> +			domain-performance-state = <2>;

> +		};

> +	};

> +

> +	cpus {

> +		#address-cells = <1>;

> +		#size-cells = <0>;

> +

> +		cpu@0 {

> +			compatible = "arm,cortex-a9";

> +			reg = <0>;

> +			clocks = <&clk_controller 0>;

> +			clock-names = "cpu";

> +			operating-points-v2 = <&cpu0_opp_table>;


Do we ignore operating-points-v2 above as this device/cpu node contains
power domain which has operating-points-v2 property ? In other words
how do they correlate ?

> +			power-domains = <&foo_domain>;

> +		};

> +	};

> +};

> 


-- 
Regards,
Sudeep
Viresh Kumar April 19, 2017, 11:47 a.m. UTC | #4
On 18-04-17, 17:01, Sudeep Holla wrote:
> Understood. I would incline towards reusing regulators we that's what is


It can be just a regulator, but it can be anything else as well. That
entity may have its own clock/volt/current tunables, etc.

> changed behind the scene. Calling this operating performance point

> is misleading and doesn't align well with existing specs/features.


Yeah, but there are no voltage levels available here and that doesn't
fit as a regulator then.

> Understood. We have exactly same thing with SCPI but it controls both

> frequency and voltage referred as operating points. In general, this OPP

> terminology is used in SCPI/ACPI/SCMI specifications as both frequency

> and voltage control. I am bit worried that this binding might introduce

> confusions on the definitions. But it can be reworded/renamed easily if

> required.


Yeah, so far we have been looking at OPPs as freq-voltage pairs ONLY
and that is changing. I am not sure if it going in the wrong
direction really. Without frequency also it is an operating point for
the domain. Isn't it?

-- 
viresh
Viresh Kumar April 20, 2017, 9:52 a.m. UTC | #5
On 20-04-17, 10:43, Sudeep Holla wrote:
> Just that the term performance is closely related to frequency, it needs

> to be explicit on what *exactly* it means. As it stands now,

> it can be used for OPP as I explain which controls both but as you

> clarify that's not what it's designed for.


We are talking about active states of a power domain here and
*performance* is the best word I got. And yes we can still have
frequency as a configurable here, just that current platforms don't
have it.

> I am not sure if choosing highest performance point makes it difficult

> to fit it in regulator framework. It could be some configuration.


I was just pointing out a difference :)

> Also IIUC the actual programming is done in the firmware in this case

> and I fail to see how that adds lot of platform code.


Oh I meant that for converting general regulator only cases to OPP. No
firmware was involved there.

-- 
viresh
Rajendra Nayak April 26, 2017, 4:32 a.m. UTC | #6
> On 17/04/17 06:27, Viresh Kumar wrote:

>> On 13-04-17, 14:42, Sudeep Holla wrote:

>>> What I was referring is about power domain provider with multiple power

>>> domains(simply #power-domain-cells=<1> case as explained in the

>>> power-domain specification.

>>

>> I am not sure if we should be looking to target such a situation for now, as

>> that would be like this:

>>

>> Device controlled by Domain A. Domain A itself is controlled by Domain B and

>> Domain C.

>>

> 

> No, may be I was not so clear. I am just referring a power controller

> that provides say 3 different power domains and are indexed 0 - 2.

> The consumer just passes the index along with the phandle for the power

> domain info.

> 

>> Though we will end up converting the domain-performance-state property to an

>> array if that is required in near future.

>>

> 

> OK, better to document that so that we know how to extend it. We have

> #power-domain-cells=<1> on Juno with SCPI.

> 

>>> Yes. To simplify what not we just have power-domain for a device and

>>> change state of that domain to change the performance of that device.

>>

>> Consider this case to understand what I have in Mind.

>>

>> The power domain have its states as A, B, C, D. There can be multiple devices

>> regulated by that domain and one of the devices have its power states as: A1,

>> A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3 and all these states have different

>> frequency/voltages.

>>

>> IOW, the devices can have regulators as well and may want to fine tune within

>> the domain performance-state.

>>

> 

> Understood. I would incline towards reusing regulators we that's what is

> changed behind the scene. Calling this operating performance point

> is misleading and doesn't align well with existing specs/features.


[]...
 
>>> If we are looking this power-domains with performance as just some

>>> *advanced regulators*, I don't like the complexity added.


+ Mark

I don;t see any public discussions on why we ruled out using regulators to
support this but maybe there were some offline discussions on this.

Mark, this is a long thread, so just summarizing here to give you the context.

At qualcomm, we have an external M3 core (running its own firmware) which controls
a few voltage rails (including AVS on those). The devices vote for the voltage levels
(or performance levels) they need by passing an integer value to the M3 (not actual
voltage values). Since that didn't fit well with the existing regulator apis it was
proposed we look at modeling these as powerdomain performance levels (and reuse genpd
framework) which is what this series from Viresh is about.

Since the discussion now is moving towards 'why not use regulator framework for this
instead of adding all the complexity with powerdomain performance levels since
these are regulators underneath', I looped you in so you can provide some feedback
on can these really be modeled as some *advanced regulators* with some apis to set some
regulator performance levels (instead of voltage levels).

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Mark Brown April 26, 2017, 1:55 p.m. UTC | #7
On Wed, Apr 26, 2017 at 10:02:39AM +0530, Rajendra Nayak wrote:
> > On 17/04/17 06:27, Viresh Kumar wrote:


> >>> If we are looking this power-domains with performance as just some

> >>> *advanced regulators*, I don't like the complexity added.


> + Mark


> I don;t see any public discussions on why we ruled out using regulators to

> support this but maybe there were some offline discussions on this.


> Mark, this is a long thread, so just summarizing here to give you the context.


> At qualcomm, we have an external M3 core (running its own firmware) which controls

> a few voltage rails (including AVS on those). The devices vote for the voltage levels

> (or performance levels) they need by passing an integer value to the M3 (not actual

> voltage values). Since that didn't fit well with the existing regulator apis it was


As I'm getting fed up of saying: if the values you are setting are not
voltages and do not behave like voltages then the hardware should not be
represented as a voltage regulator since if they are represented as
voltage regulators things will expect to be able to control them as
voltage regulators.  This hardware is quite clearly providing OPPs
directly, I would expect this to be handled in the OPP code somehow.
Mark Brown May 14, 2017, 9:55 a.m. UTC | #8
On Wed, May 03, 2017 at 12:21:54PM +0100, Sudeep Holla wrote:
> On 30/04/17 13:49, Mark Brown wrote:


> > DT doesn't much care about that though.


> No sure about that, may be doesn't care about the internals, but we need

> to care about interface, no ?


Well, we need to at least describe what's there - my point is that it's
no different to describing a piece of hardware, the fact that it happens
to be implemented as firmware doesn't really change the abstraction
level DT is operating at.