diff mbox

[V4,1/3] OPP: Redefine bindings to overcome shortcomings

Message ID d225e73f183e01fa0b71e4b9248b6a19a3f7d697.1430394884.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar April 30, 2015, 12:07 p.m. UTC
Current OPP (Operating performance point) DT bindings are proven to be
insufficient at multiple instances.

The shortcomings we are trying to solve here:

- Getting clock sharing information between CPUs. Single shared clock vs
  independent clock per core vs shared clock per cluster.

- Support for specifying current levels along with voltages.

- Support for multiple regulators.

- Support for turbo modes.

- Other per OPP settings: transition latencies, disabled status, etc.?

- Expandability of OPPs in future.

This patch introduces new bindings "operating-points-v2" to get these problems
solved. Refer to the bindings for more details.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
 1 file changed, 362 insertions(+), 4 deletions(-)

Comments

Viresh Kumar May 5, 2015, 10:48 a.m. UTC | #1
On 4 May 2015 at 17:42, Mark Brown <broonie@kernel.org> wrote:
> On Thu, Apr 30, 2015 at 05:37:59PM +0530, Viresh Kumar wrote:
>
>> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
>> +  regulators.
>> +
>> +  A single regulator's current is specified with an array of size one or three.
>> +  Single entry is for target current and three entries are for <target min max>
>> +  currents.
>
> What is this for - are you trying to define OPPs for current regulators?
> If you are that's worrying, I can't think of a sensible use case.  If
> that's not what's happening then the binding needs to be more specific
> about what this is supposed to mean.

Its another property of the same voltage regulator we are using in
opp-microvolt.
I hope that makes sense ?

And that's why I wrote this as well:

+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node. If few regulators don't provide
+  capability to configure current, then values for then should be marked as
+  zero.

i.e. this is optional but the voltage regulator isn't..

But, perhaps I need to write it with more clarity? Or this field doesn't make
sense ?

FWIW, its an outcome of this request from Stephen:

https://www.marc.info/?l=linux-arm-kernel&m=142370250522589&w=3
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar May 12, 2015, 5:16 a.m. UTC | #2
Hi Nishanth,

On 10-05-15, 20:02, Nishanth Menon wrote:
> On 04/30/2015 07:07 AM, Viresh Kumar wrote:

> > +the liberty of choosing these. These triplets are called Operating Performance
> 
> I suggest we dont call them as triplets either -> note, we have added
> clk transition latency per OPP, so they are not exactly triplets either.

Sure.

> > +Points aka OPPs. This document defines bindings for these OPPs applicable across
> > +wide range of devices. For illustration purpose, this document uses CPU as a
> > +device.
> > +
> > +
> 
> Would be good to point to operating-points (the legacy document) as
> well. It might be good to state that only one of the properties should
> be used for the device node OPP description.

Hmm, okay.

> > +Optional properties:
> > +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> > +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> > +  Missing property means devices have independent clock/voltage/current lines,
> > +  but they share OPP tables.
> 
> Is'nt this property redundant? by the fact that phandle is reused for
> cpu1 should indicate shared OPP table, correct?

There are two things we can share:
- OPP tables (And not the clock lines themselves). For example,
  consider a quad-core platform with independent clock/voltage lines
  for all CPUs but all the CPUs have same OPP table.

- Clock/voltage rails: This is described by the above property. I
  thought the examples should have made this clear, anything left
  there ?

> > +- turbo-mode: Marks the OPP to be used only for turbo modes.
> 
> Might be great to add some description for what "turbo mode" means here.

Okay.

> That said, flexibility of this approach is awesome, thanks for doing
> this, I believe we did have a discussion about "safe OPP" for thermal
> behavior -> now that we have phandles for OPPs, we  can give phandle
> pointer to the thermal framework. in addition, we can also use phandle
> for marking throttling information in thermal framework instead of
> indices as well. which allows a lot of flexibility.
> 
> in some systems, we'd have need to mark certain OPPs for entry into
> suspend(tegra?) or during shutdown (for example) - do we allow for such
> description as phandle pointer back to the OPP nodes (from cpu0 device)
> OR do we describe them as additional properties?

Haven't thought about it earlier. In case we need them, probably it
should go in the OPP table.

NOTE: Mike T. once told me that we shouldn't over-abuse the bindings
by putting every detail there. And I am not sure at all on how to draw
that line.

> > +- status: Marks the node enabled/disabled.
> 
> One question we'd probably want the new framework to address is the need
> for OPPs based on Efuse options - Am I correct in believing that the
> solution that iMX, and TI SoCs probably need is something similar to
> modifying the status to disabled? or do we have Vendor specfic modifier
> properties that would be allowed?

See PATCH 2/3 for that.
 
> One additional behavior need I noticed in various old threads is a need
> to restrict transition OPPs -> certain SoCs might have constraints on
> next OPP they can transition from certain OPPs in-order to achieve a new
> OPP. again, such behavior could be phandle lists OR properties that link
> the chain up together. Number of such needs did sound less, but still
> just thought to bring it up.

See 0/3 and 3/3 for that. Its present in my solution but Mike T. is
strictly against keeping that in OPP table. And so 3/3 may get Nak'd.

Thanks a lot for your reviews.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike Turquette May 12, 2015, 9:42 p.m. UTC | #3
Quoting Viresh Kumar (2015-04-30 05:07:59)
> Current OPP (Operating performance point) DT bindings are proven to be
> insufficient at multiple instances.

Hi Viresh,

Sorry for chiming in so late in the process. Also, sorry for the long
email. Lots of repetition below:

Why should this new binding exist? Is Devicetree really the right place
to put all of this data? If DT is the right place for some users, some
of the time ... is it always the right place for all users, all of the
time?

> 
> The shortcomings we are trying to solve here:
> 
> - Getting clock sharing information between CPUs. Single shared clock vs
>   independent clock per core vs shared clock per cluster.
> 
> - Support for specifying current levels along with voltages.
> 
> - Support for multiple regulators.
> 
> - Support for turbo modes.
> 
> - Other per OPP settings: transition latencies, disabled status, etc.?

I agree that operating-points-v1 does not do this stuff. There is a need
to express this data, but is DT the right place to do so? Why doesn't a
driver contain this data?

> 
> - Expandability of OPPs in future.

This point above gives me heartburn. It appears that this binding is not
meant to be "sub-classed" by vendor-specific compatible strings. Please
let me know if I'm wrong on that.

The problem with this approach is that every little problem that someone
has with the binding will have to be solved by changing the generic opp
binding. I do not see how this can scale given the complexity of data,
programming sequences and other stuff associated with operating points
and dvfs transitions.

> 
> This patch introduces new bindings "operating-points-v2" to get these problems
> solved. Refer to the bindings for more details.

I like the existing operating points binding. It is very simple. The
data can be entirely encoded in DT and then used by the driver for
simple dvfs use cases.

Maybe we should just stop there and keep it simple? If the data needed
to express an operating point or dvfs transition is very complex, why
use DT for that?

We went through the same issue with the clock bindings where some people
(myself included) wanted to use DT as a data-driven interface into the
kernel. Turns out DT isn't great for that. What we have now is more
scalable: clock providers (usually a clock generator IP block) get a
node in DT. Clock consumers reference that provider node plus an index
value which corresponds to a specific clock signal that the downstream
node consumes. That's it. The binding basically only creates connections
across devices using phandles.

All of the data about clock rates, parents, programming sequences,
register definitions and bitfields, supported operations, etc all belong
in drivers.

For very simple clock providers (such as a fixed-rate crystal) we do
have a dedicated binding that allows for all of the data to belong in DT
(just the clock name and rate is required). I think this is analogous to
the existing operating-points-v1 today, which works nicely for the
simple case.

For the complex case, why not just create a link between the device that
provides the operating points (e.g. a pmic, or system controller, or
clock provider, or whatever) and the device that consumes them (e.g.
cpufreq driver, gpu driver, io controller driver, etc)? Leave all of the
data in C.

> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  Documentation/devicetree/bindings/power/opp.txt | 366 +++++++++++++++++++++++-
>  1 file changed, 362 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> index 74499e5033fc..3b67a5c8d965 100644
> --- a/Documentation/devicetree/bindings/power/opp.txt
> +++ b/Documentation/devicetree/bindings/power/opp.txt
> @@ -1,8 +1,366 @@
> -* Generic OPP Interface
> +Generic OPP (Operating Performance Points) Bindings
> +----------------------------------------------------
>  
> -SoCs have a standard set of tuples consisting of frequency and
> -voltage pairs that the device will support per voltage domain. These
> -are called Operating Performance Points or OPPs.
> +Devices work at voltage-current-frequency triplets and some implementations have
> +the liberty of choosing these. These triplets are called Operating Performance
> +Points aka OPPs. This document defines bindings for these OPPs applicable across
> +wide range of devices. For illustration purpose, this document uses CPU as a
> +device.
> +
> +
> +* Property: operating-points-v2
> +
> +Devices supporting OPPs must set their "operating-points-v2" property with
> +phandle to a OPP descriptor in their DT node. The OPP core will use this phandle
> +to find the operating points for the device.
> +
> +
> +* OPP Descriptor Node
> +
> +This describes the OPPs belonging to a device. This node can have following
> +properties:
> +
> +Required properties:
> +- compatible: Allow OPPs to express their compatibility. It should be:
> +  "operating-points-v2".
> +- OPP nodes: One or more OPP nodes describing voltage-current-frequency
> +  triplets. Their name isn't significant but their phandle can be used to
> +  reference an OPP.
> +
> +Optional properties:
> +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> +  Missing property means devices have independent clock/voltage/current lines,
> +  but they share OPP tables.

What is the behavior of not setting 'shared-opp'? The same table is
re-used by multiple consumers/devices?

I think a provider/consumer model works better here. E.g. if we have 4
cpus that scale independently then there would be 4 opp providers, each
provider corresponding to the unique frequency and voltage domains per
cpu. If multiple cpu nodes consume the same opp phandle then the sharing
becomes implicit: those cpus are in the same frequency and power
domains.

This is how it works for other resources, such as specifying a clock or
regulator in DT. If two device nodes reference that same resource then
it clear that they are using the same physical hardware. Having just a
table of data in a node that does not clearly map onto hardware (or a
software abstraction that provides access to that hardware) is not as
nice IMO.

> +
> +
> +* OPP Node
> +
> +This defines voltage-current-frequency triplets along with other related
> +properties.
> +
> +Required properties:
> +- opp-khz: Frequency in kHz
> +
> +Optional properties:
> +- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's voltage is specified with an array of size one or three.
> +  Single entry is for target voltage and three entries are for <target min max>
> +  voltages.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node.
> +
> +- opp-microamp: current in micro Amperes. It can contain entries for multiple
> +  regulators.
> +
> +  A single regulator's current is specified with an array of size one or three.
> +  Single entry is for target current and three entries are for <target min max>
> +  currents.
> +
> +  Entries for multiple regulators must be present in the same order as
> +  regulators are specified in device's DT node. If few regulators don't provide
> +  capability to configure current, then values for then should be marked as
> +  zero.
> +
> +- clock-latency-ns: Specifies the maximum possible transition latency (in
> +  nanoseconds) for switching to this OPP from any other OPP.
> +- turbo-mode: Marks the OPP to be used only for turbo modes.
> +- status: Marks the node enabled/disabled.

s/clock/transition/

Scaling the regulator can take longer than the clock. Better to reflect
that this value is capturing the total transition time.

> +
> +Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
> +
> +/ {
> +       cpus {
> +               #address-cells = <1>;
> +               #size-cells = <0>;
> +
> +               cpu@0 {
> +                       compatible = "arm,cortex-a9";
> +                       reg = <0>;
> +                       next-level-cache = <&L2>;
> +                       clocks = <&clk_controller 0>;
> +                       clock-names = "cpu";
> +                       opp-supply = <&cpu_supply0>;
> +                       operating-points-v2 = <&cpu0_opp>;
> +               };
> +
> +               cpu@1 {
> +                       compatible = "arm,cortex-a9";
> +                       reg = <1>;
> +                       next-level-cache = <&L2>;
> +                       clocks = <&clk_controller 0>;
> +                       clock-names = "cpu";
> +                       opp-supply = <&cpu_supply0>;
> +                       operating-points-v2 = <&cpu0_opp>;
> +               };
> +       };
> +
> +       cpu0_opp: opp0 {
> +               compatible = "operating-points-v2";
> +               shared-opp;
> +
> +               entry00 {
> +                       opp-khz = <1000000>;
> +                       opp-microvolt = <970000 975000 985000>;
> +                       opp-microamp = <70000 75000 85000>;
> +                       clock-latency-ns = <300000>;
> +               };
> +               entry01 {
> +                       opp-khz = <1100000>;
> +                       opp-microvolt = <980000 1000000 1010000>;
> +                       opp-microamp = <80000 81000 82000>;
> +                       clock-latency-ns = <310000>;
> +               };
> +               entry02 {
> +                       opp-khz = <1200000>;
> +                       opp-microvolt = <1025000>;
> +                       clock-latency-ns = <290000>;
> +                       turbo-mode;
> +               };
> +       };
> +};

It seems wrong to me that the clock and supply data is owned by the cpu
node, and not the opp descriptor. Everything about the opp transition
should belong to a provider node. Then the cpu simply needs to consume
that via a phandle.

<snip>

> +Deprecated Bindings
> +-------------------
>  
>  Properties:
>  - operating-points: An array of 2-tuples items, and each item consists

I think we should keep these. They work for the simple case.

I know that DT bindings are notorious for bike shedding. But in this
case I'm not debating the color of the bike shed but instead questioning
whether we need the shed at all :-)

Regards,
Mike

> -- 
> 2.3.0.rc0.44.ga94655d
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar May 13, 2015, 5:05 a.m. UTC | #4
On 12-05-15, 11:04, Nishanth Menon wrote:
> Just curious -> is'nt it better to just have min<->max range? binding
> as it stands right now is open to interpretation as to what will be
> attempted and in what sequence? with a valid min, target or max -
> is'nt it more power efficient always to go for a "min" than a target?
> 
> Further, min<->max implies anywhere in that range and is more
> consistent with "regulator like" description.

It came out after some discussions on the list, you may want to go
through that.

https://lists.linaro.org/pipermail/linaro-kernel/2015-January/019844.html

> True. one option might be to allow for vendor specific property
> extensions - that will let vendors add in additional quirky data
> custom to their SoCs as needed.

Yeah, I am planning to support them.
Viresh Kumar May 13, 2015, 8:55 a.m. UTC | #5
On 12-05-15, 14:42, Michael Turquette wrote:
> Quoting Viresh Kumar (2015-04-30 05:07:59)

> Sorry for chiming in so late in the process. Also, sorry for the long
> email. Lots of repetition below:

You are always welcome..

> Why should this new binding exist?

The answer to this particular query is perhaps simple, i.e. we have
unsolved problems that we wanted to solve in a generic way.

But probably the bigger question is "Should we really put the OPPs
(new or old bindings) in DT".

Lets start by agreeing on what can be kept in DT. AFAIU, anything that
describes the device in a OS independent way. Like:
  - base address
  - irq line + parent
  - clocks, regulators, etc..
  - What about things like register information? That's not what we do
    normally, but why?

    Perhaps because we want to get rid of redundancy as much as
    possible.
    - An implementation of the same device is going to be same across
      all platforms (unless the platform has tweaked it). And so there
      is no fun passing the same register information from multiple DT
      files.
    - And so we better keep things like register information, bit
      field descriptions, etc. in driver itself.
    - For example consider a proprietary controller that has three
      registers A, B and C. Now the bitwise description/behavior of
      all the registers in perfectly same for every implementation but
      the order of these in memory varies per implementation.

      Now we can keep the offsets of these registers in either DT or C
      code. If only few implementations are using it, then we might
      keep it in C code, but if there are 10-20 implementations using
      it in order ABC or BAC or CAB, then it will probably be a good
      option to try keeping these offsets in DT.

So its not really that we just describe connectivity between
devices/nodes in DT, it can be anything related to the device.

What if OPPs were kept in C code instead ?
- In most of the cases (not all of course), we will end up replicating
  code for platforms. But yes we can still do it and this is already
  allowed if you really think your platform is special.
- (Copied from booting-without-of.txt):
  "It also makes it more flexible for board vendors to do minor
  hardware upgrades without significantly impacting the kernel code or
  cluttering it with special cases."

> Is Devicetree really the right place
> to put all of this data? If DT is the right place for some users, some
> of the time ... is it always the right place for all users, all of the
> time?

This is just an attempt to reuse generic code. And platforms are free
to choose DT or non-DT way for keeping this information.

For some users it might be a good place to keep it, while for others
it may not be. For example, if a platform really needs to do some stuff
from its platform code, then it is free to do so.

> > - Expandability of OPPs in future.
> 
> This point above gives me heartburn. It appears that this binding is not
> meant to be "sub-classed" by vendor-specific compatible strings. Please
> let me know if I'm wrong on that.
> 
> The problem with this approach is that every little problem that someone
> has with the binding will have to be solved by changing the generic opp
> binding. I do not see how this can scale given the complexity of data,
> programming sequences and other stuff associated with operating points
> and dvfs transitions.

(I thought about it a bit more after our offline discussion):

If there is something generic enough, which is required by multiple
platforms, then of course we can make additions to the bindings.

But I believe we can 'sub-class' this by vendor-specific compatible
strings as well.

What I told you earlier (in our offline discussion) was that it isn't
allowed to put driver specific strings here to just choose a driver
amongst few available at runtime. For example, choosing between
cpufreq-dt or arm_big_little or any platform driver.

But if a Vendor do need few bindings just for his platforms, then we
can have a compatible sting for that. As the compatible string now
will express device's compatibility.

> I like the existing operating points binding. It is very simple. The
> data can be entirely encoded in DT and then used by the driver for
> simple dvfs use cases.

The new ones are also simple :)

> Maybe we should just stop there and keep it simple? If the data needed
> to express an operating point or dvfs transition is very complex, why
> use DT for that?

Its simple but not complete. And I don't really agree that things are
way too complex here. In some cases, yes they can be.

> > +Optional properties:
> > +- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
> > +  switch their DVFS state together, i.e. they share clock/voltage/current lines.
> > +  Missing property means devices have independent clock/voltage/current lines,
> > +  but they share OPP tables.
> 
> What is the behavior of not setting 'shared-opp'? The same table is
> re-used by multiple consumers/devices?

Yes.

> I think a provider/consumer model works better here. E.g. if we have 4
> cpus that scale independently then there would be 4 opp providers, each
> provider corresponding to the unique frequency and voltage domains per
> cpu. If multiple cpu nodes consume the same opp phandle then the sharing
> becomes implicit: those cpus are in the same frequency and power
> domains.
> 
> This is how it works for other resources, such as specifying a clock or
> regulator in DT. If two device nodes reference that same resource then
> it clear that they are using the same physical hardware. Having just a
> table of data in a node that does not clearly map onto hardware (or a
> software abstraction that provides access to that hardware) is not as
> nice IMO.

I partially agree to what you are saying. It is written this way to
reduce redundancy in DT. For example, in a quad-core system with
independently scalable CPUs that have exactly same tables for all
CPUs, we do not want to replicate the same set of 20 OPPs, four times.

But perhaps we can improve it, based on your suggestion. The way I
have written it today:

/ {
	cpus {
		cpu@0 {
			operating-points-v2 = <&cpu_opp>;
		};

		cpu@1 {
			operating-points-v2 = <&cpu_opp>;
		};
	};

	cpu_opp: opp {
		compatible = "operating-points-v2";
		// shared-opp; Not an shared OPP

		entry00 {
			opp-khz = <1000000>;
		};
		entry01 {
			opp-khz = <1100000>;
		};
	};
};

And we can rewrite it as:

/ {
	cpus {
		cpu@0 {
			operating-points-v2 = <&opp0_provider>;
		};

		cpu@1 {
			operating-points-v2 = <&opp1_provider>;
		};
	};

        /* Table is shared between providers */
	opp_table: opp_table {
		entry00 {
			opp-khz = <1000000>;
		};
		entry01 {
			opp-khz = <1100000>;
		};
	};

	opp0_provider: opp0 {
		compatible = "operating-points-v2";
                opp_table = <&opp_table>;
	};

	opp1_provider: opp1 {
		compatible = "operating-points-v2";
                opp_table = <&opp_table>;
	};
};


But this is similar to what I proposed initially and people objected
to it.

> > +- clock-latency-ns: Specifies the maximum possible transition latency (in
> > +  nanoseconds) for switching to this OPP from any other OPP.
> > +- turbo-mode: Marks the OPP to be used only for turbo modes.
> > +- status: Marks the node enabled/disabled.
> 
> s/clock/transition/
> 
> Scaling the regulator can take longer than the clock. Better to reflect
> that this value is capturing the total transition time.

So this is how we are doing it today and I am not sure if we want to
get it from DT now.

	ret = regulator_set_voltage_time(cpu_reg, min_uV, max_uV);
	if (ret > 0)
		transition_latency += ret * 1000;

> It seems wrong to me that the clock and supply data is owned by the cpu
> node, and not the opp descriptor. Everything about the opp transition
> should belong to a provider node. Then the cpu simply needs to consume
> that via a phandle.

https://lists.linaro.org/pipermail/linaro-kernel/2014-December/019505.html

> > +Deprecated Bindings
> > +-------------------
> >  
> >  Properties:
> >  - operating-points: An array of 2-tuples items, and each item consists
> 
> I think we should keep these. They work for the simple case.

Hmm, maybe.
Mike Turquette May 14, 2015, 12:32 a.m. UTC | #6
Quoting Mark Brown (2015-05-13 04:03:57)
> On Wed, May 13, 2015 at 02:25:28PM +0530, Viresh Kumar wrote:
> > On 12-05-15, 14:42, Michael Turquette wrote:
> > > Quoting Viresh Kumar (2015-04-30 05:07:59)
> > 
> > > Why should this new binding exist?
> 
> > The answer to this particular query is perhaps simple, i.e. we have
> > unsolved problems that we wanted to solve in a generic way.
> 
> > But probably the bigger question is "Should we really put the OPPs
> > (new or old bindings) in DT".
> 
> And also is trying to do this in a completely generic manner the right
> way of going about things - do we really understand the problem area
> well enough to create a completely generic solution for all cases,
> bearing in mind that once things go into a DT binding they become an ABI?

No, we don't understand the problem space well enough to form an ABI.
And even if we did understand it perfectly today, the constraints and
expressions will change tomorrow.

Putting this stuff in C without any philosophical constraints on whether
or not we can change it later is the way to go.

Regards,
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar May 15, 2015, 2:15 p.m. UTC | #7
On 14-05-15, 03:25, Michael Turquette wrote:
> No, we don't understand the problem space well enough to form an ABI.

And why do you think so? We have been facing many problems since a
long time which we are trying to solve here.

I agree that it might not be right to try too many things which may
not be required later, but most of the things we have now in new
bindings are actually required.

> Putting this stuff in C without any philosophical constraints on whether
> or not we can change it later is the way to go.

I don't agree to that :)
Viresh Kumar May 20, 2015, 2:07 a.m. UTC | #8
On 19-05-15, 17:51, Stephen Boyd wrote:
> This looks like it covers my usecases. I'm not sure if I'm going to put
> all of this data in DT because of the philosophical question regarding
> what should be in DT and the technical decisions that some platforms I'm
> working on have made to encode parts of the OPPs in fuses, but at least
> the binding looks to support all the scenarios I have today.

Thanks.

> > +Required properties:
> > +- opp-khz: Frequency in kHz
> 
> Can this be in Hz? I thought there were some problems with kHz and Hz
> before where we wanted to make this into Hz so that rounding problems
> don't arise?

Yeah, good point.

> Also I wonder if all properties should be optional? I don't have this
> scenario today, but perhaps the frequencies could be encoded in fuses,
> but the voltages wouldn't be and so we might want to read out the
> frequencies for a fixed set of voltages. Of course, if there's nothing
> in the OPP node at all, it's not very useful, so perhaps some statement
> that at least one of the frequency/voltage/amperage properties should be
> present.

I am not sure. What we are trying to do (fill partially in DT and
partially in platform), is a trick and not the right use of bindings.

Ideally whatever is passed in DT should be complete by itself and
doesn't require platform to tweak it (which it can't). For example,
the cpufreq-dt driver will try to initialize OPPs from the DT directly
and wouldn't know about the platform tweaks. That can work eventually
as platform will add OPPs for the same bindings before cpufreq driver
will try to do, but that's a trick.

And then its all about frequency in the first place, and so marking
that optional looks wrong. Probably not the right use of these
bindings.

> > +  Entries for multiple regulators must be present in the same order as
> > +  regulators are specified in device's DT node. If few regulators don't provide
> > +  capability to configure current, then values for then should be marked as
> 
> s/then/them/

Thanks. But this part is already re-written and doesn't have this
issue anymore.

> Please put some space between properties so they can be found quickly:

Sure, thanks.
Viresh Kumar May 21, 2015, 4:33 a.m. UTC | #9
On 20-05-15, 12:39, Stephen Boyd wrote:
> On 05/19/15 19:07, Viresh Kumar wrote:
> > On 19-05-15, 17:51, Stephen Boyd wrote:
> >
> >> Also I wonder if all properties should be optional? I don't have this
> >> scenario today, but perhaps the frequencies could be encoded in fuses,
> >> but the voltages wouldn't be and so we might want to read out the
> >> frequencies for a fixed set of voltages. Of course, if there's nothing
> >> in the OPP node at all, it's not very useful, so perhaps some statement
> >> that at least one of the frequency/voltage/amperage properties should be
> >> present.
> > I am not sure. What we are trying to do (fill partially in DT and
> > partially in platform), is a trick and not the right use of bindings.
> >
> > Ideally whatever is passed in DT should be complete by itself and
> > doesn't require platform to tweak it (which it can't). For example,
> > the cpufreq-dt driver will try to initialize OPPs from the DT directly
> > and wouldn't know about the platform tweaks. That can work eventually
> > as platform will add OPPs for the same bindings before cpufreq driver
> > will try to do, but that's a trick.
> >
> > And then its all about frequency in the first place, and so marking
> > that optional looks wrong. Probably not the right use of these
> > bindings.
> 
> Ok then I won't be using these bindings on any of the new platforms I
> have where half the data is in one place, and half in another. But for
> some of Krait based platforms I have they should be useable.

You are not the only one, I have seen other requests (even for the
existing bindings) to fill stuff partially in DT as they want freq to
come from bootloader.

@Rob: What do you suggest for such platforms? Let them keep (ab)using
old or new OPP DT bindings or create another binding to just pass on
freq table?

@Stephen: Can you please provide your feedback on the updated version
please. Thanks.
Viresh Kumar May 22, 2015, 2:04 p.m. UTC | #10
On 21-05-15, 01:02, Nishanth Menon wrote:
> This argues that clock is an input to the cpu, this is not in-correct,
> but, it could also be argued that OPP tables are clock dependent.
> 
> For example, with multiple clock source options that a device might
> choose to select from internally(by some means.. lets not just restrict
> ourselves to just CPUs here for a moment), the tables might be
> different. We can always debate that this then is the responsibility of
> the driver handling the description for that device and we might want
> possibility of vice versa as well - same OPP table used by different
> clock source selections as well.

@Rob: Any inputs ?
Viresh Kumar May 25, 2015, 11:59 a.m. UTC | #11
On 21-05-15, 10:03, Viresh Kumar wrote:
> On 20-05-15, 12:39, Stephen Boyd wrote:
> > On 05/19/15 19:07, Viresh Kumar wrote:
> > > On 19-05-15, 17:51, Stephen Boyd wrote:
> > >
> > >> Also I wonder if all properties should be optional? I don't have this
> > >> scenario today, but perhaps the frequencies could be encoded in fuses,
> > >> but the voltages wouldn't be and so we might want to read out the
> > >> frequencies for a fixed set of voltages. Of course, if there's nothing
> > >> in the OPP node at all, it's not very useful, so perhaps some statement
> > >> that at least one of the frequency/voltage/amperage properties should be
> > >> present.
> > > I am not sure. What we are trying to do (fill partially in DT and
> > > partially in platform), is a trick and not the right use of bindings.
> > >
> > > Ideally whatever is passed in DT should be complete by itself and
> > > doesn't require platform to tweak it (which it can't). For example,
> > > the cpufreq-dt driver will try to initialize OPPs from the DT directly
> > > and wouldn't know about the platform tweaks. That can work eventually
> > > as platform will add OPPs for the same bindings before cpufreq driver
> > > will try to do, but that's a trick.
> > >
> > > And then its all about frequency in the first place, and so marking
> > > that optional looks wrong. Probably not the right use of these
> > > bindings.
> > 
> > Ok then I won't be using these bindings on any of the new platforms I
> > have where half the data is in one place, and half in another. But for
> > some of Krait based platforms I have they should be useable.
> 
> You are not the only one, I have seen other requests (even for the
> existing bindings) to fill stuff partially in DT as they want freq to
> come from bootloader.
> 
> @Rob: What do you suggest for such platforms? Let them keep (ab)using
> old or new OPP DT bindings or create another binding to just pass on
> freq table?

Rob: Can you please comment on this one as well ? Thanks.
Viresh Kumar May 26, 2015, 5:25 a.m. UTC | #12
On 22-05-15, 12:42, Nishanth Menon wrote:
> Yes, ofcourse... it is always an option, we just tend to like to have
> all data in the same place if possible - DT is the preferred approach.

If we are done, may I send the next version incorporating your
suggestions ?
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
index 74499e5033fc..3b67a5c8d965 100644
--- a/Documentation/devicetree/bindings/power/opp.txt
+++ b/Documentation/devicetree/bindings/power/opp.txt
@@ -1,8 +1,366 @@ 
-* Generic OPP Interface
+Generic OPP (Operating Performance Points) Bindings
+----------------------------------------------------
 
-SoCs have a standard set of tuples consisting of frequency and
-voltage pairs that the device will support per voltage domain. These
-are called Operating Performance Points or OPPs.
+Devices work at voltage-current-frequency triplets and some implementations have
+the liberty of choosing these. These triplets are called Operating Performance
+Points aka OPPs. This document defines bindings for these OPPs applicable across
+wide range of devices. For illustration purpose, this document uses CPU as a
+device.
+
+
+* Property: operating-points-v2
+
+Devices supporting OPPs must set their "operating-points-v2" property with
+phandle to a OPP descriptor in their DT node. The OPP core will use this phandle
+to find the operating points for the device.
+
+
+* OPP Descriptor Node
+
+This describes the OPPs belonging to a device. This node can have following
+properties:
+
+Required properties:
+- compatible: Allow OPPs to express their compatibility. It should be:
+  "operating-points-v2".
+- OPP nodes: One or more OPP nodes describing voltage-current-frequency
+  triplets. Their name isn't significant but their phandle can be used to
+  reference an OPP.
+
+Optional properties:
+- shared-opp: Indicates that device nodes using this OPP descriptor's phandle
+  switch their DVFS state together, i.e. they share clock/voltage/current lines.
+  Missing property means devices have independent clock/voltage/current lines,
+  but they share OPP tables.
+
+
+* OPP Node
+
+This defines voltage-current-frequency triplets along with other related
+properties.
+
+Required properties:
+- opp-khz: Frequency in kHz
+
+Optional properties:
+- opp-microvolt: voltage in micro Volts. It can contain entries for multiple
+  regulators.
+
+  A single regulator's voltage is specified with an array of size one or three.
+  Single entry is for target voltage and three entries are for <target min max>
+  voltages.
+
+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node.
+
+- opp-microamp: current in micro Amperes. It can contain entries for multiple
+  regulators.
+
+  A single regulator's current is specified with an array of size one or three.
+  Single entry is for target current and three entries are for <target min max>
+  currents.
+
+  Entries for multiple regulators must be present in the same order as
+  regulators are specified in device's DT node. If few regulators don't provide
+  capability to configure current, then values for then should be marked as
+  zero.
+
+- clock-latency-ns: Specifies the maximum possible transition latency (in
+  nanoseconds) for switching to this OPP from any other OPP.
+- turbo-mode: Marks the OPP to be used only for turbo modes.
+- status: Marks the node enabled/disabled.
+
+Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "arm,cortex-a9";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+
+		cpu@1 {
+			compatible = "arm,cortex-a9";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+	};
+
+	cpu0_opp: opp0 {
+		compatible = "operating-points-v2";
+		shared-opp;
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000 75000 85000>;
+			clock-latency-ns = <300000>;
+		};
+		entry01 {
+			opp-khz = <1100000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000 81000 82000>;
+			clock-latency-ns = <310000>;
+		};
+		entry02 {
+			opp-khz = <1200000>;
+			opp-microvolt = <1025000>;
+			clock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states
+independently.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "qcom,krait";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+
+		cpu@1 {
+			compatible = "qcom,krait";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+
+		cpu@2 {
+			compatible = "qcom,krait";
+			reg = <2>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 2>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply2>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+
+		cpu@3 {
+			compatible = "qcom,krait";
+			reg = <3>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 3>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply3>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+	};
+
+	cpu0_opp: opp0 {
+		compatible = "operating-points-v2";
+
+		/*
+		 * Missing shared-opp property means CPUs switch DVFS states
+		 * independently.
+		 */
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000 75000 85000>;
+			clock-latency-ns = <300000>;
+		};
+		entry01 {
+			opp-khz = <1100000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000 81000 82000>;
+			clock-latency-ns = <310000>;
+		};
+		entry02 {
+			opp-khz = <1200000>;
+			opp-microvolt = <1025000>;
+			opp-microamp = <90000;
+			lock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch
+DVFS state together.
+
+/ {
+	cpus {
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			reg = <0>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cluster0_opp>;
+		};
+
+		cpu@1 {
+			compatible = "arm,cortex-a7";
+			reg = <1>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 0>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply0>;
+			operating-points-v2 = <&cluster0_opp>;
+		};
+
+		cpu@100 {
+			compatible = "arm,cortex-a15";
+			reg = <100>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cluster1_opp>;
+		};
+
+		cpu@101 {
+			compatible = "arm,cortex-a15";
+			reg = <101>;
+			next-level-cache = <&L2>;
+			clocks = <&clk_controller 1>;
+			clock-names = "cpu";
+			opp-supply = <&cpu_supply1>;
+			operating-points-v2 = <&cluster1_opp>;
+		};
+	};
+
+	cluster0_opp: opp0 {
+		compatible = "operating-points-v2";
+		shared-opp;
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000 975000 985000>;
+			opp-microamp = <70000 75000 85000>;
+			clock-latency-ns = <300000>;
+		};
+		entry01 {
+			opp-khz = <1100000>;
+			opp-microvolt = <980000 1000000 1010000>;
+			opp-microamp = <80000 81000 82000>;
+			clock-latency-ns = <310000>;
+		};
+		entry02 {
+			opp-khz = <1200000>;
+			opp-microvolt = <1025000>;
+			opp-microamp = <90000>;
+			clock-latency-ns = <290000>;
+			turbo-mode;
+		};
+	};
+
+	cluster1_opp: opp1 {
+		compatible = "operating-points-v2";
+		shared-opp;
+
+		entry10 {
+			opp-khz = <1300000>;
+			opp-microvolt = <1045000 1050000 1055000>;
+			opp-microamp = <95000 100000 105000>;
+			clock-latency-ns = <400000>;
+		};
+		entry11 {
+			opp-khz = <1400000>;
+			opp-microvolt = <1075000>;
+			opp-microamp = <100000>;
+			clock-latency-ns = <400000>;
+		};
+		entry12 {
+			opp-khz = <1500000>;
+			opp-microvolt = <1010000 1100000 1110000>;
+			opp-microamp = <95000 100000 105000>;
+			clock-latency-ns = <400000>;
+			turbo-mode;
+		};
+	};
+};
+
+Example 4: Handling multiple regulators
+
+/ {
+	cpus {
+		cpu@0 {
+			compatible = "arm,cortex-a7";
+			...
+
+			opp-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
+			operating-points-v2 = <&cpu0_opp>;
+		};
+	};
+
+	cpu0_opp: opp0 {
+		compatible = "operating-points-v2";
+		shared-opp;
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000>, /* Supply 0 */
+					<960000>, /* Supply 1 */
+					<960000>; /* Supply 2 */
+			opp-microamp =  <70000>,  /* Supply 0 */
+					<70000>,  /* Supply 1 */
+					<70000>;  /* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+
+		/* OR */
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
+					<960000 965000 975000>, /* Supply 1 */
+					<960000 965000 975000>; /* Supply 2 */
+			opp-microamp =  <70000  75000  85000>,  /* Supply 0 */
+					<70000  75000  85000>,  /* Supply 1 */
+					<70000  75000  85000>;  /* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+
+		/* OR */
+
+		entry00 {
+			opp-khz = <1000000>;
+			opp-microvolt = <970000 975000 985000>, /* Supply 0 */
+					<960000 965000 975000>, /* Supply 1 */
+					<960000 965000 975000>; /* Supply 2 */
+			opp-microamp =  <70000  75000  85000>,  /* Supply 0 */
+					<0	0      0>,	/* Supply 1 doesn't support current change */
+					<70000  75000  85000>;  /* Supply 2 */
+			clock-latency-ns = <300000>;
+		};
+	};
+};
+
+
+Deprecated Bindings
+-------------------
 
 Properties:
 - operating-points: An array of 2-tuples items, and each item consists