
[RFC,v2,2/2] Documentation: arm: define DT C-states bindings

Message ID 1390240079-6495-3-git-send-email-lorenzo.pieralisi@arm.com
State New

Commit Message

Lorenzo Pieralisi Jan. 20, 2014, 5:47 p.m. UTC
ARM based platforms implement a variety of power management schemes that
allow processors to enter low-power states at run time, known as C-states
in ACPI jargon. The parameters defining these C-states vary on a per-platform
basis, forcing the OS to hardcode the state parameters in platform-specific
static tables whose size grows with the number of platforms supported
in the kernel; this hampers device driver standardization.

Therefore, this patch aims to standardize C-state device tree bindings for
ARM platforms. The bindings define C-state parameters, including entry methods
and state latencies, to allow operating systems to retrieve the
configuration entries from the device tree and initialize the related
power management drivers. This paves the way for common kernel code
to deal with power states and removes the need for the static data found in
current and previous kernel versions.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 Documentation/devicetree/bindings/arm/c-states.txt | 774 +++++++++++++++++++++
 Documentation/devicetree/bindings/arm/cpus.txt     |  10 +
 2 files changed, 784 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/c-states.txt

Comments

Vincent Guittot Jan. 21, 2014, 11:16 a.m. UTC | #1
Hi Lorenzo,

On 20 January 2014 18:47, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> [...]
> diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
> new file mode 100644
> index 0000000..0b5617b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/c-states.txt
> @@ -0,0 +1,774 @@
> +==========================================
> +ARM C-states binding description
> +==========================================
> +
> +==========================================
> +1 - Introduction
> +==========================================
> +
> +ARM systems contain HW capable of managing power consumption dynamically,
> +where cores can be put in different low-power states (ranging from simple
> +wfi to power gating) according to OSPM policies. Borrowing concepts
> +from the ACPI specification[1], the CPU states representing the range of
> +dynamic states that a processor can enter at run-time, aka C-state, can be
> +specified through device tree bindings representing the parameters required to
> +enter/exit specific C-states on a given processor.
> +
> +The state an ARM CPU can be put into is loosely identified by one of the
> +following operating modes:
> +
> +- Running:
> +        # Processor core is executing instructions
> +
> +- Wait for Interrupt:
> +       # An ARM processor enters wait for interrupt (WFI) low power
> +         state by executing a wfi instruction. When a processor enters
> +         wfi state it disables most of the clocks while keeping the processor
> +         powered up. This state is standard on all ARM processors and it is
> +         defined as C1 in the remainder of this document.
> +

> +- Dormant:
> +       # Dormant mode is entered by executing wfi instructions and by sending
> +         platform specific commands to the platform power controller (coupled
> +         with processor specific SW/HW control sequences).
> +         In dormant mode, most of the processor control and debug logic is
> +         powered up but cache RAM can be put in retention state, providing

Based on your description, it's not clear to me what is on, what is
lost and what is powered down.
My understanding of the dormant mode that you described above is: the
cache is preserved (and especially the cache RAM) but the processor
state is lost (registers ...). Do I understand correctly?

What about retention mode, where the contents of the processor and cache
are preserved but the power consumption is reduced? It can be seen as
a special WFI mode which needs specific SW/HW control sequences, but I'm
not sure how to describe such a state with your proposal.

> +         additional power savings.
> +
> +- Sleep:
> +       # Sleep mode is entered by executing the wfi instruction and by sending
> +         platform specific commands to the platform power controller (coupled
> +         with processor specific SW/HW control sequences). In sleep mode, a
> +         processor and its caches are shutdown, the entire processor state is
> +         lost.
> +
> +Building on top of the previous processor modes, ARM platforms implement power
> +management schemes that allow an OS PM implementation to put the processor in
> +different CPU states (C-states). C-states parameters (eg latency) are
> +platform specific and need to be characterized with bindings that provide the
> +required information to OSPM code so that it can build the required tables and
> +use them at runtime.
> +
> +The device tree binding definition for ARM C-states is the subject of this
> +document.
> +

[snip]

> +
> +       - psci-power-state
> +               Usage: Required if entry-method property value is set to
> +                      "psci".
> +               Value type: <u32>
> +               Definition: power_state parameter to pass to the PSCI
> +                           suspend call to enter the C-state.

Why does PSCI get a dedicated field while vendor methods do not? Can't you
make that more generic?

> +
> +       - latency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: List of u32 values representing worst case latency
> +                           in microseconds required to enter and exit the
> +                           C-state, one value per OPP [2]. The list should
> +                           be specified in the same order as the operating
> +                           points property list of the cpu this state is
> +                           valid on.
> +                           If no OPP bindings are present, the latency value
> +                           is associated with the current OPP of CPUs in the
> +                           system.
> +

[snip]

Thanks
Vincent
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Jan. 21, 2014, 1:31 p.m. UTC | #2
On Tue, Jan 21, 2014 at 11:16:46AM +0000, Vincent Guittot wrote:

[...]

> > +- Dormant:
> > +       # Dormant mode is entered by executing wfi instructions and by sending
> > +         platform specific commands to the platform power controller (coupled
> > +         with processor specific SW/HW control sequences).
> > +         In dormant mode, most of the processor control and debug logic is
> > +         powered up but cache RAM can be put in retention state, providing
> 
> Based on your description, it's not clear to me what is on, what is
> lost and what is powered down.

Sorry, typo, "powered down", not powered up.

> My understanding of the dormant mode that you described above is: the
> cache is preserved (and especially the cache RAM) but the processor
> state is lost (registers ...). Do I understand correctly?

Yes.

> What about retention mode, where the contents of the processor and cache
> are preserved but the power consumption is reduced? It can be seen as
> a special WFI mode which needs specific SW/HW control sequences, but I'm
> not sure how to describe such a state with your proposal.

True, and I omitted that on purpose, both so that it can be debated and to
keep it simple (well, so to speak). Thanks for pointing that out.
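A hypothetical C summary of the modes as clarified in this exchange (WFI, dormant with the "powered down" correction, sleep, plus the retention case Vincent raises) may help pin down what each one loses. The enum, table and helper names are illustrative only and are not part of the proposed bindings.

```c
#include <stdbool.h>

/* Illustrative only: which context survives each CPU operating mode. */
enum cpu_mode { MODE_RUNNING, MODE_WFI, MODE_RETENTION, MODE_DORMANT, MODE_SLEEP };

struct mode_context {
	bool cpu_state_retained;   /* registers, control and debug logic */
	bool cache_retained;       /* cache RAM contents */
};

static const struct mode_context mode_ctx[] = {
	[MODE_RUNNING]   = { true,  true  },
	[MODE_WFI]       = { true,  true  },  /* clocks gated, core stays powered */
	[MODE_RETENTION] = { true,  true  },  /* reduced power, nothing lost */
	[MODE_DORMANT]   = { false, true  },  /* core state lost, caches retained */
	[MODE_SLEEP]     = { false, false },  /* core and caches shut down */
};

/* Does entering this mode require the OS to save/restore CPU context? */
bool needs_cpu_context_save(enum cpu_mode m)
{
	return !mode_ctx[m].cpu_state_retained;
}

/* Must dirty cache lines be cleaned to memory before entering this mode? */
bool needs_cache_clean(enum cpu_mode m)
{
	return !mode_ctx[m].cache_retained;
}
```

Retention and WFI look identical from the OS context point of view in this sketch; what distinguishes them is the entry sequence and the power saved, which is exactly the part the bindings would need to describe.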

The bindings allow a C-state to link to a power domain. Each device can
link itself to a power domain. Hence at least now we know what devices
are affected by a C-state (and by device I also mean arch timers, PMUs,
GIC, etc).

Now, retention vs. off. In theory we could link a device to a C-state
and define what mode would be that device on C-state entry, but honestly
it starts becoming looooots of data in the DT.

For instance, we could define for every device the max C-state index allowed
for the device context to be powered-up (or retained).

Or, find a way to describe it through the power domain specifier:

cache {
	power-domain = <&foo 0 &foo 1>;
	power-state = <1 0>;
};

which means that for power domain <&foo 0> the cache is retained (1 ==
retained, 0 == lost), while for power domain <&foo 1> the cache is lost.

I do not have a complete answer. This certainly adds complexity (but it is a
very complex problem, so...) and it is a bit horrible; ideas welcome.
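To make the power-domain/power-state pairing idea concrete, here is a minimal sketch of how OSPM code might consume it once parsed, assuming entry i of power-state applies to entry i of power-domain. The structures and the context_retained() helper are hypothetical; nothing like them exists in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of:
 *   power-domain = <&foo 0 &foo 1>;
 *   power-state = <1 0>;
 * states[i] is the device state (1 == retained, 0 == lost) when
 * domains[i] is powered down. */
struct pd_spec {
	uint32_t phandle;   /* phandle of the power controller (&foo) */
	uint32_t index;     /* power domain specifier */
};

struct device_pd_info {
	const struct pd_spec *domains;
	const uint32_t *states;   /* same length as domains */
	size_t count;
};

/* Returns true if the device context survives when a C-state powers down
 * power domain 'pd'. Devices not linked to 'pd' are assumed unaffected. */
bool context_retained(const struct device_pd_info *dev, struct pd_spec pd)
{
	for (size_t i = 0; i < dev->count; i++) {
		if (dev->domains[i].phandle == pd.phandle &&
		    dev->domains[i].index == pd.index)
			return dev->states[i] == 1;
	}
	return true;
}
```

The cost Lorenzo mentions shows up directly: every device needs a second array kept in step with its power-domain list.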

[...]

> > +       - psci-power-state
> > +               Usage: Required if entry-method property value is set to
> > +                      "psci".
> > +               Value type: <u32>
> > +               Definition: power_state parameter to pass to the PSCI
> > +                           suspend call to enter the C-state.
> 
> Why does PSCI get a dedicated field while vendor methods do not? Can't you
> make that more generic?

If anyone provides me with an example usage, why not. For now I know I
need that parameter for PSCI; I can name it differently, define it for PSCI
and leave it optional for other methods.
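For reference, the u32 carried by psci-power-state is opaque to the binding and is handed to the firmware's CPU_SUSPEND call. A minimal sketch of how such a value is composed follows. It assumes the original PSCI power_state layout (StateID in bits [15:0], StateType in bit [16] with 1 meaning power down, affinity level in bits [25:24]). The helper name is hypothetical, and the meaning of StateID is implementation defined.

```c
#include <stdint.h>

/* Assumed original PSCI power_state layout. */
#define PSCI_STATE_ID_MASK     0xffffu
#define PSCI_STATE_TYPE_SHIFT  16
#define PSCI_AFF_LEVEL_SHIFT   24
#define PSCI_AFF_LEVEL_MASK    0x3u

/* Hypothetical helper: pack the fields into a power_state value. */
uint32_t psci_make_power_state(uint32_t state_id, uint32_t powerdown,
			       uint32_t aff_level)
{
	return (state_id & PSCI_STATE_ID_MASK) |
	       ((powerdown & 0x1u) << PSCI_STATE_TYPE_SHIFT) |
	       ((aff_level & PSCI_AFF_LEVEL_MASK) << PSCI_AFF_LEVEL_SHIFT);
}
```

For example, a cluster-level power-down state with StateID 0 packs to 0x01010000. Because the value only makes sense to the PSCI firmware, a generic "method parameter" property would still need per-method interpretation.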

Thanks,
Lorenzo

Amit Kucheria Jan. 21, 2014, 2:35 p.m. UTC | #3
Hi Lorenzo,

On Mon, Jan 20, 2014 at 11:17 PM, Lorenzo Pieralisi
<lorenzo.pieralisi@arm.com> wrote:
> [...]
> diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt

s/c-states/idle-states?

While the term C-states is widely used when talking about idle states, as
you've noted, idle-states is still the correct generic term for them.

> new file mode 100644
> index 0000000..0b5617b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/c-states.txt
> @@ -0,0 +1,774 @@
> +==========================================
> +ARM C-states binding description
> +==========================================
> +
> +==========================================
> +1 - Introduction
> +==========================================
> +
> +ARM systems contain HW capable of managing power consumption dynamically,
> +where cores can be put in different low-power states (ranging from simple
> +wfi to power gating) according to OSPM policies. Borrowing concepts
> +from the ACPI specification[1], the CPU states representing the range of
> +dynamic states that a processor can enter at run-time, aka C-state, can be
> +specified through device tree bindings representing the parameters required to
> +enter/exit specific C-states on a given processor.
> +
> +The state an ARM CPU can be put into is loosely identified by one of the
> +following operating modes:
> +
> +- Running:
> +        # Processor core is executing instructions
> +
> +- Wait for Interrupt:
> +       # An ARM processor enters wait for interrupt (WFI) low power
> +         state by executing a wfi instruction. When a processor enters
> +         wfi state it disables most of the clocks while keeping the processor
> +         powered up. This state is standard on all ARM processors and it is
> +         defined as C1 in the remainder of this document.
> +
> +- Dormant:
> +       # Dormant mode is entered by executing wfi instructions and by sending
> +         platform specific commands to the platform power controller (coupled
> +         with processor specific SW/HW control sequences).
> +         In dormant mode, most of the processor control and debug logic is
> +         powered up but cache RAM can be put in retention state, providing
> +         additional power savings.
> +
> +- Sleep:
> +       # Sleep mode is entered by executing the wfi instruction and by sending
> +         platform specific commands to the platform power controller (coupled
> +         with processor specific SW/HW control sequences). In sleep mode, a
> +         processor and its caches are shutdown, the entire processor state is
> +         lost.
> +
> +Building on top of the previous processor modes, ARM platforms implement power

Nitpick: s/previous/above

> +management schemes that allow an OS PM implementation to put the processor in
> +different CPU states (C-states). C-states parameters (eg latency) are
> +platform specific and need to be characterized with bindings that provide the
> +required information to OSPM code so that it can build the required tables and
> +use them at runtime.
> +
> +The device tree binding definition for ARM C-states is the subject of this
> +document.
> +
> +===========================================
> +2 - cpu-power-states node
> +===========================================
> +
> +ARM processor C-states are defined within the cpu-power-states node, which is
> +a direct child of the cpus node and provides a container where the processor
> +states, defined as device tree nodes, are listed.
> +
> +- cpu-power-states node

What do you think of s/cpu-power-states/cpu-idle-states?

CPU Power management is more than just idle. Unless you have plans to
add more properties to the cpu-power-states node later.

> +
> +       Usage: Optional - On ARM systems, is a container of processor C-state
> +                         nodes. If the system does not provide CPU power
> +                         management capabilities or the processor just
> +                         supports WFI (C1 state) a cpu-power-states node is
> +                         not required.
> +
> +       Description: cpu-power-states node is a container node, where its
> +                    subnodes describe the CPU low-power C-states.
> +
> +       Node name must be "cpu-power-states".
> +
> +       The cpu-power-states node's parent node must be cpus node.
> +
> +       The cpu-power-states node's child nodes can be:
> +
> +       - one or more state nodes
> +
> +       Any other configuration is considered invalid.
> +
> +The nodes describing the C-states (state) can only be defined within the
> +cpu-power-states node.
> +
> +Any other configuration is considered invalid and therefore must be ignored.
> +
> +===========================================
> +2 - state node
> +===========================================
> +
> +A state node represents a C-state description and must be defined as follows:
> +
> +- state node
> +
> +       Description: must be child of either the cpu-power-states node or
> +                    a state node.
> +
> +       The state node name shall be "stateN", where N = {0, 1, ...} is
> +       the node number; state nodes which are siblings within a single common
> +       parent node must be given a unique and sequential N value, starting
> +       from 0.
> +
> +       A state node can contain state child nodes. Child nodes inherit
> +       properties from the parent state nodes that work as state
> +       properties aggregators (ie contain properties valid on all state
> +       nodes children).
> +
> +       A state node defines the following properties (either explicitly
> +       or by inheriting them from a parent node):
> +
> +       - compatible
> +               Usage: Required
> +               Value type: <stringlist>
> +               Definition: Must be "arm,cpu-power-state".
> +
> +       - index
> +               Usage: Required
> +               Value type: <u32>
> +               Definition: It represents C-state index, starting from 2 (index
> +                           0 represents the processor state "running" and
> +                           index 1 represents processor mode "WFI"; indexes 0
> +                           and 1 are standard ARM states that need not be
> +                           described).
> +
> +       - power-domain
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: List of phandle and power domain specifiers
> +                           as defined by bindings of power controller
> +                           specified by the phandle [3]. It represents the
> +                           power domains associated with the C-state. The
> +                           power domains list can be used by OSPM to
> +                           retrieve the devices belonging to the power
> +                           domains and carry out corresponding actions to
> +                           preserve functionality across power cycles
> +                           (ie context save/restore, cache flushing).
> +
> +       - entry-method
> +               Usage: Required
> +               Value type: <stringlist>
> +               Definition: Describes the method by which a CPU enters the
> +                           C-state. This property is required and must be one
> +                           of:
> +
> +                           - "psci"
> +                             ARM Standard firmware interface
> +
> +                           - "[vendor],[method]"
> +                             An implementation dependent string with
> +                             format "vendor,method", where vendor is a string
> +                             denoting the name of the manufacturer and
> +                             method is a string specifying the mechanism
> +                             used to enter the C-state.
> +
> +       - psci-power-state
> +               Usage: Required if entry-method property value is set to
> +                      "psci".
> +               Value type: <u32>
> +               Definition: power_state parameter to pass to the PSCI
> +                           suspend call to enter the C-state.
> +
> +       - latency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: List of u32 values representing worst case latency
> +                           in microseconds required to enter and exit the
> +                           C-state, one value per OPP [2]. The list should
> +                           be specified in the same order as the operating
> +                           points property list of the cpu this state is
> +                           valid on.
> +                           If no OPP bindings are present, the latency value
> +                           is associated with the current OPP of CPUs in the
> +                           system.

DT-newbie here. What would happen if a vendor does not characterise
the latency at each OPP? IOW, the table only contains latency values
for a subset of the OPPs.

> +
> +       - min-residency
> +               Usage: Required
> +               Value type: <prop-encoded-array>
> +               Definition: List of u32 values representing time in
> +                           microseconds required for the CPU to be in
> +                           the C-state to make up for the dynamic power
> +                           consumed to enter/exit the C-state in order to
> +                           break even in terms of power consumption compared
> +                           to C1 state (wfi), one value per-OPP [2].
> +                           This parameter depends on the operating conditions
> +                           (HW state) and must assume worst case scenario.
> +                           The list should be specified in the same order as
> +                           the operating points property list of the cpu this
> +                           state is valid on.
> +                           If no OPP bindings are present the min-residency
> +                           value is associated with the current OPP of CPUs
> +                           in the system.

Same question as latency above.

> +
> +===========================================
> +3 - Examples
> +===========================================
> +
> +Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):


s/clusters of clusters/clusters of cpus?
> +
> +pd_clusters: power-domain-clusters@80002000 {
> +       compatible = "arm,power-controller";
> +       reg = <0x0 0x80002000 0x0 0x1000>;
> +       #power-domain-cells = <1>;
> +       #address-cells = <2>;
> +       #size-cells = <2>;
> +
> +       pd_cores: power-domain-cores@80000000 {
> +               compatible = "arm,power-controller";
> +               reg = <0x0 0x80000000 0x0 0x1000>;
> +               #power-domain-cells = <1>;
> +       };
> +};
> +

<snip>
Lorenzo Pieralisi Jan. 21, 2014, 3:23 p.m. UTC | #4
Hi Amit,

On Tue, Jan 21, 2014 at 02:35:11PM +0000, Amit Kucheria wrote:
> Hi Lorenzo,
> 
> On Mon, Jan 20, 2014 at 11:17 PM, Lorenzo Pieralisi
> <lorenzo.pieralisi@arm.com> wrote:
> > [...]
> > diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
> 
> s/c-states/idle-states?
> 
> While the term C-states is widely used when talking about idle states, as
> you've noted, idle-states is still the correct generic term for them.

In ACPI, C-states are processor power states, so I think we can keep the
same nomenclature. I do not mind changing it though; more comments
below.

> > [...]
> > +Building on top of the previous processor modes, ARM platforms implement power
> 
> Nitpick: s/previous/above

Ok.

> > +management schemes that allow an OS PM implementation to put the processor in
> > +different CPU states (C-states). C-states parameters (eg latency) are
> > +platform specific and need to be characterized with bindings that provide the
> > +required information to OSPM code so that it can build the required tables and
> > +use them at runtime.
> > +
> > +The device tree binding definition for ARM C-states is the subject of this
> > +document.
> > +
> > +===========================================
> > +2 - cpu-power-states node
> > +===========================================
> > +
> > +ARM processor C-states are defined within the cpu-power-states node, which is
> > +a direct child of the cpus node and provides a container where the processor
> > +states, defined as device tree nodes, are listed.
> > +
> > +- cpu-power-states node
> 
> What do you think of s/cpu-power-states/cpu-idle-states?
> 
> CPU Power management is more than just idle. Unless you have plans to
> add more properties to the cpu-power-states node later.

OK, if by saying that CPU power management is more than just idle you
also mean managing power while the processor is running (i.e. DVFS), I think
you have a point. Again, I do not mind changing it, but keep in mind that
names stick: if we think we will require more information from these
bindings, and that's probably the case, we'd better stick to the current
naming scheme.

[...]

> > +       - latency
> > +               Usage: Required
> > +               Value type: <prop-encoded-array>
> > +               Definition: List of u32 values representing worst case latency
> > +                           in microseconds required to enter and exit the
> > +                           C-state, one value per OPP [2]. The list should
> > +                           be specified in the same order as the operating
> > +                           points property list of the cpu this state is
> > +                           valid on.
> > +                           If no OPP bindings are present, the latency value
> > +                           is associated with the current OPP of CPUs in the
> > +                           system.
> 
> DT-newbie here. What would happen if a vendor does not characterise
> the latency at each OPP? IOW, the table only contains latency values
> for a subset of the OPPs.

The bindings are explicit, so the kernel will barf. Adding a LUT to map
latencies to OPPs makes me cringe, so I would not change the current
bindings.
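A minimal sketch of the kind of check that makes the kernel "barf" on a partially characterized table, assuming the parser already knows how many entries the CPU's operating-points list has. The function names are hypothetical. The rule that a single value is expected when no OPPs are described is an assumption drawn from the "current OPP" wording of the binding.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for a per-OPP list (latency or min-residency):
 * exactly one value per OPP, or exactly one value if no OPPs exist. */
int check_per_opp_list(size_t n_values, size_t n_opps)
{
	if (n_opps == 0)
		return n_values == 1 ? 0 : -EINVAL;
	return n_values == n_opps ? 0 : -EINVAL;
}

/* Look up the value for OPP 'opp_idx'; the list must already have been
 * validated with check_per_opp_list(). */
uint32_t latency_for_opp(const uint32_t *latency, size_t n_opps,
			 size_t opp_idx)
{
	return n_opps == 0 ? latency[0] : latency[opp_idx];
}
```

Rejecting a short list outright keeps the lookup a plain array index, which is the simplicity argued for above.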

> > +       - min-residency
> > +               Usage: Required
> > +               Value type: <prop-encoded-array>
> > +               Definition: List of u32 values representing time in
> > +                           microseconds required for the CPU to be in
> > +                           the C-state to make up for the dynamic power
> > +                           consumed to enter/exit the C-state in order to
> > +                           break even in terms of power consumption compared
> > +                           to C1 state (wfi), one value per-OPP [2].
> > +                           This parameter depends on the operating conditions
> > +                           (HW state) and must assume worst case scenario.
> > +                           The list should be specified in the same order as
> > +                           the operating points property list of the cpu this
> > +                           state is valid on.
> > +                           If no OPP bindings are present the min-residency
> > +                           value is associated with the current OPP of CPUs
> > +                           in the system.
> 
> Same question as latency above.

Same opinion: I am not keen on adding further complexity. After all, if
some operating points are not characterized, either they should be
disabled or people do not care, in which case they can just as well add
estimated values for the respective latencies in the DT.
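To show how the per-OPP latency and min-residency values would be consumed, here is a simplified selection loop. This is a sketch in the spirit of a cpuidle governor, not the kernel's actual governor code. The struct layout and select_cstate() are hypothetical, and states[] is assumed to be sorted from shallowest to deepest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed view of one state node. */
struct cstate {
	uint32_t index;                 /* C-state index, from 2 upwards */
	const uint32_t *latency;        /* worst-case latency in us, per OPP */
	const uint32_t *min_residency;  /* break-even time in us, per OPP */
};

/* Pick the deepest described state whose min-residency fits the
 * predicted idle time and whose latency fits the constraint, at the
 * current OPP. Falls back to 1 (WFI) if no described state qualifies. */
uint32_t select_cstate(const struct cstate *states, size_t n, size_t opp,
		       uint32_t predicted_idle_us, uint32_t max_latency_us)
{
	uint32_t best = 1;

	for (size_t i = 0; i < n; i++) {
		if (states[i].min_residency[opp] > predicted_idle_us ||
		    states[i].latency[opp] > max_latency_us)
			continue;
		best = states[i].index;   /* later entries are deeper */
	}
	return best;
}
```

Every lookup indexes by the current OPP, so an estimated value is at least usable, while a missing one has no sensible default.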

> > +
> > +===========================================
> > +3 - Examples
> > +===========================================
> > +
> > +Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
> 
> 
>      ^^^^^^^^^^^^^
>                                                          clusters of cpus?

Well, I took it from an example where the topology was clusters of
clusters, but I removed the topology node and honestly it does not add
anything to the discussion, so I will reword it.

Thanks for having a look,
Lorenzo

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Jan. 22, 2014, 4:23 p.m. UTC | #5
Hi Mark,

On Wed, Jan 22, 2014 at 11:52:14AM +0000, Mark Brown wrote:
> On Tue, Jan 21, 2014 at 03:23:59PM +0000, Lorenzo Pieralisi wrote:
> > On Tue, Jan 21, 2014 at 02:35:11PM +0000, Amit Kucheria wrote:
> 
> > > DT-newbie here. What would happen if a vendor does not characterise
> > > the latency at each OPP? IOW, the table only contains latency values
> > > for a subset of the OPPs.
> 
> > The bindings are explicit, so the kernel will barf. Adding a LUT to map
> > latencies to OPPs makes me cringe, so I would not change the current
> > bindings.
> 
> Actually looking at the OPP binding I do wonder if it might not be
> better to have a v2/rich binding for them which is extensible - the fact
> that it's not possible to add additional information seems like an
> issue, this can't be the only thing anyone might want to add and lining
> up multiple tables is never fun.

I agree that the OPP bindings need improvement, and that's on the cards.
I am not really following what you mean by "extensible", though; I only
want to make sure that the C-state bindings do not become too complex.

Do you mean extending the OPP bindings to add, e.g., C-state information
there (or whatever piece of information is OPP-dependent)?
It seems a bit of a stretch, but I can think about that.

I think that C-state properties are better defined in the C-state
bindings; that was the idea, but I am open to suggestions.

Thank you!
Lorenzo

Lorenzo Pieralisi Jan. 22, 2014, 4:33 p.m. UTC | #6
On Wed, Jan 22, 2014 at 11:42:32AM +0000, Mark Brown wrote:
> On Tue, Jan 21, 2014 at 01:31:48PM +0000, Lorenzo Pieralisi wrote:
> > On Tue, Jan 21, 2014 at 11:16:46AM +0000, Vincent Guittot wrote:
> 
> > > > +       - psci-power-state
> > > > +               Usage: Required if entry-method property value is set to
> > > > +                      "psci".
> > > > +               Value type: <u32>
> > > > +               Definition: power_state parameter to pass to the PSCI
> > > > +                           suspend call to enter the C-state.
> 
> > > Why psci has got a dedicated field and not vendor methods ? can't you
> > > make that more generic ?
> 
> > If anyone provides me with an example usage, why not; for now I know I
> > need that parameter for PSCI. I can call it differently, define it for
> > PSCI and leave it as optional for other methods.
> 
> Would it not be sensible to define a PSCI binding that extends this and
> other bindings - ISTR some other properties getting scattered into
> bindings for it?

You mean adding the properties to the PSCI bindings instead of defining
them here? Let me think about this. I really reckon these are C-state
specific properties that belong here (but I do have to add a statement
related to PSCI, i.e. that the bindings require a PSCI node to be present
and valid). I will look into this.
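For reference, the rule as currently drafted (psci-power-state required only when entry-method is "psci", plus a valid PSCI node) could be checked along these lines. This is a hypothetical sketch: the parsed struct, the enum and the function name are illustrative, and entry methods other than "psci" are deliberately left unchecked.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed C-state node; field names are illustrative. */
struct cstate_node {
	const char *entry_method;  /* "entry-method" string, NULL if absent */
	bool has_psci_power_state; /* "psci-power-state" present? */
	uint32_t psci_power_state; /* power_state argument for the suspend call */
};

enum cstate_check {
	CSTATE_OK = 0,
	CSTATE_ERR_NO_METHOD,
	CSTATE_ERR_NO_PSCI_NODE,
	CSTATE_ERR_NO_POWER_STATE,
};

/*
 * Mirror the draft rule: psci-power-state is required if entry-method
 * is "psci", and a PSCI node must then be present and valid. Other
 * methods may carry the property optionally and are not checked here.
 */
enum cstate_check cstate_validate(const struct cstate_node *n,
				  bool psci_node_valid)
{
	if (!n->entry_method)
		return CSTATE_ERR_NO_METHOD;

	if (strcmp(n->entry_method, "psci") == 0) {
		if (!psci_node_valid)
			return CSTATE_ERR_NO_PSCI_NODE;
		if (!n->has_psci_power_state)
			return CSTATE_ERR_NO_POWER_STATE;
	}
	return CSTATE_OK;
}
```

Keeping the check keyed on the entry method is what would let the property stay optional for other methods, as suggested above.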

Thank you,
Lorenzo

Lorenzo Pieralisi Jan. 22, 2014, 7:20 p.m. UTC | #7
Hi Vincent,

On Tue, Jan 21, 2014 at 11:16:46AM +0000, Vincent Guittot wrote:
> Hi Lorenzo,
> 
> On 20 January 2014 18:47, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> > ARM based platforms implement a variety of power management schemes that
> > allow processors to enter at run-time low-power states, aka C-states
> > in ACPI jargon. The parameters defining these C-states vary on a per-platform
> > basis forcing the OS to hardcode the state parameters in platform
> > specific static tables whose size grows as the number of platforms supported
> > in the kernel increases and hampers device drivers standardization.
> >
> > Therefore, this patch aims at standardizing C-state device tree bindings for
> > ARM platforms. Bindings define C-state parameters inclusive of entry methods
> > and state latencies, to allow operating systems to retrieve the
> > configuration entries from the device tree and initialize the related
> > power management drivers, paving the way for common code in the kernel
> > to deal with power states and removing the need for static data in current
> > and previous kernel versions.
> >
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > ---
> >  Documentation/devicetree/bindings/arm/c-states.txt | 774 +++++++++++++++++++++
> >  Documentation/devicetree/bindings/arm/cpus.txt     |  10 +
> >  2 files changed, 784 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/arm/c-states.txt
> >
> > diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
> > new file mode 100644
> > index 0000000..0b5617b
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/c-states.txt
> > @@ -0,0 +1,774 @@
> > +==========================================
> > +ARM C-states binding description
> > +==========================================
> > +
> > +==========================================
> > +1 - Introduction
> > +==========================================
> > +
> > +ARM systems contain HW capable of managing power consumption dynamically,
> > +where cores can be put in different low-power states (ranging from simple
> > +wfi to power gating) according to OSPM policies. Borrowing concepts
> > +from the ACPI specification[1], the CPU states representing the range of
> > +dynamic states that a processor can enter at run-time, aka C-state, can be
> > +specified through device tree bindings representing the parameters required to
> > +enter/exit specific C-states on a given processor.
> > +
> > +The state an ARM CPU can be put into is loosely identified by one of the
> > +following operating modes:
> > +
> > +- Running:
> > +        # Processor core is executing instructions
> > +
> > +- Wait for Interrupt:
> > +       # An ARM processor enters wait for interrupt (WFI) low power
> > +         state by executing a wfi instruction. When a processor enters
> > +         wfi state it disables most of the clocks while keeping the processor
> > +         powered up. This state is standard on all ARM processors and it is
> > +         defined as C1 in the remainder of this document.
> > +
> 
> > +- Dormant:
> > +       # Dormant mode is entered by executing wfi instructions and by sending
> > +         platform specific commands to the platform power controller (coupled
> > +         with processor specific SW/HW control sequences).
> > +         In dormant mode, most of the processor control and debug logic is
> > +         powered up but cache RAM can be put in retention state, providing
> 
> Based on your description, it's not clear to me what is on, what is
> lost and what is powered down.
> My understanding of the dormant mode that you described above is: the
> cache is preserved (and especially the cache RAM) but the processor
> state is lost (registers ...). Do I understand correctly?
> 
> What about retention mode, where the contents of the processor and cache
> are preserved but the power consumption is reduced? It can be seen as
> a special wfi mode which needs specific SW/HW control sequences, but I'm
> not sure I understand how to describe such a state with your proposal.

I had an idea. To simplify things, I think that one possibility is to
add a parameter to the power domain specifier (platform specific, see
Tomasz bindings):

Documentation/devicetree/bindings/power/power_domain.txt

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/224928.html

to represent the behavior of the power controller when that state is
entered (i.e. cache RAM retention, cache shutdown or, in general, any
substate within a power domain). Since it is platform specific, and since
we are able to link caches to the power domain, the power controller will
actually define what happens to the cache when that state is entered
(basically, we use the additional parameter in the power domain specifier
to define a "substate" in that power domain), e.g.:

Example:

foo_power_controller {
	[...]
	/*
	 * first cell is register index, second one is the state index
	 * that in turn implies the state behavior - eg cache lost or
	 * retained
	 */
	#power-domain-cells = <2>;
};

l1-cache {
	[...]
	/*
	 * syntax: power-domains = list of power domain specifiers
		<[&power_domain_phandle register-index state],[&power_domain_phandle register-index state]>;
		The syntax is defined by the power controller du jour
		as described by Tomasz bindings
	*/
	power-domains =<&foo_power_controller 0 0 &foo_power_controller 0 1>;

};

and then

state0 {
	index = <2>;
	compatible = "arm,cpu-power-state";
	latency = <...>;
	/*
	 * This means that when the state is entered, the power
	 * controller should use register index 0 and state 0,
	 * whose meaning is power controller specific. Since for
	 * every component we declare its power domain(s) and
	 * states, we know which components are affected by the
	 * state entry. Given the cache node above and this
	 * phandle, the state implies that the cache is retained
	 * (register index == 0, state == 0).
	 */
	power-domain =<&foo_power_controller 0 0>;
};

state1 {
	index = <3>;
	compatible = "arm,cpu-power-state";
	latency = <...>;
	/*
	 * This means that when the state is entered, the power
	 * controller should use register index 0 and state 1,
	 * whose meaning is power controller specific. Since for
	 * every component we declare its power domain(s) and
	 * states, we know which components are affected by the
	 * state entry. Given the cache node above and this
	 * phandle, the state implies that the cache is lost
	 * (register index == 0, state == 1).
	 */
	power-domain =<&foo_power_controller 0 1>;
};

It is complex, but it is probably the cleanest way. It also leaves the
complexity to power controller implementations (if managed in the
kernel...), which actually makes sense because it is up to the power
controller to define the behavior of certain states.
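A rough model of how a power controller driver might consume the proposed third cell, assuming it is an opaque platform code (for instance 0 = cache retained, 1 = cache lost) rather than a state index. Everything here is hypothetical: the phandle is dropped because only one controller exists in the example, and unknown codes are conservatively treated as "lost".

```c
#include <stddef.h>
#include <stdint.h>

/* One power domain specifier after the phandle: <register-index state>. */
struct pd_spec {
	uint32_t reg_index;
	uint32_t state;
};

enum cache_effect { CACHE_UNAFFECTED, CACHE_RETAINED, CACHE_LOST };

/* Hypothetical foo_power_controller decoding of the platform code. */
static enum cache_effect foo_decode(uint32_t state)
{
	/* 0 means retained; any other code is treated as lost. */
	return state == 0 ? CACHE_RETAINED : CACHE_LOST;
}

/*
 * Effect on a cache of entering a C-state whose node carries the
 * specifier *entered: if the cache declares that exact specifier, the
 * controller decodes it; otherwise the state does not affect the cache.
 */
enum cache_effect cache_effect_on_entry(const struct pd_spec *cache_pds,
					size_t n, const struct pd_spec *entered)
{
	for (size_t i = 0; i < n; i++) {
		if (cache_pds[i].reg_index == entered->reg_index &&
		    cache_pds[i].state == entered->state)
			return foo_decode(entered->state);
	}
	return CACHE_UNAFFECTED;
}
```

The generic code only matches specifiers; all knowledge of what a code means stays in the controller's decode hook, which is the point of leaving the complexity to power controller implementations.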

All in all, it is just an idea, so feel free to scotch it; it is complex,
but we have to sort it out one way or another.

Vincent, Tomasz, anyone, thoughts?
Lorenzo

Vincent Guittot Jan. 24, 2014, 8:40 a.m. UTC | #8
On 22 January 2014 20:20, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> Hi Vincent,
>
> On Tue, Jan 21, 2014 at 11:16:46AM +0000, Vincent Guittot wrote:
>> Hi Lorenzo,
>>
>> On 20 January 2014 18:47, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
>> > ARM based platforms implement a variety of power management schemes that
>> > allow processors to enter at run-time low-power states, aka C-states
>> > in ACPI jargon. The parameters defining these C-states vary on a per-platform
>> > basis forcing the OS to hardcode the state parameters in platform
>> > specific static tables whose size grows as the number of platforms supported
>> > in the kernel increases and hampers device drivers standardization.
>> >
>> > Therefore, this patch aims at standardizing C-state device tree bindings for
>> > ARM platforms. Bindings define C-state parameters inclusive of entry methods
>> > and state latencies, to allow operating systems to retrieve the
>> > configuration entries from the device tree and initialize the related
>> > power management drivers, paving the way for common code in the kernel
>> > to deal with power states and removing the need for static data in current
>> > and previous kernel versions.
>> >
>> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> > ---
>> >  Documentation/devicetree/bindings/arm/c-states.txt | 774 +++++++++++++++++++++
>> >  Documentation/devicetree/bindings/arm/cpus.txt     |  10 +
>> >  2 files changed, 784 insertions(+)
>> >  create mode 100644 Documentation/devicetree/bindings/arm/c-states.txt
>> >
>> > diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
>> > new file mode 100644
>> > index 0000000..0b5617b
>> > --- /dev/null
>> > +++ b/Documentation/devicetree/bindings/arm/c-states.txt
>> > @@ -0,0 +1,774 @@
>> > +==========================================
>> > +ARM C-states binding description
>> > +==========================================
>> > +
>> > +==========================================
>> > +1 - Introduction
>> > +==========================================
>> > +
>> > +ARM systems contain HW capable of managing power consumption dynamically,
>> > +where cores can be put in different low-power states (ranging from simple
>> > +wfi to power gating) according to OSPM policies. Borrowing concepts
>> > +from the ACPI specification[1], the CPU states representing the range of
>> > +dynamic states that a processor can enter at run-time, aka C-state, can be
>> > +specified through device tree bindings representing the parameters required to
>> > +enter/exit specific C-states on a given processor.
>> > +
>> > +The state an ARM CPU can be put into is loosely identified by one of the
>> > +following operating modes:
>> > +
>> > +- Running:
>> > +        # Processor core is executing instructions
>> > +
>> > +- Wait for Interrupt:
>> > +       # An ARM processor enters wait for interrupt (WFI) low power
>> > +         state by executing a wfi instruction. When a processor enters
>> > +         wfi state it disables most of the clocks while keeping the processor
>> > +         powered up. This state is standard on all ARM processors and it is
>> > +         defined as C1 in the remainder of this document.
>> > +
>>
>> > +- Dormant:
>> > +       # Dormant mode is entered by executing wfi instructions and by sending
>> > +         platform specific commands to the platform power controller (coupled
>> > +         with processor specific SW/HW control sequences).
>> > +         In dormant mode, most of the processor control and debug logic is
>> > +         powered up but cache RAM can be put in retention state, providing
>>
>> Based on your description, it's not clear to me what is on, what is
>> lost and what is powered down.
>> My understanding of the dormant mode that you described above is: the
>> cache is preserved (and especially the cache RAM) but the processor
>> state is lost (registers ...). Do I understand correctly?
>>
>> What about retention mode, where the contents of the processor and cache
>> are preserved but the power consumption is reduced? It can be seen as
>> a special wfi mode which needs specific SW/HW control sequences, but I'm
>> not sure I understand how to describe such a state with your proposal.
>

Hi Lorenzo,

Sorry for the late reply,


> I had an idea. To simplify things, I think that one possibility is to
> add a parameter to the power domain specifier (platform specific, see
> Tomasz bindings):

We can't use a simple boolean state (on/off) for defining the
powerdomain state associated to a c-state so your proposal of being
able to add a parameter that will define the power domain state is
interesting.

>
> Documentation/devicetree/bindings/power/power_domain.txt
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/224928.html
>
> to represent, when that state is entered the behavior of the power
> controller (ie cache RAM retention or cache shutdown or in general any
> substate within a power domain). Since it is platform specific, and since
> we are able to link caches to the power domain, the power controller will
> actually define what happens to the cache when that state is entered
> (basically we use the power domain specifier additional parameter to define
> a "substate" in that power domain e.g.:
>
> Example:
>
> foo_power_controller {
>         [...]
>         /*
>          * first cell is register index, second one is the state index
>          * that in turn implies the state behavior - eg cache lost or
>          * retained
>          */
>         #power-domain-cells = <2>;
> };
>
> l1-cache {
>         [...]
>         /*
>          * syntax: power-domains = list of power domain specifiers
>                 <[&power_domain_phandle register-index state],[&power_domain_phandle register-index state]>;
>                 The syntax is defined by the power controller du jour
>                 as described by Tomasz bindings
>         */
>         power-domains =<&foo_power_controller 0 0 &foo_power_controller 0 1>;

Normally, power-domains describes a list of power domain specifiers
that are necessary for the l1-cache to at least retain its state so
I'm not sure I understand your example above.

If we take the example of a system that supports running, retention and
powerdown states described as states 0, 1 and 2 for the power domain, I
would have set the l1-cache like:
       power-domains =<&foo_power_controller 0 1>;

to say that its state is retained up to state 1.
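Under this reading, a device's specifier carries a retention limit and each C-state's specifier carries the power domain state it programs, so deciding whether a device keeps its context reduces to a comparison. A minimal sketch with hypothetical names; it assumes independent (non-nested) power domains, numbered 0 = running, 1 = retention, 2 = power down as above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Power domain states as numbered in the example above. */
enum { PD_RUNNING = 0, PD_RETENTION = 1, PD_OFF = 2 };

struct pd_spec {
	uint32_t reg_index; /* which domain of the controller */
	uint32_t state;     /* device: deepest state survived; C-state: target */
};

/*
 * Context survives iff the C-state targets a different domain or does
 * not take the device's domain deeper than the device's declared limit.
 * Nested domains are not modelled.
 */
bool context_retained(const struct pd_spec *dev, const struct pd_spec *cstate)
{
	if (dev->reg_index != cstate->reg_index)
		return true; /* this C-state does not touch the device's domain */
	return cstate->state <= dev->state;
}
```

With the l1-cache declaring <0 1>, the retention C-state (<0 1>) keeps the cache and the power-down C-state (<0 2>) loses it.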

Please look below; I have modified the rest of your example accordingly.

>
> };
>
> and then
>
> state0 {
>         index = <2>;
>         compatible = "arm,cpu-power-state";
>         latency = <...>;
>         /*
>          * This means that when the state is entered, the power
>          * controller should use register index 0 and state 0,
>          * whose meaning is power controller specific. Since we
>          * know all components affected (for every component
>          * we declare its power domain(s) and states so we
>          * know what components are affected by the state entry.
>          * Given the cache node above and this phandle, the state
>          * implies that the cache is retained, register index == 0 state == 0
>          */
>         power-domain =<&foo_power_controller 0 0>;

for retention state we need to set the power domain in state 1
        power-domain =<&foo_power_controller 0 1>;

> };
>
> state1 {
>         index = <3>;
>         compatible = "arm,cpu-power-state";
>         latency = <...>;
>         /*
>          * This means that when the state is entered, the power
>          * controller should use register index 0 and state 1,
>          * whose meaning is power controller specific. Since we
>          * know all components affected (for every component
>          * we declare its power domain(s) and states so we
>          * know what components are affected by the state entry.
>          * Given the cache node above and this phandle, the state
>          * implies that the cache is lost, register index == 0 state == 1
>          */
>         power-domain =<&foo_power_controller 0 1>;

for power down mode, we need to set the power domain in state 2
        power-domain =<&foo_power_controller 0 2>;


Vincent

> };
>
> It is complex but it is probably the cleanest way. And it leaves complexity
> to power controller implementations (if managed in the kernel....), which
> actually makes sense because it is up to power controller to define the
> behavior of certain states.
>
> All in all it is just an idea, feel free to scotch it, it is complex but
> we have to sort it out, one way or another.
>
> Vincent, Tomasz, anyone, thoughts ?
> Lorenzo
>
Lorenzo Pieralisi Jan. 24, 2014, 5:58 p.m. UTC | #9
Hi Vincent,

On Fri, Jan 24, 2014 at 08:40:40AM +0000, Vincent Guittot wrote:

[...]

> Hi Lorenzo,
> 
> Sorry for the late reply,
> 
> 
> > I had an idea. To simplify things, I think that one possibility is to
> > add a parameter to the power domain specifier (platform specific, see
> > Tomasz bindings):
> 
> We can't use a simple boolean state (on/off) for defining the
> powerdomain state associated to a c-state so your proposal of being
> able to add a parameter that will define the power domain state is
> interesting.
> 
> >
> > Documentation/devicetree/bindings/power/power_domain.txt
> >
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/224928.html
> >
> > to represent, when that state is entered the behavior of the power
> > controller (ie cache RAM retention or cache shutdown or in general any
> > substate within a power domain). Since it is platform specific, and since
> > we are able to link caches to the power domain, the power controller will
> > actually define what happens to the cache when that state is entered
> > (basically we use the power domain specifier additional parameter to define
> > a "substate" in that power domain e.g.:
> >
> > Example:
> >
> > foo_power_controller {
> >         [...]
> >         /*
> >          * first cell is register index, second one is the state index
> >          * that in turn implies the state behavior - eg cache lost or
> >          * retained
> >          */
> >         #power-domain-cells = <2>;
> > };
> >
> > l1-cache {
> >         [...]
> >         /*
> >          * syntax: power-domains = list of power domain specifiers
> >                 <[&power_domain_phandle register-index state],[&power_domain_phandle register-index state]>;
> >                 The syntax is defined by the power controller du jour
> >                 as described by Tomasz bindings
> >         */
> >         power-domains =<&foo_power_controller 0 0 &foo_power_controller 0 1>;
> 
> Normally, power-domains describes a list of power domain specifiers
> that are necessary for the l1-cache to at least retain its state so
> I'm not sure I understand your example above

> 
> If we take the example of a system that supports running, retention and
> powerdown states described as states 0, 1 and 2 for the power domain, I
> would have set the l1-cache like:
>        power-domains =<&foo_power_controller 0 1>;
> 
> for saying that the state is retained up to state 1
> 
> Please look below; I have modified the rest of your example accordingly
> 
> >
> > };
> >
> > and then
> >
> > state0 {
> >         index = <2>;
> >         compatible = "arm,cpu-power-state";
> >         latency = <...>;
> >         /*
> >          * This means that when the state is entered, the power
> >          * controller should use register index 0 and state 0,
> >          * whose meaning is power controller specific. Since we
> >          * know all components affected (for every component
> >          * we declare its power domain(s) and states so we
> >          * know what components are affected by the state entry.
> >          * Given the cache node above and this phandle, the state
> >          * implies that the cache is retained, register index == 0 state == 0
> >          */
> >         power-domain =<&foo_power_controller 0 0>;
> 
> for retention state we need to set the power domain in state 1
>         power-domain =<&foo_power_controller 0 1>;
> 
> > };
> >
> > state1 {
> >         index = <3>;
> >         compatible = "arm,cpu-power-state";
> >         latency = <...>;
> >         /*
> >          * This means that when the state is entered, the power
> >          * controller should use register index 0 and state 1,
> >          * whose meaning is power controller specific. Since we
> >          * know all components affected (for every component
> >          * we declare its power domain(s) and states so we
> >          * know what components are affected by the state entry.
> >          * Given the cache node above and this phandle, the state
> >          * implies that the cache is lost, register index == 0 state == 1
> >          */
> >         power-domain =<&foo_power_controller 0 1>;
> 
> for power down mode, we need to set the power domain in state 2
>         power-domain =<&foo_power_controller 0 2>;

OK, what I meant is not what you understood, but your approach looks
sensible too. What I do not like is that the power-domain specifier is
power controller specific (that was true even for my example). In theory
we could achieve the same result by forcing every component in a power
domain to specify, through a dedicated property, the maximum C-state index
that allows it to retain its state. The same logic as in your example
applies. The nice thing is that we do not change the power domain
specifiers; the bad thing is that it adds two properties to each device
(the C-state index and the power domain specifier). We can make it
hierarchical, though, so that device nodes inherit the maximum operating
C-state from a parent node that provides a common value.

In my example, the third parameter was just a number that the power
controller would decode according to its implementation (e.g. 0 = cache
retained, 1 = cache lost); it was not a "state index". The power
controller would know what to do with, e.g., a cache component (that
declares itself to be in that power domain) when a C-state with that
power domain specifier is entered.

This is not very different from what you are saying, so let's get to the nub:

- Either we define it in a platform specific way through the power
  domain specifier
- Or we force a max-c-state-supported property for every device,
  possibly hierarchical
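The second option, with hierarchical inheritance, could resolve each device's effective limit by walking towards the root. A sketch only: `struct dt_node` is a stand-in rather than the kernel's `struct device_node`, max-c-state-supported is the property name floated above rather than an existing binding, and devices with no value anywhere in their ancestry are assumed to lose state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a DT node; not the kernel's struct device_node. */
struct dt_node {
	const struct dt_node *parent;
	bool has_max_cstate; /* "max-c-state-supported" present on this node? */
	uint32_t max_cstate; /* deepest C-state index the node survives */
};

/*
 * Walk towards the root so that a value set once on a parent covers
 * every child that does not override it. Returns false if no ancestor
 * provides the property.
 */
bool effective_max_cstate(const struct dt_node *n, uint32_t *out)
{
	for (; n; n = n->parent) {
		if (n->has_max_cstate) {
			*out = n->max_cstate;
			return true;
		}
	}
	return false;
}

/* A device keeps its state across C-state idx iff idx <= its limit. */
bool survives_cstate(const struct dt_node *n, uint32_t idx)
{
	uint32_t max;

	if (!effective_max_cstate(n, &max))
		return false; /* no information: assume state is lost */
	return idx <= max;
}
```

Inheritance keeps the per-device cost down to an optional override, which addresses the "two properties for each device" concern for the common case.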

Thoughts?

Thank you!
Lorenzo

Antti P Miettinen Jan. 25, 2014, 8:15 a.m. UTC | #10
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Subject: [PATCH RFC v2 2/2] Documentation: arm: define DT C-states bindings
Date: Mon, 20 Jan 2014 17:47:59 +0000
> +	- latency
> +		Usage: Required
> +		Value type: <prop-encoded-array>
> +		Definition: List of u32 values representing worst case latency
> +			    in microseconds required to enter and exit the
> +			    C-state, one value per OPP [2]. The list should
> +			    be specified in the same order as the operating
> +			    points property list of the cpu this state is
> +			    valid on.
> +			    If no OPP bindings are present, the latency value
> +			    is associated with the current OPP of CPUs in the
> +			    system.

I'm afraid the CPU OPP is not enough to capture the variance in
latencies. In particular, memory frequency affects some of the latencies
very strongly.

	--Antti
Lorenzo Pieralisi Jan. 27, 2014, 11:41 a.m. UTC | #11
On Sat, Jan 25, 2014 at 08:15:46AM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Subject: [PATCH RFC v2 2/2] Documentation: arm: define DT C-states bindings
> Date: Mon, 20 Jan 2014 17:47:59 +0000
> > +	- latency
> > +		Usage: Required
> > +		Value type: <prop-encoded-array>
> > +		Definition: List of u32 values representing worst case latency
> > +			    in microseconds required to enter and exit the
> > +			    C-state, one value per OPP [2]. The list should
> > +			    be specified in the same order as the operating
> > +			    points property list of the cpu this state is
> > +			    valid on.
> > +			    If no OPP bindings are present, the latency value
> > +			    is associated with the current OPP of CPUs in the
> > +			    system.
> 
> I'm afraid the CPU OPP is not enough to capture the variance in
> latencies. Especially memory frequency affects some of the latencies
> very strongly.

That's why I defined the worst case. How did you implement it in your
idle drivers? That would help generalize it; after all, these bindings
are there to simplify upstreaming of drivers, so feedback is welcome.

Thanks,
Lorenzo
Antti P Miettinen Jan. 27, 2014, 12:48 p.m. UTC | #12
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> That's why I defined the worst case. How did you implement it in your
> idle drivers ? That would help generalize it, after all these bindings
> are there to simplify drivers upstreaming, feedback welcome.

Currently we do not handle this well downstream either. The problem
with the worst case is that the absolute worst case can be really bad
while its probability might be very low. Sorry - no ready answer :-)

	--Antti
Lorenzo Pieralisi Jan. 27, 2014, 6:22 p.m. UTC | #13
On Mon, Jan 27, 2014 at 12:48:15PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > That's why I defined the worst case. How did you implement it in your
> > idle drivers ? That would help generalize it, after all these bindings
> > are there to simplify drivers upstreaming, feedback welcome.
> 
> Currently we do not handle this well downstream either. The problem
> with worst case is that the absolute worst case can be really bad and
> probability of it might be very low. Sorry - no ready answer :-)

Point taken, but these bindings still get us to a point that is much
better than today's situation. After all, if the worst case can happen,
either we design for the worst case or we update the parameters at runtime
in the kernel (which is not happening as of now) according to a
notification mechanism.
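Designing for the worst case with the current per-OPP binding amounts to collapsing any extra dimension, such as memory frequency, into a single maximum per CPU OPP. A sketch with a hypothetical measurement table (row-major, one row per CPU OPP, one column per memory frequency); the function name is illustrative, and the result shows exactly the pessimism raised above, since the rare slow column dominates every row.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collapse measured latencies lat[opp][mem] (row-major, nr_mem columns)
 * into one worst-case value per CPU OPP, which is all the per-OPP
 * "latency" list can express. The memory-frequency axis is maximised away.
 */
void worst_case_per_opp(const uint32_t *lat, size_t nr_opp, size_t nr_mem,
			uint32_t *out)
{
	for (size_t o = 0; o < nr_opp; o++) {
		uint32_t worst = 0;

		for (size_t m = 0; m < nr_mem; m++) {
			uint32_t v = lat[o * nr_mem + m];

			if (v > worst)
				worst = v;
		}
		out[o] = worst;
	}
}
```

Updating the values at runtime through a notification mechanism would instead mean selecting a column on each memory frequency change, rather than taking the maximum once at parse time.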

It is certainly worth investigating; we could probably define OPPs as
generic (i.e. not tied to the CPU) performance points or system
operating points. I will think about this.

In the meantime if you have further pieces of feedback please keep them
coming.

Thanks,
Lorenzo
Vincent Guittot Jan. 28, 2014, 8:24 a.m. UTC | #14
On 24 January 2014 18:58, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> Hi Vincent,
>
> On Fri, Jan 24, 2014 at 08:40:40AM +0000, Vincent Guittot wrote:
>
> [...]
>
>> Hi Lorenzo,
>>
>> Sorry for the late reply,
>>
>>
>> > I had an idea. To simplify things, I think that one possibility is to
>> > add a parameter to the power domain specifier (platform specific, see
>> > Tomasz bindings):
>>
>> We can't use a simple boolean state (on/off) for defining the
>> powerdomain state associated to a c-state so your proposal of being
>> able to add a parameter that will define the power domain state is
>> interesting.
>>
>> >
>> > Documentation/devicetree/bindings/power/power_domain.txt
>> >
>> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-January/224928.html
>> >
>> > to represent, when that state is entered the behavior of the power
>> > controller (ie cache RAM retention or cache shutdown or in general any
>> > substate within a power domain). Since it is platform specific, and since
>> > we are able to link caches to the power domain, the power controller will
>> > actually define what happens to the cache when that state is entered
>> > (basically we use the power domain specifier additional parameter to define
>> > a "substate" in that power domain e.g.:
>> >
>> > Example:
>> >
>> > foo_power_controller {
>> >         [...]
>> >         /*
>> >          * first cell is register index, second one is the state index
>> >          * that in turn implies the state behavior - eg cache lost or
>> >          * retained
>> >          */
>> >         #power-domain-cells = <2>;
>> > };
>> >
>> > l1-cache {
>> >         [...]
>> >         /*
>> >          * syntax: power-domains = list of power domain specifiers
>> >                 <[&power_domain_phandle register-index state],[&power_domain_phandle register-index state]>;
>> >                 The syntax is defined by the power controller du jour
>> >                 as described by Tomasz bindings
>> >         */
>> >         power-domains =<&foo_power_controller 0 0 &foo_power_controller 0 1>;
>>
>> Normally, power-domains describes a list of power domain specifiers
>> that are necessary for the l1-cache to at least retain its state so
>> I'm not sure I understand your example above
>
>>
>> If we take the example of system that support running, retention and
>> powerdown state described as state 0, 1 and 2 for the power domain, i
>> would have set the l1-cache like:
>>        power-domains =<&foo_power_controller 0 1>;
>>
>> for saying that the state is retained up to state 1
>>
>> Please look below, i have modified the rest of your example accordingly
>>
>> >
>> > };
>> >
>> > and then
>> >
>> > state0 {
>> >         index = <2>;
>> >         compatible = "arm,cpu-power-state";
>> >         latency = <...>;
>> >         /*
>> >          * This means that when the state is entered, the power
>> >          * controller should use register index 0 and state 0,
>> >          * whose meaning is power controller specific. Since we
>> >          * know all components affected (for every component
>> >          * we declare its power domain(s) and states so we
>> >          * know what components are affected by the state entry.
>> >          * Given the cache node above and this phandle, the state
>> >          * implies that the cache is retained, register index == 0 state == 0
>> >          */
>> >         power-domain =<&foo_power_controller 0 0>;
>>
>> for retention state we need to set the power domain in state 1
>>         power-domain =<&foo_power_controller 0 1>;
>>
>> > };
>> >
>> > state1 {
>> >         index = <3>;
>> >         compatible = "arm,cpu-power-state";
>> >         latency = <...>;
>> >         /*
>> >          * This means that when the state is entered, the power
>> >          * controller should use register index 0 and state 1,
>> >          * whose meaning is power controller specific. Since we
>> >          * know all components affected (for every component
>> >          * we declare its power domain(s) and states so we
>> >          * know what components are affected by the state entry.
>> >          * Given the cache node above and this phandle, the state
>> >          * implies that the cache is lost, register index == 0 state == 1
>> >          */
>> >         power-domain =<&foo_power_controller 0 1>;
>>
>> for power down mode, we need to set the power domain in state 2
>>         power-domain =<&foo_power_controller 0 2>;
>
> Ok, what I meant was not what you got, but your approach looks sensible
> too. What I do not like is that the power-domain specifier is power

sorry for misunderstanding your example

> controller specific (that was true even for my example). In theory
> we can achieve something identical by forcing every component in a power
> domain to specify the max C-state index that allows it to retain its

I'm not sure that we should force a component to set an opaque (for
the component) max c-state. The device should describe its power
domain requirements and the correlation of the latter with the
description of the c-state binding should be enough to deduce the max
c-state.

> state (through a specific property). Same logic to your example applies.
> Nice thing is that we do not change the power domain specifiers, bad thing
> is that it adds two properties to each device (c-state index and
> power-domain-specifier - but we can make it hierarchical so that device
> nodes can inherit the maximum operating C-state by inheriting the value
> from a parent node providing a common value).
>
> In my example the third parameter was just a number that the power
> controller would decode (eg 0 = cache retained, 1 = cache lost)
> according to its implementation, it was not a "state index". The
> power controller would know what to do with eg a cache component (that
> declares to be in that power domain) when a C-state with that power
> domain specifier was entered.
>
> Not very different from what you are saying, let's get to the nub:
>
> - Either we define it in a platform specific way through the power
>   domain specifier
> - Or we force a max-c-state-supported property for every device,
>   possibly hierarchical

As explained above, adding a max-cstate property for a device that
only knows the power-domain is not a good thing IMHO.

Vincent

>
> Thoughts ?
>
> Thank you !
> Lorenzo
>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi Jan. 29, 2014, 12:42 p.m. UTC | #15
On Tue, Jan 28, 2014 at 08:24:54AM +0000, Vincent Guittot wrote:
> On 24 January 2014 18:58, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
[...]

> >> Please look below, i have modified the rest of your example accordingly
> >>
> >> >
> >> > };
> >> >
> >> > and then
> >> >
> >> > state0 {
> >> >         index = <2>;
> >> >         compatible = "arm,cpu-power-state";
> >> >         latency = <...>;
> >> >         /*
> >> >          * This means that when the state is entered, the power
> >> >          * controller should use register index 0 and state 0,
> >> >          * whose meaning is power controller specific. Since we
> >> >          * know all components affected (for every component
> >> >          * we declare its power domain(s) and states so we
> >> >          * know what components are affected by the state entry.
> >> >          * Given the cache node above and this phandle, the state
> >> >          * implies that the cache is retained, register index == 0 state == 0
> >> >          */
> >> >         power-domain =<&foo_power_controller 0 0>;
> >>
> >> for retention state we need to set the power domain in state 1
> >>         power-domain =<&foo_power_controller 0 1>;
> >>
> >> > };
> >> >
> >> > state1 {
> >> >         index = <3>;
> >> >         compatible = "arm,cpu-power-state";
> >> >         latency = <...>;
> >> >         /*
> >> >          * This means that when the state is entered, the power
> >> >          * controller should use register index 0 and state 1,
> >> >          * whose meaning is power controller specific. Since we
> >> >          * know all components affected (for every component
> >> >          * we declare its power domain(s) and states so we
> >> >          * know what components are affected by the state entry.
> >> >          * Given the cache node above and this phandle, the state
> >> >          * implies that the cache is lost, register index == 0 state == 1
> >> >          */
> >> >         power-domain =<&foo_power_controller 0 1>;
> >>
> >> for power down mode, we need to set the power domain in state 2
> >>         power-domain =<&foo_power_controller 0 2>;
> >
> > Ok, what I meant was not what you got, but your approach looks sensible
> > too. What I do not like is that the power-domain specifier is power
> 
> sorry for misunderstanding your example
> 
> > controller specific (that was true even for my example). In theory
> > we can achieve something identical by forcing every component in a power
> > domain to specify the max C-state index that allows it to retain its
> 
> I'm not sure that we should force a component to set an opaque (for
> the component) max c-state. The device should describe its power
> domain requirements and the correlation of the latter with the
> description of the c-state binding should be enough to deduce the max
> c-state.

I agree, that was an option, but I loathe the idea of implementing it.
Using power domain specifiers is way cleaner IMHO; the only drawback is
that it is up to the power domain documentation to define what a state
means in terms of save/restore and cache behavior. I think that makes
perfect sense, at least for me.

> > state (through a specific property). Same logic to your example applies.
> > Nice thing is that we do not change the power domain specifiers, bad thing
> > is that it adds two properties to each device (c-state index and
> > power-domain-specifier - but we can make it hierarchical so that device
> > nodes can inherit the maximum operating C-state by inheriting the value
> > from a parent node providing a common value).
> >
> > In my example the third parameter was just a number that the power
> > controller would decode (eg 0 = cache retained, 1 = cache lost)
> > according to its implementation, it was not a "state index". The
> > power controller would know what to do with eg a cache component (that
> > declares to be in that power domain) when a C-state with that power
> > domain specifier was entered.
> >
> > Not very different from what you are saying, let's get to the nub:
> >
> > - Either we define it in a platform specific way through the power
> >   domain specifier
> > - Or we force a max-c-state-supported property for every device,
> >   possibly hierarchical
> 
> As explained above, adding a max-cstate property for a device that
> only knows the power-domain is not a good thing IMHO.

I agree, if nobody complains that's the way I will define the bindings.

Thank you,
Lorenzo

Patch

diff --git a/Documentation/devicetree/bindings/arm/c-states.txt b/Documentation/devicetree/bindings/arm/c-states.txt
new file mode 100644
index 0000000..0b5617b
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/c-states.txt
@@ -0,0 +1,774 @@ 
+==========================================
+ARM C-states binding description
+==========================================
+
+==========================================
+1 - Introduction
+==========================================
+
+ARM systems contain HW capable of managing power consumption dynamically,
+where cores can be put in different low-power states (ranging from simple
+wfi to power gating) according to OSPM policies. Borrowing concepts
+from the ACPI specification [1], the CPU states representing the range of
+dynamic states that a processor can enter at run-time, aka C-states, can be
+described through device tree bindings that provide the parameters required
+to enter/exit specific C-states on a given processor.
+
+The state an ARM CPU can be put into is loosely identified by one of the
+following operating modes:
+
+- Running:
+	# Processor core is executing instructions
+
+- Wait for Interrupt:
+	# An ARM processor enters wait for interrupt (WFI) low power
+	  state by executing a wfi instruction. When a processor enters
+	  wfi state it disables most of the clocks while keeping the processor
+	  powered up. This state is standard on all ARM processors and it is
+	  defined as C1 in the remainder of this document.
+
+- Dormant:
+	# Dormant mode is entered by executing the wfi instruction and by sending
+	  platform specific commands to the platform power controller (coupled
+	  with processor specific SW/HW control sequences).
+	  In dormant mode, most of the processor control and debug logic is
+	  powered up but cache RAM can be put in retention state, providing
+	  additional power savings.
+
+- Sleep:
+	# Sleep mode is entered by executing the wfi instruction and by sending
+	  platform specific commands to the platform power controller (coupled
+	  with processor specific SW/HW control sequences). In sleep mode, a
+	  processor and its caches are shut down and the entire processor
+	  state is lost.
+
+Building on top of the previous processor modes, ARM platforms implement power
+management schemes that allow an OS PM implementation to put the processor in
+different CPU states (C-states). C-state parameters (eg latency) are
+platform specific and need to be characterized with bindings that provide the
+required information to OSPM code so that it can build the required tables and
+use them at runtime.
+
+The device tree binding definition for ARM C-states is the subject of this
+document.
+
+===========================================
+2 - cpu-power-states node
+===========================================
+
+ARM processor C-states are defined within the cpu-power-states node, which is
+a direct child of the cpus node and provides a container where the processor
+states, defined as device tree nodes, are listed.
+
+- cpu-power-states node
+
+	Usage: Optional - On ARM systems, it is a container of processor C-state
+			  nodes. If the system does not provide CPU power
+			  management capabilities or the processor just
+			  supports WFI (C1 state) a cpu-power-states node is
+			  not required.
+
+	Description: the cpu-power-states node is a container node whose
+		     subnodes describe the CPU low-power C-states.
+
+	Node name must be "cpu-power-states".
+
+	The cpu-power-states node's parent node must be the cpus node.
+
+	The cpu-power-states node's child nodes can be:
+
+	- one or more state nodes
+
+	Any other configuration is considered invalid.
+
+The nodes describing the C-states (state) can only be defined within the
+cpu-power-states node.
+
+Any other configuration is considered invalid and therefore must be ignored.
+
+===========================================
+3 - state node
+===========================================
+
+A state node represents a C-state description and must be defined as follows:
+
+- state node
+
+	Description: must be child of either the cpu-power-states node or
+		     a state node.
+
+	The state node name shall be "stateN", where N = {0, 1, ...} is
+	the node number; state nodes which are siblings within a single common
+	parent node must be given a unique and sequential N value, starting
+	from 0.
+
+	A state node can contain state child nodes. Child nodes inherit
+	properties from their parent state nodes, which act as state
+	property aggregators (ie they contain properties valid for all
+	their state child nodes).
+
+	A state node defines the following properties (either explicitly
+	or by inheriting them from a parent node):
+
+	- compatible
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Must be "arm,cpu-power-state".
+
+	- index
+		Usage: Required
+		Value type: <u32>
+		Definition: It represents the C-state index, starting from 2 (index
+			    0 represents the processor state "running" and
+			    index 1 represents processor mode "WFI"; indexes 0
+			    and 1 are standard ARM states that need not be
+			    described).
+
+	- power-domain
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: List of phandle and power domain specifiers
+			    as defined by the bindings of the power controller
+			    specified by the phandle [3]. It represents the
+			    power domains associated with the C-state. OSPM
+			    can use the power domains list to retrieve the
+			    devices belonging to the power domains and carry
+			    out the corresponding actions required to
+			    preserve functionality across power cycles
+			    (eg context save/restore, cache flushing).
+
+	- entry-method
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Describes the method by which a CPU enters the
+			    C-state. This property is required and must be one
+			    of:
+
+			    - "psci"
+			      ARM Standard firmware interface
+
+			    - "[vendor],[method]"
+			      An implementation dependent string with
+			      format "vendor,method", where vendor is a string
+			      denoting the name of the manufacturer and
+			      method is a string specifying the mechanism
+			      used to enter the C-state.
+
+	- psci-power-state
+		Usage: Required if entry-method property value is set to
+		       "psci".
+		Value type: <u32>
+		Definition: power_state parameter to pass to the PSCI
+			    suspend call to enter the C-state.
+
+	- latency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: List of u32 values representing the worst case latency
+			    in microseconds required to enter and exit the
+			    C-state, one value per OPP [2]. The list should
+			    be specified in the same order as the operating
+			    points property list of the cpu this state is
+			    valid on.
+			    If no OPP bindings are present, the latency value
+			    is associated with the current OPP of CPUs in the
+			    system.
+
+	- min-residency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: List of u32 values representing the minimum
+			    time in microseconds the CPU must spend in the
+			    C-state to make up for the energy consumed to
+			    enter/exit it, so that it breaks even in terms of
+			    energy consumption compared to the C1 state (wfi),
+			    one value per OPP [2].
+			    This parameter depends on the operating conditions
+			    (HW state) and must assume the worst case scenario.
+			    The list should be specified in the same order as
+			    the operating points property list of the cpu this
+			    state is valid on.
+			    If no OPP bindings are present the min-residency
+			    value is associated with the current OPP of CPUs
+			    in the system.
+
+===========================================
+4 - Examples
+===========================================
+
+Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x0 0x80002000 0x0 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x0 0x80000000 0x0 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <2>;
+
+	cpu-power-states {
+
+		state0 {
+			compatible = "arm,cpu-power-state";
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <300>;
+			STATE0_0: state0 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 0>;
+			};
+			STATE0_1: state1 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 1>;
+			};
+			STATE0_2: state2 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 2>;
+			};
+			STATE0_3: state3 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 3>;
+			};
+			STATE0_4: state4 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 4>;
+			};
+			STATE0_5: state5 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 5>;
+			};
+			STATE0_6: state6 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 6>;
+			};
+			STATE0_7: state7 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 7>;
+			};
+		};
+
+		state1 {
+			compatible = "arm,cpu-power-state";
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <500>;
+			STATE1_0: state0 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 8>;
+			};
+			STATE1_1: state1 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 9>;
+			};
+			STATE1_2: state2 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 10>;
+			};
+			STATE1_3: state3 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 11>;
+			};
+			STATE1_4: state4 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 12>;
+			};
+			STATE1_5: state5 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 13>;
+			};
+			STATE1_6: state6 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 14>;
+			};
+			STATE1_7: state7 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 15>;
+			};
+		};
+
+		STATE2: state2 {
+			compatible = "arm,cpu-power-state";
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x3010000>;
+			latency = <1000>;
+			min-residency = <2500>;
+			power-domain = <&pd_clusters 0>;
+		};
+
+		STATE3: state3 {
+			compatible = "arm,cpu-power-state";
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x3010000>;
+			latency = <4500>;
+			min-residency = <6500>;
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_0>;
+		cpu-power-states = <&STATE0_0 &STATE2>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_1>;
+		cpu-power-states = <&STATE0_1 &STATE2>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_2>;
+		cpu-power-states = <&STATE0_2 &STATE2>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_3>;
+		cpu-power-states = <&STATE0_3 &STATE2>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	CPU4: cpu@10000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_4>;
+		cpu-power-states = <&STATE0_4 &STATE2>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 4>;
+		};
+	};
+
+	CPU5: cpu@10001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_5>;
+		cpu-power-states = <&STATE0_5 &STATE2>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@10100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_6>;
+		cpu-power-states = <&STATE0_6 &STATE2>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@10101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_7>;
+		cpu-power-states = <&STATE0_7 &STATE2>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+
+	CPU8: cpu@100000000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_8>;
+		cpu-power-states = <&STATE1_0 &STATE3>;
+		L1_8: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 8>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU9: cpu@100000001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_9>;
+		cpu-power-states = <&STATE1_1 &STATE3>;
+		L1_9: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 9>;
+		};
+	};
+
+	CPU10: cpu@100000100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_10>;
+		cpu-power-states = <&STATE1_2 &STATE3>;
+		L1_10: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 10>;
+		};
+	};
+
+	CPU11: cpu@100000101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_11>;
+		cpu-power-states = <&STATE1_3 &STATE3>;
+		L1_11: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 11>;
+		};
+	};
+
+	CPU12: cpu@100010000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_12>;
+		cpu-power-states = <&STATE1_4 &STATE3>;
+		L1_12: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 12>;
+		};
+	};
+
+	CPU13: cpu@100010001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_13>;
+		cpu-power-states = <&STATE1_5 &STATE3>;
+		L1_13: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 13>;
+		};
+	};
+
+	CPU14: cpu@100010100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_14>;
+		cpu-power-states = <&STATE1_6 &STATE3>;
+		L1_14: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 14>;
+		};
+	};
+
+	CPU15: cpu@100010101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_15>;
+		cpu-power-states = <&STATE1_7 &STATE3>;
+		L1_15: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 15>;
+		};
+	};
+};
+
+Example 2 (ARM 32-bit, 8-cpu system, two clusters):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x80002000 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x80000000 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <1>;
+
+	cpu-power-states {
+
+		state0 {
+			compatible = "arm,cpu-power-state";
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <300>;
+			STATE0_0: state0 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 0>;
+			};
+			STATE0_1: state1 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 1>;
+			};
+			STATE0_2: state2 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 2>;
+			};
+			STATE0_3: state3 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 3>;
+			};
+		};
+
+		state1 {
+			compatible = "arm,cpu-power-state";
+			index = <2>;
+			entry-method = "psci";
+			psci-power-state = <0x1010000>;
+			latency = <400>;
+			min-residency = <500>;
+			STATE1_0: state0 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 4>;
+			};
+			STATE1_1: state1 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 5>;
+			};
+			STATE1_2: state2 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 6>;
+			};
+			STATE1_3: state3 {
+				compatible = "arm,cpu-power-state";
+				power-domain = <&pd_cores 7>;
+			};
+		};
+
+		STATE2: state2 {
+			compatible = "arm,cpu-power-state";
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x3010000>;
+			latency = <1000>;
+			min-residency = <1500>;
+			power-domain = <&pd_clusters 0>;
+		};
+
+		STATE3: state3 {
+			compatible = "arm,cpu-power-state";
+			index = <3>;
+			entry-method = "psci";
+			psci-power-state = <0x3010000>;
+			latency = <4500>;
+			min-residency = <6500>;
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x0>;
+		next-level-cache = <&L1_0>;
+		cpu-power-states = <&STATE0_0 &STATE2>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x1>;
+		next-level-cache = <&L1_1>;
+		cpu-power-states = <&STATE0_1 &STATE2>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@2 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x2>;
+		next-level-cache = <&L1_2>;
+		cpu-power-states = <&STATE0_2 &STATE2>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@3 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x3>;
+		next-level-cache = <&L1_3>;
+		cpu-power-states = <&STATE0_3 &STATE2>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	CPU4: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x100>;
+		next-level-cache = <&L1_4>;
+		cpu-power-states = <&STATE1_0 &STATE3>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 4>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU5: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x101>;
+		next-level-cache = <&L1_5>;
+		cpu-power-states = <&STATE1_1 &STATE3>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@102 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x102>;
+		next-level-cache = <&L1_6>;
+		cpu-power-states = <&STATE1_2 &STATE3>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@103 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x103>;
+		next-level-cache = <&L1_7>;
+		cpu-power-states = <&STATE1_3 &STATE3>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+};
+
+===========================================
+5 - References
+===========================================
+
+[1] ACPI v5.0 specification
+    http://www.acpi.info/spec50.htm
+
+[2] ARM Linux kernel documentation - OPP bindings
+    Documentation/devicetree/bindings/power/opp.txt
+
+[3] ARM Linux kernel documentation - power domain bindings
+    Documentation/devicetree/bindings/power/power_domain.txt
diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 9130435..a3c9193 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -191,6 +191,13 @@  nodes to be present and contain the properties described below.
 			  property identifying a 64-bit zero-initialised
 			  memory location.
 
+	- cpu-power-states
+		Usage: Optional
+		Value type: <prop-encoded-array>
+		Definition:
+			# List of phandles to cpu power state nodes supported
+			  by this cpu [1].
+
 Example 1 (dual-cluster big.LITTLE system 32-bit):
 
 	cpus {
@@ -382,3 +389,6 @@  cpus {
 		cpu-release-addr = <0 0x20000000>;
 	};
 };
+
+[1] ARM Linux kernel documentation - C-state bindings
+    Documentation/devicetree/bindings/arm/c-states.txt