[RFC,v4,3/3] Documentation: arm: define DT idle states bindings

Message ID 1392724051-11950-4-git-send-email-lorenzo.pieralisi@arm.com
State New

Commit Message

Lorenzo Pieralisi Feb. 18, 2014, 11:47 a.m. UTC
ARM based platforms implement a variety of power management schemes that
allow processors to enter idle states at run-time.
The parameters defining these idle states vary on a per-platform basis, forcing
the OS to hardcode the state parameters in platform-specific static tables
whose size grows as the number of platforms supported in the kernel increases,
which hampers device driver standardization.

Therefore, this patch aims at standardizing idle state device tree bindings for
ARM platforms. Bindings define idle state parameters inclusive of entry methods
and state latencies, to allow operating systems to retrieve the configuration
entries from the device tree and initialize the related power management
drivers, paving the way for common code in the kernel to deal with idle
states and removing the need for static data in current and previous kernel
versions.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 Documentation/devicetree/bindings/arm/cpus.txt        |  10 +
 Documentation/devicetree/bindings/arm/idle-states.txt | 781 +++++
 2 files changed, 791 insertions(+)

Comments

Sebastian Capella Feb. 19, 2014, 4:04 p.m. UTC | #1
Quoting Lorenzo Pieralisi (2014-02-18 03:47:31)
> +       - index
> +               Usage: Required
> +               Value type: <u32>
> +               Definition: It represents the idle state index.
> +                           An increasing index value implies less power
> +                           consumption. Index must be given a sequential
> +                           value = {0, 1, ....}, starting from 0.
One minor comment.  In the example, it can be tricky to see how this is sequential
since the states interleave.  Not sure if it merits rewording here?
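
To make the interleaving concrete, here is a minimal Python sketch (not part
of the patch; the helper name is hypothetical) that checks the "sequential,
starting from 0" rule against the index values in the cluster 0 example. The
nodes appear in the file in the order 2, 0, 3, 1, yet the set of states that
CPU0 references through cpu-idle-states is still {0, 1, 2, 3}:

```python
def indices_are_sequential(indices):
    """Return True if the index values form {0, 1, ..., n-1} with no gaps or repeats."""
    indices = list(indices)
    return sorted(indices) == list(range(len(indices)))


# Index values copied from the cluster 0 example, in the order the nodes
# appear in the file (cluster-ret-0, cpu-ret-0, cluster-sleep-0, cpu-sleep-0).
file_order = [2, 0, 3, 1]

# The same states in the order CPU0 lists them in cpu-idle-states.
cpu0_idle_states = {
    "CPU_RET_0_0": 0,
    "CPU_SLEEP_0_0": 1,
    "CLUSTER_RET_0": 2,
    "CLUSTER_SLEEP_0": 3,
}

print(indices_are_sequential(file_order))
print(indices_are_sequential(cpu0_idle_states.values()))
```

Both checks pass, so the rule holds per CPU even though the file order does
not make it obvious.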

These look good to me!

Thanks!

Sebastian
Antti P Miettinen March 17, 2014, 11:15 a.m. UTC | #2
Sorry for having been lazy in commenting..

From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Date: Tue, 18 Feb 2014 11:47:31 +0000
> +	- min-residency
> +		Usage: Required
> +		Value type: <prop-encoded-array>
> +		Definition: u32 value representing time in microseconds
> +			    required for the CPU to be in the idle state to
> +			    break even in power consumption terms compared
> +			    to idle state idle_standby ([4][5]).

To me this continues to be a bit ill-defined. Say we have three states:
0,1,2. State 0 is the idle_standby. Providing a minimum residency for
state 1 compared to state 0 sort of makes sense, but if we provide a
minimum residency for state 2 compared to state 0 the break even time
is going to be smaller than break even when comparing state 1 and
state 2. With this data we'd enter state 2 when we'd be better off
entering state 1.

	--Antti
Lorenzo Pieralisi March 17, 2014, 11:53 a.m. UTC | #3
Hi Antti,

On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
> Sorry for having been lazy in commenting..

No worries, comments always welcome.

> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Date: Tue, 18 Feb 2014 11:47:31 +0000
> > +	- min-residency
> > +		Usage: Required
> > +		Value type: <prop-encoded-array>
> > +		Definition: u32 value representing time in microseconds
> > +			    required for the CPU to be in the idle state to
> > +			    break even in power consumption terms compared
> > +			    to idle state idle_standby ([4][5]).
> 
> To me this continues to be a bit ill-defined. Say we have three states:
> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
> state 1 compared to state 0 sort of makes sense, but if we provide a
> minimum residency for state 2 compared to state 0 the break even time
> is going to be smaller than break even when comparing state 1 and
> state 2. With this data we'd enter state 2 when we'd be better off
> entering state 1.

I am not sure I got your reply right, but min-residency for
state 2 will be higher than state 1, since it has to cater for the
dynamic power consumed by entering the state (but burns less power
than state 1 when _in_ the state).

Entering a state has a power cost and min-residency should take that into
account, worst-case as per other stats.

min-residency (and so the break-even) should take into account that
entering the state is not for free.

I think that comparing against idle_standby is the only sane way we can
define that parameter, either that or we remove it.

Does it make sense ?

Thanks !
Lorenzo
Antti P Miettinen March 17, 2014, 1:49 p.m. UTC | #4
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Hi Antti,
> 
> On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
>> Sorry for having been lazy in commenting..
> 
> No worries, comments always welcome.
> 
>> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Date: Tue, 18 Feb 2014 11:47:31 +0000
>> > +	- min-residency
>> > +		Usage: Required
>> > +		Value type: <prop-encoded-array>
>> > +		Definition: u32 value representing time in microseconds
>> > +			    required for the CPU to be in the idle state to
>> > +			    break even in power consumption terms compared
>> > +			    to idle state idle_standby ([4][5]).
>> 
>> To me this continues to be a bit ill-defined. Say we have three states:
>> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
>> state 1 compared to state 0 sort of makes sense, but if we provide a
>> minimum residency for state 2 compared to state 0 the break even time
>> is going to be smaller than break even when comparing state 1 and
>> state 2. With this data we'd enter state 2 when we'd be better off
>> entering state 1.
> 
> I am not sure I got your reply right, but min-residency for
> state 2 will be higher than state 1, since it has to cater for the
> dynamic power consumed by entering the state (but burns less power
> than state 1 when _in_ the state).
> 
> Entering a state has a power cost and min-residency should take that into
> account, worst-case as per other stats.
> 
> min-residency (and so the break-even) should take into account that
> entering the state is not for free.
> 
> I think that comparing against idle_standby is the only sane way we can
> define that parameter, either that or we remove it.
> 
> Does it make sense ?
> 
> Thanks !
> Lorenzo

The point is that if you compare breakeven between state 0 and state 2
the breakeven time will be smaller than when you compare the breakeven
between state 1 and state 2. Assuming states ordered by "deepness" in
the sense that deeper states have lower in-state power and longer
entry/exit times.

I guess you could specify that the min-residency defines the time when
the state breaks even compared to the previous (shallower) state.

	--Antti
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lorenzo Pieralisi March 17, 2014, 2:45 p.m. UTC | #5
On Mon, Mar 17, 2014 at 01:49:40PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Hi Antti,
> > 
> > On Mon, Mar 17, 2014 at 11:15:07AM +0000, Antti P Miettinen wrote:
> >> Sorry for having been lazy in commenting..
> > 
> > No worries, comments always welcome.
> > 
> >> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> >> Date: Tue, 18 Feb 2014 11:47:31 +0000
> >> > +	- min-residency
> >> > +		Usage: Required
> >> > +		Value type: <prop-encoded-array>
> >> > +		Definition: u32 value representing time in microseconds
> >> > +			    required for the CPU to be in the idle state to
> >> > +			    break even in power consumption terms compared
> >> > +			    to idle state idle_standby ([4][5]).
> >> 
> >> To me this continues to be a bit ill-defined. Say we have three states:
> >> 0,1,2. State 0 is the idle_standby. Providing a minimum residency for
> >> state 1 compared to state 0 sort of makes sense, but if we provide a
> >> minimum residency for state 2 compared to state 0 the break even time
> >> is going to be smaller than break even when comparing state 1 and
> >> state 2. With this data we'd enter state 2 when we'd be better off
> >> entering state 1.
> > 
> > I am not sure I got your reply right, but min-residency for
> > state 2 will be higher than state 1, since it has to cater for the
> > dynamic power consumed by entering the state (but burns less power
> > than state 1 when _in_ the state).
> > 
> > Entering a state has a power cost and min-residency should take that into
> > account, worst-case as per other stats.
> > 
> > min-residency (and so the break-even) should take into account that
> > entering the state is not for free.
> > 
> > I think that comparing against idle_standby is the only sane way we can
> > define that parameter, either that or we remove it.
> > 
> > Does it make sense ?
> > 
> > Thanks !
> > Lorenzo
> 
> The point is that if you compare breakeven between state 0 and state 2
> the breakeven time will be smaller than when you compare the breakeven
> between state 1 and state 2. Assuming states ordered by "deepness" in
> the sense that deeper states have lower in-state power and longer
> entry/exit times.
> 
> I guess you could specify that the min-residency defines the time when
> the state breaks even compared to the previous (shallower) state.

I am not following, Antti, I am sorry. States are ordered in terms of
power consumption which also means that deeper idle states have a longer
required min-residency to break even against idle_standby in order to actually
save power.

When we make a decision on what idle state to enter all we do, and
that's OS agnostic, is predicting (+checking the next event) the next IRQ and
see if it is worth entering a state or not. We have to compare it against
a baseline, which is the processor being in standbywfi and that's what
these bindings define.

I do not understand why you want to define min-residency against the
previous shallower state.

What this binding says is: standbywfi is the shallowest idle state in
power consumption terms. Deeper idle states save more power than
standbywfi if the residency in that state is at least min-residency.
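
A minimal Python sketch of the selection rule as described here, not the
kernel's governor code. The names IdleState and pick_idle_state are
hypothetical; the index and min-residency values are copied from the cluster 0
example in the patch. The sketch ignores exit-latency constraints and the
requirement that all CPUs request a hierarchical state:

```python
from typing import NamedTuple


class IdleState(NamedTuple):
    name: str
    index: int             # higher index = lower power, per the binding
    min_residency_us: int  # break-even time against standbywfi


def pick_idle_state(states, predicted_idle_us):
    """Pick the deepest state whose min-residency fits the predicted idle time."""
    fitting = [s for s in states if s.min_residency_us <= predicted_idle_us]
    if not fitting:
        return "standbywfi"  # baseline state, never listed in the DT
    return max(fitting, key=lambda s: s.index).name


# Values from the cluster 0 example in the patch.
states = [
    IdleState("cpu-ret-0", 0, 30),
    IdleState("cpu-sleep-0", 1, 350),
    IdleState("cluster-ret-0", 2, 250),
    IdleState("cluster-sleep-0", 3, 2700),
]

for predicted in (20, 100, 300, 3000):
    print(predicted, pick_idle_state(states, predicted))
```

Under this rule each state is only ever compared against the standbywfi
baseline, never against the other states.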

I do not see where the problem is to be honest, maybe I need an example.

Thanks!
Lorenzo
Antti P Miettinen March 17, 2014, 6:26 p.m. UTC | #6
From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> When we make a decision on what idle state to enter all we do, and
> that's OS agnostic, is predicting (+checking the next event) the next IRQ and
> see if it is worth entering a state or not. We have to compare it against
> a baseline, which is the processor being in standbywfi and that's what
> these bindings define.
> 
> I do not understand why you want to define min-residency against the
> previous shallower state.
> 
> What this binding says is: standbywfi is the shallowest idle state in
> power consumption terms. Deeper idle states save more power than
> standbywfi if the residency in that state is at least min-residency.
> 
> I do not see where the problem is to be honest, maybe I need an example.
> 
> Thanks!
> Lorenzo

Sorry, I should have explained myself more clearly. I've been
pondering about these issues somewhat lately so I'm perhaps suffering
from a bit of a tunnel vision.

In short, when we choose an idle state based on expected idle duration
we are not comparing wfi against all possible idle states in turn and
making a decision between wfi and state X. Instead we want to choose
among all states the one that gives minimum energy for the expected
idle time. I'll try to elaborate..

Entering and exiting idle states takes time at nonzero power. To make
up for this lost energy we indeed want the time in the idle state to
be sufficiently long to make up for the lost energy. Now the important
question here is "make up compared to what?".

The energy over the idle time can be also interpreted as average
power. When the idle time increases the average power for a state
approaches the in-state power. A deeper idle state would be a state
with lower in-state power and longer entry/exit time. Therefore the
average power for a deeper idle state drops slower as function of idle
time than the average power for a shallower idle state. If we'd plot
the average power for a number of idle states as function of idle
duration, we'd get a set of "constant over idle time plus constant"
style curves. Average power for state 0 will drop fastest close to the
in-state power of state 0. Average power for state 1 will drop slower
and approach the in-state power of state 1, average power for state 2
will drop even slower and approach the in-state power of state 2.

To define that the min-residency is the breakeven time against state 0
means that we are looking at the curves and asking "when does the
average power for state X cross the average power for state 0?". But
that would be the guideline for making a decision between state 0 and
the state in question. Even if average power for state 2 is below
the average power of state 0 it is not necessarily yet below the
average power of state 1. To break even against state 1 the idle
duration needs to be longer.

Yet another way to look at this: for three states we can define three
times of interest:
- t1: the time when state1 breaks even against state0
- t2: the time when state2 breaks even against state0
- t3: the time when state2 breaks even against state1
and t3 would typically be larger than t2.
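
A minimal numeric sketch of t1, t2 and t3 in Python, assuming a simple model
in which each state has a constant in-state power plus a fixed entry+exit time
and energy. All numbers are made up for illustration and the function names
are hypothetical. The model only holds for idle periods at least as long as
the entry+exit time:

```python
def energy(state, t_us):
    """Energy (nJ) spent over an idle period of t_us microseconds in `state`."""
    power_mw, trans_us, trans_nj = state
    # Transition cost, then in-state power for the rest of the idle period.
    return trans_nj + power_mw * (t_us - trans_us)


def break_even(shallow, deep):
    """Idle duration (us) at which `deep` and `shallow` cost the same energy."""
    p_s, t_s, e_s = shallow
    p_d, t_d, e_d = deep
    # Solve e_s + p_s * (t - t_s) == e_d + p_d * (t - t_d) for t.
    return ((e_d - p_d * t_d) - (e_s - p_s * t_s)) / (p_s - p_d)


# (in-state power in mW, entry+exit time in us, entry+exit energy in nJ).
# Hypothetical values, not measurements from any platform.
state0 = (100, 0, 0)          # wfi: no transition cost
state1 = (50, 100, 15_000)
state2 = (10, 1000, 150_000)

t1 = break_even(state0, state1)  # state1 vs state0
t2 = break_even(state0, state2)  # state2 vs state0
t3 = break_even(state1, state2)  # state2 vs state1
print(t1, t2, t3)
```

With these numbers t1 is 200 us, t2 is roughly 1556 us and t3 is 3250 us. For
an idle period of 2000 us, state2 already beats state0, but state1 is still
cheaper than state2, which is the case a min-residency defined against state0
would get wrong.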

	--Antti
Lorenzo Pieralisi March 17, 2014, 7:24 p.m. UTC | #7
On Mon, Mar 17, 2014 at 06:26:38PM +0000, Antti P Miettinen wrote:
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > When we make a decision on what idle state to enter all we do, and
> > that's OS agnostic, is predicting (+checking the next event) the next IRQ and
> > see if it is worth entering a state or not. We have to compare it against
> > a baseline, which is the processor being in standbywfi and that's what
> > these bindings define.
> > 
> > I do not understand why you want to define min-residency against the
> > previous shallower state.
> > 
> > What this binding says is: standbywfi is the shallowest idle state in
> > power consumption terms. Deeper idle states save more power than
> > standbywfi if the residency in that state is at least min-residency.
> > 
> > I do not see where the problem is to be honest, maybe I need an example.
> > 
> > Thanks!
> > Lorenzo
> 
> Sorry, I should have explained myself more clearly. I've been
> pondering about these issues somewhat lately so I'm perhaps suffering
> from a bit of a tunnel vision.
> 
> In short, when we choose an idle state based on expected idle duration
> we are not comparing wfi against all possible idle states in turn and
> making a decision between wfi and state X. Instead we want to choose
> among all states the one that gives minimum energy for the expected
> idle time. I'll try to elaborate..
> 
> Entering and exiting idle states takes time at nonzero power. To make
> up for this lost energy we indeed want the time in the idle state to
> be sufficiently long to make up for the lost energy. Now the important
> question here is "make up compared to what?".
> 
> The energy over the idle time can be also interpreted as average
> power. When the idle time increases the average power for a state
> approaches the in-state power. A deeper idle state would be a state
> with lower in-state power and longer entry/exit time. Therefore the
> average power for a deeper idle state drops slower as function of idle
> time than the average power for a shallower idle state. If we'd plot
> the average power for a number of idle states as function of idle
> duration, we'd get a set of "constant over idle time plus constant"
> style curves. Average power for state 0 will drop fastest close to the
> in-state power of state 0. Average power for state 1 will drop slower
> and approach the in-state power of state 1, average power for state 2
> will drop even slower and approach the in-state power of state 2.
> 
> To define that the min-residency is the breakeven time against state 0
> means that we are looking at the curves and asking "when does the
> average power for state X cross the average power for state 0?". But
> that would be the guideline for making a decision between state 0 and
> the state in question. Even if average power for state 2 is below
> the average power of state 0 it is not necessarily yet below the
> average power of state 1. To break even against state 1 the idle
> duration needs to be longer.
> 
> Yet another way to look at this: for three states we can define three
> times of interest:
> - t1: the time when state1 breaks even against state0
> - t2: the time when state2 breaks even against state0
> - t3: the time when state2 breaks even against state1
> and t3 would typically be larger than t2.

Now it is crystal clear, and you are absolutely right, sorry for
misunderstanding.

Help me define it then please:

- min-residency-us

"u32 value representing time in microseconds required for the CPU to be in
the idle state to guarantee power savings maximization".

Rather vague (on purpose), if anyone comes up with a better definition please
shout.

Thanks !
Lorenzo
Patch

diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt
index 9130435..fd1fd8d 100644
--- a/Documentation/devicetree/bindings/arm/cpus.txt
+++ b/Documentation/devicetree/bindings/arm/cpus.txt
@@ -191,6 +191,13 @@  nodes to be present and contain the properties described below.
 			  property identifying a 64-bit zero-initialised
 			  memory location.
 
+	- cpu-idle-states
+		Usage: Optional
+		Value type: <prop-encoded-array>
+		Definition:
+			# List of phandles to idle state nodes supported
+			  by this cpu [1].
+
 Example 1 (dual-cluster big.LITTLE system 32-bit):
 
 	cpus {
@@ -382,3 +389,6 @@  cpus {
 		cpu-release-addr = <0 0x20000000>;
 	};
 };
+
+[1] ARM Linux kernel documentation - idle states bindings
+    Documentation/devicetree/bindings/arm/idle-states.txt
diff --git a/Documentation/devicetree/bindings/arm/idle-states.txt b/Documentation/devicetree/bindings/arm/idle-states.txt
new file mode 100644
index 0000000..f9a48a1
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/idle-states.txt
@@ -0,0 +1,781 @@ 
+==========================================
+ARM idle states binding description
+==========================================
+
+==========================================
+1 - Introduction
+==========================================
+
+ARM systems contain HW capable of managing power consumption dynamically,
+where cores can be put in different low-power states (ranging from simple
+wfi to power gating) according to OSPM policies. The CPU states representing
+the range of dynamic idle states that a processor can enter at run-time, can be
+specified through device tree bindings representing the parameters required
+to enter/exit specific idle states on a given processor.
+
+According to the Server Base System Architecture document (SBSA, [4]), the
+power states an ARM CPU can be put into are identified by the following list:
+
+- Running
+- Idle_standby
+- Idle_retention
+- Sleep
+- Off
+
+The power states described in the SBSA document define the basic CPU states on
+top of which ARM platforms implement power management schemes that allow an OS
+PM implementation to put the processor in different idle states (which include
+states listed above; "off" state is not an idle state since it does not have
+wake-up capabilities, hence it is not considered in this document).
+
+Idle state parameters (eg entry latency) are platform specific and need to be
+characterized with bindings that provide the required information to OSPM
+code so that it can build the required tables and use them at runtime.
+
+The device tree binding definition for ARM idle states is the subject of this
+document.
+
+===========================================
+2 - idle-states node
+===========================================
+
+ARM processor idle states are defined within the idle-states node, which is
+a direct child of the cpus node and provides a container where the processor
+idle states, defined as device tree nodes, are listed.
+
+- idle-states node
+
+	Usage: Optional - On ARM systems, it is a container of processor idle
+			  states nodes. If the system does not provide CPU
+			  power management capabilities or the processor just
+			  supports idle_standby an idle-states node is not
+			  required.
+
+	Description: idle-states node is a container node, where its
+		     subnodes describe the CPU idle states.
+
+	Node name must be "idle-states".
+
+	The idle-states node's parent node must be the cpus node.
+
+	The idle-states node's child nodes can be:
+
+	- one or more state nodes
+
+	Any other configuration is considered invalid.
+
+	An idle-states node defines the following properties:
+
+	- entry-method
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Describes the method by which a CPU enters the
+			    idle states. This property is required and must be
+			    one of:
+
+			    - "arm,psci-cpu-suspend"
+			      ARM PSCI firmware interface, CPU suspend
+			      method[3].
+
+			    - "[vendor],[method]"
+			      An implementation dependent string with
+			      format "vendor,method", where vendor is a string
+			      denoting the name of the manufacturer and
+			      method is a string specifying the mechanism
+			      used to enter the idle state.
+
+The nodes describing the idle states (state) can only be defined within the
+idle-states node.
+
+Any other configuration is considered invalid and therefore must be ignored.
+
+===========================================
+3 - state node
+===========================================
+
+A state node represents an idle state description and must be defined as
+follows:
+
+- state node
+
+	Description: must be a child of either the idle-states node or
+		     a state node.
+
+	The state node name shall follow standard device tree naming
+	rules ([6], 2.2.1 "Node names"), in particular state nodes which
+	are siblings within a single common parent must be given a unique name.
+
+	The idle state entered by executing the wfi instruction (idle_standby
+	SBSA,[4][5]) is considered standard on all ARM platforms and therefore
+	must not be listed.
+
+	A state node can contain state child nodes. A state node with
+	children represents a hierarchical state, which is a superset of
+	the child states. Hierarchical states require all CPUs on which
+	they are valid (ie cpu nodes [1] containing cpu-idle-states arrays
+	having a phandle to the state) to request the state in order for it
+	to be entered.
+
+	A state node defines the following properties:
+
+	- compatible
+		Usage: Required
+		Value type: <stringlist>
+		Definition: Must be "arm,idle-state".
+
+	- index
+		Usage: Required
+		Value type: <u32>
+		Definition: It represents the idle state index.
+			    An increasing index value implies less power
+			    consumption. Index must be given a sequential
+			    value = {0, 1, ....}, starting from 0.
+			    Phandles in the cpu nodes [1] cpu-idle-states
+			    array property are not allowed to point at idle
+			    state nodes having the same index value.
+
+	- logic-state-retained
+		Usage: See definition
+		Value type: <none>
+		Definition: if present logic is retained on state entry,
+			    otherwise it is lost.
+
+	- cache-state-retained
+		Usage: See definition
+		Value type: <none>
+		Definition: if present cache memory is retained on state entry,
+			    otherwise it is lost.
+
+	- entry-method-param
+		Usage: See definition.
+		Value type: <u32>
+		Definition: Depends on the idle-states node entry-method
+			    property value. Refer to the entry-method bindings
+			    for this property value definition.
+
+	- entry-latency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing worst case latency
+			    in microseconds required to enter the idle state.
+
+	- exit-latency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing worst case latency
+			    in microseconds required to exit the idle state.
+
+	- min-residency
+		Usage: Required
+		Value type: <prop-encoded-array>
+		Definition: u32 value representing time in microseconds
+			    required for the CPU to be in the idle state to
+			    break even in power consumption terms compared
+			    to idle state idle_standby ([4][5]).
+
+	- power-domains
+		Usage: Optional
+		Value type: <prop-encoded-array>
+		Definition: List of power domain specifiers ([2]) describing
+			    the power domains that are affected by the idle
+			    state entry. All devices whose power-domain phandle
+			    points at one of the power domains listed in this
+			    property are affected by the idle state entry.
+
+
+===========================================
+4 - Examples
+===========================================
+
+Example 1 (ARM 64-bit, 16-cpu system):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x0 0x80002000 0x0 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x0 0x80000000 0x0 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <2>;
+
+	idle-states {
+		entry-method = "arm,psci-cpu-suspend";
+
+		CLUSTER_RET_0: cluster-ret-0 {
+			/* cluster retention */
+			compatible = "arm,idle-state";
+			index = <2>;
+			logic-state-retained;
+			cache-state-retained;
+			entry-method-param = <0x1010000>;
+			entry-latency = <50>;
+			exit-latency = <100>;
+			min-residency = <250>;
+			power-domains = <&pd_clusters 0>;
+			CPU_RET_0_0: cpu-ret-0 {
+				/* cpu retention */
+				compatible = "arm,idle-state";
+				index = <0>;
+				cache-state-retained;
+				entry-method-param = <0x0010000>;
+				entry-latency = <20>;
+				exit-latency = <40>;
+				min-residency = <30>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>,
+						<&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+
+		CLUSTER_SLEEP_0: cluster-sleep-0 {
+			/* cluster sleep */
+			compatible = "arm,idle-state";
+			index = <3>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <600>;
+			exit-latency = <1100>;
+			min-residency = <2700>;
+			power-domains = <&pd_clusters 0>;
+			CPU_SLEEP_0_0: cpu-sleep-0 {
+				/* cpu sleep */
+				compatible = "arm,idle-state";
+				index = <1>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <250>;
+				exit-latency = <500>;
+				min-residency = <350>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>,
+						<&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+		CLUSTER_RET_1: cluster-ret-1 {
+			/* cluster retention */
+			compatible = "arm,idle-state";
+			index = <2>;
+			logic-state-retained;
+			cache-state-retained;
+			entry-method-param = <0x1010000>;
+			entry-latency = <50>;
+			exit-latency = <100>;
+			min-residency = <270>;
+			power-domains = <&pd_clusters 1>;
+			CPU_RET_1_0: cpu-ret-0 {
+				/* cpu retention */
+				compatible = "arm,idle-state";
+				index = <0>;
+				cache-state-retained;
+				entry-method-param = <0x0010000>;
+				entry-latency = <20>;
+				exit-latency = <40>;
+				min-residency = <30>;
+				power-domains = <&pd_cores 8>,
+						<&pd_cores 9>,
+						<&pd_cores 10>,
+						<&pd_cores 11>,
+						<&pd_cores 12>,
+						<&pd_cores 13>,
+						<&pd_cores 14>,
+						<&pd_cores 15>;
+			};
+		};
+
+		CLUSTER_SLEEP_1: cluster-sleep-1 {
+			/* cluster sleep */
+			compatible = "arm,idle-state";
+			index = <3>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <500>;
+			exit-latency = <1200>;
+			min-residency = <3500>;
+			power-domains = <&pd_clusters 1>;
+			CPU_SLEEP_1_0: cpu-sleep-0 {
+				/* cpu sleep */
+				compatible = "arm,idle-state";
+				index = <1>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <70>;
+				exit-latency = <100>;
+				min-residency = <100>;
+				power-domains = <&pd_cores 8>,
+						<&pd_cores 9>,
+						<&pd_cores 10>,
+						<&pd_cores 11>,
+						<&pd_cores 12>,
+						<&pd_cores 13>,
+						<&pd_cores 14>,
+						<&pd_cores 15>;
+			};
+		};
+	};
+
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_0>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_1>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_2>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_3>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	CPU4: cpu@10000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_4>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 4>;
+		};
+	};
+
+	CPU5: cpu@10001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_5>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@10100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_6>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@10101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a57";
+		reg = <0x0 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_7>;
+		cpu-idle-states = <&CPU_RET_0_0 &CPU_SLEEP_0_0
+				   &CLUSTER_RET_0 &CLUSTER_SLEEP_0>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+
+	/* CPU8-CPU15: Cortex-A53 cores sharing the L2 cache L2_1 */
+	CPU8: cpu@100000000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x0>;
+		enable-method = "psci";
+		next-level-cache = <&L1_8>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_8: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 8>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU9: cpu@100000001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x1>;
+		enable-method = "psci";
+		next-level-cache = <&L1_9>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_9: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 9>;
+		};
+	};
+
+	CPU10: cpu@100000100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_10>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_10: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 10>;
+		};
+	};
+
+	CPU11: cpu@100000101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_11>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_11: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 11>;
+		};
+	};
+
+	CPU12: cpu@100010000 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10000>;
+		enable-method = "psci";
+		next-level-cache = <&L1_12>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_12: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 12>;
+		};
+	};
+
+	CPU13: cpu@100010001 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10001>;
+		enable-method = "psci";
+		next-level-cache = <&L1_13>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_13: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 13>;
+		};
+	};
+
+	CPU14: cpu@100010100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10100>;
+		enable-method = "psci";
+		next-level-cache = <&L1_14>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_14: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 14>;
+		};
+	};
+
+	CPU15: cpu@100010101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a53";
+		reg = <0x1 0x10101>;
+		enable-method = "psci";
+		next-level-cache = <&L1_15>;
+		cpu-idle-states = <&CPU_RET_1_0 &CPU_SLEEP_1_0
+				   &CLUSTER_RET_1 &CLUSTER_SLEEP_1>;
+		L1_15: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 15>;
+		};
+	};
+};
+
+Example 2 (ARM 32-bit, 8-cpu system, two clusters):
+
+pd_clusters: power-domain-clusters@80002000 {
+	compatible = "arm,power-controller";
+	reg = <0x80002000 0x1000>;
+	#power-domain-cells = <1>;
+	#address-cells = <1>;
+	#size-cells = <1>;
+
+	pd_cores: power-domain-cores@80000000 {
+		compatible = "arm,power-controller";
+		reg = <0x80000000 0x1000>;
+		#power-domain-cells = <1>;
+	};
+};
+
+cpus {
+	#size-cells = <0>;
+	#address-cells = <1>;
+
+	idle-states {
+		entry-method = "arm,psci-cpu-suspend";
+
+		CLUSTER_SLEEP_0: cluster-sleep-0 {
+			compatible = "arm,idle-state";
+			index = <1>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <1000>;
+			exit-latency = <1500>;
+			min-residency = <1500>;
+			power-domains = <&pd_clusters 0>;
+			CPU_SLEEP_0_0: cpu-sleep-0 {
+				compatible = "arm,idle-state";
+				index = <0>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <400>;
+				exit-latency = <500>;
+				min-residency = <300>;
+				power-domains = <&pd_cores 0>,
+						<&pd_cores 1>,
+						<&pd_cores 2>,
+						<&pd_cores 3>;
+			};
+		};
+
+		CLUSTER_SLEEP_1: cluster-sleep-1 {
+			compatible = "arm,idle-state";
+			index = <1>;
+			entry-method-param = <0x1010000>;
+			entry-latency = <800>;
+			exit-latency = <2000>;
+			min-residency = <6500>;
+			power-domains = <&pd_clusters 1>;
+			CPU_SLEEP_1_0: cpu-sleep-0 {
+				compatible = "arm,idle-state";
+				index = <0>;
+				entry-method-param = <0x0010000>;
+				entry-latency = <300>;
+				exit-latency = <500>;
+				min-residency = <500>;
+				power-domains = <&pd_cores 4>,
+						<&pd_cores 5>,
+						<&pd_cores 6>,
+						<&pd_cores 7>;
+			};
+		};
+	};
+
+	/* Cluster 0: Cortex-A15 cores CPU0-CPU3, sharing the L2 cache L2_0 */
+	CPU0: cpu@0 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x0>;
+		next-level-cache = <&L1_0>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_0: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 0>;
+		};
+		L2_0: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 0>;
+		};
+	};
+
+	CPU1: cpu@1 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x1>;
+		next-level-cache = <&L1_1>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_1: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 1>;
+		};
+	};
+
+	CPU2: cpu@2 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x2>;
+		next-level-cache = <&L1_2>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_2: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 2>;
+		};
+	};
+
+	CPU3: cpu@3 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a15";
+		reg = <0x3>;
+		next-level-cache = <&L1_3>;
+		cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
+		L1_3: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_0>;
+			power-domain = <&pd_cores 3>;
+		};
+	};
+
+	/* Cluster 1: Cortex-A7 cores CPU4-CPU7, sharing the L2 cache L2_1 */
+	CPU4: cpu@100 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x100>;
+		next-level-cache = <&L1_4>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_4: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 4>;
+		};
+		L2_1: l2-cache {
+			compatible = "arm,arch-cache";
+			power-domain = <&pd_clusters 1>;
+		};
+	};
+
+	CPU5: cpu@101 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x101>;
+		next-level-cache = <&L1_5>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_5: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 5>;
+		};
+	};
+
+	CPU6: cpu@102 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x102>;
+		next-level-cache = <&L1_6>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_6: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 6>;
+		};
+	};
+
+	CPU7: cpu@103 {
+		device_type = "cpu";
+		compatible = "arm,cortex-a7";
+		reg = <0x103>;
+		next-level-cache = <&L1_7>;
+		cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
+		L1_7: l1-cache {
+			compatible = "arm,arch-cache";
+			next-level-cache = <&L2_1>;
+			power-domain = <&pd_cores 7>;
+		};
+	};
+};
+
+===========================================
+4 - References
+===========================================
+
+[1] ARM Linux Kernel documentation - CPUs bindings
+    Documentation/devicetree/bindings/arm/cpus.txt
+
+[2] ARM Linux Kernel documentation - power domain bindings
+    Documentation/devicetree/bindings/power/power_domain.txt
+
+[3] ARM Linux Kernel documentation - PSCI bindings
+    Documentation/devicetree/bindings/arm/psci.txt
+
+[4] ARM Server Base System Architecture (SBSA)
+    http://infocenter.arm.com/help/index.jsp
+
+[5] ARM Architecture Reference Manuals
+    http://infocenter.arm.com/help/index.jsp
+
+[6] ePAPR standard
+    https://www.power.org/documentation/epapr-version-1-1/