[v2,0/5] Introduce OPP bandwidth bindings

Message ID 20190423132823.7915-1-georgi.djakov@linaro.org

Message

Georgi Djakov April 23, 2019, 1:28 p.m. UTC
Here is a proposal to extend the OPP bindings with bandwidth based on
a previous discussion [1].

Every functional block on a SoC can contribute to the system power
efficiency by expressing its own bandwidth needs (to memory or other SoC
modules). This will allow the system to save power when high throughput
is not required (and also provide maximum throughput when needed).

There are at least three ways for a device to determine its bandwidth
needs:
	1. The device can dynamically calculate the needed bandwidth
based on some known variable. For example: UART (baud rate), I2C (fast
mode, high-speed mode, etc), USB (specification version, data transfer
type), SDHC (SD standard, clock rate, bus-width), Video Encoder/Decoder
(video format, resolution, frame-rate)

	2. The device needs a hardware-specific value. For example: a
constant value specific to the hardware (e.g. for a PRNG) or a hard-coded,
use-case specific value.

	3. Predefined SoC/board specific bandwidth values. For example:
CPU or GPU bandwidth is related to the current core frequency and both
bandwidth and frequency are scaled together.

This patchset addresses point 3 above by extending the OPP bindings to
support predefined SoC/board bandwidth values and by adding support in
cpufreq-dt to scale the interconnect between the CPU and DDR together
with frequency and voltage; a rough sketch of the consumer-side calls
follows below.

[1] https://patchwork.kernel.org/patch/10577315/
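
For illustration, the consumer-side calls involved look roughly like the
sketch below. This is a simplified, hypothetical example rather than the
actual cpufreq-dt change (in the series the OPP core itself requests the
bandwidth from within dev_pm_opp_set_rate()); the "cpu-mem" path name and
the bandwidth arguments are made up here:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/interconnect.h>
#include <linux/pm_opp.h>

static struct icc_path *cpu_ddr_path;

static int example_init(struct device *cpu_dev)
{
	/* Look up the CPU<->DDR path from the "interconnects" property */
	cpu_ddr_path = of_icc_get(cpu_dev, "cpu-mem");

	return PTR_ERR_OR_ZERO(cpu_ddr_path);
}

static int example_set_opp(struct device *cpu_dev, unsigned long freq_hz,
			   u32 avg_kBps, u32 peak_kBps)
{
	int ret;

	/* Switch frequency and voltage through the OPP core first ... */
	ret = dev_pm_opp_set_rate(cpu_dev, freq_hz);
	if (ret)
		return ret;

	/* ... then request the bandwidth that matches the new OPP */
	return icc_set_bw(cpu_ddr_path, avg_kBps, peak_kBps);
}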

Changes in v2:
* Added support for configuring multiple interconnect paths per device
and changed the way this is described in DT. (Viresh)
* Renamed the DT property opp-bw-MBps to bandwidth-MBps. (Viresh)
* Documented MBps in property-units.txt. (Rob)
* New patch to add the of_icc_get_by_index() helper function (usage sketch below).
* Added _of_find_paths() to populate OPP tables with interconnect path data
from DT. (Viresh)

v1: https://lore.kernel.org/lkml/20190313090010.20534-1-georgi.djakov@linaro.org/
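
For reference, the of_icc_get_by_index() helper added in this version lets
framework code walk the "interconnects" property by position instead of by
name, which is what the OPP core needs in order to match bandwidth-MBps
tuples to paths. A minimal usage sketch (only of_icc_get_by_index() itself
comes from this series; the function below and its error handling are
invented for illustration):

#include <linux/device.h>
#include <linux/err.h>
#include <linux/interconnect.h>

/*
 * Get every path listed under "interconnects", in DT order, so that
 * path i corresponds to the i-th bandwidth-MBps tuple of an OPP.
 */
static int example_get_paths(struct device *dev, struct icc_path **paths,
			     int num_paths)
{
	int i;

	for (i = 0; i < num_paths; i++) {
		paths[i] = of_icc_get_by_index(dev, i);
		if (IS_ERR(paths[i])) {
			int ret = PTR_ERR(paths[i]);

			while (--i >= 0)
				icc_put(paths[i]);
			return ret;
		}
	}

	return 0;
}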


Georgi Djakov (5):
  dt-bindings: opp: Introduce bandwidth-MBps bindings
  interconnect: Add of_icc_get_by_index() helper function
  OPP: Add support for parsing the interconnect bandwidth
  OPP: Update the bandwidth on OPP frequency changes
  cpufreq: dt: Add support for interconnect bandwidth scaling

 Documentation/devicetree/bindings/opp/opp.txt |  38 +++++++
 .../devicetree/bindings/property-units.txt    |   4 +
 drivers/cpufreq/cpufreq-dt.c                  |  27 ++++-
 drivers/interconnect/core.c                   |  45 ++++++--
 drivers/opp/core.c                            |  96 ++++++++++++++++-
 drivers/opp/of.c                              | 102 ++++++++++++++++++
 drivers/opp/opp.h                             |   9 ++
 include/linux/interconnect.h                  |   6 ++
 include/linux/pm_opp.h                        |  14 +++
 9 files changed, 327 insertions(+), 14 deletions(-)

Comments

Viresh Kumar April 24, 2019, 5:33 a.m. UTC | #1
On 23-04-19, 16:28, Georgi Djakov wrote:
> In addition to frequency and voltage, some devices may have bandwidth
> requirements for their interconnect throughput - for example a CPU
> or GPU may also need to increase or decrease their bandwidth to DDR
> memory based on the current operating performance point.
> 
> Extend the OPP tables with additional property to describe the bandwidth
> needs of a device. The average and peak bandwidth values depend on the
> hardware and its properties.
> 
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
>  Documentation/devicetree/bindings/opp/opp.txt | 38 +++++++++++++++++++
>  .../devicetree/bindings/property-units.txt    |  4 ++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
> index 76b6c79604a5..830f0206aea7 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -132,6 +132,9 @@ Optional properties:
>  - opp-level: A value representing the performance level of the device,
>    expressed as a 32-bit integer.
>  
> +- bandwidth-MBps: The interconnect bandwidth is specified with an array containing
> +  the two integer values for average and peak bandwidth in megabytes per second.
> +
>  - clock-latency-ns: Specifies the maximum possible transition latency (in
>    nanoseconds) for switching to this OPP from any other OPP.
>  
> @@ -546,3 +549,38 @@ Example 6: opp-microvolt-<name>, opp-microamp-<name>:
>  		};
>  	};
>  };
> +
> +Example 7: bandwidth-MBps:
> +Average and peak bandwidth values for the interconnects between CPU and DDR
> +memory and also between CPU and L3 are defined per each OPP. Bandwidth of both
> +interconnects is scaled together with CPU frequency.
> +
> +/ {
> +	cpus {
> +		CPU0: cpu@0 {
> +			compatible = "arm,cortex-a53", "arm,armv8";
> +			...
> +			operating-points-v2 = <&cpu_opp_table>;
> +			/* path between CPU and DDR memory and CPU and L3 */
> +			interconnects = <&noc MASTER_CPU &noc SLAVE_DDR>,
> +					<&noc MASTER_CPU &noc SLAVE_L3>;
> +		};
> +	};
> +
> +	cpu_opp_table: cpu_opp_table {
> +		compatible = "operating-points-v2";
> +		opp-shared;
> +
> +		opp-200000000 {
> +			opp-hz = /bits/ 64 <200000000>;
> +			/* CPU<->DDR bandwidth: 457 MB/s average, 1525 MB/s peak
> +			 * CPU<->L3 bandwidth: 914 MB/s average, 3050 MB/s peak */
> +			bandwidth-MBps = <457 1525>, <914 3050>;
> +		};
> +		opp-400000000 {
> +			opp-hz = /bits/ 64 <400000000>;
> +			/* CPU<->DDR bandwidth: 915 MB/s average, 3051 MB/s peak
> +			 * CPU<->L3 bandwidth: 1828 MB/s average, 6102 MB/s peak */
> +			bandwidth-MBps = <915 3051>, <1828 6102>;
> +		};
> +	};
> diff --git a/Documentation/devicetree/bindings/property-units.txt b/Documentation/devicetree/bindings/property-units.txt
> index bfd33734faca..9c3dbefcdae8 100644
> --- a/Documentation/devicetree/bindings/property-units.txt
> +++ b/Documentation/devicetree/bindings/property-units.txt
> @@ -41,3 +41,7 @@ Temperature
>  Pressure
>  ----------------------------------------
>  -kpascal	: kiloPascal
> +
> +Throughput
> +----------------------------------------
> +-MBps		: megabytes per second

LGTM

-- 
viresh
Sibi Sankar April 24, 2019, 9 a.m. UTC | #2
Hey Viresh,

On 4/24/19 12:19 PM, Viresh Kumar wrote:
> On 24-04-19, 12:16, Rajendra Nayak wrote:
>>
>>
>> On 4/23/2019 6:58 PM, Georgi Djakov wrote:
>>> In addition to frequency and voltage, some devices may have bandwidth
>>> requirements for their interconnect throughput - for example a CPU
>>> or GPU may also need to increase or decrease their bandwidth to DDR
>>> memory based on the current operating performance point.
>>>
>>> Extend the OPP tables with additional property to describe the bandwidth
>>> needs of a device. The average and peak bandwidth values depend on the
>>> hardware and its properties.
>>>
>>> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
>>> ---
>>>    Documentation/devicetree/bindings/opp/opp.txt | 38 +++++++++++++++++++
>>>    .../devicetree/bindings/property-units.txt    |  4 ++
>>>    2 files changed, 42 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt
>>> index 76b6c79604a5..830f0206aea7 100644
>>> --- a/Documentation/devicetree/bindings/opp/opp.txt
>>> +++ b/Documentation/devicetree/bindings/opp/opp.txt
>>> @@ -132,6 +132,9 @@ Optional properties:
>>>    - opp-level: A value representing the performance level of the device,
>>>      expressed as a 32-bit integer.
>>> +- bandwidth-MBps: The interconnect bandwidth is specified with an array containing
>>> +  the two integer values for average and peak bandwidth in megabytes per second.
>>> +
>>>    - clock-latency-ns: Specifies the maximum possible transition latency (in
>>>      nanoseconds) for switching to this OPP from any other OPP.
>>> @@ -546,3 +549,38 @@ Example 6: opp-microvolt-<name>, opp-microamp-<name>:
>>>    		};
>>>    	};
>>>    };
>>> +
>>> +Example 7: bandwidth-MBps:
>>> +Average and peak bandwidth values for the interconnects between CPU and DDR
>>> +memory and also between CPU and L3 are defined per each OPP. Bandwidth of both
>>> +interconnects is scaled together with CPU frequency.
>>> +
>>> +/ {
>>> +	cpus {
>>> +		CPU0: cpu@0 {
>>> +			compatible = "arm,cortex-a53", "arm,armv8";
>>> +			...
>>> +			operating-points-v2 = <&cpu_opp_table>;
>>> +			/* path between CPU and DDR memory and CPU and L3 */
>>> +			interconnects = <&noc MASTER_CPU &noc SLAVE_DDR>,
>>> +					<&noc MASTER_CPU &noc SLAVE_L3>;
>>> +		};
>>> +	};
>>> +
>>> +	cpu_opp_table: cpu_opp_table {
>>> +		compatible = "operating-points-v2";
>>> +		opp-shared;
>>> +
>>> +		opp-200000000 {
>>> +			opp-hz = /bits/ 64 <200000000>;
>>> +			/* CPU<->DDR bandwidth: 457 MB/s average, 1525 MB/s peak
>>> +			 * CPU<->L3 bandwidth: 914 MB/s average, 3050 MB/s peak */
>>> +			bandwidth-MBps = <457 1525>, <914 3050>;
>>
>> Should this also have a bandwidth-MBps-name perhaps? Without that I guess we assume
>> the order in which we specify the interconnects is the same as the order here?
> 
> Right, so I suggested not to add the -name property and to rely on the
> order. Though I missed that he hasn't mentioned the order thing here.


By skipping names, aren't we forced to specify bandwidths for all of the
listed paths in each OPP, even when some of them are redundant? I.e. if the
first/second icc path doesn't have to change across a few OPPs but the other
path does, this scheme would still force the unchanged path to be included
and it would be set again.


E.g.: here the first path does not have to change across these two OPPs but
has to be specified nonetheless since we omit names.

+		opp-1200000000 {
+			opp-hz = /bits/ 64 <1200000000>;
+			bandwidth-MBps = <457 1525>, <914 3050>;
+		};
+		opp-1400000000 {
+			opp-hz = /bits/ 64 <1400000000>;
+			bandwidth-MBps = <457 1525>, <1828 6102>;
+		};


> 
> @Georgi: Please mention above in the binding that the order is same as
> interconnects binding.
> 

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Saravana Kannan June 1, 2019, 2:12 a.m. UTC | #3
I'll have to Nack this series because it's making a couple of wrong assumptions
about bandwidth voting.

Firstly, it's mixing up OPP to bandwidth mapping (Eg: CPU freq to CPU<->DDR
bandwidth mapping) with the bandwidth levels that are actually supported by an
interconnect path (Eg: CPU<->DDR bandwidth levels). For example, CPU0 might
decide to vote for a max of 10 GB/s because it's a little CPU and never needs
anything higher than 10 GB/s even at CPU0's max frequency. But that has no
bearing on the bandwidth levels available between CPU<->DDR.

There needs to be a separate BW OPP table describing the bandwidth levels
available for the CPU<->DDR path and then a separate mapping from CPU OPPs to
CPU<->DDR BW OPPs. That way, the mapping decision (a policy or voltage based
config decision) doesn't affect the description of what the hardware is really
capable of.

Put another way, if someone comes around and decides that CPU0's max freq should
ask for 15 GB/s because some shared voltage rail would already be pushed to a
voltage sufficient to support 15 GB/s, then it shouldn't change the HW
description of what bandwidth levels are available between CPU<->DDR. If the
CPU<->DDR path supports 20 GB/s, it always does independent of the CPU OPP
table mapping.

By splitting out the available bandwidth levels of the CPU<->DDR path into a
separate BW OPP table, we avoid these kinds of issues.

Also, one could easily imagine using a bandwidth counter or some other BW
measurement hardware to vote for bandwidth between CPU<->DDR and CPU<->L3.
That entity should be able to know/learn all the available bandwidth levels in
the CPU<->DDR path without forcing the bandwidth levels to be listed in the
CPU OPP table. And if it's measuring bandwidth at a point common to all CPUs,
which CPU OPP table is it even supposed to look at to learn all the available
bandwidth levels? It just doesn't make sense.

It's also easy to envision having multiple policies or devfreq governors voting
for an interconnect path. The mapping you are talking about in this series is
just an input for one of them (the cpufreq-map governor I sent out a while
ago).

Secondly, when it comes to bandwidth OPP tables, the peak bandwidth should be
the key/first element (similar to how frequency is now). Long explanation
follows.

All the sensible frequency combinations of all the hardware interconnects
between the master and slave port (Eg: GPU -> MMNOC -> BIMC -> DDR) determine
the peak bandwidth levels available in that interconnect path.

If multiple devices (GPU, CPU, etc) vote for different peak bandwidths for an
interconnect (say BIMC), the interconnect provider picks the max peak bandwidth
amongst all the requests and then picks the lowest interconnect frequency that
can support the max peak bandwidth. 

So the devices (GPU, CPU, etc.) actually have some control over which
interconnect frequencies are picked by asking for a specific peak bandwidth --
so there's actually a notion of useful levels.

Average bandwidth is an additive property -- so if CPU and GPU ask for 5 GB/s
and 3 GB/s respectively for an interconnect, the interconnect provider adds
them up and configures the interconnect for 8 GB/s. So if GPU asks for 5 GB/s
average bandwidth, it has no idea what frequency the interconnect will actually
get configured to. So, average bandwidth really doesn't provide a sense of
levels to pick from for a given interconnect path.

So peak bandwidth is a much better pick than average bandwidth for being a key
to the bandwidth OPP table.
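
To make the aggregation described above concrete, a provider effectively
combines requests along the lines of the sketch below (the real aggregate
callback in drivers/interconnect/core.c has a different signature and
per-node bookkeeping; this is only an illustration of the math):

#include <linux/kernel.h>
#include <linux/types.h>

/*
 * Average bandwidth requests add up; for peak bandwidth only the largest
 * request matters when the provider picks an interconnect frequency.
 */
static void example_aggregate(u32 avg_bw, u32 peak_bw,
			      u32 *agg_avg, u32 *agg_peak)
{
	*agg_avg += avg_bw;
	*agg_peak = max(*agg_peak, peak_bw);
}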

So what I think we need is:
* Bandwidth OPP support in the kernel
* Bandwidth OPP DT binding to describe the bandwidth levels available for
  different interconnect paths.
* A new "interconnect-opps" property that can point to a different BW OPP
  table for each of the interconnect paths listed under the interconnects
  property.

Then, for mapping from device OPPs to interconnect path bandwidth OPPs, you
just use the existing required-opps binding to link an entry in the GPU OPP
table to an entry in the GPU<->DDR bandwidth OPP table. That way the hardware
is actually described correctly and the mapping is kept separate.

So, in the end, it'll look something like this in DT.

gpu_cache_opp_table: gpu_cache_opp_table {
	compatible = "operating-points-v2";

	gpu_cache_3000: opp-3000 {
		opp-peak-mbps = <3000>;
		avg-mbps = <1000>;
	};
	gpu_cache_6000: opp-6000 {
		opp-peak-mbps = <6000>;
		avg-mbps = <2000>;
	};
	gpu_cache_9000: opp-9000 {
		opp-peak-mbps = <9000>;
		avg-mbps = <9000>;
	};
};

gpu_ddr_opp_table: gpu_ddr_opp_table {
	compatible = "operating-points-v2";

	gpu_ddr_1525: opp-1525 {
		opp-peak-mbps = <1525>;
		avg-mbps = <452>;
	};
	gpu_ddr_3051: opp-3051 {
		opp-peak-mbps = <3051>;
		avg-mbps = <915>;
	};
	gpu_ddr_7500: opp-7500 {
		opp-peak-mbps = <7500>;
		avg-mbps = <3000>;
	};
};

gpu_opp_table: gpu_opp_table {
	compatible = "operating-points-v2";
	opp-shared;

	opp-200000000 {
		opp-hz = /bits/ 64 <200000000>;
		required-opps = <&gpu_cache_3000>, <&gpu_ddr_1525>;
	};
	opp-400000000 {
		opp-hz = /bits/ 64 <400000000>;
		required-opps = <&gpu_cache_6000>, <&gpu_ddr_3051>;
	};
};

gpu@7864000 {
	...
	operating-points-v2 = <&gpu_opp_table>;
	interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
			<&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
	interconnect-names = "gpu-cache", "gpu-mem";
	interconnect-opps = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
};

It's very clear what the HW supports vs what the mapping chooses to use. It's
also much clearer what the mapping is doing because it actually points to
entries in appropriately named OPP tables. There's no confusion about which
mapping corresponds to which interconnect path -- which is why this doesn't
need a comment to clarify the intent here, whereas in this patch series the
mappings needed comments on which interconnect they are referring to.

Sorry about the long email and for jumping in out of nowhere. The need for
something like this has been on my mind for a long time, and my situation has
also changed so that I can be more active upstream than before.

Thanks and have a nice weekend.

-Saravana
