[v10,0/8] Introduce on-chip interconnect API

Message ID 20181127180349.29997-1-georgi.djakov@linaro.org

Message

Georgi Djakov Nov. 27, 2018, 6:03 p.m. UTC
Modern SoCs have multiple processors and various dedicated cores (video, gpu,
graphics, modem). These cores talk to each other and can generate a
lot of data flowing through the on-chip interconnects. These interconnect
buses could form different topologies such as crossbar, point-to-point buses,
hierarchical buses or use the network-on-chip concept.

These buses are usually sized to handle use cases with high data
throughput, but this is not necessary all the time and consumes a lot of power.
Furthermore, the priority between masters can vary depending on the running
use case like video playback or CPU intensive tasks.

Having an API to express the requirements of the system in terms of bandwidth
and QoS lets us adapt the interconnect configuration to match them by
scaling the frequencies, setting link priority and tuning QoS parameters.
This configuration can be a static, one-time operation done at boot for some
platforms or a dynamic set of operations that happen at run-time.

This patchset introduces a new API to get the requirements and configure the
interconnect buses across the entire chipset to fit with the current demand.
The API is NOT for changing the performance of the endpoint devices, but only
the interconnect path in between them.

The API is using a consumer/provider-based model, where the providers are
the interconnect buses and the consumers could be various drivers.
The consumers request interconnect resources (path) to an endpoint and set
the desired constraints on this data flow path. The provider(s) receive
requests from consumers and aggregate these requests for all master-slave
pairs on that path. Then the providers configure each node participating in
the topology according to the requested data flow path, physical links and
constraints. The topology could be complicated and multi-tiered and is SoC
specific.
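
To make the aggregation step more concrete, here is a minimal user-space
sketch. It is not the framework code: the names demo_req, demo_node and
demo_aggregate() are hypothetical, and the policy (sum the average bandwidth,
take the maximum of the peak bandwidth) is an assumption that a provider
might replace with something SoC specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One consumer request on a node (hypothetical type, for illustration). */
struct demo_req {
	uint32_t avg_bw;	/* average bandwidth, in arbitrary icc units */
	uint32_t peak_bw;	/* peak bandwidth, same units */
};

/* Aggregated state of one interconnect node (hypothetical type). */
struct demo_node {
	uint32_t avg_bw;
	uint32_t peak_bw;
};

/*
 * Assumed aggregation policy: average bandwidths add up, while the peak
 * is the largest single peak request. A real provider may choose otherwise.
 */
static void demo_aggregate(struct demo_node *node,
			   const struct demo_req *reqs, size_t num_reqs)
{
	size_t i;

	assert(node != NULL);
	node->avg_bw = 0;
	node->peak_bw = 0;

	for (i = 0; i < num_reqs; i++) {
		node->avg_bw += reqs[i].avg_bw;
		if (reqs[i].peak_bw > node->peak_bw)
			node->peak_bw = reqs[i].peak_bw;
	}
}
```

With two requests of (100, 400) and (250, 300), the node would end up with an
average of 350 and a peak of 400 under this assumed policy.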

Below is a simplified diagram of a real-world SoC topology. The interconnect
providers are the NoCs.

+----------------+    +----------------+
| HW Accelerator |--->|      M NoC     |<---------------+
+----------------+    +----------------+                |
                        |      |                    +------------+
 +-----+  +-------------+      V       +------+     |            |
 | DDR |  |                +--------+  | PCIe |     |            |
 +-----+  |                | Slaves |  +------+     |            |
   ^ ^    |                +--------+     |         |   C NoC    |
   | |    V                               V         |            |
+------------------+   +------------------------+   |            |   +-----+
|                  |-->|                        |-->|            |-->| CPU |
|                  |-->|                        |<--|            |   +-----+
|     Mem NoC      |   |         S NoC          |   +------------+
|                  |<--|                        |---------+    |
|                  |<--|                        |<------+ |    |   +--------+
+------------------+   +------------------------+       | |    +-->| Slaves |
  ^  ^    ^    ^          ^                             | |        +--------+
  |  |    |    |          |                             | V
+------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
| CPUs |  |  | GPU |   | DSP |  | Masters |-->|       P NoC    |-->| Slaves |
+------+  |  +-----+   +-----+  +---------+   +----------------+   +--------+
          |
      +-------+
      | Modem |
      +-------+
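
As a rough illustration of what "walking the graph" to find a path between
two endpoints could look like, here is a user-space sketch using a
breadth-first search. The node list is a hand-picked subset of the diagram
above and the link table is illustrative only; demo_find_path() and the enum
names are hypothetical and are not taken from the patches.

```c
#include <assert.h>

enum demo_id { GPU, MEM_NOC, DDR, S_NOC, C_NOC, SLAVES, NUM_NODES };

/* Directed links: links[a][b] != 0 means data can flow from a to b. */
static const int links[NUM_NODES][NUM_NODES] = {
	[GPU]     = { [MEM_NOC] = 1 },
	[MEM_NOC] = { [DDR] = 1, [S_NOC] = 1 },
	[S_NOC]   = { [MEM_NOC] = 1, [C_NOC] = 1 },
	[C_NOC]   = { [SLAVES] = 1 },
};

/*
 * Breadth-first search from src to dst. On success the nodes on the path
 * are stored in path[] (source first) and their count is returned;
 * 0 means no path exists. path[] must hold NUM_NODES entries.
 */
static int demo_find_path(int src, int dst, int path[NUM_NODES])
{
	int prev[NUM_NODES], queue[NUM_NODES];
	int head = 0, tail = 0, len = 0, n, i;

	assert(src >= 0 && src < NUM_NODES && dst >= 0 && dst < NUM_NODES);

	for (i = 0; i < NUM_NODES; i++)
		prev[i] = -1;

	prev[src] = src;	/* mark the source as visited */
	queue[tail++] = src;

	while (head < tail) {
		n = queue[head++];
		if (n == dst)
			break;
		for (i = 0; i < NUM_NODES; i++) {
			if (links[n][i] && prev[i] < 0) {
				prev[i] = n;
				queue[tail++] = i;
			}
		}
	}

	if (prev[dst] < 0)
		return 0;

	/* Count the nodes by walking back, then fill path[] from the end. */
	for (n = dst; n != src; n = prev[n])
		len++;
	len++;
	for (n = dst, i = len - 1; i >= 0; n = prev[n], i--)
		path[i] = n;

	return len;
}
```

Calling demo_find_path(GPU, DDR, path) on this table would go through the
Mem NoC node, while a lookup from DDR to GPU finds no path because the table
has no outgoing links from DDR.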

TODO:
* Create icc_set_extended() to handle parameters such as latency and other
  QoS values. Nvidia and Qcom guys are interested in this.
* Cache the path between the nodes instead of walking the graph on each get().
* Sync interconnect requests with the idle state of the device.

Changes since patchset v9 (https://lkml.org/lkml/2018/8/31/444)
* Converted from using global node identifiers to local per provider ids.
* Dropped msm8916 platform driver until we figure out DT bindings.
* Included sdm845 platform driver instead.
* Added macros for converting Mbps, Gbps, etc. to icc units.
* Added comments about aggregation, other minor changes.
* Fixed uninitialized variable. (Gustavo A. R. Silva)
* Removed set but not used variable. (YueHaibing)
* Fixed build error without DEBUGFS. (Arnd Bergmann)

Changes since patchset v8 (https://lkml.org/lkml/2018/8/10/387)
* Fixed the names of the files when built as modules.
* Corrected some typos in comments.

Changes since patchset v7 (https://lkml.org/lkml/2018/7/31/647)
* Addressed comments on kernel-doc and grammar. (Randy)
* Picked Reviewed-by: Evan
* Squashed consumer and provider DT bindings into single patch. (Rob)
* Cleaned-up msm8916 DT bindings docs by removing unused port ids.
* Updated documentation for the cases when NULL is returned. (Saravana)
* New patch to add myself as maintainer.

Changes since patchset v6 (https://lkml.org/lkml/2018/7/9/698)
* [patches 1,6]: Move the aggregation within the provider from the framework to
  the platform driver's set() callback, as the aggregation point could be SoC
  specific.
* [patch 1]: Include missing header, reset state only of the traversed nodes,
  move more code into path_init(), add more asserts, move misplaced mutex,
  simplify icc_link_destroy() (Evan)
* [patch 1]: Fix the order of requests to go from source to destination. (Alex)
* [patch 7]: Use better wording in the documentation. (Evan)
* [patch 6]: Reorder struct members, sort nodes alphabetically, improve naming
  of variables, add missing clk_disable_unprepare() in error paths. (Matthias)
* [patch 6]: Remove redundant NULL pointer check in msm8916 driver. (Alex)
* [patch 6]: Add missing depend on QCOM_SMD_RPM in Kconfig. (Evan)
* [patch 3]: Don't check for errors on debugfs calls, remove debugfs directory
  when module is unloaded (Greg)

Changes since patchset v5 (https://lkml.org/lkml/2018/6/20/453)
* Fix the modular build, make rpm-smd driver a module.
* Optimize locking and move to higher level. (Evan)
* Code cleanups. Fix typos. (Evan, Matthias)
* Add the source node to the path. (Evan)
* Rename path_allocate() to path_init() with minor refactoring. (Evan)
* Rename *_remove() functions to *_destroy().
* Return fixed errors in icc_link_destroy(). (Evan)
* Fix krealloc() usage in icc_link_destroy(). (Evan)
* Add missing kfree() in icc_node_create(). (Matthias)
* Make icc_node_add() return void. (Matthias)
* Change mutex_init to mutex_lock in icc_provider_add(). (Matthias)
* Add new icc_node_del() function to delete nodes from provider.
* Fix the header guard to reflect the path in smd-rpm.h. (Evan)
* Check for errors returned by qcom_icc_rpm_smd_send(). (Evan)
* Propagate the error of icc_provider_del(). (Evan)

Changes since patchset v4 (https://lkml.org/lkml/2018/3/9/856)
* Simplified locking by using a single global mutex. (Evan)
* Changed the aggregation function interface.
* Implemented functions for node, link, provider removal. (Evan)
* Naming changes on variables and functions, removed redundant code. (Evan)
* Fixes and clarifications in the docs. (Matthias, Evan, Amit, Alexandre)
* Removed mandatory reg DT property, made interconnect-names optional. (Bjorn)
* Made interconnect-cells property required to align with other bindings. (Neil)
* Moved msm8916 specific bindings into a separate file and patch. (Bjorn)
* Use the names, instead of the hardcoded ids for topology. (Matthias)
* Init the node before creating the links. (Evan)
* Added icc_units_to_bps macro. (Amit)

Changes since patchset v3 (https://lkml.org/lkml/2017/9/8/544)
* Refactored the constraints aggregation.
* Use the IDR API.
* Split the provider and consumer bindings into separate patches and propose
  new bindings for consumers, which allow specifying the local source port.
* Adopted the icc_ prefix for API functions.
* Introduced separate API functions for creating interconnect nodes and links.
* Added DT lookup support in addition to platform data.
* Dropped the event tracing patch for now.
* Added a patch to provide summary via debugfs.
* Use macro for the list of topology definitions in the platform driver.
* Various minor changes.

Changes since patchset v2 (https://lkml.org/lkml/2017/7/20/825)
* Split the aggregation into per node and per provider. Cache the
  aggregated values.
* Various small refactorings and cleanups in the framework.
* Added a patch introducing basic tracepoint support for monitoring
  the time required to update the interconnect nodes.

Changes since patchset v1 (https://lkml.org/lkml/2017/6/27/890)
* Updates in the documentation.
* Changes in request aggregation, locking.
* Dropped the aggregate() callback and use the default as it is currently
  sufficient for the single vendor driver. Will add it later when needed.
* Dropped the dt-bindings draft patch for now.

Changes since RFC v2 (https://lkml.org/lkml/2017/6/12/316)
* Converted documentation to rst format.
* Fixed an incorrect call to mutex_lock. Renamed max_bw to peak_bw.

Changes since RFC v1 (https://lkml.org/lkml/2017/5/15/605)
* Refactored code into shorter functions.
* Added a new aggregate() API function.
* Rearranged some structs to reduce padding bytes.

Changes since RFC v0 (https://lkml.org/lkml/2017/3/1/599)
* Removed DT support and added optional Patch 3 with new bindings proposal.
* Converted the topology into internal driver data.
* Made the framework modular.
* interconnect_get() now takes src and dst ports as arguments.
* Removed public declarations of some structs.
* Now passing prev/next nodes to the vendor driver.
* Properly remove requests on _put().
* Added refcounting.
* Updated documentation.
* Changed struct interconnect_path to use array instead of linked list.

David Dai (2):
  interconnect: qcom: Add sdm845 interconnect provider driver
  arm64: dts: sdm845: Add interconnect provider DT nodes

Georgi Djakov (5):
  interconnect: Add generic on-chip interconnect API
  dt-bindings: Introduce interconnect binding
  interconnect: Allow endpoints translation via DT
  interconnect: Add debugfs support
  MAINTAINERS: add a maintainer for the interconnect API

 .../bindings/interconnect/interconnect.txt    |  60 ++
 .../bindings/interconnect/qcom,sdm845.txt     |  24 +
 Documentation/interconnect/interconnect.rst   |  94 ++
 MAINTAINERS                                   |  10 +
 arch/arm64/boot/dts/qcom/sdm845.dtsi          |   5 +
 drivers/Kconfig                               |   2 +
 drivers/Makefile                              |   1 +
 drivers/interconnect/Kconfig                  |  15 +
 drivers/interconnect/Makefile                 |   6 +
 drivers/interconnect/core.c                   | 796 +++++++++++++++++
 drivers/interconnect/qcom/Kconfig             |  13 +
 drivers/interconnect/qcom/Makefile            |   5 +
 drivers/interconnect/qcom/sdm845.c            | 836 ++++++++++++++++++
 .../dt-bindings/interconnect/qcom,sdm845.h    | 143 +++
 include/linux/interconnect-provider.h         | 142 +++
 include/linux/interconnect.h                  |  58 ++
 16 files changed, 2210 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/interconnect/interconnect.txt
 create mode 100644 Documentation/devicetree/bindings/interconnect/qcom,sdm845.txt
 create mode 100644 Documentation/interconnect/interconnect.rst
 create mode 100644 drivers/interconnect/Kconfig
 create mode 100644 drivers/interconnect/Makefile
 create mode 100644 drivers/interconnect/core.c
 create mode 100644 drivers/interconnect/qcom/Kconfig
 create mode 100644 drivers/interconnect/qcom/Makefile
 create mode 100644 drivers/interconnect/qcom/sdm845.c
 create mode 100644 include/dt-bindings/interconnect/qcom,sdm845.h
 create mode 100644 include/linux/interconnect-provider.h
 create mode 100644 include/linux/interconnect.h

Comments

Georgi Djakov Nov. 28, 2018, 6:18 p.m. UTC | #1
Hi Joe,

On 11/27/18 20:35, Joe Perches wrote:
> On Tue, 2018-11-27 at 20:03 +0200, Georgi Djakov wrote:
>> This patch introduces a new API to get requirements and configure the
>> interconnect buses across the entire chipset to fit with the current
>> demand.
>
> trivial notes:
>
>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
> []
>> +static int apply_constraints(struct icc_path *path)
>> +{
>> +	struct icc_node *next, *prev = NULL;
>> +	int ret = -EINVAL;
>> +	int i;
>> +
>> +	for (i = 0; i < path->num_nodes; i++, prev = next) {
>> +		struct icc_provider *p;
>> +
>> +		next = path->reqs[i].node;
>> +		/*
>> +		 * Both endpoints should be valid master-slave pairs of the
>> +		 * same interconnect provider that will be configured.
>> +		 */
>> +		if (!prev || next->provider != prev->provider)
>> +			continue;
>> +
>> +		p = next->provider;
>> +
>> +		/* set the constraints */
>> +		ret = p->set(prev, next);
>> +		if (ret)
>> +			goto out;
>> +	}
>> +out:
>> +	return ret;
>> +}
>
> The use of ", prev = next" appears somewhat tricky code.
> Perhaps move the assignment of prev to the bottom of the loop.
> Perhaps the temporary p assignment isn't useful either.
>
>> +int icc_set(struct icc_path *path, u32 avg_bw, u32 peak_bw)
>> +{
> []
>> +	ret = apply_constraints(path);
>> +	if (ret)
>> +		pr_debug("interconnect: error applying constraints (%d)", ret);
>
> Ideally all pr_<foo> formats should end in '\n'
>
>> +static struct icc_node *icc_node_create_nolock(int id)
>> +{
>> +	struct icc_node *node;
>> +
>> +	/* check if node already exists */
>> +	node = node_find(id);
>> +	if (node)
>> +		goto out;
>> +
>> +	node = kzalloc(sizeof(*node), GFP_KERNEL);
>> +	if (!node) {
>> +		node = ERR_PTR(-ENOMEM);
>> +		goto out;
>
> Generally, this code appears to overly rely on goto when
> direct returns could be more readable.
>
>> +	}
>> +
>> +	id = idr_alloc(&icc_idr, node, id, id + 1, GFP_KERNEL);
>> +	if (WARN(id < 0, "couldn't get idr")) {
>
> This seems to unnecessarily hide the id < 0 test in a WARN
>
> Why is this a WARN and not a simpler
> 	if (id < 0) {
> 		[ pr_err(...); or WARN(1, ...); ]
>
>> +		kfree(node);
>> +		node = ERR_PTR(id);
>> +		goto out;
>> +	}
>> +
>> +	node->id = id;
>> +
>> +out:
>> +	return node;
>> +}

Thank you for helping to improve the code. The above suggestions make it 
cleaner indeed.
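
For reference, the loop suggestion could look roughly like the user-space
sketch below: prev is advanced at the bottom of the loop, the temporary
provider pointer is gone, and errors return directly. The demo_* types and
the counting callback are hypothetical stand-ins so the snippet compiles
outside the kernel; they are not the structures from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_node;

/* Stand-in for a provider; only the set() callback matters here. */
struct demo_provider {
	int (*set)(struct demo_node *src, struct demo_node *dst);
};

struct demo_node {
	struct demo_provider *provider;
};

struct demo_path {
	size_t num_nodes;
	struct demo_node **nodes;	/* source-to-destination order */
};

/* Trivial callback that only counts how often it was invoked. */
static int demo_set_calls;

static int demo_count_set(struct demo_node *src, struct demo_node *dst)
{
	(void)src;
	(void)dst;
	demo_set_calls++;
	return 0;
}

static int demo_apply_constraints(struct demo_path *path)
{
	struct demo_node *prev = NULL;
	int ret = -EINVAL;
	size_t i;

	assert(path != NULL);

	for (i = 0; i < path->num_nodes; i++) {
		struct demo_node *next = path->nodes[i];

		/* Only configure pairs that belong to the same provider. */
		if (prev && next->provider == prev->provider) {
			ret = next->provider->set(prev, next);
			if (ret)
				return ret;
		}

		prev = next;	/* advance at the bottom of the loop */
	}

	return ret;
}
```

As in the quoted version, -EINVAL is returned when no pair on the path shares
a provider, since set() is never called in that case.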

> []
>> diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h
> []
>> +/* macros for converting to icc units */
>> +#define bps_to_icc(x)	(1)
>> +#define kBps_to_icc(x)	(x)
> []
>> +#define MBps_to_icc(x)	(x * 1000)
>> +#define GBps_to_icc(x)	(x * 1000 * 1000)
>> +#define kbps_to_icc(x)	(x / 8 + ((x) % 8 ? 1 : 0))
>> +#define Mbps_to_icc(x)	(x * 1000 / 8 )
>> +#define Gbps_to_icc(x)	(x * 1000 * 1000 / 8)
>
> The last 5 macros should parenthesize x

Oops... obviously I forgot to run checkpatch --strict. Will fix!
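
A possible shape of the fix, as a sketch: the same conversions with every use
of x parenthesized. The demo_unsafe_MBps() macro and demo_check_macros() are
hypothetical additions that only show what goes wrong when an expression such
as "1 + 1" is passed to the unparenthesized form.

```c
#include <assert.h>

/* Same conversions as in the quoted hunk, with each use of x parenthesized. */
#define bps_to_icc(x)	(1)
#define kBps_to_icc(x)	(x)
#define MBps_to_icc(x)	((x) * 1000)
#define GBps_to_icc(x)	((x) * 1000 * 1000)
#define kbps_to_icc(x)	((x) / 8 + ((x) % 8 ? 1 : 0))
#define Mbps_to_icc(x)	((x) * 1000 / 8)
#define Gbps_to_icc(x)	((x) * 1000 * 1000 / 8)

/* Unparenthesized variant, kept only to demonstrate the precedence bug. */
#define demo_unsafe_MBps(x)	(x * 1000)

static void demo_check_macros(void)
{
	/* "1 + 1" stays grouped, so this is 2 * 1000. */
	assert(MBps_to_icc(1 + 1) == 2000);
	/* Expands to (1 + 1 * 1000), which is 1001. */
	assert(demo_unsafe_MBps(1 + 1) == 1001);
}
```

The kbps variant rounds up, so a request of 9 kbps maps to 2 units rather
than being truncated to 1.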

BR,
Georgi
Evan Green Dec. 1, 2018, 12:39 a.m. UTC | #2
On Tue, Nov 27, 2018 at 10:04 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> From: David Dai <daidavid1@codeaurora.org>
>
> Add RSC (Resource State Coordinator) provider
> dictating network-on-chip interconnect bus performance
> found on SDM845-based platforms.
>
> Signed-off-by: David Dai <daidavid1@codeaurora.org>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> ---
>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> index b72bdb0a31a5..856d33604e9c 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> @@ -1324,6 +1324,11 @@
>                                 compatible = "qcom,sdm845-rpmh-clk";
>                                 #clock-cells = <1>;
>                         };
> +
> +                       qnoc: qnoc {
> +                               compatible = "qcom,sdm845-rsc-hlos";
> +                               #interconnect-cells = <1>;
> +                       };

Should we alphabetize this node above rpmhcc?

>                 };
>
>                 intc: interrupt-controller@17a00000 {
Rob Herring Dec. 5, 2018, 4:16 p.m. UTC | #3
On Tue, Nov 27, 2018 at 12:03 PM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> This patch introduces a new API to get requirements and configure the
> interconnect buses across the entire chipset to fit with the current
> demand.
>
> The API is using a consumer/provider-based model, where the providers are
> the interconnect buses and the consumers could be various drivers.
> The consumers request interconnect resources (path) between endpoints and
> set the desired constraints on this data flow path. The providers receive
> requests from consumers and aggregate these requests for all master-slave
> pairs on that path. Then the providers configure each participating in the
> topology node according to the requested data flow path, physical links and
> constraints. The topology could be complicated and multi-tiered and is SoC
> specific.
>
> Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
> Reviewed-by: Evan Green <evgreen@chromium.org>

[...]

> +struct icc_path *icc_get(struct device *dev, const int src_id,
> +                        const int dst_id);
> +void icc_put(struct icc_path *path);
> +int icc_set(struct icc_path *path, u32 avg_bw, u32 peak_bw);

icc_set() is very generic, but this function isn't easily extended to
other parameters than bandwidth. Perhaps icc_set_bandwidth() instead.
Then when you add some other setting, you just add a new function. Of
course, if you wind up with a bunch of different parameters, then
you'll probably need an atomic type interface where you test all the
settings together and then commit them separately in one call. But
from a DT perspective, I certainly hope there are not lots of new
settings folks want to add. :)
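
To sketch the two shapes being contrasted here (one setter per parameter
versus a check-then-commit call), a user-space mock might look like this.
Everything in it is hypothetical: the demo_* names, the latency_us field and
the "average must not exceed peak" rule are assumptions for illustration, not
part of the proposed API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical path state, standing in for the opaque path handle. */
struct demo_path {
	uint32_t avg_bw;
	uint32_t peak_bw;
	uint32_t latency_us;	/* hypothetical extra parameter */
};

/* A full set of settings to be validated and applied together. */
struct demo_settings {
	uint32_t avg_bw;
	uint32_t peak_bw;
	uint32_t latency_us;
};

/* Shape 1: a setter named after the one thing it sets. */
static int demo_set_bandwidth(struct demo_path *path,
			      uint32_t avg_bw, uint32_t peak_bw)
{
	if (!path)
		return -EINVAL;

	path->avg_bw = avg_bw;
	path->peak_bw = peak_bw;
	return 0;
}

/* Assumed sanity rule, for the demo only: average must not exceed peak. */
static int demo_check(const struct demo_settings *s)
{
	return (s->avg_bw > s->peak_bw) ? -EINVAL : 0;
}

/* Shape 2: test all settings first, then apply them in one call. */
static int demo_commit(struct demo_path *path, const struct demo_settings *s)
{
	int ret;

	if (!path || !s)
		return -EINVAL;

	ret = demo_check(s);
	if (ret)
		return ret;	/* nothing has been applied */

	path->avg_bw = s->avg_bw;
	path->peak_bw = s->peak_bw;
	path->latency_us = s->latency_us;
	return 0;
}
```

In this mock a rejected demo_commit() leaves the path untouched, which is the
property that makes the second shape attractive once several parameters have
to change together.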

Rob
Georgi Djakov Dec. 7, 2018, 10:06 a.m. UTC | #4
Hi Greg and Evan,

On 12/6/18 16:55, Greg KH wrote:
> On Wed, Dec 05, 2018 at 12:41:35PM -0800, Evan Green wrote:
>> On Tue, Nov 27, 2018 at 10:03 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>
>>> Modern SoCs have multiple processors and various dedicated cores (video, gpu,
>>> graphics, modem). These cores are talking to each other and can generate a
>>> lot of data flowing through the on-chip interconnects. These interconnect
>>> buses could form different topologies such as crossbar, point to point buses,
>>> hierarchical buses or use the network-on-chip concept.
>>>
>>> These buses have been sized usually to handle use cases with high data
>>> throughput but it is not necessary all the time and consume a lot of power.
>>> Furthermore, the priority between masters can vary depending on the running
>>> use case like video playback or CPU intensive tasks.
>>>
>>> Having an API to control the requirement of the system in terms of bandwidth
>>> and QoS, so we can adapt the interconnect configuration to match those by
>>> scaling the frequencies, setting link priority and tuning QoS parameters.
>>> This configuration can be a static, one-time operation done at boot for some
>>> platforms or a dynamic set of operations that happen at run-time.
>>>
>>> This patchset introduce a new API to get the requirement and configure the
>>> interconnect buses across the entire chipset to fit with the current demand.
>>> The API is NOT for changing the performance of the endpoint devices, but only
>>> the interconnect path in between them.
>>
>> For what it's worth, we are ready to land this in Chrome OS. I think
>> this series has been very well discussed and reviewed, hasn't changed
>> much in the last few spins, and is in good enough shape to use as a
>> base for future patches. Georgi's also done a great job reaching out
>> to other SoC vendors, and there appears to be enough consensus that
>> this framework will be usable by more than just Qualcomm. There are
>> also several drivers out on the list trying to add patches to use this
>> framework, with more to come, so it made sense (to us) to get this
>> base framework nailed down. In my experiments this is an important
>> piece of the overall power management story, especially on systems
>> that are mostly idle.
>>
>> I'll continue to track changes to this series and we will ultimately
>> reconcile with whatever happens upstream, but I thought it was worth
>> sending this note to express our "thumbs up" towards this framework.
>
> Looks like a v11 will be forthcoming, so I'll wait for that one to apply
> it to the tree if all looks good.
>

Yes, it's coming. I will also include an additional fixup patch, as the
sdm845 provider driver will fail to build in linux-next, due to a recent
change in the cmd_db API.

Thanks,
Georgi
Rafael J. Wysocki Dec. 10, 2018, 11 a.m. UTC | #5
On Mon, Dec 10, 2018 at 11:18 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>
> Hi Rafael,
>
> On 12/10/18 11:04, Rafael J. Wysocki wrote:
> > On Thu, Dec 6, 2018 at 3:55 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >>
> >> On Wed, Dec 05, 2018 at 12:41:35PM -0800, Evan Green wrote:
> >>> On Tue, Nov 27, 2018 at 10:03 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
> >>>>
> >>>> Modern SoCs have multiple processors and various dedicated cores (video, gpu,
> >>>> graphics, modem). These cores are talking to each other and can generate a
> >>>> lot of data flowing through the on-chip interconnects. These interconnect
> >>>> buses could form different topologies such as crossbar, point to point buses,
> >>>> hierarchical buses or use the network-on-chip concept.
> >>>>
> >>>> These buses have been sized usually to handle use cases with high data
> >>>> throughput but it is not necessary all the time and consume a lot of power.
> >>>> Furthermore, the priority between masters can vary depending on the running
> >>>> use case like video playback or CPU intensive tasks.
> >>>>
> >>>> Having an API to control the requirement of the system in terms of bandwidth
> >>>> and QoS, so we can adapt the interconnect configuration to match those by
> >>>> scaling the frequencies, setting link priority and tuning QoS parameters.
> >>>> This configuration can be a static, one-time operation done at boot for some
> >>>> platforms or a dynamic set of operations that happen at run-time.
> >>>>
> >>>> This patchset introduce a new API to get the requirement and configure the
> >>>> interconnect buses across the entire chipset to fit with the current demand.
> >>>> The API is NOT for changing the performance of the endpoint devices, but only
> >>>> the interconnect path in between them.
> >>>
> >>> For what it's worth, we are ready to land this in Chrome OS. I think
> >>> this series has been very well discussed and reviewed, hasn't changed
> >>> much in the last few spins, and is in good enough shape to use as a
> >>> base for future patches. Georgi's also done a great job reaching out
> >>> to other SoC vendors, and there appears to be enough consensus that
> >>> this framework will be usable by more than just Qualcomm. There are
> >>> also several drivers out on the list trying to add patches to use this
> >>> framework, with more to come, so it made sense (to us) to get this
> >>> base framework nailed down. In my experiments this is an important
> >>> piece of the overall power management story, especially on systems
> >>> that are mostly idle.
> >>>
> >>> I'll continue to track changes to this series and we will ultimately
> >>> reconcile with whatever happens upstream, but I thought it was worth
> >>> sending this note to express our "thumbs up" towards this framework.
> >>
> >> Looks like a v11 will be forthcoming, so I'll wait for that one to apply
> >> it to the tree if all looks good.
> >
> > I'm honestly not sure if it is ready yet.
> >
> > New versions are coming on and on, which may make such an impression,
> > but we had some discussion on it at the LPC and some serious questions
> > were asked during it, for instance regarding the DT binding introduced
> > here.  I'm not sure how this particular issue has been addressed here,
> > for example.
>
> There have been no changes in bindings since v4 (other than squashing
> consumer and provider bindings into a single patch and fixing typos).
>
> The last DT comment was on v9 [1] where Rob wanted confirmation from
> other SoC vendors that this works for them too. And now we have that
> confirmation and there are patches posted on the list [2].

OK

> The second thing (also discussed at LPC) was about possible cases where
> some consumer drivers can't calculate how much bandwidth they actually
> need and how to address that. The proposal was to extend the OPP
> bindings with one more property, but this is not part of this patchset.
> It is a future step that needs more discussion on the mailing list. If a
> driver really needs some bandwidth data now, it should be put into the
> driver and not in DT. After we have enough consumers, we can discuss
> again if it makes sense to extract something into DT or not.

That's fine by me.

Admittedly, I have some reservations regarding the extent to which
this approach will turn out to be useful in practice, but I guess as
long as there is enough traction, the best way to find out is to try
and see. :-)

From now on I will assume that this series is going to be applied by Greg.

Thanks,
Rafael
Georgi Djakov Dec. 10, 2018, 2:50 p.m. UTC | #6
On 12/10/18 13:00, Rafael J. Wysocki wrote:
> On Mon, Dec 10, 2018 at 11:18 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>
>> Hi Rafael,
>>
>> On 12/10/18 11:04, Rafael J. Wysocki wrote:
>>> On Thu, Dec 6, 2018 at 3:55 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>>>>
>>>> On Wed, Dec 05, 2018 at 12:41:35PM -0800, Evan Green wrote:
>>>>> On Tue, Nov 27, 2018 at 10:03 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:
>>>>>>
>>>>>> Modern SoCs have multiple processors and various dedicated cores (video, gpu,
>>>>>> graphics, modem). These cores are talking to each other and can generate a
>>>>>> lot of data flowing through the on-chip interconnects. These interconnect
>>>>>> buses could form different topologies such as crossbar, point to point buses,
>>>>>> hierarchical buses or use the network-on-chip concept.
>>>>>>
>>>>>> These buses have been sized usually to handle use cases with high data
>>>>>> throughput but it is not necessary all the time and consume a lot of power.
>>>>>> Furthermore, the priority between masters can vary depending on the running
>>>>>> use case like video playback or CPU intensive tasks.
>>>>>>
>>>>>> Having an API to control the requirement of the system in terms of bandwidth
>>>>>> and QoS, so we can adapt the interconnect configuration to match those by
>>>>>> scaling the frequencies, setting link priority and tuning QoS parameters.
>>>>>> This configuration can be a static, one-time operation done at boot for some
>>>>>> platforms or a dynamic set of operations that happen at run-time.
>>>>>>
>>>>>> This patchset introduce a new API to get the requirement and configure the
>>>>>> interconnect buses across the entire chipset to fit with the current demand.
>>>>>> The API is NOT for changing the performance of the endpoint devices, but only
>>>>>> the interconnect path in between them.
>>>>>
>>>>> For what it's worth, we are ready to land this in Chrome OS. I think
>>>>> this series has been very well discussed and reviewed, hasn't changed
>>>>> much in the last few spins, and is in good enough shape to use as a
>>>>> base for future patches. Georgi's also done a great job reaching out
>>>>> to other SoC vendors, and there appears to be enough consensus that
>>>>> this framework will be usable by more than just Qualcomm. There are
>>>>> also several drivers out on the list trying to add patches to use this
>>>>> framework, with more to come, so it made sense (to us) to get this
>>>>> base framework nailed down. In my experiments this is an important
>>>>> piece of the overall power management story, especially on systems
>>>>> that are mostly idle.
>>>>>
>>>>> I'll continue to track changes to this series and we will ultimately
>>>>> reconcile with whatever happens upstream, but I thought it was worth
>>>>> sending this note to express our "thumbs up" towards this framework.
>>>>
>>>> Looks like a v11 will be forthcoming, so I'll wait for that one to apply
>>>> it to the tree if all looks good.
>>>
>>> I'm honestly not sure if it is ready yet.
>>>
>>> New versions are coming on and on, which may make such an impression,
>>> but we had some discussion on it at the LPC and some serious questions
>>> were asked during it, for instance regarding the DT binding introduced
>>> here.  I'm not sure how this particular issue has been addressed here,
>>> for example.
>>
>> There have been no changes in bindings since v4 (other than squashing
>> consumer and provider bindings into a single patch and fixing typos).
>>
>> The last DT comment was on v9 [1] where Rob wanted confirmation from
>> other SoC vendors that this works for them too. And now we have that
>> confirmation and there are patches posted on the list [2].
>
> OK
>
>> The second thing (also discussed at LPC) was about possible cases where
>> some consumer drivers can't calculate how much bandwidth they actually
>> need and how to address that. The proposal was to extend the OPP
>> bindings with one more property, but this is not part of this patchset.
>> It is a future step that needs more discussion on the mailing list. If a
>> driver really needs some bandwidth data now, it should be put into the
>> driver and not in DT. After we have enough consumers, we can discuss
>> again if it makes sense to extract something into DT or not.
>
> That's fine by me.
>
> Admittedly, I have some reservations regarding the extent to which
> this approach will turn out to be useful in practice, but I guess as
> long as there is enough traction, the best way to find out it to try
> and see. :-)
>
> From now on I will assume that this series is going to be applied by Greg.

That was the initial idea, but the problem is that there is a recent
change in the cmd_db API (needed by the sdm845 provider driver), which
is going through arm-soc/qcom/drivers. So either Greg also pulls the
qcom-drivers-for-4.21 tag from Andy or the whole series goes via Olof
and Arnd. Maybe there are other options. I don't have any preference and
don't want to put extra burden on any maintainers, so I am OK with what
they prefer.

Thanks,
Georgi
Georgi Djakov Dec. 17, 2018, 11:17 a.m. UTC | #7
Hi Greg,

On 12/11/18 08:58, Greg Kroah-Hartman wrote:
> On Mon, Dec 10, 2018 at 04:50:00PM +0200, Georgi Djakov wrote:

>> On 12/10/18 13:00, Rafael J. Wysocki wrote:

>>> On Mon, Dec 10, 2018 at 11:18 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:

>>>>

>>>> Hi Rafael,

>>>>

>>>> On 12/10/18 11:04, Rafael J. Wysocki wrote:

>>>>> On Thu, Dec 6, 2018 at 3:55 PM Greg KH <gregkh@linuxfoundation.org> wrote:

>>>>>>

>>>>>> On Wed, Dec 05, 2018 at 12:41:35PM -0800, Evan Green wrote:

>>>>>>> On Tue, Nov 27, 2018 at 10:03 AM Georgi Djakov <georgi.djakov@linaro.org> wrote:

>>>>>>>>

>>>>>>>> Modern SoCs have multiple processors and various dedicated cores (video, gpu,
>>>>>>>> graphics, modem). These cores are talking to each other and can generate a
>>>>>>>> lot of data flowing through the on-chip interconnects. These interconnect
>>>>>>>> buses could form different topologies such as crossbar, point to point buses,
>>>>>>>> hierarchical buses or use the network-on-chip concept.
>>>>>>>>
>>>>>>>> These buses have been sized usually to handle use cases with high data
>>>>>>>> throughput but it is not necessary all the time and consume a lot of power.
>>>>>>>> Furthermore, the priority between masters can vary depending on the running
>>>>>>>> use case like video playback or CPU intensive tasks.
>>>>>>>>
>>>>>>>> Having an API to control the requirement of the system in terms of bandwidth
>>>>>>>> and QoS, so we can adapt the interconnect configuration to match those by
>>>>>>>> scaling the frequencies, setting link priority and tuning QoS parameters.
>>>>>>>> This configuration can be a static, one-time operation done at boot for some
>>>>>>>> platforms or a dynamic set of operations that happen at run-time.
>>>>>>>>
>>>>>>>> This patchset introduce a new API to get the requirement and configure the
>>>>>>>> interconnect buses across the entire chipset to fit with the current demand.
>>>>>>>> The API is NOT for changing the performance of the endpoint devices, but only
>>>>>>>> the interconnect path in between them.
>>>>>>>
>>>>>>> For what it's worth, we are ready to land this in Chrome OS. I think
>>>>>>> this series has been very well discussed and reviewed, hasn't changed
>>>>>>> much in the last few spins, and is in good enough shape to use as a
>>>>>>> base for future patches. Georgi's also done a great job reaching out
>>>>>>> to other SoC vendors, and there appears to be enough consensus that
>>>>>>> this framework will be usable by more than just Qualcomm. There are
>>>>>>> also several drivers out on the list trying to add patches to use this
>>>>>>> framework, with more to come, so it made sense (to us) to get this
>>>>>>> base framework nailed down. In my experiments this is an important
>>>>>>> piece of the overall power management story, especially on systems
>>>>>>> that are mostly idle.
>>>>>>>
>>>>>>> I'll continue to track changes to this series and we will ultimately
>>>>>>> reconcile with whatever happens upstream, but I thought it was worth
>>>>>>> sending this note to express our "thumbs up" towards this framework.
>>>>>>
>>>>>> Looks like a v11 will be forthcoming, so I'll wait for that one to apply
>>>>>> it to the tree if all looks good.
>>>>>
>>>>> I'm honestly not sure if it is ready yet.
>>>>>
>>>>> New versions are coming on and on, which may make such an impression,
>>>>> but we had some discussion on it at the LPC and some serious questions
>>>>> were asked during it, for instance regarding the DT binding introduced
>>>>> here.  I'm not sure how this particular issue has been addressed here,
>>>>> for example.
>>>>
>>>> There have been no changes in bindings since v4 (other than squashing
>>>> consumer and provider bindings into a single patch and fixing typos).
>>>>
>>>> The last DT comment was on v9 [1] where Rob wanted confirmation from
>>>> other SoC vendors that this works for them too. And now we have that
>>>> confirmation and there are patches posted on the list [2].
>>>
>>> OK
>>>
>>>> The second thing (also discussed at LPC) was about possible cases where
>>>> some consumer drivers can't calculate how much bandwidth they actually
>>>> need and how to address that. The proposal was to extend the OPP
>>>> bindings with one more property, but this is not part of this patchset.
>>>> It is a future step that needs more discussion on the mailing list. If a
>>>> driver really needs some bandwidth data now, it should be put into the
>>>> driver and not in DT. After we have enough consumers, we can discuss
>>>> again if it makes sense to extract something into DT or not.
>>>
>>> That's fine by me.
>>>
>>> Admittedly, I have some reservations regarding the extent to which
>>> this approach will turn out to be useful in practice, but I guess as
>>> long as there is enough traction, the best way to find out is to try
>>> and see. :-)
>>>
>>> From now on I will assume that this series is going to be applied by Greg.
>>
>> That was the initial idea, but the problem is that there is a recent
>> change in the cmd_db API (needed by the sdm845 provider driver), which
>> is going through arm-soc/qcom/drivers. So either Greg also pulls the
>> qcom-drivers-for-4.21 tag from Andy or the whole series goes via Olof
>> and Arnd. Maybe there are other options. I don't have any preference and
>> don't want to put extra burden on any maintainers, so I am OK with what
>> they prefer.
> 
> Let me take the time later this week to review the code, which I haven't
> done in a while...
> 

When you get a chance to review, please keep in mind that the latest
version is v12 (posted on Dec. 8). The same version is also available in
linux-next, with no reported issues.

Thanks,
Georgi