[v3,0/3] Convert thermal bindings to yaml

Message ID cover.1585117436.git.amit.kucheria@linaro.org

Message

Amit Kucheria March 25, 2020, 6:34 a.m. UTC
Hi all,

Here is a series splitting the thermal bindings into 3 separate YAML
bindings, one each for the sensor, the cooling-device and the thermal zones.

A series to remove thermal.txt and change over all references to it will
follow shortly. Another series to fix up problems found by enforcing this
yaml definition across dts files will also follow.

Changes since v2:
- Addressed review comment from Rob
- Added required properties for thermal-zones node
- Added select: true to thermal-cooling-devices.yaml
- Fixed up example to pass dt_binding_check

Changes since v1:
- Addressed review comments from Rob
- Moved the license back to GPLv2, waiting for other authors to give
  permission to relicense to BSD-2-Clause as well
- Fixed up warnings thrown by dt_binding_check

I should add that the bindings, as they exist today, don't really follow
the "describe the hardware" model of devicetree. For example, the entire
thermal-zone binding is a software abstraction to tie arbitrary,
board-specific trip points to cooling strategies. This doesn't fit well
with the case where the same SoC is used in two different form-factor
devices, e.g. mobile and laptop, which will have fairly different thermal
profiles and might benefit from different trip points and mitigation
heuristics. I've started some experiments with moving the thermal zone data
to board-specific platform data that is used to initialise a "thermal zone
driver".

In any case, if we ever move down that path, it'll probably end up being v2
of the binding, so this series is still relevant.

Please help review.

Regards,
Amit

Amit Kucheria (3):
  dt-bindings: thermal: Add yaml bindings for thermal sensors
  dt-bindings: thermal: Add yaml bindings for thermal cooling-devices
  dt-bindings: thermal: Add yaml bindings for thermal zones

 .../thermal/thermal-cooling-devices.yaml      | 116 +++++++
 .../bindings/thermal/thermal-sensor.yaml      |  72 ++++
 .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
 3 files changed, 512 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal-cooling-devices.yaml
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal-sensor.yaml
 create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml

Comments

Lukasz Luba March 25, 2020, 11:06 a.m. UTC | #1
On 3/25/20 6:34 AM, Amit Kucheria wrote:
> As part of moving the thermal bindings to YAML, split it up into 3
> bindings: thermal sensors, cooling devices and thermal zones.
> 
> The thermal-zone binding is a software abstraction to capture the
> properties of each zone - how often they should be checked, the
> temperature thresholds (trips) at which mitigation actions need to be
> taken and the level of mitigation needed at those thresholds.
> 
> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> ---
> Changes since v2:
> - Addressed review comment from Rob
> - Added required properties for thermal-zones node
> - Added select: true to thermal-cooling-devices.yaml
> - Fixed up example to pass dt_binding_check
> 
>   .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
>   1 file changed, 324 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> 
> diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> new file mode 100644
> index 000000000000..5632304dcf62
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> @@ -0,0 +1,324 @@
> +# SPDX-License-Identifier: (GPL-2.0)
> +# Copyright 2020 Linaro Ltd.
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> +$schema: http://devicetree.org/meta-schemas/base.yaml#
> +
> +title: Thermal zone binding
> +
> +maintainers:
> +  - Amit Kucheria <amitk@kernel.org>
> +
> +description: |
> +  Thermal management is achieved in devicetree by describing the sensor hardware
> +  and the software abstraction of cooling devices and thermal zones required to
> +  take appropriate action to mitigate thermal overloads.
> +
> +  The following node types are used to completely describe a thermal management
> +  system in devicetree:
> +   - thermal-sensor: device that measures temperature, has SoC-specific bindings
> +   - cooling-device: device used to dissipate heat either passively or actively
> +   - thermal-zones: a container of the following node types used to describe all
> +     thermal data for the platform
> +
> +  This binding describes the thermal-zones.
> +
> +  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> +  (temperature derivative over time) in two situations for a thermal zone:
> +    1. when passive cooling is activated (polling-delay-passive)
> +    2. when the zone just needs to be monitored (polling-delay) or when
> +       active cooling is activated.
> +
> +  The maximum dT/dt is highly bound to hardware power consumption and
> +  dissipation capability. The delays should be chosen to account for said
> +  max dT/dt, such that a device does not cross several trip boundaries
> +  unexpectedly between polls. Choosing the right polling delays shall avoid
> +  having the device in temperature ranges that may damage the silicon structures
> +  and reduce silicon lifetime.
> +
> +properties:
> +  $nodename:
> +    const: thermal-zones
> +    description:
> +      A /thermal-zones node is required in order to use the thermal framework to
> +      manage input from the various thermal zones in the system in order to
> +      mitigate thermal overload conditions. It does not represent a real device
> +      in the system, but acts as a container to link thermal sensor devices,

I would say 'thermal sensor device', since there is a 1-to-1 mapping and
aggregating several sensors inside one tz is not allowed (or I missed
some queued patches).

> +      platform-data regarding temperature thresholds and the mitigation actions
> +      to take when the temperature crosses those thresholds.
> +
> +patternProperties:
> +  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> +    type: object
> +    description:
> +      Each thermal zone node contains information about how frequently it
> +      must be checked, the sensor responsible for reporting temperature for
> +      this zone, one sub-node containing the various trip points for this
> +      zone and one sub-node containing all the zone cooling-maps.
> +
> +    properties:
> +      polling-delay:
> +        $ref: /schemas/types.yaml#/definitions/uint32
> +        description:
> +          The maximum number of milliseconds to wait between polls when
> +          checking this thermal zone. Setting this to 0 disables the polling
> +          timers setup by the thermal framework and assumes that the thermal
> +          sensors in this zone support interrupts.
> +
> +      polling-delay-passive:
> +        $ref: /schemas/types.yaml#/definitions/uint32
> +        description:
> +          The maximum number of milliseconds to wait between polls when
> +          checking this thermal zone while doing passive cooling. Setting
> +          this to 0 disables the polling timers setup by the thermal
> +          framework and assumes that the thermal sensors in this zone
> +          support interrupts.
> +
> +      thermal-sensors:
> +        $ref: /schemas/types.yaml#/definitions/phandle-array
> +        description:
> +          A list of thermal sensor phandles and sensor specifiers used to
> +          monitor this thermal zone.

I don't know why it's not consistent with the actual code in
of-thermal.c, where there is even a comment stating:
/* For now, thermal framework supports only 1 sensor per zone */

I think this is the place where developers should be informed about
the limitation and not even try to put more sensors into the list.
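
As a rough sketch of how the schema itself could carry that limitation
(an assumption for illustration, not part of the posted patch), the
property could be capped with the standard json-schema maxItems keyword
and the description reworded to speak of a single sensor:

```yaml
# Sketch only: limit a thermal zone to one sensor entry, matching the
# "only 1 sensor per zone" comment in of-thermal.c.
thermal-sensors:
  $ref: /schemas/types.yaml#/definitions/phandle-array
  maxItems: 1
  description:
    The thermal sensor phandle and sensor specifier used to monitor this
    thermal zone. Only one sensor per zone is currently supported.
```

With something like this, a dts that lists two sensors in one zone would
be flagged at validation time instead of being silently ignored.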

> +
> +      trips:
> +        type: object
> +        description:
> +          This node describes a set of points in the temperature domain at
> +          which the thermal framework needs to takes action. The actions to

s/needs to takes/needs to take/

> +          be taken are defined in another node called cooling-maps.
> +
> +        patternProperties:
> +          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> +            type: object
> +
> +            properties:
> +              temperature:
> +                $ref: /schemas/types.yaml#/definitions/int32
> +                minimum: -273000
> +                maximum: 200000
> +                description:
> +                  An integer expressing the trip temperature in millicelsius.
> +
> +              hysteresis:
> +                $ref: /schemas/types.yaml#/definitions/uint32
> +                description:
> +                  An unsigned integer expressing the hysteresis delta with
> +                  respect to the trip temperature property above, also in
> +                  millicelsius.

This property is worth a bit longer description.
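
Something along these lines, for instance. This is only a sketch and the
wording is my assumption of the intended semantics, i.e. that mitigation
started at a trip is held until the temperature drops below the trip
temperature minus the hysteresis:

```yaml
# Sketch only: a longer description for the hysteresis property.
hysteresis:
  $ref: /schemas/types.yaml#/definitions/uint32
  description:
    An unsigned integer expressing the hysteresis delta with respect to
    the trip temperature property above, also in millicelsius. A cooling
    action started at the trip is kept until the temperature falls below
    (trip temperature - hysteresis), so the trip is not re-triggered as
    soon as mitigation is removed.
```

Under that reading, a trip with temperature = <90000> and
hysteresis = <2000> would only release its mitigation below 88000
millicelsius.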

> +
> +              type:
> +                $ref: /schemas/types.yaml#/definitions/string
> +                enum:
> +                  - active   # enable active cooling e.g. fans
> +                  - passive  # enable passive cooling e.g. throttling cpu
> +                  - hot      # send notification to driver
> +                  - critical # send notification to driver, trigger shutdown
> +                description: |
> +                  There are four valid trip types: active, passive, hot,
> +                  critical.

[snip]

> +
> +    thermal-zones {
> +            cpu0-thermal {
> +                    polling-delay-passive = <250>;
> +                    polling-delay = <1000>;
> +
> +                    thermal-sensors = <&tsens0 1>;
> +
> +                    trips {
> +                            cpu0_alert0: trip-point0 {
> +                                    temperature = <90000>;
> +                                    hysteresis = <2000>;
> +                                    type = "passive";
> +                            };
> +
> +                            cpu0_alert1: trip-point1 {
> +                                    temperature = <95000>;
> +                                    hysteresis = <2000>;
> +                                    type = "passive";
> +                            };
> +
> +                            cpu0_crit: cpu_crit {
> +                                    temperature = <110000>;
> +                                    hysteresis = <1000>;
> +                                    type = "critical";
> +                            };
> +                    };
> +
> +                    cooling-maps {
> +                            map0 {
> +                                    trip = <&cpu0_alert0>;
> +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU1 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU2 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU3 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>;
> +                            };
> +
> +                            map1 {
> +                                    trip = <&cpu0_alert1>;
> +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU1 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU2 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>,
> +                                                     <&CPU3 THERMAL_NO_LIMIT
> +                                                            THERMAL_NO_LIMIT>;

From these two examples of handling cpu0_alert0 and cpu0_alert1 you
cannot conclude anything (if you don't understand the thermal framework
and probably IPA). As a simple example, it would be better to add a
comment with a description and limit min and max to a specific OPP:

map0 {
     trip = <&cpu0_alert0>;
     /* Corresponds to 1400MHz in OPP table */
     cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
};

map1 {
     trip = <&cpu0_alert1>;
     /* Corresponds to 1000MHz in OPP table */
     cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
};

IMHO this kind of example would tell more to an average driver developer.

Regards,
Lukasz

Amit Kucheria March 25, 2020, 3:42 p.m. UTC | #2
On Wed, Mar 25, 2020 at 4:36 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 3/25/20 6:34 AM, Amit Kucheria wrote:
> > As part of moving the thermal bindings to YAML, split it up into 3
> > bindings: thermal sensors, cooling devices and thermal zones.
> >
> > The thermal-zone binding is a software abstraction to capture the
> > properties of each zone - how often they should be checked, the
> > temperature thresholds (trips) at which mitigation actions need to be
> > taken and the level of mitigation needed at those thresholds.
> >
> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
> > ---
> > Changes since v2:
> > - Addressed review comment from Rob
> > - Added required properties for thermal-zones node
> > - Added select: true to thermal-cooling-devices.yaml
> > - Fixed up example to pass dt_binding_check
> >
> >   .../bindings/thermal/thermal-zones.yaml       | 324 ++++++++++++++++++
> >   1 file changed, 324 insertions(+)
> >   create mode 100644 Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > new file mode 100644
> > index 000000000000..5632304dcf62
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
> > @@ -0,0 +1,324 @@
> > +# SPDX-License-Identifier: (GPL-2.0)
> > +# Copyright 2020 Linaro Ltd.
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/thermal/thermal-zones.yaml#
> > +$schema: http://devicetree.org/meta-schemas/base.yaml#
> > +
> > +title: Thermal zone binding
> > +
> > +maintainers:
> > +  - Amit Kucheria <amitk@kernel.org>
> > +
> > +description: |
> > +  Thermal management is achieved in devicetree by describing the sensor hardware
> > +  and the software abstraction of cooling devices and thermal zones required to
> > +  take appropriate action to mitigate thermal overloads.
> > +
> > +  The following node types are used to completely describe a thermal management
> > +  system in devicetree:
> > +   - thermal-sensor: device that measures temperature, has SoC-specific bindings
> > +   - cooling-device: device used to dissipate heat either passively or actively
> > +   - thermal-zones: a container of the following node types used to describe all
> > +     thermal data for the platform
> > +
> > +  This binding describes the thermal-zones.
> > +
> > +  The polling-delay properties of a thermal-zone are bound to the maximum dT/dt
> > +  (temperature derivative over time) in two situations for a thermal zone:
> > +    1. when passive cooling is activated (polling-delay-passive)
> > +    2. when the zone just needs to be monitored (polling-delay) or when
> > +       active cooling is activated.
> > +
> > +  The maximum dT/dt is highly bound to hardware power consumption and
> > +  dissipation capability. The delays should be chosen to account for said
> > +  max dT/dt, such that a device does not cross several trip boundaries
> > +  unexpectedly between polls. Choosing the right polling delays shall avoid
> > +  having the device in temperature ranges that may damage the silicon structures
> > +  and reduce silicon lifetime.
> > +
> > +properties:
> > +  $nodename:
> > +    const: thermal-zones
> > +    description:
> > +      A /thermal-zones node is required in order to use the thermal framework to
> > +      manage input from the various thermal zones in the system in order to
> > +      mitigate thermal overload conditions. It does not represent a real device
> > +      in the system, but acts as a container to link thermal sensor devices,
>
> I would say 'thermal sensor device', since there is 1-to-1 mapping and
> aggregating a few sensors inside one tz is not allowed (or I missed
> some patches queuing).

See below.

>
> > +      platform-data regarding temperature thresholds and the mitigation actions
> > +      to take when the temperature crosses those thresholds.
> > +
> > +patternProperties:
> > +  "^[a-zA-Z][a-zA-Z0-9\\-]{1,12}-thermal$":
> > +    type: object
> > +    description:
> > +      Each thermal zone node contains information about how frequently it
> > +      must be checked, the sensor responsible for reporting temperature for
> > +      this zone, one sub-node containing the various trip points for this
> > +      zone and one sub-node containing all the zone cooling-maps.
> > +
> > +    properties:
> > +      polling-delay:
> > +        $ref: /schemas/types.yaml#/definitions/uint32
> > +        description:
> > +          The maximum number of milliseconds to wait between polls when
> > +          checking this thermal zone. Setting this to 0 disables the polling
> > +          timers setup by the thermal framework and assumes that the thermal
> > +          sensors in this zone support interrupts.
> > +
> > +      polling-delay-passive:
> > +        $ref: /schemas/types.yaml#/definitions/uint32
> > +        description:
> > +          The maximum number of milliseconds to wait between polls when
> > +          checking this thermal zone while doing passive cooling. Setting
> > +          this to 0 disables the polling timers setup by the thermal
> > +          framework and assumes that the thermal sensors in this zone
> > +          support interrupts.
> > +
> > +      thermal-sensors:
> > +        $ref: /schemas/types.yaml#/definitions/phandle-array
> > +        description:
> > +          A list of thermal sensor phandles and sensor specifiers used to
> > +          monitor this thermal zone.
>
> I don't know why it's not consistent with the actual code in
> of-thermal.c, where there is even a comment stated:
> /* For now, thermal framework supports only 1 sensor per zone */
>
> I think this is the place where developers should be informed about
> the limitation and not even try to put more sensors into the list.

That is a good point. I'm currently "porting" the existing binding as
described in thermal.txt to yaml. If you look at some of the examples in
there, e.g. (c), the bindings allow mapping many sensors to a zone, but
the thermal core doesn't implement that functionality.

So should we fix the core code or change the bindings? Thoughts - Rob,
Daniel, Rui?

> > +
> > +      trips:
> > +        type: object
> > +        description:
> > +          This node describes a set of points in the temperature domain at
> > +          which the thermal framework needs to takes action. The actions to
>
> s/needs to takes/needs to take/

Will fix.

> > +          be taken are defined in another node called cooling-maps.
> > +
> > +        patternProperties:
> > +          "^[a-zA-Z][a-zA-Z0-9\\-_]{0,63}$":
> > +            type: object
> > +
> > +            properties:
> > +              temperature:
> > +                $ref: /schemas/types.yaml#/definitions/int32
> > +                minimum: -273000
> > +                maximum: 200000
> > +                description:
> > +                  An integer expressing the trip temperature in millicelsius.
> > +
> > +              hysteresis:
> > +                $ref: /schemas/types.yaml#/definitions/uint32
> > +                description:
> > +                  An unsigned integer expressing the hysteresis delta with
> > +                  respect to the trip temperature property above, also in
> > +                  millicelsius.
>
> This property is worth a bit longer description.

Will improve the description.

> > +
> > +              type:
> > +                $ref: /schemas/types.yaml#/definitions/string
> > +                enum:
> > +                  - active   # enable active cooling e.g. fans
> > +                  - passive  # enable passive cooling e.g. throttling cpu
> > +                  - hot      # send notification to driver
> > +                  - critical # send notification to driver, trigger shutdown
> > +                description: |
> > +                  There are four valid trip types: active, passive, hot,
> > +                  critical.
>
> [snip]
>
> > +
> > +    thermal-zones {
> > +            cpu0-thermal {
> > +                    polling-delay-passive = <250>;
> > +                    polling-delay = <1000>;
> > +
> > +                    thermal-sensors = <&tsens0 1>;
> > +
> > +                    trips {
> > +                            cpu0_alert0: trip-point0 {
> > +                                    temperature = <90000>;
> > +                                    hysteresis = <2000>;
> > +                                    type = "passive";
> > +                            };
> > +
> > +                            cpu0_alert1: trip-point1 {
> > +                                    temperature = <95000>;
> > +                                    hysteresis = <2000>;
> > +                                    type = "passive";
> > +                            };
> > +
> > +                            cpu0_crit: cpu_crit {
> > +                                    temperature = <110000>;
> > +                                    hysteresis = <1000>;
> > +                                    type = "critical";
> > +                            };
> > +                    };
> > +
> > +                    cooling-maps {
> > +                            map0 {
> > +                                    trip = <&cpu0_alert0>;
> > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>;
> > +                            };
> > +
> > +                            map1 {
> > +                                    trip = <&cpu0_alert1>;
> > +                                    cooling-device = <&CPU0 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU1 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU2 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>,
> > +                                                     <&CPU3 THERMAL_NO_LIMIT
> > +                                                            THERMAL_NO_LIMIT>;
>
>  From this two examples of handling cpu0_alert0 and cpu0_alert1 you
> cannot conclude anything (if you don't understand thermal framework (and
> probably IPA). As a simple example it would be better to put a comment
> with a description and limit min, max to a specific OPP:
>
> map0 {
>      trip = <&cpu0_alert0>;
>      /* Corresponds to 1400MHz in OPP table */
>      cooling-device = <&CPU0 3 3>, <&CPU1 3 3>, <&CPU2 3 3>, <&CPU3 3 3>;
> };
>
> map1 {
>      trip = <&cpu0_alert1>;
>      /* Corresponds to 1000MHz in OPP table */
>      cooling-device = <&CPU0 5 5>, <&CPU1 5 5>, <&CPU2 5 5>, <&CPU3 5 5>;
> };
>
> IMHO this kind of example would tell more to an avg driver developer.

Will fix.

Thanks for the review.

Regards,
Amit

Daniel Lezcano March 30, 2020, 1:07 p.m. UTC | #3
Hi Amit,

On 30/03/2020 12:34, Amit Kucheria wrote:

[ ... ]

>>> I don't know why it's not consistent with the actual code in
>>> of-thermal.c, where there is even a comment stated: /* For now,
>>> thermal framework supports only 1 sensor per zone */
>>>
>>> I think this is the place where developers should be informed
>>> about the limitation and not even try to put more sensors into
>>> the list.
>>
>> That is a good point. I'm currently "porting" the existing
>> binding as described in thermal.txt to yaml. If you look at some
>> of the example (c) in there, the bindings allow many sensors to a
>> zone mapping but the thermal core doesn't implement that
>> functionality.
>>
>> So should we fix the core code or change the bindings? Thoughts -
>> Rob, Daniel, Rui?
>
> Rob, Daniel: Any comments? We don't have any concerns for Linux
> backward compatibility since multiple sensors per zone isn't used
> anywhere. But asking since bindings are supposed to be
> OS-agnostic.

IMO, we should remove it as it is not used anywhere.

We still have to decide how we aggregate multiple sensors.

Rob Herring (Arm) March 31, 2020, 9:13 p.m. UTC | #4
On Mon, Mar 30, 2020 at 03:07:53PM +0200, Daniel Lezcano wrote:
> 
> Hi Amit,
> 
> On 30/03/2020 12:34, Amit Kucheria wrote:
> 
> [ ... ]
> 
> >>> I don't know why it's not consistent with the actual code in
> >>> of-thermal.c, where there is even a comment stated: /* For now,
> >>> thermal framework supports only 1 sensor per zone */
> >>>
> >>> I think this is the place where developers should be informed
> >>> about the limitation and not even try to put more sensors into
> >>> the list.
> >>
> >> That is a good point. I'm currently "porting" the existing
> >> binding as described in thermal.txt to yaml. If you look at some
> >> of the example (c) in there, the bindings allow many sensors to a
> >> zone mapping but the thermal core doesn't implement that
> >> functionality.
> >>
> >> So should we fix the core code or change the bindings? Thoughts -
> >> Rob, Daniel, Rui?
> >
> > Rob, Daniel: Any comments? We don't have any concerns for Linux
> > backward compatibility since multiple sensors per zone isn't used
> > anywhere. But asking since bindings are supposed to be
> > OS-agnostic.
> 
> IMO, we should remove it as it is not used anywhere.
> 
> We still have to decide how we aggregate multiple sensors.

The schema only needs to pass what currently exists (assuming no 
errors), so extending it later is fine with me.

Rob