diff mbox series

Regression after recent changes to drivers/thermal/thermal_of.c

Message ID CACRpkdbB5hgkrPZVb-+9tuKErvwjTKNaBQ1LvH1==fR7bndjSQ@mail.gmail.com
State New
Headers show
Series Regression after recent changes to drivers/thermal/thermal_of.c | expand

Commit Message

Linus Walleij Oct. 25, 2022, 9:13 p.m. UTC
Hi Folks,

I have this in my dmesg in v6.1-rc1:

[    3.879229] ab8500-fg ab8500-fg.0: line impedance: 36000 uOhm
[    3.892793] power_supply ab8500_usb: Samsung SDI EB-L1M7FLU battery 1500 mAh
[    3.901663] thermal_sys: Failed to find 'trips' node
[    3.906635] thermal_sys: Failed to find trip points for thermistor id=0
[    3.913427] ntc-thermistor thermistor: unable to register as hwmon device.
[    3.920350] ntc-thermistor: probe of thermistor failed with error -22

The device tree looks like this
(arch/arm/boot/dts/ste-ux500-samsung-golden.dts):

        thermal-zones {
                battery-thermal {
                        /* This zone will be polled by the battery
temperature code */
                        polling-delay = <0>;
                        polling-delay-passive = <0>;
                        thermal-sensors = <&bat_therm>;
                };
        };

This is a thermal zone without trip points, which it seems like the new
code does not allow, also the bindings were patched to not allow this,
in commit 8c596324232d22e19f8df59ba03410b9b5b0f3d7
"dt-bindings: thermal: Fix missing required property"
but this broke my systems. The requirement to have trip points also
broke my device trees.

The reason why I have this is that the thermal zone is not managed
by the OF thermal core, but by the battery charging algorithm which
just retrieves the thermal zone and use it to read the temperature, see
commit 2b0e7ac0841b3906aeecf432567b02af683a596c
"power: supply: ab8500: Integrate thermal zone".

The code is using
thermal_zone_get_zone_by_name()
thermal_zone_get_temp()
and applying its own policy on the thermal zone in order to not
dulicate code.

I understand from the code and changes to the bindings that the
authors assume that no zones without trips exist but... well they
exist.

I understand that the bindings always said that trips are required
but ... thermal zones without trip points make a bit of sense.
It's just a zone without a policy. It can be observed even if it can't
be acted on.

How do you want to solve this? Can we make trips non-compulsory
again or shall I add dummy trip points to the device trees?

This:


Makes the thermal zone probe again. I can't see if that really solves
it because there are other possibly unrelated bugs.

There are some other users of thermal_zone_get_temp() and they
may be broken too, I haven't' looked close.

Yours,
Linus Walleij

Comments

Rob Herring Oct. 26, 2022, 3:47 p.m. UTC | #1
On Tue, Oct 25, 2022 at 4:13 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> Hi Folks,
>
> I have this in my dmesg in v6.1-rc1:
>
> [    3.879229] ab8500-fg ab8500-fg.0: line impedance: 36000 uOhm
> [    3.892793] power_supply ab8500_usb: Samsung SDI EB-L1M7FLU battery 1500 mAh
> [    3.901663] thermal_sys: Failed to find 'trips' node
> [    3.906635] thermal_sys: Failed to find trip points for thermistor id=0
> [    3.913427] ntc-thermistor thermistor: unable to register as hwmon device.
> [    3.920350] ntc-thermistor: probe of thermistor failed with error -22
>
> The device tree looks like this
> (arch/arm/boot/dts/ste-ux500-samsung-golden.dts):
>
>         thermal-zones {
>                 battery-thermal {
>                         /* This zone will be polled by the battery
> temperature code */
>                         polling-delay = <0>;
>                         polling-delay-passive = <0>;
>                         thermal-sensors = <&bat_therm>;
>                 };
>         };
>
> This is a thermal zone without trip points, which it seems like the new
> code does not allow, also the bindings were patched to not allow this,
> in commit 8c596324232d22e19f8df59ba03410b9b5b0f3d7
> "dt-bindings: thermal: Fix missing required property"
> but this broke my systems. The requirement to have trip points also
> broke my device trees.
>
> The reason why I have this is that the thermal zone is not managed
> by the OF thermal core, but by the battery charging algorithm which
> just retrieves the thermal zone and use it to read the temperature, see
> commit 2b0e7ac0841b3906aeecf432567b02af683a596c
> "power: supply: ab8500: Integrate thermal zone".
>
> The code is using
> thermal_zone_get_zone_by_name()
> thermal_zone_get_temp()
> and applying its own policy on the thermal zone in order to not
> dulicate code.
>
> I understand from the code and changes to the bindings that the
> authors assume that no zones without trips exist but... well they
> exist.
>
> I understand that the bindings always said that trips are required
> but ... thermal zones without trip points make a bit of sense.
> It's just a zone without a policy. It can be observed even if it can't
> be acted on.
>
> How do you want to solve this? Can we make trips non-compulsory
> again or shall I add dummy trip points to the device trees?
>
> This:
>
> diff --git a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> index b0dce91aff4b..d00e9e6ebbf7 100644
> --- a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> +++ b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> @@ -35,6 +35,15 @@ battery-thermal {
>                         polling-delay = <0>;
>                         polling-delay-passive = <0>;
>                         thermal-sensors = <&bat_therm>;
> +
> +                       trips {
> +                               /* Unused trip point to please the framework */
> +                               dummy {
> +                                       temperature = <700000>;
> +                                       hysteresis = <2000>;
> +                                       type = "passive";
> +                               };
> +                       };

That's ugly and requiring a DT update breaks the ABI. So the
requirement for 'trips' should be reverted. (Well the schema should, I
imagine the code change is not just a revert.)

Rob
Rafael J. Wysocki Oct. 26, 2022, 5:06 p.m. UTC | #2
On Wed, Oct 26, 2022 at 5:47 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Tue, Oct 25, 2022 at 4:13 PM Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > Hi Folks,
> >
> > I have this in my dmesg in v6.1-rc1:
> >
> > [    3.879229] ab8500-fg ab8500-fg.0: line impedance: 36000 uOhm
> > [    3.892793] power_supply ab8500_usb: Samsung SDI EB-L1M7FLU battery 1500 mAh
> > [    3.901663] thermal_sys: Failed to find 'trips' node
> > [    3.906635] thermal_sys: Failed to find trip points for thermistor id=0
> > [    3.913427] ntc-thermistor thermistor: unable to register as hwmon device.
> > [    3.920350] ntc-thermistor: probe of thermistor failed with error -22
> >
> > The device tree looks like this
> > (arch/arm/boot/dts/ste-ux500-samsung-golden.dts):
> >
> >         thermal-zones {
> >                 battery-thermal {
> >                         /* This zone will be polled by the battery
> > temperature code */
> >                         polling-delay = <0>;
> >                         polling-delay-passive = <0>;
> >                         thermal-sensors = <&bat_therm>;
> >                 };
> >         };
> >
> > This is a thermal zone without trip points, which it seems like the new
> > code does not allow, also the bindings were patched to not allow this,
> > in commit 8c596324232d22e19f8df59ba03410b9b5b0f3d7
> > "dt-bindings: thermal: Fix missing required property"
> > but this broke my systems. The requirement to have trip points also
> > broke my device trees.
> >
> > The reason why I have this is that the thermal zone is not managed
> > by the OF thermal core, but by the battery charging algorithm which
> > just retrieves the thermal zone and use it to read the temperature, see
> > commit 2b0e7ac0841b3906aeecf432567b02af683a596c
> > "power: supply: ab8500: Integrate thermal zone".
> >
> > The code is using
> > thermal_zone_get_zone_by_name()
> > thermal_zone_get_temp()
> > and applying its own policy on the thermal zone in order to not
> > dulicate code.
> >
> > I understand from the code and changes to the bindings that the
> > authors assume that no zones without trips exist but... well they
> > exist.
> >
> > I understand that the bindings always said that trips are required
> > but ... thermal zones without trip points make a bit of sense.
> > It's just a zone without a policy. It can be observed even if it can't
> > be acted on.
> >
> > How do you want to solve this? Can we make trips non-compulsory
> > again or shall I add dummy trip points to the device trees?
> >
> > This:
> >
> > diff --git a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> > b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> > index b0dce91aff4b..d00e9e6ebbf7 100644
> > --- a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> > +++ b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
> > @@ -35,6 +35,15 @@ battery-thermal {
> >                         polling-delay = <0>;
> >                         polling-delay-passive = <0>;
> >                         thermal-sensors = <&bat_therm>;
> > +
> > +                       trips {
> > +                               /* Unused trip point to please the framework */
> > +                               dummy {
> > +                                       temperature = <700000>;
> > +                                       hysteresis = <2000>;
> > +                                       type = "passive";
> > +                               };
> > +                       };
>
> That's ugly and requiring a DT update breaks the ABI. So the
> requirement for 'trips' should be reverted. (Well the schema should, I
> imagine the code change is not just a revert.)

I agree and the code change is not just a revert AFAICS.

This is Daniel's work, so I'm still hoping that he'll chime in
shortly, but in any case the code has to work with the existing
setups, no question about that.
Linus Walleij Oct. 28, 2022, 8:04 a.m. UTC | #3
On Wed, Oct 26, 2022 at 11:40 PM Daniel Lezcano
<daniel.lezcano@linexp.org> wrote:
> On 26/10/2022 19:06, Rafael J. Wysocki wrote:

> > This is Daniel's work, so I'm still hoping that he'll chime in
> > shortly,
>
> Yep, I'm in sick leave ATM, I broke my arm (without wordplay).

Yeah I heard, get well soon!

> I took sometime to read the code, so from my POV we should keep the
> required property patch because the DT was defined that as required
> property. The conversion to yaml obviously spotted the DT not conforming
> with the bindings.

So I guess you mean I should add some trip points to my device
trees then so they pass validation?

It's fine with me, I can just put some absolute maximum temperatures
around the batteries, I am more worrying if there are other users
out there that might get upset.

I have a problem to add a trip point like this:

                battery-crit-lo {
                    temperature = <-50000>;
                    hysteresis = <2000>;
                    type = "critical";
                };

Despite it is legal to the schema:

            properties:
              temperature:
                $ref: /schemas/types.yaml#/definitions/int32
                minimum: -273000
                maximum: 200000
                description:
                  An integer expressing the trip temperature in millicelsius.

I get this error:

  DTC     arch/arm/boot/dts/ste-ux500-samsung-golden.dtb
Error: ../arch/arm/boot/dts/ste-ux500-samsung-golden.dts:50.21-22 syntax error

Does anyone know how to put a negative number in a
property?

>  From an implementation POV, that was not spotted initially because of
> the old OF code design IMO (but I'm not sure).
>
> We can continue registering the thermal with no trip points but then
> still raise a message.
>
> However, a thermal zone without trip point does not really make sense
> IMO. If I'm correct, the ACPI at least defines the critical temperature
> as a non optional object.

I don't know about that, this is from one of my laptops, output
from "sensors" command:

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +46.0°C  (crit = +99.0°C)
temp2:        +46.0°C

This temp2 looks like a temperature zone without trip point...

I guess Rafael might know for sure what is out there?

But if the idea is that DT want to mimic what ACPI is doing
then it seems to me that ACPI has thermal zones without
trip points.

> Did you consider using hwmon instead of a thermal zone ?

The concept of "thermal zone" actually makes much more
sense for a battery since the thermistor is often not mounted
in the battery (at least not in this case) and is measuring
the proximity of the battery, not the battery per se.

> Below a patch (not tested): one hand writing is painful

This works!
I can sign off the patch and send it if you like.
I would probably alter the warning text "please add trip
points to your DTS..."

Yours,
Linus Walleij
Daniel Lezcano Oct. 28, 2022, 9:31 a.m. UTC | #4
Hi Linus,

On 28/10/2022 10:04, Linus Walleij wrote:
> On Wed, Oct 26, 2022 at 11:40 PM Daniel Lezcano
> <daniel.lezcano@linexp.org> wrote:
>> On 26/10/2022 19:06, Rafael J. Wysocki wrote:
> 
>>> This is Daniel's work, so I'm still hoping that he'll chime in
>>> shortly,
>>
>> Yep, I'm in sick leave ATM, I broke my arm (without wordplay).
> 
> Yeah I heard, get well soon!

Thanks

>> I took sometime to read the code, so from my POV we should keep the
>> required property patch because the DT was defined that as required
>> property. The conversion to yaml obviously spotted the DT not conforming
>> with the bindings.
> 
> So I guess you mean I should add some trip points to my device
> trees then so they pass validation?

May be a critical trip point? (but a monitoring delay will be needed 
implying additional wake ups if the interrupt mode is not supported)

> It's fine with me, I can just put some absolute maximum temperatures
> around the batteries, I am more worrying if there are other users
> out there that might get upset.

If you put an active or passive trip point without a cooling device, 
that trip point won't do anything except sending a notification to the 
userspace when it is crossed (if monitored).

It is always useful to have a passive trip, so when the writable trip 
option is enabled, the userspace can set the temperature and get notified.

> I have a problem to add a trip point like this:
> 
>                  battery-crit-lo {
>                      temperature = <-50000>;
>                      hysteresis = <2000>;
>                      type = "critical";
>                  };
> 
> Despite it is legal to the schema:
> 
>              properties:
>                temperature:
>                  $ref: /schemas/types.yaml#/definitions/int32
>                  minimum: -273000
>                  maximum: 200000
>                  description:
>                    An integer expressing the trip temperature in millicelsius.
> 
> I get this error:
> 
>    DTC     arch/arm/boot/dts/ste-ux500-samsung-golden.dtb
> Error: ../arch/arm/boot/dts/ste-ux500-samsung-golden.dts:50.21-22 syntax error
> 
> Does anyone know how to put a negative number in a
> property?

I don't know but the thermal framework does not support the cold trip 
points (yet). That means here, the battery will be in temperature 
violation if the temperature is above -50°C


>>   From an implementation POV, that was not spotted initially because of
>> the old OF code design IMO (but I'm not sure).
>>
>> We can continue registering the thermal with no trip points but then
>> still raise a message.
>>
>> However, a thermal zone without trip point does not really make sense
>> IMO. If I'm correct, the ACPI at least defines the critical temperature
>> as a non optional object.
> 
> I don't know about that, this is from one of my laptops, output
> from "sensors" command:
> 
> acpitz-acpi-0
> Adapter: ACPI interface
> temp1:        +46.0°C  (crit = +99.0°C)
> temp2:        +46.0°C
> 
> This temp2 looks like a temperature zone without trip point...

Yeah, ACPI ...

Mine is:

acpi -tci
Thermal 0: ok, 16.8 degrees C
Thermal 0: trip point 0 switches to mode critical at temperature 20.8 
degrees C
Thermal 0: trip point 1 switches to mode hot at temperature 19.8 degrees C
Thermal 1: ok, 16.8 degrees C
Thermal 1: trip point 0 switches to mode critical at temperature 20.8 
degrees C
Thermal 1: trip point 1 switches to mode hot at temperature 19.8 degrees C

:)

> I guess Rafael might know for sure what is out there?
> 
> But if the idea is that DT want to mimic what ACPI is doing
> then it seems to me that ACPI has thermal zones without
> trip points.
> 
>> Did you consider using hwmon instead of a thermal zone ?
> 
> The concept of "thermal zone" actually makes much more
> sense for a battery since the thermistor is often not mounted
> in the battery (at least not in this case) and is measuring
> the proximity of the battery, not the battery per se.

IMO, you are reinventing the wheel in the battery code.

Why not use the cooling device psy_register_cooler()? And let the 
thermal framework deal with the monitoring and the mitigation ?

(cold trip point handling will have to stay in the current code)

>> Below a patch (not tested): one hand writing is painful
> 
> This works!
> I can sign off the patch and send it if you like.

Sure, no problem. May be see if that could be done more elegantly?

> I would probably alter the warning text "please add trip
> points to your DTS..."

Ok
Linus Walleij Oct. 28, 2022, 9:42 a.m. UTC | #5
On Fri, Oct 28, 2022 at 11:31 AM Daniel Lezcano
<daniel.lezcano@linexp.org> wrote:

> > I have a problem to add a trip point like this:
> >
> >                  battery-crit-lo {
> >                      temperature = <-50000>;
> >                      hysteresis = <2000>;
> >                      type = "critical";
> >                  };
> >
> > Despite it is legal to the schema:
> >
> >              properties:
> >                temperature:
> >                  $ref: /schemas/types.yaml#/definitions/int32
> >                  minimum: -273000
> >                  maximum: 200000
> >                  description:
> >                    An integer expressing the trip temperature in millicelsius.
> >
> > I get this error:
> >
> >    DTC     arch/arm/boot/dts/ste-ux500-samsung-golden.dtb
> > Error: ../arch/arm/boot/dts/ste-ux500-samsung-golden.dts:50.21-22 syntax error
> >
> > Does anyone know how to put a negative number in a
> > property?
>
> I don't know but the thermal framework does not support the cold trip
> points (yet).

Aha! But does the DT bindings support it?

> That means here, the battery will be in temperature
> violation if the temperature is above -50°C

Hm are the DT bindings written like such or do they also imply that
we only have "above this temperature" trips? The way I read it,
it is kind of an open question.

Maybe we should make it explicit
in the DT bindings that if there is a positive number in the temperature
of a trip point we trip above the point, and if there is a negative
number we trip below the point?

I can make a patch.

Then another day we can add "heating-maps" :D

These use cases might seem alien but I think they actually exist.

> > The concept of "thermal zone" actually makes much more
> > sense for a battery since the thermistor is often not mounted
> > in the battery (at least not in this case) and is measuring
> > the proximity of the battery, not the battery per se.
>
> IMO, you are reinventing the wheel in the battery code.
>
> Why not use the cooling device psy_register_cooler()? And let the
> thermal framework deal with the monitoring and the mitigation ?
>
> (cold trip point handling will have to stay in the current code)

OK I will look into this. It's not a big thing really, very little code.

Mainly the charging state machine likes to keep everything
under its own umbrella, so the state machine moves into a special
state when things get too hot or too cold, so the idea would be
to have the framework cooler trigger the state instead.

> >> Below a patch (not tested): one hand writing is painful
> >
> > This works!
> > I can sign off the patch and send it if you like.
>
> Sure, no problem. May be see if that could be done more elegantly?
>
> > I would probably alter the warning text "please add trip
> > points to your DTS..."
>
> Ok

I'll send something soon!

Yours,
Linus Walleij
Rob Herring Oct. 28, 2022, 12:26 p.m. UTC | #6
On Fri, Oct 28, 2022 at 3:04 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Wed, Oct 26, 2022 at 11:40 PM Daniel Lezcano
> <daniel.lezcano@linexp.org> wrote:
> > On 26/10/2022 19:06, Rafael J. Wysocki wrote:
>
> > > This is Daniel's work, so I'm still hoping that he'll chime in
> > > shortly,
> >
> > Yep, I'm in sick leave ATM, I broke my arm (without wordplay).
>
> Yeah I heard, get well soon!
>
> > I took sometime to read the code, so from my POV we should keep the
> > required property patch because the DT was defined that as required
> > property. The conversion to yaml obviously spotted the DT not conforming
> > with the bindings.
>
> So I guess you mean I should add some trip points to my device
> trees then so they pass validation?
>
> It's fine with me, I can just put some absolute maximum temperatures
> around the batteries, I am more worrying if there are other users
> out there that might get upset.
>
> I have a problem to add a trip point like this:
>
>                 battery-crit-lo {
>                     temperature = <-50000>;
>                     hysteresis = <2000>;
>                     type = "critical";
>                 };
>
> Despite it is legal to the schema:
>
>             properties:
>               temperature:
>                 $ref: /schemas/types.yaml#/definitions/int32
>                 minimum: -273000
>                 maximum: 200000
>                 description:
>                   An integer expressing the trip temperature in millicelsius.
>
> I get this error:
>
>   DTC     arch/arm/boot/dts/ste-ux500-samsung-golden.dtb
> Error: ../arch/arm/boot/dts/ste-ux500-samsung-golden.dts:50.21-22 syntax error
>
> Does anyone know how to put a negative number in a
> property?

Expressions have to be enclosed in parenthesis.

Rob
Thorsten Leemhuis Nov. 9, 2022, 3:55 p.m. UTC | #7
On 26.10.22 11:29, Thorsten Leemhuis wrote:
> [Note: this mail is primarily send for documentation purposes and/or for
> regzbot, my Linux kernel regression tracking bot. That's why I removed
> most or all folks from the list of recipients, but left any that looked
> like a mailing lists. These mails usually contain '#forregzbot' in the
> subject, to make them easy to spot and filter out.]

> #regzbot ^introduced 8c596324232d22e19f
> #regzbot title dt-bindings: thermal: thermal zone without trip points broke
> #regzbot ignore-activity

#regzbot fixed-by: cd73adcdba
diff mbox series

Patch

diff --git a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
index b0dce91aff4b..d00e9e6ebbf7 100644
--- a/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
+++ b/arch/arm/boot/dts/ste-ux500-samsung-golden.dts
@@ -35,6 +35,15 @@  battery-thermal {
                        polling-delay = <0>;
                        polling-delay-passive = <0>;
                        thermal-sensors = <&bat_therm>;
+
+                       trips {
+                               /* Unused trip point to please the framework */
+                               dummy {
+                                       temperature = <700000>;
+                                       hysteresis = <2000>;
+                                       type = "passive";
+                               };
+                       };
                };
        };