[RESEND,RESEND] thermal/of: support thermal zones w/o trips subnode

Message ID 20230722122534.2279689-1-zhengxingda@iscas.ac.cn
State New

Commit Message

Icenowy Zheng July 22, 2023, 12:25 p.m. UTC
From: Icenowy Zheng <uwu@icenowy.me>

Although the current device tree binding for thermal zones requires the
trips subnode, the binding in kernel v5.15 does not require it, and many
device trees shipped with the kernel, for example
allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi on ARM64, still
comply with the old binding and contain no trips subnode.

Allow the code to successfully register thermal zones w/o trips subnode
for DT binding compatibility now.

Furthermore, the inconsistency between DTs and bindings should be resolved
by either adding an empty trips subnode or dropping the trips subnode
requirement.

Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
---

Unfortunately the code gets dropped by mailing lists again and again...

Sorry for the disturbance.

 drivers/thermal/thermal_of.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
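
For readers following the second hunk, here is a minimal userspace sketch of
why the mask computation needs a guard once ntrips can be zero. The helper
names low_bits_mask() and trip_mask() are hypothetical and are not part of
the kernel; low_bits_mask() only stands in for GENMASK_ULL(h, 0). Assuming
the usual shift-based definition of that macro, an upper bit of -1 would ask
for a shift by the full 64-bit width, which C leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for GENMASK_ULL(h, 0): sets bits 0..h.
 * Only valid for 0 <= h <= 63, because shifting a 64-bit value by 64 or
 * more is undefined behaviour in C.
 */
static uint64_t low_bits_mask(int h)
{
	assert(h >= 0 && h <= 63);
	return ~UINT64_C(0) >> (63 - h);
}

/* Hypothetical helper: one mask bit per trip, or 0 when there are none. */
static uint64_t trip_mask(int ntrips)
{
	if (ntrips <= 0)
		return 0;	/* no trips: never evaluate the mask for h == -1 */
	return low_bits_mask(ntrips - 1);
}
```

With this shape, trip_mask(0) yields an empty mask instead of evaluating the
shift with an out-of-range count, which mirrors the if/else added to
thermal_of_zone_register() in the patch.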

Comments

Mark Brown July 22, 2023, 8:11 p.m. UTC | #1
On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> From: Icenowy Zheng <uwu@icenowy.me>
> 
> Although the current device tree binding of thermal zones require the
> trips subnode, the binding in kernel v5.15 does not require it, and many
> device trees shipped with the kernel, for example,
> allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
> comply to the old binding and contain no trips subnode.
> 
> Allow the code to successfully register thermal zones w/o trips subnode
> for DT binding compatibility now.
> 
> Furtherly, the inconsistency between DTs and bindings should be resolved
> by either adding empty trips subnode or dropping the trips subnode
> requirement.

This makes sense to me - it allows people to see the reported
temperature even if there's no trips defined which seems more
helpful than refusing to register.

Reviewed-by: Mark Brown <broonie@kernel.org>
Daniel Lezcano July 23, 2023, 10:12 a.m. UTC | #2
Hi Mark,

On 22/07/2023 22:11, Mark Brown wrote:
> On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
>> From: Icenowy Zheng <uwu@icenowy.me>
>>
>> Although the current device tree binding of thermal zones require the
>> trips subnode, the binding in kernel v5.15 does not require it, and many
>> device trees shipped with the kernel, for example,
>> allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
>> comply to the old binding and contain no trips subnode.
>>
>> Allow the code to successfully register thermal zones w/o trips subnode
>> for DT binding compatibility now.
>>
>> Furtherly, the inconsistency between DTs and bindings should be resolved
>> by either adding empty trips subnode or dropping the trips subnode
>> requirement.
> 
> This makes sense to me - it allows people to see the reported
> temperature even if there's no trips defined which seems more
> helpful than refusing to register.

The binding has described the trip points as required since the
beginning.

What changed is that the code now reflects the required property, whereas
before it was permissive; that was an oversight.

Just a reminder about the thermal framework goals:

   1. It protects the silicon (thus critical and hot trip points)

   2. It mitigates the temperature (thus cooling device bound to trip 
points)

   3. It notifies the userspace when a trip point is crossed

So if the thermal zone is described but serves none of the goals above,
it is pointless.

If the goal is to report the temperature only, then hwmon should be used 
instead.

If the goal is to mitigate by userspace, then the trip point *must* be 
used to prevent the userspace polling the temperature. With the trip 
point the sensor will be set to fire an interrupt at the given trip 
temperature.

IOW, trip points are not optional
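
To make the third goal concrete, the following is a minimal userspace model
of "notify when a trip point is crossed". It is only a sketch and not the
thermal core's implementation: struct trip_state, enum trip_event and
trip_update() are hypothetical names, temperatures are assumed to be in
millidegrees Celsius, and the hysteresis handling is one simple choice among
several.

```c
#include <assert.h>
#include <stdbool.h>

enum trip_event { TRIP_NONE, TRIP_CROSSED_UP, TRIP_CROSSED_DOWN };

/* Hypothetical per-trip state, in millidegrees Celsius. */
struct trip_state {
	int temperature;	/* trip temperature */
	int hysteresis;		/* must drop this far below before re-arming */
	bool tripped;		/* true while the zone is above the trip */
};

/*
 * Feed one temperature sample and report whether a listener would be
 * notified. An event is produced only on a crossing, so a consumer that
 * waits for events does not need to poll the temperature itself.
 */
static enum trip_event trip_update(struct trip_state *s, int temp)
{
	assert(s->hysteresis >= 0);

	if (!s->tripped && temp >= s->temperature) {
		s->tripped = true;
		return TRIP_CROSSED_UP;
	}
	if (s->tripped && temp < s->temperature - s->hysteresis) {
		s->tripped = false;
		return TRIP_CROSSED_DOWN;
	}
	return TRIP_NONE;
}
```

In this model a zone with no trips never produces an event, which is the
case the rest of this thread argues about.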
Mark Brown July 23, 2023, 3:05 p.m. UTC | #3
On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> On 22/07/2023 22:11, Mark Brown wrote:

> > This makes sense to me - it allows people to see the reported
> > temperature even if there's no trips defined which seems more
> > helpful than refusing to register.

...

> If the goal is to report the temperature only, then hwmon should be used
> instead.

Sure, that doesn't seem to be the case in the impacted systems though -
AFAICT the issue with these is that it's a generic SoC DT that's not
fully fleshed out, either because more data is needed for the silicon or
because the numbers need to be system specific for some reason.

> If the goal is to mitigate by userspace, then the trip point *must* be used
> to prevent the userspace polling the temperature. With the trip point the
> sensor will be set to fire an interrupt at the given trip temperature.

I'm not clear how a trip point prevents userspace from polling if it feels so
moved?  Is it just that it makes it more likely that someone will
implement something that polls?

> IOW, trip points are not optional

I can see printing a loud warning given that the system is not fully
configured (there's a warning already, I did nearly comment on this
patch downgrading it all the way to a debug log), perhaps even
suppressing the registration of the userspace interface, but returning a
failure to the registering driver feels like it's escalating the problem
and complicating the driver code.  Suppressing the registration to
userspace seemed like it was adding more complexity in the core but it
would avoid any potential confusion for userspace.

For me the main issue is the impact on devices that support multiple
thermal zones: in order to keep their working zones registered,
their drivers will all have to handle the possibility of some of the
zones failing to register due to missing configuration, which is going to
add complexity at both registration and runtime and be easy to miss.
If the core just accepts the zones then whatever complexity there is
gets factored out into the core.
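
As a rough illustration of the driver-side burden described above, here is a
userspace sketch that contrasts a driver which unwinds on the first rejected
zone with one that tolerates partial failure. Everything in it is
hypothetical: struct channel, strict_register() (a stand-in for a core that
refuses zones without trips) and both probe functions are invented for this
example and do not correspond to any real driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-channel description of a multi-channel sensor. */
struct channel {
	bool has_trips;		/* does the DT zone define any trips? */
	bool registered;
};

/* Stand-in for a core that rejects zones without trips. */
static int strict_register(struct channel *ch)
{
	if (!ch->has_trips)
		return -EINVAL;
	ch->registered = true;
	return 0;
}

/* Simple driver: abort and unwind at the first rejected channel. */
static int probe_all_or_nothing(struct channel *chs, size_t n)
{
	assert(chs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int ret = strict_register(&chs[i]);

		if (ret) {
			while (i-- > 0)
				chs[i].registered = false;
			return ret;
		}
	}
	return 0;
}

/*
 * Tolerant driver: skip rejected channels so the working zones stay
 * registered. Returns how many zones were registered; later code must
 * then cope with holes in the channel array.
 */
static size_t probe_tolerant(struct channel *chs, size_t n)
{
	size_t ok = 0;

	assert(chs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (strict_register(&chs[i]) == 0)
			ok++;
	}
	return ok;
}
```

If the core accepted trip-less zones instead, the simple loop would be
enough and no driver would need the tolerant variant.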
Icenowy Zheng July 24, 2023, 2:35 a.m. UTC | #4
On Sunday, 2023-07-23 at 16:05 +0100, Mark Brown wrote:
> On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> > On 22/07/2023 22:11, Mark Brown wrote:
> 
> > > This makes sense to me - it allows people to see the reported
> > > temperature even if there's no trips defined which seems more
> > > helpful than refusing to register.
> 
> ...
> 
> > If the goal is to report the temperature only, then hwmon should be
> > used
> > instead.
> 
> Sure, that doesn't seem to be the case in the impacted systems though
> -
> AFAICT the issue with these is that it's a generic SoC DT that's not
> fully fleshed out, either because more data is needed for the silicon
> or
> because the numbers need to be system specific for some reason.

Well, maybe we should move all thermal sensors to the hwmon framework and
then let the thermal framework pull the readout from hwmon; but the current
situation is that both frameworks offer temperature readout, and we
shouldn't break things.

> 
> > If the goal is to mitigate by userspace, then the trip point *must*
> > be used
> > to prevent the userspace polling the temperature. With the trip
> > point the
> > sensor will be set to fire an interrupt at the given trip
> > temperature.
> 
> I'm not clear a trip point prevent userspace polling if it feels so
> moved?  Is it just that it makes it more likely that someone will
> implement something that polls?
> 
> > IOW, trip points are not optional

If it's declared optional in the DT binding of a released kernel version,
then it's optional; at least it should be optional in practice to
support this legacy DT binding, and there are even DT files shipped
with the kernel that make use of this. Showing a warning is
okay, but bailing out is not an option, according to my understanding of
the current DT maintenance model.

> 
> I can see printing a loud warning given that the system is not fully
> configured (there's a warning already, I did nearly comment on this
> patch downgrading it all the way to a debug log), perhaps even
> suppressing the registraton of the userspace interface, but returning
> a
> failure to the registering driver feels like it's escalating the
> problem
> and complicating the driver code.  Suppressing the registration to
> userspace seemed like it was adding more complexity in the core but
> it
> would avoid any potential confusion for userspace.
> 
> For me the main issue is the impact on devices that support multiple
> thermal zones, in order to avoid having working zones stay registered
> their drivers will all have to handle the possibility of some of the
> zones failing to register due to missing configuration which is going
> to

Well, I think in the case of Allwinner SoCs, the thermal sensor is a
multi-channel one, so it's possible that some channels (e.g. the CPU
sensor) are used for thermal throttling and other channels (e.g. the
GPU one, considering the Mali-400 is quite weak and usually has no DVFS)
are only used for monitoring.

We should allow this kind of configuration in the kernel. Moving everything
to hwmon is an option, but it's too giant a change.

> add complexity both at both registration and runtime and be easy to
> miss.
> If the core just accepts the zones then whatever complexity there is
> gets factored out into the core.
Chen-Yu Tsai July 24, 2023, 4:25 a.m. UTC | #5
On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> 
> Hi Mark,
> 
> On 22/07/2023 22:11, Mark Brown wrote:
> > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> > > From: Icenowy Zheng <uwu@icenowy.me>
> > > 
> > > Although the current device tree binding of thermal zones require the
> > > trips subnode, the binding in kernel v5.15 does not require it, and many
> > > device trees shipped with the kernel, for example,
> > > allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
> > > comply to the old binding and contain no trips subnode.
> > > 
> > > Allow the code to successfully register thermal zones w/o trips subnode
> > > for DT binding compatibility now.
> > > 
> > > Furtherly, the inconsistency between DTs and bindings should be resolved
> > > by either adding empty trips subnode or dropping the trips subnode
> > > requirement.
> > 
> > This makes sense to me - it allows people to see the reported
> > temperature even if there's no trips defined which seems more
> > helpful than refusing to register.
> 
> The binding describes the trip points as required and that since the
> beginning.

Not really. It was made optional in the v5.15 kernel release by commit

    22fc857538c3 dt-bindings: thermal: Make trips node optional

> What changed is now the code reflects the required property while before it
> was permissive, that was an oversight.
> 
> Just a reminder about the thermal framework goals:
> 
>   1. It protects the silicon (thus critical and hot trip points)
> 
>   2. It mitigates the temperature (thus cooling device bound to trip points)
> 
>   3. It notifies the userspace when a trip point is crossed
> 
> So if the thermal zone is described but without any of this goal above, it
> is pointless.
> 
> If the goal is to report the temperature only, then hwmon should be used
> instead.

What about thermal sensors with multiple channels? Some of the channels
are indeed tied to important hardware blocks like the CPU cores and
should be tied into the thermal tripping. However other channels might
only be used for temperature read-out and have no such requirement.

Should we be mixing thermal and hwmon APIs in the driver?

> If the goal is to mitigate by userspace, then the trip point *must* be used
> to prevent the userspace polling the temperature. With the trip point the
> sensor will be set to fire an interrupt at the given trip temperature.
> 
> IOW, trip points are not optional

for measurement points that are used for thermal throttling /
mitigation.

ChenYu
Icenowy Zheng July 25, 2023, 4:01 a.m. UTC | #6
On July 24, 2023 at 12:25:02 GMT+08:00, Chen-Yu Tsai <wenst@chromium.org> wrote:
>On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
>> 
>> Hi Mark,
>> 
>> On 22/07/2023 22:11, Mark Brown wrote:
>> > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
>> > > From: Icenowy Zheng <uwu@icenowy.me>
>> > > 
>> > > Although the current device tree binding of thermal zones require the
>> > > trips subnode, the binding in kernel v5.15 does not require it, and many
>> > > device trees shipped with the kernel, for example,
>> > > allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in ARM64, still
>> > > comply to the old binding and contain no trips subnode.
>> > > 
>> > > Allow the code to successfully register thermal zones w/o trips subnode
>> > > for DT binding compatibility now.
>> > > 
>> > > Furtherly, the inconsistency between DTs and bindings should be resolved
>> > > by either adding empty trips subnode or dropping the trips subnode
>> > > requirement.
>> > 
>> > This makes sense to me - it allows people to see the reported
>> > temperature even if there's no trips defined which seems more
>> > helpful than refusing to register.
>> 
>> The binding describes the trip points as required and that since the
>> beginning.
>
>Not really. It was made optional in the v5.15 kernel release by commit
>
>    22fc857538c3 dt-bindings: thermal: Make trips node optional

Yes, thanks for the clarification.

My understanding of DT bindings tells me that this means the lack of a trips node
must be handled, before we resolve the inconsistency between the current
DT binding and the shipped DTs.

The latter problem could be discussed, but the former is a MUST
unless we're breaking the compatibility promise of DT bindings (and shipped DTs).

>
>> What changed is now the code reflects the required property while before it
>> was permissive, that was an oversight.
>> 
>> Just a reminder about the thermal framework goals:
>> 
>>   1. It protects the silicon (thus critical and hot trip points)
>> 
>>   2. It mitigates the temperature (thus cooling device bound to trip points)
>> 
>>   3. It notifies the userspace when a trip point is crossed
>> 
>> So if the thermal zone is described but without any of this goal above, it
>> is pointless.
>> 
>> If the goal is to report the temperature only, then hwmon should be used
>> instead.
>
>What about thermal sensors with multiple channels? Some of the channels
>are indeed tied to important hardware blocks like the CPU cores and
>should be tied into the thermal tripping. However other channels might
>only be used for temperature read-out and have no such requirement.
>
>Should we be mixing thermal and hwmon APIs in the driver?
>
>> If the goal is to mitigate by userspace, then the trip point *must* be used
>> to prevent the userspace polling the temperature. With the trip point the
>> sensor will be set to fire an interrupt at the given trip temperature.
>> 
>> IOW, trip points are not optional
>
>for measurement points that are used for thermal throttling /
>mitigation.
>
>ChenYu
>
Icenowy Zheng Aug. 1, 2023, 2:10 p.m. UTC | #7
On Monday, 2023-07-24 at 12:25 +0800, Chen-Yu Tsai wrote:
> On Sun, Jul 23, 2023 at 12:12:49PM +0200, Daniel Lezcano wrote:
> > 
> > Hi Mark,
> > 
> > On 22/07/2023 22:11, Mark Brown wrote:
> > > On Sat, Jul 22, 2023 at 08:25:34PM +0800, Icenowy Zheng wrote:
> > > > From: Icenowy Zheng <uwu@icenowy.me>
> > > > 
> > > > Although the current device tree binding of thermal zones
> > > > require the
> > > > trips subnode, the binding in kernel v5.15 does not require it,
> > > > and many
> > > > device trees shipped with the kernel, for example,
> > > > allwinner/sun50i-a64.dtsi and mediatek/mt8183-kukui.dtsi in
> > > > ARM64, still
> > > > comply to the old binding and contain no trips subnode.
> > > > 
> > > > Allow the code to successfully register thermal zones w/o trips
> > > > subnode
> > > > for DT binding compatibility now.
> > > > 
> > > > Furtherly, the inconsistency between DTs and bindings should be
> > > > resolved
> > > > by either adding empty trips subnode or dropping the trips
> > > > subnode
> > > > requirement.
> > > 
> > > This makes sense to me - it allows people to see the reported
> > > temperature even if there's no trips defined which seems more
> > > helpful than refusing to register.
> > 
> > The binding describes the trip points as required and that since
> > the
> > beginning.
> 
> Not really. It was made optional in the v5.15 kernel release by
> commit
> 
>     22fc857538c3 dt-bindings: thermal: Make trips node optional

I agree, this is why I sent this patch (and why I said 'for DT binding
compatibility now' in the commit message). Further discussion could take
place, but this patch should be applied regardless of the result of
further discussion.

DT binding compatibility is the unbreakable law.

> 
> > What changed is now the code reflects the required property while
> > before it
> > was permissive, that was an oversight.
> > 
> > Just a reminder about the thermal framework goals:
> > 
> >   1. It protects the silicon (thus critical and hot trip points)
> > 
> >   2. It mitigates the temperature (thus cooling device bound to
> > trip points)
> > 
> >   3. It notifies the userspace when a trip point is crossed
> > 
> > So if the thermal zone is described but without any of this goal
> > above, it
> > is pointless.
> > 
> > If the goal is to report the temperature only, then hwmon should be
> > used
> > instead.
> 
> What about thermal sensors with multiple channels? Some of the
> channels
> are indeed tied to important hardware blocks like the CPU cores and
> should be tied into the thermal tripping. However other channels
> might
> only be used for temperature read-out and have no such requirement.
> 
> Should we be mixing thermal and hwmon APIs in the driver?

Well, you have no right to decide which sensor should be used for
throttling and which should not. So the only way to make the semantics
correct is to move every sensor driver from the thermal API to the hwmon
API, and let the thermal framework use hwmon's.

> 
> > If the goal is to mitigate by userspace, then the trip point *must*
> > be used
> > to prevent the userspace polling the temperature. With the trip
> > point the
> > sensor will be set to fire an interrupt at the given trip
> > temperature.
> > 
> > IOW, trip points are not optional
> 
> for measurement points that are used for thermal throttling /
> mitigation.
> 
> ChenYu
>
Patch

diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 6fb14e521197..2c76df847e84 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -127,15 +127,17 @@  static struct thermal_trip *thermal_of_trips_init(struct device_node *np, int *n
 
 	trips = of_get_child_by_name(np, "trips");
 	if (!trips) {
-		pr_err("Failed to find 'trips' node\n");
-		return ERR_PTR(-EINVAL);
+		pr_debug("Failed to find 'trips' node\n");
+		*ntrips = 0;
+		return NULL;
 	}
 
 	count = of_get_child_count(trips);
 	if (!count) {
-		pr_err("No trip point defined\n");
-		ret = -EINVAL;
-		goto out_of_node_put;
+		pr_debug("No trip point defined\n");
+		of_node_put(trips);
+		*ntrips = 0;
+		return NULL;
 	}
 
 	tt = kzalloc(sizeof(*tt) * count, GFP_KERNEL);
@@ -519,7 +521,10 @@  static struct thermal_zone_device *thermal_of_zone_register(struct device_node *
 	of_ops->bind = thermal_of_bind;
 	of_ops->unbind = thermal_of_unbind;
 
-	mask = GENMASK_ULL((ntrips) - 1, 0);
+	if (ntrips)
+		mask = GENMASK_ULL((ntrips) - 1, 0);
+	else
+		mask = 0;
 
 	tz = thermal_zone_device_register_with_trips(np->name, trips, ntrips,
 						     mask, data, of_ops, tzp,