Message ID | 20200710135154.181454-1-thara.gopinath@linaro.org |
---|---|
Headers | show |
Series | thermal: Introduce support for monitoring falling temperature | expand |
On 10/07/2020 15:51, Thara Gopinath wrote: > Thermal framework today supports monitoring for rising temperatures and > subsequently initiating cooling action in case of a thermal trip point > being crossed. There are scenarios where a SoC need some warming action to > be activated if the temperature falls below a cetain permissible limit. > Since warming action can be considered mirror opposite of cooling action, > most of the thermal framework can be re-used to achieve this. > > This patch series is yet another attempt to add support for monitoring > falling temperature in thermal framework. Unlike the first attempt[1] > (where a new property was added to thermal trip point binding to indicate > direction of temperature monitoring), this series introduces a new trip > point type (THERMAL_TRIP_COLD) to indicate a trip point at which falling > temperature monitoring must be triggered. This patch series uses Daniel > Lezcano's recently added thermal genetlink interface[2] to notify userspace > of falling temperature and rising temperature at the cold trip point. This > will enable a user space engine to trigger the relevant mitigation for > falling temperature. At present, no support is added to any of the thermal > governors to monitor and mitigate falling temperature at the cold trip > point;rather all governors return doing nothing if triggered for a cold > trip point. As future extension, monitoring of falling temperature can be > added to the relevant thermal governor. I agree we need a cold trip point in order to introduce the functioning temperature range in the thermal framework. Rui, what is your opinion ? -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On 10/07/2020 15:51, Thara Gopinath wrote: > Extend thermal trip point type property to include "cold" trip type > indicating point in the temperature domain below which a warming action > must be intiated. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > Documentation/devicetree/bindings/thermal/thermal.txt | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt > index f78bec19ca35..1689d9ba1471 100644 > --- a/Documentation/devicetree/bindings/thermal/thermal.txt > +++ b/Documentation/devicetree/bindings/thermal/thermal.txt > @@ -87,6 +87,7 @@ Required properties: > "active": A trip point to enable active cooling > "passive": A trip point to enable passive cooling > "hot": A trip point to notify emergency > + "cold": A trip point to enable warming > "critical": Hardware not reliable. > Type: string thermal.txt should have been removed. Perhaps, a patch is missing. The thermal.txt has been converted into 3 yaml schema. The change should be in thermal-zones.yaml. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On 7/13/20 11:05 AM, Daniel Lezcano wrote: > On 10/07/2020 15:51, Thara Gopinath wrote: >> Extend thermal trip point type property to include "cold" trip type >> indicating point in the temperature domain below which a warming action >> must be intiated. >> >> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> >> --- >> Documentation/devicetree/bindings/thermal/thermal.txt | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt b/Documentation/devicetree/bindings/thermal/thermal.txt >> index f78bec19ca35..1689d9ba1471 100644 >> --- a/Documentation/devicetree/bindings/thermal/thermal.txt >> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt >> @@ -87,6 +87,7 @@ Required properties: >> "active": A trip point to enable active cooling >> "passive": A trip point to enable passive cooling >> "hot": A trip point to notify emergency >> + "cold": A trip point to enable warming >> "critical": Hardware not reliable. >> Type: string > > > thermal.txt should have been removed. Perhaps, a patch is missing. The > thermal.txt has been converted into 3 yaml schema. > > The change should be in thermal-zones.yaml. Hi Daniel.. Thanks for the review. My bad.. I will fix this in the next version. I can send a patch removing thermal.txt as well > > -- Warm Regards Thara
On 13/07/2020 19:01, Thara Gopinath wrote: > > > On 7/13/20 11:05 AM, Daniel Lezcano wrote: >> On 10/07/2020 15:51, Thara Gopinath wrote: >>> Extend thermal trip point type property to include "cold" trip type >>> indicating point in the temperature domain below which a warming action >>> must be intiated. >>> >>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> >>> --- >>> Documentation/devicetree/bindings/thermal/thermal.txt | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/Documentation/devicetree/bindings/thermal/thermal.txt >>> b/Documentation/devicetree/bindings/thermal/thermal.txt >>> index f78bec19ca35..1689d9ba1471 100644 >>> --- a/Documentation/devicetree/bindings/thermal/thermal.txt >>> +++ b/Documentation/devicetree/bindings/thermal/thermal.txt >>> @@ -87,6 +87,7 @@ Required properties: >>> "active": A trip point to enable active cooling >>> "passive": A trip point to enable passive cooling >>> "hot": A trip point to notify emergency >>> + "cold": A trip point to enable warming >>> "critical": Hardware not reliable. >>> Type: string >> >> >> thermal.txt should have been removed. Perhaps, a patch is missing. The >> thermal.txt has been converted into 3 yaml schema. >> >> The change should be in thermal-zones.yaml. > > Hi Daniel.. > > Thanks for the review. My bad.. I will fix this in the next version. > I can send a patch removing thermal.txt as well Yes, sure. Thanks -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog
On Mon, 2020-07-13 at 17:03 +0200, Daniel Lezcano wrote: > On 10/07/2020 15:51, Thara Gopinath wrote: > > Thermal framework today supports monitoring for rising temperatures > > and > > subsequently initiating cooling action in case of a thermal trip > > point > > being crossed. There are scenarios where a SoC need some warming > > action to > > be activated if the temperature falls below a cetain permissible > > limit. > > Since warming action can be considered mirror opposite of cooling > > action, > > most of the thermal framework can be re-used to achieve this. > > > > This patch series is yet another attempt to add support for > > monitoring > > falling temperature in thermal framework. Unlike the first > > attempt[1] > > (where a new property was added to thermal trip point binding to > > indicate > > direction of temperature monitoring), this series introduces a new > > trip > > point type (THERMAL_TRIP_COLD) to indicate a trip point at which > > falling > > temperature monitoring must be triggered. This patch series uses > > Daniel > > Lezcano's recently added thermal genetlink interface[2] to notify > > userspace > > of falling temperature and rising temperature at the cold trip > > point. This > > will enable a user space engine to trigger the relevant mitigation > > for > > falling temperature. At present, no support is added to any of the > > thermal > > governors to monitor and mitigate falling temperature at the cold > > trip > > point;rather all governors return doing nothing if triggered for a > > cold > > trip point. As future extension, monitoring of falling temperature > > can be > > added to the relevant thermal governor. > > I agree we need a cold trip point in order to introduce the > functioning > temperature range in the thermal framework. > > Rui, what is your opinion ? I agree with the concept of "cold" trip point. In this patch set, the cold trip point is defined with only netlink event support. But there are still quite a lot of things unclear, especially what we should do in thermal framework? For example, to support this, we can either introduce both "cold" trip points and "warming devices", and introduce new logic in thermal framework and governors to handle them, Or introduce "cold" trip point and "warming" device, but only semantically, and treat them just like normal trip points and cooling devices. And strictly define cooling state 0 as the state that generates most heat, and define max cooling state as the state that generates least heat. Then, say, we have a trip point at -10C, the "warming" device is set to cooling state 0 when the temperature is lower than -10C, and in most cases, this thermal zone is always in a "overheating" state (temperature higher than -10C), and the "warming" device for this thermal zone is "throttled" to generate as least heat as possible. And this is pretty much what the current code has always been doing, right? I can not say which one is better for now as I don't have the background of this requirement. It's nice that Thara sent this RFC series for discussion, but from upstream point of view, I'd prefer to see a full stack solution, before taking any code. thanks, Rui
On 7/14/20 9:49 AM, Zhang Rui wrote: > On Mon, 2020-07-13 at 17:03 +0200, Daniel Lezcano wrote: >> On 10/07/2020 15:51, Thara Gopinath wrote: >>> Thermal framework today supports monitoring for rising temperatures >>> and >>> subsequently initiating cooling action in case of a thermal trip >>> point >>> being crossed. There are scenarios where a SoC need some warming >>> action to >>> be activated if the temperature falls below a cetain permissible >>> limit. >>> Since warming action can be considered mirror opposite of cooling >>> action, >>> most of the thermal framework can be re-used to achieve this. >>> >>> This patch series is yet another attempt to add support for >>> monitoring >>> falling temperature in thermal framework. Unlike the first >>> attempt[1] >>> (where a new property was added to thermal trip point binding to >>> indicate >>> direction of temperature monitoring), this series introduces a new >>> trip >>> point type (THERMAL_TRIP_COLD) to indicate a trip point at which >>> falling >>> temperature monitoring must be triggered. This patch series uses >>> Daniel >>> Lezcano's recently added thermal genetlink interface[2] to notify >>> userspace >>> of falling temperature and rising temperature at the cold trip >>> point. This >>> will enable a user space engine to trigger the relevant mitigation >>> for >>> falling temperature. At present, no support is added to any of the >>> thermal >>> governors to monitor and mitigate falling temperature at the cold >>> trip >>> point;rather all governors return doing nothing if triggered for a >>> cold >>> trip point. As future extension, monitoring of falling temperature >>> can be >>> added to the relevant thermal governor. >> >> I agree we need a cold trip point in order to introduce the >> functioning >> temperature range in the thermal framework. >> >> Rui, what is your opinion ? > > I agree with the concept of "cold" trip point. > In this patch set, the cold trip point is defined with only netlink > event support. But there are still quite a lot of things unclear, > especially what we should do in thermal framework? Hi Rui, Thanks for the comments. You are right that cold trip points are dealt with only by netlink events in this patch series. Eventually IMHO, governors should handle them with a logic opposite to what is being currently done for non-cold trip points. > > For example, to support this, we can > either > introduce both "cold" trip points and "warming devices", and introduce > new logic in thermal framework and governors to handle them, > Or > introduce "cold" trip point and "warming" device, but only > semantically, and treat them just like normal trip points and cooling > devices. And strictly define cooling state 0 as the state that > generates most heat, and define max cooling state as the state that > generates least heat. Then, say, we have a trip point at -10C, the > "warming" device is set to cooling state 0 when the temperature is > lower than -10C, and in most cases, this thermal zone is always in a > "overheating" state (temperature higher than -10C), and the "warming" > device for this thermal zone is "throttled" to generate as least heat > as possible. And this is pretty much what the current code has always > been doing, right? IMHO, thermal framework should move to a direction where the term "mitigation" is used rather than cooling or warming. In this case "cooling dev" and "warming dev" should will become "temp-mitigating-dev". So going by this, I think what you mention as option 1 is more suitable where new logic is introduced into the framework and governors to handle the trip points marked as "cold". Also in the current set of requirements, we have a few power domain rails and other resources that are used exclusively in the thermal framework for warming alone as in they are not used ever for cooling down a zone. But then one of the requirements we have discussed is for cpufreq and gpu scaling to be behave as warming devices where the minimum operating point/ voltage of the relevant cpu/gpu is restricted. So in this case, Daniel had this suggestion of introducing negative states for presently what is defined as cooling devices. So cooling dev / temp-mitigation-dev states can range from say -3 to 5 with 0 as the good state where no mitigation is happening. This is an interesting idea though I have not proto-typed it yet. > > I can not say which one is better for now as I don't have the > background of this requirement. It's nice that Thara sent this RFC > series for discussion, but from upstream point of view, I'd prefer to > see a full stack solution, before taking any code. We had done a session at ELC on this requirement. Here is the link to the presentation. Hopefully it gives you some back ground on this. https://elinux.org/images/f/f7/ELC-2020-Thara-Ram-Linux-Kernel-Thermal-Warming.pdf I have sent across some patches for introducing a generic power domain warming device which is under review by Daniel. So how do you want to proceed on this? Can you elaborate a bit more on what you mean by a full stack solution. > > thanks, > Rui > -- Warm Regards Thara
On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote: > For now, thermal governors do not support monitoring of falling > temperature. Hence, in case of calls to the governor for trip points > marked > as cold, return doing nothing. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > drivers/thermal/gov_bang_bang.c | 8 ++++++++ > drivers/thermal/gov_fair_share.c | 8 ++++++++ > drivers/thermal/gov_power_allocator.c | 8 ++++++++ > drivers/thermal/gov_step_wise.c | 8 ++++++++ > 4 files changed, 32 insertions(+) userspace governor does not support cold trip point neither. So how about adding the check in handle_non_critical_trips first, and remove the check later, after all the governors support cold trip? thanks, rui > > diff --git a/drivers/thermal/gov_bang_bang.c > b/drivers/thermal/gov_bang_bang.c > index 991a1c54296d..8324d13de1e7 100644 > --- a/drivers/thermal/gov_bang_bang.c > +++ b/drivers/thermal/gov_bang_bang.c > @@ -99,6 +99,14 @@ static void thermal_zone_trip_update(struct > thermal_zone_device *tz, int trip) > static int bang_bang_control(struct thermal_zone_device *tz, int > trip) > { > struct thermal_instance *instance; > + enum thermal_trip_type trip_type; > + > + /* Return doing nothing in case of cold trip point */ > + if (trip != THERMAL_TRIPS_NONE) { > + tz->ops->get_trip_type(tz, trip, &trip_type); > + if (trip_type == THERMAL_TRIP_COLD) > + return 0; > + } > > thermal_zone_trip_update(tz, trip); > > diff --git a/drivers/thermal/gov_fair_share.c > b/drivers/thermal/gov_fair_share.c > index aaa07180ab48..c0adce525faa 100644 > --- a/drivers/thermal/gov_fair_share.c > +++ b/drivers/thermal/gov_fair_share.c > @@ -81,6 +81,14 @@ static int fair_share_throttle(struct > thermal_zone_device *tz, int trip) > int total_weight = 0; > int total_instance = 0; > int cur_trip_level = get_trip_level(tz); > + enum thermal_trip_type trip_type; > + > + /* Return doing nothing in case of cold trip point */ > + if (trip != THERMAL_TRIPS_NONE) { > + tz->ops->get_trip_type(tz, trip, &trip_type); > + if (trip_type == THERMAL_TRIP_COLD) > + return 0; > + } > > list_for_each_entry(instance, &tz->thermal_instances, tz_node) > { > if (instance->trip != trip) > diff --git a/drivers/thermal/gov_power_allocator.c > b/drivers/thermal/gov_power_allocator.c > index 44636475b2a3..2644ad4d4032 100644 > --- a/drivers/thermal/gov_power_allocator.c > +++ b/drivers/thermal/gov_power_allocator.c > @@ -613,8 +613,16 @@ static int power_allocator_throttle(struct > thermal_zone_device *tz, int trip) > { > int ret; > int switch_on_temp, control_temp; > + enum thermal_trip_type trip_type; > struct power_allocator_params *params = tz->governor_data; > > + /* Return doing nothing in case of cold trip point */ > + if (trip != THERMAL_TRIPS_NONE) { > + tz->ops->get_trip_type(tz, trip, &trip_type); > + if (trip_type == THERMAL_TRIP_COLD) > + return 0; > + } > + > /* > * We get called for every trip point but we only need to do > * our calculations once > diff --git a/drivers/thermal/gov_step_wise.c > b/drivers/thermal/gov_step_wise.c > index 2ae7198d3067..009aefda0441 100644 > --- a/drivers/thermal/gov_step_wise.c > +++ b/drivers/thermal/gov_step_wise.c > @@ -186,6 +186,14 @@ static void thermal_zone_trip_update(struct > thermal_zone_device *tz, int trip) > static int step_wise_throttle(struct thermal_zone_device *tz, int > trip) > { > struct thermal_instance *instance; > + enum thermal_trip_type trip_type; > + > + /* For now, return doing nothing in case of cold trip point */ > + if (trip != THERMAL_TRIPS_NONE) { > + tz->ops->get_trip_type(tz, trip, &trip_type); > + if (trip_type == THERMAL_TRIP_COLD) > + return 0; > + } > > thermal_zone_trip_update(tz, trip); >
On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote: > Add notification calls for trip type THERMAL_TRIP_COLD when > temperature > crosses the trip point in either direction. > > Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> > --- > drivers/thermal/thermal_core.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) > > diff --git a/drivers/thermal/thermal_core.c > b/drivers/thermal/thermal_core.c > index 750a89f0c20a..e2302ca1cd3b 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -429,12 +429,21 @@ static void handle_thermal_trip(struct > thermal_zone_device *tz, int trip) > tz->ops->get_trip_hyst(tz, trip, &hyst); > > if (tz->last_temperature != THERMAL_TEMP_INVALID) { > - if (tz->last_temperature < trip_temp && > - tz->temperature >= trip_temp) > - thermal_notify_tz_trip_up(tz->id, trip); > - if (tz->last_temperature >= trip_temp && > - tz->temperature < (trip_temp - hyst)) > - thermal_notify_tz_trip_down(tz->id, trip); > + if (type == THERMAL_TRIP_COLD) { > + if (tz->last_temperature > trip_temp && > + tz->temperature <= trip_temp) > + thermal_notify_tz_trip_down(tz->id, > trip); trip_type should also be part of the event because trip_down/trip_up for hot trip and cold trip have different meanings. Or can we use some more generic names like trip_on/trip_off? trip_on means the trip point is violated or actions need to be taken for the specific trip points, for both hot and cold trips. I know trip_on/trip_off doesn't represent what I mean clearly, but surely you can find a better name. thanks, rui > + if (tz->last_temperature <= trip_temp && > + tz->temperature > (trip_temp + hyst)) > + thermal_notify_tz_trip_up(tz->id, > trip); > + } else { > + if (tz->last_temperature < trip_temp && > + tz->temperature >= trip_temp) > + thermal_notify_tz_trip_up(tz->id, > trip); > + if (tz->last_temperature >= trip_temp && > + tz->temperature < (trip_temp - hyst)) > + thermal_notify_tz_trip_down(tz->id, > trip); > + } > } > > if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
On 7/15/20 4:46 AM, Zhang Rui wrote: > On Fri, 2020-07-10 at 09:51 -0400, Thara Gopinath wrote: >> Add notification calls for trip type THERMAL_TRIP_COLD when >> temperature >> crosses the trip point in either direction. >> >> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org> >> --- >> drivers/thermal/thermal_core.c | 21 +++++++++++++++------ >> 1 file changed, 15 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/thermal/thermal_core.c >> b/drivers/thermal/thermal_core.c >> index 750a89f0c20a..e2302ca1cd3b 100644 >> --- a/drivers/thermal/thermal_core.c >> +++ b/drivers/thermal/thermal_core.c >> @@ -429,12 +429,21 @@ static void handle_thermal_trip(struct >> thermal_zone_device *tz, int trip) >> tz->ops->get_trip_hyst(tz, trip, &hyst); >> >> if (tz->last_temperature != THERMAL_TEMP_INVALID) { >> - if (tz->last_temperature < trip_temp && >> - tz->temperature >= trip_temp) >> - thermal_notify_tz_trip_up(tz->id, trip); >> - if (tz->last_temperature >= trip_temp && >> - tz->temperature < (trip_temp - hyst)) >> - thermal_notify_tz_trip_down(tz->id, trip); >> + if (type == THERMAL_TRIP_COLD) { >> + if (tz->last_temperature > trip_temp && >> + tz->temperature <= trip_temp) >> + thermal_notify_tz_trip_down(tz->id, >> trip); > > trip_type should also be part of the event because trip_down/trip_up > for hot trip and cold trip have different meanings. > Or can we use some more generic names like trip_on/trip_off? trip_on > means the trip point is violated or actions need to be taken for the > specific trip points, for both hot and cold trips. I know > trip_on/trip_off doesn't represent what I mean clearly, but surely you > can find a better name. Makes sense.. I will fix this in the next version. I don't have a good name at the moment but will think of something! -- Warm Regards Thara