mbox series

[v1,0/7] ACPI: thermal: Use trip point table to register thermal zones

Message ID 13318886.uLZWGnKmhe@kreacher
Headers show
Series ACPI: thermal: Use trip point table to register thermal zones | expand

Message

Rafael J. Wysocki July 18, 2023, 6:01 p.m. UTC
Hi Everyone,

This patch series makes the ACPI thermal driver register thermal zones
with the help of thermal_zone_device_register_with_trips(), so it
doesn't need to use the thermal zone callbacks related to trip points
any more (and they are dropped in the last patch).

The approach presented here is quite radically different from the
previous attempts, as it doesn't really rearrange the driver's
internal data structures, but adds the trip table support on top of
them.  For this purpose, it uses an additional field in struct thermal_trip
introduced in the first patch.

I have run it on my test-bed systems, but this is not too representative,
because they each have only one ACPI thermal zone with only one (critical)
trip point in it.

Thanks,
Rafael

Comments

Rafael J. Wysocki July 25, 2023, 12:02 p.m. UTC | #1
Hi Everyone,

This is the second iteration of the $subject patch series and its original
description below is still applicable

On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
>
> This patch series makes the ACPI thermal driver register thermal zones
> with the help of thermal_zone_device_register_with_trips(), so it
> doesn't need to use the thermal zone callbacks related to trip points
> any more (and they are dropped in the last patch).
>
> The approach presented here is quite radically different from the
> previous attempts, as it doesn't really rearrange the driver's
> internal data structures, but adds the trip table support on top of
> them.  For this purpose, it uses an additional field in struct thermal_trip
> introduced in the first patch.

This update is mostly related to the observation that the critical and hot trip
points never change after initialization, so they don't really need to be
connected back to the corresponding thermal_trip structures.  It also fixes
an error code path memory leak in patch [5/8].

Thanks!
Rafael J. Wysocki July 25, 2023, 6:08 p.m. UTC | #2
On Tuesday, July 25, 2023 2:20:57 PM CEST Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the ACPI thermal driver use thermal_zone_device_register_with_trips()
> to register its thermal zones.
> 
> For this purpose, make it create a trip point table and pass it to
> thermal_zone_device_register_with_trips() as an argument and use the
> struct thermal_trip_ref introduced previously to connect the generic
> thermal trip structures to the internal data structures representing
> trip points in the driver.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3:
>    * Fix error code path memory leak in acpi_thermal_register_thermal_zone().
>    * Notice that the critical and hot trips never change after initialization,
>      so don't add struct thermal_trip_ref to any of them.
> 
> v1 -> v2:
>    * Use thermal_zone_device_lock()/thermal_zone_device_unlock() in
>      acpi_thermal_check_fn() explicitly and call __thermal_zone_device_update()
>      from there without unlocking the thermal zone.
>    * Export __thermal_zone_device_update() to modules (so it can be called by
>      the ACPI thermal code).
> 
> ---
>  drivers/acpi/thermal.c         |  106 ++++++++++++++++++++++++++++++++++++++---
>  drivers/thermal/thermal_core.c |    1 
>  include/linux/thermal.h        |    2 
>  3 files changed, 102 insertions(+), 7 deletions(-)
> 
> Index: linux-pm/drivers/acpi/thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/thermal.c
> +++ linux-pm/drivers/acpi/thermal.c
> @@ -107,6 +107,7 @@ struct acpi_thermal_hot {
>  };
>  
>  struct acpi_thermal_passive {
> +	struct thermal_trip_ref trip_ref;
>  	struct acpi_handle_list devices;
>  	unsigned long temperature;
>  	unsigned long tc1;
> @@ -116,6 +117,7 @@ struct acpi_thermal_passive {
>  };
>  
>  struct acpi_thermal_active {
> +	struct thermal_trip_ref trip_ref;
>  	struct acpi_handle_list devices;
>  	unsigned long temperature;
>  	bool valid;
> @@ -137,6 +139,7 @@ struct acpi_thermal {
>  	unsigned long polling_frequency;
>  	volatile u8 zombie;
>  	struct acpi_thermal_trips trips;
> +	struct thermal_trip *trip_table;
>  	struct acpi_handle_list devices;
>  	struct thermal_zone_device *thermal_zone;
>  	int kelvin_offset;	/* in millidegrees */
> @@ -190,6 +193,14 @@ static int acpi_thermal_get_polling_freq
>  	return 0;
>  }
>  
> +static void acpi_thermal_trip_update_temp(struct acpi_thermal *tz,
> +					  struct thermal_trip *trip,
> +					  long temperature)
> +{
> +	trip->temperature = deci_kelvin_to_millicelsius_with_offset(temperature,
> +								    tz->kelvin_offset);
> +}
> +
>  static void __acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
>  {
>  	acpi_status status;
> @@ -756,6 +767,7 @@ static void acpi_thermal_zone_sysfs_remo
>  
>  static int acpi_thermal_register_thermal_zone(struct acpi_thermal *tz)
>  {
> +	struct thermal_trip *trip;
>  	int passive_delay = 0;
>  	int trip_count = 0;
>  	int result;
> @@ -776,12 +788,54 @@ static int acpi_thermal_register_thermal
>  	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid; i++)
>  		trip_count++;
>  
> -	tz->thermal_zone = thermal_zone_device_register("acpitz", trip_count, 0,
> -							tz, &acpi_thermal_zone_ops,
> -							NULL, passive_delay,
> -							tz->polling_frequency * 100);
> -	if (IS_ERR(tz->thermal_zone))
> -		return -ENODEV;
> +	trip = kcalloc(trip_count, sizeof(*trip), GFP_KERNEL);
> +	if (!trip)
> +		return -ENOMEM;
> +
> +	tz->trip_table = trip;
> +
> +	if (tz->trips.critical.valid) {
> +		trip->type = THERMAL_TRIP_CRITICAL;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.critical.temperature);
> +		trip++;
> +	}
> +
> +	if (tz->trips.hot.valid) {
> +		trip->type = THERMAL_TRIP_HOT;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.hot.temperature);
> +		trip++;
> +	}
> +
> +	if (tz->trips.passive.valid) {
> +		trip->type = THERMAL_TRIP_PASSIVE;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.passive.temperature);
> +		trip->driver_ref = &tz->trips.passive.trip_ref;
> +		trip++;
> +	}
> +
> +	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE && tz->trips.active[i].valid; i++) {
> +		trip->type = THERMAL_TRIP_ACTIVE;
> +		acpi_thermal_trip_update_temp(tz, trip,
> +					      tz->trips.active[i].temperature);
> +		trip->driver_ref = &tz->trips.active[i].trip_ref;
> +		trip++;
> +	}
> +
> +	tz->thermal_zone = thermal_zone_device_register_with_trips("acpitz",
> +								   tz->trip_table,
> +								   trip_count,
> +								   0, tz,
> +								   &acpi_thermal_zone_ops,
> +								   NULL,
> +								   passive_delay,
> +								   tz->polling_frequency * 100);
> +	if (IS_ERR(tz->thermal_zone)) {
> +		result = PTR_ERR(tz->thermal_zone);
> +		goto free_trip_table;
> +	}
>  
>  	result = acpi_thermal_zone_sysfs_add(tz);
>  	if (result)
> @@ -809,6 +863,8 @@ remove_links:
>  	acpi_thermal_zone_sysfs_remove(tz);
>  unregister_tzd:
>  	thermal_zone_device_unregister(tz->thermal_zone);
> +free_trip_table:
> +	kfree(tz->trip_table);
>  
>  	return result;
>  }
> @@ -817,6 +873,7 @@ static void acpi_thermal_unregister_ther
>  {
>  	acpi_thermal_zone_sysfs_remove(tz);
>  	thermal_zone_device_unregister(tz->thermal_zone);
> +	kfree(tz->trip_table);
>  	tz->thermal_zone = NULL;
>  	acpi_bus_detach_private_data(tz->device->handle);
>  }
> @@ -950,6 +1007,9 @@ static void acpi_thermal_check_fn(struct
>  {
>  	struct acpi_thermal *tz = container_of(work, struct acpi_thermal,
>  					       thermal_check_work);
> +	struct thermal_trip *trip;
> +	long temperature;
> +	int i;
>  
>  	/*
>  	 * In general, it is not sufficient to check the pending bit, because
> @@ -964,7 +1024,39 @@ static void acpi_thermal_check_fn(struct
>  
>  	mutex_lock(&tz->thermal_check_lock);
>  
> -	thermal_zone_device_update(tz->thermal_zone, THERMAL_EVENT_UNSPECIFIED);
> +	thermal_zone_device_lock(tz->thermal_zone);
> +
> +	trip = tz->trips.passive.trip_ref.trip;
> +	if (trip) {
> +		/*
> +		 * This means that the passive trip was valid initially, so
> +		 * update its temperature in case it has changed or the trip
> +		 * has become invalid.
> +		 */
> +		temperature = tz->trips.passive.valid ?
> +				tz->trips.passive.temperature :
> +				THERMAL_TEMP_INVALID;
> +		acpi_thermal_trip_update_temp(tz, trip, temperature);

This is a mistake, because THERMAL_TEMP_INVALID should not be passed to
acpi_thermal_trip_update_temp() (and same below).

I'll send an update of this patch in a moment.

> +	}
> +
> +	for (i = 0; i < ACPI_THERMAL_MAX_ACTIVE; i++) {
> +		trip = tz->trips.active[i].trip_ref.trip;
> +		if (trip) {
> +			/*
> +			 * This means that the active trip #i was valid
> +			 * initially, so update its temperature in case it has
> +			 * changed or the trip has become invalid.
> +			 */
> +			temperature = tz->trips.active[i].valid ?
> +					tz->trips.active[i].temperature :
> +					THERMAL_TEMP_INVALID;
> +			acpi_thermal_trip_update_temp(tz, trip, temperature);
> +		}
> +	}
> +
> +	__thermal_zone_device_update(tz->thermal_zone, THERMAL_EVENT_UNSPECIFIED);
> +
> +	thermal_zone_device_unlock(tz->thermal_zone);
>  
>  	refcount_inc(&tz->thermal_check_count);
>  
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -429,6 +429,7 @@ void __thermal_zone_device_update(struct
>  
>  	monitor_thermal_zone(tz);
>  }
> +EXPORT_SYMBOL_GPL(__thermal_zone_device_update);
>  
>  static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
>  					enum thermal_device_mode mode)
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -334,6 +334,8 @@ int thermal_zone_bind_cooling_device(str
>  				     unsigned int);
>  int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
>  				       struct thermal_cooling_device *);
> +void __thermal_zone_device_update(struct thermal_zone_device *,
> +				  enum thermal_notify_event);
>  void thermal_zone_device_update(struct thermal_zone_device *,
>  				enum thermal_notify_event);
>  void thermal_zone_device_lock(struct thermal_zone_device *tz);
>
Daniel Lezcano Aug. 1, 2023, 6:26 p.m. UTC | #3
Hi Rafael,

On 25/07/2023 14:02, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This is the second iteration of the $subject patch series and its original
> description below is still applicable
> 
> On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
>>
>> This patch series makes the ACPI thermal driver register thermal zones
>> with the help of thermal_zone_device_register_with_trips(), so it
>> doesn't need to use the thermal zone callbacks related to trip points
>> any more (and they are dropped in the last patch).
>>
>> The approach presented here is quite radically different from the
>> previous attempts, as it doesn't really rearrange the driver's
>> internal data structures, but adds the trip table support on top of
>> them.  For this purpose, it uses an additional field in struct thermal_trip
>> introduced in the first patch.
> 
> This update is mostly related to the observation that the critical and hot trip
> points never change after initialization, so they don't really need to be
> connected back to the corresponding thermal_trip structures.  It also fixes
> an error code path memory leak in patch [5/8].

I've been through the series. It is really cool that we can get rid of 
the ops usage at the end of the series.

However, the series introduces a wrapper to the thermal zone lock and 
exports that in the public header. That goes in the opposite direction 
of the recent cleanups and obviously will give the opportunity to 
drivers to do silly things [again].

On the other side, the structure thermal_trip introduces a circular 
reference, which is usually something to avoid.

Apart those two points, the ACPI changes look ok.

Comments in the different patches will follow

Thanks
Rafael J. Wysocki Aug. 1, 2023, 6:39 p.m. UTC | #4
On Tue, Aug 1, 2023 at 8:27 PM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
>
> Hi Rafael,
>
> On 25/07/2023 14:02, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This is the second iteration of the $subject patch series and its original
> > description below is still applicable
> >
> > On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
> >>
> >> This patch series makes the ACPI thermal driver register thermal zones
> >> with the help of thermal_zone_device_register_with_trips(), so it
> >> doesn't need to use the thermal zone callbacks related to trip points
> >> any more (and they are dropped in the last patch).
> >>
> >> The approach presented here is quite radically different from the
> >> previous attempts, as it doesn't really rearrange the driver's
> >> internal data structures, but adds the trip table support on top of
> >> them.  For this purpose, it uses an additional field in struct thermal_trip
> >> introduced in the first patch.
> >
> > This update is mostly related to the observation that the critical and hot trip
> > points never change after initialization, so they don't really need to be
> > connected back to the corresponding thermal_trip structures.  It also fixes
> > an error code path memory leak in patch [5/8].
>
> I've been through the series. It is really cool that we can get rid of
> the ops usage at the end of the series.
>
> However, the series introduces a wrapper to the thermal zone lock and
> exports that in the public header. That goes in the opposite direction
> of the recent cleanups and obviously will give the opportunity to
> drivers to do silly things [again].

Because it is necessary to update the trip points in the table under
the zone lock, the driver needs to somehow make that happen.

I suppose I can pass a callback to thermal_zone_device_update() or
similar, but I'm not sure if that's better.

> On the other side, the structure thermal_trip introduces a circular
> reference, which is usually something to avoid.

What do you mean by "a circular reference"?

> Apart those two points, the ACPI changes look ok.
>
> Comments in the different patches will follow

Thanks!
Rafael J. Wysocki Aug. 4, 2023, 8:58 p.m. UTC | #5
Hi Everyone,

This is the 4th iteration of the $subject patch series and its original
description below is still applicable

On Tuesday, July 18, 2023 8:01:20 PM CEST Rafael J. Wysocki wrote:
>
> This patch series makes the ACPI thermal driver register thermal zones
> with the help of thermal_zone_device_register_with_trips(), so it
> doesn't need to use the thermal zone callbacks related to trip points
> any more (and they are dropped in the last patch).
>
> The approach presented here is quite radically different from the
> previous attempts, as it doesn't really rearrange the driver's
> internal data structures, but adds the trip table support on top of
> them.  For this purpose, it uses an additional field in struct thermal_trip

This update uses a different approach to mutual exclusion between trip point
updates in the ACPI thermal driver and the thermal core (roughly speaking, it
adds a "thermal zone update" callback and a helper to invoke that callback
under the zone lock) and a different mechanism to get from the local trip
point representation in the driver to the corresponding trips[i] in the
thermal zone structure.

As a result, some have been dropped, some new patches have been added and
the majority of patches in the series have been reworked from scratch.

Thanks!