Message ID | 20220601151441.9128-3-daniel.lezcano@linaro.org |
---|---|
State | New |
Headers | show |
Series | [1/3] thermal/core: Encapsulate the set_cur_state function | expand |
Hi Daniel, On 6/1/22 16:14, Daniel Lezcano wrote: > The statistics are for debugging purpose and belong to debugfs rather > than sysfs. As the previous changes introduced the same statistics in > debugfs, those in sysfs are no longer needed and can be removed. I just want to let you know that in current Android kernels we cannot even compile the kernel with CONFIG_DEBUG_FS. I have this pain with Energy Model there... Some vendors might see useful info via this sysfs interface in bring-up of the SoC. I don't know if there are user-space tools tracking this information via sysfs. We probably should check that. I agree that these statistics look more like debug info, rather than something useful for control. Furthermore, we have trace events for the cooling state changes, which should be good enough for bring-up and experiments. I don't have strong preferences here. I tend to agree to remove this interface if there are no user-space tools using it. Regards, Lukasz
Hi Lukasz, [Adding Todd] On 01/06/2022 17:33, Lukasz Luba wrote: > Hi Daniel, > > > On 6/1/22 16:14, Daniel Lezcano wrote: >> The statistics are for debugging purpose and belong to debugfs rather >> than sysfs. As the previous changes introduced the same statistics in >> debugfs, those in sysfs are no longer needed and can be removed. > > I just want to let you know that in current Android kernels we cannot > even compile the kernel with CONFIG_DEBUG_FS. Right, it makes sense. Precisely, with the sysfs stats they are always compiled in for the Android kernel and is a problem for low memory systems. While debugfs can fulfill its purpose in the developement and will be removed in production systems. > I have this pain with > Energy Model there... Some vendors might see useful info via this > sysfs interface in bring-up of the SoC. Well alternatively, information can be extracted from procfs in the device-tree description. What prevents to add energy information in sysfs now that the energy model is per device ? > I don't know if there are user-space tools tracking this > information via sysfs. We probably should check that. > > I agree that these statistics look more like debug info, rather than > something useful for control. > > Furthermore, we have trace events for the cooling state changes, which > should be good enough for bring-up and experiments. > > I don't have strong preferences here. I tend to agree to remove this > interface if there are no user-space tools using it. I agree userspace can also get information about the transition but the goal of the debugfs is also add information about thermal internals like average temperature at mitigation time, min and max, timings, etc ...
On 6/2/22 09:37, Daniel Lezcano wrote: > > Hi Lukasz, > > [Adding Todd] > > On 01/06/2022 17:33, Lukasz Luba wrote: >> Hi Daniel, >> >> >> On 6/1/22 16:14, Daniel Lezcano wrote: >>> The statistics are for debugging purpose and belong to debugfs rather >>> than sysfs. As the previous changes introduced the same statistics in >>> debugfs, those in sysfs are no longer needed and can be removed. >> >> I just want to let you know that in current Android kernels we cannot >> even compile the kernel with CONFIG_DEBUG_FS. > > Right, it makes sense. Precisely, with the sysfs stats they are always > compiled in for the Android kernel and is a problem for low memory > systems. While debugfs can fulfill its purpose in the developement and > will be removed in production systems. True. > >> I have this pain with >> Energy Model there... Some vendors might see useful info via this >> sysfs interface in bring-up of the SoC. > > Well alternatively, information can be extracted from procfs in the > device-tree description. > > What prevents to add energy information in sysfs now that the energy > model is per device ? Probably nothing, but we need strong need. I have proposed this a few times internally, but this must have a requirement. If a user-space tool would ask for it, then I could send a patch exposing the sysfs. So far we have only one user-space tool, which suffers the missing debugfs EM dir: LISA (but we are working on a workaround for it). If you have a tool or plan to have such, which uses EM, please let me know. I'm gathering the requirements. > >> I don't know if there are user-space tools tracking this >> information via sysfs. We probably should check that. >> >> I agree that these statistics look more like debug info, rather than >> something useful for control. >> >> Furthermore, we have trace events for the cooling state changes, which >> should be good enough for bring-up and experiments. >> >> I don't have strong preferences here. I tend to agree to remove this >> interface if there are no user-space tools using it. > > I agree userspace can also get information about the transition but the > goal of the debugfs is also add information about thermal internals like > average temperature at mitigation time, min and max, timings, etc ... > > I see, it makes sense. Let's see if Todd and Android folks don't use this thermal sysfs stats, so we could remove them.
On Thu, Jun 2, 2022 at 2:16 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > > On 6/2/22 09:37, Daniel Lezcano wrote: > > > > Hi Lukasz, > > > > [Adding Todd] > > > > On 01/06/2022 17:33, Lukasz Luba wrote: > >> Hi Daniel, > >> > >> > >> On 6/1/22 16:14, Daniel Lezcano wrote: > >>> The statistics are for debugging purpose and belong to debugfs rather > >>> than sysfs. As the previous changes introduced the same statistics in > >>> debugfs, those in sysfs are no longer needed and can be removed. > >> > >> I just want to let you know that in current Android kernels we cannot > >> even compile the kernel with CONFIG_DEBUG_FS. > > > > Right, it makes sense. Precisely, with the sysfs stats they are always > > compiled in for the Android kernel and is a problem for low memory > > systems. While debugfs can fulfill its purpose in the developement and > > will be removed in production systems. > > True. > > > > >> I have this pain with > >> Energy Model there... Some vendors might see useful info via this > >> sysfs interface in bring-up of the SoC. > > > > Well alternatively, information can be extracted from procfs in the > > device-tree description. > > > > What prevents to add energy information in sysfs now that the energy > > model is per device ? > > Probably nothing, but we need strong need. I have proposed this > a few times internally, but this must have a requirement. > If a user-space tool would ask for it, then I could send a patch > exposing the sysfs. So far we have only one user-space tool, which > suffers the missing debugfs EM dir: LISA (but we are working on a > workaround for it). > If you have a tool or plan to have such, which uses EM, please let > me know. I'm gathering the requirements. > > > > >> I don't know if there are user-space tools tracking this > >> information via sysfs. We probably should check that. > >> > >> I agree that these statistics look more like debug info, rather than > >> something useful for control. > >> > >> Furthermore, we have trace events for the cooling state changes, which > >> should be good enough for bring-up and experiments. > >> > >> I don't have strong preferences here. I tend to agree to remove this > >> interface if there are no user-space tools using it. > > > > I agree userspace can also get information about the transition but the > > goal of the debugfs is also add information about thermal internals like > > average temperature at mitigation time, min and max, timings, etc ... > > > > > > I see, it makes sense. Let's see if Todd and Android folks don't > use this thermal sysfs stats, so we could remove them. Android HALs do use the thermal sysfs stats. debugfs isn't a viable replacement since debugfs must not be mounted during normal operation.
Hi Todd, [adding Wei] On 02/06/2022 21:02, Todd Kjos wrote: > On Thu, Jun 2, 2022 at 2:16 AM Lukasz Luba <lukasz.luba@arm.com> wrote: [ ... ] >> I see, it makes sense. Let's see if Todd and Android folks don't >> use this thermal sysfs stats, so we could remove them. > > Android HALs do use the thermal sysfs stats. debugfs isn't a viable > replacement since debugfs must not be mounted during normal operation. Thanks for your answer. I'm curious, what is the purpose of getting the statistics, especially the transitions stats from normal operation? There were some complains about systems having a high number of cooling devices with a lot of states. The state transitions are represented as a matrix and result in up to hundred of megabytes of memory wasted. Moreover, sysfs being limited a page size, the output is often truncated. As it is automatically enabled for GKI, this waste of memory which is not negligible for system with low memory can not be avoided. I went through the thermal HAL but did not find an usage of these statistics, do you have a pointer to the code using them ? Thanks -- Daniel
On Fri, Jun 3, 2022 at 4:04 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: > > > Hi Todd, > > [adding Wei] > > On 02/06/2022 21:02, Todd Kjos wrote: > > On Thu, Jun 2, 2022 at 2:16 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > [ ... ] > > >> I see, it makes sense. Let's see if Todd and Android folks don't > >> use this thermal sysfs stats, so we could remove them. > > > > Android HALs do use the thermal sysfs stats. debugfs isn't a viable > > replacement since debugfs must not be mounted during normal operation. > > Thanks for your answer. > > I'm curious, what is the purpose of getting the statistics, especially > the transitions stats from normal operation? > > There were some complains about systems having a high number of cooling > devices with a lot of states. The state transitions are represented as a > matrix and result in up to hundred of megabytes of memory wasted. > > Moreover, sysfs being limited a page size, the output is often truncated. > > As it is automatically enabled for GKI, this waste of memory which is > not negligible for system with low memory can not be avoided. > > I went through the thermal HAL but did not find an usage of these > statistics, do you have a pointer to the code using them ? > > Thanks > > -- Daniel > > Sorry for the late reply, trying to catch up on emails after sick recovery. We use it for stats collection to understand thermal residency, and it is not in the HAL code, we don't use the transition table heavily though. Are some of the devices having too many cooling devices? Can we have a config to enable stats for a given cooling device? Thanks! -Wei > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog
On 24/06/2022 08:02, Wei Wang wrote: > On Fri, Jun 3, 2022 at 4:04 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote: >> >> >> Hi Todd, >> >> [adding Wei] >> >> On 02/06/2022 21:02, Todd Kjos wrote: >>> On Thu, Jun 2, 2022 at 2:16 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> [ ... ] >> >>>> I see, it makes sense. Let's see if Todd and Android folks don't >>>> use this thermal sysfs stats, so we could remove them. >>> >>> Android HALs do use the thermal sysfs stats. debugfs isn't a viable >>> replacement since debugfs must not be mounted during normal operation. >> >> Thanks for your answer. >> >> I'm curious, what is the purpose of getting the statistics, especially >> the transitions stats from normal operation? >> >> There were some complains about systems having a high number of cooling >> devices with a lot of states. The state transitions are represented as a >> matrix and result in up to hundred of megabytes of memory wasted. >> >> Moreover, sysfs being limited a page size, the output is often truncated. >> >> As it is automatically enabled for GKI, this waste of memory which is >> not negligible for system with low memory can not be avoided. >> >> I went through the thermal HAL but did not find an usage of these >> statistics, do you have a pointer to the code using them ? >> >> Thanks >> >> -- Daniel >> >> > > Sorry for the late reply, trying to catch up on emails after sick > recovery. We use it for stats collection to understand thermal > residency, and it is not in the HAL code, we don't use the transition > table heavily though. Are some of the devices having too many cooling > devices? Can we have a config to enable stats for a given cooling > device? The stats table is bogus. As soon as the combination of the states leads to a size greater than a page size, then the result is truncated. As the cooling devices is also abused to mimic power capping capable device, we are ending up to 1024 states sometimes. Moreover, having the option set also create tables which take MB of memory for nothing. The option only enables the stats for all the cooling device stats. As you mentioned, it seems to you are using the stats for debugging purpose. The debugfs provides the same information, except it does only show the transitions would actually happened and we are no longer limited to a page size. Would it be acceptable to remove the sysfs stats ?
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index 42400c6aed61..0c7b776462a9 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -26,13 +26,6 @@ config THERMAL_NETLINK trip point crossed, cooling device update or governor change. It is recommended to enable the feature. -config THERMAL_STATISTICS - bool "Thermal state transition statistics" - help - Export thermal state transition statistics information through sysfs. - - If in doubt, say N. - config THERMAL_DEBUGFS bool "Thermal debugging file system" depends on DEBUG_FS diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index a2d6eb0d0895..4851f2c7910f 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -949,7 +949,6 @@ __thermal_cooling_device_register(struct device_node *np, return cdev; out_kfree_type: - thermal_cooling_device_destroy_sysfs(cdev); kfree(cdev->type); put_device(&cdev->device); cdev = NULL; @@ -1117,7 +1116,6 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev) ida_simple_remove(&thermal_cdev_ida, cdev->id); device_del(&cdev->device); - thermal_cooling_device_destroy_sysfs(cdev); kfree(cdev->type); put_device(&cdev->device); } diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h index 49331b8a5404..dcae52f20258 100644 --- a/drivers/thermal/thermal_core.h +++ b/drivers/thermal/thermal_core.h @@ -134,22 +134,12 @@ void thermal_set_delay_jiffies(unsigned long *delay_jiffies, int delay_ms); int thermal_zone_create_device_groups(struct thermal_zone_device *, int); void thermal_zone_destroy_device_groups(struct thermal_zone_device *); void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *); -void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev); /* used only at binding time */ ssize_t trip_point_show(struct device *, struct device_attribute *, char *); ssize_t weight_show(struct device *, struct device_attribute *, char *); ssize_t weight_store(struct device *, struct device_attribute *, const char *, size_t); -#ifdef CONFIG_THERMAL_STATISTICS -void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, - unsigned long new_state); -#else -static inline void -thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, - unsigned long new_state) {} -#endif /* CONFIG_THERMAL_STATISTICS */ - /* device tree support */ #ifdef CONFIG_THERMAL_OF int of_parse_thermal_zones(void); diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c index 040d9e9b55e1..81185bb0d526 100644 --- a/drivers/thermal/thermal_helpers.c +++ b/drivers/thermal/thermal_helpers.c @@ -204,7 +204,6 @@ int thermal_cdev_set_state(struct thermal_cooling_device *cdev, int state) ret = cdev->ops->set_cur_state(cdev, state); if (!ret) { thermal_notify_cdev_state_update(cdev->id, state); - thermal_cooling_device_stats_update(cdev, state); thermal_debugfs_cdev_transition(cdev, state); } diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c index 935e79909121..e48c5ad26611 100644 --- a/drivers/thermal/thermal_sysfs.c +++ b/drivers/thermal/thermal_sysfs.c @@ -641,231 +641,14 @@ static const struct attribute_group cooling_device_attr_group = { static const struct attribute_group *cooling_device_attr_groups[] = { &cooling_device_attr_group, - NULL, /* Space allocated for cooling_device_stats_attr_group */ NULL, }; -#ifdef CONFIG_THERMAL_STATISTICS -struct cooling_dev_stats { - spinlock_t lock; - unsigned int total_trans; - unsigned long state; - unsigned long max_states; - ktime_t last_time; - ktime_t *time_in_state; - unsigned int *trans_table; -}; - -static void update_time_in_state(struct cooling_dev_stats *stats) -{ - ktime_t now = ktime_get(), delta; - - delta = ktime_sub(now, stats->last_time); - stats->time_in_state[stats->state] = - ktime_add(stats->time_in_state[stats->state], delta); - stats->last_time = now; -} - -void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev, - unsigned long new_state) -{ - struct cooling_dev_stats *stats = cdev->stats; - - if (!stats) - return; - - spin_lock(&stats->lock); - - if (stats->state == new_state) - goto unlock; - - update_time_in_state(stats); - stats->trans_table[stats->state * stats->max_states + new_state]++; - stats->state = new_state; - stats->total_trans++; - -unlock: - spin_unlock(&stats->lock); -} - -static ssize_t total_trans_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct thermal_cooling_device *cdev = to_cooling_device(dev); - struct cooling_dev_stats *stats = cdev->stats; - int ret; - - spin_lock(&stats->lock); - ret = sprintf(buf, "%u\n", stats->total_trans); - spin_unlock(&stats->lock); - - return ret; -} - -static ssize_t -time_in_state_ms_show(struct device *dev, struct device_attribute *attr, - char *buf) -{ - struct thermal_cooling_device *cdev = to_cooling_device(dev); - struct cooling_dev_stats *stats = cdev->stats; - ssize_t len = 0; - int i; - - spin_lock(&stats->lock); - update_time_in_state(stats); - - for (i = 0; i < stats->max_states; i++) { - len += sprintf(buf + len, "state%u\t%llu\n", i, - ktime_to_ms(stats->time_in_state[i])); - } - spin_unlock(&stats->lock); - - return len; -} - -static ssize_t -reset_store(struct device *dev, struct device_attribute *attr, const char *buf, - size_t count) -{ - struct thermal_cooling_device *cdev = to_cooling_device(dev); - struct cooling_dev_stats *stats = cdev->stats; - int i, states = stats->max_states; - - spin_lock(&stats->lock); - - stats->total_trans = 0; - stats->last_time = ktime_get(); - memset(stats->trans_table, 0, - states * states * sizeof(*stats->trans_table)); - - for (i = 0; i < stats->max_states; i++) - stats->time_in_state[i] = ktime_set(0, 0); - - spin_unlock(&stats->lock); - - return count; -} - -static ssize_t trans_table_show(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct thermal_cooling_device *cdev = to_cooling_device(dev); - struct cooling_dev_stats *stats = cdev->stats; - ssize_t len = 0; - int i, j; - - len += snprintf(buf + len, PAGE_SIZE - len, " From : To\n"); - len += snprintf(buf + len, PAGE_SIZE - len, " : "); - for (i = 0; i < stats->max_states; i++) { - if (len >= PAGE_SIZE) - break; - len += snprintf(buf + len, PAGE_SIZE - len, "state%2u ", i); - } - if (len >= PAGE_SIZE) - return PAGE_SIZE; - - len += snprintf(buf + len, PAGE_SIZE - len, "\n"); - - for (i = 0; i < stats->max_states; i++) { - if (len >= PAGE_SIZE) - break; - - len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i); - - for (j = 0; j < stats->max_states; j++) { - if (len >= PAGE_SIZE) - break; - len += snprintf(buf + len, PAGE_SIZE - len, "%8u ", - stats->trans_table[i * stats->max_states + j]); - } - if (len >= PAGE_SIZE) - break; - len += snprintf(buf + len, PAGE_SIZE - len, "\n"); - } - - if (len >= PAGE_SIZE) { - pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n"); - return -EFBIG; - } - return len; -} - -static DEVICE_ATTR_RO(total_trans); -static DEVICE_ATTR_RO(time_in_state_ms); -static DEVICE_ATTR_WO(reset); -static DEVICE_ATTR_RO(trans_table); - -static struct attribute *cooling_device_stats_attrs[] = { - &dev_attr_total_trans.attr, - &dev_attr_time_in_state_ms.attr, - &dev_attr_reset.attr, - &dev_attr_trans_table.attr, - NULL -}; - -static const struct attribute_group cooling_device_stats_attr_group = { - .attrs = cooling_device_stats_attrs, - .name = "stats" -}; - -static void cooling_device_stats_setup(struct thermal_cooling_device *cdev) -{ - struct cooling_dev_stats *stats; - unsigned long states; - int var; - - if (cdev->ops->get_max_state(cdev, &states)) - return; - - states++; /* Total number of states is highest state + 1 */ - - var = sizeof(*stats); - var += sizeof(*stats->time_in_state) * states; - var += sizeof(*stats->trans_table) * states * states; - - stats = kzalloc(var, GFP_KERNEL); - if (!stats) - return; - - stats->time_in_state = (ktime_t *)(stats + 1); - stats->trans_table = (unsigned int *)(stats->time_in_state + states); - cdev->stats = stats; - stats->last_time = ktime_get(); - stats->max_states = states; - - spin_lock_init(&stats->lock); - - /* Fill the empty slot left in cooling_device_attr_groups */ - var = ARRAY_SIZE(cooling_device_attr_groups) - 2; - cooling_device_attr_groups[var] = &cooling_device_stats_attr_group; -} - -static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev) -{ - kfree(cdev->stats); - cdev->stats = NULL; -} - -#else - -static inline void -cooling_device_stats_setup(struct thermal_cooling_device *cdev) {} -static inline void -cooling_device_stats_destroy(struct thermal_cooling_device *cdev) {} - -#endif /* CONFIG_THERMAL_STATISTICS */ - void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev) { - cooling_device_stats_setup(cdev); cdev->device.groups = cooling_device_attr_groups; } -void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev) -{ - cooling_device_stats_destroy(cdev); -} - /* these helper will be used only at the time of bindig */ ssize_t trip_point_show(struct device *dev, struct device_attribute *attr, char *buf) diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 4a69e8a6868e..522c9180a08d 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -96,7 +96,6 @@ struct thermal_cooling_device { struct device device; struct device_node *np; void *devdata; - void *stats; const struct thermal_cooling_device_ops *ops; bool updated; /* true if the cooling device does not need update */ struct mutex lock; /* protect thermal_instances list */
The statistics are for debugging purpose and belong to debugfs rather than sysfs. As the previous changes introduced the same statistics in debugfs, those in sysfs are no longer needed and can be removed. That solves a couple of problems reported by different SoC vendors: - Extra memory used in low profile boards where the memory is limited resource - Truncated information and memory allocation failure with cooling devices having a large number of states Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> --- drivers/thermal/Kconfig | 7 - drivers/thermal/thermal_core.c | 2 - drivers/thermal/thermal_core.h | 10 -- drivers/thermal/thermal_helpers.c | 1 - drivers/thermal/thermal_sysfs.c | 217 ------------------------------ include/linux/thermal.h | 1 - 6 files changed, 238 deletions(-)