[V5] thermal: Add cooling device's statistics in sysfs

Message ID bab196f91cbddca175c725ae6159c0af639bfe07.1522666398.git.viresh.kumar@linaro.org
State Accepted
Commit 8ea229511e06f9635ecc338dcbe0db41a73623f0

Commit Message

Viresh Kumar April 2, 2018, 10:56 a.m. UTC
This extends the sysfs interface for thermal cooling devices and exposes
some useful statistics. These statistics have proven to be quite
useful, especially while doing benchmarks related to the task scheduler,
where we want to make sure that nothing has disrupted the test,
especially the cooling device, which may have put constraints on the CPUs.
The information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.

The write-only "reset" file is used to reset the statistics.

The read-only "time_in_state_ms" file shows the time (in msec) spent by the
device in the respective cooling states, and it prints one line per
cooling state.
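
As an illustration only (not part of the patch), a user-space tool could
parse the contents of this file roughly as sketched below. It assumes the
"state<N><TAB><msec>" line format printed by the show routine in this
patch; the helper name parse_time_in_state() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Parse text in the "state<N>\t<msec>" per-line format that the
 * time_in_state_ms file is expected to contain. Hypothetical user-space
 * helper, not kernel code. Values are stored in ms[] indexed by state
 * (states >= max are skipped); returns the number of lines parsed.
 */
static size_t parse_time_in_state(const char *text,
				  unsigned long long *ms, size_t max)
{
	size_t count = 0;
	unsigned int state;
	unsigned long long msec;
	int consumed;

	assert(text && ms);

	/* The leading space in the format skips the previous newline. */
	while (sscanf(text, " state%u %llu%n", &state, &msec, &consumed) == 2) {
		if (state < max)
			ms[state] = msec;
		text += consumed;
		count++;
	}
	return count;
}
```

A caller would read the file (for example
/sys/class/thermal/cooling_device0/stats/time_in_state_ms) into a
NUL-terminated buffer and pass that buffer to the helper.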

The read-only "total_trans" file shows a single positive integer: the
total number of cooling state transitions the device has gone through
since the cooling device was registered or since the statistics were
last reset.

The read-only "trans_table" file shows a two dimensional matrix, where
an entry <i,j> (row i, column j) represents the number of transitions
from State_i to State_j.
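
The bookkeeping behind this matrix can be modelled in user space as a
flat, row-major array indexed by (from * number_of_states + to). The
sketch below is a simplified model for illustration; it omits the
locking and the time accounting of the real code, and the trans_stats_*
names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space model of the counters; field names loosely follow the patch. */
struct trans_stats {
	unsigned long state;		/* current cooling state */
	unsigned long max_states;	/* highest state + 1 */
	unsigned int total_trans;	/* transitions since creation */
	unsigned int *trans_table;	/* max_states * max_states, row-major */
};

static struct trans_stats *trans_stats_new(unsigned long max_state)
{
	unsigned long n = max_state + 1; /* states are 0..max_state */
	struct trans_stats *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->trans_table = calloc(n * n, sizeof(*s->trans_table));
	if (!s->trans_table) {
		free(s);
		return NULL;
	}
	s->max_states = n;
	return s;
}

/* Record a move to new_state; setting the same state again is not counted. */
static void trans_stats_update(struct trans_stats *s, unsigned long new_state)
{
	assert(new_state < s->max_states);
	if (s->state == new_state)
		return;
	s->trans_table[s->state * s->max_states + new_state]++;
	s->state = new_state;
	s->total_trans++;
}

/* Entry <from,to> of the matrix: transitions from State_from to State_to. */
static unsigned int trans_stats_get(const struct trans_stats *s,
				    unsigned long from, unsigned long to)
{
	return s->trans_table[from * s->max_states + to];
}

static void trans_stats_free(struct trans_stats *s)
{
	if (s)
		free(s->trans_table);
	free(s);
}
```

In this model the sum of all matrix entries always equals total_trans,
and the diagonal stays zero because a request for the current state is
ignored.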

This is what the directory structure looks like for a single cooling
device:

$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state  max_state  power  stats  subsystem  type  uevent

/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status

/sys/class/thermal/cooling_device0/stats:
reset  time_in_state_ms  total_trans  trans_table

This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and
ARM 64-bit Hisilicon hikey960 board running Android.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
V4->V5:
- time_in_state's unit is msec now instead of clock_t.
- Remove double setting of ->stats pointer.

V3->V4:
- Added CONFIG_THERMAL_STATISTICS
- Added transition table file in sysfs
- Updated documentation for new sysfs files
- The unit of time in time_in_state is clock_t now
- Separate routines for cooling device stat setup/destroy

V2->V3:
- Total number of states is max_level + 1. The earlier version didn't
  take that into account and so the stats for the highest state were
  missing.

V1->V2:
- Move to sysfs from debugfs

 Documentation/thermal/sysfs-api.txt |  31 +++++
 drivers/thermal/Kconfig             |   7 ++
 drivers/thermal/thermal_core.c      |   3 +-
 drivers/thermal/thermal_core.h      |  10 ++
 drivers/thermal/thermal_helpers.c   |   5 +-
 drivers/thermal/thermal_sysfs.c     | 225 ++++++++++++++++++++++++++++++++++++
 include/linux/thermal.h             |   1 +
 7 files changed, 280 insertions(+), 2 deletions(-)

-- 
2.15.0.194.g9af6a3dea062

Comments

Dmitry Osipenko Aug. 13, 2018, 4:06 p.m. UTC | #1
On 02.04.2018 13:56, Viresh Kumar wrote:
> This extends the sysfs interface for thermal cooling devices and exposes
> some useful statistics. These statistics have proven to be quite
> useful, especially while doing benchmarks related to the task scheduler,
> where we want to make sure that nothing has disrupted the test,
> especially the cooling device, which may have put constraints on the CPUs.
> The information exposed here tells us to what extent the CPUs were
> constrained by the thermal framework.
>
> The write-only "reset" file is used to reset the statistics.
>
> The read-only "time_in_state_ms" file shows the time (in msec) spent by the
> device in the respective cooling states, and it prints one line per
> cooling state.
>
> The read-only "total_trans" file shows a single positive integer: the
> total number of cooling state transitions the device has gone through
> since the cooling device was registered or since the statistics were
> last reset.
>
> The read-only "trans_table" file shows a two dimensional matrix, where
> an entry <i,j> (row i, column j) represents the number of transitions
> from State_i to State_j.
>
> This is what the directory structure looks like for a single cooling
> device:
>
> $ ls -R /sys/class/thermal/cooling_device0/
> /sys/class/thermal/cooling_device0/:
> cur_state  max_state  power  stats  subsystem  type  uevent
>
> /sys/class/thermal/cooling_device0/power:
> autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
> control               runtime_status
>
> /sys/class/thermal/cooling_device0/stats:
> reset  time_in_state_ms  total_trans  trans_table
>
> This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and
> ARM 64-bit Hisilicon hikey960 board running Android.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---


Hello,

I'm working on adding support of OPP and cooling for NVIDIA Tegra20/30 CPUFreq driver and stumbled upon a bug that is introduced by this patch. It is triggered on the driver module unload.

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 6ab982309e6a..de53c821a282 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1102,8 +1102,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
        mutex_unlock(&thermal_list_lock);
 
        ida_simple_remove(&thermal_cdev_ida, cdev->id);
-       device_unregister(&cdev->device);
        thermal_cooling_device_destroy_sysfs(cdev);
+       device_unregister(&cdev->device);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

This patch fixes the issue with the "cooling_device", but I'm not sure that it won't break "thermal_zone". Also see the KASAN report below.


[   65.553469] ==================================================================
[   65.572514] BUG: KASAN: use-after-free in thermal_cooling_device_destroy_sysfs+0x24/0x40
[   65.592300] Read of size 4 at addr ced17c80 by task rmmod/206

[   65.632387] CPU: 1 PID: 206 Comm: rmmod Not tainted 4.18.0-rc8-next-20180810-00148-g2863c2b33049-dirty #361
[   65.654241] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[   65.676552] [<c0116784>] (unwind_backtrace) from [<c010fd54>] (show_stack+0x20/0x24)
[   65.699719] [<c010fd54>] (show_stack) from [<c10861b4>] (dump_stack+0x9c/0xb0)
[   65.723224] [<c10861b4>] (dump_stack) from [<c03012ac>] (print_address_description+0x60/0x268)
[   65.747525] [<c03012ac>] (print_address_description) from [<c03018c8>] (kasan_report+0x120/0x388)
[   65.771873] [<c03018c8>] (kasan_report) from [<c02fff44>] (__asan_load4+0x64/0xb4)
[   65.796324] [<c02fff44>] (__asan_load4) from [<c0b76d00>] (thermal_cooling_device_destroy_sysfs+0x24/0x40)
[   65.820990] [<c0b76d00>] (thermal_cooling_device_destroy_sysfs) from [<c0b73804>] (thermal_cooling_device_unregister+0x130/0x238)
[   65.846039] [<c0b73804>] (thermal_cooling_device_unregister) from [<c0b7a26c>] (cpufreq_cooling_unregister+0xa8/0xfc)
[   65.870897] [<c0b7a26c>] (cpufreq_cooling_unregister) from [<bf0003c0>] (tegra_cpu_exit+0x2c/0x74 [tegra20_cpufreq])
[   65.895940] [<bf0003c0>] (tegra_cpu_exit [tegra20_cpufreq]) from [<c0b83fa4>] (cpufreq_offline+0x160/0x298)
[   65.920899] [<c0b83fa4>] (cpufreq_offline) from [<c0b841cc>] (cpufreq_remove_dev+0xd0/0xd4)
[   65.945804] [<c0b841cc>] (cpufreq_remove_dev) from [<c0867c90>] (subsys_interface_unregister+0xe4/0x130)
[   65.971622] [<c0867c90>] (subsys_interface_unregister) from [<c0b823f0>] (cpufreq_unregister_driver+0x44/0x8c)
[   65.998135] [<c0b823f0>] (cpufreq_unregister_driver) from [<bf00002c>] (tegra20_cpufreq_remove+0x2c/0x34 [tegra20_cpufreq])
[   66.025805] [<bf00002c>] (tegra20_cpufreq_remove [tegra20_cpufreq]) from [<c086cde4>] (platform_drv_remove+0x44/0x64)
[   66.053768] [<c086cde4>] (platform_drv_remove) from [<c086a93c>] (device_release_driver_internal+0x1f0/0x2e0)
[   66.081707] [<c086a93c>] (device_release_driver_internal) from [<c086aab8>] (driver_detach+0x68/0xb8)
[   66.110346] [<c086aab8>] (driver_detach) from [<c0869128>] (bus_remove_driver+0x84/0xfc)
[   66.139530] [<c0869128>] (bus_remove_driver) from [<c086b898>] (driver_unregister+0x4c/0x6c)
[   66.169514] [<c086b898>] (driver_unregister) from [<c086cee8>] (platform_driver_unregister+0x1c/0x20)
[   66.200091] [<c086cee8>] (platform_driver_unregister) from [<bf000980>] (tegra20_cpufreq_driver_exit+0x18/0x698 [tegra20_cpufreq])
[   66.232017] [<bf000980>] (tegra20_cpufreq_driver_exit [tegra20_cpufreq]) from [<c01ff02c>] (sys_delete_module+0x198/0x224)
[   66.264804] [<c01ff02c>] (sys_delete_module) from [<c0101000>] (ret_fast_syscall+0x0/0x58)
[   66.298137] Exception stack(0xce94bfa8 to 0xce94bff0)
[   66.331825] bfa0:                   0003f0d0 00000002 0003f10c 00000800 5e6a7500 5e6a7500
[   66.366665] bfc0: 0003f0d0 00000002 0003f0d0 00000081 b6a723d0 b6a7207c b6a7226c 00000001
[   66.401864] bfe0: aec42610 b6a72014 00022408 aec4261c

[   66.472603] Allocated by task 151:
[   66.508377]  kasan_kmalloc+0xd4/0x174
[   66.544570]  kmem_cache_alloc_trace+0x198/0x2e8
[   66.581197]  __thermal_cooling_device_register+0x9c/0x4c0
[   66.618085]  thermal_of_cooling_device_register+0x18/0x1c
[   66.655387]  __cpufreq_cooling_register+0x4c4/0x604
[   66.692976]  of_cpufreq_cooling_register+0x88/0xe8
[   66.730726]  tegra_cpu_ready+0x28/0x3c [tegra20_cpufreq]
[   66.768872]  cpufreq_online+0x798/0x8d0
[   66.807262]  cpufreq_add_dev+0xa0/0xac
[   66.845892]  subsys_interface_register+0x104/0x148
[   66.884167]  cpufreq_register_driver+0x1d0/0x264
[   66.922070]  tegra20_cpufreq_probe+0x1f8/0x27c [tegra20_cpufreq]
[   66.959803]  platform_drv_probe+0x70/0xc8
[   66.997149]  really_probe+0x284/0x3d4
[   67.034006]  driver_probe_device+0x80/0x1b8
[   67.070515]  __driver_attach+0x130/0x134
[   67.106447]  bus_for_each_dev+0x98/0xc4
[   67.141867]  driver_attach+0x38/0x3c
[   67.177010]  bus_add_driver+0x238/0x2cc
[   67.211717]  driver_register+0xdc/0x1b0
[   67.245684]  __platform_driver_register+0x7c/0x84
[   67.279456]  0xbf005028
[   67.312693]  do_one_initcall+0x60/0x344
[   67.345795]  do_init_module+0xe4/0x30c
[   67.378294]  load_module+0x3008/0x3784
[   67.410423]  sys_finit_module+0xac/0xc4
[   67.442102]  ret_fast_syscall+0x0/0x58
[   67.472788]  0xb6781c10

[   67.531724] Freed by task 206:
[   67.560135]  __kasan_slab_free+0x12c/0x204
[   67.587993]  kasan_slab_free+0x14/0x18
[   67.615343]  kfree+0x90/0x294
[   67.642143]  thermal_release+0x6c/0x98
[   67.668309]  device_release+0x4c/0xe8
[   67.693667]  kobject_put+0xac/0x11c
[   67.718166]  device_unregister+0x2c/0x30
[   67.742308]  thermal_cooling_device_unregister+0x128/0x238
[   67.766189]  cpufreq_cooling_unregister+0xa8/0xfc
[   67.789630]  tegra_cpu_exit+0x2c/0x74 [tegra20_cpufreq]
[   67.812973]  cpufreq_offline+0x160/0x298
[   67.835506]  cpufreq_remove_dev+0xd0/0xd4
[   67.857115]  subsys_interface_unregister+0xe4/0x130
[   67.878280]  cpufreq_unregister_driver+0x44/0x8c
[   67.899235]  tegra20_cpufreq_remove+0x2c/0x34 [tegra20_cpufreq]
[   67.919948]  platform_drv_remove+0x44/0x64
[   67.940467]  device_release_driver_internal+0x1f0/0x2e0
[   67.960895]  driver_detach+0x68/0xb8
[   67.981161]  bus_remove_driver+0x84/0xfc
[   68.001382]  driver_unregister+0x4c/0x6c
[   68.021561]  platform_driver_unregister+0x1c/0x20
[   68.041879]  tegra20_cpufreq_driver_exit+0x18/0x698 [tegra20_cpufreq]
[   68.062376]  sys_delete_module+0x198/0x224
[   68.082826]  ret_fast_syscall+0x0/0x58
[   68.103010]  0xb6a72014

-- 
Dmitry
Viresh Kumar Aug. 13, 2018, 4:21 p.m. UTC | #2
On 13 August 2018 at 21:36, Dmitry Osipenko <digetx@gmail.com> wrote:

> I'm working on adding support of OPP and cooling for NVIDIA Tegra20/30 CPUFreq driver and stumbled upon a bug that is introduced by this patch. It is triggered on the driver module unload.


The problem is that device_unregister() will end up freeing the cdev as well, so
the current sequence is surely wrong.

> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 6ab982309e6a..de53c821a282 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1102,8 +1102,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>         mutex_unlock(&thermal_list_lock);
>
>         ida_simple_remove(&thermal_cdev_ida, cdev->id);
> -       device_unregister(&cdev->device);
>         thermal_cooling_device_destroy_sysfs(cdev);
> +       device_unregister(&cdev->device);


But this looks wrong as well, as the device is still around while
memory of its sysfs data is gone.

Maybe something like this is what we need:

device_del();
thermal_cooling_device_destroy_sysfs();
device_put();
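
The ordering suggested above can be illustrated with a small user-space
model: first hide the device, then free the statistics, and only then
drop the last reference, which may free the containing object. All toy_*
names are hypothetical stand-ins and this is not kernel code. Note that
in the kernel the helper that drops the reference is named put_device(),
and device_unregister() is device_del() followed by put_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a cooling device with an embedded, refcounted device. */
struct toy_cdev {
	int refcount;	/* models the device reference count */
	bool visible;	/* models "sysfs files are still reachable" */
	void *stats;	/* models cdev->stats */
	bool *released;	/* set to true when the release step runs */
};

static struct toy_cdev *toy_register(bool *released)
{
	struct toy_cdev *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->stats = malloc(64);
	if (!c->stats) {
		free(c);
		return NULL;
	}
	c->refcount = 1;
	c->visible = true;
	c->released = released;
	return c;
}

/* Step 1: remove the device from view; the object itself stays valid. */
static void toy_device_del(struct toy_cdev *c)
{
	c->visible = false;
}

/* Step 2: free the stats; nobody can reach the files any more. */
static void toy_destroy_sysfs(struct toy_cdev *c)
{
	assert(!c->visible);
	free(c->stats);
	c->stats = NULL;
}

/* Step 3: drop the reference; the last put frees the whole object. */
static void toy_put_device(struct toy_cdev *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0) {
		bool *flag = c->released;

		free(c);
		*flag = true;
	}
}

static void toy_unregister(struct toy_cdev *c)
{
	toy_device_del(c);
	toy_destroy_sysfs(c);
	toy_put_device(c);	/* c must not be touched after this call */
}
```

With this order the stats memory is never freed while the files are
visible, and the object is never dereferenced after its last reference
is dropped, which is the use-after-free shown in the KASAN report.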

--
viresh
Dmitry Osipenko Aug. 13, 2018, 4:43 p.m. UTC | #3
On Monday, 13 August 2018 19:21:43 MSK Viresh Kumar wrote:
> On 13 August 2018 at 21:36, Dmitry Osipenko <digetx@gmail.com> wrote:
> > I'm working on adding support of OPP and cooling for NVIDIA Tegra20/30
> > CPUFreq driver and stumbled upon a bug that is introduced by this patch.
> > It is triggered on the driver module unload.
>
> The problem is that device_unregister() will end up freeing the cdev as
> well, so the current sequence is surely wrong.
>
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 6ab982309e6a..de53c821a282 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1102,8 +1102,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
> >         mutex_unlock(&thermal_list_lock);
> >
> >         ida_simple_remove(&thermal_cdev_ida, cdev->id);
> > -       device_unregister(&cdev->device);
> >         thermal_cooling_device_destroy_sysfs(cdev);
> > +       device_unregister(&cdev->device);
>
> But this looks wrong as well, as the device is still around while
> memory of its sysfs data is gone.


Indeed.

> Maybe something like this is what we need:
>
> device_del();
> thermal_cooling_device_destroy_sysfs();
> device_put();


[I just realized that thermal_zone and cooling_device are not interrelated. 
I'm not familiar with the thermal/ code]

Thank you Viresh, your proposal looks good to me and works fine. Will you make 
a proper patch?
Viresh Kumar Aug. 13, 2018, 4:53 p.m. UTC | #4
On 13 August 2018 at 22:13, Dmitry Osipenko <digetx@gmail.com> wrote:
> On Monday, 13 August 2018 19:21:43 MSK Viresh Kumar wrote:
>> On 13 August 2018 at 21:36, Dmitry Osipenko <digetx@gmail.com> wrote:
>> > I'm working on adding support of OPP and cooling for NVIDIA Tegra20/30
>> > CPUFreq driver and stumbled upon a bug that is introduced by this patch.
>> > It is triggered on the driver module unload.
>>
>> The problem is that device_unregister() will end up freeing the cdev as
>> well, so the current sequence is surely wrong.
>>
>> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
>> > index 6ab982309e6a..de53c821a282 100644
>> > --- a/drivers/thermal/thermal_core.c
>> > +++ b/drivers/thermal/thermal_core.c
>> > @@ -1102,8 +1102,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
>> >         mutex_unlock(&thermal_list_lock);
>> >
>> >         ida_simple_remove(&thermal_cdev_ida, cdev->id);
>> > -       device_unregister(&cdev->device);
>> >         thermal_cooling_device_destroy_sysfs(cdev);
>> > +       device_unregister(&cdev->device);
>>
>> But this looks wrong as well, as the device is still around while
>> memory of its sysfs data is gone.
>
> Indeed.
>
>> Maybe something like this is what we need:
>>
>> device_del();
>> thermal_cooling_device_destroy_sysfs();
>> device_put();
>
> [I just realized that thermal_zone and cooling_device are not interrelated.
> I'm not familiar with the thermal/ code]
>
> Thank you Viresh, your proposal looks good to me and works fine. Will you make
> a proper patch?


Maybe you send a patch for this and take the credit as well :)
Dmitry Osipenko Aug. 13, 2018, 5:02 p.m. UTC | #5
On Monday, 13 August 2018 19:53:33 MSK Viresh Kumar wrote:
> On 13 August 2018 at 22:13, Dmitry Osipenko <digetx@gmail.com> wrote:
> > On Monday, 13 August 2018 19:21:43 MSK Viresh Kumar wrote:
> >> On 13 August 2018 at 21:36, Dmitry Osipenko <digetx@gmail.com> wrote:
> >> > I'm working on adding support of OPP and cooling for NVIDIA Tegra20/30
> >> > CPUFreq driver and stumbled upon a bug that is introduced by this patch.
> >> > It is triggered on the driver module unload.
> >>
> >> The problem is that device_unregister() will end up freeing the cdev as
> >> well, so the current sequence is surely wrong.
> >>
> >> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> >> > index 6ab982309e6a..de53c821a282 100644
> >> > --- a/drivers/thermal/thermal_core.c
> >> > +++ b/drivers/thermal/thermal_core.c
> >> > @@ -1102,8 +1102,8 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
> >> >         mutex_unlock(&thermal_list_lock);
> >> >
> >> >         ida_simple_remove(&thermal_cdev_ida, cdev->id);
> >> > -       device_unregister(&cdev->device);
> >> >         thermal_cooling_device_destroy_sysfs(cdev);
> >> > +       device_unregister(&cdev->device);
> >>
> >> But this looks wrong as well, as the device is still around while
> >> memory of its sysfs data is gone.
> >
> > Indeed.
> >
> >> Maybe something like this is what we need:
> >>
> >> device_del();
> >> thermal_cooling_device_destroy_sysfs();
> >> device_put();
> >
> > [I just realized that thermal_zone and cooling_device are not interrelated.
> > I'm not familiar with the thermal/ code]
> >
> > Thank you Viresh, your proposal looks good to me and works fine. Will you
> > make a proper patch?
>
> Maybe you send a patch for this and take the credit as well :)


Okay, I'll make the patch. Thank you for the help.

Patch

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index bb9a0a53e76b..911399730c1c 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -255,6 +255,7 @@  temperature) and throttle appropriate devices.
 2. sysfs attributes structure
 
 RO	read only value
+WO	write only value
 RW	read/write value
 
 Thermal sysfs attributes will be represented under /sys/class/thermal.
@@ -286,6 +287,11 @@  if hwmon is compiled in or built as a module.
     |---type:			Type of the cooling device(processor/fan/...)
     |---max_state:		Maximum cooling state of the cooling device
     |---cur_state:		Current cooling state of the cooling device
+    |---stats:			Directory containing cooling device's statistics
+    |---stats/reset:		Writing any value resets the statistics
+    |---stats/time_in_state_ms:	Time (msec) spent in various cooling states
+    |---stats/total_trans:	Total number of times cooling state is changed
+    |---stats/trans_table:	Cooling state transition table
 
 
 Then next two dynamic attributes are created/removed in pairs. They represent
@@ -490,6 +496,31 @@  cur_state
 	- cur_state == max_state means the maximum cooling.
 	RW, Required
 
+stats/reset
+	Writing any value resets the cooling device's statistics.
+	WO, Required
+
+stats/time_in_state_ms:
+	The amount of time spent by the cooling device in various cooling
+	states. The output will have "<state> <time>" pair in each line, which
+	will mean this cooling device spent <time> msec of time at <state>.
+	Output will have one line for each of the supported states. Times are
+	reported in milliseconds.
+	RO, Required
+
+stats/total_trans:
+	A single positive value showing the total number of times the state of a
+	cooling device is changed.
+	RO, Required
+
+stats/trans_table:
+	This gives fine grained information about all the cooling state
+	transitions. The cat output here is a two dimensional matrix, where an
+	entry <i,j> (row i, column j) represents the number of transitions from
+	State_i to State_j. If the transition table is bigger than PAGE_SIZE,
+	reading this will return an -EFBIG error.
+	RO, Required
+
 3. A simple implementation
 
 ACPI thermal zone may support multiple trip points like critical, hot,
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index b6adc54b96f1..82979880f985 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -15,6 +15,13 @@  menuconfig THERMAL
 
 if THERMAL
 
+config THERMAL_STATISTICS
+	bool "Thermal state transition statistics"
+	help
+	  Export thermal state transition statistics information through sysfs.
+
+	  If in doubt, say N.
+
 config THERMAL_EMERGENCY_POWEROFF_DELAY_MS
 	int "Emergency poweroff delay in milli-seconds"
 	depends on THERMAL
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2b1b0ba393a4..d64325e078db 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -972,8 +972,8 @@  __thermal_cooling_device_register(struct device_node *np,
 	cdev->ops = ops;
 	cdev->updated = false;
 	cdev->device.class = &thermal_class;
-	thermal_cooling_device_setup_sysfs(cdev);
 	cdev->devdata = devdata;
+	thermal_cooling_device_setup_sysfs(cdev);
 	dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
 	result = device_register(&cdev->device);
 	if (result) {
@@ -1106,6 +1106,7 @@  void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 
 	ida_simple_remove(&thermal_cdev_ida, cdev->id);
 	device_unregister(&cdev->device);
+	thermal_cooling_device_destroy_sysfs(cdev);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
 
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 27e3b1df7360..5e4150261500 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -73,6 +73,7 @@  int thermal_build_list_of_policies(char *buf);
 int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
 void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
 void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
+void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev);
 /* used only at binding time */
 ssize_t
 thermal_cooling_device_trip_point_show(struct device *,
@@ -84,6 +85,15 @@  ssize_t thermal_cooling_device_weight_store(struct device *,
 					    struct device_attribute *,
 					    const char *, size_t);
 
+#ifdef CONFIG_THERMAL_STATISTICS
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+					 unsigned long new_state);
+#else
+static inline void
+thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+				    unsigned long new_state) {}
+#endif /* CONFIG_THERMAL_STATISTICS */
+
 #ifdef CONFIG_THERMAL_GOV_STEP_WISE
 int thermal_gov_step_wise_register(void);
 void thermal_gov_step_wise_unregister(void);
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index 8cdf75adcce1..eb03d7e099bb 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -187,7 +187,10 @@  void thermal_cdev_update(struct thermal_cooling_device *cdev)
 		if (instance->target > target)
 			target = instance->target;
 	}
-	cdev->ops->set_cur_state(cdev, target);
+
+	if (!cdev->ops->set_cur_state(cdev, target))
+		thermal_cooling_device_stats_update(cdev, target);
+
 	cdev->updated = true;
 	mutex_unlock(&cdev->lock);
 	trace_cdev_update(cdev, target);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index ba81c9080f6e..23b5e0a709b0 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -20,6 +20,7 @@ 
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/jiffies.h>
 
 #include "thermal_core.h"
 
@@ -721,6 +722,7 @@  thermal_cooling_device_cur_state_store(struct device *dev,
 	result = cdev->ops->set_cur_state(cdev, state);
 	if (result)
 		return result;
+	thermal_cooling_device_stats_update(cdev, state);
 	return count;
 }
 
@@ -745,14 +747,237 @@  static const struct attribute_group cooling_device_attr_group = {
 
 static const struct attribute_group *cooling_device_attr_groups[] = {
 	&cooling_device_attr_group,
+	NULL, /* Space allocated for cooling_device_stats_attr_group */
 	NULL,
 };
 
+#ifdef CONFIG_THERMAL_STATISTICS
+struct cooling_dev_stats {
+	spinlock_t lock;
+	unsigned int total_trans;
+	unsigned long state;
+	unsigned long max_states;
+	ktime_t last_time;
+	ktime_t *time_in_state;
+	unsigned int *trans_table;
+};
+
+static void update_time_in_state(struct cooling_dev_stats *stats)
+{
+	ktime_t now = ktime_get(), delta;
+
+	delta = ktime_sub(now, stats->last_time);
+	stats->time_in_state[stats->state] =
+		ktime_add(stats->time_in_state[stats->state], delta);
+	stats->last_time = now;
+}
+
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+					 unsigned long new_state)
+{
+	struct cooling_dev_stats *stats = cdev->stats;
+
+	spin_lock(&stats->lock);
+
+	if (stats->state == new_state)
+		goto unlock;
+
+	update_time_in_state(stats);
+	stats->trans_table[stats->state * stats->max_states + new_state]++;
+	stats->state = new_state;
+	stats->total_trans++;
+
+unlock:
+	spin_unlock(&stats->lock);
+}
+
+static ssize_t
+thermal_cooling_device_total_trans_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	int ret;
+
+	spin_lock(&stats->lock);
+	ret = sprintf(buf, "%u\n", stats->total_trans);
+	spin_unlock(&stats->lock);
+
+	return ret;
+}
+
+static ssize_t
+thermal_cooling_device_time_in_state_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	ssize_t len = 0;
+	int i;
+
+	spin_lock(&stats->lock);
+	update_time_in_state(stats);
+
+	for (i = 0; i < stats->max_states; i++) {
+		len += sprintf(buf + len, "state%u\t%llu\n", i,
+			       ktime_to_ms(stats->time_in_state[i]));
+	}
+	spin_unlock(&stats->lock);
+
+	return len;
+}
+
+static ssize_t
+thermal_cooling_device_reset_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	int i, states = stats->max_states;
+
+	spin_lock(&stats->lock);
+
+	stats->total_trans = 0;
+	stats->last_time = ktime_get();
+	memset(stats->trans_table, 0,
+	       states * states * sizeof(*stats->trans_table));
+
+	for (i = 0; i < stats->max_states; i++)
+		stats->time_in_state[i] = ktime_set(0, 0);
+
+	spin_unlock(&stats->lock);
+
+	return count;
+}
+
+static ssize_t
+thermal_cooling_device_trans_table_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	ssize_t len = 0;
+	int i, j;
+
+	len += snprintf(buf + len, PAGE_SIZE - len, " From  :    To\n");
+	len += snprintf(buf + len, PAGE_SIZE - len, "       : ");
+	for (i = 0; i < stats->max_states; i++) {
+		if (len >= PAGE_SIZE)
+			break;
+		len += snprintf(buf + len, PAGE_SIZE - len, "state%2u  ", i);
+	}
+	if (len >= PAGE_SIZE)
+		return PAGE_SIZE;
+
+	len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+
+	for (i = 0; i < stats->max_states; i++) {
+		if (len >= PAGE_SIZE)
+			break;
+
+		len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i);
+
+		for (j = 0; j < stats->max_states; j++) {
+			if (len >= PAGE_SIZE)
+				break;
+			len += snprintf(buf + len, PAGE_SIZE - len, "%8u ",
+				stats->trans_table[i * stats->max_states + j]);
+		}
+		if (len >= PAGE_SIZE)
+			break;
+		len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+	}
+
+	if (len >= PAGE_SIZE) {
+		pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n");
+		return -EFBIG;
+	}
+	return len;
+}
+
+static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
+		   NULL);
+static DEVICE_ATTR(time_in_state_ms, 0444,
+		   thermal_cooling_device_time_in_state_show, NULL);
+static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
+static DEVICE_ATTR(trans_table, 0444,
+		   thermal_cooling_device_trans_table_show, NULL);
+
+static struct attribute *cooling_device_stats_attrs[] = {
+	&dev_attr_total_trans.attr,
+	&dev_attr_time_in_state_ms.attr,
+	&dev_attr_reset.attr,
+	&dev_attr_trans_table.attr,
+	NULL
+};
+
+static const struct attribute_group cooling_device_stats_attr_group = {
+	.attrs = cooling_device_stats_attrs,
+	.name = "stats"
+};
+
+static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
+{
+	struct cooling_dev_stats *stats;
+	unsigned long states;
+	int var;
+
+	if (cdev->ops->get_max_state(cdev, &states))
+		return;
+
+	states++; /* Total number of states is highest state + 1 */
+
+	var = sizeof(*stats);
+	var += sizeof(*stats->time_in_state) * states;
+	var += sizeof(*stats->trans_table) * states * states;
+
+	stats = kzalloc(var, GFP_KERNEL);
+	if (!stats)
+		return;
+
+	stats->time_in_state = (ktime_t *)(stats + 1);
+	stats->trans_table = (unsigned int *)(stats->time_in_state + states);
+	cdev->stats = stats;
+	stats->last_time = ktime_get();
+	stats->max_states = states;
+
+	spin_lock_init(&stats->lock);
+
+	/* Fill the empty slot left in cooling_device_attr_groups */
+	var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
+	cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+}
+
+static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
+{
+	kfree(cdev->stats);
+	cdev->stats = NULL;
+}
+
+#else
+
+static inline void
+cooling_device_stats_setup(struct thermal_cooling_device *cdev) {}
+static inline void
+cooling_device_stats_destroy(struct thermal_cooling_device *cdev) {}
+
+#endif /* CONFIG_THERMAL_STATISTICS */
+
 void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
 {
+	cooling_device_stats_setup(cdev);
 	cdev->device.groups = cooling_device_attr_groups;
 }
 
+void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev)
+{
+	cooling_device_stats_destroy(cdev);
+}
+
 /* these helper will be used only at the time of bindig */
 ssize_t
 thermal_cooling_device_trip_point_show(struct device *dev,
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 8c5302374eaa..7834be668d80 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -148,6 +148,7 @@  struct thermal_cooling_device {
 	struct device device;
 	struct device_node *np;
 	void *devdata;
+	void *stats;
 	const struct thermal_cooling_device_ops *ops;
 	bool updated; /* true if the cooling device does not need update */
 	struct mutex lock; /* protect thermal_instances list */