diff mbox

[1/5] Thermal: do bind operation after thermal zone or cooling device register returns.

Message ID 1350387889-15324-2-git-send-email-hongbo.zhang@linaro.com
State New
Headers show

Commit Message

Hongbo Zhang Oct. 16, 2012, 11:44 a.m. UTC
From: "hongbo.zhang" <hongbo.zhang@linaro.com>

In the previous bind function, cdev->get_max_state(cdev, &max_state) is called
before the registration function finishes, but at this moment, the parameter
cdev at thermal driver layer isn't ready--it will get ready only after its
registration, so the the get_max_state callback cannot tell the max_state
according to the cdev input.
This problem can be fixed by separating the bind operation out of registration
and doing it when registration completely finished.

Signed-off-by: hongbo.zhang <hongbo.zhang@linaro.com>
---
 drivers/thermal/thermal_sys.c | 86 +++++++++++++++++++++++++++++--------------
 1 file changed, 58 insertions(+), 28 deletions(-)

Comments

Francesco Lavra Oct. 21, 2012, 10:05 a.m. UTC | #1
Hi,

On 10/16/2012 01:44 PM, hongbo.zhang wrote:
> From: "hongbo.zhang" <hongbo.zhang@linaro.com>
> 
> In the previous bind function, cdev->get_max_state(cdev, &max_state) is called
> before the registration function finishes, but at this moment, the parameter
> cdev at thermal driver layer isn't ready--it will get ready only after its
> registration, so the the get_max_state callback cannot tell the max_state
> according to the cdev input.
> This problem can be fixed by separating the bind operation out of registration
> and doing it when registration completely finished.

When thermal_cooling_device_register() is called, the thermal framework
assumes the cooling device is "ready", i.e. all of its ops callbacks
return meaningful results. If the cooling device is not ready at this
point, then this is a bug in the code that registers it.
Specifically, the faulty code in your case is in the cpufreq cooling
implementation, where the cooling device is registered before being
added to the internal list of cpufreq cooling devices. So, IMHO the fix
is needed there.

--
Francesco
Hongbo Zhang Oct. 23, 2012, 8:23 a.m. UTC | #2
Hi Francesco,
I found out more points about this issue.

[1]
cdev should be ready when get_max_state callback be called, otherwise
parameter cdev is useless, imagine there may be cases that
get_max_state call back is shared by more than one cooling devices of
same kind, like this:
common_get_max_state(*cdev, *state)
{
    if (cdev == cdev1)  *state = 3;
    else if (cdev == cdev)  *state = 5;
    else
}

[2]
In my cpufreq cooling case(in fact cdev is not used to calculate
max_state), the cooling_cpufreq_list should be ready when
get_max_state call back is called. In this patch I defer the binding
when registration finished, but this is not perfect now, see this
routine in cpufreq_cooling_register:

thermal_cooling_device_register;
at this time: thermal_bind_work -> get_max_state -> get NULL
cooling_cpufreq_list
and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list)
This is due to we cannot know exactly when the bind work is executed.
(and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock)
aheadof thermal_cooling_device_register and other corresponding
modifications, but I found another way as below)

[3]
Root cause of this problem is calling get_max_state in register -> bind routine.
Better solution is to add another parameter in cooling device register
function, also add a max_state member in struct cdev, like:
thermal_cooling_device_register(char *type, void *devdata, const
struct thermal_cooling_device_ops *ops, unsigned long max_state)
and then in the bind function:
if(cdev->max_state)
    max_state = cdev->max_state;
else
    cdev->get_max_state(cdev, &max_state)

It is common sense that the cooling driver should know its cooling
max_state(ability) before registration, and it can be offered when
register.
I think this way doesn't change both thermal and cooling layer much,
it is more clear.
I will update this patch soon.


On 21 October 2012 18:05, Francesco Lavra <francescolavra.fl@gmail.com> wrote:
> Hi,
>
> On 10/16/2012 01:44 PM, hongbo.zhang wrote:
>> From: "hongbo.zhang" <hongbo.zhang@linaro.com>
>>
>> In the previous bind function, cdev->get_max_state(cdev, &max_state) is called
>> before the registration function finishes, but at this moment, the parameter
>> cdev at thermal driver layer isn't ready--it will get ready only after its
>> registration, so the the get_max_state callback cannot tell the max_state
>> according to the cdev input.
>> This problem can be fixed by separating the bind operation out of registration
>> and doing it when registration completely finished.
>
> When thermal_cooling_device_register() is called, the thermal framework
> assumes the cooling device is "ready", i.e. all of its ops callbacks
> return meaningful results. If the cooling device is not ready at this
> point, then this is a bug in the code that registers it.
> Specifically, the faulty code in your case is in the cpufreq cooling
> implementation, where the cooling device is registered before being
> added to the internal list of cpufreq cooling devices. So, IMHO the fix
> is needed there.
>
> --
> Francesco
Francesco Lavra Oct. 23, 2012, 10:13 p.m. UTC | #3
Hi,
On 10/23/2012 10:23 AM, Hongbo Zhang wrote:
> Hi Francesco,
> I found out more points about this issue.
> 
> [1]
> cdev should be ready when get_max_state callback be called, otherwise
> parameter cdev is useless, imagine there may be cases that
> get_max_state call back is shared by more than one cooling devices of
> same kind, like this:
> common_get_max_state(*cdev, *state)
> {
>     if (cdev == cdev1)  *state = 3;
>     else if (cdev == cdev)  *state = 5;
>     else
> }
> 
> [2]
> In my cpufreq cooling case(in fact cdev is not used to calculate
> max_state), the cooling_cpufreq_list should be ready when
> get_max_state call back is called. In this patch I defer the binding
> when registration finished, but this is not perfect now, see this
> routine in cpufreq_cooling_register:
> 
> thermal_cooling_device_register;
> at this time: thermal_bind_work -> get_max_state -> get NULL
> cooling_cpufreq_list
> and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list)
> This is due to we cannot know exactly when the bind work is executed.
> (and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock)
> aheadof thermal_cooling_device_register and other corresponding
> modifications, but I found another way as below)
> 
> [3]
> Root cause of this problem is calling get_max_state in register -> bind routine.
> Better solution is to add another parameter in cooling device register
> function, also add a max_state member in struct cdev, like:
> thermal_cooling_device_register(char *type, void *devdata, const
> struct thermal_cooling_device_ops *ops, unsigned long max_state)
> and then in the bind function:
> if(cdev->max_state)
>     max_state = cdev->max_state;
> else
>     cdev->get_max_state(cdev, &max_state)
> 
> It is common sense that the cooling driver should know its cooling
> max_state(ability) before registration, and it can be offered when
> register.
> I think this way doesn't change both thermal and cooling layer much,
> it is more clear.
> I will update this patch soon.

I still believe the thermal layer doesn't need any change to work around
this problem, and I still believe that when a cooling device is being
registered, all of its ops should be fully functional.
The problem with the cpufreq cooling device driver is that its callbacks
use the internal list of devices to retrieve the struct
cpufreq_cooling_device instance corresponding to a given struct
thermal_cooling_device. This is not necessary, because the struct
thermal_cooling_device has a private data pointer (devdata) which in
this case is exactly a reference to the struct cpufreq_cooling_device
instance the callbacks are looking for. In fact, I think the
cooling_cpufreq_list is not necessary at all, and should be removed from
the cpufreq cooling driver.
So the 3 callbacks, instead of iterating through the device list, should
have something like:
struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
That would do the trick.

--
Francesco
Hongbo Zhang Oct. 24, 2012, 2:37 a.m. UTC | #4
On 24 October 2012 06:13, Francesco Lavra <francescolavra.fl@gmail.com> wrote:
> Hi,
> On 10/23/2012 10:23 AM, Hongbo Zhang wrote:
>> Hi Francesco,
>> I found out more points about this issue.
>>
>> [1]
>> cdev should be ready when get_max_state callback be called, otherwise
>> parameter cdev is useless, imagine there may be cases that
>> get_max_state call back is shared by more than one cooling devices of
>> same kind, like this:
>> common_get_max_state(*cdev, *state)
>> {
>>     if (cdev == cdev1)  *state = 3;
>>     else if (cdev == cdev)  *state = 5;
>>     else
>> }
>>
>> [2]
>> In my cpufreq cooling case(in fact cdev is not used to calculate
>> max_state), the cooling_cpufreq_list should be ready when
>> get_max_state call back is called. In this patch I defer the binding
>> when registration finished, but this is not perfect now, see this
>> routine in cpufreq_cooling_register:
>>
>> thermal_cooling_device_register;
>> at this time: thermal_bind_work -> get_max_state -> get NULL
>> cooling_cpufreq_list
>> and then: list_add_tail(&cpufreq_dev->node, &cooling_cpufreq_list)
>> This is due to we cannot know exactly when the bind work is executed.
>> (and this can be fixed by moving mutex_lock(&cooling_cpufreq_lock)
>> aheadof thermal_cooling_device_register and other corresponding
>> modifications, but I found another way as below)
>>
>> [3]
>> Root cause of this problem is calling get_max_state in register -> bind routine.
>> Better solution is to add another parameter in cooling device register
>> function, also add a max_state member in struct cdev, like:
>> thermal_cooling_device_register(char *type, void *devdata, const
>> struct thermal_cooling_device_ops *ops, unsigned long max_state)
>> and then in the bind function:
>> if(cdev->max_state)
>>     max_state = cdev->max_state;
>> else
>>     cdev->get_max_state(cdev, &max_state)
>>
>> It is common sense that the cooling driver should know its cooling
>> max_state(ability) before registration, and it can be offered when
>> register.
>> I think this way doesn't change both thermal and cooling layer much,
>> it is more clear.
>> I will update this patch soon.
>
> I still believe the thermal layer doesn't need any change to work around
> this problem, and I still believe that when a cooling device is being
> registered, all of its ops should be fully functional.
> The problem with the cpufreq cooling device driver is that its callbacks
> use the internal list of devices to retrieve the struct
> cpufreq_cooling_device instance corresponding to a given struct
> thermal_cooling_device. This is not necessary, because the struct
> thermal_cooling_device has a private data pointer (devdata) which in
> this case is exactly a reference to the struct cpufreq_cooling_device
> instance the callbacks are looking for. In fact, I think the
> cooling_cpufreq_list is not necessary at all, and should be removed from
> the cpufreq cooling driver.
> So the 3 callbacks, instead of iterating through the device list, should
> have something like:
> struct cpufreq_cooling_device *cpufreq_device = cdev->devdata;
> That would do the trick.

Hi Francesco,
When I found out this issue, I was hesitating to select the best
solution among several ideas.
It is clear now after talk with you, I will send patch for cpufreq
cooling layer.
Thank you.
>
> --
> Francesco
diff mbox

Patch

diff --git a/drivers/thermal/thermal_sys.c b/drivers/thermal/thermal_sys.c
index 9ee42ca..dd3d024 100644
--- a/drivers/thermal/thermal_sys.c
+++ b/drivers/thermal/thermal_sys.c
@@ -70,6 +70,8 @@  static LIST_HEAD(thermal_tz_list);
 static LIST_HEAD(thermal_cdev_list);
 static DEFINE_MUTEX(thermal_list_lock);
 
+static struct work_struct thermal_bind;
+
 static int get_idr(struct idr *idr, struct mutex *lock, int *id)
 {
 	int err;
@@ -777,7 +779,7 @@  int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 	dev->lower = lower;
 	dev->target = THERMAL_NO_TARGET;
 
-	result = get_idr(&tz->idr, &tz->lock, &dev->id);
+	result = get_idr(&tz->idr, NULL, &dev->id);
 	if (result)
 		goto free_mem;
 
@@ -796,7 +798,7 @@  int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 	if (result)
 		goto remove_symbol_link;
 
-	mutex_lock(&tz->lock);
+	/* tz->lock should have been locked outside this function */
 	mutex_lock(&cdev->lock);
 	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
 	    if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
@@ -808,7 +810,6 @@  int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 		list_add_tail(&dev->cdev_node, &cdev->thermal_instances);
 	}
 	mutex_unlock(&cdev->lock);
-	mutex_unlock(&tz->lock);
 
 	if (!result)
 		return 0;
@@ -895,7 +896,6 @@  thermal_cooling_device_register(char *type, void *devdata,
 				const struct thermal_cooling_device_ops *ops)
 {
 	struct thermal_cooling_device *cdev;
-	struct thermal_zone_device *pos;
 	int result;
 
 	if (type && strlen(type) >= THERMAL_NAME_LENGTH)
@@ -947,16 +947,10 @@  thermal_cooling_device_register(char *type, void *devdata,
 
 	mutex_lock(&thermal_list_lock);
 	list_add(&cdev->node, &thermal_cdev_list);
-	list_for_each_entry(pos, &thermal_tz_list, node) {
-		if (!pos->ops->bind)
-			continue;
-		result = pos->ops->bind(pos, cdev);
-		if (result)
-			break;
-
-	}
 	mutex_unlock(&thermal_list_lock);
 
+	schedule_work(&thermal_bind);
+
 	if (!result)
 		return cdev;
 
@@ -1141,19 +1135,13 @@  static void thermal_zone_trip_update(struct thermal_zone_device *tz,
 
 	return;
 }
-/**
- * thermal_zone_device_update - force an update of a thermal zone's state
- * @ttz:	the thermal zone to update
- */
 
-void thermal_zone_device_update(struct thermal_zone_device *tz)
+void __thermal_zone_device_update(struct thermal_zone_device *tz)
 {
 	int count, ret = 0;
 	long temp, trip_temp;
 	enum thermal_trip_type trip_type;
 
-	mutex_lock(&tz->lock);
-
 	if (tz->ops->get_temp(tz, &temp)) {
 		/* get_temp failed - retry it later */
 		pr_warn("failed to read out thermal zone %d\n", tz->id);
@@ -1206,10 +1194,56 @@  leave:
 		thermal_zone_device_set_polling(tz, tz->polling_delay);
 	else
 		thermal_zone_device_set_polling(tz, 0);
+}
+
+/**
+ * thermal_zone_device_update - force an update of a thermal zone's state
+ * @tz:	the thermal zone to update
+ */
+void thermal_zone_device_update(struct thermal_zone_device *tz)
+{
+	mutex_lock(&tz->lock);
+
+	__thermal_zone_device_update(tz);
+
 	mutex_unlock(&tz->lock);
 }
 EXPORT_SYMBOL(thermal_zone_device_update);
 
+static void thermal_zone_do_bind_work(struct work_struct *work)
+{
+	struct thermal_instance *instance;
+	struct thermal_zone_device *tz;
+	struct thermal_cooling_device *cdev;
+
+	mutex_lock(&thermal_list_lock);
+
+	list_for_each_entry(tz, &thermal_tz_list, node)
+		list_for_each_entry(cdev, &thermal_cdev_list, node) {
+
+			mutex_lock(&tz->lock);
+
+			if (list_empty(&tz->thermal_instances)
+				&& tz->ops->bind) {
+				tz->ops->bind(tz, cdev);
+				__thermal_zone_device_update(tz);
+				mutex_unlock(&tz->lock);
+				break;
+			}
+
+			list_for_each_entry(instance, &tz->thermal_instances,
+				tz_node)
+				if (instance->cdev != cdev && tz->ops->bind) {
+					tz->ops->bind(tz, cdev);
+					__thermal_zone_device_update(tz);
+				}
+
+			mutex_unlock(&tz->lock);
+		}
+
+	mutex_unlock(&thermal_list_lock);
+}
+
 /**
  * create_trip_attrs - create attributes for trip points
  * @tz:		the thermal zone device
@@ -1335,7 +1369,6 @@  struct thermal_zone_device *thermal_zone_device_register(const char *type,
 	int passive_delay, int polling_delay)
 {
 	struct thermal_zone_device *tz;
-	struct thermal_cooling_device *pos;
 	enum thermal_trip_type trip_type;
 	int result;
 	int count;
@@ -1419,17 +1452,12 @@  struct thermal_zone_device *thermal_zone_device_register(const char *type,
 
 	mutex_lock(&thermal_list_lock);
 	list_add_tail(&tz->node, &thermal_tz_list);
-	if (ops->bind)
-		list_for_each_entry(pos, &thermal_cdev_list, node) {
-		result = ops->bind(tz, pos);
-		if (result)
-			break;
-		}
 	mutex_unlock(&thermal_list_lock);
 
-	INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check);
+	if (ops->bind)
+		schedule_work(&thermal_bind);
 
-	thermal_zone_device_update(tz);
+	INIT_DELAYED_WORK(&(tz->poll_queue), thermal_zone_device_check);
 
 	if (!result)
 		return tz;
@@ -1588,6 +1616,7 @@  static int __init thermal_init(void)
 {
 	int result = 0;
 
+	INIT_WORK(&thermal_bind, thermal_zone_do_bind_work);
 	result = class_register(&thermal_class);
 	if (result) {
 		idr_destroy(&thermal_tz_idr);
@@ -1601,6 +1630,7 @@  static int __init thermal_init(void)
 
 static void __exit thermal_exit(void)
 {
+	cancel_work_sync(&thermal_bind);
 	class_unregister(&thermal_class);
 	idr_destroy(&thermal_tz_idr);
 	idr_destroy(&thermal_cdev_idr);