
thermal: devfreq_cooling: use local ops instead of global ops

Message ID 20220325094436.101419-1-kant@allwinnertech.com
State Accepted
Commit b947769b8f778db130aad834257fcaca25df2edc

Commit Message

Kant Fan March 25, 2022, 9:44 a.m. UTC
commit 7b62935828266658714f81d4e9176edad808dc70 upstream.

Fix an illegal address access that occurs under the following condition:
There are multiple devfreq cooling devices in the system, and some of
them register with dfc_power while others do not. Power model ops such
as state2power are added to the global devfreq_cooling_ops when a
cooling device with dfc_power registers. A cooling device without
dfc_power that registers later through
of_devfreq_cooling_register_power() or of_devfreq_cooling_register()
then uses the same devfreq_cooling_ops, including the added power model
ops.

The IPA governor regards the cooling devices without dfc_power as power
actors because they also have power model ops, and it accesses an
illegal address through dfc->power_ops when it executes
cdev->ops->get_requested_power or cdev->ops->power2state, as the call
trace below shows:

Unable to handle kernel NULL pointer dereference at virtual address
00000008
...
calltrace:
[<c06e5488>] devfreq_cooling_power2state+0x24/0x184
[<c06df420>] power_actor_set_power+0x54/0xa8
[<c06e3774>] power_allocator_throttle+0x770/0x97c
[<c06dd120>] handle_thermal_trip+0x1b4/0x26c
[<c06ddb48>] thermal_zone_device_update+0x154/0x208
[<c014159c>] process_one_work+0x1ec/0x36c
[<c0141c58>] worker_thread+0x204/0x2ec
[<c0146788>] kthread+0x140/0x154
[<c01010e8>] ret_from_fork+0x14/0x2c

Fixes: a76caf55e5b35 ("thermal: Add devfreq cooling")
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: Kant Fan <kant@allwinnertech.com>
---
 drivers/thermal/devfreq_cooling.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)
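
The failure mode described in the commit message can be illustrated with
a minimal user-space sketch. It is not the kernel code: struct
cooling_ops, template_ops, register_shared() and register_private() are
hypothetical stand-ins for struct thermal_cooling_device_ops, the global
devfreq_cooling_ops and the registration path, and malloc() plus
memcpy() stands in for kmemdup(). It only shows why patching a shared
ops table leaks the power model callbacks to devices that have no power
model, and why a per-device copy avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct thermal_cooling_device_ops. */
struct cooling_ops {
	int (*get_max_state)(void);
	/* A non-NULL power2state marks the device as a power actor. */
	int (*power2state)(void);
};

static int demo_get_max_state(void) { return 3; }
static int demo_power2state(void) { return 1; }

/* Shared template, analogous to the global devfreq_cooling_ops. */
static struct cooling_ops template_ops = {
	.get_max_state = demo_get_max_state,
};

/* Old behaviour: the shared template itself is patched and handed out. */
static struct cooling_ops *register_shared(int has_power_model)
{
	if (has_power_model)
		template_ops.power2state = demo_power2state;
	return &template_ops;
}

/* New behaviour: every device receives its own copy of the template. */
static struct cooling_ops *register_private(int has_power_model)
{
	struct cooling_ops *ops = malloc(sizeof(*ops));

	if (!ops)
		return NULL;
	memcpy(ops, &template_ops, sizeof(*ops));
	if (has_power_model)
		ops->power2state = demo_power2state;
	return ops;	/* the caller frees this when the device goes away */
}
```

With register_shared(), a device registered without a power model after
one registered with a power model still sees a non-NULL power2state,
which is the condition under which the governor treats it as a power
actor. With register_private(), the second device keeps a NULL
power2state regardless of registration order, at the cost of one small
allocation per device that must be freed on unregister.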

Comments

Lukasz Luba April 20, 2022, 10:32 a.m. UTC | #1
Hi Kant,

On 4/19/22 16:49, Kant Fan wrote:
> On 29/03/2022 14:59, Lukasz Luba wrote:
>>
>>
>> On 3/25/22 09:44, Kant Fan wrote:
>>> commit 7b62935828266658714f81d4e9176edad808dc70 upstream.
>>>
>>> Fix access illegal address problem in following condition:
>>> There are muti devfreq cooling devices in system, some of them register
>>> with dfc_power but other does not, power model ops such as 
>>> state2power will
>>> append to global devfreq_cooling_ops when the cooling device with
>>> dfc_power register. It makes the cooling device without dfc_power
>>> also use devfreq_cooling_ops after appending when register later by
>>> of_devfreq_cooling_register_power() or of_devfreq_cooling_register().
>>>
>>> IPA governor regards the cooling devices without dfc_power as a power 
>>> actor
>>> because they also have power model ops, and will access illegal 
>>> address at
>>> dfc->power_ops when execute cdev->ops->get_requested_power or
>>> cdev->ops->power2state. As the calltrace below shows:
>>>
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 00000008
>>> ...
>>> calltrace:
>>> [<c06e5488>] devfreq_cooling_power2state+0x24/0x184
>>> [<c06df420>] power_actor_set_power+0x54/0xa8
>>> [<c06e3774>] power_allocator_throttle+0x770/0x97c
>>> [<c06dd120>] handle_thermal_trip+0x1b4/0x26c
>>> [<c06ddb48>] thermal_zone_device_update+0x154/0x208
>>> [<c014159c>] process_one_work+0x1ec/0x36c
>>> [<c0141c58>] worker_thread+0x204/0x2ec
>>> [<c0146788>] kthread+0x140/0x154
>>> [<c01010e8>] ret_from_fork+0x14/0x2c
>>>
>>> Fixes: a76caf55e5b35 ("thermal: Add devfreq cooling")
>>> Cc: stable@vger.kernel.org # 4.4+
>>> Signed-off-by: Kant Fan <kant@allwinnertech.com>
>>> ---
>>>   drivers/thermal/devfreq_cooling.c | 25 ++++++++++++++++++-------
>>>   1 file changed, 18 insertions(+), 7 deletions(-)
>>>
>>
>> Looks good. So this patch should be applied for all stable
>> kernels starting from v4.4 to v5.12 (the v5.13 and later need
>> other patch).
>>
>> Next time you might use in the subject something like:
>> [PATCH 4.4] thermal: devfreq_cooling: use local ops instead of global ops
>> It would be better distinguished from your other patch with the
>> same subject, which was for mainline and v5.13+
> 
> Hi Lukasz,
> Thank you for the guidance. I want to know if I'm understanding you in a 
> right way. Could you confirm the following information?
> 
> 1. The stable patches
> After the patch is merged into mainline later, I'll submit the following 
> patches individually for v4.4 ~ v5.12:

Correct, after it gets into mainline you can point to that commit hash
and proceed with those patches. I don't know which of those older stable
kernels are still maintained, since some of them have longer support
and the rest had shorter support that might have already ended. You can
check the end of life for those 'Longterm' kernels here [1]. AFAICS 4.4
is not in that table, so you can start from 4.9, which should be OK.
So the list of needed patches would be for these stable kernels:
4.9, 4.14, 4.19, 5.4, 5.10
I can see that the last release for 5.11.x was in May 2021, so it has
probably ended; similarly for 5.12.x (Jul 2021). That's why I suggested
that list of the long-support kernels.

> 
> [PATCH 4.4] thermal: devfreq_cooling: use local ops instead of global ops
> [PATCH 4.5] thermal: devfreq_cooling: use local ops instead of global ops
> ...
> [PATCH 5.12] thermal: devfreq_cooling: use local ops instead of global ops
> 
> And also the following patches individually for v5.13+ :

For this, you probably don't have to. You have added 'v5.13+' in the
original patch v2, so it will be picked up correctly. It should apply
on those stable kernels w/o issues. If there are any, the stable kernel
engineers will ping us.

> [PATCH 5.13] thermal: devfreq_cooling: use local ops instead of global ops
> [PATCH 5.14] thermal: devfreq_cooling: use local ops instead of global ops
> ...
> [PATCH 5.17] thermal: devfreq_cooling: use local ops instead of global ops
> 
> 2. The mainline patch
> I saw your mail with Rafael, seems there are conflicts... I wonder if 
> there's anything wrong with my patch, or anything I can help?
> 

Thank you for offering help. Rafael solved that correctly, so it doesn't
need any more work.

Thank you for doing that work!

Regards,
Lukasz

[1] https://www.kernel.org/category/releases.html
Kant Fan April 23, 2022, 10:49 a.m. UTC | #2
On 20/04/2022 18:32, Lukasz Luba wrote:
> Hi Kant,
> 
> On 4/19/22 16:49, Kant Fan wrote:
>> On 29/03/2022 14:59, Lukasz Luba wrote:
>>>
>>>
>>> On 3/25/22 09:44, Kant Fan wrote:
>>>> commit 7b62935828266658714f81d4e9176edad808dc70 upstream.
>>>>
>>>> Fix access illegal address problem in following condition:
>>>> There are muti devfreq cooling devices in system, some of them register
>>>> with dfc_power but other does not, power model ops such as 
>>>> state2power will
>>>> append to global devfreq_cooling_ops when the cooling device with
>>>> dfc_power register. It makes the cooling device without dfc_power
>>>> also use devfreq_cooling_ops after appending when register later by
>>>> of_devfreq_cooling_register_power() or of_devfreq_cooling_register().
>>>>
>>>> IPA governor regards the cooling devices without dfc_power as a 
>>>> power actor
>>>> because they also have power model ops, and will access illegal 
>>>> address at
>>>> dfc->power_ops when execute cdev->ops->get_requested_power or
>>>> cdev->ops->power2state. As the calltrace below shows:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 00000008
>>>> ...
>>>> calltrace:
>>>> [<c06e5488>] devfreq_cooling_power2state+0x24/0x184
>>>> [<c06df420>] power_actor_set_power+0x54/0xa8
>>>> [<c06e3774>] power_allocator_throttle+0x770/0x97c
>>>> [<c06dd120>] handle_thermal_trip+0x1b4/0x26c
>>>> [<c06ddb48>] thermal_zone_device_update+0x154/0x208
>>>> [<c014159c>] process_one_work+0x1ec/0x36c
>>>> [<c0141c58>] worker_thread+0x204/0x2ec
>>>> [<c0146788>] kthread+0x140/0x154
>>>> [<c01010e8>] ret_from_fork+0x14/0x2c
>>>>
>>>> Fixes: a76caf55e5b35 ("thermal: Add devfreq cooling")
>>>> Cc: stable@vger.kernel.org # 4.4+
>>>> Signed-off-by: Kant Fan <kant@allwinnertech.com>
>>>> ---
>>>>   drivers/thermal/devfreq_cooling.c | 25 ++++++++++++++++++-------
>>>>   1 file changed, 18 insertions(+), 7 deletions(-)
>>>>
>>>
>>> Looks good. So this patch should be applied for all stable
>>> kernels starting from v4.4 to v5.12 (the v5.13 and later need
>>> other patch).
>>>
>>> Next time you might use in the subject something like:
>>> [PATCH 4.4] thermal: devfreq_cooling: use local ops instead of global 
>>> ops
>>> It would be better distinguished from your other patch with the
>>> same subject, which was for mainline and v5.13+
>>
>> Hi Lukasz,
>> Thank you for the guidance. I want to know if I'm understanding you in 
>> a right way. Could you confirm the following information?
>>
>> 1. The stable patches
>> After the patch is merged into mainline later, I'll submit the 
>> following patches individually for v4.4 ~ v5.12:
> 
> Correct, after it gets mainline you can point to that commit hash and
> process with those patches. I don't now which of those older stable
> kernels are still maintained, since some of them have longer support
> and the rest had shorter and might already ended. You can check the
> end of life for those 'Longterm' here [1]. AFAICS the 4.4 is not in that
> table, so you can start from 4.9, should be OK.
> So the list of needed patches would be for those stable kernels:
> 4.9, 4.14, 4.19, 5.4, 5.10
> I can see that last release for 5.11.x was in May 2021, so it's probably
> ended, similar for 5.12.x (Jul 2021). That's why I suggested that list
> for the long support kernels.
> 

Hi Lukasz,
Thanks for figuring it out. I'll check the stable versions carefully.

>>
>> [PATCH 4.4] thermal: devfreq_cooling: use local ops instead of global ops
>> [PATCH 4.5] thermal: devfreq_cooling: use local ops instead of global ops
>> ...
>> [PATCH 5.12] thermal: devfreq_cooling: use local ops instead of global 
>> ops
>>
>> And also the following patches individually for v5.13+ :
> 
> For this, you probably don't have to. You have added 'v5.13+' in the
> original patch v2, so it will be picked correctly. It should apply
> on those stable kernels w/o issues. If there will be, stable kernel
> engineers will ping us.
> 
>> [PATCH 5.13] thermal: devfreq_cooling: use local ops instead of global 
>> ops
>> [PATCH 5.14] thermal: devfreq_cooling: use local ops instead of global 
>> ops
>> ...
>> [PATCH 5.17] thermal: devfreq_cooling: use local ops instead of global 
>> ops
>>
>> 2. The mainline patch
>> I saw your mail with Rafael, seems there are conflicts... I wonder if 
>> there's anything wrong with my patch, or anything I can help?
>>
> 
> Thank you for offering help. Rafael solved that correctly, so it doesn't
> need any more work.
> 
> Thank you for doing that work!
> 
> Regards,
> Lukasz
> 
> [1] https://www.kernel.org/category/releases.html

No problem. I'll submit the stable patches after the mainline patch is 
merged.

Patch

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index dfab49a67252..d36f70513e6a 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -462,22 +462,29 @@  of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 {
 	struct thermal_cooling_device *cdev;
 	struct devfreq_cooling_device *dfc;
+	struct thermal_cooling_device_ops *ops;
 	char dev_name[THERMAL_NAME_LENGTH];
 	int err;
 
-	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL);
-	if (!dfc)
+	ops = kmemdup(&devfreq_cooling_ops, sizeof(*ops), GFP_KERNEL);
+	if (!ops)
 		return ERR_PTR(-ENOMEM);
 
+	dfc = kzalloc(sizeof(*dfc), GFP_KERNEL);
+	if (!dfc) {
+		err = -ENOMEM;
+		goto free_ops;
+	}
+
 	dfc->devfreq = df;
 
 	if (dfc_power) {
 		dfc->power_ops = dfc_power;
 
-		devfreq_cooling_ops.get_requested_power =
+		ops->get_requested_power =
 			devfreq_cooling_get_requested_power;
-		devfreq_cooling_ops.state2power = devfreq_cooling_state2power;
-		devfreq_cooling_ops.power2state = devfreq_cooling_power2state;
+		ops->state2power = devfreq_cooling_state2power;
+		ops->power2state = devfreq_cooling_power2state;
 	}
 
 	err = devfreq_cooling_gen_tables(dfc);
@@ -497,8 +504,7 @@  of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 
 	snprintf(dev_name, sizeof(dev_name), "thermal-devfreq-%d", dfc->id);
 
-	cdev = thermal_of_cooling_device_register(np, dev_name, dfc,
-						  &devfreq_cooling_ops);
+	cdev = thermal_of_cooling_device_register(np, dev_name, dfc, ops);
 	if (IS_ERR(cdev)) {
 		err = PTR_ERR(cdev);
 		dev_err(df->dev.parent,
@@ -522,6 +528,8 @@  of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df,
 	kfree(dfc->freq_table);
 free_dfc:
 	kfree(dfc);
+free_ops:
+	kfree(ops);
 
 	return ERR_PTR(err);
 }
@@ -557,10 +565,12 @@  EXPORT_SYMBOL_GPL(devfreq_cooling_register);
 void devfreq_cooling_unregister(struct thermal_cooling_device *cdev)
 {
 	struct devfreq_cooling_device *dfc;
+	const struct thermal_cooling_device_ops *ops;
 
 	if (!cdev)
 		return;
 
+	ops = cdev->ops;
 	dfc = cdev->devdata;
 
 	thermal_cooling_device_unregister(dfc->cdev);
@@ -570,5 +580,6 @@  void devfreq_cooling_unregister(struct thermal_cooling_device *cdev)
 	kfree(dfc->freq_table);
 
 	kfree(dfc);
+	kfree(ops);
 }
 EXPORT_SYMBOL_GPL(devfreq_cooling_unregister);