Message ID | 20240704-topic-sm8x50-upstream-fix-battmgr-temp-tz-warn-v1-1-9d66d6f6efde@linaro.org |
---|---|
State | New |
Series | power: supply: core: return -EAGAIN on uninitialized read temp |
On 05/07/2024 10:08, Daniel Lezcano wrote:
> On 05/07/2024 07:56, Krzysztof Kozlowski wrote:
>> On 04/07/2024 18:41, Daniel Lezcano wrote:
>>> On 04/07/2024 10:52, Neil Armstrong wrote:
>>>> If the thermal core tries to update the temperature from an
>>>> uninitialized power supply, it will spawn the following warning:
>>>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>>>
>>>> But reading from an uninitialized power supply should not be
>>>> considered as a fatal error, but the thermal core expects
>>>> the -EAGAIN error to be returned in this particular case.
>>>>
>>>> So convert -ENODEV as -EAGAIN to express the fact that reading
>>>> temperature from an uninitialized power supply shouldn't be
>>>> a fatal error, but should indicate to the thermal zone it should
>>>> retry later.
>>>>
>>>> It notably removes such messages on Qualcomm platforms using the
>>>> qcom_battmgr driver spawning warnings until the aDSP firmware
>>>> gets up and the battery manager reports valid data.
>>>
>>> Is it possible to have the aDSP firmware ready first ?
>>
>> I don't think so. ADSP firmware is a file, so as every firmware it can
>> be loaded from rootfs, not initramfs (unlike this driver), or even missing.
>
> Ok, said differently, can't we initialize the thermal zone after the firmware is loaded ?

This is the goal, but this can't be a fix but a proper rework.

> I think changing power_supply_core.c is not the right solution. qcom_battmgr_bat_get_property() should return -EAGAIN instead of -ENODEV.

Neil
On 15/07/2024 11:30, Neil Armstrong wrote:
> On 05/07/2024 10:08, Daniel Lezcano wrote:
>> On 05/07/2024 07:56, Krzysztof Kozlowski wrote:
>>> On 04/07/2024 18:41, Daniel Lezcano wrote:
>>>> On 04/07/2024 10:52, Neil Armstrong wrote:
>>>>> If the thermal core tries to update the temperature from an
>>>>> uninitialized power supply, it will spawn the following warning:
>>>>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>>>>
>>>>> But reading from an uninitialized power supply should not be
>>>>> considered as a fatal error, but the thermal core expects
>>>>> the -EAGAIN error to be returned in this particular case.
>>>>>
>>>>> So convert -ENODEV as -EAGAIN to express the fact that reading
>>>>> temperature from an uninitialized power supply shouldn't be
>>>>> a fatal error, but should indicate to the thermal zone it should
>>>>> retry later.
>>>>>
>>>>> It notably removes such messages on Qualcomm platforms using the
>>>>> qcom_battmgr driver spawning warnings until the aDSP firmware
>>>>> gets up and the battery manager reports valid data.
>>>>
>>>> Is it possible to have the aDSP firmware ready first ?
>>>
>>> I don't think so. ADSP firmware is a file, so as every firmware it can
>>> be loaded from rootfs, not initramfs (unlike this driver), or even
>>> missing.
>>
>> Ok, said differently, can't we initialize the thermal zone after the
>> firmware is loaded ?
>
> This is the goal, but this can't be a fix but a proper rework.

Right, it is a design issue and we are finding this problem in several
drivers using the thermal zone. Unfortunately, that forces the thermal
core to carry cumbersome mechanisms, and it obviously adds friction to
thermal core cleanups / rework.

IOW, bad driver design => thermal core impacted.

> I think changing power_supply_core.c is not the right solution.

From my POV, it is the right solution but I agree it could take a cycle
or more to fix.

> qcom_battmgr_bat_get_property() should return -EAGAIN instead of
> -ENODEV.

Yes, we can do that in the first place and come back to solve this
firmware / async issue in a more generic way later.
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index 8f6025acd10a..b38bff4dbfc7 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -1287,8 +1287,13 @@ static int power_supply_read_temp(struct thermal_zone_device *tzd,
 	WARN_ON(tzd == NULL);
 	psy = thermal_zone_device_priv(tzd);
 	ret = power_supply_get_property(psy, POWER_SUPPLY_PROP_TEMP, &val);
+	/*
+	 * The thermal core expects -EAGAIN as non-fatal error,
+	 * convert -ENODEV as -EAGAIN since -ENODEV is returned
+	 * when a power supply device isn't initialized
+	 */
 	if (ret)
-		return ret;
+		return ret == -ENODEV ? -EAGAIN : ret;
 
 	/* Convert tenths of degree Celsius to milli degree Celsius. */
 	*temp = val.intval * 100;
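The effect of the hunk above can be modelled in userspace as a rough sketch. Everything prefixed `fake_` is hypothetical and only mimics the shape of the kernel code. The caller `fake_zone_update()` assumes, as the commit message states, that the thermal core treats -EAGAIN as non-fatal and stays quiet for it; it is not a copy of the thermal core.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical backend: fails with -ENODEV until the supply is initialized. */
struct fake_supply {
	int initialized;
	int temp_tenths; /* tenths of a degree Celsius */
};

int fake_supply_get_temp(const struct fake_supply *psy, int *tenths)
{
	if (!psy->initialized)
		return -ENODEV;
	*tenths = psy->temp_tenths;
	return 0;
}

/* Mirrors the patched read path: -ENODEV is reported as -EAGAIN. */
int fake_read_temp(const struct fake_supply *psy, int *temp_mc)
{
	int tenths = 0;
	int ret;

	assert(psy != NULL && temp_mc != NULL);

	ret = fake_supply_get_temp(psy, &tenths);
	if (ret)
		return ret == -ENODEV ? -EAGAIN : ret;

	/* Convert tenths of a degree Celsius to millidegrees Celsius. */
	*temp_mc = tenths * 100;
	return 0;
}

/*
 * Assumed caller behaviour: warn on every error except -EAGAIN.
 * Returns 1 if a warning was printed, 0 otherwise.
 */
int fake_zone_update(const struct fake_supply *psy, int *temp_mc)
{
	int ret = fake_read_temp(psy, temp_mc);

	if (ret && ret != -EAGAIN) {
		fprintf(stderr, "failed to read out thermal zone (%d)\n", ret);
		return 1;
	}
	return 0;
}
```

Under these assumptions, an uninitialized supply no longer triggers the warning, while any other error code still passes through unchanged and is reported.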
If the thermal core tries to update the temperature from an
uninitialized power supply, it will spawn the following warning:
thermal thermal_zoneXX: failed to read out thermal zone (-19)

But reading from an uninitialized power supply should not be
considered as a fatal error, but the thermal core expects
the -EAGAIN error to be returned in this particular case.

So convert -ENODEV as -EAGAIN to express the fact that reading
temperature from an uninitialized power supply shouldn't be
a fatal error, but should indicate to the thermal zone it should
retry later.

It notably removes such messages on Qualcomm platforms using the
qcom_battmgr driver spawning warnings until the aDSP firmware
gets up and the battery manager reports valid data.

Link: https://lore.kernel.org/all/2ed4c630-204a-4f80-a37f-f2ca838eb455@linaro.org/
Fixes: 5bc28b93a36e ("power_supply: power_supply_read_temp only if use_cnt > 0")
Fixes: 3be330bf8860 ("power_supply: Register battery as a thermal zone")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/power/supply/power_supply_core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

---
base-commit: 82e4255305c554b0bb18b7ccf2db86041b4c8b6e
change-id: 20240704-topic-sm8x50-upstream-fix-battmgr-temp-tz-warn-077166861efb

Best regards,