Message ID | 20220118185612.2067031-1-bjorn.andersson@linaro.org |
---|---|
State | New |
Headers | show |
Series | [1/2] cpufreq: qcom-hw: Use initialized cpumask for thermal pressure update | expand |
On 18-01-22, 10:56, Bjorn Andersson wrote: > In the event that the SoC is under thermal pressure while booting it's > possible for the dcvs notification to happen inbetween the cpufreq > framework calling init and it actually updating the policy's > related_cpus cpumask. > > Prior to the introduction of the thermal pressure update helper an empty > cpumask would simply result in the thermal pressure of no cpus being > updated, but the new code will attempt to dereference an invalid per_cpu > variable. > > Avoid this problem by using the policy's cpus cpumask instead of the > related_cpus mask, as this is initialized before the interrupt is > registered. > > Fixes: 0258cb19c77d ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function") > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > --- > drivers/cpufreq/qcom-cpufreq-hw.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c > index 05f3d7876e44..866fba3ac6fc 100644 > --- a/drivers/cpufreq/qcom-cpufreq-hw.c > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c > @@ -297,7 +297,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data) > throttled_freq = freq_hz / HZ_PER_KHZ; > > /* Update thermal pressure (the boost frequencies are accepted) */ > - arch_update_thermal_pressure(policy->related_cpus, throttled_freq); > + arch_update_thermal_pressure(policy->cpus, throttled_freq); policy->cpus keeps on changing with CPU hotplug and this can leave your platform in an inconsistent state. For example, in case where you offline a CPU from policy, other CPUs get their thermal pressure updated, online the CPU back and all CPUs of a policy don't have the same settings anymore. There are few things we can do here now: - Check for empty related_cpus and return early. Since related_cpus is updated only once, this shall work just fine and must not be racy. While at it, I think we can also do something like this in topology_update_thermal_pressure() instead: cpu = cpumask_first(cpus); if (unlikely(cpu >= NR_CPUS)) return; - And while writing this email, I dropped all other ideas in favor of change to topology_update_thermal_pressure() :) -- viresh
On 19-01-22, 12:05, Viresh Kumar wrote: > policy->cpus keeps on changing with CPU hotplug and this can leave > your platform in an inconsistent state. For example, in case where you > offline a CPU from policy, other CPUs get their thermal pressure > updated, online the CPU back and all CPUs of a policy don't have the > same settings anymore. > > There are few things we can do here now: > > - Check for empty related_cpus and return early. Since related_cpus is > updated only once, this shall work just fine and must not be racy. > > While at it, I think we can also do something like this in > topology_update_thermal_pressure() instead: > > cpu = cpumask_first(cpus); > if (unlikely(cpu >= NR_CPUS)) > return; > > - And while writing this email, I dropped all other ideas in favor of > change to topology_update_thermal_pressure() :) And then I saw your second patch, which looks good as otherwise we will not be able to catch the bug in our system where we are sending the empty cpumask :) So the other idea is: - Revert, or bring back a new version of this and register the interrupt from there. But that is also not a very clean solution. commit 4bf8e582119e ("cpufreq: Remove ready() callback") -
On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote: > Occasionally during boot the Qualcomm cpufreq driver was able to cause > an invalid memory access in topology_update_thermal_pressure() on the > line: > > if (max_freq <= capped_freq) > > It turns out that this was caused by a race, which resulted in the > cpumask passed to the function being empty, in which case > cpumask_first() will return a cpu beyond the number of valid cpus, which > when used to access the per_cpu max_freq would return invalid pointer. > > The bug in the Qualcomm cpufreq driver is being fixed, but having a > sanity check of the arguments would have saved quite a bit of time and > it's not unlikely that others will run into the same issue. > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > --- > drivers/base/arch_topology.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > index 976154140f0b..6560a0c3b969 100644 > --- a/drivers/base/arch_topology.c > +++ b/drivers/base/arch_topology.c > @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus, > u32 max_freq; > int cpu; > > + if (WARN_ON(cpumask_empty(cpus))) > + return; Sorry, but I do not want to add any more WARN_ON() calls to the kernel unless really needed. We don't try to save the kernel from itself all the time by validating every internal api call parameters. thjanks, greg k-h
On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote: > Occasionally during boot the Qualcomm cpufreq driver was able to cause > an invalid memory access in topology_update_thermal_pressure() on the > line: > > if (max_freq <= capped_freq) > > It turns out that this was caused by a race, which resulted in the > cpumask passed to the function being empty, in which case > cpumask_first() will return a cpu beyond the number of valid cpus, which > when used to access the per_cpu max_freq would return invalid pointer. > > The bug in the Qualcomm cpufreq driver is being fixed, but having a > sanity check of the arguments would have saved quite a bit of time and > it's not unlikely that others will run into the same issue. > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > --- > drivers/base/arch_topology.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > index 976154140f0b..6560a0c3b969 100644 > --- a/drivers/base/arch_topology.c > +++ b/drivers/base/arch_topology.c > @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus, > u32 max_freq; > int cpu; > > + if (WARN_ON(cpumask_empty(cpus))) > + return; > + Why can't the caller check and call this only when cpus is not empty ? IIUC there are many such APIs that use cpumask and could result in similar issues if called with empty cpus. Probably we could add a note that cpus must not be empty if that helps the callers ?
On Tue 18 Jan 22:40 PST 2022, Viresh Kumar wrote: > On 19-01-22, 12:05, Viresh Kumar wrote: > > policy->cpus keeps on changing with CPU hotplug and this can leave > > your platform in an inconsistent state. For example, in case where you > > offline a CPU from policy, other CPUs get their thermal pressure > > updated, online the CPU back and all CPUs of a policy don't have the > > same settings anymore. > > Oh, I didn't know that. Then my proposal doesn't seem that awesome. > > There are few things we can do here now: > > > > - Check for empty related_cpus and return early. Since related_cpus is > > updated only once, this shall work just fine and must not be racy. > > > > While at it, I think we can also do something like this in > > topology_update_thermal_pressure() instead: > > > > cpu = cpumask_first(cpus); > > if (unlikely(cpu >= NR_CPUS)) > > return; > > > > - And while writing this email, I dropped all other ideas in favor of > > change to topology_update_thermal_pressure() :) > > And then I saw your second patch, which looks good as otherwise we > will not be able to catch the bug in our system where we are sending > the empty cpumask :) > > So the other idea is: > > - Revert, or bring back a new version of this and register the > interrupt from there. But that is also not a very clean solution. > > commit 4bf8e582119e ("cpufreq: Remove ready() callback") > We could do this and keep the interrupt disabled until we hit ready(). But I found the resulting issue non-trivial to debug, so I would prefer if arch_update_thermal_pressure() dealt with the empty cpumask. So as you suggest in your first reply, I'll respin the second patch alone, without the WARN_ON(). Thanks, Bjorn
On Wed 19 Jan 06:43 PST 2022, Sudeep Holla wrote: > On Tue, Jan 18, 2022 at 10:56:12AM -0800, Bjorn Andersson wrote: > > Occasionally during boot the Qualcomm cpufreq driver was able to cause > > an invalid memory access in topology_update_thermal_pressure() on the > > line: > > > > if (max_freq <= capped_freq) > > > > It turns out that this was caused by a race, which resulted in the > > cpumask passed to the function being empty, in which case > > cpumask_first() will return a cpu beyond the number of valid cpus, which > > when used to access the per_cpu max_freq would return invalid pointer. > > > > The bug in the Qualcomm cpufreq driver is being fixed, but having a > > sanity check of the arguments would have saved quite a bit of time and > > it's not unlikely that others will run into the same issue. > > > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > --- > > drivers/base/arch_topology.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c > > index 976154140f0b..6560a0c3b969 100644 > > --- a/drivers/base/arch_topology.c > > +++ b/drivers/base/arch_topology.c > > @@ -177,6 +177,9 @@ void topology_update_thermal_pressure(const struct cpumask *cpus, > > u32 max_freq; > > int cpu; > > > > + if (WARN_ON(cpumask_empty(cpus))) > > + return; > > + > > Why can't the caller check and call this only when cpus is not empty ? > IIUC there are many such APIs that use cpumask and could result in similar > issues if called with empty cpus. Probably we could add a note that cpus > must not be empty if that helps the callers ? > As indicated in the commit message, it took me a while to conclude that the cause for a memory fault on what seemed to be a comparison between two variables on the stack was actually caused by this race - which isn't trivially reproducible, unless you know what the bug is. Now _I_ know better and will hopefully recognize the oops signature right away, but my hope was to put the sanity check on this side to save the next caller of this API some time. Updating the comment probably would have saved me a minute or two at the end, probably as confirmation of my findings after the fact... If you prefer to keep topology_update_thermal_pressure() clean(er) and exciting I can hack around the issue in the Qualcomm driver. PS. I'm onboard with Greg's objection to the WARN_ON()... Regards, Bjorn
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c index 05f3d7876e44..866fba3ac6fc 100644 --- a/drivers/cpufreq/qcom-cpufreq-hw.c +++ b/drivers/cpufreq/qcom-cpufreq-hw.c @@ -297,7 +297,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data) throttled_freq = freq_hz / HZ_PER_KHZ; /* Update thermal pressure (the boost frequencies are accepted) */ - arch_update_thermal_pressure(policy->related_cpus, throttled_freq); + arch_update_thermal_pressure(policy->cpus, throttled_freq); /* * In the unlikely case policy is unregistered do not enable
In the event that the SoC is under thermal pressure while booting it's possible for the dcvs notification to happen inbetween the cpufreq framework calling init and it actually updating the policy's related_cpus cpumask. Prior to the introduction of the thermal pressure update helper an empty cpumask would simply result in the thermal pressure of no cpus being updated, but the new code will attempt to dereference an invalid per_cpu variable. Avoid this problem by using the policy's cpus cpumask instead of the related_cpus mask, as this is initialized before the interrupt is registered. Fixes: 0258cb19c77d ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> --- drivers/cpufreq/qcom-cpufreq-hw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)