Message ID | 426bf6edc80b2e944d459fa7b8dffbe8b73bb3d9.1653623526.git.viresh.kumar@linaro.org |
---|---|
State | New |
Headers | show |
Series | None | expand |
On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > With the new design in place, to avoid potential races show() and > store() callbacks check if the policy is active or not before proceeding > any further. And in order to guarantee that cpufreq_policy_free() must > be called after clearing the policy->cpus mask, i.e. by marking it > inactive. > > Lets make sure we don't get a bug around this later and catch this early > by putting a BUG_ON() within cpufreq_policy_free(). > > Also update cpufreq_online() a bit to make sure we clear the cpus mask > for each error case before calling cpufreq_policy_free(). > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > V2: Update cpufreq_online() and changelog. > > drivers/cpufreq/cpufreq.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index e24aa5d4bca5..0f8245731783 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) > unsigned long flags; > int cpu; > > + /* > + * The callers must ensure the policy is inactive by now, to avoid any > + * races with show()/store() callbacks. > + */ > + BUG_ON(!policy_is_inactive(policy)); I'm not a super-big fan of this change. First off, crashing the kernel outright here because of possible races appears a bit excessive to me. Second, it looks like we are worrying about the code running before the wait_for_completion() call in cpufreq_policy_put_kobj(), because after that call no one can be running show() or store(). So why don't we reorder the wait_for_completion() call with respect to the code in question instead? > + > /* Remove policy from list */ > write_lock_irqsave(&cpufreq_driver_lock, flags); > list_del(&policy->policy_list); > @@ -1538,8 +1544,6 @@ static int cpufreq_online(unsigned int cpu) > for_each_cpu(j, policy->real_cpus) > remove_cpu_dev_symlink(policy, j, get_cpu_device(j)); > > - cpumask_clear(policy->cpus); > - > out_offline_policy: > if (cpufreq_driver->offline) > cpufreq_driver->offline(policy); > @@ -1549,6 +1553,7 @@ static int cpufreq_online(unsigned int cpu) > cpufreq_driver->exit(policy); > > out_free_policy: > + cpumask_clear(policy->cpus); > up_write(&policy->rwsem); > > cpufreq_policy_free(policy); > -- > 2.31.1.272.g89b43f80a514 >
On Wed, Jun 15, 2022 at 7:00 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 14-06-22, 15:59, Rafael J. Wysocki wrote: > > On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > With the new design in place, to avoid potential races show() and > > > store() callbacks check if the policy is active or not before proceeding > > > any further. And in order to guarantee that cpufreq_policy_free() must > > > be called after clearing the policy->cpus mask, i.e. by marking it > > > inactive. > > > > > > Lets make sure we don't get a bug around this later and catch this early > > > by putting a BUG_ON() within cpufreq_policy_free(). > > > > > > Also update cpufreq_online() a bit to make sure we clear the cpus mask > > > for each error case before calling cpufreq_policy_free(). > > > > > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > > --- > > > V2: Update cpufreq_online() and changelog. > > > > > > drivers/cpufreq/cpufreq.c | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > > index e24aa5d4bca5..0f8245731783 100644 > > > --- a/drivers/cpufreq/cpufreq.c > > > +++ b/drivers/cpufreq/cpufreq.c > > > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) > > > unsigned long flags; > > > int cpu; > > > > > > + /* > > > + * The callers must ensure the policy is inactive by now, to avoid any > > > + * races with show()/store() callbacks. > > > + */ > > > + BUG_ON(!policy_is_inactive(policy)); > > > > I'm not a super-big fan of this change. > > > > First off, crashing the kernel outright here because of possible races > > appears a bit excessive to me. > > > > Second, it looks like we are worrying about the code running before > > the wait_for_completion() call in cpufreq_policy_put_kobj(), because > > after that call no one can be running show() or store(). So why don't > > we reorder the wait_for_completion() call with respect to the code in > > question instead? > > No, I am not worrying about that race. I am just trying to make sure some change > in future doesn't break this assumption (that policy should be inactive by this > point). That's all. It all looks good for now. > > May be a WARN instead of BUG if we don't want to crash. WARN_ON() would be somewhat better, but then I'm not sure if having a full call trace in this case is really useful, because we know when cpufreq_policy_free() can be called anyway. Maybe just print a warning message.
On 06-07-22, 15:49, Rafael J. Wysocki wrote: > WARN_ON() would be somewhat better, but then I'm not sure if having a > full call trace in this case is really useful, because we know when > cpufreq_policy_free() can be called anyway. > > Maybe just print a warning message. The warning will get printed, yes, but I am sure everyone will end up ignoring it, once it happens. One of the benefits of printing the call-stack is people will take it seriously and report it, and we won't miss a bug, if one gets in somehow.
On Wed, Jul 6, 2022 at 5:23 PM Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 06-07-22, 15:49, Rafael J. Wysocki wrote: > > WARN_ON() would be somewhat better, but then I'm not sure if having a > > full call trace in this case is really useful, because we know when > > cpufreq_policy_free() can be called anyway. > > > > Maybe just print a warning message. > > The warning will get printed, yes, but I am sure everyone will end up > ignoring it, once it happens. > > One of the benefits of printing the call-stack is people will take it > seriously and report it, and we won't miss a bug, if one gets in > somehow. I'd rather not go into discussing things that people may or may not do and why. My point is that if WARN_ON() gets converted to panic(), they will not see the message at all and if the message gets printed, they will have a chance to see it even in that case. Whether or not they use that chance as desirable is beyond the scope of engineering IMV.
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index e24aa5d4bca5..0f8245731783 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy) unsigned long flags; int cpu; + /* + * The callers must ensure the policy is inactive by now, to avoid any + * races with show()/store() callbacks. + */ + BUG_ON(!policy_is_inactive(policy)); + /* Remove policy from list */ write_lock_irqsave(&cpufreq_driver_lock, flags); list_del(&policy->policy_list); @@ -1538,8 +1544,6 @@ static int cpufreq_online(unsigned int cpu) for_each_cpu(j, policy->real_cpus) remove_cpu_dev_symlink(policy, j, get_cpu_device(j)); - cpumask_clear(policy->cpus); - out_offline_policy: if (cpufreq_driver->offline) cpufreq_driver->offline(policy); @@ -1549,6 +1553,7 @@ static int cpufreq_online(unsigned int cpu) cpufreq_driver->exit(policy); out_free_policy: + cpumask_clear(policy->cpus); up_write(&policy->rwsem); cpufreq_policy_free(policy);
With the new design in place, to avoid potential races show() and store() callbacks check if the policy is active or not before proceeding any further. And in order to guarantee that cpufreq_policy_free() must be called after clearing the policy->cpus mask, i.e. by marking it inactive. Lets make sure we don't get a bug around this later and catch this early by putting a BUG_ON() within cpufreq_policy_free(). Also update cpufreq_online() a bit to make sure we clear the cpus mask for each error case before calling cpufreq_policy_free(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- V2: Update cpufreq_online() and changelog. drivers/cpufreq/cpufreq.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)