diff mbox

[Resend] cpufreq: Set cpufreq_cpu_data to NULL before putting kobject

Message ID ed8fd187687cb4ea9afd0bc32107ca5abf03e679.1422663249.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar Jan. 31, 2015, 12:32 a.m. UTC
In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared
before calling kobject_put(&policy->kobj) *and* under the lock. Otherwise if
someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a
non-NULL policy from it *after* kobject_put(&policy->kobj) was executed.

Consider this case:

Thread A				Thread B
cpufreq_cpu_get()
  read_lock_irqsave()
  read-per-cpu cpufreq_cpu_data
					per_cpu(&cpufreq_cpu_data, cpu) = NULL
					kobject_put(&policy->kobj);
  kobject_get(&policy->kobj);


And this will result in below Warnings:

 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
 kobject_get+0x41/0x50()
 Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
 lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
 ...
 Call Trace:
  [<ffffffff81661b14>] dump_stack+0x46/0x58
  [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
  [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff812e16d1>] kobject_get+0x41/0x50
  [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
  [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
  [<ffffffff810b8cb2>] ? up+0x32/0x50
  [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
  [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
  [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
  [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
  [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
  [<ffffffff81089566>] ? move_linked_works+0x66/0x90
  [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
  [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
  [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
  [<ffffffff8108c910>] process_one_work+0x160/0x410
  [<ffffffff8108d05b>] worker_thread+0x11b/0x520
  [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
  [<ffffffff81092421>] kthread+0xe1/0x100
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
  [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
 ---[ end trace 89e66eb9795efdf7 ]---

And here is the actual race (+ the race mentioned above):

 Thread A: Workqueue: kacpi_notify

 acpi_processor_notify()
   acpi_processor_ppc_has_changed()
         cpufreq_update_policy()
           cpufreq_cpu_get()
             kobject_get()

 Thread B: xenbus_thread()

 xenbus_thread()
   msg->u.watch.handle->callback()
     handle_vcpu_hotplug_event()
       vcpu_hotplug()
         cpu_down()
           __cpu_notify(CPU_POST_DEAD..)
             cpufreq_cpu_callback()
               __cpufreq_remove_dev_finish()
                 cpufreq_policy_put_kobj()
                   kobject_put()

cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
freed until cpufreq_cpu_put() is called.

But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().

Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.

Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Viresh Kumar Jan. 31, 2015, 2:14 a.m. UTC | #1
On 31 January 2015 at 06:02, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs to be cleared
> before calling kobject_put(&policy->kobj) *and* under the lock. Otherwise if
> someone else calls cpufreq_cpu_get() in parallel with it, they can obtain a
> non-NULL policy from it *after* kobject_put(&policy->kobj) was executed.
>
> Consider this case:
>
> Thread A                                Thread B
> cpufreq_cpu_get()
>   read_lock_irqsave()
>   read-per-cpu cpufreq_cpu_data
>                                         per_cpu(&cpufreq_cpu_data, cpu) = NULL
>                                         kobject_put(&policy->kobj);
>   kobject_get(&policy->kobj);
>
>
> And this will result in below Warnings:
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
>  kobject_get+0x41/0x50()
>  Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
>  lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
>  ...
>  Call Trace:
>   [<ffffffff81661b14>] dump_stack+0x46/0x58
>   [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
>   [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
>   [<ffffffff812e16d1>] kobject_get+0x41/0x50
>   [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
>   [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
>   [<ffffffff810b8cb2>] ? up+0x32/0x50
>   [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
>   [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
>   [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
>   [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
>   [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
>   [<ffffffff81089566>] ? move_linked_works+0x66/0x90
>   [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
>   [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
>   [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
>   [<ffffffff8108c910>] process_one_work+0x160/0x410
>   [<ffffffff8108d05b>] worker_thread+0x11b/0x520
>   [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
>   [<ffffffff81092421>] kthread+0xe1/0x100
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>   [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
>  ---[ end trace 89e66eb9795efdf7 ]---
>
> And here is the actual race (+ the race mentioned above):
>
>  Thread A: Workqueue: kacpi_notify
>
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()
>              kobject_get()
>
>  Thread B: xenbus_thread()
>
>  xenbus_thread()
>    msg->u.watch.handle->callback()
>      handle_vcpu_hotplug_event()
>        vcpu_hotplug()
>          cpu_down()
>            __cpu_notify(CPU_POST_DEAD..)
>              cpufreq_cpu_callback()
>                __cpufreq_remove_dev_finish()
>                  cpufreq_policy_put_kobj()
>                    kobject_put()
>
> cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data under
> cpufreq_driver_lock, and once it gets a valid policy it expects it to not be
> freed until cpufreq_cpu_put() is called.
>
> But the race happens when another thread puts the kobject first and updates
> cpufreq_cpu_data before or later. And so the first thread gets a valid policy
> structure and before it does kobject_get() on it, the second one has already
> done kobject_put().
>
> Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
> too under locks.
>

Cc: <stable@vger.kernel.org>        # 3.12+
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:24 a.m. UTC | #2
On 2 February 2015 at 08:50, ethan zhao <ethan.zhao@oracle.com> wrote:
>  This seems couldn't prevent all the 'bad thing' from happening, E.G.
>
>
>  Thread A: Workqueue: kacpi_notify
>
>  acpi_processor_notify()
>    acpi_processor_ppc_has_changed()
>          cpufreq_update_policy()
>            cpufreq_cpu_get()

We take cpufreq_driver_lock() here, and so this will
block thread B.

>            beginning the deference of policy        Thread B:
>            ... ... __cpufreq_remove_dev_finish()
> cpufreq_policy_free(policy);
>
>
> Perhaps move policy->rwsem out side the policy structure is a way to avoid
> it completely.
> and you could stopping the PPC thread stepping forward as my patch as
> temporary workaround.

I couldn't understand your problem completely. Apart from giving a detailed
look of what's going on both threads, always specify where the BUG actually
is..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:43 a.m. UTC | #3
On 2 February 2015 at 09:08, ethan zhao <ethan.zhao@oracle.com> wrote:
>> We take cpufreq_driver_lock() here, and so this will
>> block thread B.
>
>  No, there is no  cpufreq_driver_lock acquired between
>
>  cpufreq_cpu_get()  and cpufreq_cpu_put()

I am not saying that the lock is taken between them. But within
cpufreq_cpu_get() to make sure policy doesn't get freed while we
are doing kobject_get().

>>>             beginning the deference of policy        Thread B:
>>>             ... ... __cpufreq_remove_dev_finish()
>>> cpufreq_policy_free(policy);
>>>
>>>
>>> Perhaps move policy->rwsem out side the policy structure is a way to
>>> avoid
>>> it completely.
>>> and you could stopping the PPC thread stepping forward as my patch as
>>> temporary workaround.
>>
>> I couldn't understand your problem completely. Apart from giving a
>> detailed
>> look of what's going on both threads, always specify where the BUG
>> actually
>> is..
>
> The problem is you are using a rwsem inside policy structure to protect its
> assessment, that is bad design.

What is the current bug you are facing right now, I haven't understood it well.
Also a lock within the structure isn't new. Its all over the kernel. I
don't understand
why you say its a bad design.

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 3:59 a.m. UTC | #4
On 2 February 2015 at 09:26, ethan zhao <ethan.zhao@oracle.com> wrote:
>  How to prevent the policy to be freed between
>
> cpufreq_cpu_get()  and cpufreq_cpu_put() ?

kobject_get() increases the reference count of a policy and the policy
will only be freed when this is zero. And it will only be zero once all
cpufreq_cpu_get() are matched with corresponding cpufreq_cpu_put().

> You are maxing up the water with sand ?

:)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Feb. 2, 2015, 4:09 a.m. UTC | #5
On 2 February 2015 at 09:36, ethan zhao <ethan.zhao@oracle.com> wrote:
>  Is that an idea it supposed to be or fact ?
>
> if (!cpufreq_suspended)
>             cpufreq_policy_free(policy);
>
>  static void cpufreq_policy_free(struct cpufreq_policy *policy)
> {
>     free_cpumask_var(policy->related_cpus);
>     free_cpumask_var(policy->cpus);
>     kfree(policy);
> }
>
>  It seems you just think about it ideally in mind.

if (!cpufreq_suspended)
        cpufreq_policy_put_kobj(policy);

static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
{
        ...

        kobject_put(kobj);

        /*
        * We need to make sure that the underlying kobj is
        * actually not referenced anymore by anybody before we
        * proceed with unloading.
        */
        pr_debug("waiting for dropping of refcount\n");
        wait_for_completion(cmp);
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Viresh Kumar Feb. 2, 2015, 4:26 a.m. UTC | #6
On 2 February 2015 at 09:46, ethan zhao <ethan.zhao@oracle.com> wrote:
>  We am talking about the policy allocation and de-allocation. right ?
>  I showed you the cpufreq_policy_free(policy) doesn't check kobject
>  refcount  as above.
>
>  Hmmm, you are still sleeping in the kobject, wake up and don't mix
>  water anymore.

It would be nice if we give each other the respect we deserve, And talk
about concrete points here.

>> if (!cpufreq_suspended)
>>          cpufreq_policy_put_kobj(policy);
>>
>> static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
>> {
>>          ...
>>
>>          kobject_put(kobj);
>>
>>          /*
>>          * We need to make sure that the underlying kobj is
>>          * actually not referenced anymore by anybody before we
>>          * proceed with unloading.
>>          */
>>          pr_debug("waiting for dropping of refcount\n");
>>          wait_for_completion(cmp);
>> }

I gave you exactly what you wanted to go through, but it seems you
haven't tried enough.

Before freeing policy with cpufreq_policy_free(), we call
cpufreq_policy_put_kobj(). Now what does this function do? It waits
for the completion to fire (wait_for_completion()). This completion
will only fire when refcount of a kobject becomes zero.

Initially when we create the kobject, it is initialized to one. And the
last kobject_put() you see above in cpufreq_policy_put_kobj()
makes it zero. All other cpufreq_cpu_get() and put() should happen
in pairs, otherwise this refcount will never be zero again.

As soon as the refcount becomes zero, we are sure no one else is
using the policy structure anymore. And so we free it with
cpufreq_policy_free().

That routines doesn't have any tricks and simply frees the policy.
Because, before calling cpufreq_policy_put_kobj(), we have set
the per-cpu variable to NULL, nobody else will get the policy
structure by calling cpufreq_cpu_get(). And that's what my patch
tried to solve.

Let me know if I wasn't explanatory enough..

--
viresh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Viresh Kumar Feb. 2, 2015, 4:54 a.m. UTC | #7
On 2 February 2015 at 10:15, ethan zhao <ethan.zhao@oracle.com> wrote:
> On 2015/2/2 12:26, Viresh Kumar wrote:

>  But there is no checking against refcount in or before
>
>  cpufreq_policy_free(), that is one issue I mentioned.

As I said earlier, the completion will only fire once the refcount
is zero. And so there is no need of any check here.

>> That routines doesn't have any tricks and simply frees the policy.
>> Because, before calling cpufreq_policy_put_kobj(), we have set
>> the per-cpu variable to NULL, nobody else will get the policy
>
>  It is possible cpufreq_cpu_get() within the PPC thread was called just
>  before __cpufreq_remove_dev_finish() is to be called in another thread,
>  so you set the per_cpu(cpufreq_cpu_data, cpu) to NULL will not prevent
>  the actions between cpufreq_cpu_get and cpufreq_cpu_put().
>
>  And then the freeing happens in __cpufreq_remove_dev_finish().

It will.. You aren't looking closely enough. If cpufreq_cpu_get() is called just
before remove-dev, then cpufreq_cpu_get() will take:

read_lock_irqsave(&cpufreq_driver_lock, flags);

And it will do:

read_unlock_irqrestore(&cpufreq_driver_lock, flags);

only after increasing the refcount with kobject_get().

While on the other side __cpufreq_remove_dev_finish() will do this:

       write_lock_irqsave(&cpufreq_driver_lock, flags);
       policy = per_cpu(cpufreq_cpu_data, cpu);
       per_cpu(cpufreq_cpu_data, cpu) = NULL;
       write_unlock_irqrestore(&cpufreq_driver_lock, flags);

So, it will wait for the read_lock in cpufreq_cpu_get() to finish before
setting per-cpu variable to NULL. And so, after kobject_put() in
cpufreq_policy_put_kobj(), we will wait for the completion to fire
and that will only happen once a corresponding cpufreq_cpu_put()
is issued.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4473eba1d6b0..e3bf702b5588 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1409,9 +1409,10 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 	unsigned long flags;
 	struct cpufreq_policy *policy;
 
-	read_lock_irqsave(&cpufreq_driver_lock, flags);
+	write_lock_irqsave(&cpufreq_driver_lock, flags);
 	policy = per_cpu(cpufreq_cpu_data, cpu);
-	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
+	per_cpu(cpufreq_cpu_data, cpu) = NULL;
+	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
 	if (!policy) {
 		pr_debug("%s: No cpu_data found\n", __func__);
@@ -1466,7 +1467,6 @@  static int __cpufreq_remove_dev_finish(struct device *dev,
 		}
 	}
 
-	per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	return 0;
 }