[RFC,V2] cpufreq: make sure frequency transitions are serialized

Message ID CAKohpomhTJa06a1LSi9tK1+=6Y2MyidJSbYWuffAsU5KxVf7zw@mail.gmail.com
State New

Commit Message

Viresh Kumar March 19, 2014, 6:08 a.m. UTC
On 18 March 2014 18:20, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 03/14/2014 01:13 PM, Viresh Kumar wrote:
>> +     if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
>
> Wait a min, when is this condition ever true? I mean, what else can
> 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?

There were two more 'unused' states available:
CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE

I have sent a patch to remove them now, so this code will go away.

>> +             return notify_transition_for_each_cpu(policy, freqs, state);
>> +
>> +     /* Serialize pre-post notifications */
>> +     mutex_lock(&policy->transition_lock);
>
> Nope, this is definitely not the way to go, IMHO. We should enforce that
> the *callers* serialize the transitions, something like this:
>
>         cpufreq_transition_lock();
>
>         cpufreq_notify_transition(CPUFREQ_PRECHANGE);
>
>         //Perform the frequency change
>
>         cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
>
>         cpufreq_transition_unlock();
>
> That's it!
>
> [ We can either introduce a new "transition" lock or perhaps even reuse
> the cpufreq_driver_lock if it fits... but the point is, the _caller_ has
> to perform the locking; trying to be smart inside cpufreq_notify_transition()
> is a recipe for headache :-( ]
>
> Is there any problem with this approach due to which you didn't take
> this route?

I didn't want drivers to handle this, as the core must make sure things are in
order. On top of that, it avoids pasting redundant code everywhere.

Drivers are going to call cpufreq_notify_transition() anyway, so why increase
the burden on them?

>> +     if (unlikely(WARN_ON(!policy->transition_ongoing &&
>> +                             (state == CPUFREQ_POSTCHANGE)))) {
>> +             mutex_unlock(&policy->transition_lock);
>> +             return;
>> +     }
>> +
>> +     if (state == CPUFREQ_PRECHANGE) {
>> +             while (policy->transition_ongoing) {
>> +                     mutex_unlock(&policy->transition_lock);
>> +                     /* TODO: Can we do something better here? */
>> +                     cpu_relax();
>> +                     mutex_lock(&policy->transition_lock);
>
> If the caller takes care of the synchronization, we can avoid
> these sorts of acrobatics ;-)

If we are fine with taking a mutex for the entire transition, then
we can avoid the above kind of acrobatics by just taking the mutex
at PRECHANGE and releasing it at POSTCHANGE.

It will look like this then, hope this looks fine :)

Comments

Srivatsa S. Bhat March 19, 2014, 9:17 a.m. UTC | #1
On 03/19/2014 11:38 AM, Viresh Kumar wrote:
> On 18 March 2014 18:20, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 03/14/2014 01:13 PM, Viresh Kumar wrote:
>>> +     if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
>>
>> Wait a min, when is this condition ever true? I mean, what else can
>> 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
> 
> There were two more 'unused' states available:
> CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE
>
> I have sent a patch to remove them now and this code would go away..
>

Ok..
 
>>> +             return notify_transition_for_each_cpu(policy, freqs, state);
>>> +
>>> +     /* Serialize pre-post notifications */
>>> +     mutex_lock(&policy->transition_lock);
>>
>> Nope, this is definitely not the way to go, IMHO. We should enforce that
>> the *callers* serialize the transitions, something like this:
>>
>>         cpufreq_transition_lock();
>>
>>         cpufreq_notify_transition(CPUFREQ_PRECHANGE);
>>
>>         //Perform the frequency change
>>
>>         cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
>>
>>         cpufreq_transition_unlock();
>>
>> That's it!
>>
>> [ We can either introduce a new "transition" lock or perhaps even reuse
>> the cpufreq_driver_lock if it fits... but the point is, the _caller_ has
>> to perform the locking; trying to be smart inside cpufreq_notify_transition()
>> is a recipe for headache :-( ]
>>
>> Is there any problem with this approach due to which you didn't take
>> this route?
> 

Wait, I think I remember. The problem was about dealing with drivers that
do asynchronous notification (those that have the ASYNC_NOTIFICATION flag
set). In particular, exynos-5440 driver sends out the POSTCHANGE notification
from a workqueue worker, much later than sending the PRECHANGE notification.

From what I saw, this is how the exynos-5440 driver works:

1. ->target() is invoked, and the driver writes to a register and returns
   to its caller.

2. An interrupt occurs that indicates that the frequency was changed.

3. The interrupt handler kicks off a worker thread which then sends out
   the POSTCHANGE notification.

So the important question here is, how does the exynos-5440 driver
protect itself from say 2 ->target() calls which occur in close sequence
(before allowing the entire chain for the first call to complete)?

As far as I can see there is no such synchronization in the driver at
the moment. Adding Amit to CC for his comments.

Regards,
Srivatsa S. Bhat

> I didn't want drivers to handle this, as the core must make sure things are in
> order. On top of that, it avoids pasting redundant code everywhere.
> 
> Drivers are going to call cpufreq_notify_transition() anyway, so why increase
> the burden on them?
> 
>>> +     if (unlikely(WARN_ON(!policy->transition_ongoing &&
>>> +                             (state == CPUFREQ_POSTCHANGE)))) {
>>> +             mutex_unlock(&policy->transition_lock);
>>> +             return;
>>> +     }
>>> +
>>> +     if (state == CPUFREQ_PRECHANGE) {
>>> +             while (policy->transition_ongoing) {
>>> +                     mutex_unlock(&policy->transition_lock);
>>> +                     /* TODO: Can we do something better here? */
>>> +                     cpu_relax();
>>> +                     mutex_lock(&policy->transition_lock);
>>
>> If the caller takes care of the synchronization, we can avoid
>> these sorts of acrobatics ;-)
> 
> If we are fine with taking a mutex for the entire transition, then
> we can avoid the above kind of acrobatics by just taking the mutex
> at PRECHANGE and releasing it at POSTCHANGE.
> 
> It will look like this then, hope this looks fine :)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2677ff1..3b9eac4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -335,8 +335,15 @@ static void __cpufreq_notify_transition(struct
> cpufreq_policy *policy,
>  void cpufreq_notify_transition(struct cpufreq_policy *policy,
>                 struct cpufreq_freqs *freqs, unsigned int state)
>  {
> +       if (state == CPUFREQ_PRECHANGE)
> +               mutex_lock(&policy->transition_lock);
> +
> +       /* Send notifications */
>         for_each_cpu(freqs->cpu, policy->cpus)
>                 __cpufreq_notify_transition(policy, freqs, state);
> +
> +       if (state == CPUFREQ_POSTCHANGE)
> +               mutex_unlock(&policy->transition_lock);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
> 
> @@ -983,6 +990,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
> 
>         INIT_LIST_HEAD(&policy->policy_list);
>         init_rwsem(&policy->rwsem);
> +       mutex_init(&policy->transition_lock);
> 
>         return policy;
> 
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 31c431e..5f9209a 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -104,6 +104,7 @@ struct cpufreq_policy {
>          *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>          */
>         struct rw_semaphore     rwsem;
> +       struct mutex            transition_lock;
>  };
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 19, 2014, 9:20 a.m. UTC | #2
On 19 March 2014 14:47, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> Wait, I think I remember. The problem was about dealing with drivers that
> do asynchronous notification (those that have the ASYNC_NOTIFICATION flag
> set). In particular, exynos-5440 driver sends out the POSTCHANGE notification
> from a workqueue worker, much later than sending the PRECHANGE notification.
>
> From what I saw, this is how the exynos-5440 driver works:
>
> 1. ->target() is invoked, and the driver writes to a register and returns
>    to its caller.
>
> 2. An interrupt occurs that indicates that the frequency was changed.
>
> 3. The interrupt handler kicks off a worker thread which then sends out
>    the POSTCHANGE notification.

Correct!!

> So the important question here is, how does the exynos-5440 driver
> protect itself from say 2 ->target() calls which occur in close sequence
> (before allowing the entire chain for the first call to complete)?
>
> As far as I can see there is no such synchronization in the driver at
> the moment. Adding Amit to CC for his comments.

Yes, and that's what my patch is trying to fix. Where is the confusion?
--
Srivatsa S. Bhat March 19, 2014, 9:50 a.m. UTC | #3
On 03/19/2014 02:50 PM, Viresh Kumar wrote:
> On 19 March 2014 14:47, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> Wait, I think I remember. The problem was about dealing with drivers that
>> do asynchronous notification (those that have the ASYNC_NOTIFICATION flag
>> set). In particular, exynos-5440 driver sends out the POSTCHANGE notification
>> from a workqueue worker, much later than sending the PRECHANGE notification.
>>
>> From what I saw, this is how the exynos-5440 driver works:
>>
>> 1. ->target() is invoked, and the driver writes to a register and returns
>>    to its caller.
>>
>> 2. An interrupt occurs that indicates that the frequency was changed.
>>
>> 3. The interrupt handler kicks off a worker thread which then sends out
>>    the POSTCHANGE notification.
> 
> Correct!!
> 
>> So the important question here is, how does the exynos-5440 driver
>> protect itself from say 2 ->target() calls which occur in close sequence
>> (before allowing the entire chain for the first call to complete)?
>>
>> As far as I can see there is no such synchronization in the driver at
>> the moment. Adding Amit to CC for his comments.
> 
> Yes, and that's what my patch is trying to fix. Where is the confusion?

Sorry, for a moment I got confused and thought that your patch addresses
the race conditions present in normal drivers alone, and not ASYNC_NOTIFICATION
drivers. But now I understand that your patch intends to fix both the
problems at once. I'll share my thoughts about the design in a separate
reply.
 
Regards,
Srivatsa S. Bhat

--
Srivatsa S. Bhat March 19, 2014, 9:50 a.m. UTC | #4
On 03/19/2014 11:38 AM, Viresh Kumar wrote:
> On 18 March 2014 18:20, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 03/14/2014 01:13 PM, Viresh Kumar wrote:
>>> +     if ((state != CPUFREQ_PRECHANGE) && (state != CPUFREQ_POSTCHANGE))
>>
>> Wait a min, when is this condition ever true? I mean, what else can
>> 'state' ever be, apart from CPUFREQ_PRECHANGE and POSTCHANGE?
> 
> There were two more 'unused' states available:
> CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE
> 
> I have sent a patch to remove them now and this code would go away..
> 
>>> +             return notify_transition_for_each_cpu(policy, freqs, state);
>>> +
>>> +     /* Serialize pre-post notifications */
>>> +     mutex_lock(&policy->transition_lock);
>>
>> Nope, this is definitely not the way to go, IMHO. We should enforce that
>> the *callers* serialize the transitions, something like this:
>>
>>         cpufreq_transition_lock();
>>
>>         cpufreq_notify_transition(CPUFREQ_PRECHANGE);
>>
>>         //Perform the frequency change
>>
>>         cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
>>
>>         cpufreq_transition_unlock();
>>
>> That's it!
>>
>> [ We can either introduce a new "transition" lock or perhaps even reuse
>> the cpufreq_driver_lock if it fits... but the point is, the _caller_ has
>> to perform the locking; trying to be smart inside cpufreq_notify_transition()
>> is a recipe for headache :-( ]
>>
>> Is there any problem with this approach due to which you didn't take
>> this route?
> 
> I didn't want drivers to handle this, as the core must make sure things are in
> order. On top of that, it avoids pasting redundant code everywhere.
> 
> Drivers are going to call cpufreq_notify_transition() anyway, so why increase
> the burden on them?
>

No, it's not about burden. It's about the elegance of the design. We should
not be overly "smart" in the cpufreq core. Hiding the synchronization inside
the cpufreq core only encourages people to write buggy code in their drivers.

Why don't we go with what Rafael suggested? We can have dedicated
begin_transition() and end_transition() calls to demarcate the frequency
transitions. That way, it makes it very clear how the synchronization is
done. Of course, these functions would be provided (exported) by the cpufreq
core, by implementing them using locks/counters/whatever.

Basically what I'm arguing against, is the idea of having the cpufreq
core figure out what the driver _intended_ to do, from inside the
cpufreq_notify_transition() call.

What I would prefer instead is to have the cpufreq driver do something
like this:

cpufreq_freq_transition_begin();

cpufreq_notify_transition(CPUFREQ_PRECHANGE);

//perform the frequency change

cpufreq_notify_transition(CPUFREQ_POSTCHANGE);

cpufreq_freq_transition_end();

[ASYNC_NOTIFICATION drivers will invoke the last two functions in a
separate context/thread.]

Regards,
Srivatsa S. Bhat
 
>>> +     if (unlikely(WARN_ON(!policy->transition_ongoing &&
>>> +                             (state == CPUFREQ_POSTCHANGE)))) {
>>> +             mutex_unlock(&policy->transition_lock);
>>> +             return;
>>> +     }
>>> +
>>> +     if (state == CPUFREQ_PRECHANGE) {
>>> +             while (policy->transition_ongoing) {
>>> +                     mutex_unlock(&policy->transition_lock);
>>> +                     /* TODO: Can we do something better here? */
>>> +                     cpu_relax();
>>> +                     mutex_lock(&policy->transition_lock);
>>
>> If the caller takes care of the synchronization, we can avoid
>> these sorts of acrobatics ;-)
> 
> If we are fine with taking a mutex for the entire transition, then
> we can avoid the above kind of acrobatics by just taking the mutex
> at PRECHANGE and releasing it at POSTCHANGE.
> 
> It will look like this then, hope this looks fine :)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 2677ff1..3b9eac4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -335,8 +335,15 @@ static void __cpufreq_notify_transition(struct
> cpufreq_policy *policy,
>  void cpufreq_notify_transition(struct cpufreq_policy *policy,
>                 struct cpufreq_freqs *freqs, unsigned int state)
>  {
> +       if (state == CPUFREQ_PRECHANGE)
> +               mutex_lock(&policy->transition_lock);
> +
> +       /* Send notifications */
>         for_each_cpu(freqs->cpu, policy->cpus)
>                 __cpufreq_notify_transition(policy, freqs, state);
> +
> +       if (state == CPUFREQ_POSTCHANGE)
> +               mutex_unlock(&policy->transition_lock);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_notify_transition);
> 
> @@ -983,6 +990,7 @@ static struct cpufreq_policy *cpufreq_policy_alloc(void)
> 
>         INIT_LIST_HEAD(&policy->policy_list);
>         init_rwsem(&policy->rwsem);
> +       mutex_init(&policy->transition_lock);
> 
>         return policy;
> 
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 31c431e..5f9209a 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -104,6 +104,7 @@ struct cpufreq_policy {
>          *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>          */
>         struct rw_semaphore     rwsem;
> +       struct mutex            transition_lock;
>  };

--
Viresh Kumar March 19, 2014, 10:09 a.m. UTC | #5
On 19 March 2014 15:20, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> No, it's not about burden. It's about the elegance of the design. We should
> not be overly "smart" in the cpufreq core. Hiding the synchronization inside
> the cpufreq core only encourages people to write buggy code in their drivers.

What kind of buggy code can be there? They are supposed to call notifiers
in the order mentioned and so it shouldn't be a problem at all.. Don't know..

> Why don't we go with what Rafael suggested? We can have dedicated
> begin_transition() and end_transition() calls to demarcate the frequency
> transitions. That way, it makes it very clear how the synchronization is
> done. Of course, these functions would be provided (exported) by the cpufreq
> core, by implementing them using locks/counters/whatever.
>
> Basically what I'm arguing against, is the idea of having the cpufreq
> core figure out what the driver _intended_ to do, from inside the
> cpufreq_notify_transition() call.
>
> What I would prefer instead is to have the cpufreq driver do something
> like this:
>
> cpufreq_freq_transition_begin();
>
> cpufreq_notify_transition(CPUFREQ_PRECHANGE);

Why do we need two routines then? What about doing notification from
inside cpufreq_freq_transition_begin()?

This is a burden for driver writers, who don't normally understand the
relevance of these calls in detail and may think that only the first one is
enough, or only the second.

It's better if they simply let the core know that they are starting a transition,
i.e. call cpufreq_freq_transition_begin(), and then the core sends the
notifications.

> //perform the frequency change
>
> cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
>
> cpufreq_freq_transition_end();
>
> [ASYNC_NOTIFICATION drivers will invoke the last two functions in a
> separate context/thread.]

The same goes for the last two routines, and yes, they would be called from
a separate thread for ASYNC_NOTIFICATION drivers.
--
Viresh Kumar March 19, 2014, 1:35 p.m. UTC | #6
On 19 March 2014 17:45, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> +       bool                    transition_ongoing; /* Tracks transition status */
> +       struct mutex            transition_lock;
> +       wait_queue_head_t       transition_wait;

Similar to what I have done in my last version, why do you need
transition_ongoing and transition_wait? Simply work with
transition_lock? i.e. Acquire it for the complete transition sequence.
--
Rafael J. Wysocki March 19, 2014, 1:53 p.m. UTC | #7
On Wednesday, March 19, 2014 03:39:16 PM Viresh Kumar wrote:
> On 19 March 2014 15:20, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> > No, it's not about burden. It's about the elegance of the design. We should
> > not be overly "smart" in the cpufreq core. Hiding the synchronization inside
> > the cpufreq core only encourages people to write buggy code in their drivers.
> 
> What kind of buggy code can be there? They are supposed to call notifiers
> in the order mentioned and so it shouldn't be a problem at all.. Don't know..
> 
> > Why don't we go with what Rafael suggested? We can have dedicated
> > begin_transition() and end_transition() calls to demarcate the frequency
> > transitions. That way, it makes it very clear how the synchronization is
> > done. Of course, these functions would be provided (exported) by the cpufreq
> > core, by implementing them using locks/counters/whatever.
> >
> > Basically what I'm arguing against, is the idea of having the cpufreq
> > core figure out what the driver _intended_ to do, from inside the
> > cpufreq_notify_transition() call.
> >
> > What I would prefer instead is to have the cpufreq driver do something
> > like this:
> >
> > cpufreq_freq_transition_begin();
> >
> > cpufreq_notify_transition(CPUFREQ_PRECHANGE);
> 
> Why do we need two routines then? What about doing notification from
> inside cpufreq_freq_transition_begin()?

We can do that in my opinion.

> This is a burden for driver writers, who don't normally understand the
> relevance of these calls in detail and may think that only the first one is
> enough, or only the second.
> 
> It's better if they simply let the core know that they are starting a transition,
> i.e. call cpufreq_freq_transition_begin(), and then the core sends the
> notifications.
> 
> > //perform the frequency change
> >
> > cpufreq_notify_transition(CPUFREQ_POSTCHANGE);
> >
> > cpufreq_freq_transition_end();
> >
> > [ASYNC_NOTIFICATION drivers will invoke the last two functions in a
> > separate context/thread.]
> 
> The same goes for the last two routines, and yes, they would be called from
> a separate thread for ASYNC_NOTIFICATION drivers.

That'd be fine by me in principle.
Srivatsa S. Bhat March 19, 2014, 2:48 p.m. UTC | #8
On 03/19/2014 07:05 PM, Viresh Kumar wrote:
> On 19 March 2014 17:45, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>> +       bool                    transition_ongoing; /* Tracks transition status */
>> +       struct mutex            transition_lock;
>> +       wait_queue_head_t       transition_wait;
> 
> Similar to what I have done in my last version, why do you need
> transition_ongoing and transition_wait? Simply work with
> transition_lock? i.e. Acquire it for the complete transition sequence.
> 

We *can't* acquire it for the complete transition sequence
in case of drivers that do asynchronous notification, because
PRECHANGE is done in one thread and POSTCHANGE is done in a
totally different thread! You can't acquire a lock in one
task and release it in a different task. That would be a
fundamental violation of locking.

That's why I introduced the wait queue to help us create
a "flow" which encompasses 2 different, but co-ordinating
tasks. You simply can't do that elegantly by using plain
locks alone.
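
[ A rough user-space illustration of that ownership rule, added for
reference. It assumes POSIX threads instead of kernel mutexes, and every
name in it (sketch_lock, cross_task_unlock_result, ...) is hypothetical.
An error-checking pthread mutex refuses an unlock that comes from a task
which never took the lock: ]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for policy->transition_lock. */
static pthread_mutex_t sketch_lock;

/* Second task: tries to release a lock that another task acquired. */
static void *unlock_from_other_task(void *result)
{
	*(int *)result = pthread_mutex_unlock(&sketch_lock);
	return NULL;
}

/* Returns the error code seen by the task that did not take the lock. */
static int cross_task_unlock_result(void)
{
	pthread_mutexattr_t attr;
	pthread_t other;
	int result = 0;
	int rc;

	pthread_mutexattr_init(&attr);
	/* Error-checking mutexes report ownership violations. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&sketch_lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&sketch_lock);	/* "PRECHANGE" task locks */
	rc = pthread_create(&other, NULL, unlock_from_other_task, &result);
	assert(rc == 0);
	(void)rc;
	pthread_join(other, NULL);		/* "POSTCHANGE" task has run */

	pthread_mutex_unlock(&sketch_lock);	/* the owner releases it */
	pthread_mutex_destroy(&sketch_lock);
	return result;
}
```

[ With an error-checking mutex, cross_task_unlock_result() is expected to
return EPERM. This only mirrors the point being made here and is not
kernel code. ]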

Regards,
Srivatsa S. Bhat

--
Srivatsa S. Bhat March 19, 2014, 5:38 p.m. UTC | #9
On 03/19/2014 08:18 PM, Srivatsa S. Bhat wrote:
> On 03/19/2014 07:05 PM, Viresh Kumar wrote:
>> On 19 March 2014 17:45, Srivatsa S. Bhat
>> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>>> +       bool                    transition_ongoing; /* Tracks transition status */
>>> +       struct mutex            transition_lock;
>>> +       wait_queue_head_t       transition_wait;
>>
>> Similar to what I have done in my last version, why do you need
>> transition_ongoing and transition_wait? Simply work with
>> transition_lock? i.e. Acquire it for the complete transition sequence.
>>
> 
> We *can't* acquire it for the complete transition sequence
> in case of drivers that do asynchronous notification, because
> PRECHANGE is done in one thread and POSTCHANGE is done in a
> totally different thread! You can't acquire a lock in one
> task and release it in a different task. That would be a
> fundamental violation of locking.
> 
> That's why I introduced the wait queue to help us create
> a "flow" which encompasses 2 different, but co-ordinating
> tasks. You simply can't do that elegantly by using plain
> locks alone.
> 

By the way, note the updated changelog in my patch. It includes a brief
overview of the synchronization design, which is copy-pasted below for
reference. I forgot to mention this earlier!

-----

This patch introduces a set of synchronization primitives to serialize
frequency transitions, which are to be used as shown below:

cpufreq_freq_transition_begin();

//Perform the frequency change

cpufreq_freq_transition_end();

The _begin() call sends the PRECHANGE notification whereas the _end() call
sends the POSTCHANGE notification. Also, all the necessary synchronization
is handled within these calls. In particular, even drivers which set the
ASYNC_NOTIFICATION flag can use these APIs for performing frequency
transitions (i.e., you can call _begin() from one task, and call the
corresponding _end() from a different task).

The actual synchronization underneath is not that complicated:

The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock' mutex and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
mutex-lock is used to ensure that only one such "flow" is in flight at any
given time. Put together, this provides us all the necessary synchronization.
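
[ The design described above can be sketched in user space roughly as
follows. This is a minimal sketch, not the patch itself: it assumes POSIX
threads, uses a condition variable where the patch uses a wait-queue, and
all names (policy_sketch, sketch_transition_begin, ...) are hypothetical.
Because pthread_cond_wait() drops and retakes the mutex atomically, the
flag is tested and set within one lock hold. ]

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TRANSITIONS 4

/* Hypothetical user-space stand-in for the per-policy fields. */
struct policy_sketch {
	bool transition_ongoing;		/* tracks transition status */
	pthread_mutex_t transition_lock;	/* protects the flag */
	pthread_cond_t transition_wait;		/* plays the wait-queue role */
	int completed;				/* transitions finished so far */
};

static void sketch_transition_begin(struct policy_sketch *p)
{
	pthread_mutex_lock(&p->transition_lock);
	/* Sleep until no other "flow" is in flight, then claim it. */
	while (p->transition_ongoing)
		pthread_cond_wait(&p->transition_wait, &p->transition_lock);
	p->transition_ongoing = true;
	pthread_mutex_unlock(&p->transition_lock);
	/* The PRECHANGE notification would be sent here. */
}

static void sketch_transition_end(struct policy_sketch *p)
{
	/* The POSTCHANGE notification would be sent here. */
	pthread_mutex_lock(&p->transition_lock);
	p->transition_ongoing = false;
	p->completed++;
	pthread_mutex_unlock(&p->transition_lock);
	pthread_cond_broadcast(&p->transition_wait);
}

/* Worker standing in for the context that finishes an async transition. */
static void *async_end(void *arg)
{
	sketch_transition_end(arg);
	return NULL;
}

/* Begin each transition in this thread, end it in a different one. */
static int run_async_transitions(void)
{
	struct policy_sketch p = { .transition_ongoing = false, .completed = 0 };
	pthread_t workers[NR_TRANSITIONS];
	int i, rc;

	pthread_mutex_init(&p.transition_lock, NULL);
	pthread_cond_init(&p.transition_wait, NULL);

	for (i = 0; i < NR_TRANSITIONS; i++) {
		/* Blocks until the previous worker has ended its flow. */
		sketch_transition_begin(&p);
		rc = pthread_create(&workers[i], NULL, async_end, &p);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < NR_TRANSITIONS; i++)
		pthread_join(workers[i], NULL);

	assert(!p.transition_ongoing);
	pthread_cond_destroy(&p.transition_wait);
	pthread_mutex_destroy(&p.transition_lock);
	return p.completed;
}
```

[ Each call to sketch_transition_begin() after the first has to wait for
the worker thread of the previous transition, so the begin/end pairs never
overlap even though they run in different threads. ]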

Regards,
Srivatsa S. Bhat

--
Viresh Kumar March 20, 2014, 4:39 a.m. UTC | #10
On 19 March 2014 17:45, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 199b52b..e90388f 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -349,6 +349,38 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
>  EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
>
>
> +void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
> +               struct cpufreq_freqs *freqs, unsigned int state)
> +{
> +wait:
> +       wait_event(&policy->transition_wait, !policy->transition_ongoing);

I think it's broken here. At this point another thread can come in, take the lock,
update transition_ongoing, send the notification and finally unlock.

And after that we can take the lock and send another notification.

Correct?

> +       if (!mutex_trylock(&policy->transition_lock))
> +               goto wait;
> +
> +       policy->transition_ongoing++;

s/++/ = true

> +       cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
> +
> +       mutex_unlock(&policy->transition_lock);

We can release the lock before sending notifications; it's there just to
protect transition_ongoing.
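
To make the window concrete, here is a small user-space sketch (POSIX
threads, all names hypothetical, not the patch code) of the pattern that
closes it: the flag is tested and set within a single lock hold, so of
several racing callers only one can see itself as the owner of the
transition.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_RACERS 8

static pthread_mutex_t claim_lock = PTHREAD_MUTEX_INITIALIZER;
static bool claim_ongoing;	/* stand-in for policy->transition_ongoing */
static int claim_winners;	/* callers that became the transition owner */

/* Test and set the flag in one lock hold; nobody can slip in between. */
static bool try_claim_transition(void)
{
	bool won = false;

	pthread_mutex_lock(&claim_lock);
	if (!claim_ongoing) {
		claim_ongoing = true;
		won = true;
	}
	pthread_mutex_unlock(&claim_lock);
	return won;
}

static void *racer(void *unused)
{
	(void)unused;
	if (try_claim_transition()) {
		pthread_mutex_lock(&claim_lock);
		claim_winners++;
		pthread_mutex_unlock(&claim_lock);
	}
	return NULL;
}

/* Let NR_RACERS threads race for one transition and count the owners. */
static int count_claim_winners(void)
{
	pthread_t threads[NR_RACERS];
	int i, rc;

	claim_ongoing = false;
	claim_winners = 0;
	for (i = 0; i < NR_RACERS; i++) {
		rc = pthread_create(&threads[i], NULL, racer, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < NR_RACERS; i++)
		pthread_join(threads[i], NULL);
	return claim_winners;
}
```

Since nobody clears the flag in this sketch, exactly one racer wins. A
check made outside the lock (as with the wait_event() above) can only be
a hint and has to be repeated once the lock is held.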
--
Viresh Kumar March 20, 2014, 8:37 a.m. UTC | #11
On 20 March 2014 14:02, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 199b52b..5283f10 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -349,6 +349,39 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
>  EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
>
>
> +void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
> +               struct cpufreq_freqs *freqs, unsigned int state)
> +{
> +wait:
> +       wait_event(&policy->transition_wait, !policy->transition_ongoing);
> +
> +       mutex_lock(&policy->transition_lock);
> +
> +       if (policy->transition_ongoing) {
> +               mutex_unlock(&policy->transition_lock);
> +               goto wait;
> +       }
> +
> +       policy->transition_ongoing = true;
> +
> +       mutex_unlock(&policy->transition_lock);
> +
> +       cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
> +}
> +
> +void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
> +               struct cpufreq_freqs *freqs, unsigned int state)
> +{
> +       cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
> +
> +       mutex_lock(&policy->transition_lock);

Why do we need locking here? You explained that earlier :)

Also, I would like to add this here:

    WARN_ON(policy->transition_ongoing);

> +       policy->transition_ongoing = false;
> +       mutex_unlock(&policy->transition_lock);
> +
> +       wake_up(&policy->transition_wait);
> +}
--
Srivatsa S. Bhat March 20, 2014, 9:24 a.m. UTC | #12
On 03/20/2014 02:07 PM, Viresh Kumar wrote:
> On 20 March 2014 14:02, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 199b52b..5283f10 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -349,6 +349,39 @@ void cpufreq_notify_post_transition(struct cpufreq_policy *policy,
>>  EXPORT_SYMBOL_GPL(cpufreq_notify_post_transition);
>>
>>
>> +void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
>> +               struct cpufreq_freqs *freqs, unsigned int state)
>> +{
>> +wait:
>> +       wait_event(&policy->transition_wait, !policy->transition_ongoing);
>> +
>> +       mutex_lock(&policy->transition_lock);
>> +
>> +       if (policy->transition_ongoing) {
>> +               mutex_unlock(&policy->transition_lock);
>> +               goto wait;
>> +       }
>> +
>> +       policy->transition_ongoing = true;
>> +
>> +       mutex_unlock(&policy->transition_lock);
>> +
>> +       cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
>> +}
>> +
>> +void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
>> +               struct cpufreq_freqs *freqs, unsigned int state)
>> +{
>> +       cpufreq_notify_transition(policy, freqs, CPUFREQ_POSTCHANGE);
>> +
>> +       mutex_lock(&policy->transition_lock);
> 
> Why do we need locking here? You explained that earlier :)
> 

Hmm.. I had thought of some complex race condition which would make
tasks miss the wake-up event and sleep forever, and hence added
the locking there to prevent that. But now that I think more closely,
I'm not able to recall that race... I will give some more thought to
it and if I can't find any loopholes in doing the second update to
the ongoing flag without locks, then I'll post the patchset with
that lockless version itself.

> Also, I would like to add this here:
> 
>     WARN_ON(policy->transition_ongoing);
>

Hmm? Won't it always be true? We are the ones who set that flag to
true earlier, right? I guess you meant WARN_ON(!policy->transition_ongoing)
perhaps? I'm not sure whether it's really worth it, because it kinda looks
obvious. Not sure what kind of bugs it would catch. I can't think of any
such scenario :-(
 
>> +       policy->transition_ongoing = false;
>> +       mutex_unlock(&policy->transition_lock);
>> +
>> +       wake_up(&policy->transition_wait);
>> +}
> 

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
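
The begin/end pair quoted in message #12 can be modeled in userspace. This is a rough sketch only, assuming pthreads as a stand-in for the kernel primitives: a pthread mutex plays the role of policy->transition_lock and a condition variable plays the role of the policy->transition_wait wait queue. The struct and function names below are hypothetical and do not come from the patch.

```c
/* Userspace analogue only: pthreads stand in for the kernel primitives. */
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the quoted patch adds to the policy. */
struct policy_model {
	pthread_mutex_t transition_lock;   /* models policy->transition_lock */
	pthread_cond_t  transition_wait;   /* models policy->transition_wait */
	bool            transition_ongoing;
};

static void policy_model_init(struct policy_model *p)
{
	pthread_mutex_init(&p->transition_lock, NULL);
	pthread_cond_init(&p->transition_wait, NULL);
	p->transition_ongoing = false;
}

/* Block until no transition is in flight, then claim the policy. */
static void transition_begin(struct policy_model *p)
{
	pthread_mutex_lock(&p->transition_lock);
	while (p->transition_ongoing)
		pthread_cond_wait(&p->transition_wait, &p->transition_lock);
	p->transition_ongoing = true;
	pthread_mutex_unlock(&p->transition_lock);
	/* The PRECHANGE notification would be sent here. */
}

/* Release the policy and wake every waiter so one of them can claim it. */
static void transition_end(struct policy_model *p)
{
	/* The POSTCHANGE notification would be sent here. */
	pthread_mutex_lock(&p->transition_lock);
	p->transition_ongoing = false;
	pthread_mutex_unlock(&p->transition_lock);
	pthread_cond_broadcast(&p->transition_wait);
}
```

In this pthread model the lock in transition_end() is what rules out a lost wake-up, because a waiter tests the flag and goes to sleep while holding the mutex. Whether the kernel version needs the same lock is the open question in the message above; wait_event() re-checks its condition after queueing the task, so the answer may differ there.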
Viresh Kumar March 20, 2014, 9:33 a.m. UTC | #13
On 20 March 2014 14:54, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> On 03/20/2014 02:07 PM, Viresh Kumar wrote:
>>     WARN_ON(policy->transition_ongoing);
>>
>
> I guess you meant WARN_ON(!policy->transition_ongoing)
> perhaps?

Ooops!!

> I'm not sure whether it's really worth it, because it kinda looks
> obvious. Not sure what kind of bugs it would catch. I can't think of any
> such scenario :-(

Just to catch the case where somebody sends a POSTCHANGE notification
without first sending a PRECHANGE one. It's just another check to make
sure things are in order.
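
The check being proposed here can be sketched in plain C. This is only an illustration under assumptions: fprintf() stands in for the kernel's WARN_ON(), and the struct and function names are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the per-policy flag from the quoted patch. */
struct flag_model {
	bool transition_ongoing;
};

static void checked_transition_begin(struct flag_model *p)
{
	p->transition_ongoing = true;
}

/* Returns 0 on success, -1 if called without a matching begin. */
static int checked_transition_end(struct flag_model *p)
{
	if (!p->transition_ongoing) {
		/* Stand-in for WARN_ON(!policy->transition_ongoing). */
		fprintf(stderr, "warning: POSTCHANGE without PRECHANGE\n");
		return -1;
	}
	p->transition_ongoing = false;
	return 0;
}
```

An end call with no preceding begin, or a second end after a single begin, takes the warning path and leaves the flag untouched.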
Srivatsa S. Bhat March 20, 2014, 9:45 a.m. UTC | #14
On 03/20/2014 03:03 PM, Viresh Kumar wrote:
> On 20 March 2014 14:54, Srivatsa S. Bhat
> <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>> On 03/20/2014 02:07 PM, Viresh Kumar wrote:
>>>     WARN_ON(policy->transition_ongoing);
>>>
>>
>> I guess you meant WARN_ON(!policy->transition_ongoing)
>> perhaps?
> 
> Ooops!!
> 
>> I'm not sure whether it's really worth it, because it kinda looks
>> obvious. Not sure what kind of bugs it would catch. I can't think of any
>> such scenario :-(
> 
> Just to catch the case where somebody sends a POSTCHANGE notification
> without first sending a PRECHANGE one. It's just another check to make
> sure things are in order.
> 

Well, that's unlikely, since they will have to call _end() before
_begin() :-) That's the power of having great function names - they make
it impossible to use them incorrectly ;-) But anyway, I can add the check,
just in case somebody misses even such an obvious cue! :-)

By the way, I'm also thinking of using a spinlock instead of a mutex.
The critical section is tiny and we don't sleep inside the critical
section - sounds like the perfect case for a spinlock.

Regards,
Srivatsa S. Bhat

Viresh Kumar March 20, 2014, 9:50 a.m. UTC | #15
On 20 March 2014 15:15, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> By the way, I'm also thinking of using a spinlock instead of a mutex.
> The critical section is tiny and we don't sleep inside the critical
> section - sounds like the perfect case for a spinlock.

Probably yes.
--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
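
The spinlock idea from the last two messages can be illustrated in userspace. This is a sketch under assumptions, not kernel code: a C11 atomic_flag busy-wait loop stands in for spinlock_t, and all names are hypothetical. The critical section only tests and sets a flag and never sleeps, which is the property that makes a spinlock a candidate.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: an atomic_flag plays the role of a spinlock_t. */
struct spin_model {
	atomic_flag lock;
	bool        transition_ongoing;
};

static void spin_model_init(struct spin_model *p)
{
	atomic_flag_clear(&p->lock);
	p->transition_ongoing = false;
}

static void model_spin_lock(atomic_flag *l)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Try to claim the policy; true means this caller owns the transition. */
static bool try_transition_begin(struct spin_model *p)
{
	bool claimed = false;

	model_spin_lock(&p->lock);
	if (!p->transition_ongoing) {
		p->transition_ongoing = true;
		claimed = true;
	}
	model_spin_unlock(&p->lock);
	return claimed;
}

static void spin_transition_end(struct spin_model *p)
{
	model_spin_lock(&p->lock);
	p->transition_ongoing = false;
	model_spin_unlock(&p->lock);
}
```

A caller whose claim fails would go back to waiting outside the lock, which is what the wait_event() call and the goto in the quoted patch do; the sleeping never happens while the lock is held.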

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 2677ff1..3b9eac4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -335,8 +335,15 @@  static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 void cpufreq_notify_transition(struct cpufreq_policy *policy,
                struct cpufreq_freqs *freqs, unsigned int state)
 {
+       if (state == CPUFREQ_PRECHANGE)
+               mutex_lock(&policy->transition_lock);
+
+       /* Send notifications */
        for_each_cpu(freqs->cpu, policy->cpus)
                __cpufreq_notify_transition(policy, freqs, state);
+
+       if (state == CPUFREQ_POSTCHANGE)
+               mutex_unlock(&policy->transition_lock);
 }
 EXPORT_SYMBOL_GPL(cpufreq_notify_transition);

@@ -983,6 +990,7 @@  static struct cpufreq_policy *cpufreq_policy_alloc(void)

        INIT_LIST_HEAD(&policy->policy_list);
        init_rwsem(&policy->rwsem);
+       mutex_init(&policy->transition_lock);

        return policy;

diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 31c431e..5f9209a 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -104,6 +104,7 @@  struct cpufreq_policy {
         *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
         */
        struct rw_semaphore     rwsem;
+       struct mutex            transition_lock;
 };
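
The locking pattern in the patch above, where the lock is taken on PRECHANGE and released on POSTCHANGE, can be modeled in userspace as follows. This is a sketch with hypothetical names, using a pthread mutex as a stand-in for the kernel mutex. One assumption worth stating: a kernel mutex has to be released by the task that acquired it, so this scheme relies on both notifications coming from the same task.

```c
#include <pthread.h>

/* Hypothetical stand-ins for CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE. */
enum model_state { MODEL_PRECHANGE, MODEL_POSTCHANGE };

struct lock_model {
	pthread_mutex_t transition_lock;    /* models policy->transition_lock */
	int             notifications_sent; /* counts simulated notifications */
};

/* The lock is held from the PRECHANGE call until the POSTCHANGE call. */
static void model_notify_transition(struct lock_model *p,
				    enum model_state state)
{
	if (state == MODEL_PRECHANGE)
		pthread_mutex_lock(&p->transition_lock);

	/* Stand-in for the per-CPU notification loop. */
	p->notifications_sent++;

	if (state == MODEL_POSTCHANGE)
		pthread_mutex_unlock(&p->transition_lock);
}
```

A caller would invoke model_notify_transition() with MODEL_PRECHANGE, change the frequency, and then invoke it again with MODEL_POSTCHANGE; any second transition attempted in between blocks on the mutex until the first one completes.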
--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@vger.kernel.org