[2/5] clk: notifier handler for dynamic voltage scaling

Message ID 1362026969-11457-3-git-send-email-mturquette@linaro.org
State New

Commit Message

Mike Turquette Feb. 28, 2013, 4:49 a.m.
Dynamic voltage and frequency scaling (dvfs) is a common power saving
technique in many of today's modern processors.  This patch introduces a
common clk rate-change notifier handler which scales voltage
appropriately whenever clk_set_rate is called on an affected clock.

There are three prerequisites to using this feature:

1) the affected clocks must be using the common clk framework
2) voltage must be scaled using the regulator framework
3) clock frequency and regulator voltage values must be paired via the
OPP library

If a platform or device meets these requirements then using the notifier
handler is straightforward.  A struct device is used as the basis for
performing initial look-ups for clocks via clk_get and regulators via
regulator_get.  This means that notifiers are subscribed on a per-device
basis and multiple devices can have notifiers subscribed to the same
clock.  Put another way, the voltage chosen for a rail during a call to
clk_set_rate is a function of the device, not the clock.

Signed-off-by: Mike Turquette <mturquette@linaro.org>
---
 drivers/clk/Makefile |    1 +
 drivers/clk/dvfs.c   |  125 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/clk.h  |   27 ++++++++++-
 3 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 drivers/clk/dvfs.c

Comments

Bill Huang March 1, 2013, 9:41 a.m. | #1
On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> Dynamic voltage and frequency scaling (dvfs) is a common power saving
> technique in many of today's modern processors.  This patch introduces a
> common clk rate-change notifier handler which scales voltage
> appropriately whenever clk_set_rate is called on an affected clock.

I really think clk_enable and clk_disable should also trigger notifier
calls, and DVFS should act accordingly, since there are cases where a
driver won't set the clock rate but will instead disable its clock
directly.  Do you agree?
> 
> There are three prerequisites to using this feature:
> 
> 1) the affected clocks must be using the common clk framework
> 2) voltage must be scaled using the regulator framework
> 3) clock frequency and regulator voltage values must be paired via the
> OPP library

Just a note: Tegra Core won't meet prerequisite #3, since each regulator
voltage value is associated with the clocks driving the many sub-HW
blocks in it.
Mike Turquette March 1, 2013, 6:22 p.m. | #2
Quoting Bill Huang (2013-03-01 01:41:31)
> On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > technique in many of today's modern processors.  This patch introduces a
> > common clk rate-change notifier handler which scales voltage
> > appropriately whenever clk_set_rate is called on an affected clock.
> 
> I really think clk_enable and clk_disable should also be triggering
> notifier call and DVFS should act accordingly since there are cases
> drivers won't set clock rate but instead disable its clock directly, do
> you agree?
> > 

Hi Bill,

I'll think about this.  Perhaps a better solution would be to adapt
these drivers to runtime PM.  Then a call to pm_runtime_put() would
result in a call to clk_disable(...) and regulator_set_voltage(...).

There is no performance-based equivalent to runtime PM, which is one
reason why clk_set_rate is a likely entry point into dvfs.  But for
operations that have nice APIs like runtime PM it would be better to
use those interfaces and not overload the clk.h API unnecessarily.

> > There are three prerequisites to using this feature:
> > 
> > 1) the affected clocks must be using the common clk framework
> > 2) voltage must be scaled using the regulator framework
> > 3) clock frequency and regulator voltage values must be paired via the
> > OPP library
> 
> Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> voltage values is associated with clocks driving those many sub-HW
> blocks in it.

This patch isn't the one and only way to perform dvfs.  It is just a
helper function for registering notifier handlers for systems that meet
the above three requirements.  Even if you do not use the OPP library
there is no reason why you could not register your own rate-change
notifier handler to implement dvfs using whatever lookup-table you use
today.
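
To illustrate the paragraph above, here is a minimal userspace sketch of
a rate-change handler driven by a private lookup table instead of the
OPP library.  It is not the code from this patch.  The event names
mirror the rate-change events in include/linux/clk.h, but everything
else (struct dvfs_ctx, lookup_uV, dvfs_notify, the table layout) is
hypothetical, and the rail is modelled as a plain integer.  The ordering
it assumes is the usual one for DVFS: raise the voltage before the rate
goes up, lower it only after the rate has gone down.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the clk rate-change events (model only). */
enum rate_event { PRE_RATE_CHANGE, POST_RATE_CHANGE, ABORT_RATE_CHANGE };

/* Hypothetical private lookup table entry: rate -> required voltage. */
struct opp_entry {
	unsigned long rate_hz;
	int uV;
};

struct dvfs_ctx {
	const struct opp_entry *table;	/* sorted by ascending rate */
	size_t n;
	int cur_uV;			/* models the regulator output */
};

/* Voltage of the slowest entry that still covers 'rate'; -1 if none. */
static int lookup_uV(const struct dvfs_ctx *ctx, unsigned long rate)
{
	size_t i;

	for (i = 0; i < ctx->n; i++)
		if (ctx->table[i].rate_hz >= rate)
			return ctx->table[i].uV;
	return -1;
}

/* Hypothetical handler: returns 0 on success, -1 to veto the change. */
static int dvfs_notify(struct dvfs_ctx *ctx, enum rate_event ev,
		       unsigned long old_rate, unsigned long new_rate)
{
	int old_uV, new_uV;

	assert(ctx != NULL);
	old_uV = lookup_uV(ctx, old_rate);
	new_uV = lookup_uV(ctx, new_rate);
	if (old_uV < 0 || new_uV < 0)
		return -1;

	switch (ev) {
	case PRE_RATE_CHANGE:
		if (new_rate > old_rate)
			ctx->cur_uV = new_uV;	/* raise before speeding up */
		break;
	case POST_RATE_CHANGE:
		if (new_rate < old_rate)
			ctx->cur_uV = new_uV;	/* lower after slowing down */
		break;
	case ABORT_RATE_CHANGE:
		ctx->cur_uV = old_uV;		/* undo any early raise */
		break;
	}
	return 0;
}
```

In a real driver the assignments to cur_uV would presumably be
regulator_set_voltage() calls, and the handler would be hooked up with
clk_notifier_register().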

And patches are welcome to extend the usefulness of this helper.  I'd
like as many people to benefit from this mechanism as possible.

At some point we should think hard about DT bindings for these operating
points...

Regards,
Mike
Mike Turquette March 1, 2013, 8:48 p.m. | #3
Quoting Mike Turquette (2013-03-01 10:22:34)
> Quoting Bill Huang (2013-03-01 01:41:31)
> > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > technique in many of today's modern processors.  This patch introduces a
> > > common clk rate-change notifier handler which scales voltage
> > > appropriately whenever clk_set_rate is called on an affected clock.
> > 
> > I really think clk_enable and clk_disable should also be triggering
> > notifier call and DVFS should act accordingly since there are cases
> > drivers won't set clock rate but instead disable its clock directly, do
> > you agree?
> > > 
> 
> Hi Bill,
> 
> I'll think about this.  Perhaps a better solution would be to adapt
> these drivers to runtime PM.  Then a call to runtime_pm_put() would
> result in a call to clk_disable(...) and regulator_set_voltage(...).
> 
> There is no performance-based equivalent to runtime PM, which is one
> reason why clk_set_rate is a likely entry point into dvfs.  But for
> operations that have nice api's like runtime PM it would be better to
> use those interfaces and not overload the clk.h api unnecessarily.
> 

Bill,

I wasn't thinking at all when I wrote this.  Trying to rush to the
airport I guess...

clk_enable() and clk_disable() must not sleep and all operations are
done under a spinlock.  So this rules out most use of notifiers.  It is
expected for some drivers to very aggressively enable/disable clocks in
interrupt handlers so scaling voltage as a function of clk_{en|dis}able
calls is also likely out of the question.

Some platforms have chosen to implement voltage scaling in their
.prepare callbacks.  I personally do not like this and still prefer
drivers be adapted to runtime pm and let those callbacks handle voltage
scaling along with clock handling.

Regards,
Mike

> > > There are three prerequisites to using this feature:
> > > 
> > > 1) the affected clocks must be using the common clk framework
> > > 2) voltage must be scaled using the regulator framework
> > > 3) clock frequency and regulator voltage values must be paired via the
> > > OPP library
> > 
> > Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> > voltage values is associated with clocks driving those many sub-HW
> > blocks in it.
> 
> This patch isn't the one and only way to perform dvfs.  It is just a
> helper function for registering notifier handlers for systems that meet
> the above three requirements.  Even if you do not use the OPP library
> there is no reason why you could not register your own rate-change
> notifier handler to implement dvfs using whatever lookup-table you use
> today.
> 
> And patches are welcome to extend the usefulness of this helper.  I'd
> like as many people to benefit from this mechanism as possible.
> 
> At some point we should think hard about DT bindings for these operating
> points...
> 
> Regards,
> Mike
Stephen Warren March 1, 2013, 8:49 p.m. | #4
On 03/01/2013 02:41 AM, Bill Huang wrote:
> On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
>> Dynamic voltage and frequency scaling (dvfs) is a common power saving
>> technique in many of today's modern processors.  This patch introduces a
>> common clk rate-change notifier handler which scales voltage
>> appropriately whenever clk_set_rate is called on an affected clock.
> 
> I really think clk_enable and clk_disable should also be triggering
> notifier call and DVFS should act accordingly since there are cases
> drivers won't set clock rate but instead disable its clock directly, do
> you agree?
>>
>> There are three prerequisites to using this feature:
>>
>> 1) the affected clocks must be using the common clk framework
>> 2) voltage must be scaled using the regulator framework
>> 3) clock frequency and regulator voltage values must be paired via the
>> OPP library
> 
> Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> voltage values is associated with clocks driving those many sub-HW
> blocks in it.

Perhaps that "just" means extending the dvfs.c code here to iterate over
each clock consumer (rather than each clock provider), and having each
set a minimum voltage (rather than a specific voltage), and having the
regulator core apply the maximum of those minimum constraints?

Or something like that anyway.
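
A rough userspace sketch of that aggregation, assuming each consumer of
a shared rail only reports the minimum voltage it needs at its current
rate.  The struct and function names are hypothetical and this is a
model, not regulator framework code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one consumer on a shared rail (not kernel API). */
struct rail_consumer {
	const char *name;
	int min_uV;	/* lowest voltage this consumer tolerates right now */
};

/*
 * The rail has to satisfy every consumer at once, so the target is the
 * largest of the per-consumer minimums.
 */
static int rail_target_uV(const struct rail_consumer *c, size_t n)
{
	size_t i;
	int target;

	assert(c != NULL && n > 0);
	target = c[0].min_uV;
	for (i = 1; i < n; i++)
		if (c[i].min_uV > target)
			target = c[i].min_uV;
	return target;
}
```

With three consumers asking for 950000, 1100000 and 1000000 uV the rail
would sit at 1100000 uV.  Once the second consumer drops its request to
900000 uV, the rail could fall to 1000000 uV.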
Bill Huang March 2, 2013, 2:55 a.m. | #5
On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
> Quoting Mike Turquette (2013-03-01 10:22:34)
> > Quoting Bill Huang (2013-03-01 01:41:31)
> > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > > technique in many of today's modern processors.  This patch introduces a
> > > > common clk rate-change notifier handler which scales voltage
> > > > appropriately whenever clk_set_rate is called on an affected clock.
> > > 
> > > I really think clk_enable and clk_disable should also be triggering
> > > notifier call and DVFS should act accordingly since there are cases
> > > drivers won't set clock rate but instead disable its clock directly, do
> > > you agree?
> > > > 
> > 
> > Hi Bill,
> > 
> > I'll think about this.  Perhaps a better solution would be to adapt
> > these drivers to runtime PM.  Then a call to runtime_pm_put() would
> > result in a call to clk_disable(...) and regulator_set_voltage(...).
> > 
> > There is no performance-based equivalent to runtime PM, which is one
> > reason why clk_set_rate is a likely entry point into dvfs.  But for
> > operations that have nice api's like runtime PM it would be better to
> > use those interfaces and not overload the clk.h api unnecessarily.
> > 
> 
> Bill,
> 
> I wasn't thinking at all when I wrote this.  Trying to rush to the
> airport I guess...
> 
> clk_enable() and clk_disable() must not sleep and all operations are
> done under a spinlock.  So this rules out most use of notifiers.  It is
> expected for some drivers to very aggressively enable/disable clocks in
> interrupt handlers so scaling voltage as a function of clk_{en|dis}able
> calls is also likely out of the question.

Yeah, existing drivers that enable/disable clocks in interrupt context
rule this out.  I didn't think it through when I was asking.
> 
> Some platforms have chosen to implement voltage scaling in their
> .prepare callbacks.  I personally do not like this and still prefer
> drivers be adapted to runtime pm and let those callbacks handle voltage
> scaling along with clock handling.

I think different SoCs have different mechanisms or constraints for
doing their DVFS.  Take the Tegra VDD_CORE rail: it supplies power to
many devices, so each device does not have its own power rail to
control.  Instead, a central driver is needed to control this rail
(setting the voltage to the maximum of the voltages requested by all its
sub-devices).  So I'm wondering whether, even if every driver does DVFS
through runtime PM, we would still have a hole in knowing whether the
clocks of the devices of interest are enabled or disabled at runtime.
I'm not familiar with runtime PM and so do not know if there is a
mechanism to handle this.  I'll study it a bit.  Thanks.
> 
> Regards,
> Mike
> 
> > > > There are three prerequisites to using this feature:
> > > > 
> > > > 1) the affected clocks must be using the common clk framework
> > > > 2) voltage must be scaled using the regulator framework
> > > > 3) clock frequency and regulator voltage values must be paired via the
> > > > OPP library
> > > 
> > > Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> > > voltage values is associated with clocks driving those many sub-HW
> > > blocks in it.
> > 
> > This patch isn't the one and only way to perform dvfs.  It is just a
> > helper function for registering notifier handlers for systems that meet
> > the above three requirements.  Even if you do not use the OPP library
> > there is no reason why you could not register your own rate-change
> > notifier handler to implement dvfs using whatever lookup-table you use
> > today.
> > 
> > And patches are welcome to extend the usefulness of this helper.  I'd
> > like as many people to benefit from this mechanism as possible.

The extension is not so easy for us, though, since the OPP library
assumes each device has a 1:1 mapping between its operating frequency
and voltage.
> > 
> > At some point we should think hard about DT bindings for these operating
> > points...
> > 
> > Regards,
> > Mike
Bill Huang March 2, 2013, 2:58 a.m. | #6
On Sat, 2013-03-02 at 04:49 +0800, Stephen Warren wrote:
> On 03/01/2013 02:41 AM, Bill Huang wrote:
> > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> >> There are three prerequisites to using this feature:
> >>
> >> 1) the affected clocks must be using the common clk framework
> >> 2) voltage must be scaled using the regulator framework
> >> 3) clock frequency and regulator voltage values must be paired via the
> >> OPP library
> > 
> > Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> > voltage values is associated with clocks driving those many sub-HW
> > blocks in it.
> 
> Perhaps that "just" means extending the dvfs.c code here to iterate over
> each clock consumer (rather than each clock provider), and having each
> set a minimum voltage (rather than a specific voltage), and having the
> regulator core apply the maximum of those minimum constraints?
> 
> Or something like that anyway.

Thanks, I'll think about this and maybe study it a bit.  It sounds like
we can leverage an existing API in the regulator framework (which I
don't know well) to do what you've proposed.  Please correct me if I've
misunderstood.
Richard Zhao March 2, 2013, 8:22 a.m. | #7
On Fri, Mar 01, 2013 at 06:55:54PM -0800, Bill Huang wrote:
> On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
> > Quoting Mike Turquette (2013-03-01 10:22:34)
> > > Quoting Bill Huang (2013-03-01 01:41:31)
> > > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > > > technique in many of today's modern processors.  This patch introduces a
> > > > > common clk rate-change notifier handler which scales voltage
> > > > > appropriately whenever clk_set_rate is called on an affected clock.
> > > > 
> > > > I really think clk_enable and clk_disable should also be triggering
> > > > notifier call and DVFS should act accordingly since there are cases
> > > > drivers won't set clock rate but instead disable its clock directly, do
> > > > you agree?
> > > > > 
> > > 
> > > Hi Bill,
> > > 
> > > I'll think about this.  Perhaps a better solution would be to adapt
> > > these drivers to runtime PM.  Then a call to runtime_pm_put() would
> > > result in a call to clk_disable(...) and regulator_set_voltage(...).
> > > 
> > > There is no performance-based equivalent to runtime PM, which is one
> > > reason why clk_set_rate is a likely entry point into dvfs.  But for
> > > operations that have nice api's like runtime PM it would be better to
> > > use those interfaces and not overload the clk.h api unnecessarily.
> > > 
> > 
> > Bill,
> > 
> > I wasn't thinking at all when I wrote this.  Trying to rush to the
> > airport I guess...
> > 
> > clk_enable() and clk_disable() must not sleep and all operations are
> > done under a spinlock.  So this rules out most use of notifiers.  It is
> > expected for some drivers to very aggressively enable/disable clocks in
> > interrupt handlers so scaling voltage as a function of clk_{en|dis}able
> > calls is also likely out of the question.
> 
> Yeah for those existing drivers to call enable/disable clocks in
> interrupt have ruled out this, I didn't think through when I was asking.
> > 
> > Some platforms have chosen to implement voltage scaling in their
> > .prepare callbacks.  I personally do not like this and still prefer
> > drivers be adapted to runtime pm and let those callbacks handle voltage
> > scaling along with clock handling.
Voltage scaling in clock notifiers seems similar. Embedding clock and
regulator code into each other will complicate things.
> 
> I think different SoC have different mechanisms or constraints on doing
> their DVFS, such as Tegra VDD_CORE rail, it supplies power to many
> devices and as a consequence each device do not have their own power
> rail to control, instead a central driver to handle/control this power
> rail is needed (to set voltage at the maximum of the requested voltage
> from all its sub-devices), so I'm wondering even if every drivers are
> doing DVFS through runtime pm, we're still having hole on knowing
> whether or not clocks of the interested devices are enabled/disabled at
> runtime, I'm not familiar with runtime pm and hence do not know if there
> is a mechanism to handle this, I'll study a bit. Thanks.
> > 
> > Regards,
> > Mike
> > 
> > > > > There are three prerequisites to using this feature:
> > > > > 
> > > > > 1) the affected clocks must be using the common clk framework
> > > > > 2) voltage must be scaled using the regulator framework
> > > > > 3) clock frequency and regulator voltage values must be paired via the
> > > > > OPP library
> > > > 
> > > > Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> > > > voltage values is associated with clocks driving those many sub-HW
> > > > blocks in it.
> > > 
> > > This patch isn't the one and only way to perform dvfs.  It is just a
> > > helper function for registering notifier handlers for systems that meet
> > > the above three requirements.  Even if you do not use the OPP library
> > > there is no reason why you could not register your own rate-change
> > > notifier handler to implement dvfs using whatever lookup-table you use
> > > today.
> > > 
> > > And patches are welcome to extend the usefulness of this helper.  I'd
> > > like as many people to benefit from this mechanism as possible.
> 
> The extension is not so easy for us though since OPP library is assuming
> each device has a 1-1 mapping on its operating frequency and voltage.
Is there someone working on OPP clock/regulator sets support?

Thanks
Richard
> > > 
> > > At some point we should think hard about DT bindings for these operating
> > > points...
> > > 
> > > Regards,
> > > Mike
Mike Turquette March 3, 2013, 10:54 a.m. | #8
Quoting Richard Zhao (2013-03-02 00:22:19)
> On Fri, Mar 01, 2013 at 06:55:54PM -0800, Bill Huang wrote:
> > On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
> > > Quoting Mike Turquette (2013-03-01 10:22:34)
> > > > Quoting Bill Huang (2013-03-01 01:41:31)
> > > > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > > > > technique in many of today's modern processors.  This patch introduces a
> > > > > > common clk rate-change notifier handler which scales voltage
> > > > > > appropriately whenever clk_set_rate is called on an affected clock.
> > > > > 
> > > > > I really think clk_enable and clk_disable should also be triggering
> > > > > notifier call and DVFS should act accordingly since there are cases
> > > > > drivers won't set clock rate but instead disable its clock directly, do
> > > > > you agree?
> > > > > > 
> > > > 
> > > > Hi Bill,
> > > > 
> > > > I'll think about this.  Perhaps a better solution would be to adapt
> > > > these drivers to runtime PM.  Then a call to runtime_pm_put() would
> > > > result in a call to clk_disable(...) and regulator_set_voltage(...).
> > > > 
> > > > There is no performance-based equivalent to runtime PM, which is one
> > > > reason why clk_set_rate is a likely entry point into dvfs.  But for
> > > > operations that have nice api's like runtime PM it would be better to
> > > > use those interfaces and not overload the clk.h api unnecessarily.
> > > > 
> > > 
> > > Bill,
> > > 
> > > I wasn't thinking at all when I wrote this.  Trying to rush to the
> > > airport I guess...
> > > 
> > > clk_enable() and clk_disable() must not sleep and all operations are
> > > done under a spinlock.  So this rules out most use of notifiers.  It is
> > > expected for some drivers to very aggressively enable/disable clocks in
> > > interrupt handlers so scaling voltage as a function of clk_{en|dis}able
> > > calls is also likely out of the question.
> > 
> > Yeah for those existing drivers to call enable/disable clocks in
> > interrupt have ruled out this, I didn't think through when I was asking.
> > > 
> > > Some platforms have chosen to implement voltage scaling in their
> > > .prepare callbacks.  I personally do not like this and still prefer
> > > drivers be adapted to runtime pm and let those callbacks handle voltage
> > > scaling along with clock handling.
> Voltage scaling in clock notifiers seems similar. Clock and regulater
> embedded code into each other will cause things complicated.

Hi Richard,

Sorry, I do not follow the above statement.  Can you clarify what you
mean?

> > 
> > I think different SoC have different mechanisms or constraints on doing
> > their DVFS, such as Tegra VDD_CORE rail, it supplies power to many
> > devices and as a consequence each device do not have their own power
> > rail to control, instead a central driver to handle/control this power
> > rail is needed (to set voltage at the maximum of the requested voltage
> > from all its sub-devices), so I'm wondering even if every drivers are
> > doing DVFS through runtime pm, we're still having hole on knowing
> > whether or not clocks of the interested devices are enabled/disabled at
> > runtime, I'm not familiar with runtime pm and hence do not know if there
> > is a mechanism to handle this, I'll study a bit. Thanks.
> > > 
> > > Regards,
> > > Mike
> > > 
> > > > > > There are three prerequisites to using this feature:
> > > > > > 
> > > > > > 1) the affected clocks must be using the common clk framework
> > > > > > 2) voltage must be scaled using the regulator framework
> > > > > > 3) clock frequency and regulator voltage values must be paired via the
> > > > > > OPP library
> > > > > 
> > > > > Just a note, Tegra Core won't meet prerequisite #3 since each regulator
> > > > > voltage values is associated with clocks driving those many sub-HW
> > > > > blocks in it.
> > > > 
> > > > This patch isn't the one and only way to perform dvfs.  It is just a
> > > > helper function for registering notifier handlers for systems that meet
> > > > the above three requirements.  Even if you do not use the OPP library
> > > > there is no reason why you could not register your own rate-change
> > > > notifier handler to implement dvfs using whatever lookup-table you use
> > > > today.
> > > > 
> > > > And patches are welcome to extend the usefulness of this helper.  I'd
> > > > like as many people to benefit from this mechanism as possible.
> > 
> > The extension is not so easy for us though since OPP library is assuming
> > each device has a 1-1 mapping on its operating frequency and voltage.
> Is there someone working on OPP clock/regulator sets support?
> 

No, but I'm going to bring this up at LCA on Wednesday.  It would be
nice to have some DT bindings for declaring operating points that tie
clocks & regulators together.

Regards,
Mike

> Thanks
> Richard
> > > > 
> > > > At some point we should think hard about DT bindings for these operating
> > > > points...
> > > > 
> > > > Regards,
> > > > Mike
Richard Zhao March 3, 2013, 1:27 p.m. | #9
Hi Mike,

On Sun, Mar 03, 2013 at 02:54:24AM -0800, Mike Turquette wrote:
> Quoting Richard Zhao (2013-03-02 00:22:19)
> > On Fri, Mar 01, 2013 at 06:55:54PM -0800, Bill Huang wrote:
> > > On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
> > > > Quoting Mike Turquette (2013-03-01 10:22:34)
> > > > > Quoting Bill Huang (2013-03-01 01:41:31)
> > > > > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > > > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > > > > > technique in many of today's modern processors.  This patch introduces a
> > > > > > > common clk rate-change notifier handler which scales voltage
> > > > > > > appropriately whenever clk_set_rate is called on an affected clock.
> > > > > > 
> > > > > > I really think clk_enable and clk_disable should also be triggering
> > > > > > notifier call and DVFS should act accordingly since there are cases
> > > > > > drivers won't set clock rate but instead disable its clock directly, do
> > > > > > you agree?
> > > > > > > 
> > > > > 
> > > > > Hi Bill,
> > > > > 
> > > > > I'll think about this.  Perhaps a better solution would be to adapt
> > > > > these drivers to runtime PM.  Then a call to runtime_pm_put() would
> > > > > result in a call to clk_disable(...) and regulator_set_voltage(...).
> > > > > 
> > > > > There is no performance-based equivalent to runtime PM, which is one
> > > > > reason why clk_set_rate is a likely entry point into dvfs.  But for
> > > > > operations that have nice api's like runtime PM it would be better to
> > > > > use those interfaces and not overload the clk.h api unnecessarily.
> > > > > 
> > > > 
> > > > Bill,
> > > > 
> > > > I wasn't thinking at all when I wrote this.  Trying to rush to the
> > > > airport I guess...
> > > > 
> > > > clk_enable() and clk_disable() must not sleep and all operations are
> > > > done under a spinlock.  So this rules out most use of notifiers.  It is
> > > > expected for some drivers to very aggressively enable/disable clocks in
> > > > interrupt handlers so scaling voltage as a function of clk_{en|dis}able
> > > > calls is also likely out of the question.
> > > 
> > > Yeah for those existing drivers to call enable/disable clocks in
> > > interrupt have ruled out this, I didn't think through when I was asking.
> > > > 
> > > > Some platforms have chosen to implement voltage scaling in their
> > > > .prepare callbacks.  I personally do not like this and still prefer
> > > > drivers be adapted to runtime pm and let those callbacks handle voltage
> > > > scaling along with clock handling.
> > Voltage scaling in clock notifiers seems similar. Clock and regulater
> > embedded code into each other will cause things complicated.
> 
> Hi Richard,
> 
> Sorry, I do not follow the above statement.  Can you clarify what you
> mean?
Since we agree that an operating point may involve multiple clocks and
regulators, this patch cannot support multiple clocks, and it might
mislead DVFS implementers into using the clock notifier.  It may be good
to have a unified API like dvfs_set_opp(opp), or drivers can handle
clocks and regulators themselves, which is more flexible.  What do you
think?

Thanks
Richard
Mike Turquette March 4, 2013, 7:25 a.m. | #10
Quoting Richard Zhao (2013-03-03 05:27:52)
> Hi Mike,
> 
> On Sun, Mar 03, 2013 at 02:54:24AM -0800, Mike Turquette wrote:
> > Quoting Richard Zhao (2013-03-02 00:22:19)
> > > On Fri, Mar 01, 2013 at 06:55:54PM -0800, Bill Huang wrote:
> > > > On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
> > > > > Quoting Mike Turquette (2013-03-01 10:22:34)
> > > > > > Quoting Bill Huang (2013-03-01 01:41:31)
> > > > > > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
> > > > > > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
> > > > > > > > technique in many of today's modern processors.  This patch introduces a
> > > > > > > > common clk rate-change notifier handler which scales voltage
> > > > > > > > appropriately whenever clk_set_rate is called on an affected clock.
> > > > > > > 
> > > > > > > I really think clk_enable and clk_disable should also be triggering
> > > > > > > notifier call and DVFS should act accordingly since there are cases
> > > > > > > drivers won't set clock rate but instead disable its clock directly, do
> > > > > > > you agree?
> > > > > > > > 
> > > > > > 
> > > > > > Hi Bill,
> > > > > > 
> > > > > > I'll think about this.  Perhaps a better solution would be to adapt
> > > > > > these drivers to runtime PM.  Then a call to runtime_pm_put() would
> > > > > > result in a call to clk_disable(...) and regulator_set_voltage(...).
> > > > > > 
> > > > > > There is no performance-based equivalent to runtime PM, which is one
> > > > > > reason why clk_set_rate is a likely entry point into dvfs.  But for
> > > > > > operations that have nice api's like runtime PM it would be better to
> > > > > > use those interfaces and not overload the clk.h api unnecessarily.
> > > > > > 
> > > > > 
> > > > > Bill,
> > > > > 
> > > > > I wasn't thinking at all when I wrote this.  Trying to rush to the
> > > > > airport I guess...
> > > > > 
> > > > > clk_enable() and clk_disable() must not sleep and all operations are
> > > > > done under a spinlock.  So this rules out most use of notifiers.  It is
> > > > > expected for some drivers to very aggressively enable/disable clocks in
> > > > > interrupt handlers so scaling voltage as a function of clk_{en|dis}able
> > > > > calls is also likely out of the question.
> > > > 
> > > > Yeah for those existing drivers to call enable/disable clocks in
> > > > interrupt have ruled out this, I didn't think through when I was asking.
> > > > > 
> > > > > Some platforms have chosen to implement voltage scaling in their
> > > > > .prepare callbacks.  I personally do not like this and still prefer
> > > > > drivers be adapted to runtime pm and let those callbacks handle voltage
> > > > > scaling along with clock handling.
> > > Voltage scaling in clock notifiers seems similar. Clock and regulater
> > > embedded code into each other will cause things complicated.
> > 
> > Hi Richard,
> > 
> > Sorry, I do not follow the above statement.  Can you clarify what you
> > mean?
> As we have agreement that a operating point may have multiple clocks
> and regulators, this patch is impossible to support multi clocks. And
> it might mislead dvfs implementer to use clock notifier. It may be good
> to have unified api like dvfs_set_opp(opp), or drivers can handle clocks
> and regulators theirselves which is more flexible. What do you think?
> 

Yes, there is a long-standing question whether clk_set_rate is
sufficient to cover dvfs needs or if a new api is required.  There are
many possible solutions.

One solution is to use clk_set_rate and use the rate-change notifiers to
call clk_set_rate on the other required clocks.  This is graceful from
the perspective of the driver since the driver author only has to think
about directly managing the clock(s) for that device and the rest is
managed automagically.  It is more complicated for the platform
integrator that must make sure the automagical stuff is set up
correctly.  This might be considered a more "distributed" approach.

Another solution would be a new api which calls clk_set_rate
individually on the affected clocks (e.g. a functional clk, an async
bridge child clock, and some other related non-child clock).  This is
less complicated for the platform integrator and represents a more
"centralized" approach.  It is less graceful for the device driver
author who must learn a new api and decide whether to call the new api
or to call clk_set_rate.

A hybrid solution might be to set a flag on a clock (e.g.
CLK_SET_RATE_DVFS) which tells the clk framework that this clock is
special and clk_set_rate should call dvfs_set_opp(), where
dvfs_set_opp() is never exposed to drivers directly.
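
To make the hybrid idea concrete, here is a minimal user-space sketch, not
kernel code.  Only the names CLK_SET_RATE_DVFS, clk_set_rate and
dvfs_set_opp come from the paragraph above; the struct layout, the flag
value and what dvfs_set_opp() does internally are assumptions made purely
for illustration.

```c
/* Hypothetical flag bit; the value is an assumption for this sketch. */
#define CLK_SET_RATE_DVFS (1UL << 0)

/* Toy stand-in for struct clk; the real structure is opaque to drivers. */
struct clk {
	const char *name;
	unsigned long rate;
	unsigned long flags;
	int dvfs_calls;		/* counts trips through dvfs_set_opp() */
};

/* Plain rate change: just record the new rate. */
static int clk_set_rate_plain(struct clk *clk, unsigned long rate)
{
	clk->rate = rate;
	return 0;
}

/*
 * Hypothetical centralized entry point.  A real one would also scale
 * related clocks and regulators; here it only records that it ran.
 */
static int dvfs_set_opp(struct clk *clk, unsigned long rate)
{
	clk->dvfs_calls++;
	return clk_set_rate_plain(clk, rate);
}

/* Drivers call only this; the flag decides which path is taken. */
static int clk_set_rate(struct clk *clk, unsigned long rate)
{
	if (clk->flags & CLK_SET_RATE_DVFS)
		return dvfs_set_opp(clk, rate);
	return clk_set_rate_plain(clk, rate);
}
```

The point of the sketch is that the driver-facing call stays the same and
only the platform's clock data decides whether the dvfs path is used.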

None of the above solutions are related to your point about scaling
voltage from clk_set_rate.  Voltage may still be scaled from the clock
rate-change notifier regardless of the method chosen to scale groups of
clocks.
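
The ordering that the notifier handler in this patch relies on (raise the
voltage before raising the frequency, lower the voltage after lowering the
frequency) can be illustrated with a small user-space model.  This is a
sketch, not the kernel implementation: the enum, struct rail and
set_rate_with_dvfs() are made-up names for illustration.

```c
#include <stddef.h>

enum event { EV_VOLT, EV_FREQ };

struct rail {
	unsigned long rate_hz;
	int volt_uv;
	enum event log[2];	/* order in which voltage and rate changed */
	size_t nlog;
};

static void set_volt(struct rail *r, int uv)
{
	r->volt_uv = uv;
	r->log[r->nlog++] = EV_VOLT;
}

static void set_freq(struct rail *r, unsigned long hz)
{
	r->rate_hz = hz;
	r->log[r->nlog++] = EV_FREQ;
}

/*
 * Models one rate change with a pre-change and a post-change hook.
 * new_uv is the voltage paired with new_hz.
 */
static void set_rate_with_dvfs(struct rail *r, unsigned long new_hz, int new_uv)
{
	unsigned long old_hz = r->rate_hz;

	r->nlog = 0;

	/* "pre rate change": scaling up, so raise the voltage first */
	if (new_hz > old_hz)
		set_volt(r, new_uv);

	set_freq(r, new_hz);

	/* "post rate change": scaling down, so drop the voltage last */
	if (new_hz < old_hz)
		set_volt(r, new_uv);
}
```

In both directions the rail is never asked to run the higher frequency at
the lower voltage, which is the property the pre/post split is meant to
preserve.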

Regards,
Mike

> Thanks
> Richard
Francesco Lavra March 10, 2013, 10:21 a.m. | #11
On 02/28/2013 05:49 AM, Mike Turquette wrote:
> Dynamic voltage and frequency scaling (dvfs) is a common power saving
> technique in many of today's modern processors.  This patch introduces a
> common clk rate-change notifier handler which scales voltage
> appropriately whenever clk_set_rate is called on an affected clock.
> 
> There are three prerequisites to using this feature:
> 
> 1) the affected clocks must be using the common clk framework
> 2) voltage must be scaled using the regulator framework
> 3) clock frequency and regulator voltage values must be paired via the
> OPP library
> 
> If a platform or device meets these requirements then using the notifier
> handler is straightforward.  A struct device is used as the basis for
> performing initial look-ups for clocks via clk_get and regulators via
> regulator_get.  This means that notifiers are subscribed on a per-device
> basis and multiple devices can have notifiers subscribed to the same
> clock.  Put another way, the voltage chosen for a rail during a call to
> clk_set_rate is a function of the device, not the clock.
> 
> Signed-off-by: Mike Turquette <mturquette@linaro.org>
[...]
> +struct dvfs_info *dvfs_clk_notifier_register(struct dvfs_info_init *dii)
> +{
> +	struct dvfs_info *di;
> +	int ret = 0;
> +
> +	if (!dii)
> +		return ERR_PTR(-EINVAL);
> +
> +	di = kzalloc(sizeof(struct dvfs_info), GFP_KERNEL);
> +	if (!di)
> +		return ERR_PTR(-ENOMEM);
> +
> +	di->dev = dii->dev;
> +	di->clk = clk_get(di->dev, dii->con_id);
> +	if (IS_ERR(di->clk)) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	di->reg = regulator_get(di->dev, dii->reg_id);
> +	if (IS_ERR(di->reg)) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +
> +	di->tol = dii->tol;
> +	di->nb.notifier_call = dvfs_clk_notifier_handler;
> +
> +	ret = clk_notifier_register(di->clk, &di->nb);
> +
> +	if (ret)
> +		goto err;

Shouldn't regulator_put() and clk_put() be called in the error path?

> +
> +	return di;
> +
> +err:
> +	kfree(di);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(dvfs_clk_notifier_register);
> +
> +void dvfs_clk_notifier_unregister(struct dvfs_info *di)
> +{
> +	clk_notifier_unregister(di->clk, &di->nb);
> +	clk_put(di->clk);
> +	regulator_put(di->reg);
> +	kfree(di);
> +}
> +EXPORT_SYMBOL_GPL(dvfs_clk_notifier_unregister);
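
Regarding the error path question above, one common way to handle it is
goto-based unwinding that releases resources in reverse order of
acquisition.  Below is a user-space sketch of that pattern only;
fake_clk_get(), fake_regulator_get(), fake_notifier_register() and the
counters are hypothetical stand-ins, not the kernel APIs.  In a kernel
version the error code could also be taken from PTR_ERR() rather than a
hard-coded -ENOMEM.

```c
#include <errno.h>
#include <stdlib.h>

/* Reference counters so a caller can check that nothing leaks. */
static int clk_refs, reg_refs;

/* Knobs to force a failure at a given step. */
static int fail_clk, fail_reg, fail_notifier;

static int fake_clk_get(void)
{
	if (fail_clk)
		return -ENOENT;
	clk_refs++;
	return 0;
}

static void fake_clk_put(void) { clk_refs--; }

static int fake_regulator_get(void)
{
	if (fail_reg)
		return -ENODEV;
	reg_refs++;
	return 0;
}

static void fake_regulator_put(void) { reg_refs--; }

static int fake_notifier_register(void)
{
	return fail_notifier ? -EINVAL : 0;
}

struct info { int unused; };

/* Returns 0 and sets *out on success, a negative errno on failure. */
static int register_sketch(struct info **out)
{
	struct info *di;
	int ret;

	di = calloc(1, sizeof(*di));
	if (!di)
		return -ENOMEM;

	ret = fake_clk_get();
	if (ret)
		goto err_free;

	ret = fake_regulator_get();
	if (ret)
		goto err_clk;

	ret = fake_notifier_register();
	if (ret)
		goto err_reg;

	*out = di;
	return 0;

	/* Unwind in reverse order; each label undoes one earlier step. */
err_reg:
	fake_regulator_put();
err_clk:
	fake_clk_put();
err_free:
	free(di);
	return ret;
}
```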

Regards,
Francesco
Ulf Hansson March 13, 2013, 1:59 p.m. | #12
On 4 March 2013 08:25, Mike Turquette <mturquette@linaro.org> wrote:
> Quoting Richard Zhao (2013-03-03 05:27:52)
>> Hi Mike,
>>
>> On Sun, Mar 03, 2013 at 02:54:24AM -0800, Mike Turquette wrote:
>> > Quoting Richard Zhao (2013-03-02 00:22:19)
>> > > On Fri, Mar 01, 2013 at 06:55:54PM -0800, Bill Huang wrote:
>> > > > On Sat, 2013-03-02 at 04:48 +0800, Mike Turquette wrote:
>> > > > > Quoting Mike Turquette (2013-03-01 10:22:34)
>> > > > > > Quoting Bill Huang (2013-03-01 01:41:31)
>> > > > > > > On Thu, 2013-02-28 at 12:49 +0800, Mike Turquette wrote:
>> > > > > > > > Dynamic voltage and frequency scaling (dvfs) is a common power saving
>> > > > > > > > technique in many of today's modern processors.  This patch introduces a
>> > > > > > > > common clk rate-change notifier handler which scales voltage
>> > > > > > > > appropriately whenever clk_set_rate is called on an affected clock.
>> > > > > > >
>> > > > > > > I really think clk_enable and clk_disable should also be triggering
>> > > > > > > notifier call and DVFS should act accordingly since there are cases
>> > > > > > > drivers won't set clock rate but instead disable its clock directly, do
>> > > > > > > you agree?
>> > > > > > > >
>> > > > > >
>> > > > > > Hi Bill,
>> > > > > >
>> > > > > > I'll think about this.  Perhaps a better solution would be to adapt
>> > > > > > these drivers to runtime PM.  Then a call to runtime_pm_put() would
>> > > > > > result in a call to clk_disable(...) and regulator_set_voltage(...).
>> > > > > >
>> > > > > > There is no performance-based equivalent to runtime PM, which is one
>> > > > > > reason why clk_set_rate is a likely entry point into dvfs.  But for
>> > > > > > operations that have nice api's like runtime PM it would be better to
>> > > > > > use those interfaces and not overload the clk.h api unnecessarily.
>> > > > > >
>> > > > >
>> > > > > Bill,
>> > > > >
>> > > > > I wasn't thinking at all when I wrote this.  Trying to rush to the
>> > > > > airport I guess...
>> > > > >
>> > > > > clk_enable() and clk_disable() must not sleep and all operations are
>> > > > > done under a spinlock.  So this rules out most use of notifiers.  It is
>> > > > > expected for some drivers to very aggressively enable/disable clocks in
>> > > > > interrupt handlers so scaling voltage as a function of clk_{en|dis}able
>> > > > > calls is also likely out of the question.
>> > > >
>> > > > Yeah for those existing drivers to call enable/disable clocks in
>> > > > interrupt have ruled out this, I didn't think through when I was asking.
>> > > > >
>> > > > > Some platforms have chosen to implement voltage scaling in their
>> > > > > .prepare callbacks.  I personally do not like this and still prefer
>> > > > > drivers be adapted to runtime pm and let those callbacks handle voltage
>> > > > > scaling along with clock handling.
>> > > Voltage scaling in clock notifiers seems similar. Clock and regulater
>> > > embedded code into each other will cause things complicated.
>> >
>> > Hi Richard,
>> >
>> > Sorry, I do not follow the above statement.  Can you clarify what you
>> > mean?
>> As we have agreement that a operating point may have multiple clocks
>> and regulators, this patch is impossible to support multi clocks. And
>> it might mislead dvfs implementer to use clock notifier. It may be good
>> to have unified api like dvfs_set_opp(opp), or drivers can handle clocks
>> and regulators theirselves which is more flexible. What do you think?
>>
>
> Yes, there is a long-standing question whether clk_set_rate is
> sufficient to cover dvfs needs or if a new api is required.  There are
> many possible solutions.
>
> One solution is to use clk_set_rate and use the rate-change notifiers to
> call clk_set_rate on the other required clocks.  This is graceful from
> the perspective of the driver since the driver author only has to think
> about directly managing the clock(s) for that device and the rest is
> managed automagically.  It is more complicated for the platform
> integrator that must make sure the automagical stuff is set up
> correctly.  This might be considered a more "distributed" approach.

From my point of view, I see some concern with this solution, mainly
because a lot of the complexity with regard to DVFS will be pushed
down to be handled by each and every driver. That is probably okay
for SoC-specific drivers but likely not for cross-SoC drivers, if you
see what I mean.

>
> Another solution would be a new api which calls clk_set_rate
> individually on the affected clocks (e.g. a functional clk, an async
> bridge child clock, and some other related non-child clock).  This is
> less complicated for the platform integrator and represents a more
> "centralized" approach.  It is less graceful for the device driver
> author who must learn a new api and decide whether to call the new api
> or to call clk_set_rate.

This could be somewhat preferable to the first solution. There is likely
less code in each driver, but some "DVFS intelligence" still needs to
exist there.

>
> A hybrid solution might be to set a flag on a clock (e.g.
> CLK_SET_RATE_DVFS) which tells the clk framework that this clock is
> special and clk_set_rate should call dvfs_set_opp(), where
> dvfs_set_opp() is never exposed to drivers directly.
>

I like the direction in which this idea is going. In principle it
means that you can actually do DVFS handling from
clk_prepare|unprepare as well, which I think is also a wanted feature.
Moreover, drivers in general do not need to bother about DVFS, which in
the end should be a good thing, right?

For reference, from a ux500 SoC perspective, we have internal code,
not upstreamed yet, which implements a specific "OPP clock" type. No
changes have been made to the common clk framework for this. The "OPP
clock" clk hw utilizes the OPP library to find out which OPP to choose
for a certain frequency and then requests a change if needed. To make
this solution really fly we require the clk API to be re-entrant,
since that is needed to be able to update the OPP. Moreover, it would
be possible to use the clk_prepare|unprepare callbacks of
this clk hw to also handle OPP changes.

> None of the above solutions are related to your point about scaling
> voltage from clk_set_rate.  Voltage may still be scaled from the clock
> rate-change notifier despite the method chose to scale groups of clocks.
>
> Regards,
> Mike
>
>> Thanks
>> Richard

Kind regards
Ulf Hansson
taras.kondratiuk@linaro.org April 2, 2013, 5:49 p.m. | #13
Hi Mike

> +/*
> + * Copyright (C) 2011-2012 Linaro Ltd <mturquette@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * Helper functions for dynamic voltage & frequency transitions using
> + * the OPP library.
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/regulator/consumer.h>
> +#include <linux/opp.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/module.h>
> +
> +/*
> + * XXX clk, regulator & tolerance should be stored in the OPP table?
> + */
> +struct dvfs_info {
> +    struct device *dev;
> +    struct clk *clk;
> +    struct regulator *reg;
> +    int tol;
> +    struct notifier_block nb;
> +};
> +
> +#define to_dvfs_info(_nb) container_of(_nb, struct dvfs_info, nb)
> +
> +static int dvfs_clk_notifier_handler(struct notifier_block *nb,
> +        unsigned long flags, void *data)
> +{
> +    struct clk_notifier_data *cnd = data;
> +    struct dvfs_info *di = to_dvfs_info(nb);
> +    int ret, volt_new, volt_old;
> +    struct opp *opp;
> +
> +    volt_old = regulator_get_voltage(di->reg);
> +    rcu_read_lock();
> +    opp = opp_find_freq_floor(di->dev, &cnd->new_rate);
I think this should be opp_find_freq_ceil().
Let's imagine we have the following OPP table for some device:
1 - <100MHz - 1.0V>
2 - <200MHz - 1.2V>
3 - <400MHz - 1.4V>

If the device driver scales the clock to 150MHz, then OPP #1 will be chosen.
This leads to the configuration <150MHz - 1.0V>, which may be unstable.
It would be better to round up and end up with <150MHz - 1.2V>.
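
The difference can be checked with a small user-space sketch over the
example table above.  find_floor() and find_ceil() are hypothetical
helpers that only mimic the rounding direction of opp_find_freq_floor()
and opp_find_freq_ceil(); they are not the OPP library.

```c
#include <stddef.h>

struct opp_entry {
	unsigned long freq_hz;
	int volt_uv;
};

/* The example table from the text, sorted by ascending frequency. */
static const struct opp_entry table[] = {
	{ 100000000UL, 1000000 },	/* 100MHz - 1.0V */
	{ 200000000UL, 1200000 },	/* 200MHz - 1.2V */
	{ 400000000UL, 1400000 },	/* 400MHz - 1.4V */
};

#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* Highest entry with freq <= rate, or NULL if rate is below the table. */
static const struct opp_entry *find_floor(unsigned long rate)
{
	const struct opp_entry *best = NULL;
	size_t i;

	for (i = 0; i < TABLE_LEN; i++)
		if (table[i].freq_hz <= rate)
			best = &table[i];
	return best;
}

/* Lowest entry with freq >= rate, or NULL if rate is above the table. */
static const struct opp_entry *find_ceil(unsigned long rate)
{
	size_t i;

	for (i = 0; i < TABLE_LEN; i++)
		if (table[i].freq_hz >= rate)
			return &table[i];
	return NULL;
}
```

For a 150MHz request the floor lookup yields the 1.0V entry and the ceil
lookup yields the 1.2V entry, matching the numbers above.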

> +    volt_new = opp_get_voltage(opp);
> +    rcu_read_unlock();
> +
> +    /* scaling up?  scale voltage before frequency */
> +    if (flags & PRE_RATE_CHANGE && cnd->new_rate > cnd->old_rate) {
The opp_find_freq_*() functions can update cnd->new_rate,
so this comparison may not be against the value you expect.
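
A small user-space sketch of this pitfall, assuming a lookup that writes
the matched frequency back through its pointer argument.  lookup_floor()
and the two-entry table are hypothetical; doing the lookup on a local copy
is one possible way to keep the notifier data intact, not necessarily the
fix the patch should adopt.

```c
/* Hypothetical lookup: rounds *freq down to a table entry and writes it back. */
static int lookup_floor(unsigned long *freq)
{
	static const unsigned long table[] = { 100000000UL, 200000000UL };
	int i;

	for (i = 1; i >= 0; i--) {
		if (table[i] <= *freq) {
			*freq = table[i];	/* caller's variable is modified */
			return 0;
		}
	}
	return -1;
}

struct rate_change {
	unsigned long old_rate;
	unsigned long new_rate;
};

/* Mirrors the patch: lookup on new_rate itself, then compare. */
static int is_scale_up_clobbered(struct rate_change *rc)
{
	lookup_floor(&rc->new_rate);
	return rc->new_rate > rc->old_rate;
}

/* Lookup on a local copy, so the comparison sees the requested rate. */
static int is_scale_up_copy(const struct rate_change *rc)
{
	unsigned long freq = rc->new_rate;

	lookup_floor(&freq);
	return rc->new_rate > rc->old_rate;
}
```

With old_rate at 100MHz and a request for 150MHz, the first variant rounds
new_rate down to 100MHz and no longer sees a scale-up, while the second
still does.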

> +        dev_dbg(di->dev, "%s: %d mV --> %d mV\n",
> +                __func__, volt_old, volt_new);
> +
> +        ret = regulator_set_voltage_tol(di->reg, volt_new, di->tol);
I may not have a deep understanding of the regulator framework,
but it looks like regulator_set_voltage_tol() is not the right API here.
As I understand it, the regulator framework should aggregate
voltage requests from the different consumers of a particular regulator.

Let's say there are two consumers of the VDD regulator.
The first device sets 1.0V, which maps to the range 0.98-1.02V (2% tolerance).
If the second device then tries to set 1.3V, it will fail because the range
1.28-1.32V does not overlap with the first request.

It may be better to set the upper limit to the max OPP voltage or just use INT_MAX.
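
The aggregation problem can be illustrated with a user-space sketch.  It
assumes, as described above, that the requests of all consumers are
intersected; struct volt_range, tol_range(), min_range() and intersect()
are hypothetical helpers, and the +/-20mV tolerance is chosen only so that
the ranges match the 0.98-1.02V and 1.28-1.32V figures in the example.

```c
#include <limits.h>

struct volt_range {
	int min_uv;
	int max_uv;
};

/* Range requested by a tolerance-style call: target +/- tolerance. */
static struct volt_range tol_range(int target_uv, int tol_uv)
{
	struct volt_range r = { target_uv - tol_uv, target_uv + tol_uv };
	return r;
}

/* Open-ended request: at least target, no upper bound of its own. */
static struct volt_range min_range(int target_uv)
{
	struct volt_range r = { target_uv, INT_MAX };
	return r;
}

/* Returns 0 and fills *out if the two requests overlap, -1 otherwise. */
static int intersect(struct volt_range a, struct volt_range b,
		     struct volt_range *out)
{
	int lo = a.min_uv > b.min_uv ? a.min_uv : b.min_uv;
	int hi = a.max_uv < b.max_uv ? a.max_uv : b.max_uv;

	if (lo > hi)
		return -1;
	out->min_uv = lo;
	out->max_uv = hi;
	return 0;
}
```

Two tolerance-style requests for 1.0V and 1.3V have an empty
intersection, whereas two open-ended requests intersect at 1.3V and above,
which is the behaviour the INT_MAX suggestion is after.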

> +
> +        if (ret) {
> +            dev_warn(di->dev, "%s: unable to scale voltage up.\n",
> +                 __func__);
> +            return notifier_from_errno(ret);
> +        }
> +    }
> +
> +    /* scaling down?  scale voltage after frequency */
> +    if (flags & POST_RATE_CHANGE && cnd->new_rate < cnd->old_rate) {
> +        dev_dbg(di->dev, "%s: %d mV --> %d mV\n",
> +                __func__, volt_old, volt_new);
> +
> +        ret = regulator_set_voltage_tol(di->reg, volt_new, di->tol);
> +
> +        if (ret) {
> +            dev_warn(di->dev, "%s: unable to scale voltage down.\n",
> +                 __func__);
> +            return notifier_from_errno(ret);
> +        }
> +    }
> +
> +    return NOTIFY_OK;
> +}
> +
> +struct dvfs_info *dvfs_clk_notifier_register(struct dvfs_info_init *dii)
> +{
> +    struct dvfs_info *di;
> +    int ret = 0;
> +
> +    if (!dii)
> +        return ERR_PTR(-EINVAL);
> +
> +    di = kzalloc(sizeof(struct dvfs_info), GFP_KERNEL);
> +    if (!di)
> +        return ERR_PTR(-ENOMEM);
> +
> +    di->dev = dii->dev;
> +    di->clk = clk_get(di->dev, dii->con_id);
> +    if (IS_ERR(di->clk)) {
> +        ret = -ENOMEM;
> +        goto err;
> +    }
> +
> +    di->reg = regulator_get(di->dev, dii->reg_id);
> +    if (IS_ERR(di->reg)) {
> +        ret = -ENOMEM;
> +        goto err;
> +    }
> +
> +    di->tol = dii->tol;
> +    di->nb.notifier_call = dvfs_clk_notifier_handler;
> +
> +    ret = clk_notifier_register(di->clk, &di->nb);
> +
> +    if (ret)
> +        goto err;
> +
> +    return di;
> +
> +err:
> +    kfree(di);
> +    return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(dvfs_clk_notifier_register);
> +
> +void dvfs_clk_notifier_unregister(struct dvfs_info *di)
> +{
> +    clk_notifier_unregister(di->clk, &di->nb);
> +    clk_put(di->clk);
> +    regulator_put(di->reg);
> +    kfree(di);
> +}
> +EXPORT_SYMBOL_GPL(dvfs_clk_notifier_unregister);
> diff --git a/include/linux/clk.h b/include/linux/clk.h
> index b3ac22d..28d952f 100644
> --- a/include/linux/clk.h
> +++ b/include/linux/clk.h
> @@ -78,9 +78,34 @@  struct clk_notifier_data {
>      unsigned long        new_rate;
>  };
>
> -int clk_notifier_register(struct clk *clk, struct notifier_block *nb);
> +/**
> + * struct dvfs_info_init - data needs to initialize struct dvfs_info
> + * @dev:    device related to this frequency-voltage pair
> + * @con_id:    string name of clock connection
> + * @reg_id:    string name of regulator
> + * @tol:    voltage tolerance for this device
> + *
> + * Provides the data needed to register a common dvfs sequence in a clk
> + * notifier handler.  The clk and regulator lookups are stored in a
> + * private struct and the notifier handler is registered with the clk
> + * framework with a call to dvfs_clk_notifier_register.
> + *
> + * FIXME stuffing @tol here is a hack.  It belongs in the opp table.
> + * Maybe clk & regulator will also live in the opp table some day.
> + */
> +struct dvfs_info_init {
> +    struct device *dev;
> +    const char *con_id;
> +    const char *reg_id;
> +    int tol;
> +};
> +
> +struct dvfs_info;
>
> +int clk_notifier_register(struct clk *clk, struct notifier_block *nb);
>  int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb);
> +struct dvfs_info *dvfs_clk_notifier_register(struct dvfs_info_init *dii);
> +void dvfs_clk_notifier_unregister(struct dvfs_info *di);
>
>  #endif

Patch

diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
index e73b1d6..e720b7c 100644
--- a/drivers/clk/Makefile
+++ b/drivers/clk/Makefile
@@ -7,6 +7,7 @@  obj-$(CONFIG_COMMON_CLK)	+= clk-fixed-factor.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-fixed-rate.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-gate.o
 obj-$(CONFIG_COMMON_CLK)	+= clk-mux.o
+obj-$(CONFIG_COMMON_CLK)	+= dvfs.o
 
 # SoCs specific
 obj-$(CONFIG_ARCH_BCM2835)	+= clk-bcm2835.o
diff --git a/drivers/clk/dvfs.c b/drivers/clk/dvfs.c
new file mode 100644
index 0000000..d916d0b
--- /dev/null
+++ b/drivers/clk/dvfs.c
@@ -0,0 +1,125 @@ 
+/*
+ * Copyright (C) 2011-2012 Linaro Ltd <mturquette@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Helper functions for dynamic voltage & frequency transitions using
+ * the OPP library.
+ */
+
+#include <linux/clk.h>
+#include <linux/regulator/consumer.h>
+#include <linux/opp.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+
+/*
+ * XXX clk, regulator & tolerance should be stored in the OPP table?
+ */
+struct dvfs_info {
+	struct device *dev;
+	struct clk *clk;
+	struct regulator *reg;
+	int tol;
+	struct notifier_block nb;
+};
+
+#define to_dvfs_info(_nb) container_of(_nb, struct dvfs_info, nb)
+
+static int dvfs_clk_notifier_handler(struct notifier_block *nb,
+		unsigned long flags, void *data)
+{
+	struct clk_notifier_data *cnd = data;
+	struct dvfs_info *di = to_dvfs_info(nb);
+	int ret, volt_new, volt_old;
+	struct opp *opp;
+
+	volt_old = regulator_get_voltage(di->reg);
+	rcu_read_lock();
+	opp = opp_find_freq_floor(di->dev, &cnd->new_rate);
+	volt_new = opp_get_voltage(opp);
+	rcu_read_unlock();
+
+	/* scaling up?  scale voltage before frequency */
+	if (flags & PRE_RATE_CHANGE && cnd->new_rate > cnd->old_rate) {
+		dev_dbg(di->dev, "%s: %d mV --> %d mV\n",
+				__func__, volt_old, volt_new);
+
+		ret = regulator_set_voltage_tol(di->reg, volt_new, di->tol);
+
+		if (ret) {
+			dev_warn(di->dev, "%s: unable to scale voltage up.\n",
+				 __func__);
+			return notifier_from_errno(ret);
+		}
+	}
+
+	/* scaling down?  scale voltage after frequency */
+	if (flags & POST_RATE_CHANGE && cnd->new_rate < cnd->old_rate) {
+		dev_dbg(di->dev, "%s: %d mV --> %d mV\n",
+				__func__, volt_old, volt_new);
+
+		ret = regulator_set_voltage_tol(di->reg, volt_new, di->tol);
+
+		if (ret) {
+			dev_warn(di->dev, "%s: unable to scale voltage down.\n",
+				 __func__);
+			return notifier_from_errno(ret);
+		}
+	}
+
+	return NOTIFY_OK;
+}
+
+struct dvfs_info *dvfs_clk_notifier_register(struct dvfs_info_init *dii)
+{
+	struct dvfs_info *di;
+	int ret = 0;
+
+	if (!dii)
+		return ERR_PTR(-EINVAL);
+
+	di = kzalloc(sizeof(struct dvfs_info), GFP_KERNEL);
+	if (!di)
+		return ERR_PTR(-ENOMEM);
+
+	di->dev = dii->dev;
+	di->clk = clk_get(di->dev, dii->con_id);
+	if (IS_ERR(di->clk)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	di->reg = regulator_get(di->dev, dii->reg_id);
+	if (IS_ERR(di->reg)) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	di->tol = dii->tol;
+	di->nb.notifier_call = dvfs_clk_notifier_handler;
+
+	ret = clk_notifier_register(di->clk, &di->nb);
+
+	if (ret)
+		goto err;
+
+	return di;
+
+err:
+	kfree(di);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(dvfs_clk_notifier_register);
+
+void dvfs_clk_notifier_unregister(struct dvfs_info *di)
+{
+	clk_notifier_unregister(di->clk, &di->nb);
+	clk_put(di->clk);
+	regulator_put(di->reg);
+	kfree(di);
+}
+EXPORT_SYMBOL_GPL(dvfs_clk_notifier_unregister);
diff --git a/include/linux/clk.h b/include/linux/clk.h
index b3ac22d..28d952f 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -78,9 +78,34 @@  struct clk_notifier_data {
 	unsigned long		new_rate;
 };
 
-int clk_notifier_register(struct clk *clk, struct notifier_block *nb);
+/**
+ * struct dvfs_info_init - data needs to initialize struct dvfs_info
+ * @dev:	device related to this frequency-voltage pair
+ * @con_id:	string name of clock connection
+ * @reg_id:	string name of regulator
+ * @tol:	voltage tolerance for this device
+ *
+ * Provides the data needed to register a common dvfs sequence in a clk
+ * notifier handler.  The clk and regulator lookups are stored in a
+ * private struct and the notifier handler is registered with the clk
+ * framework with a call to dvfs_clk_notifier_register.
+ *
+ * FIXME stuffing @tol here is a hack.  It belongs in the opp table.
+ * Maybe clk & regulator will also live in the opp table some day.
+ */
+struct dvfs_info_init {
+	struct device *dev;
+	const char *con_id;
+	const char *reg_id;
+	int tol;
+};
+
+struct dvfs_info;
 
+int clk_notifier_register(struct clk *clk, struct notifier_block *nb);
 int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb);
+struct dvfs_info *dvfs_clk_notifier_register(struct dvfs_info_init *dii);
+void dvfs_clk_notifier_unregister(struct dvfs_info *di);
 
 #endif