regulator: core: Skip balancing of the enabled regulators in regulator_enable()

Message ID 20191008101709.13827-1-m.szyprowski@samsung.com
State New
Headers show
Series
  • regulator: core: Skip balancing of the enabled regulators in regulator_enable()
Related show

Commit Message

Marek Szyprowski Oct. 8, 2019, 10:17 a.m.
Commit f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators
locking"), regardless of the subject, added additional call to
regulator_balance_voltage() during regulator_enable(). This is basically
a good idea, however it causes some issue for the regulators which are
already enabled at boot and are critical for system operation (for example
provides supply to the CPU).

CPUfreq or other drivers typically call regulator_enable() on such
regulators during their probe, although the regulators are already enabled
by bootloader. The mentioned patch however added a call to
regulator_balance_voltage(), what in case of system boot, where no
additional requirements are set yet, typically causes to limit the voltage
to the minimal value defined at regulator constraints. This causes a crash
of the system when voltage on the CPU regulator is set to the lowest
possible value without adjusting the operation frequency. Fix this by
adding a check if regulator is already enabled - if so, then skip the
balancing procedure. The voltage will be balanced later anyway once the
required voltage value is requested.

Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

---
This patch fixes enabling coupled requlators on Exynos5800-based Peach-Pi
board done by the following patch:
https://patchwork.kernel.org/patch/11172427/
Once it has been applied, the following issue has been reported:
https://patchwork.kernel.org/patch/11176661/
---
 drivers/regulator/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.17.1

Comments

Mark Brown Oct. 8, 2019, 11:50 a.m. | #1
On Tue, Oct 08, 2019 at 12:17:09PM +0200, Marek Szyprowski wrote:
> Commit f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators

> locking"), regardless of the subject, added additional call to

> regulator_balance_voltage() during regulator_enable(). This is basically

> a good idea, however it causes some issue for the regulators which are

> already enabled at boot and are critical for system operation (for example

> provides supply to the CPU).


If regulators are essential to system operation they should be marked as
always-on...

> CPUfreq or other drivers typically call regulator_enable() on such

> regulators during their probe, although the regulators are already enabled

> by bootloader. The mentioned patch however added a call to

> regulator_balance_voltage(), what in case of system boot, where no

> additional requirements are set yet, typically causes to limit the voltage

> to the minimal value defined at regulator constraints. This causes a crash

> of the system when voltage on the CPU regulator is set to the lowest

> possible value without adjusting the operation frequency. Fix this by

> adding a check if regulator is already enabled - if so, then skip the

> balancing procedure. The voltage will be balanced later anyway once the

> required voltage value is requested.


This then means that for users that might legitimately enable and
disable regulators that need to be constrained are forced to change the
voltage when they enable the regualtors in order to have their
constraints take effect which seems bad.  I'd rather change the the
cpufreq consumers to either not do the enable (since there really should
be an always-on constraint this should be redundant, we might need to
fix the core to take account of their settings though I think we lost
that) or to set the voltage to whatever they need prior to doing their
first enable, that seems more robust.
Marek Szyprowski Oct. 8, 2019, 12:01 p.m. | #2
Hi Mark,

On 08.10.2019 13:50, Mark Brown wrote:
> On Tue, Oct 08, 2019 at 12:17:09PM +0200, Marek Szyprowski wrote:

>> Commit f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators

>> locking"), regardless of the subject, added additional call to

>> regulator_balance_voltage() during regulator_enable(). This is basically

>> a good idea, however it causes some issue for the regulators which are

>> already enabled at boot and are critical for system operation (for example

>> provides supply to the CPU).

> If regulators are essential to system operation they should be marked as

> always-on...


The are marked as always on:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/exynos5800-peach-pi.dts#n253

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/exynos5800-peach-pi.dts#n265

>> CPUfreq or other drivers typically call regulator_enable() on such

>> regulators during their probe, although the regulators are already enabled

>> by bootloader. The mentioned patch however added a call to

>> regulator_balance_voltage(), what in case of system boot, where no

>> additional requirements are set yet, typically causes to limit the voltage

>> to the minimal value defined at regulator constraints. This causes a crash

>> of the system when voltage on the CPU regulator is set to the lowest

>> possible value without adjusting the operation frequency. Fix this by

>> adding a check if regulator is already enabled - if so, then skip the

>> balancing procedure. The voltage will be balanced later anyway once the

>> required voltage value is requested.

> This then means that for users that might legitimately enable and

> disable regulators that need to be constrained are forced to change the

> voltage when they enable the regualtors in order to have their

> constraints take effect which seems bad.  I'd rather change the the

> cpufreq consumers to either not do the enable (since there really should

> be an always-on constraint this should be redundant, we might need to

> fix the core to take account of their settings though I think we lost

> that) or to set the voltage to whatever they need prior to doing their

> first enable, that seems more robust.


Well, I'm open for other ways of fixing this issue. Calling enable on 
always-on regulator imho should not change its rate...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Mark Brown Oct. 8, 2019, 12:06 p.m. | #3
On Tue, Oct 08, 2019 at 02:01:15PM +0200, Marek Szyprowski wrote:
> On 08.10.2019 13:50, Mark Brown wrote:


> > This then means that for users that might legitimately enable and

> > disable regulators that need to be constrained are forced to change the

> > voltage when they enable the regualtors in order to have their

> > constraints take effect which seems bad.  I'd rather change the the

> > cpufreq consumers to either not do the enable (since there really should

> > be an always-on constraint this should be redundant, we might need to

> > fix the core to take account of their settings though I think we lost

> > that) or to set the voltage to whatever they need prior to doing their

> > first enable, that seems more robust.


> Well, I'm open for other ways of fixing this issue. Calling enable on 

> always-on regulator imho should not change its rate...


Yes, although there is the whole "don't touch things until a consumer
tells us to" thing going on.  I had expected that this was kicking in
because we weren't paying attention to the constraints of disabled
regulators but I can't see the code implementing that any more so I
guess we removed it at some point (it was always debatable).
Marek Szyprowski Oct. 8, 2019, 12:38 p.m. | #4
Hi Mark,

On 08.10.2019 14:06, Mark Brown wrote:
> On Tue, Oct 08, 2019 at 02:01:15PM +0200, Marek Szyprowski wrote:

>> On 08.10.2019 13:50, Mark Brown wrote:

>>> This then means that for users that might legitimately enable and

>>> disable regulators that need to be constrained are forced to change the

>>> voltage when they enable the regualtors in order to have their

>>> constraints take effect which seems bad.  I'd rather change the the

>>> cpufreq consumers to either not do the enable (since there really should

>>> be an always-on constraint this should be redundant, we might need to

>>> fix the core to take account of their settings though I think we lost

>>> that) or to set the voltage to whatever they need prior to doing their

>>> first enable, that seems more robust.

>> Well, I'm open for other ways of fixing this issue. Calling enable on

>> always-on regulator imho should not change its rate...

> Yes, although there is the whole "don't touch things until a consumer

> tells us to" thing going on.  I had expected that this was kicking in

> because we weren't paying attention to the constraints of disabled

> regulators but I can't see the code implementing that any more so I

> guess we removed it at some point (it was always debatable).


Then if I get it right, the issue is caused by the commit 7f93ff73f7c8 
("opp: core: add regulators enable and disable"). I've checked and 
indeed reverting it fixes Peach Pi to boot properly. The question is if 
this is desired behavior or not?

I've CC: Viresh, Kamil and Bartlomiej, here is the link to the beginning 
of this thread:

https://lkml.org/lkml/2019/10/8/265

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Mark Brown Oct. 8, 2019, 12:47 p.m. | #5
On Tue, Oct 08, 2019 at 02:38:55PM +0200, Marek Szyprowski wrote:

> Then if I get it right, the issue is caused by the commit 7f93ff73f7c8 

> ("opp: core: add regulators enable and disable"). I've checked and 

> indeed reverting it fixes Peach Pi to boot properly. The question is if 

> this is desired behavior or not?


That doesn't seem ideal - either it's redundant for regulators that need
to be marked as always-on anyway or it's going to force the regulators
on when a device could do runtime PM (eg, if the same code can run on
something like a GPU which can be turned off while the screen is off or
is displaying a static image).
Bartlomiej Zolnierkiewicz Oct. 8, 2019, 1:24 p.m. | #6
On 10/8/19 2:47 PM, Mark Brown wrote:
> On Tue, Oct 08, 2019 at 02:38:55PM +0200, Marek Szyprowski wrote:

> 

>> Then if I get it right, the issue is caused by the commit 7f93ff73f7c8 

>> ("opp: core: add regulators enable and disable"). I've checked and 

>> indeed reverting it fixes Peach Pi to boot properly. The question is if 

>> this is desired behavior or not?

> 

> That doesn't seem ideal - either it's redundant for regulators that need

> to be marked as always-on anyway or it's going to force the regulators

> on when a device could do runtime PM (eg, if the same code can run on

> something like a GPU which can be turned off while the screen is off or

> is displaying a static image).


Commit 7f93ff73f7c8 ("opp: core: add regulators enable and disable")
currently can be safely reverted as all affected users use always-on
regulators. However IMHO it should be possible to enable always-on
regulator without side-effects.

When it comes to setting regulator constraints before doing enable
operation, it also seems to be possible solution but would require
splitting regulator_set_voltage() operation on two functions:

- one for setting constraints (before regulator_enable() operation)

- the other one actually setting voltage (after enable operation)

Unfortunately this is much bigger task and doesn't seem to be -rc
time material so I'm in favor of just applying Marek's fix as it is
for now.
 
Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
Dmitry Osipenko Oct. 8, 2019, 3:02 p.m. | #7
08.10.2019 16:24, Bartlomiej Zolnierkiewicz пишет:
> 

> On 10/8/19 2:47 PM, Mark Brown wrote:

>> On Tue, Oct 08, 2019 at 02:38:55PM +0200, Marek Szyprowski wrote:

>>

>>> Then if I get it right, the issue is caused by the commit 7f93ff73f7c8 

>>> ("opp: core: add regulators enable and disable"). I've checked and 

>>> indeed reverting it fixes Peach Pi to boot properly.


Yes, please note that the "ww_mutex" patch didn't change the original logic and only
rearranged the code a tad.

 The question is if
>>> this is desired behavior or not?

>>

>> That doesn't seem ideal - either it's redundant for regulators that need

>> to be marked as always-on anyway or it's going to force the regulators

>> on when a device could do runtime PM (eg, if the same code can run on

>> something like a GPU which can be turned off while the screen is off or

>> is displaying a static image).

> 

> Commit 7f93ff73f7c8 ("opp: core: add regulators enable and disable")

> currently can be safely reverted as all affected users use always-on

> regulators. However IMHO it should be possible to enable always-on

> regulator without side-effects.

> 

> When it comes to setting regulator constraints before doing enable

> operation, it also seems to be possible solution but would require

> splitting regulator_set_voltage() operation on two functions:

> 

> - one for setting constraints (before regulator_enable() operation)

> 

> - the other one actually setting voltage (after enable operation)

> 

> Unfortunately this is much bigger task and doesn't seem to be -rc

> time material so I'm in favor of just applying Marek's fix as it is

> for now.


That OPP patch caused the same problem for the NVIDIA Tegra20 CPUFreq driver (in-progress)
and I resolved it in the coupler's code [0]. Perhaps the generic coupler could do the same
thing by assuming that min_uV=current_uV until any consumer sets the voltage, i.e. if
regulator_check_consumers(min_uV=0) returns min_uV=0.

[0] https://lkml.org/lkml/2019/7/25/892
Mark Brown Oct. 8, 2019, 3:48 p.m. | #8
On Tue, Oct 08, 2019 at 03:24:17PM +0200, Bartlomiej Zolnierkiewicz wrote:

> Commit 7f93ff73f7c8 ("opp: core: add regulators enable and disable")

> currently can be safely reverted as all affected users use always-on

> regulators. However IMHO it should be possible to enable always-on

> regulator without side-effects.


With coupled regulators you might have something kicking in because a
change was made on a completely different regulator...  If we don't take
account of coupling requirements we'd doubtless have issues with that at
some point.

> When it comes to setting regulator constraints before doing enable

> operation, it also seems to be possible solution but would require

> splitting regulator_set_voltage() operation on two functions:


> - one for setting constraints (before regulator_enable() operation)


> - the other one actually setting voltage (after enable operation)


I don't follow?  What would a "constraint" be in this context and how
would it be different to the voltage range you'd set in normal operation?
Bartlomiej Zolnierkiewicz Oct. 8, 2019, 4:02 p.m. | #9
On 10/8/19 5:48 PM, Mark Brown wrote:
> On Tue, Oct 08, 2019 at 03:24:17PM +0200, Bartlomiej Zolnierkiewicz wrote:

> 

>> Commit 7f93ff73f7c8 ("opp: core: add regulators enable and disable")

>> currently can be safely reverted as all affected users use always-on

>> regulators. However IMHO it should be possible to enable always-on

>> regulator without side-effects.

> 

> With coupled regulators you might have something kicking in because a

> change was made on a completely different regulator...  If we don't take

> account of coupling requirements we'd doubtless have issues with that at

> some point.


OK, I have not considered this.

>> When it comes to setting regulator constraints before doing enable

>> operation, it also seems to be possible solution but would require

>> splitting regulator_set_voltage() operation on two functions:

> 

>> - one for setting constraints (before regulator_enable() operation)

> 

>> - the other one actually setting voltage (after enable operation)

> 

> I don't follow?  What would a "constraint" be in this context and how

> would it be different to the voltage range you'd set in normal operation?


The constraint here would be just the voltage range. I just wanted to
point out that the actual voltage set operation (on the hardware itself
not the internal subsystem bookkeeping) shouldn't be done before enable
operation (especially in context of non-coupled regulators).

Taking into account your remark about enable operation on coupled
regulators and Dmitry's mail about cpufreq issue I think now that just
dropping opp change is the most straightforward fix.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
Mark Brown Oct. 8, 2019, 4:15 p.m. | #10
On Tue, Oct 08, 2019 at 06:02:36PM +0300, Dmitry Osipenko wrote:

Please fix your mail client to word wrap within paragraphs at something
substantially less than 80 columns.  Doing this makes your messages much
easier to read and reply to.

> That OPP patch caused the same problem for the NVIDIA Tegra20 CPUFreq

> driver (in-progress) and I resolved it in the coupler's code [0].

> Perhaps the generic coupler could do the same thing by assuming that

> min_uV=current_uV until any consumer sets the voltage, i.e. if

> regulator_check_consumers(min_uV=0) returns min_uV=0.


That sounds like it might just postpone the inevitable - if you set the
wrong voltage first it might decide to drop down some voltage that
wasn't expected.  There's a bit of a bootstrapping issue.  I think it
would be safer to just say that anything that is within spec won't get
changed any time we balance, we'd only change things if needed to bring
them back into spec.
Mark Brown Oct. 8, 2019, 4:21 p.m. | #11
On Tue, Oct 08, 2019 at 06:02:08PM +0200, Bartlomiej Zolnierkiewicz wrote:

> Taking into account your remark about enable operation on coupled

> regulators and Dmitry's mail about cpufreq issue I think now that just

> dropping opp change is the most straightforward fix.


It's certainly the most straightforward thing for the immediate problem.
I do think we probably need to improve how we're handling the coupling
though, we've got some fragility here that needs addressing.
Dmitry Osipenko Oct. 8, 2019, 5:05 p.m. | #12
08.10.2019 19:15, Mark Brown пишет:
> On Tue, Oct 08, 2019 at 06:02:36PM +0300, Dmitry Osipenko wrote:

> 

> Please fix your mail client to word wrap within paragraphs at something

> substantially less than 80 columns.  Doing this makes your messages much

> easier to read and reply to.


Indeed, thanks!

>> That OPP patch caused the same problem for the NVIDIA Tegra20 CPUFreq

>> driver (in-progress) and I resolved it in the coupler's code [0].

>> Perhaps the generic coupler could do the same thing by assuming that

>> min_uV=current_uV until any consumer sets the voltage, i.e. if

>> regulator_check_consumers(min_uV=0) returns min_uV=0.

> 

> That sounds like it might just postpone the inevitable - if you set the

> wrong voltage first it might decide to drop down some voltage that

> wasn't expected.  There's a bit of a bootstrapping issue.  I think it

> would be safer to just say that anything that is within spec won't get

> changed any time we balance, we'd only change things if needed to bring

> them back into spec.


Yes, the case of changing voltage before regulator is enabled seems
won't work as expected.

Maybe it won't hurt to disallow a non always-on regulators to be coupled
until there will be a real user for that case.
Mark Brown Oct. 8, 2019, 5:17 p.m. | #13
On Tue, Oct 08, 2019 at 08:05:03PM +0300, Dmitry Osipenko wrote:
> 08.10.2019 19:15, Mark Brown пишет:


> > That sounds like it might just postpone the inevitable - if you set the

> > wrong voltage first it might decide to drop down some voltage that

> > wasn't expected.  There's a bit of a bootstrapping issue.  I think it

> > would be safer to just say that anything that is within spec won't get

> > changed any time we balance, we'd only change things if needed to bring

> > them back into spec.


> Yes, the case of changing voltage before regulator is enabled seems

> won't work as expected.


> Maybe it won't hurt to disallow a non always-on regulators to be coupled

> until there will be a real user for that case.


I thought that coupling with the CPU core and main SoC power regulators
was one of the main use cases for this feature?
Dmitry Osipenko Oct. 8, 2019, 6 p.m. | #14
08.10.2019 20:17, Mark Brown пишет:
> On Tue, Oct 08, 2019 at 08:05:03PM +0300, Dmitry Osipenko wrote:

>> 08.10.2019 19:15, Mark Brown пишет:

> 

>>> That sounds like it might just postpone the inevitable - if you set the

>>> wrong voltage first it might decide to drop down some voltage that

>>> wasn't expected.  There's a bit of a bootstrapping issue.  I think it

>>> would be safer to just say that anything that is within spec won't get

>>> changed any time we balance, we'd only change things if needed to bring

>>> them back into spec.

> 

>> Yes, the case of changing voltage before regulator is enabled seems

>> won't work as expected.

> 

>> Maybe it won't hurt to disallow a non always-on regulators to be coupled

>> until there will be a real user for that case.

> 

> I thought that coupling with the CPU core and main SoC power regulators

> was one of the main use cases for this feature?

> 


In a case of NVIDIA Tegra SoCs, it's coupling of CPU core *and* SoC core
regulators. I see that Exynos also couples the always-on regulators.

Properly handling case of a disabled coupled regulator certainly will be
useful, but looks like there are no real users for that feature right
now and thus no real testing is done.

Keeping coupled regulators in s valid voltage range is the today's main
feature.
Mark Brown Oct. 8, 2019, 6:07 p.m. | #15
On Tue, Oct 08, 2019 at 09:00:29PM +0300, Dmitry Osipenko wrote:
> 08.10.2019 20:17, Mark Brown пишет:

> > On Tue, Oct 08, 2019 at 08:05:03PM +0300, Dmitry Osipenko wrote:


> >> Maybe it won't hurt to disallow a non always-on regulators to be coupled

> >> until there will be a real user for that case.


> > I thought that coupling with the CPU core and main SoC power regulators

> > was one of the main use cases for this feature?


> Properly handling case of a disabled coupled regulator certainly will be

> useful, but looks like there are no real users for that feature right

> now and thus no real testing is done.


Right, sorry - I missed the double negative there.
Marek Szyprowski Oct. 9, 2019, 10:29 a.m. | #16
Hi Mark,

On 08.10.2019 20:07, Mark Brown wrote:
> On Tue, Oct 08, 2019 at 09:00:29PM +0300, Dmitry Osipenko wrote:

>> Properly handling case of a disabled coupled regulator certainly will be

>> useful, but looks like there are no real users for that feature right

>> now and thus no real testing is done.

> Right, sorry - I missed the double negative there.


Okay, then what is the conclusion, as I got lost a bit? How do you want 
this issue to be fixed?

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Mark Brown Oct. 9, 2019, 2:13 p.m. | #17
On Wed, Oct 09, 2019 at 12:29:00PM +0200, Marek Szyprowski wrote:

> Okay, then what is the conclusion, as I got lost a bit? How do you want 

> this issue to be fixed?


We should revert the enable call, it shouldn't be required, and ideally
the default balancer could be updated to only make configuration changes
if they're actually required which would help avoid triggering any such
things in future if we don't absolutely have to.
Viresh Kumar Oct. 10, 2019, 7:29 a.m. | #18
On 09-10-19, 15:13, Mark Brown wrote:
> On Wed, Oct 09, 2019 at 12:29:00PM +0200, Marek Szyprowski wrote:

> 

> > Okay, then what is the conclusion, as I got lost a bit? How do you want 

> > this issue to be fixed?

> 

> We should revert the enable call, it shouldn't be required, and ideally

> the default balancer could be updated to only make configuration changes

> if they're actually required which would help avoid triggering any such

> things in future if we don't absolutely have to.


Sorry for the delay in responding, just came back after vacations.

Should the OPP change be reverted ? Someone going to send that revert to me with
the required explanation ?

-- 
viresh
Mark Brown Oct. 10, 2019, 1:55 p.m. | #19
On Thu, Oct 10, 2019 at 12:19:55PM +0200, Marek Szyprowski wrote:
> On 09.10.2019 16:13, Mark Brown wrote:


> > We should revert the enable call, it shouldn't be required, and ideally

> > the default balancer could be updated to only make configuration changes

> > if they're actually required which would help avoid triggering any such

> > things in future if we don't absolutely have to.


> Okay, Then in case of regulator core - do you accept the initial patch 

> as it indeed forces the default balancer to avoid unnecessary changes, 

> or do you want me to rewrite it to assume min_uV = current_uV for the 

> already enabled regulators during the initial balancing, like suggested 

> by Dmitry?


Neither, I'm suggesting you make the change above.

Patch

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index afe94470b67f..aca74b83f3bc 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -2481,7 +2481,8 @@  static int _regulator_enable(struct regulator *regulator)
 	}
 
 	/* balance only if there are regulators coupled */
-	if (rdev->coupling_desc.n_coupled > 1) {
+	if (rdev->coupling_desc.n_coupled > 1 &&
+	    !_regulator_is_enabled(rdev)) {
 		ret = regulator_balance_voltage(rdev, PM_SUSPEND_ON);
 		if (ret < 0)
 			goto err_disable_supply;