ARM: Don't ever downscale loops_per_jiffy in SMP systems#

Message ID	alpine.LFD.2.11.1405091640090.980@knanqh.ubzr
State	New
Headers	show Return-Path: <patchwork-forward+bncBCE6L2HOQQPRBNMGWWNQKGQEKTJCPYQ@linaro.org> Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.220.182 as permitted sender) client-ip=209.85.220.182; Received-SPF: none (google.com: linux-kernel-owner@vger.kernel.org does not designate permitted sender hosts) client-ip=209.132.180.67; Date: Fri, 9 May 2014 17:05:43 -0400 (EDT) From: Nicolas Pitre <nicolas.pitre@linaro.org> To: Russell King - ARM Linux <linux@arm.linux.org.uk> cc: Doug Anderson <dianders@chromium.org>, Viresh Kumar <viresh.kumar@linaro.org>, "Rafael J. Wysocki" <rjw@rjwysocki.net>, Will Deacon <will.deacon@arm.com>, John Stultz <john.stultz@linaro.org>, David Riley <davidriley@chromium.org>, "olof@lixom.net" <olof@lixom.net>, Sonny Rao <sonnyrao@chromium.org>, Richard Zhao <richard.zhao@linaro.org>, Santosh Shilimkar <santosh.shilimkar@ti.com>, Shawn Guo <shawn.guo@linaro.org>, Stephen Boyd <sboyd@codeaurora.org>, Marc Zyngier <marc.zyngier@arm.com>, Stephen Warren <swarren@nvidia.com>, Paul Gortmaker <paul.gortmaker@windriver.com>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org> Subject: Re: [PATCH] ARM: Don't ever downscale loops_per_jiffy in SMP systems# In-Reply-To: <20140509182245.GM3693@n2100.arm.linux.org.uk> Message-ID: <alpine.LFD.2.11.1405091640090.980@knanqh.ubzr> References: <CAD=FV=UEGJGHrdcXgs3EZ_wFXT8GnNeCOMt7CgdiiUzBzZ5Y0A@mail.gmail.com> <alpine.LFD.2.11.1405081149190.980@knanqh.ubzr> <CAD=FV=VtcLOh51LCmafzRMdqtDrwKXCqK6gE6V7OOOUe4gjwbA@mail.gmail.com> <alpine.LFD.2.11.1405081329010.980@knanqh.ubzr> <20140508192209.GH3693@n2100.arm.linux.org.uk> <alpine.LFD.2.11.1405081600260.980@knanqh.ubzr> <20140508205223.GI3693@n2100.arm.linux.org.uk> <alpine.LFD.2.11.1405082058130.980@knanqh.ubzr> <20140509091824.GL3693@n2100.arm.linux.org.uk> <alpine.LFD.2.11.1405091245070.980@knanqh.ubzr> <20140509182245.GM3693@n2100.arm.linux.org.uk> User-Agent: Alpine 2.11 (LFD 23 2013-08-11) MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Content-Type: TEXT/PLAIN; charset=US-ASCII

Nicolas Pitre May 9, 2014, 9:05 p.m. UTC

On Fri, 9 May 2014, Russell King - ARM Linux wrote:

> I'd much prefer just printing a warning at kernel boot time to report
> that the kernel is running with features which would make udelay() less
> than accurate.

What if there is simply no timer to rely upon, as in those cases where 
interrupts are needed for time keeping to make progress?  We should do 
better than simply saying "sorry your kernel should irradicate every 
udelay() usage to be reliable".

And I mean "reliable" which is not exactly the same as "accurate".  
Reliable means "never *significantly* shorter".

> Remember, it should be usable for _short_ delays on slow machines as
> well as other stuff, and if we're going to start throwing stuff like
> the above at it, it's going to become very inefficient.

You said that udelay can be much longer than expected due to various 
reasons.

You also said that the IRQ handler overhead during udelay calibration 
makes actual delays slightli shorter than expected.

I'm suggesting the addition of a slight overhead that is much smaller 
than the IRQ handler here.  That shouldn't impact things masurably.  
I'd certainly like Doug to run his udelay timing test with the following 
patch to see if it solves the problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Doug Anderson May 12, 2014, 11:51 p.m. UTC | #1

Hi,

On Fri, May 9, 2014 at 2:05 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
>
>> I'd much prefer just printing a warning at kernel boot time to report
>> that the kernel is running with features which would make udelay() less
>> than accurate.
>
> What if there is simply no timer to rely upon, as in those cases where
> interrupts are needed for time keeping to make progress?  We should do
> better than simply saying "sorry your kernel should irradicate every
> udelay() usage to be reliable".
>
> And I mean "reliable" which is not exactly the same as "accurate".
> Reliable means "never *significantly* shorter".
>
>> Remember, it should be usable for _short_ delays on slow machines as
>> well as other stuff, and if we're going to start throwing stuff like
>> the above at it, it's going to become very inefficient.
>
> You said that udelay can be much longer than expected due to various
> reasons.
>
> You also said that the IRQ handler overhead during udelay calibration
> makes actual delays slightli shorter than expected.
>
> I'm suggesting the addition of a slight overhead that is much smaller
> than the IRQ handler here.  That shouldn't impact things masurably.
> I'd certainly like Doug to run his udelay timing test with the following
> patch to see if it solves the problem.

...so I spent a whole chunk of time debugging this problem today.  I'm
out of time today (more tomorrow), but it looks like the theory I
proposed about why udelay() is giving bad results _might_ have more to
do with bugs in the exynos cpufreq driver and less to do with the
theoretical race we've been talking about.  It looks possible that the
driver is not setting the "old" frequency properly, which would
certainly cause problems.

I'll post more when I figure this out for sure.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Doug Anderson May 13, 2014, 9:50 p.m. UTC | #2

Hi,

On Mon, May 12, 2014 at 4:51 PM, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Fri, May 9, 2014 at 2:05 PM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Fri, 9 May 2014, Russell King - ARM Linux wrote:
>>
>>> I'd much prefer just printing a warning at kernel boot time to report
>>> that the kernel is running with features which would make udelay() less
>>> than accurate.
>>
>> What if there is simply no timer to rely upon, as in those cases where
>> interrupts are needed for time keeping to make progress?  We should do
>> better than simply saying "sorry your kernel should irradicate every
>> udelay() usage to be reliable".
>>
>> And I mean "reliable" which is not exactly the same as "accurate".
>> Reliable means "never *significantly* shorter".
>>
>>> Remember, it should be usable for _short_ delays on slow machines as
>>> well as other stuff, and if we're going to start throwing stuff like
>>> the above at it, it's going to become very inefficient.
>>
>> You said that udelay can be much longer than expected due to various
>> reasons.
>>
>> You also said that the IRQ handler overhead during udelay calibration
>> makes actual delays slightli shorter than expected.
>>
>> I'm suggesting the addition of a slight overhead that is much smaller
>> than the IRQ handler here.  That shouldn't impact things masurably.
>> I'd certainly like Doug to run his udelay timing test with the following
>> patch to see if it solves the problem.
>
> ...so I spent a whole chunk of time debugging this problem today.  I'm
> out of time today (more tomorrow), but it looks like the theory I
> proposed about why udelay() is giving bad results _might_ have more to
> do with bugs in the exynos cpufreq driver and less to do with the
> theoretical race we've been talking about.  It looks possible that the
> driver is not setting the "old" frequency properly, which would
> certainly cause problems.

Argh.  It turns out that I spent a whole lot of time tracking down the
fact that cpufreq_out_of_sync() running.  As part of debugging this
problem I added a cpufreq_get(0).  That would periodically notice that
the driver's reported frequency didn't match "policy->cur" and call
cpufreq_out_of_sync().  cpufreq_out_of_sync() would "thoughtfully"
send out its own CPUFREQ_PRECHANGE / CPUFREQ_POSTCHANGE but without
any sort of mutexes (at least in our tree).  Ugh.

Overall cpufreq_out_of_sync() seems incredibly racy since there will
inevitably be some period of time where the cpufreq driver has changed
the real CPU frequency but hasn't yet sent out the
cpufreq_notify_transition().  ...and there is no locking between the
two that I see.  ...but that's getting pretty far afield from my
original bug and it's been that way forever, so I guess I'll ignore
it.

--

...but then I found the true problem shows up when we transition
between very low frequencies on exynos, like between 200MHz and
300MHz.  While transitioning between frequencies the system
temporarily bumps over to the "switcher" PLL running at 800MHz while
waiting for the main PLL to stabilize.  No CPUFREQ notification is
sent for that.  That means there's a period of time when we're running
at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
300MHz.

I'm welcome to any suggestions for how to address this.  It sorta
feels like it would be a common thing to have a temporary PLL during
the transition, so my inclination would be to add a "temp" field to
"struct cpufreq_freqs".  Anyone who cared about the fact that cpufreq
might transition through a different frequency on its way from old to
new could look at this field.

What do people think?

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Stephen Warren May 13, 2014, 10:15 p.m. UTC | #3

On 05/13/2014 03:50 PM, Doug Anderson wrote:
...
> ...but then I found the true problem shows up when we transition
> between very low frequencies on exynos, like between 200MHz and
> 300MHz.  While transitioning between frequencies the system
> temporarily bumps over to the "switcher" PLL running at 800MHz while
> waiting for the main PLL to stabilize.  No CPUFREQ notification is
> sent for that.  That means there's a period of time when we're running
> at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> 300MHz.
> 
> 
> I'm welcome to any suggestions for how to address this.  It sorta
> feels like it would be a common thing to have a temporary PLL during
> the transition, ...

We definitely do that on Tegra for some cpufreq transitions.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Nicolas Pitre May 13, 2014, 11:15 p.m. UTC | #4

On Tue, 13 May 2014, Stephen Warren wrote:

> On 05/13/2014 03:50 PM, Doug Anderson wrote:
> ...
> > ...but then I found the true problem shows up when we transition
> > between very low frequencies on exynos, like between 200MHz and
> > 300MHz.  While transitioning between frequencies the system
> > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > sent for that.  That means there's a period of time when we're running
> > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > 300MHz.
> > 
> > 
> > I'm welcome to any suggestions for how to address this.  It sorta
> > feels like it would be a common thing to have a temporary PLL during
> > the transition, ...
> 
> We definitely do that on Tegra for some cpufreq transitions.

Ouch...  If this is a common strategy to use a third frequency during a 
transition phase, especially if that frequency is way off (800MHz vs 
200-300MHz) then it is something the cpufreq layer must capture and 
advertise.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Nicolas Pitre May 13, 2014, 11:29 p.m. UTC | #5

On Tue, 13 May 2014, Nicolas Pitre wrote:

> On Tue, 13 May 2014, Stephen Warren wrote:
> 
> > On 05/13/2014 03:50 PM, Doug Anderson wrote:
> > ...
> > > ...but then I found the true problem shows up when we transition
> > > between very low frequencies on exynos, like between 200MHz and
> > > 300MHz.  While transitioning between frequencies the system
> > > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > > sent for that.  That means there's a period of time when we're running
> > > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > > 300MHz.
> > > 
> > > 
> > > I'm welcome to any suggestions for how to address this.  It sorta
> > > feels like it would be a common thing to have a temporary PLL during
> > > the transition, ...
> > 
> > We definitely do that on Tegra for some cpufreq transitions.
> 
> Ouch...  If this is a common strategy to use a third frequency during a 
> transition phase, especially if that frequency is way off (800MHz vs 
> 200-300MHz) then it is something the cpufreq layer must capture and 
> advertise.

Of course if only the loops_per_jiffy scaling does care about frequency 
changes these days, and if in those cases udelay() can instead be moved 
to a timer source on those hick-up prone platforms, then all this is 
fairly theoretical and may not be worth pursuing.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Russell King - ARM Linux May 13, 2014, 11:36 p.m. UTC | #6

On Tue, May 13, 2014 at 07:29:52PM -0400, Nicolas Pitre wrote:
> On Tue, 13 May 2014, Nicolas Pitre wrote:
> 
> > On Tue, 13 May 2014, Stephen Warren wrote:
> > 
> > > On 05/13/2014 03:50 PM, Doug Anderson wrote:
> > > ...
> > > > ...but then I found the true problem shows up when we transition
> > > > between very low frequencies on exynos, like between 200MHz and
> > > > 300MHz.  While transitioning between frequencies the system
> > > > temporarily bumps over to the "switcher" PLL running at 800MHz while
> > > > waiting for the main PLL to stabilize.  No CPUFREQ notification is
> > > > sent for that.  That means there's a period of time when we're running
> > > > at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> > > > 300MHz.
> > > > 
> > > > 
> > > > I'm welcome to any suggestions for how to address this.  It sorta
> > > > feels like it would be a common thing to have a temporary PLL during
> > > > the transition, ...
> > > 
> > > We definitely do that on Tegra for some cpufreq transitions.
> > 
> > Ouch...  If this is a common strategy to use a third frequency during a 
> > transition phase, especially if that frequency is way off (800MHz vs 
> > 200-300MHz) then it is something the cpufreq layer must capture and 
> > advertise.
> 
> Of course if only the loops_per_jiffy scaling does care about frequency 
> changes these days, and if in those cases udelay() can instead be moved 
> to a timer source on those hick-up prone platforms, then all this is 
> fairly theoretical and may not be worth pursuing.

As I've been saying... use a bloody timer. :)

Viresh Kumar May 15, 2014, 6:12 a.m. UTC | #7

On 14 May 2014 03:20, Doug Anderson <dianders@chromium.org> wrote:
> ...but then I found the true problem shows up when we transition
> between very low frequencies on exynos, like between 200MHz and
> 300MHz.  While transitioning between frequencies the system
> temporarily bumps over to the "switcher" PLL running at 800MHz while
> waiting for the main PLL to stabilize.  No CPUFREQ notification is
> sent for that.  That means there's a period of time when we're running
> at 800MHz but loops_per_jiffy is calibrated at between 200MHz and
> 300MHz.
>
>
> I'm welcome to any suggestions for how to address this.

I have attempted to fix this in a generic way and sent an RFC patch for
this. I have cc'd only few people from this list which I thought would be
interested in cpufreq stuff, sorry if I missed anyone.

https://lkml.org/lkml/2014/5/15/40
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

ARM: Don't ever downscale loops_per_jiffy in SMP systems#

Commit Message

Comments

Patch