Message ID | 20220608142723.103523089@infradead.org |
---|---|
Headers | show |
Series | cpuidle,rcu: Cleanup the mess | expand |
On Wed, Jun 08, 2022 at 04:27:57PM +0200, Peter Zijlstra wrote: > @@ -254,11 +255,18 @@ void omap_sram_idle(void) > */ > if (save_state) > omap34xx_save_context(omap3_arm_context); > + > + if (rcuidle) > + cpuidle_rcu_enter(); > + > if (save_state == 1 || save_state == 3) > cpu_suspend(save_state, omap34xx_do_sram_idle); > else > omap34xx_do_sram_idle(save_state); > > + if (rcuidle) > + rcuidle_rcu_exit(); *sigh* so much for this having been exposed to the robots for >2 days :/
On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > Current arch_cpu_idle() is called with IRQs disabled, but will return > with IRQs enabled. > > However, the very first thing the generic code does after calling > arch_cpu_idle() is raw_local_irq_disable(). This means that > architectures that can idle with IRQs disabled end up doing a > pointless 'enable-disable' dance. > > Therefore, push this IRQ disabling into the idle function, meaning > that those architectures can avoid the pointless IRQ state flipping. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> I think you now need to add the a raw_local_irq_disable(); in loongarch as well. Arnd
On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > arch_cpu_idle() is a very simple idle interface and exposes only a > single idle state and is expected to not require RCU and not do any > tracing/instrumentation. > > As such, omap_sram_idle() is not a valid implementation. Replace it > with the simple (shallow) omap3_do_wfi() call. Leaving the more > complicated idle states for the cpuidle driver. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> I see similar code in omap2: omap2_pm_idle() -> omap2_enter_full_retention() -> omap2_sram_suspend() Is that code path safe to use without RCU or does it need a similar change? Arnd
* Arnd Bergmann <arnd@arndb.de> [220608 18:18]: > On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > arch_cpu_idle() is a very simple idle interface and exposes only a > > single idle state and is expected to not require RCU and not do any > > tracing/instrumentation. > > > > As such, omap_sram_idle() is not a valid implementation. Replace it > > with the simple (shallow) omap3_do_wfi() call. Leaving the more > > complicated idle states for the cpuidle driver. Agreed it makes sense to limit deeper idle states to cpuidle. Hopefully there is some informative splat for attempting to use arch_cpu_ide() for deeper idle states :) > I see similar code in omap2: > > omap2_pm_idle() > -> omap2_enter_full_retention() > -> omap2_sram_suspend() > > Is that code path safe to use without RCU or does it need a similar change? Seems like a similar change should be done for omap2. Then anybody who cares to implement a minimal cpuidle support can do so. Regards, Tony
On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote: > The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console > tracepoint"), was printk usage from the cpuidle path where RCU was > already disabled. > > Per the patches earlier in this series, this is no longer the case. My understanding is that this series reduces a lot the amount of code called with RCU disabled. As a result the particular printk() call mentioned by commit fc98c3c8c9dc ("printk: use rcuidle console tracepoint") is called with RCU enabled now. Hence this particular problem is fixed better way now. But is this true in general? Does this "prevent" calling printk() a safe way in code with RCU disabled? I am not sure if anyone cares. printk() is the best effort functionality because of the consoles code anyway. Also I wonder if anyone uses this trace_console(). Therefore if this patch allows to remove some tricky tracing code then it might be worth it. But if trace_console_rcuidle() variant is still going to be available then I would keep using it. Best Regards, Petr > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/printk/printk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16 > } > } > > - trace_console_rcuidle(text, text_len); > + trace_console(text, text_len); > > return text_len; > } >
On Wed, Jun 08, 2022 at 06:28:33PM +0200, Arnd Bergmann wrote: > On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > arch_cpu_idle() is a very simple idle interface and exposes only a > > single idle state and is expected to not require RCU and not do any > > tracing/instrumentation. > > > > As such, omap_sram_idle() is not a valid implementation. Replace it > > with the simple (shallow) omap3_do_wfi() call. Leaving the more > > complicated idle states for the cpuidle driver. > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > I see similar code in omap2: > > omap2_pm_idle() > -> omap2_enter_full_retention() > -> omap2_sram_suspend() > > Is that code path safe to use without RCU or does it need a similar change? It needs a similar change, clearly I was running on fumes to not have found that when grepping around the omap code :/
On Thu, Jun 09, 2022 at 10:39:22AM +0300, Tony Lindgren wrote: > * Arnd Bergmann <arnd@arndb.de> [220608 18:18]: > > On Wed, Jun 8, 2022 at 4:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > arch_cpu_idle() is a very simple idle interface and exposes only a > > > single idle state and is expected to not require RCU and not do any > > > tracing/instrumentation. > > > > > > As such, omap_sram_idle() is not a valid implementation. Replace it > > > with the simple (shallow) omap3_do_wfi() call. Leaving the more > > > complicated idle states for the cpuidle driver. > > Agreed it makes sense to limit deeper idle states to cpuidle. Hopefully > there is some informative splat for attempting to use arch_cpu_ide() > for deeper idle states :) The arch_cpu_idle() interface doesn't allow one to express a desire for deeper states. I'm not sure how anyone could even attempt this. But given what OMAP needs to go deeper, this would involve things that require RCU, combine that with the follow up patches that rip out all the trace_.*_rcuidle() hackery from the power and clock domain code, PROVE_RCU should scream if anybody were to attempt it.
On Thu, Jun 09, 2022 at 11:16:46AM +0200, Petr Mladek wrote: > On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote: > > The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console > > tracepoint"), was printk usage from the cpuidle path where RCU was > > already disabled. > > > > Per the patches earlier in this series, this is no longer the case. > > My understanding is that this series reduces a lot the amount > of code called with RCU disabled. As a result the particular printk() > call mentioned by commit fc98c3c8c9dc ("printk: use rcuidle console > tracepoint") is called with RCU enabled now. Hence this particular > problem is fixed better way now. > > But is this true in general? > Does this "prevent" calling printk() a safe way in code with > RCU disabled? On x86_64, yes. Other architectures, less so. Specifically, the objtool noinstr validation pass will warn at build time (DEBUG_ENTRY=y) if any noinstr/cpuidle code does a call to non-vetted code like printk(). At the same time; there's a few hacks that allow WARN to work, but mostly if you hit WARN in entry/noinstr you get to keep the pieces in any case. On other architecture we'll need to rely on runtime coverage with PROVE_RCU. That is, if a splat like in the above mentioned commit happens again, we'll need to fix it by adjusting the callchain, not by mucking about with RCU state. > I am not sure if anyone cares. printk() is the best effort > functionality because of the consoles code anyway. Also I wonder > if anyone uses this trace_console(). This is the tracepoint used to spool all of printk into ftrace, I suspect there's users, but I haven't used it myself. > Therefore if this patch allows to remove some tricky tracing > code then it might be worth it. But if trace_console_rcuidle() > variant is still going to be available then I would keep using it. My ultimate goal is to delete trace_.*_rcuidle() and RCU_NONIDLE() entirely. We're close, but not quite there yet.
Sending again. The previous attempt was rejected by several recipients. It was caused by a mail server changes on my side. I am sorry for spamming those who got the 1st mail already. On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote: > The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console > tracepoint"), was printk usage from the cpuidle path where RCU was > already disabled. > > Per the patches earlier in this series, this is no longer the case. My understanding is that this series reduces a lot the amount of code called with RCU disabled. As a result the particular printk() call mentioned by commit fc98c3c8c9dc ("printk: use rcuidle console tracepoint") is called with RCU enabled now. Hence this particular problem is fixed better way now. But is this true in general? Does this "prevent" calling printk() a safe way in code with RCU disabled? I am not sure if anyone cares. printk() is the best effort functionality because of the consoles code anyway. Also I wonder if anyone uses this trace_console(). Therefore if this patch allows to remove some tricky tracing code then it might be worth it. But if trace_console_rcuidle() variant is still going to be available then I would keep using it. Best Regards, Petr > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/printk/printk.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2238,7 +2238,7 @@ static u16 printk_sprint(char *text, u16 > } > } > > - trace_console_rcuidle(text, text_len); > + trace_console(text, text_len); > > return text_len; > } >
My emails are getting rejected... Let me try web-interface Kudos to Petr for the questions and thanks to PeterZ for the answers. On Thu, Jun 9, 2022 at 7:02 PM Peter Zijlstra <peterz@infradead.org> wrote: > This is the tracepoint used to spool all of printk into ftrace, I > suspect there's users, but I haven't used it myself. I'm somewhat curious whether we can actually remove that trace event.
On Thu 2022-06-09 20:30:58, Sergey Senozhatsky wrote: > My emails are getting rejected... Let me try web-interface Bad day for mail sending. I have problems as well ;-) > Kudos to Petr for the questions and thanks to PeterZ for the answers. > > On Thu, Jun 9, 2022 at 7:02 PM Peter Zijlstra <peterz@infradead.org> wrote: > > This is the tracepoint used to spool all of printk into ftrace, I > > suspect there's users, but I haven't used it myself. > > I'm somewhat curious whether we can actually remove that trace event. Good question. Well, I think that it might be useful. It allows to see trace and printk messages together. It was ugly when it was in the console code. The new location in vprintk_store() allows to have it even "correctly" sorted (timestamp) against other tracing messages. Best Regards, Petr
On Thu 2022-06-09 12:02:04, Peter Zijlstra wrote: > On Thu, Jun 09, 2022 at 11:16:46AM +0200, Petr Mladek wrote: > > On Wed 2022-06-08 16:27:47, Peter Zijlstra wrote: > > > The problem, per commit fc98c3c8c9dc ("printk: use rcuidle console > > > tracepoint"), was printk usage from the cpuidle path where RCU was > > > already disabled. > > > > > Does this "prevent" calling printk() a safe way in code with > > RCU disabled? > > On x86_64, yes. Other architectures, less so. > > Specifically, the objtool noinstr validation pass will warn at build > time (DEBUG_ENTRY=y) if any noinstr/cpuidle code does a call to > non-vetted code like printk(). > > At the same time; there's a few hacks that allow WARN to work, but > mostly if you hit WARN in entry/noinstr you get to keep the pieces in > any case. > > On other architecture we'll need to rely on runtime coverage with > PROVE_RCU. That is, if a splat like in the above mentioned commit > happens again, we'll need to fix it by adjusting the callchain, not by > mucking about with RCU state. Makes sense. Feel free to use for this patch: Acked-by: Petr Mladek <pmladek@suse.com> > > Therefore if this patch allows to remove some tricky tracing > > code then it might be worth it. But if trace_console_rcuidle() > > variant is still going to be available then I would keep using it. > > My ultimate goal is to delete trace_.*_rcuidle() and RCU_NONIDLE() > entirely. We're close, but not quite there yet. I keep my fingers crossed. Best Regards, Petr
On Thu, Jun 9, 2022 at 10:06 PM Petr Mladek <pmladek@suse.com> wrote: > > Makes sense. Feel free to use for this patch: > > Acked-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
On Thu, Jun 9, 2022 at 10:02 PM Petr Mladek <pmladek@suse.com> wrote: > > On Thu 2022-06-09 20:30:58, Sergey Senozhatsky wrote: > > My emails are getting rejected... Let me try web-interface > > Bad day for mail sending. I have problems as well ;-) For me the problem is still there and apparently it's an "too many recipients" error. > > I'm somewhat curious whether we can actually remove that trace event. > > Good question. > > Well, I think that it might be useful. It allows to see trace and > printk messages together. Fair enough. Seems that back in 2011 people were pretty happy with it https://lore.kernel.org/all/1322161388.5366.54.camel@jlt3.sipsolutions.net/T/#m7bf6416f469119372191f22a6ecf653c5f7331d2 but... reportedly, one of the folks who Ack-ed it (*cough cough* PeterZ) has never used it. > It was ugly when it was in the console code. The new location > in vprintk_store() allows to have it even "correctly" sorted > (timestamp) against other tracing messages. That's true.
On Mon, Jun 13, 2022 at 04:26:01PM +0800, Lai Jiangshan wrote: > On Wed, Jun 8, 2022 at 10:48 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > Now that arch_cpu_idle() is expected to return with IRQs disabled, > > avoid the useless STI/CLI dance. > > > > Per the specs this is supposed to work, but nobody has yet relied up > > this behaviour so broken implementations are possible. > > I'm totally newbie here. > > The point of safe_halt() is that STI must be used and be used > directly before HLT to enable IRQ during the halting and stop > the halting if there is any IRQ. Correct; on real hardware. But this is virt... > In TDX case, STI must be used directly before the hypercall. > Otherwise, no IRQ can come and the vcpu would be stalled forever. > > Although the hypercall has an "irq_disabled" argument. > But the hypervisor doesn't (and can't) touch the IRQ flags no matter > what the "irq_disabled" argument is. The IRQ is not enabled during > the halting if the IRQ is disabled before the hypercall even if > irq_disabled=false. All we need the VMM to do is wake the vCPU, and it can do that, irrespective of the guest's IF. So the VMM can (and does) know if there's an interrupt pending, and that's all that's needed to wake from this hypercall. Once the vCPU is back up and running again, we'll eventually set IF again and the pending interrupt will get delivered and all's well. Think of this like MWAIT with ECX[0] set if you will.
* Peter Zijlstra <peterz@infradead.org> [220608 15:00]: > On Wed, Jun 08, 2022 at 04:27:57PM +0200, Peter Zijlstra wrote: > > @@ -254,11 +255,18 @@ void omap_sram_idle(void) > > */ > > if (save_state) > > omap34xx_save_context(omap3_arm_context); > > + > > + if (rcuidle) > > + cpuidle_rcu_enter(); > > + > > if (save_state == 1 || save_state == 3) > > cpu_suspend(save_state, omap34xx_do_sram_idle); > > else > > omap34xx_do_sram_idle(save_state); > > > > + if (rcuidle) > > + rcuidle_rcu_exit(); > > *sigh* so much for this having been exposed to the robots for >2 days :/ I tested your git branch of these patches, so: Reviewed-by: Tony Lindgren <tony@atomide.com> Tested-by: Tony Lindgren <tony@atomide.com>
On Jun 13, 2022, at 11:48 AM, Srivatsa S. Bhat <srivatsa@csail.mit.edu> wrote: > ⚠ External Email > > On 6/8/22 4:27 PM, Peter Zijlstra wrote: >> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section >> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section >> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section >> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> > >> >> -static inline void wbinvd(void) >> +extern noinstr void pv_native_wbinvd(void); >> + >> +static __always_inline void wbinvd(void) >> { >> PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV)); >> } I guess it is yet another instance of wrong accounting of GCC for the assembly blocks’ weight. I guess it is not a solution for older GCCs, but presumably ____PVOP_ALT_CALL() and friends should have used asm_inline or some new “asm_volatile_inline” variant.
On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > Hi All! (omg so many) Hi Peter, Sorry for the delay; my plate has also been rather full recently. I'm beginning to page this in now. > These here few patches mostly clear out the utter mess that is cpuidle vs rcuidle. > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit()); > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD)); The latter of these is necessary because apparently PM notifiers are called with RCU not watching. Is that still the case today (or at the end of this series)? If so, that feels like fertile land for more issues (yaey...). If not, we should be able to drop this. I can go dig into that some more. > kernel/cfi.c: RCU_NONIDLE({ > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full > of trace_.*_rcuidle() left: > > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1); > kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1); > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. I think those are also unused on arm64 too? If not, I can go attack that. > I've touched a _lot_ of code that I can't test and likely broken some of it :/ > In particular, the whole ARM cpuidle stuff was quite involved with OMAP being > the absolute 'winner'. > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to > GENERIC_ENTRY. Moving to GENERIC_ENTRY as a whole is going to take a tonne of work (refactoring both arm64 and the generic portion to be more amenable to each other), but we can certainly move closer to that for the bits that matter here. Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that we can select regardless of GENERIC_ENTRY to make that easier. > I've also got a note that says ARM64 can probably do a WFE based > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. Possibly; I'm not sure how much of a win that'll be given that by default we'll have a ~10KHz WFE wakeup from the timer, but we could take a peek. Thanks, Mark.
On Thu, 9 Jun 2022 15:02:20 +0200 Petr Mladek <pmladek@suse.com> wrote: > > I'm somewhat curious whether we can actually remove that trace event. > > Good question. > > Well, I think that it might be useful. It allows to see trace and > printk messages together. Yes people still use it. I was just asked about it at Kernel Recipes. That is, someone wanted printk mixed in with the tracing, and I told them about this event (which they didn't know about but was happy to hear that it existed). -- Steve
On Mon, Jun 13, 2022 at 07:23:13PM +0000, Nadav Amit wrote: > On Jun 13, 2022, at 11:48 AM, Srivatsa S. Bhat <srivatsa@csail.mit.edu> wrote: > > > ⚠ External Email > > > > On 6/8/22 4:27 PM, Peter Zijlstra wrote: > >> vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section > >> vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section > >> vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section > >> > >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> > > > >> > >> -static inline void wbinvd(void) > >> +extern noinstr void pv_native_wbinvd(void); > >> + > >> +static __always_inline void wbinvd(void) > >> { > >> PVOP_ALT_VCALL0(cpu.wbinvd, "wbinvd", ALT_NOT(X86_FEATURE_XENPV)); > >> } > > I guess it is yet another instance of wrong accounting of GCC for > the assembly blocks’ weight. I guess it is not a solution for older > GCCs, but presumably ____PVOP_ALT_CALL() and friends should have > used asm_inline or some new “asm_volatile_inline” variant. Partially, some of the *SAN options also generate a metric ton of nonsense when enabled and skew the compilers towards not inlining things.
On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote: > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > > Hi All! (omg so many) > > Hi Peter, > > Sorry for the delay; my plate has also been rather full recently. I'm beginning > to page this in now. No worries; we all have too much to do ;-) > > These here few patches mostly clear out the utter mess that is cpuidle vs rcuidle. > > > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > > > arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit()); > > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD)); > > The latter of these is necessary because apparently PM notifiers are called > with RCU not watching. Is that still the case today (or at the end of this > series)? If so, that feels like fertile land for more issues (yaey...). If not, > we should be able to drop this. That should be fixed; fingers crossed :-) > > kernel/cfi.c: RCU_NONIDLE({ > > > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full > > of trace_.*_rcuidle() left: > > > > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > > kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1); > > kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1); > > > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. > > I think those are also unused on arm64 too? > > If not, I can go attack that. My grep spots: arch/arm64/kernel/entry-common.c: trace_hardirqs_on(); arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); The _on thing should be replaced with something like: trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); instrumentation_end(); rcu_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); (as I think you know, since you have some of that already). And something similar for the _off thing, but with _off_finish(). > > I've touched a _lot_ of code that I can't test and likely broken some of it :/ > > In particular, the whole ARM cpuidle stuff was quite involved with OMAP being > > the absolute 'winner'. > > > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to > > GENERIC_ENTRY. > > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work > (refactoring both arm64 and the generic portion to be more amenable to each > other), but we can certainly move closer to that for the bits that matter here. I know ... been there etc.. :-) > Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that > we can select regardless of GENERIC_ENTRY to make that easier. Possible yeah. > > I've also got a note that says ARM64 can probably do a WFE based > > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. > > Possibly; I'm not sure how much of a win that'll be given that by default we'll > have a ~10KHz WFE wakeup from the timer, but we could take a peek. Ohh.. I didn't know it woke up *that* often. I just know Will made use of it in things like smp_cond_load_relaxed() which would be somewhat similar to a very shallow idle state that looks at the TIF word.
On Tue, Jun 14, 2022 at 06:58:30PM +0200, Peter Zijlstra wrote: > On Tue, Jun 14, 2022 at 12:19:29PM +0100, Mark Rutland wrote: > > On Wed, Jun 08, 2022 at 04:27:23PM +0200, Peter Zijlstra wrote: > > > Hi All! (omg so many) > > > > Hi Peter, > > > > Sorry for the delay; my plate has also been rather full recently. I'm beginning > > to page this in now. > > No worries; we all have too much to do ;-) > > > > These here few patches mostly clear out the utter mess that is cpuidle vs rcuidle. > > > > > > At the end of the ride there's only 2 real RCU_NONIDLE() users left > > > > > > arch/arm64/kernel/suspend.c: RCU_NONIDLE(__cpu_suspend_exit()); > > > drivers/perf/arm_pmu.c: RCU_NONIDLE(armpmu_start(event, PERF_EF_RELOAD)); > > > > The latter of these is necessary because apparently PM notifiers are called > > with RCU not watching. Is that still the case today (or at the end of this > > series)? If so, that feels like fertile land for more issues (yaey...). If not, > > we should be able to drop this. > > That should be fixed; fingers crossed :-) Cool; I'll try to give that a spin when I'm sat next to some relevant hardware. :) > > > kernel/cfi.c: RCU_NONIDLE({ > > > > > > (the CFI one is likely dead in the kCFI rewrite) and there's only a hand full > > > of trace_.*_rcuidle() left: > > > > > > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1); > > > kernel/trace/trace_preemptirq.c: trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr); > > > kernel/trace/trace_preemptirq.c: trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr); > > > kernel/trace/trace_preemptirq.c: trace_preempt_enable_rcuidle(a0, a1); > > > kernel/trace/trace_preemptirq.c: trace_preempt_disable_rcuidle(a0, a1); > > > > > > All of them are in 'deprecated' code that is unused for GENERIC_ENTRY. > > I think those are also unused on arm64 too? > > > > If not, I can go attack that. > > My grep spots: > > arch/arm64/kernel/entry-common.c: trace_hardirqs_on(); > arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); > arch/arm64/include/asm/daifflags.h: trace_hardirqs_off(); Ah; I hadn't realised those used trace_.*_rcuidle() behind the scenes. That affects local_irq_{enable,disable,restore}() too (which is what the daifflags.h bits are emulating), and also the generic entry code's irqentry_exit(). So it feels to me like we should be fixing those more generally? e.g. say that with a new STRICT_ENTRY[_RCU], we can only call trace_hardirqs_{on,off}() with RCU watching, and alter the definition of those? > The _on thing should be replaced with something like: > > trace_hardirqs_on_prepare(); > lockdep_hardirqs_on_prepare(); > instrumentation_end(); > rcu_irq_exit(); > lockdep_hardirqs_on(CALLER_ADDR0); > > (as I think you know, since you have some of that already). And > something similar for the _off thing, but with _off_finish(). Sure; I knew that was necessary for the outermost parts of entry (and I think that's all handled), I just hadn't realised that trace_hardirqs_{on,off} did the rcuidle thing in the middle. It'd be nice to not have to open-code the whole sequence everywhere for the portions which run after entry and are instrumentable, so (as above) I reckon we want to make trace_hardirqs_{on,off}() not do the rcuidle part unnecessarily (which IIUC is an end-goal anyway)? > > > I've touched a _lot_ of code that I can't test and likely broken some of it :/ > > > In particular, the whole ARM cpuidle stuff was quite involved with OMAP being > > > the absolute 'winner'. > > > > > > I'm hoping Mark can help me sort the remaining ARM64 bits as he moves that to > > > GENERIC_ENTRY. > > > > Moving to GENERIC_ENTRY as a whole is going to take a tonne of work > > (refactoring both arm64 and the generic portion to be more amenable to each > > other), but we can certainly move closer to that for the bits that matter here. > > I know ... been there etc.. :-) > > > Maybe we want a STRICT_ENTRY option to get rid of all the deprecated stuff that > > we can select regardless of GENERIC_ENTRY to make that easier. > > Possible yeah. > > > > I've also got a note that says ARM64 can probably do a WFE based > > > idle state and employ TIF_POLLING_NRFLAG to avoid some IPIs. > > > > Possibly; I'm not sure how much of a win that'll be given that by default we'll > > have a ~10KHz WFE wakeup from the timer, but we could take a peek. > > Ohh.. I didn't know it woke up *that* often. I just know Will made use > of it in things like smp_cond_load_relaxed() which would be somewhat > similar to a very shallow idle state that looks at the TIF word. We'll get some saving, I'm just not sure where that falls on the curve of idle states. FWIW the wakeup *can* be disabled (and it'd be nice to when we have WFxT instructions which take a timeout), it jsut happens to be on by default for reasons. Thanks, Mark.