Message ID | 20250218213337.377987-11-ankur.a.arora@oracle.com |
---|---|
State | New |
Series | arm64: support poll_idle() |
On 2025/2/19 05:33, Ankur Arora wrote:
> Needed for cpuidle-haltpoll.
>
> Acked-by: Will Deacon <will@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
> arch/arm64/kernel/idle.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> index 05cfb347ec26..b85ba0df9b02 100644
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
> */
> cpu_do_idle();

Hi, Ankur,

With haltpoll_driver registered, arch_cpu_idle() on x86 can select
mwait_idle() in idle threads.

It uses MONITOR to set up an effective address range that is monitored
for write-to-memory activity; MWAIT then places the processor in
an optimized state (this may vary between different implementations)
until a write to the monitored address range occurs.

Should arch_cpu_idle() on arm64 also use LDXR/WFE
to avoid the wakeup IPI, like x86 MONITOR/MWAIT?

Thanks.
Shuai
On Fri, 2025-04-11 at 11:32 +0800, Shuai Xue wrote:
> On 2025/2/19 05:33, Ankur Arora wrote:
> > Needed for cpuidle-haltpoll.
> >
> > Acked-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> > ---
> > arch/arm64/kernel/idle.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> > index 05cfb347ec26..b85ba0df9b02 100644
> > --- a/arch/arm64/kernel/idle.c
> > +++ b/arch/arm64/kernel/idle.c
> > @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
> > */
> > cpu_do_idle();
>
> Hi, Ankur,
>
> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
> mwait_idle() in idle threads.
>
> It uses MONITOR to set up an effective address range that is monitored
> for write-to-memory activity; MWAIT then places the processor in
> an optimized state (this may vary between different implementations)
> until a write to the monitored address range occurs.
>
> Should arch_cpu_idle() on arm64 also use LDXR/WFE
> to avoid the wakeup IPI, like x86 MONITOR/MWAIT?

WFE will wake from the event stream, which can have short sub-ms periods
on many systems. Maybe something to consider when WFET is more widely
available.

> Thanks.
> Shuai

Regards,
Haris Okanovic
AWS Graviton Software
Shuai Xue <xueshuai@linux.alibaba.com> writes:

> On 2025/2/19 05:33, Ankur Arora wrote:
>> Needed for cpuidle-haltpoll.
>> Acked-by: Will Deacon <will@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>> arch/arm64/kernel/idle.c | 1 +
>> 1 file changed, 1 insertion(+)
>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>> index 05cfb347ec26..b85ba0df9b02 100644
>> --- a/arch/arm64/kernel/idle.c
>> +++ b/arch/arm64/kernel/idle.c
>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>> */
>> cpu_do_idle();
>
> Hi, Ankur,
>
> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
> mwait_idle() in idle threads.
>
> It uses MONITOR to set up an effective address range that is monitored
> for write-to-memory activity; MWAIT then places the processor in
> an optimized state (this may vary between different implementations)
> until a write to the monitored address range occurs.

MWAIT is more capable than WFE -- it allows selection of deeper idle
states. IIRC C2/C3.

> Should arch_cpu_idle() on arm64 also use LDXR/WFE
> to avoid the wakeup IPI, like x86 MONITOR/MWAIT?

Avoiding the wakeup IPI needs TIF_NR_POLLING and the polling-in-idle
support that this series adds.

As Haris notes, the drawback of using only WFE is that it allows just
a single idle state, one that is fairly shallow because the event stream
causes a wakeup every 100us.

--
ankur
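To make the polling idea in the reply above concrete, here is a minimal userspace analogue in C. It is only a sketch under stated assumptions, not the kernel's poll_idle() or code from this series: `poll_flag_timeout`, `now_ns`, and `TIME_CHECK_INTERVAL` are hypothetical names, and a C11 atomic stands in for the flag a waker would set. The point it illustrates is that a CPU which is known to be polling can be woken by a plain memory write, so no IPI is needed, and that the poller gives up after a time limit. On arm64 the busy-wait would presumably be a WFE, with the event stream (every 100us per the discussion) bounding how late the deadline check can be.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: re-read the clock only every N spins to keep the loop cheap. */
#define TIME_CHECK_INTERVAL 200u

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *flag until it becomes non-zero or timeout_ns elapses.
 * Returns true if the flag was observed set (the "woken without IPI" case),
 * false if the time limit expired first.
 */
static bool poll_flag_timeout(atomic_int *flag, uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;
	unsigned int spins = 0;

	for (;;) {
		/* A relaxed load is enough to notice the waker's store. */
		if (atomic_load_explicit(flag, memory_order_relaxed))
			return true;

		/* An arm64 kernel would wait in WFE here instead of spinning. */
		if (++spins % TIME_CHECK_INTERVAL == 0 && now_ns() >= deadline)
			return false;
	}
}
```

A waker thread would simply store a non-zero value to the flag; the caller treats a `false` return as "poll limit reached, fall back to a deeper idle state".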
On 2025/4/12 04:57, Ankur Arora wrote:
>
> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>
>> On 2025/2/19 05:33, Ankur Arora wrote:
>>> Needed for cpuidle-haltpoll.
>>> Acked-by: Will Deacon <will@kernel.org>
>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>> ---
>>> arch/arm64/kernel/idle.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>> index 05cfb347ec26..b85ba0df9b02 100644
>>> --- a/arch/arm64/kernel/idle.c
>>> +++ b/arch/arm64/kernel/idle.c
>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>> */
>>> cpu_do_idle();
>>
>> Hi, Ankur,
>>
>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>> mwait_idle() in idle threads.
>>
>> It uses MONITOR to set up an effective address range that is monitored
>> for write-to-memory activity; MWAIT then places the processor in
>> an optimized state (this may vary between different implementations)
>> until a write to the monitored address range occurs.
>
> MWAIT is more capable than WFE -- it allows selection of deeper idle
> state. IIRC C2/C3.
>
>> Should arch_cpu_idle() on arm64 also use LDXR/WFE
>> to avoid the wakeup IPI, like x86 MONITOR/MWAIT?
>
> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
> that this series adds.
>
> As Haris notes, the negative with only using WFE is that it only allows
> a single idle state, one that is fairly shallow because the event-stream
> causes a wakeup every 100us.
>
> --
> ankur

Hi, Ankur and Haris

Got it, thanks for the explanation :)

Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:

w/o haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):

       32521.53 msec task-clock                         #    2.000 CPUs utilized    ( +- 1.16% )
    38081402726      cycles                             #    1.171 GHz              ( +- 1.70% )
    27324614561      instructions                       #    0.72  insn per cycle   ( +- 0.12% )
            181      sched:sched_wake_idle_without_ipi  #    0.006 K/sec

w/ haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):

        9477.15 msec task-clock                         #    2.000 CPUs utilized    ( +- 0.89% )
    21486828269      cycles                             #    2.267 GHz              ( +- 0.35% )
    23867109747      instructions                       #    1.11  insn per cycle   ( +- 0.11% )
        1925207      sched:sched_wake_idle_without_ipi  #    0.203 M/sec

Comparing sched-pipe performance on QEMU with Kunpeng 920, *IPC improved 10%*:

w/o haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):

      34,007.89 msec task-clock                         #    2.000 CPUs utilized    ( +- 8.86% )
  4,407,859,620      cycles                             #    0.130 GHz              ( +- 84.92% )
  2,482,046,461      instructions                       #    0.56  insn per cycle   ( +- 88.27% )
             16      sched:sched_wake_idle_without_ipi  #    0.470 /sec             ( +- 98.77% )

    17.00 +- 1.51 seconds time elapsed  ( +- 8.86% )

w/ haltpoll
Performance counter stats for 'CPU(s) 0,1' (5 runs):

      16,894.37 msec task-clock                         #    2.000 CPUs utilized    ( +- 3.80% )
  8,703,158,826      cycles                             #    0.515 GHz              ( +- 31.31% )
  5,379,257,839      instructions                       #    0.62  insn per cycle   ( +- 30.03% )
        549,434      sched:sched_wake_idle_without_ipi  #   32.522 K/sec            ( +- 30.05% )

    8.447 +- 0.321 seconds time elapsed  ( +- 3.80% )

Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>

Thanks.
Shuai
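The IPC figures in the perf output above are instructions divided by cycles, and the quoted improvement can be checked with a few lines of C. The sketch below is illustrative only: the struct and function names are hypothetical, and the counter values are copied from the Yitian 710 runs. How the 35% headline was derived is not stated in the thread; one reading that matches it is the IPC change expressed relative to the with-haltpoll value, while the same change relative to the without-haltpoll baseline comes out at roughly 55%.

```c
#include <assert.h>

/* Raw counters from one `perf stat` run (hypothetical container type). */
struct perf_sample {
	double instructions;
	double cycles;
};

/* Yitian 710 numbers copied from the message above. */
static const struct perf_sample yitian_base     = { 27324614561.0, 38081402726.0 };
static const struct perf_sample yitian_haltpoll = { 23867109747.0, 21486828269.0 };

/* Instructions per cycle, as perf reports in the "insn per cycle" column. */
static double ipc(struct perf_sample s)
{
	assert(s.cycles > 0.0);
	return s.instructions / s.cycles;
}

/* Percentage change measured against the "before" value. */
static double pct_change_vs_baseline(double before, double after)
{
	return 100.0 * (after - before) / before;
}

/* Percentage change measured against the "after" value. */
static double pct_change_vs_result(double before, double after)
{
	return 100.0 * (after - before) / after;
}
```

With these inputs, `ipc()` reproduces the 0.72 and 1.11 values perf prints, `pct_change_vs_result()` lands near 35, and `pct_change_vs_baseline()` lands near 55. The Kunpeng 920 runs carry much larger run-to-run variance (up to about 88% on instructions), so the same arithmetic there is less conclusive.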
On 2025/4/14 11:46, Ankur Arora wrote:
>
> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>
>> On 2025/4/12 04:57, Ankur Arora wrote:
>>> Shuai Xue <xueshuai@linux.alibaba.com> writes:
>>>
>>>> On 2025/2/19 05:33, Ankur Arora wrote:
>>>>> Needed for cpuidle-haltpoll.
>>>>> Acked-by: Will Deacon <will@kernel.org>
>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>>>> ---
>>>>> arch/arm64/kernel/idle.c | 1 +
>>>>> 1 file changed, 1 insertion(+)
>>>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>>>> index 05cfb347ec26..b85ba0df9b02 100644
>>>>> --- a/arch/arm64/kernel/idle.c
>>>>> +++ b/arch/arm64/kernel/idle.c
>>>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>>>> */
>>>>> cpu_do_idle();
>>>>
>>>> Hi, Ankur,
>>>>
>>>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>>>> mwait_idle() in idle threads.
>>>>
>>>> It uses MONITOR to set up an effective address range that is monitored
>>>> for write-to-memory activity; MWAIT then places the processor in
>>>> an optimized state (this may vary between different implementations)
>>>> until a write to the monitored address range occurs.
>>> MWAIT is more capable than WFE -- it allows selection of deeper idle
>>> state. IIRC C2/C3.
>>>
>>>> Should arch_cpu_idle() on arm64 also use LDXR/WFE
>>>> to avoid the wakeup IPI, like x86 MONITOR/MWAIT?
>>> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
>>> that this series adds.
>>> As Haris notes, the negative with only using WFE is that it only allows
>>> a single idle state, one that is fairly shallow because the event-stream
>>> causes a wakeup every 100us.
>>> --
>>> ankur
>>
>> Hi, Ankur and Haris
>>
>> Got it, thanks for the explanation :)
>>
>> Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
>
> Thanks for testing Shuai. I wasn't expecting the IPC to improve by quite
> that much :). The reduced instructions make sense since we don't have to
> handle the IRQ anymore but we would spend some of the saved cycles
> waiting in WFE instead.
>
> I'm not familiar with the Yitian 710. Can you check if you are running
> with WFE? That's the __smp_cond_load_relaxed_timewait() path vs the
> __smp_cond_load_relaxed_spinwait() path in [0]. Same question for the
> Kunpeng 920.

Yes, it is running with __smp_cond_load_relaxed_timewait().

I used perf-probe to check whether WFE is available in the guest:

    perf probe 'arch_timer_evtstrm_available%return r=$retval'
    perf record -e probe:arch_timer_evtstrm_available__return -aR sleep 1
    perf script
    swapper 0 [000] 1360.063049: probe:arch_timer_evtstrm_available__return: (ffff800080a5c640 <- ffff800080d42764) r=0x1

arch_timer_evtstrm_available() returns true, so
__smp_cond_load_relaxed_timewait() is used.

>
> Also, I'm working on a new version of the series in [1]. Would you be
> okay trying that out?

Sure. Please cc me when you send out a new version.

>
> Thanks
> Ankur
>
> [0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
> [1] https://lore.kernel.org/lkml/20250203214911.898276-4-ankur.a.arora@oracle.com/
>

Thanks.
Shuai
diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
index 05cfb347ec26..b85ba0df9b02 100644
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
 	 */
 	cpu_do_idle();
 }
+EXPORT_SYMBOL_GPL(arch_cpu_idle);