mbox series

[RFC,v4,0/2] CPU-Idle latency selftest framework

Message ID 20210412074309.38484-1-psampat@linux.ibm.com
Headers show
Series CPU-Idle latency selftest framework | expand

Message

Pratik R. Sampat April 12, 2021, 7:43 a.m. UTC
Changelog v3-->v4
Based on review comments by Doug Smythies,
1. Parsing the thread_siblings_list for CPU topology information to
   correctly identify the cores the test should run on in
   default(quick) mode.
2. The source CPU to source CPU interaction in the IPI test will always
   result in a lower latency and cause a bias in the average, hence
   avoid adding the latency to be averaged for same cpu IPIs. The
   latency will still be displayed in the detailed logs.

RFC v3: https://lkml.org/lkml/2021/4/4/31
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.

The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,

IPI test:
    ipi_cpu_dest = Destination CPU for the IPI
    ipi_cpu_src = Origin of the IPI
    ipi_latency_ns = Measured latency time in ns
Timeout test:
    timeout_cpu_src = CPU on which the timer to be queued
    timeout_expected_ns = Timer duration
    timeout_diff_ns = Difference of actual duration vs expected timer

Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
# 
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024


Things to keep in mind:

1. This kernel module + bash driver does not guarantee idleness on a
   core when the IPI and the Timer is armed. It only invokes sleep and
   hopes that the core is idle once the IPI/Timer is invoked onto it.
   Hence this program must be run on a completely idle system for best
   results

2. Even on a completely idle system, there maybe book-keeping tasks or
   jitter tasks that can run on the core we want idle. This can create
   outliers in the latency measurement. Thankfully, these outliers
   should be large enough to easily weed them out.

3. A userspace only selftest variant was also sent out as RFC based on
   suggestions over the previous patchset to simply the kernel
   complexeity. However, a userspace only approach had more noise in
   the latency measurement due to userspace-kernel interactions
   which led to run to run variance and a lesser accurate test.
   Another downside of the nature of a userspace program is that it
   takes orders of magnitude longer to complete a full system test
   compared to the kernel framework.
   RFC patch: https://lkml.org/lkml/2020/9/2/356

4. For Intel Systems, the Timer based latencies don't exactly give out
   the measure of idle latencies. This is because of a hardware
   optimization mechanism that pre-arms a CPU when a timer is set to
   wakeup. That doesn't make this metric useless for Intel systems,
   it just means that is measuring IPI/Timer responding latency rather
   than idle wakeup latencies.
   (Source: https://lkml.org/lkml/2020/9/2/610)
   For solution to this problem, a hardware based latency analyzer is
   devised by Artem Bityutskiy from Intel.
   https://youtu.be/Opk92aQyvt0?t=8266
   https://intel.github.io/wult/

Pratik Rajesh Sampat (2):
  cpuidle: Extract IPI based and timer based wakeup latency from idle
    states
  selftest/cpuidle: Add support for cpuidle latency measurement

 drivers/cpuidle/Makefile                   |   1 +
 drivers/cpuidle/test-cpuidle_latency.c     | 157 ++++++++
 lib/Kconfig.debug                          |  10 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/cpuidle/Makefile   |   6 +
 tools/testing/selftests/cpuidle/cpuidle.sh | 402 +++++++++++++++++++++
 tools/testing/selftests/cpuidle/settings   |   2 +
 7 files changed, 579 insertions(+)
 create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
 create mode 100644 tools/testing/selftests/cpuidle/Makefile
 create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
 create mode 100644 tools/testing/selftests/cpuidle/settings

Comments

Doug Smythies April 13, 2021, 2:52 p.m. UTC | #1
Hi Pratik,

V4 seems fine. Thank you.

On Mon, Apr 12, 2021 at 12:43 AM Pratik Rajesh Sampat
<psampat@linux.ibm.com> wrote:
>

> Changelog v3-->v4

> Based on review comments by Doug Smythies,

> 1. Parsing the thread_siblings_list for CPU topology information to

>    correctly identify the cores the test should run on in

>    default(quick) mode.

> 2. The source CPU to source CPU interaction in the IPI test will always

>    result in a lower latency and cause a bias in the average, hence

>    avoid adding the latency to be averaged for same cpu IPIs. The

>    latency will still be displayed in the detailed logs.

>

> RFC v3: https://lkml.org/lkml/2021/4/4/31


Example output for an Intel i5-10600K, HWP active, performance mode.
System very idle:

$ sudo ./cpuidle.sh -v -i
Inserting /lib/modules/5.12.0-rc7-prs-v4/kernel/drivers/cpuidle/test-cpuidle_latency.ko
module

--IPI Latency Test---
Baseline Avg IPI latency(ns): 686
Observed Avg IPI latency(ns) - State 0: 468
Observed Avg IPI latency(ns) - State 1: 956
Observed Avg IPI latency(ns) - State 2: 17936
Observed Avg IPI latency(ns) - State 3: 17968

--Timeout Latency Test--
Baseline Avg timeout diff(ns): 445
Observed Avg timeout diff(ns) - State 0: 377
Observed Avg timeout diff(ns) - State 1: 630
Observed Avg timeout diff(ns) - State 2: 322812
Observed Avg timeout diff(ns) - State 3: 306067
Removing /lib/modules/5.12.0-rc7-prs-v4/kernel/drivers/cpuidle/test-cpuidle_latency.ko
module
Full Output logged at: cpuidle.log

$ grep . /sys/devices/system/cpu/cpu7/cpuidle/state*/residency
/sys/devices/system/cpu/cpu7/cpuidle/state0/residency:0
/sys/devices/system/cpu/cpu7/cpuidle/state1/residency:1
/sys/devices/system/cpu/cpu7/cpuidle/state2/residency:360
/sys/devices/system/cpu/cpu7/cpuidle/state3/residency:3102
$ grep . /sys/devices/system/cpu/cpu7/cpuidle/state*/latency
/sys/devices/system/cpu/cpu7/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu7/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu7/cpuidle/state2/latency:120
/sys/devices/system/cpu/cpu7/cpuidle/state3/latency:1034

... Doug