[lkp-developer,sched/core] 6b94780e45: unixbench.score -4.5% regression

Message ID 20170102145637.GA8760@linaro.org
State New

Commit Message

Vincent Guittot Jan. 2, 2017, 2:56 p.m.
Hi Xiaolong,

On Monday 19 Dec 2016 at 08:14:53 (+0800), kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed a -4.5% regression of unixbench.score due to commit:


I have been able to restore performance on my platform with the patch below.
Could you test it?

---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 393759b..6e7d45c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2578,6 +2578,7 @@ void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);

 	activate_task(rq, p, 0);
-- 
2.7.4

Vincent

> 
> 
> commit: 6b94780e45c17b83e3e75f8aaca5a328db583c74 ("sched/core: Use load_avg for selecting idlest group")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: unixbench
> on test machine: 24 threads Nehalem-EP with 24G memory
> with following parameters:
> 
> 	runtime: 300s
> 	nr_task: 100%
> 	test: shell1
> 	cpufreq_governor: performance
> 
> test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> test-url: https://github.com/kdlucas/byte-unixbench
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+-----------------------------------------------------------------------+
> | testcase: change | unixbench: unixbench.score -2.9% regression                           |
> | test machine     | 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory       |
> | test parameters  | nr_task=1                                                             |
> |                  | runtime=300s                                                          |
> |                  | test=shell8                                                           |
> +------------------+-----------------------------------------------------------------------+
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> To reproduce:
> 
>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
> 
> testcase/path_params/tbox_group/run: unixbench/300s-100%-shell1-performance/lkp-wsm-ep1
> 
> f519a3f1c6b7a990  6b94780e45c17b83e3e75f8aac  
> ----------------  --------------------------  
>      25565              -5%      24414        unixbench.score
>   29557557                    28781098        unixbench.time.voluntary_context_switches
>       5743              -4%       5514        unixbench.time.user_time
>  9.232e+08              -4%  8.831e+08        unixbench.time.minor_page_faults
>       1807              -5%       1709        unixbench.time.percent_of_cpu_this_job_got
>       5656              -7%       5271        unixbench.time.system_time
>   13223805             -20%   10628072        unixbench.time.involuntary_context_switches
>     741766             -62%     279054        interrupts.CAL:Function_call_interrupts
>      31060              -9%      28214        vmstat.system.in
>     126250             -12%     110890        vmstat.system.cs
>      78.58              -6%      74.20        turbostat.%Busy
>       2507              -6%       2366        turbostat.Avg_MHz
>       9134 ± 47%     -6e+03       2973 ± 36%  latency_stats.max.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
>     380879 ± 10%      5e+05     887692 ± 49%  latency_stats.sum.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
>      31710 ± 15%     -2e+04      10583 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64
>      51796 ±  4%     -4e+04      15457 ± 10%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
>     111998 ± 18%     -7e+04      37074 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
>     275087 ± 15%     -2e+05      81973 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
>     930993 ± 12%     -6e+05     320520 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.vm_mmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64
>    4755783 ±  9%     -3e+06    1619348 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.split_vma.mprotect_fixup.do_mprotect_pkey.SyS_mprotect.entry_SYSCALL_64_fastpath
>    5536067 ± 10%     -4e+06    1929338 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64
>  9.032e+08              -4%   8.64e+08        perf-stat.page-faults
>  9.032e+08              -4%   8.64e+08        perf-stat.minor-faults
>  2.329e+09                   2.269e+09        perf-stat.node-load-misses
>    2.2e+09              -9%  2.011e+09 ±  5%  perf-stat.dTLB-store-misses
>  3.278e+10              -9%  2.987e+10 ±  6%  perf-stat.dTLB-load-misses
>   19484819              13%   21974129        perf-stat.cpu-migrations
>  3.755e+13              -6%   3.54e+13        perf-stat.cpu-cycles
>       3244               4%       3379        perf-stat.instructions-per-iTLB-miss
>  4.536e+12              -4%  4.332e+12        perf-stat.branch-instructions
>  2.303e+13              -4%  2.208e+13        perf-stat.instructions
>  5.768e+12              -4%  5.517e+12        perf-stat.dTLB-loads
>  3.567e+11              -4%  3.414e+11        perf-stat.cache-references
>       2.97                        2.93        perf-stat.branch-miss-rate%
>  2.768e+10                   2.699e+10        perf-stat.node-stores
>  5.446e+10              -3%  5.275e+10        perf-stat.cache-misses
>       0.03              -4%       0.03        perf-stat.iTLB-load-miss-rate%
>  9.673e+09              -4%  9.294e+09        perf-stat.node-loads
>  3.596e+12              -4%  3.442e+12        perf-stat.dTLB-stores
>       0.61                        0.62        perf-stat.ipc
>  1.347e+11              -6%   1.27e+11        perf-stat.branch-misses
>  7.098e+09              -8%  6.533e+09        perf-stat.iTLB-load-misses
>  2.309e+13              -4%  2.206e+13        perf-stat.iTLB-loads
>   79911173             -12%   70187035        perf-stat.context-switches
> 
> 
>                                  turbostat._Busy
> 
>   90 ++-------------------------------------*---*---------------------------+
>      |                                    ..       *...*..                  |
>   80 *+..*..*...*..*...*..*...*..*...O...*  O   O  O   O  O...O..O...O  O   O
>   70 O+  O  O   O  O   O  O   O  O                                          |
>      |                                                                      |
>   60 ++                                                                     |
>   50 ++                                                                     |
>      |                                                                      |
>   40 ++                                                                     |
>   30 ++                                                                     |
>      |                                                                      |
>   20 ++                                                                     |
>   10 ++                                                                     |
>      |                                                                      |
>    0 ++----------------------------------O----------------------------------+
> 
> 
>                     unixbench.time.percent_of_cpu_this_job_got
> 
>   2500 ++-------------------------------------------------------------------+
>        |                                                                    |
>        |                                       .*...                        |
>   2000 ++                                   .*.     *..*...                 |
>        *..*...*..*...*..*...*..*...*..O...*. O  O   O  O   O..O...O..O   O  O
>        O  O   O  O   O  O   O  O   O                                        |
>   1500 ++                                                                   |
>        |                                                                    |
>   1000 ++                                                                   |
>        |                                                                    |
>        |                                                                    |
>    500 ++                                                                   |
>        |                                                                    |
>        |                                                                    |
>      0 ++---------------------------------O---------------------------------+
> 
> 
>                                   vmstat.system.in
> 
>   40000 ++------------------------------------------------------------------+
>         |                                          .*...*..                 |
>   35000 ++                                  .*...*.                         |
>   30000 *+.*...*..*...*..*..*...*..*...*..*.               *..*...*..*      |
>         O  O   O  O   O  O  O   O  O   O     O   O  O   O  O  O   O  O   O  O
>   25000 ++                                                                  |
>         |                                                                   |
>   20000 ++                                                                  |
>         |                                                                   |
>   15000 ++                                                                  |
>   10000 ++                                                                  |
>         |                                                                   |
>    5000 ++                                                                  |
>         |                                                                   |
>       0 ++--------------------------------O---------------------------------+
> 
> 	[*] bisect-good sample
> 	[O] bisect-bad  sample
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> Thanks,
> Xiaolong

Comments

Ye Xiaolong Jan. 3, 2017, 7:13 a.m. | #1
On 01/02, Vincent Guittot wrote:
>Hi Xiaolong,
>
>On Monday 19 Dec 2016 at 08:14:53 (+0800), kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a -4.5% regression of unixbench.score due to commit:
>
>I have been able to restore performance on my platform with the patch below.
>Could you test it?
>
>---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index 393759b..6e7d45c 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -2578,6 +2578,7 @@ void wake_up_new_task(struct task_struct *p)
> 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
> #endif
> 	rq = __task_rq_lock(p, &rf);
>+	update_rq_clock(rq);
> 	post_init_entity_util_avg(&p->se);
>
> 	activate_task(rq, p, 0);
>--
>2.7.4
>
>Vincent


Hi Vincent,

I applied your fix patch on top of 6b94780 ("sched/core: Use load_avg for selecting idlest group"),
and here is the comparison (60df283834fd4def3c11ad2de3 is the fix commit id).
It seems the performance hasn't been restored.


=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-6/performance/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/lkp-wsm-ep1/shell1/unixbench

commit:
  f519a3f1c6b7a990e5aed37a8f853c6ecfdee945
  6b94780e45c17b83e3e75f8aaca5a328db583c74
  60df283834fd4def3c11ad2de3e6fc9e81b7dff1

f519a3f1c6b7a990 6b94780e45c17b83e3e75f8aac 60df283834fd4def3c11ad2de3
---------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     25565 ±  0%      -4.5%      24414 ±  0%      -4.5%      24421 ±  0%  unixbench.score
  13223805 ±  2%     -19.6%   10628072 ±  0%     -21.3%   10412818 ±  1%  unixbench.time.involuntary_context_switches
 9.232e+08 ±  0%      -4.3%  8.831e+08 ±  0%      -4.3%  8.838e+08 ±  0%  unixbench.time.minor_page_faults
      1807 ±  0%      -5.4%       1709 ±  0%      -5.6%       1705 ±  0%  unixbench.time.percent_of_cpu_this_job_got
      5656 ±  0%      -6.8%       5271 ±  0%      -7.3%       5243 ±  0%  unixbench.time.system_time
      5743 ±  0%      -4.0%       5514 ±  0%      -3.9%       5516 ±  0%  unixbench.time.user_time
  29557557 ±  0%      -2.6%   28781098 ±  0%      -2.2%   28919280 ±  0%  unixbench.time.voluntary_context_switches
    741766 ±  2%     -62.4%     279054 ±  1%     -61.8%     283034 ±  1%  interrupts.CAL:Function_call_interrupts
   2912823 ±  0%      -9.7%    2630010 ±  0%      -8.7%    2660077 ±  0%  softirqs.RCU
  13223805 ±  2%     -19.6%   10628072 ±  0%     -21.3%   10412818 ±  1%  time.involuntary_context_switches
    126250 ±  0%     -12.2%     110890 ±  0%     -11.5%     111739 ±  0%  vmstat.system.cs
     31060 ±  1%      -9.2%      28214 ±  0%      -9.6%      28078 ±  0%  vmstat.system.in
    454.50 ±150%    +164.7%       1203 ±166%    +792.3%       4055 ± 18%  numa-numastat.node0.numa_foreign
    454.50 ±150%    +164.7%       1203 ±166%    +792.3%       4055 ± 18%  numa-numastat.node0.numa_miss
      4297 ± 15%     -18.1%       3520 ± 57%     -84.5%     666.40 ±113%  numa-numastat.node1.numa_foreign
      4297 ± 15%     -18.1%       3520 ± 57%     -84.5%     666.40 ±113%  numa-numastat.node1.numa_miss
     78.58 ±  0%      -5.6%      74.20 ±  0%      -6.0%      73.90 ±  0%  turbostat.%Busy
      2507 ±  0%      -5.6%       2366 ±  0%      -6.0%       2356 ±  0%  turbostat.Avg_MHz
      3.01 ±  2%    +100.4%       6.03 ±  2%    +100.1%       6.02 ±  0%  turbostat.CPU%c3
      2.35 ±  1%      +6.8%       2.51 ±  4%     +12.1%       2.64 ±  1%  turbostat.CPU%c6
      1.25 ±  5%     -17.1%       1.04 ± 22%     -32.3%       0.85 ±  5%  perf-profile.children.cycles-pp.__irqentry_text_start
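
As a side note on reading these tables: the %change columns are simply the relative difference of each metric against the base commit (f519a3f1c6b7a990). A minimal sketch of that computation, using the unixbench.score values from the table above:

```python
# Percent change of a metric relative to the base commit, as used in the
# LKP comparison tables: 100 * (new - base) / base.
def percent_change(base: float, new: float) -> float:
    return (new - base) / base * 100.0

base_score = 25565   # f519a3f1c6b7a990 (parent commit)
fixed_score = 24421  # 60df283834fd4def3c11ad2de3 (with the proposed fix)

# Both the regressing commit and the proposed fix land at about -4.5%,
# which is why the fix does not restore the score.
print(round(percent_change(base_score, fixed_score), 1))
```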

Thanks,
Xiaolong

Vincent Guittot Jan. 3, 2017, 9:01 a.m. | #2
Hi Xiaolong,

Thanks for testing. I'm going to look for another root cause.
A -2.9% regression was also reported on an 8-thread Intel(R)
Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory. Have you checked that
platform too?

Regards,
Vincent

On 3 January 2017 at 08:13, Ye Xiaolong <xiaolong.ye@intel.com> wrote:
> On 01/02, Vincent Guittot wrote:
> [...]
> I applied your fix patch on top of 6b94780 ("sched/core: Use load_avg for selecting idlest group"),
> and here is the comparison (60df283834fd4def3c11ad2de3 is the fix commit id).
> It seems the performance hasn't been restored.

Thanks for testing.
>

>

> =========================================================================================

> compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:

>   gcc-6/performance/x86_64-rhel-7.2/100%/debian-x86_64-2016-08-31.cgz/300s/lkp-wsm-ep1/shell1/unixbench

>

> commit:

>   f519a3f1c6b7a990e5aed37a8f853c6ecfdee945

>   6b94780e45c17b83e3e75f8aaca5a328db583c74

>   60df283834fd4def3c11ad2de3e6fc9e81b7dff1

>

> f519a3f1c6b7a990 6b94780e45c17b83e3e75f8aac 60df283834fd4def3c11ad2de3

> ---------------- -------------------------- --------------------------

>          %stddev     %change         %stddev     %change         %stddev

>              \          |                \          |                \

>      25565 ą  0%      -4.5%      24414 ą  0%      -4.5%      24421 ą  0%  unixbench.score

>   13223805 ą  2%     -19.6%   10628072 ą  0%     -21.3%   10412818 ą  1%  unixbench.time.involuntary_context_switches

>  9.232e+08 ą  0%      -4.3%  8.831e+08 ą  0%      -4.3%  8.838e+08 ą  0%  unixbench.time.minor_page_faults

>       1807 ą  0%      -5.4%       1709 ą  0%      -5.6%       1705 ą  0%  unixbench.time.percent_of_cpu_this_job_got

>       5656 ą  0%      -6.8%       5271 ą  0%      -7.3%       5243 ą  0%  unixbench.time.system_time

>       5743 ą  0%      -4.0%       5514 ą  0%      -3.9%       5516 ą  0%  unixbench.time.user_time

>   29557557 ą  0%      -2.6%   28781098 ą  0%      -2.2%   28919280 ą  0%  unixbench.time.voluntary_context_switches

>     741766 ą  2%     -62.4%     279054 ą  1%     -61.8%     283034 ą  1%  interrupts.CAL:Function_call_interrupts

>    2912823 ą  0%      -9.7%    2630010 ą  0%      -8.7%    2660077 ą  0%  softirqs.RCU

>   13223805 ą  2%     -19.6%   10628072 ą  0%     -21.3%   10412818 ą  1%  time.involuntary_context_switches

>     126250 ą  0%     -12.2%     110890 ą  0%     -11.5%     111739 ą  0%  vmstat.system.cs

>      31060 ą  1%      -9.2%      28214 ą  0%      -9.6%      28078 ą  0%  vmstat.system.in

>     454.50 ą150%    +164.7%       1203 ą166%    +792.3%       4055 ą 18%  numa-numastat.node0.numa_foreign

>     454.50 ą150%    +164.7%       1203 ą166%    +792.3%       4055 ą 18%  numa-numastat.node0.numa_miss

>       4297 ą 15%     -18.1%       3520 ą 57%     -84.5%     666.40 ą113%  numa-numastat.node1.numa_foreign

>       4297 ą 15%     -18.1%       3520 ą 57%     -84.5%     666.40 ą113%  numa-numastat.node1.numa_miss

>      78.58 ą  0%      -5.6%      74.20 ą  0%      -6.0%      73.90 ą  0%  turbostat.%Busy

>       2507 ą  0%      -5.6%       2366 ą  0%      -6.0%       2356 ą  0%  turbostat.Avg_MHz

>       3.01 ą  2%    +100.4%       6.03 ą  2%    +100.1%       6.02 ą  0%  turbostat.CPU%c3

>       2.35 ą  1%      +6.8%       2.51 ą  4%     +12.1%       2.64 ą  1%  turbostat.CPU%c6

>       1.25 ą  5%     -17.1%       1.04 ą 22%     -32.3%       0.85 ą  5%  perf-profile.children.cycles-pp.__irqentry_text_start

>

> Thanks,

> Xiaolong

>

>>

>>>

>>>

>>> commit: 6b94780e45c17b83e3e75f8aaca5a328db583c74 ("sched/core: Use load_avg for selecting idlest group")

>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

>>>

>>> in testcase: unixbench

>>> on test machine: 24 threads Nehalem-EP with 24G memory

>>> with following parameters:

>>>

>>>      runtime: 300s

>>>      nr_task: 100%

>>>      test: shell1

>>>      cpufreq_governor: performance

>>>

>>> test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.

>>> test-url: https://github.com/kdlucas/byte-unixbench

>>>

>>> In addition to that, the commit also has significant impact on the following tests:

>>>

>>> +------------------+-----------------------------------------------------------------------+

>>> | testcase: change | unixbench: unixbench.score -2.9% regression                           |

>>> | test machine     | 8 threads Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz with 6G memory       |

>>> | test parameters  | nr_task=1                                                             |

>>> |                  | runtime=300s                                                          |

>>> |                  | test=shell8                                                           |

>>> +------------------+-----------------------------------------------------------------------+

>>>

>>>

>>> Details are as below:

>>> -------------------------------------------------------------------------------------------------->

>>>

>>>

>>> To reproduce:

>>>

>>>         git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git

>>>         cd lkp-tests

>>>         bin/lkp install job.yaml  # job file is attached in this email

>>>         bin/lkp run     job.yaml

>>>

>>> testcase/path_params/tbox_group/run: unixbench/300s-100%-shell1-performance/lkp-wsm-ep1

>>>

>>> f519a3f1c6b7a990  6b94780e45c17b83e3e75f8aac

>>> ----------------  --------------------------

>>>      25565              -5%      24414        unixbench.score

>>>   29557557                    28781098        unixbench.time.voluntary_context_switches

>>>       5743              -4%       5514        unixbench.time.user_time

>>>  9.232e+08              -4%  8.831e+08        unixbench.time.minor_page_faults

>>>       1807              -5%       1709        unixbench.time.percent_of_cpu_this_job_got

>>>       5656              -7%       5271        unixbench.time.system_time

>>>   13223805             -20%   10628072        unixbench.time.involuntary_context_switches

>>>     741766             -62%     279054        interrupts.CAL:Function_call_interrupts

>>>      31060              -9%      28214        vmstat.system.in

>>>     126250             -12%     110890        vmstat.system.cs

>>>      78.58              -6%      74.20        turbostat.%Busy

>>>       2507              -6%       2366        turbostat.Avg_MHz

>>>       9134 ± 47%     -6e+03       2973 ± 36%  latency_stats.max.pipe_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath

>>>     380879 ± 10%      5e+05     887692 ± 49%  latency_stats.sum.wait_on_page_bit_killable.__lock_page_or_retry.filemap_fault.__do_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault

>>>      31710 ± 15%     -2e+04      10583 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64.return_from_SYSCALL_64

>>>      51796 ±  4%     -4e+04      15457 ± 10%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.vm_munmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64

>>>     111998 ± 18%     -7e+04      37074 ± 14%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath

>>>     275087 ± 15%     -2e+05      81973 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.unlink_file_vma.free_pgtables.unmap_region.do_munmap.mmap_region.do_mmap.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath

>>>     930993 ± 12%     -6e+05     320520 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.vm_mmap.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.SyS_execve.do_syscall_64

>>>    4755783 ±  9%     -3e+06    1619348 ±  4%  latency_stats.sum.call_rwsem_down_write_failed.__vma_adjust.__split_vma.split_vma.mprotect_fixup.do_mprotect_pkey.SyS_mprotect.entry_SYSCALL_64_fastpath

>>>    5536067 ± 10%     -4e+06    1929338 ±  3%  latency_stats.sum.call_rwsem_down_write_failed.copy_process._do_fork.SyS_clone.do_syscall_64.return_from_SYSCALL_64

>>>  9.032e+08              -4%   8.64e+08        perf-stat.page-faults

>>>  9.032e+08              -4%   8.64e+08        perf-stat.minor-faults

>>>  2.329e+09                   2.269e+09        perf-stat.node-load-misses

>>>    2.2e+09              -9%  2.011e+09 ±  5%  perf-stat.dTLB-store-misses

>>>  3.278e+10              -9%  2.987e+10 ±  6%  perf-stat.dTLB-load-misses

>>>   19484819             +13%   21974129        perf-stat.cpu-migrations

>>>  3.755e+13              -6%   3.54e+13        perf-stat.cpu-cycles

>>>       3244              +4%       3379        perf-stat.instructions-per-iTLB-miss

>>>  4.536e+12              -4%  4.332e+12        perf-stat.branch-instructions

>>>  2.303e+13              -4%  2.208e+13        perf-stat.instructions

>>>  5.768e+12              -4%  5.517e+12        perf-stat.dTLB-loads

>>>  3.567e+11              -4%  3.414e+11        perf-stat.cache-references

>>>       2.97                        2.93        perf-stat.branch-miss-rate%

>>>  2.768e+10                   2.699e+10        perf-stat.node-stores

>>>  5.446e+10              -3%  5.275e+10        perf-stat.cache-misses

>>>       0.03              -4%       0.03        perf-stat.iTLB-load-miss-rate%

>>>  9.673e+09              -4%  9.294e+09        perf-stat.node-loads

>>>  3.596e+12              -4%  3.442e+12        perf-stat.dTLB-stores

>>>       0.61                        0.62        perf-stat.ipc

>>>  1.347e+11              -6%   1.27e+11        perf-stat.branch-misses

>>>  7.098e+09              -8%  6.533e+09        perf-stat.iTLB-load-misses

>>>  2.309e+13              -4%  2.206e+13        perf-stat.iTLB-loads

>>>   79911173             -12%   70187035        perf-stat.context-switches

>>>

>>>

>>>

>>>                                  turbostat.%Busy

>>>

>>>   90 ++-------------------------------------*---*---------------------------+

>>>      |                                    ..       *...*..                  |

>>>   80 *+..*..*...*..*...*..*...*..*...O...*  O   O  O   O  O...O..O...O  O   O

>>>   70 O+  O  O   O  O   O  O   O  O                                          |

>>>      |                                                                      |

>>>   60 ++                                                                     |

>>>   50 ++                                                                     |

>>>      |                                                                      |

>>>   40 ++                                                                     |

>>>   30 ++                                                                     |

>>>      |                                                                      |

>>>   20 ++                                                                     |

>>>   10 ++                                                                     |

>>>      |                                                                      |

>>>    0 ++----------------------------------O----------------------------------+

>>>

>>>

>>>

>>>

>>>

>>>                     unixbench.time.percent_of_cpu_this_job_got

>>>

>>>   2500 ++-------------------------------------------------------------------+

>>>        |                                                                    |

>>>        |                                       .*...                        |

>>>   2000 ++                                   .*.     *..*...                 |

>>>        *..*...*..*...*..*...*..*...*..O...*. O  O   O  O   O..O...O..O   O  O

>>>        O  O   O  O   O  O   O  O   O                                        |

>>>   1500 ++                                                                   |

>>>        |                                                                    |

>>>   1000 ++                                                                   |

>>>        |                                                                    |

>>>        |                                                                    |

>>>    500 ++                                                                   |

>>>        |                                                                    |

>>>        |                                                                    |

>>>      0 ++---------------------------------O---------------------------------+

>>>

>>>

>>>                                   vmstat.system.in

>>>

>>>   40000 ++------------------------------------------------------------------+

>>>         |                                          .*...*..                 |

>>>   35000 ++                                  .*...*.                         |

>>>   30000 *+.*...*..*...*..*..*...*..*...*..*.               *..*...*..*      |

>>>         O  O   O  O   O  O  O   O  O   O     O   O  O   O  O  O   O  O   O  O

>>>   25000 ++                                                                  |

>>>         |                                                                   |

>>>   20000 ++                                                                  |

>>>         |                                                                   |

>>>   15000 ++                                                                  |

>>>   10000 ++                                                                  |

>>>         |                                                                   |

>>>    5000 ++                                                                  |

>>>         |                                                                   |

>>>       0 ++--------------------------------O---------------------------------+

>>>

>>>      [*] bisect-good sample

>>>      [O] bisect-bad  sample

>>>

>>>

>>> Disclaimer:

>>> Results have been estimated based on internal Intel analysis and are provided

>>> for informational purposes only. Any difference in system hardware or software

>>> design or configuration may affect actual performance.

>>>

>>>

>>> Thanks,

>>> Xiaolong

>>

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 393759b..6e7d45c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2578,6 +2578,7 @@  void wake_up_new_task(struct task_struct *p)
 	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	rq = __task_rq_lock(p, &rf);
+	update_rq_clock(rq);
 	post_init_entity_util_avg(&p->se);
 
 	activate_task(rq, p, 0);