From patchwork Thu Nov 23 00:25:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kim Phillips
X-Patchwork-Id: 119504
Date: Wed, 22 Nov 2017 18:25:28 -0600
From: Kim Phillips
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Darren Hart, Colin Ian King
Cc: James Yang, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] perf bench futex: benchmark only online CPUs
Message-Id: <20171122182528.3a5ac33fa0563b9e25271659@arm.com>
Organization: Arm
X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu)
X-Mailing-List: linux-kernel@vger.kernel.org

From: James Yang

The "perf bench futex" benchmarks have a problem when not all CPUs in the system are online: perf assumes the CPUs that are online are contiguously numbered and assigns
processor affinity to the threads by modulo striping. When the online CPUs are not contiguously numbered, perf errors out with:

  $ echo 0 | sudo tee /sys/devices/system/cpu/cpu3/online
  0
  $ ./oldperf bench futex all
  perf: pthread_create: Operation not permitted
  Run summary [PID 14934]: 7 threads, each operating on 1024 [private] futexes for 10 secs.
  $

This patch makes perf not assume all CPUs configured are online, and adds a mapping table to stripe affinity across the CPUs that are online.

Signed-off-by: James Yang
Signed-off-by: Kim Phillips
---
 tools/perf/bench/Build                 |  1 +
 tools/perf/bench/cpu-online-map.c      | 21 +++++++++++++++++++++
 tools/perf/bench/cpu-online-map.h      |  6 ++++++
 tools/perf/bench/futex-hash.c          | 14 ++++++++++++--
 tools/perf/bench/futex-lock-pi.c       | 15 ++++++++++++---
 tools/perf/bench/futex-requeue.c       | 14 +++++++++++---
 tools/perf/bench/futex-wake-parallel.c | 17 ++++++++++++++---
 tools/perf/bench/futex-wake.c          | 15 ++++++++++++---
 8 files changed, 89 insertions(+), 14 deletions(-)
 create mode 100644 tools/perf/bench/cpu-online-map.c
 create mode 100644 tools/perf/bench/cpu-online-map.h

-- 
2.15.0

diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build index 60bf11943047..bc79d3577ace 100644 --- a/tools/perf/bench/Build +++ b/tools/perf/bench/Build @@ -6,6 +6,7 @@ perf-y += futex-wake.o perf-y += futex-wake-parallel.o perf-y += futex-requeue.o perf-y += futex-lock-pi.o +perf-y += cpu-online-map.o perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o perf-$(CONFIG_X86_64) += mem-memset-x86-64-asm.o diff --git a/tools/perf/bench/cpu-online-map.c b/tools/perf/bench/cpu-online-map.c new file mode 100644 index 000000000000..4157d2220bf4 --- /dev/null +++ b/tools/perf/bench/cpu-online-map.c @@ -0,0 +1,21 @@ +#include +#include +#include +#include +#include "cpu-online-map.h" + +void compute_cpu_online_map(int *cpu_online_map) +{ + long ncpus_conf = sysconf(_SC_NPROCESSORS_CONF); + cpu_set_t set; + size_t j = 0; + + CPU_ZERO(&set); + + if (sched_getaffinity(0, 
sizeof(set), &set)) + err(EXIT_FAILURE, "sched_getaffinity"); + + for (int i = 0; i < ncpus_conf; i++) + if (CPU_ISSET(i, &set)) + cpu_online_map[j++] = i; +} diff --git a/tools/perf/bench/cpu-online-map.h b/tools/perf/bench/cpu-online-map.h new file mode 100644 index 000000000000..0620a674c6d8 --- /dev/null +++ b/tools/perf/bench/cpu-online-map.h @@ -0,0 +1,6 @@ +#ifndef CPU_ONLINE_MAP_H +#define CPU_ONLINE_MAP_H + +void compute_cpu_online_map(int *cpu_online_map); + +#endif diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index 58ae6ed8f38b..81314eea1cf5 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -24,6 +24,7 @@ #include #include "bench.h" #include "futex.h" +#include "cpu-online-map.h" #include #include @@ -120,9 +121,11 @@ int bench_futex_hash(int argc, const char **argv) int ret = 0; cpu_set_t cpu; struct sigaction act; - unsigned int i, ncpus; + unsigned int i; + long ncpus; pthread_attr_t thread_attr; struct worker *worker = NULL; + int *cpu_map; argc = parse_options(argc, argv, options, bench_futex_hash_usage, 0); if (argc) { @@ -132,6 +135,12 @@ int bench_futex_hash(int argc, const char **argv) ncpus = sysconf(_SC_NPROCESSORS_ONLN); + cpu_map = calloc(ncpus, sizeof(int)); + if (!cpu_map) + goto errmem; + + compute_cpu_online_map(cpu_map); + sigfillset(&act.sa_mask); act.sa_sigaction = toggle_done; sigaction(SIGINT, &act, NULL); @@ -164,7 +173,7 @@ int bench_futex_hash(int argc, const char **argv) goto errmem; CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_SET(cpu_map[i % ncpus], &cpu); ret = pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu); if (ret) @@ -217,6 +226,7 @@ int bench_futex_hash(int argc, const char **argv) print_summary(); free(worker); + free(cpu_map); return ret; errmem: err(EXIT_FAILURE, "calloc"); diff --git a/tools/perf/bench/futex-lock-pi.c b/tools/perf/bench/futex-lock-pi.c index 08653ae8a8c4..42ab351b2c50 100644 --- a/tools/perf/bench/futex-lock-pi.c 
+++ b/tools/perf/bench/futex-lock-pi.c @@ -15,6 +15,7 @@ #include #include "bench.h" #include "futex.h" +#include "cpu-online-map.h" #include #include @@ -113,7 +114,8 @@ static void *workerfn(void *arg) return NULL; } -static void create_threads(struct worker *w, pthread_attr_t thread_attr) +static void create_threads(struct worker *w, pthread_attr_t thread_attr, + int *cpu_map) { cpu_set_t cpu; unsigned int i; @@ -131,7 +133,7 @@ static void create_threads(struct worker *w, pthread_attr_t thread_attr) worker[i].futex = &global_futex; CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_SET(cpu_map[i % ncpus], &cpu); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); @@ -147,12 +149,18 @@ int bench_futex_lock_pi(int argc, const char **argv) unsigned int i; struct sigaction act; pthread_attr_t thread_attr; + int *cpu_map; argc = parse_options(argc, argv, options, bench_futex_lock_pi_usage, 0); if (argc) goto err; ncpus = sysconf(_SC_NPROCESSORS_ONLN); + cpu_map = calloc(ncpus, sizeof(int)); + if (!cpu_map) + err(EXIT_FAILURE, "calloc"); + + compute_cpu_online_map(cpu_map); sigfillset(&act.sa_mask); act.sa_sigaction = toggle_done; @@ -180,7 +188,7 @@ int bench_futex_lock_pi(int argc, const char **argv) pthread_attr_init(&thread_attr); gettimeofday(&start, NULL); - create_threads(worker, thread_attr); + create_threads(worker, thread_attr, cpu_map); pthread_attr_destroy(&thread_attr); pthread_mutex_lock(&thread_lock); @@ -218,6 +226,7 @@ int bench_futex_lock_pi(int argc, const char **argv) print_summary(); free(worker); + free(cpu_map); return ret; err: usage_with_options(bench_futex_lock_pi_usage, options); diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c index 1058c194608a..366961b901c6 100644 --- a/tools/perf/bench/futex-requeue.c +++ b/tools/perf/bench/futex-requeue.c @@ -22,6 +22,7 @@ #include #include "bench.h" #include "futex.h" +#include "cpu-online-map.h" 
#include #include @@ -83,7 +84,7 @@ static void *workerfn(void *arg __maybe_unused) } static void block_threads(pthread_t *w, - pthread_attr_t thread_attr) + pthread_attr_t thread_attr, int *cpu_map) { cpu_set_t cpu; unsigned int i; @@ -93,7 +94,7 @@ static void block_threads(pthread_t *w, /* create and block all threads */ for (i = 0; i < nthreads; i++) { CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_SET(cpu_map[i % ncpus], &cpu); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); @@ -116,12 +117,18 @@ int bench_futex_requeue(int argc, const char **argv) unsigned int i, j; struct sigaction act; pthread_attr_t thread_attr; + int *cpu_map; argc = parse_options(argc, argv, options, bench_futex_requeue_usage, 0); if (argc) goto err; ncpus = sysconf(_SC_NPROCESSORS_ONLN); + cpu_map = calloc(ncpus, sizeof(int)); + if (!cpu_map) + err(EXIT_FAILURE, "calloc"); + + compute_cpu_online_map(cpu_map); sigfillset(&act.sa_mask); act.sa_sigaction = toggle_done; @@ -156,7 +163,7 @@ int bench_futex_requeue(int argc, const char **argv) struct timeval start, end, runtime; /* create, launch & block all threads */ - block_threads(worker, thread_attr); + block_threads(worker, thread_attr, cpu_map); /* make sure all threads are already blocked */ pthread_mutex_lock(&thread_lock); @@ -210,6 +217,7 @@ int bench_futex_requeue(int argc, const char **argv) print_summary(); free(worker); + free(cpu_map); return ret; err: usage_with_options(bench_futex_requeue_usage, options); diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c index b4732dad9f89..5617bcd17e55 100644 --- a/tools/perf/bench/futex-wake-parallel.c +++ b/tools/perf/bench/futex-wake-parallel.c @@ -21,6 +21,7 @@ #include #include "bench.h" #include "futex.h" +#include "cpu-online-map.h" #include #include @@ -119,7 +120,8 @@ static void *blocked_workerfn(void *arg __maybe_unused) return NULL; } -static void 
block_threads(pthread_t *w, pthread_attr_t thread_attr) +static void block_threads(pthread_t *w, pthread_attr_t thread_attr, + int *cpu_map) { cpu_set_t cpu; unsigned int i; @@ -129,7 +131,7 @@ static void block_threads(pthread_t *w, pthread_attr_t thread_attr) /* create and block all threads */ for (i = 0; i < nblocked_threads; i++) { CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_SET(cpu_map[i % ncpus], &cpu); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); @@ -205,6 +207,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) struct sigaction act; pthread_attr_t thread_attr; struct thread_data *waking_worker; + int *cpu_map; argc = parse_options(argc, argv, options, bench_futex_wake_parallel_usage, 0); @@ -218,6 +221,13 @@ int bench_futex_wake_parallel(int argc, const char **argv) sigaction(SIGINT, &act, NULL); ncpus = sysconf(_SC_NPROCESSORS_ONLN); + + cpu_map = calloc(ncpus, sizeof(int)); + if (!cpu_map) + err(EXIT_FAILURE, "calloc"); + + compute_cpu_online_map(cpu_map); + if (!nblocked_threads) nblocked_threads = ncpus; @@ -259,7 +269,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) err(EXIT_FAILURE, "calloc"); /* create, launch & block all threads */ - block_threads(blocked_worker, thread_attr); + block_threads(blocked_worker, thread_attr, cpu_map); /* make sure all threads are already blocked */ pthread_mutex_lock(&thread_lock); @@ -295,5 +305,6 @@ int bench_futex_wake_parallel(int argc, const char **argv) print_summary(); free(blocked_worker); + free(cpu_map); return ret; } diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c index 8c5c0b6b5c97..c0c71a769d2f 100644 --- a/tools/perf/bench/futex-wake.c +++ b/tools/perf/bench/futex-wake.c @@ -22,6 +22,7 @@ #include #include "bench.h" #include "futex.h" +#include "cpu-online-map.h" #include #include @@ -89,7 +90,7 @@ static void print_summary(void) } static void block_threads(pthread_t *w, 
- pthread_attr_t thread_attr) + pthread_attr_t thread_attr, int *cpu_map) { cpu_set_t cpu; unsigned int i; @@ -99,7 +100,7 @@ static void block_threads(pthread_t *w, /* create and block all threads */ for (i = 0; i < nthreads; i++) { CPU_ZERO(&cpu); - CPU_SET(i % ncpus, &cpu); + CPU_SET(cpu_map[i % ncpus], &cpu); if (pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu)) err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); @@ -122,6 +123,7 @@ int bench_futex_wake(int argc, const char **argv) unsigned int i, j; struct sigaction act; pthread_attr_t thread_attr; + int *cpu_map; argc = parse_options(argc, argv, options, bench_futex_wake_usage, 0); if (argc) { @@ -131,6 +133,12 @@ int bench_futex_wake(int argc, const char **argv) ncpus = sysconf(_SC_NPROCESSORS_ONLN); + cpu_map = calloc(ncpus, sizeof(int)); + if (!cpu_map) + err(EXIT_FAILURE, "calloc"); + + compute_cpu_online_map(cpu_map); + sigfillset(&act.sa_mask); act.sa_sigaction = toggle_done; sigaction(SIGINT, &act, NULL); @@ -161,7 +169,7 @@ int bench_futex_wake(int argc, const char **argv) struct timeval start, end, runtime; /* create, launch & block all threads */ - block_threads(worker, thread_attr); + block_threads(worker, thread_attr, cpu_map); /* make sure all threads are already blocked */ pthread_mutex_lock(&thread_lock); @@ -204,5 +212,6 @@ int bench_futex_wake(int argc, const char **argv) print_summary(); free(worker); + free(cpu_map); return ret; }

From patchwork Thu Nov 23 00:25:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kim Phillips
X-Patchwork-Id: 119505
Date: Wed, 22 Nov 2017 18:25:34 -0600
From: Kim Phillips
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner, Darren Hart, Colin Ian King
Cc: James Yang, linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] perf bench futex: synchronize 'bench futex wake-parallel' wakers
Message-Id: <20171122182534.e3dd99e415d23a0490a84827@arm.com>
Organization: Arm
X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu)
X-Mailing-List: linux-kernel@vger.kernel.org

From: James Yang

Waker threads in the futex wake-parallel benchmark are started by a loop using pthread_create().
However, there is no synchronization for when the waker threads wake the waiting threads. Comparison of the waker threads' measurement timestamps shows they are not all running concurrently because older waker threads finish their task before newer waker threads even start. This patch uses a barrier to better synchronize the waker threads.

Additionally, unlike the waiter threads, the waker threads' processor affinity is not specified, so the result has run-to-run variability as the scheduler decides on which CPUs they are to run. So we add a -W/--affine-wakers flag to stripe the affinity of the waker threads across the online CPUs instead of having the scheduler place them.

Signed-off-by: James Yang
Signed-off-by: Kim Phillips
---
 tools/perf/bench/futex-wake-parallel.c | 44 ++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

-- 
2.15.0

diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c index 5617bcd17e55..65a3a3466dce 100644 --- a/tools/perf/bench/futex-wake-parallel.c +++ b/tools/perf/bench/futex-wake-parallel.c @@ -8,6 +8,8 @@ * it can be used to measure futex_wake() changes. 
*/ +#include "debug.h" + /* For the CLR_() macros */ #include #include @@ -31,6 +33,8 @@ struct thread_data { pthread_t worker; unsigned int nwoken; struct timeval runtime; + struct timeval start; + struct timeval end; }; static unsigned int nwakes = 1; @@ -39,10 +43,11 @@ static unsigned int nwakes = 1; static u_int32_t futex = 0; static pthread_t *blocked_worker; -static bool done = false, silent = false, fshared = false; +static bool done = false, silent = false, fshared = false, affine_wakers = false; static unsigned int nblocked_threads = 0, nwaking_threads = 0; static pthread_mutex_t thread_lock; static pthread_cond_t thread_parent, thread_worker; +static pthread_barrier_t barrier; static struct stats waketime_stats, wakeup_stats; static unsigned int ncpus, threads_starting; static int futex_flag = 0; @@ -52,6 +57,7 @@ static const struct option options[] = { OPT_UINTEGER('w', "nwakers", &nwaking_threads, "Specify amount of waking threads"), OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"), OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"), + OPT_BOOLEAN( 'W', "affine-wakers", &affine_wakers, "Stripe affinity of waker threads across CPUs"), OPT_END() }; @@ -65,6 +71,8 @@ static void *waking_workerfn(void *arg) struct thread_data *waker = (struct thread_data *) arg; struct timeval start, end; + pthread_barrier_wait(&barrier); + gettimeofday(&start, NULL); waker->nwoken = futex_wake(&futex, nwakes, futex_flag); @@ -75,31 +83,59 @@ static void *waking_workerfn(void *arg) gettimeofday(&end, NULL); timersub(&end, &start, &waker->runtime); + waker->start = start; + waker->end = end; + pthread_exit(NULL); return NULL; } -static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr) +static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr, + int *cpu_map) { unsigned int i; pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_JOINABLE); + 
pthread_barrier_init(&barrier, NULL, nwaking_threads + 1); + /* create and block all threads */ for (i = 0; i < nwaking_threads; i++) { /* * Thread creation order will impact per-thread latency * as it will affect the order to acquire the hb spinlock. - * For now let the scheduler decide. */ + + if (affine_wakers) { + cpu_set_t cpu; + CPU_ZERO(&cpu); + CPU_SET(cpu_map[(i + 1) % ncpus], &cpu); + + if (pthread_attr_setaffinity_np(&thread_attr, + sizeof(cpu_set_t), + &cpu)) + err(EXIT_FAILURE, "pthread_attr_setaffinity_np"); + } + if (pthread_create(&td[i].worker, &thread_attr, waking_workerfn, (void *)&td[i])) err(EXIT_FAILURE, "pthread_create"); } + pthread_barrier_wait(&barrier); + for (i = 0; i < nwaking_threads; i++) if (pthread_join(td[i].worker, NULL)) err(EXIT_FAILURE, "pthread_join"); + + pthread_barrier_destroy(&barrier); + + for (i = 0; i < nwaking_threads; i++) { + pr_debug("%6ld.%06ld\t" + "%6ld.%06ld\n", + td[i].start.tv_sec, td[i].start.tv_usec, + td[i].end.tv_sec, td[i].end.tv_usec); + } } static void *blocked_workerfn(void *arg __maybe_unused) @@ -281,7 +317,7 @@ int bench_futex_wake_parallel(int argc, const char **argv) usleep(100000); /* Ok, all threads are patiently blocked, start waking folks up */ - wakeup_threads(waking_worker, thread_attr); + wakeup_threads(waking_worker, thread_attr, cpu_map); for (i = 0; i < nblocked_threads; i++) { ret = pthread_join(blocked_worker[i], NULL);
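
For readers following the affinity change in both patches, the striping through a table of online CPUs can be sketched in isolation. This is a minimal illustration under stated assumptions, not code from the series: build_cpu_map() and stripe_cpu() are hypothetical names, and the online set is passed in as a plain bool array rather than read with sched_getaffinity() as compute_cpu_online_map() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: record the ids of the CPUs
 * marked online, in ascending order, and return how many were stored.
 * At most map_len entries are written.
 */
size_t build_cpu_map(const bool *online, size_t ncpus_conf,
		     int *map, size_t map_len)
{
	size_t cpu, n = 0;

	for (cpu = 0; cpu < ncpus_conf && n < map_len; cpu++)
		if (online[cpu])
			map[n++] = (int)cpu;

	return n;
}

/*
 * Hypothetical helper: CPU for thread i when threads are striped over the
 * n table entries by modulo, mirroring cpu_map[i % ncpus] in the patch.
 */
int stripe_cpu(const int *map, size_t n, unsigned int i)
{
	assert(n > 0);	/* an empty table would mean a modulo by zero */
	return map[i % n];
}
```

With CPUs 0-7 configured and cpu3 offline, the table holds {0, 1, 2, 4, 5, 6, 7}, so thread 3 is placed on CPU 4 and thread 7 wraps around to CPU 0. Plain i % ncpus striping with ncpus = 7 would instead request the offline CPU 3 and never use CPU 7.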