From patchwork Thu Nov 23 00:25:34 2017
X-Patchwork-Submitter: Kim Phillips
X-Patchwork-Id: 119505
Delivered-To: patch@linaro.org
Date: Wed, 22 Nov 2017 18:25:34 -0600
From: Kim Phillips
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
 Alexander Shishkin, Jiri Olsa, Namhyung Kim, Thomas Gleixner,
 Darren Hart, Colin Ian King
Cc: James Yang, linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] perf bench futex: synchronize 'bench futex wake-parallel' wakers
Message-Id: <20171122182534.e3dd99e415d23a0490a84827@arm.com>
Organization: Arm
X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.31; x86_64-pc-linux-gnu)
X-Mailing-List: linux-kernel@vger.kernel.org

From: James Yang

Waker threads in the futex wake-parallel benchmark are started by a loop
using pthread_create().
However, there is no synchronization for when the waker threads wake the
waiting threads.  Comparison of the waker threads' measurement timestamps
shows that they are not all running concurrently: older waker threads
finish their wakeups before newer waker threads even start.  This patch
uses a barrier to better synchronize the waker threads.

Additionally, unlike the waiter threads, the waker threads' processor
affinity is not specified, so the results have run-to-run variability as
the scheduler decides on which CPUs they run.  We therefore add a
-W/--affine-wakers flag to stripe the affinity of the waker threads
across the online CPUs instead of having the scheduler place them.

Signed-off-by: James Yang
Signed-off-by: Kim Phillips
---
 tools/perf/bench/futex-wake-parallel.c | 44 ++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

-- 
2.15.0

diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c
index 5617bcd17e55..65a3a3466dce 100644
--- a/tools/perf/bench/futex-wake-parallel.c
+++ b/tools/perf/bench/futex-wake-parallel.c
@@ -8,6 +8,8 @@
  * it can be used to measure futex_wake() changes.
  */
+#include "debug.h"
+
 /* For the CLR_() macros */
 #include
 #include
@@ -31,6 +33,8 @@ struct thread_data {
 	pthread_t worker;
 	unsigned int nwoken;
 	struct timeval runtime;
+	struct timeval start;
+	struct timeval end;
 };
 
 static unsigned int nwakes = 1;
@@ -39,10 +43,11 @@ static unsigned int nwakes = 1;
 static u_int32_t futex = 0;
 
 static pthread_t *blocked_worker;
-static bool done = false, silent = false, fshared = false;
+static bool done = false, silent = false, fshared = false, affine_wakers = false;
 static unsigned int nblocked_threads = 0, nwaking_threads = 0;
 static pthread_mutex_t thread_lock;
 static pthread_cond_t thread_parent, thread_worker;
+static pthread_barrier_t barrier;
 static struct stats waketime_stats, wakeup_stats;
 static unsigned int ncpus, threads_starting;
 static int futex_flag = 0;
@@ -52,6 +57,7 @@ static const struct option options[] = {
 	OPT_UINTEGER('w', "nwakers", &nwaking_threads, "Specify amount of waking threads"),
 	OPT_BOOLEAN( 's', "silent", &silent, "Silent mode: do not display data/details"),
 	OPT_BOOLEAN( 'S', "shared", &fshared, "Use shared futexes instead of private ones"),
+	OPT_BOOLEAN( 'W', "affine-wakers", &affine_wakers, "Stripe affinity of waker threads across CPUs"),
 	OPT_END()
 };
@@ -65,6 +71,8 @@ static void *waking_workerfn(void *arg)
 	struct thread_data *waker = (struct thread_data *) arg;
 	struct timeval start, end;
 
+	pthread_barrier_wait(&barrier);
+
 	gettimeofday(&start, NULL);
 	waker->nwoken = futex_wake(&futex, nwakes, futex_flag);
@@ -75,31 +83,59 @@ static void *waking_workerfn(void *arg)
 	gettimeofday(&end, NULL);
 	timersub(&end, &start, &waker->runtime);
 
+	waker->start = start;
+	waker->end = end;
+
 	pthread_exit(NULL);
 	return NULL;
 }
 
-static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr)
+static void wakeup_threads(struct thread_data *td, pthread_attr_t thread_attr,
+			   int *cpu_map)
 {
 	unsigned int i;
 
 	pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_JOINABLE);
 
+	pthread_barrier_init(&barrier, NULL, nwaking_threads + 1);
+
 	/* create and block all threads */
 	for (i = 0; i < nwaking_threads; i++) {
 		/*
 		 * Thread creation order will impact per-thread latency
 		 * as it will affect the order to acquire the hb spinlock.
-		 * For now let the scheduler decide.
 		 */
+
+		if (affine_wakers) {
+			cpu_set_t cpu;
+
+			CPU_ZERO(&cpu);
+			CPU_SET(cpu_map[(i + 1) % ncpus], &cpu);
+
+			if (pthread_attr_setaffinity_np(&thread_attr,
+							sizeof(cpu_set_t),
+							&cpu))
+				err(EXIT_FAILURE, "pthread_attr_setaffinity_np");
+		}
+
 		if (pthread_create(&td[i].worker, &thread_attr,
 				   waking_workerfn, (void *)&td[i]))
 			err(EXIT_FAILURE, "pthread_create");
 	}
 
+	pthread_barrier_wait(&barrier);
+
 	for (i = 0; i < nwaking_threads; i++)
 		if (pthread_join(td[i].worker, NULL))
 			err(EXIT_FAILURE, "pthread_join");
+
+	pthread_barrier_destroy(&barrier);
+
+	for (i = 0; i < nwaking_threads; i++) {
+		pr_debug("%6ld.%06ld\t"
+			 "%6ld.%06ld\n",
+			 td[i].start.tv_sec, td[i].start.tv_usec,
+			 td[i].end.tv_sec, td[i].end.tv_usec);
+	}
 }
 
 static void *blocked_workerfn(void *arg __maybe_unused)
@@ -281,7 +317,7 @@ int bench_futex_wake_parallel(int argc, const char **argv)
 	usleep(100000);
 
 	/* Ok, all threads are patiently blocked, start waking folks up */
-	wakeup_threads(waking_worker, thread_attr);
+	wakeup_threads(waking_worker, thread_attr, cpu_map);
 
 	for (i = 0; i < nblocked_threads; i++) {
 		ret = pthread_join(blocked_worker[i], NULL);