From patchwork Tue Jun 14 15:10:41 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Rutland
X-Patchwork-Id: 70042
From: Mark Rutland
To: linux-kernel@vger.kernel.org
Cc: Mark Rutland, Alexander Shishkin, Arnaldo Carvalho de Melo,
 Ingo Molnar, Peter Zijlstra, Will Deacon
Subject: [PATCH] perf: fix pmu::filter_match for SW-led groups
Date: Tue, 14 Jun 2016 16:10:41 +0100
Message-Id: <1465917041-15339-1-git-send-email-mark.rutland@arm.com>
X-Mailer: git-send-email 1.9.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Commit 66eb579e66ecfea5 ("perf: allow for PMU-specific event filtering")
added pmu::filter_match. This was intended to avoid HW constraints on
events from resulting in extremely pessimistic scheduling.

However, pmu::filter_match is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add time. When this happens we'll give up on scheduling
any event groups later in the list until they are rotated ahead of the
failing group.

This can result in extremely sub-optimal scheduling behaviour, e.g. if
running the following on a big.LITTLE platform:

$ taskset -c 0 ./perf stat \
 -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
 -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
 ls

     <not counted>      context-switches                    (0.00%)
     <not counted>      armv8_cortex_a57/config=0x11/       (0.00%)
                24      context-switches                   (37.36%)
          57589154      armv8_cortex_a53/config=0x11/      (37.36%)

Here the 'a53' event group was always eligible to be scheduled, but the
'a57' group was never eligible to be scheduled, as the task was always
affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
group was eligible, but the HW event failed at pmu::add time, resulting
in ctx_flexible_sched_in giving up on scheduling further groups with HW
events.

One way of avoiding this is to check pmu::filter_match on siblings as
well as the group leader. If any of these fail their pmu::filter_match,
we must skip the entire group before attempting to add any events.
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Rutland
---
 kernel/events/core.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

I've tried to find a better way of handling this (without needing to
walk the siblings list), but so far I'm at a loss. At least it's "only"
O(n) in the size of the sibling list we were going to walk anyway.

I suspect that at a more fundamental level, I need to stop sharing a
perf_hw_context between HW PMUs (i.e. replace task_struct::perf_event_ctxp
with something that can handle multiple HW PMUs). From previous attempts
I'm not sure if that's going to be possible.

Any ideas appreciated!

Mark.
-- 
1.9.1

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9c51ec3..c0b6db0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1678,12 +1678,32 @@ static bool is_orphaned_event(struct perf_event *event)
 	return event->state == PERF_EVENT_STATE_DEAD;
 }
 
-static inline int pmu_filter_match(struct perf_event *event)
+static inline int __pmu_filter_match(struct perf_event *event)
 {
 	struct pmu *pmu = event->pmu;
 	return pmu->filter_match ? pmu->filter_match(event) : 1;
 }
 
+/*
+ * Check whether we should attempt to schedule an event group based on
+ * PMU-specific filtering. An event group can consist of HW and SW events,
+ * potentially with a SW leader, so we must check all the filters to determine
+ * whether a group is schedulable.
+ */
+static inline int pmu_filter_match(struct perf_event *event)
+{
+	struct perf_event *child;
+
+	if (!__pmu_filter_match(event))
+		return 0;
+
+	list_for_each_entry(child, &event->sibling_list, group_entry)
+		if (!__pmu_filter_match(child))
+			return 0;
+
+	return 1;
+}
+
 static inline int event_filter_match(struct perf_event *event)
 {