From patchwork Tue Jun 14 15:10:41 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Rutland
X-Patchwork-Id: 70042
From: Mark Rutland
To: linux-kernel@vger.kernel.org
Cc: Mark Rutland, Alexander Shishkin, Arnaldo Carvalho de Melo,
 Ingo Molnar, Peter Zijlstra, Will Deacon
Subject: [PATCH] perf: fix pmu::filter_match for SW-led groups
Date: Tue, 14 Jun 2016 16:10:41 +0100
Message-Id: <1465917041-15339-1-git-send-email-mark.rutland@arm.com>
X-Mailer: git-send-email 1.9.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Commit 66eb579e66ecfea5 ("perf: allow for PMU-specific event filtering")
added pmu::filter_match. This was intended to avoid HW constraints on
events from resulting in extremely pessimistic scheduling.

However, pmu::filter_match is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add time. When this happens we'll give up on scheduling
any event groups later in the list until they are rotated ahead of the
failing group.

This can result in extremely sub-optimal scheduling behaviour, e.g. if
running the following on a big.LITTLE platform:

$ taskset -c 0 ./perf stat \
 -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
 -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
 ls

     <not counted>      context-switches                    (0.00%)
     <not counted>      armv8_cortex_a57/config=0x11/       (0.00%)
                24      context-switches                   (37.36%)
          57589154      armv8_cortex_a53/config=0x11/      (37.36%)

Here the 'a53' event group was always eligible to be scheduled, but the
'a57' group was never eligible to be scheduled, as the task was always
affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
group was eligible, but the HW event failed at pmu::add time, resulting
in ctx_flexible_sched_in giving up on scheduling further groups with HW
events.

One way of avoiding this is to check pmu::filter_match on siblings as
well as the group leader. If any of these fail their pmu::filter_match,
we must skip the entire group before attempting to add any events.
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Will Deacon
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Mark Rutland
---
 kernel/events/core.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

I've tried to find a better way of handling this (without needing to
walk the siblings list), but so far I'm at a loss. At least it's "only"
O(n) in the size of the sibling list we were going to walk anyway.

I suspect that at a more fundamental level, I need to stop sharing a
perf_hw_context between HW PMUs (i.e. replace task_struct::perf_event_ctxp
with something that can handle multiple HW PMUs). From previous attempts
I'm not sure if that's going to be possible.

Any ideas appreciated!

Mark.
-- 
1.9.1

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9c51ec3..c0b6db0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1678,12 +1678,32 @@ static bool is_orphaned_event(struct perf_event *event)
 	return event->state == PERF_EVENT_STATE_DEAD;
 }
 
-static inline int pmu_filter_match(struct perf_event *event)
+static inline int __pmu_filter_match(struct perf_event *event)
 {
 	struct pmu *pmu = event->pmu;
 	return pmu->filter_match ? pmu->filter_match(event) : 1;
 }
 
+/*
+ * Check whether we should attempt to schedule an event group based on
+ * PMU-specific filtering. An event group can consist of HW and SW events,
+ * potentially with a SW leader, so we must check all the filters to determine
+ * whether a group is schedulable.
+ */
+static inline int pmu_filter_match(struct perf_event *event)
+{
+	struct perf_event *child;
+
+	if (!__pmu_filter_match(event))
+		return 0;
+
+	list_for_each_entry(child, &event->sibling_list, group_entry)
+		if (!__pmu_filter_match(child))
+			return 0;
+
+	return 1;
+}
+
 static inline int event_filter_match(struct perf_event *event)
 {