From patchwork Wed Jun 28 10:56:01 2017
X-Patchwork-Submitter: Mark Rutland
X-Patchwork-Id: 106516
Date: Wed, 28 Jun 2017 11:56:01 +0100
From: Mark Rutland
To: Kyle Huey
Cc: "Jin, Yao", Ingo Molnar, "Peter Zijlstra (Intel)", stable@vger.kernel.org,
	Alexander Shishkin, Arnaldo Carvalho de Melo, Jiri Olsa,
	Linus Torvalds, Namhyung Kim, Stephane Eranian, Thomas Gleixner,
	Vince Weaver, acme@kernel.org, jolsa@kernel.org, kan.liang@intel.com,
	Will Deacon, gregkh@linuxfoundation.org, Robert O'Callahan, open list
Subject: [PATCH] perf/core: generate overflow signal when samples are dropped
	(WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered
	the kernel in the "skid" region)
Message-ID: <20170628105600.GC5981@leverpostej>
References: <2256f9b5-1277-c4b1-1472-61a10cd1db9a@linux.intel.com>
	<20170628101248.GB5981@leverpostej>
In-Reply-To: <20170628101248.GB5981@leverpostej>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: stable@vger.kernel.org

On Wed, Jun 28, 2017 at 11:12:48AM +0100, Mark Rutland wrote:
> On Tue, Jun 27, 2017 at 09:51:00PM -0700, Kyle Huey wrote:
>
> As we're trying to avoid sampling state, I think we can move the check
> into perf_prepare_sample() or __perf_event_output(), where that state
> is actually sampled. I'll take a look at that momentarily.
>
> Just to clarify, you don't care about the sample state at all? i.e.
> you don't need the user program counter?
>
> Is that signal delivered to the tracee, or to a different process that
> traces it? If the latter, what ensures that the task is stopped
> sufficiently quickly?
>
> > It seems to me that it might be reasonable to ignore the interrupt
> > if the purpose of the interrupt is to trigger sampling of the CPU's
> > register state. But if the interrupt will trigger some other
> > operation, such as a signal on an fd, then there's no reason to drop
> > it.
>
> Agreed. I'll try to have a patch for this soon.
>
> I just need to figure out exactly where that overflow signal is
> generated by the perf core.

I've figured that out now. That's handled by perf_pending_event(),
which is the irq_work we schedule in __perf_event_overflow().

Does the below patch work for you?
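For reference, the overflow-signal setup rr depends on attaches a signal
to the perf fd with fcntl(F_SETOWN/F_SETSIG) on a userspace-only
counter. A minimal sketch of that pattern (my reconstruction for
illustration, not rr's actual code; the event type, period, and signal
number are arbitrary placeholders):

#define _GNU_SOURCE			/* for F_SETSIG */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
			   int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count userspace-only events; raise a signal on each overflow. */
static int setup_overflow_signal(int signum)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.sample_period = 100000;
	attr.exclude_kernel = 1;	/* :u - the case broken by the Fixes: commit */
	attr.exclude_hv = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;

	/* Deliver signum (rather than SIGIO) to this task on overflow. */
	fcntl(fd, F_SETFL, O_ASYNC);
	fcntl(fd, F_SETSIG, signum);
	fcntl(fd, F_SETOWN, getpid());

	return fd;
}

With cc1582c231ea041f, no signal is delivered when the overflow
interrupt lands in the kernel; with the patch below, the signal fires
and only the ring buffer write is suppressed.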
---->8----
From bb1f99dace508ce34ab0818f91d59e73450fa142 Mon Sep 17 00:00:00 2001
From: Mark Rutland
Date: Wed, 28 Jun 2017 11:39:25 +0100
Subject: [PATCH] perf/core: generate overflow signal when samples are dropped

We recently tried to kill an information leak where users could receive
kernel samples due to skid on the PMU interrupt. To block this, we
bailed out early in perf_event_overflow(), as we do for non-sampling
events.

This broke rr, which uses sampling events to receive a signal on
overflow (but does not care about the contents of the sample). These
signals are critical to the correct operation of rr.

Instead of bailing out early in perf_event_overflow(), we can bail
prior to performing the actual sampling in __perf_event_output(). This
avoids the information leak, but preserves the generation of the
signal.

Since we don't place any sample data into the ring buffer, the signal
is arguably spurious. However, a userspace ringbuffer consumer can
already consume data prior to taking the associated signals, and
therefore must handle spurious signals to operate correctly. Thus, this
signal shouldn't be harmful.

Fixes: cc1582c231ea041f ("perf/core: Drop kernel samples even though :u is specified")
Reported-by: Kyle Huey
Signed-off-by: Mark Rutland
Cc: Alexander Shishkin
Cc: Arnaldo Carvalho de Melo
Cc: Ingo Molnar
Cc: Jin Yao
Cc: Peter Zijlstra (Intel)
---
 kernel/events/core.c | 46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

-- 
1.9.1

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523..6b263f3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6090,6 +6090,21 @@ void perf_prepare_sample(struct perf_event_header *header,
 	}
 }
 
+static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
+{
+	/*
+	 * Due to interrupt latency (AKA "skid"), we may enter the
+	 * kernel before taking an overflow, even if the PMU is only
+	 * counting user events.
+	 * To avoid leaking information to userspace, we must always
+	 * reject kernel samples when exclude_kernel is set.
+	 */
+	if (event->attr.exclude_kernel && !user_mode(regs))
+		return false;
+
+	return true;
+}
+
 static void __always_inline
 __perf_event_output(struct perf_event *event,
 		    struct perf_sample_data *data,
@@ -6101,6 +6116,12 @@ void perf_prepare_sample(struct perf_event_header *header,
 	struct perf_output_handle handle;
 	struct perf_event_header header;
 
+	/*
+	 * For security, drop the skid kernel samples if necessary.
+	 */
+	if (!sample_is_allowed(event, regs))
+		return;
+
 	/* protect the callchain buffers */
 	rcu_read_lock();
 
@@ -7316,21 +7337,6 @@ int perf_event_account_interrupt(struct perf_event *event)
 	return __perf_event_account_interrupt(event, 1);
 }
 
-static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
-{
-	/*
-	 * Due to interrupt latency (AKA "skid"), we may enter the
-	 * kernel before taking an overflow, even if the PMU is only
-	 * counting user events.
-	 * To avoid leaking information to userspace, we must always
-	 * reject kernel samples when exclude_kernel is set.
-	 */
-	if (event->attr.exclude_kernel && !user_mode(regs))
-		return false;
-
-	return true;
-}
-
 /*
  * Generic event overflow handling, sampling.
  */
@@ -7352,12 +7358,6 @@ static int __perf_event_overflow(struct perf_event *event,
 	ret = __perf_event_account_interrupt(event, throttle);
 
 	/*
-	 * For security, drop the skid kernel samples if necessary.
-	 */
-	if (!sample_is_allowed(event, regs))
-		return ret;
-
-	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
 	 */
@@ -7372,6 +7372,10 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	READ_ONCE(event->overflow_handler)(event, data, regs);
 
+	/*
+	 * We must generate a wakeup regardless of whether we actually
+	 * generated a sample. This is relied upon by rr.
+	 */
 	if (*perf_event_fasync(event) && event->pending_kill) {
 		event->pending_wakeup = 1;
 		irq_work_queue(&event->pending);
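As an aside, to illustrate the "spurious signal" argument from the
commit message: a signal-driven consumer already has to cope with
waking up and finding no new data, since it may race with its own
reads. A sketch of the usual drain step (assuming the standard mmap'd
ring buffer; this is illustrative, not code from rr or perf):

#include <linux/perf_event.h>
#include <stdint.h>

/* Called from the overflow signal handler (or after it sets a flag). */
static void drain_ring(struct perf_event_mmap_page *meta)
{
	uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
	uint64_t tail = meta->data_tail;

	if (head == tail)
		return;		/* spurious wakeup: nothing was written */

	/* ... parse PERF_RECORD_* records in [tail, head) ... */

	__atomic_store_n(&meta->data_tail, head, __ATOMIC_RELEASE);
}

With this patch, a rejected kernel-skid sample becomes exactly that
case: the signal arrives, but data_head has not moved.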