[4.19,146/245] perf stat: Force error in fallback on :k events

Message ID 20200929105954.090876288@linuxfoundation.org
State New

Commit Message

Greg Kroah-Hartman Sept. 29, 2020, 10:59 a.m. UTC
From: Stephane Eranian <eranian@google.com>

[ Upstream commit bec49a9e05db3dbdca696fa07c62c52638fb6371 ]

When it is not possible for a non-privileged perf command to monitor at
the kernel level (:k), the fallback code forces a :u. That works if the
event was previously monitoring both levels.  But if the event was
already constrained to kernel only, then it does not make sense to
restrict it to user only.

Given the code works by exclusion, a kernel only event would have:

  attr->exclude_user = 1

The fallback code would add:

  attr->exclude_kernel = 1

In the end the event would not monitor at either the user level or the
kernel level. In other words, it would count nothing.

An event programmed to monitor kernel only cannot be switched to user
only without seriously warning the user.

This patch forces an error in this case to make it clear the request
cannot really be satisfied.
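
The interaction of the two exclusion bits can be sketched in isolation.
This is a simplified model, not the perf code itself: struct attr_model
and fallback_model() are hypothetical stand-ins for the two relevant bits
of struct perf_event_attr and for the fallback path, and the
refuse_kernel_only flag only exists here to compare the old and new
behavior side by side.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two relevant bits of struct perf_event_attr. */
struct attr_model {
	bool exclude_user;
	bool exclude_kernel;
};

/* An event counts something only if at least one level is not excluded. */
static bool counts_anything(const struct attr_model *attr)
{
	return !attr->exclude_user || !attr->exclude_kernel;
}

/*
 * Model of the fallback: retry with the kernel level excluded.
 *
 * With refuse_kernel_only set (the behavior described for this patch), a
 * kernel-only event is rejected instead of being turned into an event
 * that excludes both levels.
 *
 * Returns true if the caller should retry with the modified attr.
 */
static bool fallback_model(struct attr_model *attr, bool refuse_kernel_only)
{
	if (refuse_kernel_only && attr->exclude_user)
		return false;

	attr->exclude_kernel = true;
	return true;
}
```

Calling fallback_model() on a kernel-only event with refuse_kernel_only
cleared leaves both bits set, so counts_anything() becomes false; with the
flag set, the attr is left untouched and the caller gets an error instead.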

Behavior with paranoid 1:

  $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
  $ perf stat -e cycles:k sleep 1

   Performance counter stats for 'sleep 1':

           1,520,413      cycles:k

         1.002361664 seconds time elapsed

         0.002480000 seconds user
         0.000000000 seconds sys

Old behavior with paranoid 2:

  $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
  $ perf stat -e cycles:k sleep 1
   Performance counter stats for 'sleep 1':

                   0      cycles:ku

         1.002358127 seconds time elapsed

         0.002384000 seconds user
         0.000000000 seconds sys

New behavior with paranoid 2:

  $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
  $ perf stat -e cycles:k sleep 1
  Error:
  You may not have permission to collect stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
        Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
  >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
        Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
  >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN

  To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

          kernel.perf_event_paranoid = -1
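
The thresholds in the message above are cumulative: each level keeps the
restrictions of the lower ones. A minimal sketch of that mapping, for users
without the listed capabilities, could look as follows. The struct and
function names are hypothetical and are not part of perf or the kernel.

```c
#include <stdbool.h>

/* Restrictions named in the perf error message, as hypothetical flags. */
struct paranoid_limits {
	bool no_ftrace_or_raw_tracepoints;	/* level >= 0 */
	bool no_cpu_events;			/* level >= 1 */
	bool no_kernel_profiling;		/* level >= 2 */
};

/* Map a perf_event_paranoid value to the restrictions it implies. */
static struct paranoid_limits limits_for_level(int level)
{
	struct paranoid_limits limits = {
		.no_ftrace_or_raw_tracepoints = level >= 0,
		.no_cpu_events = level >= 1,
		.no_kernel_profiling = level >= 2,
	};

	return limits;
}
```

With level 1 the no_kernel_profiling flag stays clear, which matches the
"paranoid 1" run above where cycles:k is counted; with level 2 it is set,
which is the case this patch turns into an error.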

v2 of this patch addresses the review feedback from jolsa@redhat.com.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 tools/perf/util/evsel.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Naresh Kamboju Sept. 29, 2020, 1:33 p.m. UTC | #1
On Tue, 29 Sep 2020 at 17:54, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> From: Stephane Eranian <eranian@google.com>
>
> [ Upstream commit bec49a9e05db3dbdca696fa07c62c52638fb6371 ]
>
> When it is not possible for a non-privilege perf command to monitor at
> the kernel level (:k), the fallback code forces a :u. That works if the
> event was previously monitoring both levels.  But if the event was
> already constrained to kernel only, then it does not make sense to
> restrict it to user only.
>
> Given the code works by exclusion, a kernel only event would have:
>
>   attr->exclude_user = 1
>
> The fallback code would add:
>
>   attr->exclude_kernel = 1
>
> In the end the end would not monitor in either the user level or kernel
> level. In other words, it would count nothing.
>
> An event programmed to monitor kernel only cannot be switched to user
> only without seriously warning the user.
>
> This patch forces an error in this case to make it clear the request
> cannot really be satisfied.
>
> Behavior with paranoid 1:
>
>   $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
>   $ perf stat -e cycles:k sleep 1
>
>    Performance counter stats for 'sleep 1':
>
>            1,520,413      cycles:k
>
>          1.002361664 seconds time elapsed
>
>          0.002480000 seconds user
>          0.000000000 seconds sys
>
> Old behavior with paranoid 2:
>
>   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
>   $ perf stat -e cycles:k sleep 1
>    Performance counter stats for 'sleep 1':
>
>                    0      cycles:ku
>
>          1.002358127 seconds time elapsed
>
>          0.002384000 seconds user
>          0.000000000 seconds sys
>
> New behavior with paranoid 2:
>
>   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
>   $ perf stat -e cycles:k sleep 1
>   Error:
>   You may not have permission to collect stats.
>
>   Consider tweaking /proc/sys/kernel/perf_event_paranoid,
>   which controls use of the performance events system by
>   unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).
>
>   The current value is 2:
>
>     -1: Allow use of (almost) all events by all users
>         Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>   >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
>         Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
>   >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
>   >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN
>
>   To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
>
>           kernel.perf_event_paranoid = -1
>
> v2 of this patch addresses the review feedback from jolsa@redhat.com.
>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> Reviewed-by: Ian Rogers <irogers@google.com>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>


perf failed on stable rc branch 4.19 on all devices.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

build warning and errors,
-----------------------------------
In file included from util/evlist.h:15:0,
                 from util/evsel.c:30:
util/evsel.c: In function 'perf_evsel__exit':
util/util.h:25:28: warning: passing argument 1 of 'free' discards
'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
 #define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
                            ^
util/evsel.c:1293:2: note: in expansion of macro 'zfree'
  zfree(&evsel->pmu_name);
  ^~~~~
In file included from
/srv/oe/build/tmp-lkft-glibc/work/intel_corei7_64-linaro-linux/perf/1.0-r9/perf-1.0/tools/perf/arch/x86/include/perf_regs.h:5:0,
                 from util/perf_regs.h:27,
                 from util/event.h:11,
                 from util/callchain.h:8,
                 from util/evsel.c:26:
perf/1.0-r9/recipe-sysroot/usr/include/stdlib.h:563:13: note: expected
'void *' but argument is of type 'const char *'
 extern void free (void *__ptr) __THROW;
             ^~~~
util/evsel.c: In function 'perf_evsel__fallback':
util/evsel.c:2802:14: error: 'struct perf_evsel' has no member named
'core'; did you mean 'node'?
   if (evsel->core.attr.exclude_user)
              ^~~~
              node

> ---
>  tools/perf/util/evsel.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 68c5ab0e1800b..e8586957562b3 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2796,6 +2796,10 @@ bool perf_evsel__fallback(struct perf_evsel *evsel, int err,
>                 char *new_name;
>                 const char *sep = ":";
>
> +               /* If event has exclude user then don't exclude kernel. */
> +               if (evsel->core.attr.exclude_user)
> +                       return false;
> +
>                 /* Is there already the separator in the name. */
>                 if (strchr(name, '/') ||
>                     strchr(name, ':'))
> --
> 2.25.1
>
>
>
Greg Kroah-Hartman Sept. 29, 2020, 2:27 p.m. UTC | #2
On Tue, Sep 29, 2020 at 07:03:46PM +0530, Naresh Kamboju wrote:
> On Tue, 29 Sep 2020 at 17:54, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > From: Stephane Eranian <eranian@google.com>
> >
> > [ Upstream commit bec49a9e05db3dbdca696fa07c62c52638fb6371 ]
> >
> > When it is not possible for a non-privilege perf command to monitor at
> > the kernel level (:k), the fallback code forces a :u. That works if the
> > event was previously monitoring both levels.  But if the event was
> > already constrained to kernel only, then it does not make sense to
> > restrict it to user only.
> >
> > Given the code works by exclusion, a kernel only event would have:
> >
> >   attr->exclude_user = 1
> >
> > The fallback code would add:
> >
> >   attr->exclude_kernel = 1
> >
> > In the end the end would not monitor in either the user level or kernel
> > level. In other words, it would count nothing.
> >
> > An event programmed to monitor kernel only cannot be switched to user
> > only without seriously warning the user.
> >
> > This patch forces an error in this case to make it clear the request
> > cannot really be satisfied.
> >
> > Behavior with paranoid 1:
> >
> >   $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
> >   $ perf stat -e cycles:k sleep 1
> >
> >    Performance counter stats for 'sleep 1':
> >
> >            1,520,413      cycles:k
> >
> >          1.002361664 seconds time elapsed
> >
> >          0.002480000 seconds user
> >          0.000000000 seconds sys
> >
> > Old behavior with paranoid 2:
> >
> >   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
> >   $ perf stat -e cycles:k sleep 1
> >    Performance counter stats for 'sleep 1':
> >
> >                    0      cycles:ku
> >
> >          1.002358127 seconds time elapsed
> >
> >          0.002384000 seconds user
> >          0.000000000 seconds sys
> >
> > New behavior with paranoid 2:
> >
> >   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
> >   $ perf stat -e cycles:k sleep 1
> >   Error:
> >   You may not have permission to collect stats.
> >
> >   Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> >   which controls use of the performance events system by
> >   unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).
> >
> >   The current value is 2:
> >
> >     -1: Allow use of (almost) all events by all users
> >         Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
> >   >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
> >         Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
> >   >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
> >   >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN
> >
> >   To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
> >
> >           kernel.perf_event_paranoid = -1
> >
> > v2 of this patch addresses the review feedback from jolsa@redhat.com.
> >
> > Signed-off-by: Stephane Eranian <eranian@google.com>
> > Reviewed-by: Ian Rogers <irogers@google.com>
> > Acked-by: Jiri Olsa <jolsa@redhat.com>
> > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Jiri Olsa <jolsa@redhat.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> perf failed on stable rc branch 4.19 on all devices.
> 
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> 
> build warning and errors,
> -----------------------------------
> In file included from util/evlist.h:15:0,
>                  from util/evsel.c:30:
> util/evsel.c: In function 'perf_evsel__exit':
> util/util.h:25:28: warning: passing argument 1 of 'free' discards
> 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
>  #define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
>                             ^
> util/evsel.c:1293:2: note: in expansion of macro 'zfree'
>   zfree(&evsel->pmu_name);
>   ^~~~~
> In file included from
> /srv/oe/build/tmp-lkft-glibc/work/intel_corei7_64-linaro-linux/perf/1.0-r9/perf-1.0/tools/perf/arch/x86/include/perf_regs.h:5:0,
>                  from util/perf_regs.h:27,
>                  from util/event.h:11,
>                  from util/callchain.h:8,
>                  from util/evsel.c:26:
> perf/1.0-r9/recipe-sysroot/usr/include/stdlib.h:563:13: note: expected
> 'void *' but argument is of type 'const char *'
>  extern void free (void *__ptr) __THROW;
>              ^~~~
> util/evsel.c: In function 'perf_evsel__fallback':
> util/evsel.c:2802:14: error: 'struct perf_evsel' has no member named
> 'core'; did you mean 'node'?
>    if (evsel->core.attr.exclude_user)
>               ^~~~
>               node

I thought Sasha had dropped all of the offending patches.  I'll go drop
this one and push out a new 4.19-rc release.

But note, the latest 4.19.y tree doesn't even build perf for me, so I
can't really check this locally :(

thanks,

greg k-h
Sasha Levin Sept. 29, 2020, 4:06 p.m. UTC | #3
On Tue, Sep 29, 2020 at 04:27:17PM +0200, Greg Kroah-Hartman wrote:
>On Tue, Sep 29, 2020 at 07:03:46PM +0530, Naresh Kamboju wrote:
>> On Tue, 29 Sep 2020 at 17:54, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> >
>> > From: Stephane Eranian <eranian@google.com>
>> >
>> > [ Upstream commit bec49a9e05db3dbdca696fa07c62c52638fb6371 ]
>> >
>> > When it is not possible for a non-privilege perf command to monitor at
>> > the kernel level (:k), the fallback code forces a :u. That works if the
>> > event was previously monitoring both levels.  But if the event was
>> > already constrained to kernel only, then it does not make sense to
>> > restrict it to user only.
>> >
>> > Given the code works by exclusion, a kernel only event would have:
>> >
>> >   attr->exclude_user = 1
>> >
>> > The fallback code would add:
>> >
>> >   attr->exclude_kernel = 1
>> >
>> > In the end the end would not monitor in either the user level or kernel
>> > level. In other words, it would count nothing.
>> >
>> > An event programmed to monitor kernel only cannot be switched to user
>> > only without seriously warning the user.
>> >
>> > This patch forces an error in this case to make it clear the request
>> > cannot really be satisfied.
>> >
>> > Behavior with paranoid 1:
>> >
>> >   $ sudo bash -c "echo 1 > /proc/sys/kernel/perf_event_paranoid"
>> >   $ perf stat -e cycles:k sleep 1
>> >
>> >    Performance counter stats for 'sleep 1':
>> >
>> >            1,520,413      cycles:k
>> >
>> >          1.002361664 seconds time elapsed
>> >
>> >          0.002480000 seconds user
>> >          0.000000000 seconds sys
>> >
>> > Old behavior with paranoid 2:
>> >
>> >   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
>> >   $ perf stat -e cycles:k sleep 1
>> >    Performance counter stats for 'sleep 1':
>> >
>> >                    0      cycles:ku
>> >
>> >          1.002358127 seconds time elapsed
>> >
>> >          0.002384000 seconds user
>> >          0.000000000 seconds sys
>> >
>> > New behavior with paranoid 2:
>> >
>> >   $ sudo bash -c "echo 2 > /proc/sys/kernel/perf_event_paranoid"
>> >   $ perf stat -e cycles:k sleep 1
>> >   Error:
>> >   You may not have permission to collect stats.
>> >
>> >   Consider tweaking /proc/sys/kernel/perf_event_paranoid,
>> >   which controls use of the performance events system by
>> >   unprivileged users (without CAP_PERFMON or CAP_SYS_ADMIN).
>> >
>> >   The current value is 2:
>> >
>> >     -1: Allow use of (almost) all events by all users
>> >         Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>> >   >= 0: Disallow ftrace function tracepoint by users without CAP_PERFMON or CAP_SYS_ADMIN
>> >         Disallow raw tracepoint access by users without CAP_SYS_PERFMON or CAP_SYS_ADMIN
>> >   >= 1: Disallow CPU event access by users without CAP_PERFMON or CAP_SYS_ADMIN
>> >   >= 2: Disallow kernel profiling by users without CAP_PERFMON or CAP_SYS_ADMIN
>> >
>> >   To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
>> >
>> >           kernel.perf_event_paranoid = -1
>> >
>> > v2 of this patch addresses the review feedback from jolsa@redhat.com.
>> >
>> > Signed-off-by: Stephane Eranian <eranian@google.com>
>> > Reviewed-by: Ian Rogers <irogers@google.com>
>> > Acked-by: Jiri Olsa <jolsa@redhat.com>
>> > Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> > Cc: Jiri Olsa <jolsa@redhat.com>
>> > Cc: Mark Rutland <mark.rutland@arm.com>
>> > Cc: Namhyung Kim <namhyung@kernel.org>
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Link: http://lore.kernel.org/lkml/20200414161550.225588-1-irogers@google.com
>> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>> > Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> perf failed on stable rc branch 4.19 on all devices.
>>
>> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
>>
>> build warning and errors,
>> -----------------------------------
>> In file included from util/evlist.h:15:0,
>>                  from util/evsel.c:30:
>> util/evsel.c: In function 'perf_evsel__exit':
>> util/util.h:25:28: warning: passing argument 1 of 'free' discards
>> 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
>>  #define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
>>                             ^
>> util/evsel.c:1293:2: note: in expansion of macro 'zfree'
>>   zfree(&evsel->pmu_name);
>>   ^~~~~
>> In file included from
>> /srv/oe/build/tmp-lkft-glibc/work/intel_corei7_64-linaro-linux/perf/1.0-r9/perf-1.0/tools/perf/arch/x86/include/perf_regs.h:5:0,
>>                  from util/perf_regs.h:27,
>>                  from util/event.h:11,
>>                  from util/callchain.h:8,
>>                  from util/evsel.c:26:
>> perf/1.0-r9/recipe-sysroot/usr/include/stdlib.h:563:13: note: expected
>> 'void *' but argument is of type 'const char *'
>>  extern void free (void *__ptr) __THROW;
>>              ^~~~
>> util/evsel.c: In function 'perf_evsel__fallback':
>> util/evsel.c:2802:14: error: 'struct perf_evsel' has no member named
>> 'core'; did you mean 'node'?
>>    if (evsel->core.attr.exclude_user)
>>               ^~~~
>>               node
>
>I thought Sasha had dropped all of the offending patches.  I'll go drop
>this one and push out a new 4.19-rc release.


I did, looks like this is a new report.

>But note, the latest 4.19.y tree doesn't even build perf for me, so I
>can't really check this locally :(

Same here. Naresh, does perf build "out of the box" for you, or do you
carry any patches on top?

-- 
Thanks,
Sasha
Patch

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 68c5ab0e1800b..e8586957562b3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2796,6 +2796,10 @@  bool perf_evsel__fallback(struct perf_evsel *evsel, int err,
 		char *new_name;
 		const char *sep = ":";
 
+		/* If event has exclude user then don't exclude kernel. */
+		if (evsel->core.attr.exclude_user)
+			return false;
+
 		/* Is there already the separator in the name. */
 		if (strchr(name, '/') ||
 		    strchr(name, ':'))
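
The build error reported in this thread comes from the 'core' member that
the added check dereferences. The sketch below only illustrates that
mismatch with reduced, hypothetical struct models; it is not the real
struct perf_evsel definition and not a proposed backport. It assumes that
the 4.19 struct carries 'attr' as a direct member, which should be checked
against the 4.19 sources.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the two exclusion bits. */
struct attr_bits {
	unsigned int exclude_user : 1,
		     exclude_kernel : 1;
};

/* Layout the patch was written against: attr nested under 'core'. */
struct evsel_with_core {
	struct {
		struct attr_bits attr;
	} core;
};

/* Assumed 4.19-style layout: attr is a direct member. */
struct evsel_flat {
	struct attr_bits attr;
};

/* The check as written in the patch. */
static bool refuse_fallback_core(const struct evsel_with_core *evsel)
{
	return evsel->core.attr.exclude_user;
}

/* The same check spelled for the assumed flat layout. */
static bool refuse_fallback_flat(const struct evsel_flat *evsel)
{
	return evsel->attr.exclude_user;
}
```

Both helpers make the same decision; only the path to the attribute
differs, which is why the patch compiles against one layout and fails
against the other.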