diff mbox

perf tools: session: avoid infinite loop

Message ID 1442423929-12253-1-git-send-email-mark.rutland@arm.com
State Superseded
Headers show

Commit Message

Mark Rutland Sept. 16, 2015, 5:18 p.m. UTC
If a session contains no events, we can get stuck in an infinite loop in
__perf_session__process_events, with a non-zero file_size and
data_offset, but a zero data_size.

In this case, we can mmap the entirety of the file (consisting of the
file and attribute headers), and fetch_mmaped_event will correctly
refuse to read any (unmapped and non-existent) event headers. This
causes __perf_session__process_events to unmap the file and retry with
the exact same parameters, getting stuck in an infinite loop.

This has been observed to result in an exit-time hang when counting
rare/unschedulable events with perf record, and can be triggered
artificially with the script below:

----
#!/bin/sh
printf "REPRO: launching perf\n";
./perf record -e software/config=9/ sleep 1 &
PERF_PID=$!;
sleep 0.002;
kill -2 $PERF_PID;
printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
wait $PERF_PID;
printf "REPRO: perf exited\n";
----

To avoid this, have __perf_session__process_events bail out early when
the file has no data (i.e. it has no events).

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 tools/perf/util/session.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Arnaldo Carvalho de Melo Sept. 16, 2015, 8:54 p.m. UTC | #1
Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
> This has been observed to result in an exit-time hang when counting
> rare/unschedulable events with perf record, and can be triggered
> artificially with the script below:
> 
> ----
> #!/bin/sh
> printf "REPRO: launching perf\n";
> ./perf record -e software/config=9/ sleep 1 &
> PERF_PID=$!;
> sleep 0.002;
> kill -2 $PERF_PID;
> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
> wait $PERF_PID;
> printf "REPRO: perf exited\n";
> ----

So, I run it here, without this patch, and get:

  [root@zoo ~]# time ./repro.sh 
  REPRO: launching perf
  REPRO: waiting for perf (766) to exit...
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.015 MB perf.data ]
  REPRO: perf exited

  real	0m1.060s
  user	0m0.018s
  sys	0m0.037s
  [root@zoo ~]# perf report --stdio
  Error:
  The perf.data file has no samples!
  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  [root@zoo ~]# perf evlist -v
  software/config=9/: type: 1, size: 112, config: 0x9, { sample_period,
  sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1,
  inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1,
  sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
  [root@zoo ~]# 

What am I doing wrong? Trying to reproduce this before even looking at
the patch :-)

- Arnaldo
 
> To avoid this, have __perf_session__process_events bail out early when
> the file has no data (i.e. it has no events).
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  tools/perf/util/session.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 8a4537e..fc3f7c9 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session,
>  	file_offset = page_offset;
>  	head = data_offset - page_offset;
>  
> -	if (data_size && (data_offset + data_size < file_size))
> +	if (data_size == 0)
> +		goto out;
> +
> +	if (data_offset + data_size < file_size)
>  		file_size = data_offset + data_size;
>  
>  	ui_progress__init(&prog, file_size, "Processing events...");
> -- 
> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mark Rutland Sept. 17, 2015, 3:41 p.m. UTC | #2
Hi,

On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
> > This has been observed to result in an exit-time hang when counting
> > rare/unschedulable events with perf record, and can be triggered
> > artificially with the script below:
> > 
> > ----
> > #!/bin/sh
> > printf "REPRO: launching perf\n";
> > ./perf record -e software/config=9/ sleep 1 &
> > PERF_PID=$!;
> > sleep 0.002;
> > kill -2 $PERF_PID;
> > printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
> > wait $PERF_PID;
> > printf "REPRO: perf exited\n";
> > ----
> 
> So, I run it here, without this patch, and get:
> 
>   [root@zoo ~]# time ./repro.sh 
>   REPRO: launching perf
>   REPRO: waiting for perf (766) to exit...
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 0.015 MB perf.data ]
>   REPRO: perf exited
>   real	0m1.060s
>   user	0m0.018s
>   sys	0m0.037s

[...]

> What am I doing wrong? Trying to reproduce this before even looking at
> the patch :-)

I suspect you have a shinier computer than I do! ;)

It's easier to trigger on slower machines -- decreasing the sleep time
to 0.001 or below may help, though if it's set too low we exit early
without exercising the failing path.

It looks like to trigger the bug we need to successfully call
record__mmap_read_all at least once (but not read any data), then break out of
the loop because hits == rec->samples && done. That way we call
process_buildids with no data having been written.

I added some additional logging of rec->bytes_written immediately before and
after the main loop. In all cases where we've written something we exit, and in
all cases we didn't, we hang. I've left some examples (from an arm64 system)
below.

Thanks,
Mark.

mark@ribbensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Waiting for perf (2104) to exit...
Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Post-loop rec wrote 2128
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data ]
Perf exited

mark@ribbensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Waiting for perf (2099) to exit...
Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Post-loop rec wrote 0
[ perf record: Woken up 0 times to write data ]
< hang here >

mark@ribbensteg:~$ ./repro.sh 
REPRO: launching perf
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Cannot read kernel map
Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
Pre-loop rec wrote 0
Waiting for perf (2108) to exit...
Post-loop rec wrote 0
[ perf record: Woken up 1 times to write data ]
< hang here >
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Adrian Hunter Sept. 18, 2015, 6:09 a.m. UTC | #3
On 17/09/15 18:41, Mark Rutland wrote:
> Hi,
> 
> On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
>> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
>>> This has been observed to result in an exit-time hang when counting
>>> rare/unschedulable events with perf record, and can be triggered
>>> artificially with the script below:
>>>
>>> ----
>>> #!/bin/sh
>>> printf "REPRO: launching perf\n";
>>> ./perf record -e software/config=9/ sleep 1 &
>>> PERF_PID=$!;
>>> sleep 0.002;
>>> kill -2 $PERF_PID;
>>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
>>> wait $PERF_PID;
>>> printf "REPRO: perf exited\n";
>>> ----
>>
>> So, I run it here, without this patch, and get:
>>
>>   [root@zoo ~]# time ./repro.sh 
>>   REPRO: launching perf
>>   REPRO: waiting for perf (766) to exit...
>>   [ perf record: Woken up 1 times to write data ]
>>   [ perf record: Captured and wrote 0.015 MB perf.data ]
>>   REPRO: perf exited
>>   real	0m1.060s
>>   user	0m0.018s
>>   sys	0m0.037s
> 
> [...]
> 
>> What am I doing wrong? Trying to reproduce this before even looking at
>> the patch :-)
> 
> I suspect you have a shinier computer than I do! ;)
> 

I imagine you would also need to contrive not to write any synthesized MMAP
or COMM events i.e. no access to /proc perhaps?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mark Rutland Sept. 18, 2015, 9:51 a.m. UTC | #4
On Fri, Sep 18, 2015 at 07:09:16AM +0100, Adrian Hunter wrote:
> On 17/09/15 18:41, Mark Rutland wrote:
> > Hi,
> > 
> > On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
> >> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
> >>> This has been observed to result in an exit-time hang when counting
> >>> rare/unschedulable events with perf record, and can be triggered
> >>> artificially with the script below:
> >>>
> >>> ----
> >>> #!/bin/sh
> >>> printf "REPRO: launching perf\n";
> >>> ./perf record -e software/config=9/ sleep 1 &
> >>> PERF_PID=$!;
> >>> sleep 0.002;
> >>> kill -2 $PERF_PID;
> >>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
> >>> wait $PERF_PID;
> >>> printf "REPRO: perf exited\n";
> >>> ----
> >>
> >> So, I run it here, without this patch, and get:
> >>
> >>   [root@zoo ~]# time ./repro.sh 
> >>   REPRO: launching perf
> >>   REPRO: waiting for perf (766) to exit...
> >>   [ perf record: Woken up 1 times to write data ]
> >>   [ perf record: Captured and wrote 0.015 MB perf.data ]
> >>   REPRO: perf exited
> >>   real	0m1.060s
> >>   user	0m0.018s
> >>   sys	0m0.037s
> > 
> > [...]
> > 
> >> What am I doing wrong? Trying to reproduce this before even looking at
> >> the patch :-)
> > 
> > I suspect you have a shinier computer than I do! ;)
> > 
> 
> I imagine you would also need to contrive not to write any synthesized MMAP
> or COMM events i.e. no access to /proc perhaps?

Taking a look through __cmd_record:

I'm not writing to a pipe, so nothing gets synthesized for attrs or tracepoints.

I'm not using aux trace, so nothing gets synthesized for that.

Running as a regular user I don't have synthesized events for the kernel mmap
or modules:

	WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
	check /proc/sys/kernel/kptr_restrict.

	Samples in kernel functions may not be resolved if a suitable vmlinux
	file is not found in the buildid cache or in the vmlinux path.

	Samples in kernel modules won't be resolved at all.

	If some relocation was applied (e.g. kexec) symbols may be misresolved
	even with a suitable vmlinux or kallsyms file.

	Cannot read kernel map
	Couldn't record kernel reference relocation symbol
	Symbol resolution may be skewed if relocation was used (e.g. kexec).
	Check /proc/kallsyms permission or run as root.

I'm not running a guest, so I don't have any synthesized events for the guest
os.

__machine__synthesize_threads doesn't synthesize any task events
(target__has_task(target) is false at this point, as is
target__has_cpu(target)), because perf starts the workload later via
perf_evlist__start_workload.

So it looks like I shouldn't have any synthesized events. Have I missed
anything?

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Adrian Hunter Sept. 18, 2015, 10:55 a.m. UTC | #5
On 18/09/15 12:51, Mark Rutland wrote:
> On Fri, Sep 18, 2015 at 07:09:16AM +0100, Adrian Hunter wrote:
>> On 17/09/15 18:41, Mark Rutland wrote:
>>> Hi,
>>>
>>> On Wed, Sep 16, 2015 at 09:54:54PM +0100, Arnaldo Carvalho de Melo wrote:
>>>> Em Wed, Sep 16, 2015 at 06:18:49PM +0100, Mark Rutland escreveu:
>>>>> This has been observed to result in an exit-time hang when counting
>>>>> rare/unschedulable events with perf record, and can be triggered
>>>>> artificially with the script below:
>>>>>
>>>>> ----
>>>>> #!/bin/sh
>>>>> printf "REPRO: launching perf\n";
>>>>> ./perf record -e software/config=9/ sleep 1 &
>>>>> PERF_PID=$!;
>>>>> sleep 0.002;
>>>>> kill -2 $PERF_PID;
>>>>> printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
>>>>> wait $PERF_PID;
>>>>> printf "REPRO: perf exited\n";
>>>>> ----
>>>>
>>>> So, I run it here, without this patch, and get:
>>>>
>>>>   [root@zoo ~]# time ./repro.sh 
>>>>   REPRO: launching perf
>>>>   REPRO: waiting for perf (766) to exit...
>>>>   [ perf record: Woken up 1 times to write data ]
>>>>   [ perf record: Captured and wrote 0.015 MB perf.data ]
>>>>   REPRO: perf exited
>>>>   real	0m1.060s
>>>>   user	0m0.018s
>>>>   sys	0m0.037s
>>>
>>> [...]
>>>
>>>> What am I doing wrong? Trying to reproduce this before even looking at
>>>> the patch :-)
>>>
>>> I suspect you have a shinier computer than I do! ;)
>>>
>>
>> I imagine you would also need to contrive not to write any synthesized MMAP
>> or COMM events i.e. no access to /proc perhaps?
> 
> Taking a look through __cmd_record:
> 
> I'm not writing to a pipe, so nothing gets synthesized for attrs or tracepoints.
> 
> I'm not using aux trace, so nothing gets synthesized for that.
> 
> Running as a regular user I don't have synthesized events for the kernel mmap
> or modules:
> 
> 	WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
> 	check /proc/sys/kernel/kptr_restrict.
> 
> 	Samples in kernel functions may not be resolved if a suitable vmlinux
> 	file is not found in the buildid cache or in the vmlinux path.
> 
> 	Samples in kernel modules won't be resolved at all.
> 
> 	If some relocation was applied (e.g. kexec) symbols may be misresolved
> 	even with a suitable vmlinux or kallsyms file.
> 
> 	Cannot read kernel map
> 	Couldn't record kernel reference relocation symbol
> 	Symbol resolution may be skewed if relocation was used (e.g. kexec).
> 	Check /proc/kallsyms permission or run as root.
> 
> I'm not running a guest, so I don't have any synthesized events for the guest
> os.
> 
> __machine__synthesize_threads doesn't synthesize any task events
> (target__has_task(target) is false at this point, as is
> target__has_cpu(target)), because perf starts the workload later via
> perf_evlist__start_workload.
> 
> So it looks like I shouldn't have any synthesized events. Have I missed
> anything?

Yes, you are right.  But you are not getting the COMM and MMAP events from
the exec which means you are killing perf before it execs the workload.
Perf writes through a pipe to its forked child to do the exec, so
to reproduce it you just need to put everything on the same cpu and play around
with the sleep number

	taskset -c 0 tools/perf/perf record -- sleep 1 & sleep 0.005 ; kill -2 $!

reproduces the problem for me.  Of course, since the workload doesn't get exec'ed
it doesn't matter if it is bogus i.e.

	taskset -c 0 tools/perf/perf record -- sdfgsdgdg & sleep 0.005 ; kill -2 $!

also reproduces the problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mark Rutland Sept. 18, 2015, 1:37 p.m. UTC | #6
> > So it looks like I shouldn't have any synthesized events. Have I missed
> > anything?
> 
> Yes, you are right.  But you are not getting the COMM and MMAP events from
> the exec which means you are killing perf before it execs the workload.

Oh, I see.

> Perf writes through a pipe to its forked child to do the exec, so
> to reproduce it you just need to put everything on the same cpu and play around
> with the sleep number
> 
> 	taskset -c 0 tools/perf/perf record -- sleep 1 & sleep 0.005 ; kill -2 $!
> 
> reproduces the problem for me.  Of course, since the workload doesn't get exec'ed
> it doesn't matter if it is bogus i.e.
> 
> 	taskset -c 0 tools/perf/perf record -- sdfgsdgdg & sleep 0.005 ; kill -2 $!
> 
> also reproduces the problem.

Thanks for confirming!

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Mark Rutland Sept. 18, 2015, 3:18 p.m. UTC | #7
On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote:
> I tried what you suggested, but couldn't reproduce it, I only managed
> when I applied the patch below _and_ executed;
> 
>  # echo 1 > /proc/sys/kernel/kptr_restrict
> 
> To be on the same page as Mark. Then, yes, I get stuck in:
> 
> #0  0x00007f746a1d8aba in mmap64 () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00000000004be66e in __perf_session__process_events (file_size=232, data_size=<optimized out>, data_offset=<optimized out>, session=0x1ba8590)
>     at util/session.c:1604
> #2  perf_session__process_events (session=0x1ba8590) at util/session.c:1685
> #3  0x000000000042d97b in process_buildids (rec=0x869ac0 <record>) at builtin-record.c:364
> #4  __cmd_record (rec=0x869ac0 <record>, argv=<optimized out>, argc=<optimized out>) at builtin-record.c:734
> #5  cmd_record (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-record.c:1199
> #6  0x0000000000479123 in run_builtin (p=p@entry=0x874a90 <commands+144>, argc=argc@entry=3, argv=argv@entry=0x7ffd92eb5ee0) at perf.c:370
> #7  0x0000000000420aba in handle_internal_command (argv=0x7ffd92eb5ee0, argc=3) at perf.c:429
> #8  run_argv (argv=0x7ffd92eb5c70, argcp=0x7ffd92eb5c7c) at perf.c:473
> #9  main (argc=3, argv=0x7ffd92eb5ee0) at perf.c:588
> (gdb) 
> 
> After applying your patch it works, had to tweak the commit log to avoid
> starting lines with ---, breaks git scripts, also added a commiter log,
> check it, patch is below, after the one I used to not process any
> samples.

Sorry for the '---' problem, I'll bear that in mind in future.

Your commit log looks fine, though I'm slightly confused by the
Reported-by line -- did you mean to add that?

Mark.

> commit dd486ec4aa33cfca2fd912ef501d49909005de79
> Author: Mark Rutland <mark.rutland@arm.com>
> Date:   Wed Sep 16 18:18:49 2015 +0100
> 
>     perf record: Avoid infinite loop at buildid processing with no samples
>     
>     If a session contains no events, we can get stuck in an infinite loop in
>     __perf_session__process_events, with a non-zero file_size and data_offset, but
>     a zero data_size.
>     
>     In this case, we can mmap the entirety of the file (consisting of the file and
>     attribute headers), and fetch_mmaped_event will correctly refuse to read any
>     (unmapped and non-existent) event headers. This causes
>     __perf_session__process_events to unmap the file and retry with the exact same
>     parameters, getting stuck in an infinite loop.
>     
>     This has been observed to result in an exit-time hang when counting
>     rare/unschedulable events with perf record, and can be triggered artificially
>     with the script below:
>     
>       ----
>       #!/bin/sh
>       printf "REPRO: launching perf\n";
>       ./perf record -e software/config=9/ sleep 1 &
>       PERF_PID=$!;
>       sleep 0.002;
>       kill -2 $PERF_PID;
>       printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
>       wait $PERF_PID;
>       printf "REPRO: perf exited\n";
>       ----
>     
>     To avoid this, have __perf_session__process_events bail out early when
>     the file has no data (i.e. it has no events).
>     
>     Commiter note:
>     
>     I only managed to reproduce this when setting
>     /proc/sys/kernel/kptr_restrict to '1' and changing the code to
>     purposefully not process any samples and no synthesized samples, i.e.
>     kptr_restrict prevents 'record' from synthesizing the kernel mmaps for
>     vmlinux + modules and since it is a workload started from perf, we don't
>     synthesize mmap/comm records for existing threads.
>     
>     Adrian Hunter managed to reproduce it in his environment tho.
>     
>     Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>     Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>     Cc: Adrian Hunter <adrian.hunter@intel.com>
>     Cc: Jiri Olsa <jolsa@redhat.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com
>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 8a4537ee9bc3..fc3f7c922f99 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session,
>  	file_offset = page_offset;
>  	head = data_offset - page_offset;
>  
> -	if (data_size && (data_offset + data_size < file_size))
> +	if (data_size == 0)
> +		goto out;
> +
> +	if (data_offset + data_size < file_size)
>  		file_size = data_offset + data_size;
>  
>  	ui_progress__init(&prog, file_size, "Processing events...");
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Arnaldo Carvalho de Melo Sept. 18, 2015, 3:29 p.m. UTC | #8
Em Fri, Sep 18, 2015 at 04:18:51PM +0100, Mark Rutland escreveu:
> On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote:
> > After applying your patch it works, had to tweak the commit log to avoid
> > starting lines with ---, breaks git scripts, also added a commiter log,
> > check it, patch is below, after the one I used to not process any
> > samples.
 
> Sorry for the '---' problem, I'll bear that in mind in future.
 
> Your commit log looks fine, though I'm slightly confused by the
> Reported-by line -- did you mean to add that?

Sorry, I mean Tested-by:, will replace, guess i can replace the one for
Adrian from Cc: to Tested-by too, right?

- Arnaldo
 
> Mark.
> 
> > commit dd486ec4aa33cfca2fd912ef501d49909005de79
> > Author: Mark Rutland <mark.rutland@arm.com>
> > Date:   Wed Sep 16 18:18:49 2015 +0100
> > 
> >     perf record: Avoid infinite loop at buildid processing with no samples
> >     
> >     If a session contains no events, we can get stuck in an infinite loop in
> >     __perf_session__process_events, with a non-zero file_size and data_offset, but
> >     a zero data_size.
> >     
> >     In this case, we can mmap the entirety of the file (consisting of the file and
> >     attribute headers), and fetch_mmaped_event will correctly refuse to read any
> >     (unmapped and non-existent) event headers. This causes
> >     __perf_session__process_events to unmap the file and retry with the exact same
> >     parameters, getting stuck in an infinite loop.
> >     
> >     This has been observed to result in an exit-time hang when counting
> >     rare/unschedulable events with perf record, and can be triggered artificially
> >     with the script below:
> >     
> >       ----
> >       #!/bin/sh
> >       printf "REPRO: launching perf\n";
> >       ./perf record -e software/config=9/ sleep 1 &
> >       PERF_PID=$!;
> >       sleep 0.002;
> >       kill -2 $PERF_PID;
> >       printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
> >       wait $PERF_PID;
> >       printf "REPRO: perf exited\n";
> >       ----
> >     
> >     To avoid this, have __perf_session__process_events bail out early when
> >     the file has no data (i.e. it has no events).
> >     
> >     Commiter note:
> >     
> >     I only managed to reproduce this when setting
> >     /proc/sys/kernel/kptr_restrict to '1' and changing the code to
> >     purposefully not process any samples and no synthesized samples, i.e.
> >     kptr_restrict prevents 'record' from synthesizing the kernel mmaps for
> >     vmlinux + modules and since it is a workload started from perf, we don't
> >     synthesize mmap/comm records for existing threads.
> >     
> >     Adrian Hunter managed to reproduce it in his environment tho.
> >     
> >     Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> >     Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> >     Cc: Adrian Hunter <adrian.hunter@intel.com>
> >     Cc: Jiri Olsa <jolsa@redhat.com>
> >     Cc: Peter Zijlstra <peterz@infradead.org>
> >     Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com
> >     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > 
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 8a4537ee9bc3..fc3f7c922f99 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session,
> >  	file_offset = page_offset;
> >  	head = data_offset - page_offset;
> >  
> > -	if (data_size && (data_offset + data_size < file_size))
> > +	if (data_size == 0)
> > +		goto out;
> > +
> > +	if (data_offset + data_size < file_size)
> >  		file_size = data_offset + data_size;
> >  
> >  	ui_progress__init(&prog, file_size, "Processing events...");
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Adrian Hunter Sept. 21, 2015, 12:33 p.m. UTC | #9
On 18/09/15 18:29, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 18, 2015 at 04:18:51PM +0100, Mark Rutland escreveu:
>> On Fri, Sep 18, 2015 at 04:00:18PM +0100, Arnaldo Carvalho de Melo wrote:
>>> After applying your patch it works, had to tweak the commit log to avoid
>>> starting lines with ---, breaks git scripts, also added a commiter log,
>>> check it, patch is below, after the one I used to not process any
>>> samples.
>  
>> Sorry for the '---' problem, I'll bear that in mind in future.
>  
>> Your commit log looks fine, though I'm slightly confused by the
>> Reported-by line -- did you mean to add that?
> 
> Sorry, I mean Tested-by:, will replace, guess i can replace the one for
> Adrian from Cc: to Tested-by too, right?

Yes

> 
> - Arnaldo
>  
>> Mark.
>>
>>> commit dd486ec4aa33cfca2fd912ef501d49909005de79
>>> Author: Mark Rutland <mark.rutland@arm.com>
>>> Date:   Wed Sep 16 18:18:49 2015 +0100
>>>
>>>     perf record: Avoid infinite loop at buildid processing with no samples
>>>     
>>>     If a session contains no events, we can get stuck in an infinite loop in
>>>     __perf_session__process_events, with a non-zero file_size and data_offset, but
>>>     a zero data_size.
>>>     
>>>     In this case, we can mmap the entirety of the file (consisting of the file and
>>>     attribute headers), and fetch_mmaped_event will correctly refuse to read any
>>>     (unmapped and non-existent) event headers. This causes
>>>     __perf_session__process_events to unmap the file and retry with the exact same
>>>     parameters, getting stuck in an infinite loop.
>>>     
>>>     This has been observed to result in an exit-time hang when counting
>>>     rare/unschedulable events with perf record, and can be triggered artificially
>>>     with the script below:
>>>     
>>>       ----
>>>       #!/bin/sh
>>>       printf "REPRO: launching perf\n";
>>>       ./perf record -e software/config=9/ sleep 1 &
>>>       PERF_PID=$!;
>>>       sleep 0.002;
>>>       kill -2 $PERF_PID;
>>>       printf "REPRO: waiting for perf (%d) to exit...\n" "$PERF_PID";
>>>       wait $PERF_PID;
>>>       printf "REPRO: perf exited\n";
>>>       ----
>>>     
>>>     To avoid this, have __perf_session__process_events bail out early when
>>>     the file has no data (i.e. it has no events).
>>>     
>>>     Commiter note:
>>>     
>>>     I only managed to reproduce this when setting
>>>     /proc/sys/kernel/kptr_restrict to '1' and changing the code to
>>>     purposefully not process any samples and no synthesized samples, i.e.
>>>     kptr_restrict prevents 'record' from synthesizing the kernel mmaps for
>>>     vmlinux + modules and since it is a workload started from perf, we don't
>>>     synthesize mmap/comm records for existing threads.
>>>     
>>>     Adrian Hunter managed to reproduce it in his environment tho.
>>>     
>>>     Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>>>     Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>>>     Cc: Adrian Hunter <adrian.hunter@intel.com>
>>>     Cc: Jiri Olsa <jolsa@redhat.com>
>>>     Cc: Peter Zijlstra <peterz@infradead.org>
>>>     Link: http://lkml.kernel.org/r/1442423929-12253-1-git-send-email-mark.rutland@arm.com
>>>     Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>>>
>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>> index 8a4537ee9bc3..fc3f7c922f99 100644
>>> --- a/tools/perf/util/session.c
>>> +++ b/tools/perf/util/session.c
>>> @@ -1580,7 +1580,10 @@ static int __perf_session__process_events(struct perf_session *session,
>>>  	file_offset = page_offset;
>>>  	head = data_offset - page_offset;
>>>  
>>> -	if (data_size && (data_offset + data_size < file_size))
>>> +	if (data_size == 0)
>>> +		goto out;
>>> +
>>> +	if (data_offset + data_size < file_size)
>>>  		file_size = data_offset + data_size;
>>>  
>>>  	ui_progress__init(&prog, file_size, "Processing events...");
>>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8a4537e..fc3f7c9 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1580,7 +1580,10 @@  static int __perf_session__process_events(struct perf_session *session,
 	file_offset = page_offset;
 	head = data_offset - page_offset;
 
-	if (data_size && (data_offset + data_size < file_size))
+	if (data_size == 0)
+		goto out;
+
+	if (data_offset + data_size < file_size)
 		file_size = data_offset + data_size;
 
 	ui_progress__init(&prog, file_size, "Processing events...");