Message ID | 20190110142745.25495-1-mark.rutland@arm.com |
---|---|
State | Accepted |
Commit | 9dff0aa95a324e262ffb03f425d00e4751f3294e |
Series | perf/core: don't WARN for impossible rb sizes |

Hi Mark,

On 10/01/2019 14:27, Mark Rutland wrote:
> The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
> large its ringbuffer mmap should be. This can be configured to arbitrary
> values, which can be larger than the maximum possible allocation from
> kmalloc.
>
> When this is configured to a suitably large value (e.g. thanks to the
> perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
> __alloc_pages_nodemask():
>
> [ 337.316688] WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8
> [ 337.316694] Modules linked in:
> [ 337.316704] CPU: 2 PID: 5666 Comm: perf Not tainted 5.0.0-rc1 #2669
> [ 337.316708] Hardware name: ARM Juno development board (r0) (DT)
> [ 337.316714] pstate: 20000005 (nzCv daif -PAN -UAO)
> [ 337.316720] pc : __alloc_pages_nodemask+0x3f8/0xbc8
> [ 337.316728] lr : alloc_pages_current+0x80/0xe8
> [ 337.316732] sp : ffff000016eeb9e0
> [ 337.316736] x29: ffff000016eeb9e0 x28: 0000000000080001
> [ 337.316744] x27: 0000000000000000 x26: ffff0000111e21f0
> [ 337.316751] x25: 0000000000000001 x24: 0000000000000000
> [ 337.316757] x23: 0000000000080001 x22: 0000000000000000
> [ 337.316762] x21: 0000000000000000 x20: 000000000000000b
> [ 337.316768] x19: 000000000060c0c0 x18: 0000000000000000
> [ 337.316773] x17: 0000000000000000 x16: 0000000000000000
> [ 337.316779] x15: 0000000000000000 x14: 0000000000000000
> [ 337.316784] x13: 0000000000000000 x12: 0000000000000000
> [ 337.316789] x11: 0000000000100000 x10: 0000000000000000
> [ 337.316795] x9 : 0000000010044400 x8 : 0000000080001000
> [ 337.316800] x7 : 0000000000000000 x6 : ffff800975584700
> [ 337.316806] x5 : 0000000000000000 x4 : ffff0000111cd6c8
> [ 337.316811] x3 : 0000000000000000 x2 : 0000000000000000
> [ 337.316816] x1 : 000000000000000b x0 : 000000000060c0c0
> [ 337.316822] Call trace:
> [ 337.316828] __alloc_pages_nodemask+0x3f8/0xbc8
> [ 337.316834] alloc_pages_current+0x80/0xe8
> [ 337.316841] kmalloc_order+0x14/0x30
> [ 337.316848] __kmalloc+0x1dc/0x240
> [ 337.316854] rb_alloc+0x3c/0x170
> [ 337.316860] perf_mmap+0x3bc/0x470
> [ 337.316867] mmap_region+0x374/0x4f8
> [ 337.316873] do_mmap+0x300/0x430
> [ 337.316878] vm_mmap_pgoff+0xe4/0x110
> [ 337.316884] ksys_mmap_pgoff+0xc0/0x230
> [ 337.316892] __arm64_sys_mmap+0x28/0x38
> [ 337.316899] el0_svc_common+0xb4/0x118
> [ 337.316905] el0_svc_handler+0x2c/0x80
> [ 337.316910] el0_svc+0x8/0xc
> [ 337.316915] ---[ end trace fa29167e20ef0c62 ]---
>
> Let's avoid this by checking that the requested allocation is possible
> before calling kzalloc.
>
> Reported-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/events/ring_buffer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 4a9937076331..309ef5a64af5 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -734,6 +734,9 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  	size = sizeof(struct ring_buffer);
>  	size += nr_pages * sizeof(void *);
>
> +	if (order_base_2(size) >= MAX_ORDER)
> +		goto fail;
> +

I see that in kernel/events/ring_buffer.c there are two versions of
rb_alloc() (depending on whether CONFIG_PERF_USE_VMALLOC is defined or
not).

Since the warning comes from the kzalloc, I'd think we'd need to add
this check in both implementations of rb_alloc().

With that change (or if for some reason the other rb_alloc() version
doesn't need the check):

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

Thanks,

-- 
Julien Thierry

Hi Mark,

Looks like I hit a regression on an SKL desktop.

For example:

root@skl:/tmp# perf record -g -a
failed to mmap with 12 (Cannot allocate memory)

In this case, size = 1264, order_base_2(size) = 11, and MAX_ORDER = 11, so

	if (order_base_2(size) >= MAX_ORDER)
		goto fail;

goes to fail directly. Is that really correct?

Could you help take a look at this?

BTW, I tested with Arnaldo's perf/core branch.

Thanks
Jin Yao

On 1/10/2019 10:27 PM, Mark Rutland wrote:
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 4a9937076331..309ef5a64af5 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -734,6 +734,9 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>  	size = sizeof(struct ring_buffer);
>  	size += nr_pages * sizeof(void *);
>
> +	if (order_base_2(size) >= MAX_ORDER)
> +		goto fail;
> +
>  	rb = kzalloc(size, GFP_KERNEL);
>  	if (!rb)
>  		goto fail;
>

On Tue, Feb 12, 2019 at 10:42:38AM +0800, Jin, Yao wrote:
> > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> > index 4a9937076331..309ef5a64af5 100644
> > --- a/kernel/events/ring_buffer.c
> > +++ b/kernel/events/ring_buffer.c
> > @@ -734,6 +734,9 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
> >  	size = sizeof(struct ring_buffer);
> >  	size += nr_pages * sizeof(void *);
> > +	if (order_base_2(size) >= MAX_ORDER)
> > +		goto fail;
> > +
> >  	rb = kzalloc(size, GFP_KERNEL);

Yes, Boris also spent the entire morning bisecting this.

The problem is that @size is in bytes and MAX_ORDER is in pages.

That should be:

	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
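
For readers following the arithmetic, here is a minimal userspace sketch of the two comparisons applied to the size from the regression report. It is not kernel code: order_base_2_sketch() is a hypothetical stand-in for order_base_2(), and the values PAGE_SHIFT = 12 (4 KiB pages) and MAX_ORDER = 11 are assumptions chosen to match the numbers in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: 4 KiB pages (PAGE_SHIFT = 12) and MAX_ORDER = 11. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* Userspace stand-in for order_base_2(): ceil(log2(n)), with 0 and 1 mapping to 0. */
int order_base_2_sketch(size_t n)
{
	int order = 0;

	if (n <= 1)
		return 0;
	/* Count the bits needed to represent n - 1. */
	for (n--; n != 0; n >>= 1)
		order++;
	return order;
}

/* Check as posted: a byte-based order compared against a page-based limit. */
bool rejected_by_posted_check(size_t size)
{
	return order_base_2_sketch(size) >= SKETCH_MAX_ORDER;
}

/* Check as suggested in the reply: both sides are powers of two in bytes. */
bool rejected_by_suggested_check(size_t size)
{
	return order_base_2_sketch(size) >= SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER;
}

/* The reported size: rejected by the posted check, accepted by the suggested one. */
void check_reported_size(void)
{
	assert(order_base_2_sketch(1264) == 11);
	assert(rejected_by_posted_check(1264));
	assert(!rejected_by_suggested_check(1264));
}
```

Under these assumptions the posted check rejects any size above 1024 bytes, which is why an ordinary perf record fails, while the suggested check only rejects sizes above 2^22 bytes (4 MiB).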

On 2/12/2019 9:07 PM, Peter Zijlstra wrote:
> On Tue, Feb 12, 2019 at 10:42:38AM +0800, Jin, Yao wrote:
>>> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
>>> index 4a9937076331..309ef5a64af5 100644
>>> --- a/kernel/events/ring_buffer.c
>>> +++ b/kernel/events/ring_buffer.c
>>> @@ -734,6 +734,9 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
>>>  	size = sizeof(struct ring_buffer);
>>>  	size += nr_pages * sizeof(void *);
>>> +	if (order_base_2(size) >= MAX_ORDER)
>>> +		goto fail;
>>> +
>>>  	rb = kzalloc(size, GFP_KERNEL);
>
> Yes, Boris also spent the entire morning bisecting this.
>
> The problem is that @size is in bytes and MAX_ORDER is in pages.
>
> That should be:
>
> 	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
>

Thanks Peter! This fix works!

Thanks
Jin Yao

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4a9937076331..309ef5a64af5 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -734,6 +734,9 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 	size = sizeof(struct ring_buffer);
 	size += nr_pages * sizeof(void *);
 
+	if (order_base_2(size) >= MAX_ORDER)
+		goto fail;
+
 	rb = kzalloc(size, GFP_KERNEL);
 	if (!rb)
 		goto fail;

The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
large its ringbuffer mmap should be. This can be configured to arbitrary
values, which can be larger than the maximum possible allocation from
kmalloc.

When this is configured to a suitably large value (e.g. thanks to the
perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
__alloc_pages_nodemask():

[ 337.316688] WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8
[ 337.316694] Modules linked in:
[ 337.316704] CPU: 2 PID: 5666 Comm: perf Not tainted 5.0.0-rc1 #2669
[ 337.316708] Hardware name: ARM Juno development board (r0) (DT)
[ 337.316714] pstate: 20000005 (nzCv daif -PAN -UAO)
[ 337.316720] pc : __alloc_pages_nodemask+0x3f8/0xbc8
[ 337.316728] lr : alloc_pages_current+0x80/0xe8
[ 337.316732] sp : ffff000016eeb9e0
[ 337.316736] x29: ffff000016eeb9e0 x28: 0000000000080001
[ 337.316744] x27: 0000000000000000 x26: ffff0000111e21f0
[ 337.316751] x25: 0000000000000001 x24: 0000000000000000
[ 337.316757] x23: 0000000000080001 x22: 0000000000000000
[ 337.316762] x21: 0000000000000000 x20: 000000000000000b
[ 337.316768] x19: 000000000060c0c0 x18: 0000000000000000
[ 337.316773] x17: 0000000000000000 x16: 0000000000000000
[ 337.316779] x15: 0000000000000000 x14: 0000000000000000
[ 337.316784] x13: 0000000000000000 x12: 0000000000000000
[ 337.316789] x11: 0000000000100000 x10: 0000000000000000
[ 337.316795] x9 : 0000000010044400 x8 : 0000000080001000
[ 337.316800] x7 : 0000000000000000 x6 : ffff800975584700
[ 337.316806] x5 : 0000000000000000 x4 : ffff0000111cd6c8
[ 337.316811] x3 : 0000000000000000 x2 : 0000000000000000
[ 337.316816] x1 : 000000000000000b x0 : 000000000060c0c0
[ 337.316822] Call trace:
[ 337.316828] __alloc_pages_nodemask+0x3f8/0xbc8
[ 337.316834] alloc_pages_current+0x80/0xe8
[ 337.316841] kmalloc_order+0x14/0x30
[ 337.316848] __kmalloc+0x1dc/0x240
[ 337.316854] rb_alloc+0x3c/0x170
[ 337.316860] perf_mmap+0x3bc/0x470
[ 337.316867] mmap_region+0x374/0x4f8
[ 337.316873] do_mmap+0x300/0x430
[ 337.316878] vm_mmap_pgoff+0xe4/0x110
[ 337.316884] ksys_mmap_pgoff+0xc0/0x230
[ 337.316892] __arm64_sys_mmap+0x28/0x38
[ 337.316899] el0_svc_common+0xb4/0x118
[ 337.316905] el0_svc_handler+0x2c/0x80
[ 337.316910] el0_svc+0x8/0xc
[ 337.316915] ---[ end trace fa29167e20ef0c62 ]---

Let's avoid this by checking that the requested allocation is possible
before calling kzalloc.

Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/ring_buffer.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.11.0
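
To put a rough number on which allocations count as "possible", the sketch below computes the largest size, and the largest nr_pages, that would pass the comparison suggested in the thread (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER). Everything here is an assumption for illustration: PAGE_SHIFT = 12, MAX_ORDER = 11, 8-byte pointers, and a caller-supplied header_bytes standing in for sizeof(struct ring_buffer), whose real value is not given in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages, MAX_ORDER = 11, 8-byte pointers. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11
#define SKETCH_PTR_BYTES  8

/*
 * Largest byte size with order_base_2(size) < PAGE_SHIFT + MAX_ORDER.
 * order_base_2() rounds up, so the bound is 2^(PAGE_SHIFT + MAX_ORDER - 1).
 */
uint64_t max_accepted_bytes(void)
{
	return UINT64_C(1) << (SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER - 1);
}

/*
 * Largest nr_pages for which header_bytes + nr_pages * pointer size still
 * fits under that bound. header_bytes is a hypothetical placeholder for
 * sizeof(struct ring_buffer).
 */
uint64_t max_accepted_nr_pages(uint64_t header_bytes)
{
	assert(header_bytes <= max_accepted_bytes());
	return (max_accepted_bytes() - header_bytes) / SKETCH_PTR_BYTES;
}
```

With these assumed values the bound is 4 MiB of bookkeeping, or a little under 524288 page pointers once the header is subtracted, which with 4 KiB pages corresponds to a ring buffer of about 2 GiB.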