Message ID | 20180510130632.34497-1-mark.rutland@arm.com |
---|---|
State | New |
Headers | show |
Series | perf/ring_buffer: ensure atomicity and order of updates | expand |
Hi Mark, I love your patch! Perhaps something to improve: [auto build test WARNING on tip/perf/core] [also build test WARNING on v4.17-rc4 next-20180510] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Mark-Rutland/perf-ring_buffer-ensure-atomicity-and-order-of-updates/20180511-070205 config: i386-randconfig-s1-201818 (attached as .config) compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/swab.h:6:0, from include/linux/swab.h:5, from include/uapi/linux/byteorder/little_endian.h:13, from include/linux/byteorder/little_endian.h:5, from arch/x86/include/uapi/asm/byteorder.h:5, from include/uapi/linux/perf_event.h:20, from include/linux/perf_event.h:17, from kernel//events/ring_buffer.c:12: kernel//events/ring_buffer.c: In function 'perf_output_put_handle': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> kernel//events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ ^~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c: In function 'perf_aux_output_skip': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c: In function 'perf_aux_output_end': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ -- In file included from include/uapi/linux/swab.h:6:0, from include/linux/swab.h:5, from include/uapi/linux/byteorder/little_endian.h:13, from include/linux/byteorder/little_endian.h:5, from arch/x86/include/uapi/asm/byteorder.h:5, from include/uapi/linux/perf_event.h:20, from include/linux/perf_event.h:17, from kernel/events/ring_buffer.c:12: kernel/events/ring_buffer.c: In function 'perf_output_put_handle': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ ^~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c: In function 'perf_aux_output_skip': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c: In function 'perf_aux_output_end': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ >> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ vim +/smp_store_release +86 kernel//events/ring_buffer.c 44 45 static void perf_output_put_handle(struct perf_output_handle *handle) 46 { 47 struct ring_buffer *rb = handle->rb; 48 unsigned long head; 49 50 again: 51 head = local_read(&rb->head); 52 53 /* 54 * IRQ/NMI can happen here, which means we can miss a head update. 55 */ 56 57 if (!local_dec_and_test(&rb->nest)) 58 goto out; 59 60 /* 61 * Since the mmap() consumer (userspace) can run on a different CPU: 62 * 63 * kernel user 64 * 65 * if (LOAD ->data_tail) { LOAD ->data_head 66 * (A) smp_rmb() (C) 67 * STORE $data LOAD $data 68 * smp_mb() (D) 69 * RELEASE ->data_head (B) STORE ->data_tail 70 * } 71 * 72 * Where A pairs with D, and B pairs with C. 73 * 74 * In our case (A) is a control dependency that separates the load of 75 * the ->data_tail and the stores of $data. In case ->data_tail 76 * indicates there is no room in the buffer to store $data we do not. 77 * 78 * D needs to be a full barrier since it separates the data READ 79 * from the tail WRITE. 80 * 81 * For B a WMB is sufficient since it separates two WRITEs, and for C 82 * an RMB is sufficient since it separates two READs. 83 * 84 * See perf_output_begin(). 85 */ > 86 smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ 87 88 /* 89 * Now check if we missed an update -- rely on previous implied 90 * compiler barriers to force a re-read. 91 */ 92 if (unlikely(head != local_read(&rb->head))) { 93 local_inc(&rb->nest); 94 goto again; 95 } 96 97 if (handle->wakeup != local_read(&rb->wakeup)) 98 perf_output_wakeup(handle); 99 100 out: 101 preempt_enable(); 102 } 103 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Mark, I love your patch! Perhaps something to improve: [auto build test WARNING on tip/perf/core] [also build test WARNING on v4.17-rc4 next-20180510] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Mark-Rutland/perf-ring_buffer-ensure-atomicity-and-order-of-updates/20180511-070205 config: i386-randconfig-s0-201818 (attached as .config) compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026 reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): In file included from include/uapi/linux/swab.h:6:0, from include/linux/swab.h:5, from include/uapi/linux/byteorder/little_endian.h:13, from include/linux/byteorder/little_endian.h:5, from arch/x86/include/uapi/asm/byteorder.h:5, from include/uapi/linux/perf_event.h:20, from include/linux/perf_event.h:17, from kernel/events/ring_buffer.c:12: kernel/events/ring_buffer.c: In function 'perf_output_put_handle': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ ^~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c: In function 'perf_aux_output_skip': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c: In function 'perf_aux_output_end': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel/events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ -- In file included from include/uapi/linux/swab.h:6:0, from include/linux/swab.h:5, from include/uapi/linux/byteorder/little_endian.h:13, from include/linux/byteorder/little_endian.h:5, from arch/x86/include/uapi/asm/byteorder.h:5, from include/uapi/linux/perf_event.h:20, from include/linux/perf_event.h:17, from kernel//events/ring_buffer.c:12: kernel//events/ring_buffer.c: In function 'perf_output_put_handle': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ ^~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c: In function 'perf_aux_output_skip': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c: In function 'perf_aux_output_end': include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity. _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^ include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert' prefix ## suffix(); \ ^~~~~~ include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) ^~~~~~~~~~~~~~~~~~~ include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert' compiletime_assert(__native_word(t), \ ^~~~~~~~~~~~~~~~~~ >> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type' compiletime_assert_atomic_type(*p); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release' #define smp_store_release(p, v) __smp_store_release(p, v) ^~~~~~~~~~~~~~~~~~~ kernel//events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release' smp_store_release(&rb->user_page->aux_head, rb->aux_head); ^~~~~~~~~~~~~~~~~ vim +/compiletime_assert_atomic_type +69 arch/x86/include/asm/barrier.h 47933ad4 Peter Zijlstra 2013-11-06 66 1638fb72 Michael S. Tsirkin 2015-12-27 67 #define __smp_store_release(p, v) \ 47933ad4 Peter Zijlstra 2013-11-06 68 do { \ 47933ad4 Peter Zijlstra 2013-11-06 @69 compiletime_assert_atomic_type(*p); \ 47933ad4 Peter Zijlstra 2013-11-06 70 barrier(); \ 76695af2 Andrey Konovalov 2015-08-02 71 WRITE_ONCE(*p, v); \ 47933ad4 Peter Zijlstra 2013-11-06 72 } while (0) 47933ad4 Peter Zijlstra 2013-11-06 73 :::::: The code at line 69 was first introduced by commit :::::: 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 arch: Introduce smp_load_acquire(), smp_store_release() :::::: TO: Peter Zijlstra <peterz@infradead.org> :::::: CC: Ingo Molnar <mingo@kernel.org> --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Thu, May 10, 2018 at 02:06:32PM +0100, Mark Rutland wrote: > - smp_wmb(); /* B, matches C */ > - rb->user_page->data_head = head; > + smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ > - rb->user_page->aux_head = rb->aux_head; > + smp_store_release(&rb->user_page->aux_head, rb->aux_head); > - rb->user_page->aux_head = rb->aux_head; > + smp_store_release(&rb->user_page->aux_head, rb->aux_head); The kbuild test robot has helpfully discovered another latent bug here. We assume we can make single-copy-atomic accesses to {aux,data}_{head,tail}, but this isn't necessarily true on 32-bit architectures, and smp_store_release() rightly complains at build time. READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a memcpy in this case, so we're broken today, regardless of this change. I suspect that in practice we get single-copy-atomicity for the 32-bit halves, and sessions likely produce less than 4GiB of ringbuffer data, so failures would be rare. I'm not sure how to fix the ABI here. The same issue applies on the userspace side, so whatever we do we need to fix both sides. Thanks, Mark.
On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote: > READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a > memcpy in this case, so we're broken today, regardless of this change. > > I suspect that in practice we get single-copy-atomicity for the 32-bit > halves, and sessions likely produce less than 4GiB of ringbuffer data, > so failures would be rare. This should not be a problem because of the 32bit adress space limit, which would necessarily limit us to the low word. Also note that in perf_output_put_handle(), where we write ->data_head, the store is from an 'unsigned long'. So on 32bit that will result in a zero high word. Similarly, in __perf_output_begin() we read ->data_tail into an unsigned long, which will discard the high word. So userspace should always read (head) a zero high word, irrespective of a split store (2x32bit), and the kernel will disregard the high word on reading (tail), irrespective of what userspace put there. This is all a bit subtle, and could probably use a comment, but it ought to work..
On Fri, May 11, 2018 at 06:22:29PM +0200, Peter Zijlstra wrote: > On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote: > > READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a > > memcpy in this case, so we're broken today, regardless of this change. > > > > I suspect that in practice we get single-copy-atomicity for the 32-bit > > halves, and sessions likely produce less than 4GiB of ringbuffer data, > > so failures would be rare. > > This should not be a problem because of the 32bit adress space limit, > which would necessarily limit us to the low word. For the wrapped values, yes. I thought that the head and tail values were meant to be free-running, but I can't see where I got that idea from now that I've gone digging again. > Also note that in perf_output_put_handle(), where we write ->data_head, > the store is from an 'unsigned long'. So on 32bit that will result in a > zero high word. Similarly, in __perf_output_begin() we read ->data_tail > into an unsigned long, which will discard the high word. Ah, that's a fair point. So it's just compat userspace that this is potentially borked for. ;) > So userspace should always read (head) a zero high word, irrespective of > a split store (2x32bit), and the kernel will disregard the high word on > reading (tail), irrespective of what userspace put there. > > This is all a bit subtle, and could probably use a comment, but it ought > to work.. It would be nice to guarantee that we don't lose 32-bit atomicity by virtue of {READ,WRITE}_ONCE() falling back to memcpy in this case, so maybe we should wrap this in some helpers. I'll see if I can come up with something which isn't hideous, or I might just pretend I never stumbled across this. :) Thanks, Mark.
On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote: > On Fri, May 11, 2018 at 06:22:29PM +0200, Peter Zijlstra wrote: > > On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote: > > > READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a > > > memcpy in this case, so we're broken today, regardless of this change. > > > > > > I suspect that in practice we get single-copy-atomicity for the 32-bit > > > halves, and sessions likely produce less than 4GiB of ringbuffer data, > > > so failures would be rare. > > > > This should not be a problem because of the 32bit adress space limit, > > which would necessarily limit us to the low word. > > For the wrapped values, yes. > > I thought that the head and tail values were meant to be free-running, > but I can't see where I got that idea from now that I've gone digging > again. They are indeed free running. > > Also note that in perf_output_put_handle(), where we write ->data_head, > > the store is from an 'unsigned long'. So on 32bit that will result in a > > zero high word. Similarly, in __perf_output_begin() we read ->data_tail > > into an unsigned long, which will discard the high word. > > Ah, that's a fair point. So it's just compat userspace that this is > potentially borked for. ;) Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that there. > > So userspace should always read (head) a zero high word, irrespective of > > a split store (2x32bit), and the kernel will disregard the high word on > > reading (tail), irrespective of what userspace put there. > > > > This is all a bit subtle, and could probably use a comment, but it ought > > to work.. > > It would be nice to guarantee that we don't lose 32-bit atomicity by > virtue of {READ,WRITE}_ONCE() falling back to memcpy in this case, so > maybe we should wrap this in some helpers. Our __READ_ONCE_SIZE / __write_once_size include case 8 unconditionally. So we'll always issue a volatile u64 load/store and let the compiler figure out how to do that -- typically 2 load/stores I would imagine.
On Mon, May 14, 2018 at 01:28:15PM +0200, Peter Zijlstra wrote: > On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote: > > > Also note that in perf_output_put_handle(), where we write ->data_head, > > > the store is from an 'unsigned long'. So on 32bit that will result in a > > > zero high word. Similarly, in __perf_output_begin() we read ->data_tail > > > into an unsigned long, which will discard the high word. > > > > Ah, that's a fair point. So it's just compat userspace that this is > > potentially borked for. ;) > > Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that > there. How's this? --- include/linux/perf_event.h | 12 ++++++++++++ kernel/events/core.c | 31 +++++++++++++++++++++++++++++-- kernel/events/ring_buffer.c | 39 ++++++++++++++++++++++++++++++++++----- 3 files changed, 75 insertions(+), 7 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e71e99eb9a4e..7834dfb6a83b 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -517,6 +517,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *, */ #define PERF_EV_CAP_SOFTWARE BIT(0) #define PERF_EV_CAP_READ_ACTIVE_PKG BIT(1) +#define PERF_EV_CAP_COMPAT BIT(2) #define SWEVENT_HLIST_BITS 8 #define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS) @@ -1220,6 +1221,11 @@ static inline bool is_write_backward(struct perf_event *event) return !!event->attr.write_backward; } +static inline bool is_compat_event(struct perf_event *event) +{ + return event->event_caps & PERF_EV_CAP_COMPAT; +} + static inline bool has_addr_filter(struct perf_event *event) { return event->pmu->nr_addr_filters; @@ -1249,6 +1255,12 @@ extern int perf_output_begin_forward(struct perf_output_handle *handle, extern int perf_output_begin_backward(struct perf_output_handle *handle, struct perf_event *event, unsigned int size); +extern int perf_output_begin_forward_compat(struct perf_output_handle *handle, + struct perf_event *event, + unsigned int size); +extern int perf_output_begin_backward_compat(struct perf_output_handle *handle, + struct perf_event *event, + unsigned int size); extern void perf_output_end(struct perf_output_handle *handle); extern unsigned int perf_output_copy(struct perf_output_handle *handle, diff --git a/kernel/events/core.c b/kernel/events/core.c index 67612ce359ad..0e60f35442a6 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6523,6 +6523,22 @@ perf_event_output_backward(struct perf_event *event, __perf_event_output(event, data, regs, perf_output_begin_backward); } +void +perf_event_output_forward_compat(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + __perf_event_output(event, data, regs, perf_output_begin_forward_compat); +} + +void +perf_event_output_backward_compat(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + __perf_event_output(event, data, regs, perf_output_begin_backward_compat); +} + void perf_event_output(struct perf_event *event, struct perf_sample_data *data, @@ -10000,10 +10016,16 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, event->overflow_handler = overflow_handler; event->overflow_handler_context = context; } else if (is_write_backward(event)){ - event->overflow_handler = perf_event_output_backward; + if (is_compat_event(event)) + event->overflow_handler = perf_event_output_backward_compat; + else + event->overflow_handler = perf_event_output_backward; event->overflow_handler_context = NULL; } else { - event->overflow_handler = perf_event_output_forward; + if (is_compat_event(event)) + event->overflow_handler = perf_event_output_forward_compat; + else + event->overflow_handler = perf_event_output_forward; event->overflow_handler_context = NULL; } @@ -10266,6 +10288,8 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event) if (is_write_backward(output_event) != is_write_backward(event)) goto out; + if (is_compat_event(output_event) != is_compat_event(event)) + goto out; /* * If both events generate aux data, they must be on the same PMU */ @@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open, goto err_cred; } + if (in_compat_syscall()) + event->event_caps |= PERF_EV_CAP_COMPAT; + if (is_sampling_event(event)) { if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) { err = -EOPNOTSUPP; diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 1d8ca9ea9979..8a8ca117e5de 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -85,7 +85,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle) * See perf_output_begin(). */ smp_wmb(); /* B, matches C */ - rb->user_page->data_head = head; + /* + * 'unsigned long' to 'u64' promotion will zero the high word on 32 bit. + */ + WRITE_ONCE(rb->user_page->data_head, head); /* * Now check if we missed an update -- rely on previous implied @@ -117,7 +120,7 @@ ring_buffer_has_space(unsigned long head, unsigned long tail, static int __always_inline __perf_output_begin(struct perf_output_handle *handle, struct perf_event *event, unsigned int size, - bool backward) + bool backward, bool compat) { struct ring_buffer *rb; unsigned long tail, offset, head; @@ -158,7 +161,13 @@ __perf_output_begin(struct perf_output_handle *handle, perf_output_get_handle(handle); do { + /* + * 'u64' to 'unsigned long' demotion looses the high word on 32 bit. + */ tail = READ_ONCE(rb->user_page->data_tail); + if (compat) + tail &= UINT_MAX; + offset = head = local_read(&rb->head); if (!rb->overwrite) { if (unlikely(!ring_buffer_has_space(head, tail, @@ -183,6 +192,13 @@ __perf_output_begin(struct perf_output_handle *handle, head += size; else head -= size; + + /* + * Ensure rb->head has the high word clear for compat; this + * avoids having to muck with perf_output_put_handle(). + */ + if (compat) + head &= UINT_MAX; } while (local_cmpxchg(&rb->head, offset, head) != offset); if (backward) { @@ -234,13 +250,25 @@ __perf_output_begin(struct perf_output_handle *handle, int perf_output_begin_forward(struct perf_output_handle *handle, struct perf_event *event, unsigned int size) { - return __perf_output_begin(handle, event, size, false); + return __perf_output_begin(handle, event, size, false, false); } int perf_output_begin_backward(struct perf_output_handle *handle, struct perf_event *event, unsigned int size) { - return __perf_output_begin(handle, event, size, true); + return __perf_output_begin(handle, event, size, true, false); +} + +int perf_output_begin_forward_compat(struct perf_output_handle *handle, + struct perf_event *event, unsigned int size) +{ + return __perf_output_begin(handle, event, size, false, true); +} + +int perf_output_begin_backward_compat(struct perf_output_handle *handle, + struct perf_event *event, unsigned int size) +{ + return __perf_output_begin(handle, event, size, true, true); } int perf_output_begin(struct perf_output_handle *handle, @@ -248,7 +276,8 @@ int perf_output_begin(struct perf_output_handle *handle, { return __perf_output_begin(handle, event, size, - unlikely(is_write_backward(event))); + unlikely(is_write_backward(event)), + unlikely(is_compat_event(event)); } unsigned int perf_output_copy(struct perf_output_handle *handle,
On Mon, May 14, 2018 at 05:02:13PM +0200, Peter Zijlstra wrote: > On Mon, May 14, 2018 at 01:28:15PM +0200, Peter Zijlstra wrote: > > On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote: > > > > > Also note that in perf_output_put_handle(), where we write ->data_head, > > > > the store is from an 'unsigned long'. So on 32bit that will result in a > > > > zero high word. Similarly, in __perf_output_begin() we read ->data_tail > > > > into an unsigned long, which will discard the high word. > > > > > > Ah, that's a fair point. So it's just compat userspace that this is > > > potentially borked for. ;) > > > > Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that > > there. > > How's this? > > --- > include/linux/perf_event.h | 12 ++++++++++++ > kernel/events/core.c | 31 +++++++++++++++++++++++++++++-- > kernel/events/ring_buffer.c | 39 ++++++++++++++++++++++++++++++++++----- > 3 files changed, 75 insertions(+), 7 deletions(-) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index e71e99eb9a4e..7834dfb6a83b 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -517,6 +517,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *, > */ > #define PERF_EV_CAP_SOFTWARE BIT(0) > #define PERF_EV_CAP_READ_ACTIVE_PKG BIT(1) > +#define PERF_EV_CAP_COMPAT BIT(2) > > #define SWEVENT_HLIST_BITS 8 > #define SWEVENT_HLIST_SIZE (1 << SWEVENT_HLIST_BITS) > @@ -1220,6 +1221,11 @@ static inline bool is_write_backward(struct perf_event *event) > return !!event->attr.write_backward; > } > > +static inline bool is_compat_event(struct perf_event *event) > +{ > + return event->event_caps & PERF_EV_CAP_COMPAT; > +} > @@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open, > goto err_cred; > } > > + if (in_compat_syscall()) > + event->event_caps |= PERF_EV_CAP_COMPAT; > + After a native perf_event_open, you could pass the fd (or exec) to another task that was compat (or vice-versa), so this wouldn't work in that case (crazy as it may be). I don't have a better suggestion at present, though. Thanks, Mark.
On Mon, May 14, 2018 at 04:20:23PM +0100, Mark Rutland wrote: > > @@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open, > > goto err_cred; > > } > > > > + if (in_compat_syscall()) > > + event->event_caps |= PERF_EV_CAP_COMPAT; > > + > > After a native perf_event_open, you could pass the fd (or exec) to > another task that was compat (or vice-versa), so this wouldn't work in > that case (crazy as it may be). > > I don't have a better suggestion at present, though. As discussed on IRC, we could trigger off of the buffer size; if the buffer is <4G the &= UINT_MAX is harmless, if the buffer is larger, you have to be using a 64bit thingy anyway. Flipping the overflow functions around on attach/detach to buffers is a little more dodgy, but could be done I suppose.
On Thu, May 10, 2018 at 02:06:32PM +0100, Mark Rutland wrote: > Userspace can read/write the user page at any point in time, and in > perf_output_put_handle() we're very careful to use memory barriers to > ensure ordering between updates to data and the user page. > > We don't use barriers when updating aux_head, where similar ordering > constraints apply. This could result in userspace seeing stale data, or > data being overwritten while userspace was still consuming it. > > Further, we update data_head and aux_head with plain assignments, which > the compiler can tear, potentially resulting in userspace seeing > erroneous values. > > We can solve both of these problems by using smp_store_release to update > data_head and aux_head, so let's do so. > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Will Deacon <will.deacon@arm.com> > --- > kernel/events/ring_buffer.c | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c > index 6c6b3c48db71..839b207e4c77 100644 > --- a/kernel/events/ring_buffer.c > +++ b/kernel/events/ring_buffer.c > @@ -63,10 +63,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle) > * kernel user > * > * if (LOAD ->data_tail) { LOAD ->data_head > - * (A) smp_rmb() (C) > + * (A) smp_rmb() (C) > * STORE $data LOAD $data > - * smp_wmb() (B) smp_mb() (D) > - * STORE ->data_head STORE ->data_tail > + * smp_mb() (D) > + * RELEASE ->data_head (B) STORE ->data_tail > * } One thing to be aware of here is that the choice of ordering primitive (e.g. using fences vs acquire/release operations) has the potential to create ABI with userspace. I don't know of any architectures which currently care, but if were were to merge a non multi-copy atomic architecture with native acquire/release instructions, you could see issues if e.g. userspace used smp_rmb(); READ_ONCE but the kernel used a RELEASE store. Anyway, that's currently theoretical, but I think it's an argument for putting these accessors in a uapi header. Will
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index 6c6b3c48db71..839b207e4c77 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -63,10 +63,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle) * kernel user * * if (LOAD ->data_tail) { LOAD ->data_head - * (A) smp_rmb() (C) + * (A) smp_rmb() (C) * STORE $data LOAD $data - * smp_wmb() (B) smp_mb() (D) - * STORE ->data_head STORE ->data_tail + * smp_mb() (D) + * RELEASE ->data_head (B) STORE ->data_tail * } * * Where A pairs with D, and B pairs with C. @@ -83,8 +83,7 @@ static void perf_output_put_handle(struct perf_output_handle *handle) * * See perf_output_begin(). */ - smp_wmb(); /* B, matches C */ - rb->user_page->data_head = head; + smp_store_release(&rb->user_page->data_head, head); /* B, matches C */ /* * Now check if we missed an update -- rely on previous implied @@ -464,7 +463,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size) handle->aux_flags); } - rb->user_page->aux_head = rb->aux_head; + smp_store_release(&rb->user_page->aux_head, rb->aux_head); if (rb_need_aux_wakeup(rb)) wakeup = true; @@ -496,7 +495,7 @@ int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size) rb->aux_head += size; - rb->user_page->aux_head = rb->aux_head; + smp_store_release(&rb->user_page->aux_head, rb->aux_head); if (rb_need_aux_wakeup(rb)) { perf_output_wakeup(handle); handle->wakeup = rb->aux_wakeup + rb->aux_watermark;
Userspace can read/write the user page at any point in time, and in perf_output_put_handle() we're very careful to use memory barriers to ensure ordering between updates to data and the user page. We don't use barriers when updating aux_head, where similar ordering constraints apply. This could result in userspace seeing stale data, or data being overwritten while userspace was still consuming it. Further, we update data_head and aux_head with plain assignments, which the compiler can tear, potentially resulting in userspace seeing erroneous values. We can solve both of these problems by using smp_store_release to update data_head and aux_head, so let's do so. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> --- kernel/events/ring_buffer.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) -- 2.11.0