perf/ring_buffer: ensure atomicity and order of updates

Message ID 20180510130632.34497-1-mark.rutland@arm.com
State New
Headers show
Series
  • perf/ring_buffer: ensure atomicity and order of updates
Related show

Commit Message

Mark Rutland May 10, 2018, 1:06 p.m.
Userspace can read/write the user page at any point in time, and in
perf_output_put_handle() we're very careful to use memory barriers to
ensure ordering between updates to data and the user page.

We don't use barriers when updating aux_head, where similar ordering
constraints apply. This could result in userspace seeing stale data, or
data being overwritten while userspace was still consuming it.

Further, we update data_head and aux_head with plain assignments, which
the compiler can tear, potentially resulting in userspace seeing
erroneous values.

We can solve both of these problems by using smp_store_release to update
data_head and aux_head, so let's do so.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 kernel/events/ring_buffer.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

-- 
2.11.0

Comments

kernel test robot May 11, 2018, 1:19 a.m. | #1
Hi Mark,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.17-rc4 next-20180510]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mark-Rutland/perf-ring_buffer-ensure-atomicity-and-order-of-updates/20180511-070205
config: i386-randconfig-s1-201818 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/swab.h:6:0,
                    from include/linux/swab.h:5,
                    from include/uapi/linux/byteorder/little_endian.h:13,
                    from include/linux/byteorder/little_endian.h:5,
                    from arch/x86/include/uapi/asm/byteorder.h:5,
                    from include/uapi/linux/perf_event.h:20,
                    from include/linux/perf_event.h:17,
                    from kernel//events/ring_buffer.c:12:
   kernel//events/ring_buffer.c: In function 'perf_output_put_handle':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel//events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release'

     smp_store_release(&rb->user_page->data_head, head); /* B, matches C */
     ^~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c: In function 'perf_aux_output_skip':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c: In function 'perf_aux_output_end':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
--
   In file included from include/uapi/linux/swab.h:6:0,
                    from include/linux/swab.h:5,
                    from include/uapi/linux/byteorder/little_endian.h:13,
                    from include/linux/byteorder/little_endian.h:5,
                    from arch/x86/include/uapi/asm/byteorder.h:5,
                    from include/uapi/linux/perf_event.h:20,
                    from include/linux/perf_event.h:17,
                    from kernel/events/ring_buffer.c:12:
   kernel/events/ring_buffer.c: In function 'perf_output_put_handle':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->data_head, head); /* B, matches C */
     ^~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c: In function 'perf_aux_output_skip':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c: In function 'perf_aux_output_end':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
>> include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'

     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:181:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~

vim +/smp_store_release +86 kernel//events/ring_buffer.c

    44	
    45	static void perf_output_put_handle(struct perf_output_handle *handle)
    46	{
    47		struct ring_buffer *rb = handle->rb;
    48		unsigned long head;
    49	
    50	again:
    51		head = local_read(&rb->head);
    52	
    53		/*
    54		 * IRQ/NMI can happen here, which means we can miss a head update.
    55		 */
    56	
    57		if (!local_dec_and_test(&rb->nest))
    58			goto out;
    59	
    60		/*
    61		 * Since the mmap() consumer (userspace) can run on a different CPU:
    62		 *
    63		 *   kernel				user
    64		 *
    65		 *   if (LOAD ->data_tail) {		LOAD ->data_head
    66		 *				(A) 	smp_rmb()	(C)
    67		 *	STORE $data			LOAD $data
    68		 *					smp_mb()	(D)
    69		 *	RELEASE ->data_head	(B)	STORE ->data_tail
    70		 *   }
    71		 *
    72		 * Where A pairs with D, and B pairs with C.
    73		 *
    74		 * In our case (A) is a control dependency that separates the load of
    75		 * the ->data_tail and the stores of $data. In case ->data_tail
    76		 * indicates there is no room in the buffer to store $data we do not.
    77		 *
    78		 * D needs to be a full barrier since it separates the data READ
    79		 * from the tail WRITE.
    80		 *
    81		 * For B a WMB is sufficient since it separates two WRITEs, and for C
    82		 * an RMB is sufficient since it separates two READs.
    83		 *
    84		 * See perf_output_begin().
    85		 */
  > 86		smp_store_release(&rb->user_page->data_head, head); /* B, matches C */

    87	
    88		/*
    89		 * Now check if we missed an update -- rely on previous implied
    90		 * compiler barriers to force a re-read.
    91		 */
    92		if (unlikely(head != local_read(&rb->head))) {
    93			local_inc(&rb->nest);
    94			goto again;
    95		}
    96	
    97		if (handle->wakeup != local_read(&rb->wakeup))
    98			perf_output_wakeup(handle);
    99	
   100	out:
   101		preempt_enable();
   102	}
   103	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot May 11, 2018, 1:19 a.m. | #2
Hi Mark,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/perf/core]
[also build test WARNING on v4.17-rc4 next-20180510]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mark-Rutland/perf-ring_buffer-ensure-atomicity-and-order-of-updates/20180511-070205
config: i386-randconfig-s0-201818 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/swab.h:6:0,
                    from include/linux/swab.h:5,
                    from include/uapi/linux/byteorder/little_endian.h:13,
                    from include/linux/byteorder/little_endian.h:5,
                    from arch/x86/include/uapi/asm/byteorder.h:5,
                    from include/uapi/linux/perf_event.h:20,
                    from include/linux/perf_event.h:17,
                    from kernel/events/ring_buffer.c:12:
   kernel/events/ring_buffer.c: In function 'perf_output_put_handle':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->data_head, head); /* B, matches C */
     ^~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c: In function 'perf_aux_output_skip':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c: In function 'perf_aux_output_end':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel/events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
--
   In file included from include/uapi/linux/swab.h:6:0,
                    from include/linux/swab.h:5,
                    from include/uapi/linux/byteorder/little_endian.h:13,
                    from include/linux/byteorder/little_endian.h:5,
                    from arch/x86/include/uapi/asm/byteorder.h:5,
                    from include/uapi/linux/perf_event.h:20,
                    from include/linux/perf_event.h:17,
                    from kernel//events/ring_buffer.c:12:
   kernel//events/ring_buffer.c: In function 'perf_output_put_handle':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_86' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c:86:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->data_head, head); /* B, matches C */
     ^~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c: In function 'perf_aux_output_skip':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_498' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c:498:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c: In function 'perf_aux_output_end':
   include/linux/compiler.h:339:38: error: call to '__compiletime_assert_466' declared with attribute error: Need native word sized stores/loads for atomicity.
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
                                         ^
   include/linux/compiler.h:319:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler.h:339:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:342:2: note: in expansion of macro 'compiletime_assert'
     compiletime_assert(__native_word(t),    \
     ^~~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/barrier.h:69:2: note: in expansion of macro 'compiletime_assert_atomic_type'

     compiletime_assert_atomic_type(*p);    \
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'

    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel//events/ring_buffer.c:466:2: note: in expansion of macro 'smp_store_release'
     smp_store_release(&rb->user_page->aux_head, rb->aux_head);
     ^~~~~~~~~~~~~~~~~

vim +/compiletime_assert_atomic_type +69 arch/x86/include/asm/barrier.h

47933ad4 Peter Zijlstra     2013-11-06  66  
1638fb72 Michael S. Tsirkin 2015-12-27  67  #define __smp_store_release(p, v)					\
47933ad4 Peter Zijlstra     2013-11-06  68  do {									\
47933ad4 Peter Zijlstra     2013-11-06 @69  	compiletime_assert_atomic_type(*p);				\
47933ad4 Peter Zijlstra     2013-11-06  70  	barrier();							\
76695af2 Andrey Konovalov   2015-08-02  71  	WRITE_ONCE(*p, v);						\
47933ad4 Peter Zijlstra     2013-11-06  72  } while (0)
47933ad4 Peter Zijlstra     2013-11-06  73  

:::::: The code at line 69 was first introduced by commit
:::::: 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 arch: Introduce smp_load_acquire(), smp_store_release()

:::::: TO: Peter Zijlstra <peterz@infradead.org>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Mark Rutland May 11, 2018, 10:59 a.m. | #3
On Thu, May 10, 2018 at 02:06:32PM +0100, Mark Rutland wrote:
> -	smp_wmb(); /* B, matches C */

> -	rb->user_page->data_head = head;

> +	smp_store_release(&rb->user_page->data_head, head); /* B, matches C */


> -	rb->user_page->aux_head = rb->aux_head;

> +	smp_store_release(&rb->user_page->aux_head, rb->aux_head);


> -	rb->user_page->aux_head = rb->aux_head;

> +	smp_store_release(&rb->user_page->aux_head, rb->aux_head);



The kbuild test robot has helpfully discovered another latent bug here.

We assume we can make single-copy-atomic accesses to
{aux,data}_{head,tail}, but this isn't necessarily true on 32-bit
architectures, and smp_store_release() rightly complains at build time.

READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a
memcpy in this case, so we're broken today, regardless of this change.

I suspect that in practice we get single-copy-atomicity for the 32-bit
halves, and sessions likely produce less than 4GiB of ringbuffer data,
so failures would be rare.

I'm not sure how to fix the ABI here. The same issue applies on the
userspace side, so whatever we do we need to fix both sides.

Thanks,
Mark.
Peter Zijlstra May 11, 2018, 4:22 p.m. | #4
On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote:
> READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a

> memcpy in this case, so we're broken today, regardless of this change.

> 

> I suspect that in practice we get single-copy-atomicity for the 32-bit

> halves, and sessions likely produce less than 4GiB of ringbuffer data,

> so failures would be rare.


This should not be a problem because of the 32bit adress space limit,
which would necessarily limit us to the low word.

Also note that in perf_output_put_handle(), where we write ->data_head,
the store is from an 'unsigned long'. So on 32bit that will result in a
zero high word. Similarly, in __perf_output_begin() we read ->data_tail
into an unsigned long, which will discard the high word.

So userspace should always read (head) a zero high word, irrespective of
a split store (2x32bit), and the kernel will disregard the high word on
reading (tail), irrespective of what userspace put there.

This is all a bit subtle, and could probably use a comment, but it ought
to work..
Mark Rutland May 14, 2018, 11:05 a.m. | #5
On Fri, May 11, 2018 at 06:22:29PM +0200, Peter Zijlstra wrote:
> On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote:

> > READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a

> > memcpy in this case, so we're broken today, regardless of this change.

> > 

> > I suspect that in practice we get single-copy-atomicity for the 32-bit

> > halves, and sessions likely produce less than 4GiB of ringbuffer data,

> > so failures would be rare.

> 

> This should not be a problem because of the 32bit adress space limit,

> which would necessarily limit us to the low word.


For the wrapped values, yes.

I thought that the head and tail values were meant to be free-running,
but I can't see where I got that idea from now that I've gone digging
again.

> Also note that in perf_output_put_handle(), where we write ->data_head,

> the store is from an 'unsigned long'. So on 32bit that will result in a

> zero high word. Similarly, in __perf_output_begin() we read ->data_tail

> into an unsigned long, which will discard the high word.


Ah, that's a fair point. So it's just compat userspace that this is
potentially borked for. ;)

> So userspace should always read (head) a zero high word, irrespective of

> a split store (2x32bit), and the kernel will disregard the high word on

> reading (tail), irrespective of what userspace put there.

> 

> This is all a bit subtle, and could probably use a comment, but it ought

> to work..


It would be nice to guarantee that we don't lose 32-bit atomicity by
virtue of {READ,WRITE}_ONCE() falling back to memcpy in this case, so
maybe we should wrap this in some helpers.

I'll see if I can come up with something which isn't hideous, or I might
just pretend I never stumbled across this. :)

Thanks,
Mark.
Peter Zijlstra May 14, 2018, 11:28 a.m. | #6
On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote:
> On Fri, May 11, 2018 at 06:22:29PM +0200, Peter Zijlstra wrote:

> > On Fri, May 11, 2018 at 11:59:32AM +0100, Mark Rutland wrote:

> > > READ_ONCE() and WRITE_ONCE() "helpfully" make a silent fallback to a

> > > memcpy in this case, so we're broken today, regardless of this change.

> > > 

> > > I suspect that in practice we get single-copy-atomicity for the 32-bit

> > > halves, and sessions likely produce less than 4GiB of ringbuffer data,

> > > so failures would be rare.

> > 

> > This should not be a problem because of the 32bit adress space limit,

> > which would necessarily limit us to the low word.

> 

> For the wrapped values, yes.

> 

> I thought that the head and tail values were meant to be free-running,

> but I can't see where I got that idea from now that I've gone digging

> again.


They are indeed free running.

> > Also note that in perf_output_put_handle(), where we write ->data_head,

> > the store is from an 'unsigned long'. So on 32bit that will result in a

> > zero high word. Similarly, in __perf_output_begin() we read ->data_tail

> > into an unsigned long, which will discard the high word.

> 

> Ah, that's a fair point. So it's just compat userspace that this is

> potentially borked for. ;)


Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that
there.

> > So userspace should always read (head) a zero high word, irrespective of

> > a split store (2x32bit), and the kernel will disregard the high word on

> > reading (tail), irrespective of what userspace put there.

> > 

> > This is all a bit subtle, and could probably use a comment, but it ought

> > to work..

> 

> It would be nice to guarantee that we don't lose 32-bit atomicity by

> virtue of {READ,WRITE}_ONCE() falling back to memcpy in this case, so

> maybe we should wrap this in some helpers.


Our __READ_ONCE_SIZE / __write_once_size include case 8 unconditionally.
So we'll always issue a volatile u64 load/store and let the compiler
figure out how to do that -- typically 2 load/stores I would imagine.
Peter Zijlstra May 14, 2018, 3:02 p.m. | #7
On Mon, May 14, 2018 at 01:28:15PM +0200, Peter Zijlstra wrote:
> On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote:


> > > Also note that in perf_output_put_handle(), where we write ->data_head,

> > > the store is from an 'unsigned long'. So on 32bit that will result in a

> > > zero high word. Similarly, in __perf_output_begin() we read ->data_tail

> > > into an unsigned long, which will discard the high word.

> > 

> > Ah, that's a fair point. So it's just compat userspace that this is

> > potentially borked for. ;)

> 

> Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that

> there.


How's this?

---
 include/linux/perf_event.h  | 12 ++++++++++++
 kernel/events/core.c        | 31 +++++++++++++++++++++++++++++--
 kernel/events/ring_buffer.c | 39 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e71e99eb9a4e..7834dfb6a83b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -517,6 +517,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_COMPAT		BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
@@ -1220,6 +1221,11 @@ static inline bool is_write_backward(struct perf_event *event)
 	return !!event->attr.write_backward;
 }
 
+static inline bool is_compat_event(struct perf_event *event)
+{
+	return event->event_caps & PERF_EV_CAP_COMPAT;
+}
+
 static inline bool has_addr_filter(struct perf_event *event)
 {
 	return event->pmu->nr_addr_filters;
@@ -1249,6 +1255,12 @@ extern int perf_output_begin_forward(struct perf_output_handle *handle,
 extern int perf_output_begin_backward(struct perf_output_handle *handle,
 				      struct perf_event *event,
 				      unsigned int size);
+extern int perf_output_begin_forward_compat(struct perf_output_handle *handle,
+				    struct perf_event *event,
+				    unsigned int size);
+extern int perf_output_begin_backward_compat(struct perf_output_handle *handle,
+				      struct perf_event *event,
+				      unsigned int size);
 
 extern void perf_output_end(struct perf_output_handle *handle);
 extern unsigned int perf_output_copy(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 67612ce359ad..0e60f35442a6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6523,6 +6523,22 @@ perf_event_output_backward(struct perf_event *event,
 	__perf_event_output(event, data, regs, perf_output_begin_backward);
 }
 
+void
+perf_event_output_forward_compat(struct perf_event *event,
+			 struct perf_sample_data *data,
+			 struct pt_regs *regs)
+{
+	__perf_event_output(event, data, regs, perf_output_begin_forward_compat);
+}
+
+void
+perf_event_output_backward_compat(struct perf_event *event,
+			   struct perf_sample_data *data,
+			   struct pt_regs *regs)
+{
+	__perf_event_output(event, data, regs, perf_output_begin_backward_compat);
+}
+
 void
 perf_event_output(struct perf_event *event,
 		  struct perf_sample_data *data,
@@ -10000,10 +10016,16 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		event->overflow_handler	= overflow_handler;
 		event->overflow_handler_context = context;
 	} else if (is_write_backward(event)){
-		event->overflow_handler = perf_event_output_backward;
+		if (is_compat_event(event))
+			event->overflow_handler = perf_event_output_backward_compat;
+		else
+			event->overflow_handler = perf_event_output_backward;
 		event->overflow_handler_context = NULL;
 	} else {
-		event->overflow_handler = perf_event_output_forward;
+		if (is_compat_event(event))
+			event->overflow_handler = perf_event_output_forward_compat;
+		else
+			event->overflow_handler = perf_event_output_forward;
 		event->overflow_handler_context = NULL;
 	}
 
@@ -10266,6 +10288,8 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	if (is_write_backward(output_event) != is_write_backward(event))
 		goto out;
 
+	if (is_compat_event(output_event) != is_compat_event(event))
+		goto out;
 	/*
 	 * If both events generate aux data, they must be on the same PMU
 	 */
@@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_cred;
 	}
 
+	if (in_compat_syscall())
+		event->event_caps |= PERF_EV_CAP_COMPAT;
+
 	if (is_sampling_event(event)) {
 		if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
 			err = -EOPNOTSUPP;
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 1d8ca9ea9979..8a8ca117e5de 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -85,7 +85,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle)
 	 * See perf_output_begin().
 	 */
 	smp_wmb(); /* B, matches C */
-	rb->user_page->data_head = head;
+	/*
+	 * 'unsigned long' to 'u64' promotion will zero the high word on 32 bit.
+	 */
+	WRITE_ONCE(rb->user_page->data_head, head);
 
 	/*
 	 * Now check if we missed an update -- rely on previous implied
@@ -117,7 +120,7 @@ ring_buffer_has_space(unsigned long head, unsigned long tail,
 static int __always_inline
 __perf_output_begin(struct perf_output_handle *handle,
 		    struct perf_event *event, unsigned int size,
-		    bool backward)
+		    bool backward, bool compat)
 {
 	struct ring_buffer *rb;
 	unsigned long tail, offset, head;
@@ -158,7 +161,13 @@ __perf_output_begin(struct perf_output_handle *handle,
 	perf_output_get_handle(handle);
 
 	do {
+		/*
+		 * 'u64' to 'unsigned long' demotion looses the high word on 32 bit.
+		 */
 		tail = READ_ONCE(rb->user_page->data_tail);
+		if (compat)
+			tail &= UINT_MAX;
+
 		offset = head = local_read(&rb->head);
 		if (!rb->overwrite) {
 			if (unlikely(!ring_buffer_has_space(head, tail,
@@ -183,6 +192,13 @@ __perf_output_begin(struct perf_output_handle *handle,
 			head += size;
 		else
 			head -= size;
+
+		/*
+		 * Ensure rb->head has the high word clear for compat; this
+		 * avoids having to muck with perf_output_put_handle().
+		 */
+		if (compat)
+			head &= UINT_MAX;
 	} while (local_cmpxchg(&rb->head, offset, head) != offset);
 
 	if (backward) {
@@ -234,13 +250,25 @@ __perf_output_begin(struct perf_output_handle *handle,
 int perf_output_begin_forward(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size)
 {
-	return __perf_output_begin(handle, event, size, false);
+	return __perf_output_begin(handle, event, size, false, false);
 }
 
 int perf_output_begin_backward(struct perf_output_handle *handle,
 			       struct perf_event *event, unsigned int size)
 {
-	return __perf_output_begin(handle, event, size, true);
+	return __perf_output_begin(handle, event, size, true, false);
+}
+
+int perf_output_begin_forward_compat(struct perf_output_handle *handle,
+			     struct perf_event *event, unsigned int size)
+{
+	return __perf_output_begin(handle, event, size, false, true);
+}
+
+int perf_output_begin_backward_compat(struct perf_output_handle *handle,
+			       struct perf_event *event, unsigned int size)
+{
+	return __perf_output_begin(handle, event, size, true, true);
 }
 
 int perf_output_begin(struct perf_output_handle *handle,
@@ -248,7 +276,8 @@ int perf_output_begin(struct perf_output_handle *handle,
 {
 
 	return __perf_output_begin(handle, event, size,
-				   unlikely(is_write_backward(event)));
+				   unlikely(is_write_backward(event)),
+				   unlikely(is_compat_event(event));
 }
 
 unsigned int perf_output_copy(struct perf_output_handle *handle,
Mark Rutland May 14, 2018, 3:20 p.m. | #8
On Mon, May 14, 2018 at 05:02:13PM +0200, Peter Zijlstra wrote:
> On Mon, May 14, 2018 at 01:28:15PM +0200, Peter Zijlstra wrote:

> > On Mon, May 14, 2018 at 12:05:33PM +0100, Mark Rutland wrote:

> 

> > > > Also note that in perf_output_put_handle(), where we write ->data_head,

> > > > the store is from an 'unsigned long'. So on 32bit that will result in a

> > > > zero high word. Similarly, in __perf_output_begin() we read ->data_tail

> > > > into an unsigned long, which will discard the high word.

> > > 

> > > Ah, that's a fair point. So it's just compat userspace that this is

> > > potentially borked for. ;)

> > 

> > Right.. #$$#@ compat. Hurmph.. not sure how to go about fixing that

> > there.

> 

> How's this?

> 

> ---

>  include/linux/perf_event.h  | 12 ++++++++++++

>  kernel/events/core.c        | 31 +++++++++++++++++++++++++++++--

>  kernel/events/ring_buffer.c | 39 ++++++++++++++++++++++++++++++++++-----

>  3 files changed, 75 insertions(+), 7 deletions(-)

> 

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h

> index e71e99eb9a4e..7834dfb6a83b 100644

> --- a/include/linux/perf_event.h

> +++ b/include/linux/perf_event.h

> @@ -517,6 +517,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,

>   */

>  #define PERF_EV_CAP_SOFTWARE		BIT(0)

>  #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)

> +#define PERF_EV_CAP_COMPAT		BIT(2)

>  

>  #define SWEVENT_HLIST_BITS		8

>  #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)

> @@ -1220,6 +1221,11 @@ static inline bool is_write_backward(struct perf_event *event)

>  	return !!event->attr.write_backward;

>  }

>  

> +static inline bool is_compat_event(struct perf_event *event)

> +{

> +	return event->event_caps & PERF_EV_CAP_COMPAT;

> +}


> @@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open,

>  		goto err_cred;

>  	}

>  

> +	if (in_compat_syscall())

> +		event->event_caps |= PERF_EV_CAP_COMPAT;

> +


After a native perf_event_open, you could pass the fd (or exec) to
another task that was compat (or vice-versa), so this wouldn't work in
that case (crazy as it may be).

I don't have a better suggestion at present, though.

Thanks,
Mark.
Peter Zijlstra May 14, 2018, 3:24 p.m. | #9
On Mon, May 14, 2018 at 04:20:23PM +0100, Mark Rutland wrote:

> > @@ -10499,6 +10523,9 @@ SYSCALL_DEFINE5(perf_event_open,

> >  		goto err_cred;

> >  	}

> >  

> > +	if (in_compat_syscall())

> > +		event->event_caps |= PERF_EV_CAP_COMPAT;

> > +

> 

> After a native perf_event_open, you could pass the fd (or exec) to

> another task that was compat (or vice-versa), so this wouldn't work in

> that case (crazy as it may be).

> 

> I don't have a better suggestion at present, though.


As discussed on IRC, we could trigger off of the buffer size; if the
buffer is <4G the &= UINT_MAX is harmless, if the buffer is larger, you
have to be using a 64bit thingy anyway.

Flipping the overflow functions around on attach/detach to buffers is a
little more dodgy, but could be done I suppose.
Will Deacon May 23, 2018, 4:42 p.m. | #10
On Thu, May 10, 2018 at 02:06:32PM +0100, Mark Rutland wrote:
> Userspace can read/write the user page at any point in time, and in

> perf_output_put_handle() we're very careful to use memory barriers to

> ensure ordering between updates to data and the user page.

> 

> We don't use barriers when updating aux_head, where similar ordering

> constraints apply. This could result in userspace seeing stale data, or

> data being overwritten while userspace was still consuming it.

> 

> Further, we update data_head and aux_head with plain assignments, which

> the compiler can tear, potentially resulting in userspace seeing

> erroneous values.

> 

> We can solve both of these problems by using smp_store_release to update

> data_head and aux_head, so let's do so.

> 

> Signed-off-by: Mark Rutland <mark.rutland@arm.com>

> Cc: Ingo Molnar <mingo@redhat.com>

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Will Deacon <will.deacon@arm.com>

> ---

>  kernel/events/ring_buffer.c | 13 ++++++-------

>  1 file changed, 6 insertions(+), 7 deletions(-)

> 

> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c

> index 6c6b3c48db71..839b207e4c77 100644

> --- a/kernel/events/ring_buffer.c

> +++ b/kernel/events/ring_buffer.c

> @@ -63,10 +63,10 @@ static void perf_output_put_handle(struct perf_output_handle *handle)

>  	 *   kernel				user

>  	 *

>  	 *   if (LOAD ->data_tail) {		LOAD ->data_head

> -	 *			(A)		smp_rmb()	(C)

> +	 *				(A) 	smp_rmb()	(C)

>  	 *	STORE $data			LOAD $data

> -	 *	smp_wmb()	(B)		smp_mb()	(D)

> -	 *	STORE ->data_head		STORE ->data_tail

> +	 *					smp_mb()	(D)

> +	 *	RELEASE ->data_head	(B)	STORE ->data_tail

>  	 *   }


One thing to be aware of here is that the choice of ordering primitive (e.g.
using fences vs acquire/release operations) has the potential to create
ABI with userspace. I don't know of any architectures which currently care,
but if were were to merge a non multi-copy atomic architecture with native
acquire/release instructions, you could see issues if e.g. userspace used
smp_rmb(); READ_ONCE but the kernel used a RELEASE store.

Anyway, that's currently theoretical, but I think it's an argument for
putting these accessors in a uapi header.

Will

Patch

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6c6b3c48db71..839b207e4c77 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -63,10 +63,10 @@  static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *   kernel				user
 	 *
 	 *   if (LOAD ->data_tail) {		LOAD ->data_head
-	 *			(A)		smp_rmb()	(C)
+	 *				(A) 	smp_rmb()	(C)
 	 *	STORE $data			LOAD $data
-	 *	smp_wmb()	(B)		smp_mb()	(D)
-	 *	STORE ->data_head		STORE ->data_tail
+	 *					smp_mb()	(D)
+	 *	RELEASE ->data_head	(B)	STORE ->data_tail
 	 *   }
 	 *
 	 * Where A pairs with D, and B pairs with C.
@@ -83,8 +83,7 @@  static void perf_output_put_handle(struct perf_output_handle *handle)
 	 *
 	 * See perf_output_begin().
 	 */
-	smp_wmb(); /* B, matches C */
-	rb->user_page->data_head = head;
+	smp_store_release(&rb->user_page->data_head, head); /* B, matches C */
 
 	/*
 	 * Now check if we missed an update -- rely on previous implied
@@ -464,7 +463,7 @@  void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
 		                     handle->aux_flags);
 	}
 
-	rb->user_page->aux_head = rb->aux_head;
+	smp_store_release(&rb->user_page->aux_head, rb->aux_head);
 	if (rb_need_aux_wakeup(rb))
 		wakeup = true;
 
@@ -496,7 +495,7 @@  int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size)
 
 	rb->aux_head += size;
 
-	rb->user_page->aux_head = rb->aux_head;
+	smp_store_release(&rb->user_page->aux_head, rb->aux_head);
 	if (rb_need_aux_wakeup(rb)) {
 		perf_output_wakeup(handle);
 		handle->wakeup = rb->aux_wakeup + rb->aux_watermark;