locking/qspinlock: Ensure node is initialised before updating prev->next

Message ID 1517401246-2750-1-git-send-email-will.deacon@arm.com
State New
Series locking/qspinlock: Ensure node is initialised before updating prev->next

Commit Message

Will Deacon Jan. 31, 2018, 12:20 p.m. UTC
When a locker ends up queuing on the qspinlock locking slowpath, we
initialise the relevant mcs node and publish it indirectly by updating
the tail portion of the lock word using xchg_tail. If we find that there
was a pre-existing locker in the queue, we subsequently update their
->next field to point at our node so that we are notified when it's our
turn to take the lock.

This can be roughly illustrated as follows:

  /* Initialise the fields in node and encode a pointer to node in tail */
  tail = initialise_node(node);

  /*
   * Exchange tail into the lockword using an atomic read-modify-write
   * operation with release semantics
   */
  old = xchg_tail(lock, tail);

  /* If there was a pre-existing waiter ... */
  if (old & _Q_TAIL_MASK) {
	prev = decode_tail(old);
	smp_read_barrier_depends();

	/* ... then update their ->next field to point to node. */
	WRITE_ONCE(prev->next, node);
  }

The conditional update of prev->next therefore relies on the address
dependency from the result of xchg_tail ensuring order against the
prior initialisation of node. However, since the release semantics of
the xchg_tail operation apply only to the write portion of the RmW,
this ordering is not guaranteed and it is possible for the CPU to
return old before the writes to node have been published, consequently
allowing us to point prev->next at an uninitialised node.

This patch fixes the problem by making the update of prev->next a RELEASE
operation, which also removes the reliance on dependency ordering.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 kernel/locking/qspinlock.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
2.1.4

Comments

Peter Zijlstra Jan. 31, 2018, 12:38 p.m. UTC | #1
On Wed, Jan 31, 2018 at 12:20:46PM +0000, Will Deacon wrote:
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 294294c71ba4..1ebbc366a31d 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -408,16 +408,15 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	 */
>  	if (old & _Q_TAIL_MASK) {
>  		prev = decode_tail(old);
> +
>  		/*
> -		 * The above xchg_tail() is also a load of @lock which generates,
> -		 * through decode_tail(), a pointer.
> -		 *
> -		 * The address dependency matches the RELEASE of xchg_tail()
> -		 * such that the access to @prev must happen after.
> +		 * We must ensure that the stores to @node are observed before
> +		 * the write to prev->next. The address dependency on xchg_tail
> +		 * is not sufficient to ensure this because the read component
> +		 * of xchg_tail is unordered with respect to the initialisation
> +		 * of node.
>  		 */
> -		smp_read_barrier_depends();


Right, except you're patching old code here, please try again on a tree
that includes commit:

  548095dea63f ("locking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath()")

> -
> -		WRITE_ONCE(prev->next, node);
> +		smp_store_release(prev->next, node);
>  
>  		pv_wait_node(node, prev);
>  		arch_mcs_spin_lock_contended(&node->locked);
> -- 
> 2.1.4
> 
Andrea Parri Jan. 31, 2018, 2:11 p.m. UTC | #2
On Wed, Jan 31, 2018 at 01:38:59PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 31, 2018 at 12:20:46PM +0000, Will Deacon wrote:
> > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> > index 294294c71ba4..1ebbc366a31d 100644
> > --- a/kernel/locking/qspinlock.c
> > +++ b/kernel/locking/qspinlock.c
> > @@ -408,16 +408,15 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> >  	 */
> >  	if (old & _Q_TAIL_MASK) {
> >  		prev = decode_tail(old);
> > +
> >  		/*
> > -		 * The above xchg_tail() is also a load of @lock which generates,
> > -		 * through decode_tail(), a pointer.
> > -		 *
> > -		 * The address dependency matches the RELEASE of xchg_tail()
> > -		 * such that the access to @prev must happen after.
> > +		 * We must ensure that the stores to @node are observed before
> > +		 * the write to prev->next. The address dependency on xchg_tail
> > +		 * is not sufficient to ensure this because the read component
> > +		 * of xchg_tail is unordered with respect to the initialisation
> > +		 * of node.
> >  		 */
> > -		smp_read_barrier_depends();
> 
> Right, except you're patching old code here, please try again on a tree
> that includes commit:
> 
>   548095dea63f ("locking: Remove smp_read_barrier_depends() from queued_spin_lock_slowpath()")


BTW, which loads was/is the smp_read_barrier_depends() supposed to order? ;)

I was somehow guessing that this barrier was/is there to "order" the load
from xchg_tail() with the address-dependent loads from pv_wait_node(); is
this true? (Does Will's patch really remove the reliance on the barrier?)

  Andrea


> 
> > -
> > -		WRITE_ONCE(prev->next, node);
> > +		smp_store_release(prev->next, node);
> >  
> >  		pv_wait_node(node, prev);
> >  		arch_mcs_spin_lock_contended(&node->locked);
> > -- 
> > 2.1.4
> >
kernel test robot Feb. 3, 2018, 2:24 a.m. UTC | #3
Hi Will,

I love your patch! Perhaps something to improve:

[auto build test WARNING on v4.15]
[cannot apply to tip/locking/core tip/core/locking tip/auto-latest next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Will-Deacon/locking-qspinlock-Ensure-node-is-initialised-before-updating-prev-next/20180203-095222
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/smp.h:12,
                    from kernel/locking/qspinlock.c:25:
   kernel/locking/qspinlock.c: In function 'queued_spin_lock_slowpath':
   include/linux/compiler.h:264:8: error: conversion to non-scalar type requested
     union { typeof(x) __val; char __c[1]; } __u = \
           ^
>> arch/sparc/include/asm/barrier_64.h:45:2: note: in expansion of macro 'WRITE_ONCE'
     WRITE_ONCE(*p, v);      \
     ^~~~~~~~~~
   include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'
    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel/locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release'
      smp_store_release(prev->next, node);
      ^~~~~~~~~~~~~~~~~
--
   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/smp.h:12,
                    from kernel//locking/qspinlock.c:25:
   kernel//locking/qspinlock.c: In function 'queued_spin_lock_slowpath':
   include/linux/compiler.h:264:8: error: conversion to non-scalar type requested
     union { typeof(x) __val; char __c[1]; } __u = \
           ^
>> arch/sparc/include/asm/barrier_64.h:45:2: note: in expansion of macro 'WRITE_ONCE'
     WRITE_ONCE(*p, v);      \
     ^~~~~~~~~~
   include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'
    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel//locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release'
      smp_store_release(prev->next, node);
      ^~~~~~~~~~~~~~~~~

vim +/WRITE_ONCE +45 arch/sparc/include/asm/barrier_64.h

d550bbd4 David Howells      2012-03-28  40  
45d9b859 Michael S. Tsirkin 2015-12-27  41  #define __smp_store_release(p, v)						\
47933ad4 Peter Zijlstra     2013-11-06  42  do {									\
47933ad4 Peter Zijlstra     2013-11-06  43  	compiletime_assert_atomic_type(*p);				\
47933ad4 Peter Zijlstra     2013-11-06  44  	barrier();							\
76695af2 Andrey Konovalov   2015-08-02 @45  	WRITE_ONCE(*p, v);						\
47933ad4 Peter Zijlstra     2013-11-06  46  } while (0)
47933ad4 Peter Zijlstra     2013-11-06  47  

:::::: The code at line 45 was first introduced by commit
:::::: 76695af20c015206cffb84b15912be6797d0cca2 locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire()

:::::: TO: Andrey Konovalov <andreyknvl@google.com>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
kernel test robot Feb. 3, 2018, 2:26 a.m. UTC | #4
Hi Will,

I love your patch! Yet something to improve:

[auto build test ERROR on v4.15]
[cannot apply to tip/locking/core tip/core/locking tip/auto-latest next-20180202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Will-Deacon/locking-qspinlock-Ensure-node-is-initialised-before-updating-prev-next/20180203-095222
config: x86_64-randconfig-x017-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/smp.h:12,
                    from kernel/locking/qspinlock.c:25:
   kernel/locking/qspinlock.c: In function 'queued_spin_lock_slowpath':
>> include/linux/compiler.h:264:8: error: conversion to non-scalar type requested
     union { typeof(x) __val; char __c[1]; } __u = \
           ^
>> arch/x86/include/asm/barrier.h:71:2: note: in expansion of macro 'WRITE_ONCE'
     WRITE_ONCE(*p, v);      \
     ^~~~~~~~~~
   include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'
    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
>> kernel/locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release'
      smp_store_release(prev->next, node);
      ^~~~~~~~~~~~~~~~~
--
   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/smp.h:12,
                    from kernel//locking/qspinlock.c:25:
   kernel//locking/qspinlock.c: In function 'queued_spin_lock_slowpath':
>> include/linux/compiler.h:264:8: error: conversion to non-scalar type requested
     union { typeof(x) __val; char __c[1]; } __u = \
           ^
>> arch/x86/include/asm/barrier.h:71:2: note: in expansion of macro 'WRITE_ONCE'
     WRITE_ONCE(*p, v);      \
     ^~~~~~~~~~
   include/asm-generic/barrier.h:157:33: note: in expansion of macro '__smp_store_release'
    #define smp_store_release(p, v) __smp_store_release(p, v)
                                    ^~~~~~~~~~~~~~~~~~~
   kernel//locking/qspinlock.c:419:3: note: in expansion of macro 'smp_store_release'
      smp_store_release(prev->next, node);
      ^~~~~~~~~~~~~~~~~

vim +/WRITE_ONCE +71 arch/x86/include/asm/barrier.h

47933ad4 Peter Zijlstra     2013-11-06  66  
1638fb72 Michael S. Tsirkin 2015-12-27  67  #define __smp_store_release(p, v)					\
47933ad4 Peter Zijlstra     2013-11-06  68  do {									\
47933ad4 Peter Zijlstra     2013-11-06  69  	compiletime_assert_atomic_type(*p);				\
47933ad4 Peter Zijlstra     2013-11-06  70  	barrier();							\
76695af2 Andrey Konovalov   2015-08-02 @71  	WRITE_ONCE(*p, v);						\
47933ad4 Peter Zijlstra     2013-11-06  72  } while (0)
47933ad4 Peter Zijlstra     2013-11-06  73  

:::::: The code at line 71 was first introduced by commit
:::::: 76695af20c015206cffb84b15912be6797d0cca2 locking, arch: use WRITE_ONCE()/READ_ONCE() in smp_store_release()/smp_load_acquire()

:::::: TO: Andrey Konovalov <andreyknvl@google.com>
:::::: CC: Ingo Molnar <mingo@kernel.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
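
Both build reports trip over the same expansion: as the quoted barrier.h
snippets show, smp_store_release(p, v) boils down to WRITE_ONCE(*p, v), so
its first argument must be the address of the location being stored to.
Passing prev->next makes *p a whole structure, hence "conversion to
non-scalar type requested". A sketch of the presumably intended call (the
added '&' is inferred from the error output, not taken from the patch as
posted):

  		/* store release to the location, so take its address */
  		smp_store_release(&prev->next, node);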
Patch

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 294294c71ba4..1ebbc366a31d 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -408,16 +408,15 @@  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 */
 	if (old & _Q_TAIL_MASK) {
 		prev = decode_tail(old);
+
 		/*
-		 * The above xchg_tail() is also a load of @lock which generates,
-		 * through decode_tail(), a pointer.
-		 *
-		 * The address dependency matches the RELEASE of xchg_tail()
-		 * such that the access to @prev must happen after.
+		 * We must ensure that the stores to @node are observed before
+		 * the write to prev->next. The address dependency on xchg_tail
+		 * is not sufficient to ensure this because the read component
+		 * of xchg_tail is unordered with respect to the initialisation
+		 * of node.
 		 */
-		smp_read_barrier_depends();
-
-		WRITE_ONCE(prev->next, node);
+		smp_store_release(prev->next, node);
 
 		pv_wait_node(node, prev);
 		arch_mcs_spin_lock_contended(&node->locked);