[STABLE,v4.9,09/10] locking/qspinlock/x86: Increase _Q_PENDING_LOOPS upper bound

Sebastian Andrzej Siewior Dec. 18, 2018, 10:10 p.m.
From: Will Deacon <will.deacon@arm.com>

commit b247be3fe89b6aba928bf80f4453d1c4ba8d2063 upstream.

On x86, atomic_cond_read_relaxed will busy-wait with a cpu_relax() loop,
so it is desirable to increase the number of times we spin on the qspinlock
lockword when it is found to be transitioning from pending to locked.

According to Waiman Long:

 | Ideally, the spinning times should be at least a few times the typical
 | cacheline load time from memory which I think can be down to 100ns or
 | so for each cacheline load with the newest systems or up to several
 | hundreds ns for older systems.

which in his benchmarking corresponded to 512 iterations.
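
For illustration only, the bounded wait described above can be sketched in
user-space C11. This is not the kernel code: Q_PENDING_VAL, relax() and
wait_pending_bounded() are hypothetical names chosen for this sketch, and
only the (1 << 9) bound is taken from this patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a lock word with only the pending bit set. */
#define Q_PENDING_VAL   (1U << 8)
/* Same bound as the patch sets for x86: 1 << 9 == 512 iterations. */
#define Q_PENDING_LOOPS (1 << 9)

/* Hypothetical stand-in for cpu_relax(); a real one would emit a pause hint. */
static inline void relax(void)
{
}

/*
 * Re-read the lock word while it still shows the pending-to-locked
 * hand-over, but give up after Q_PENDING_LOOPS reads so that the caller
 * can fall back to another path instead of spinning forever.
 * Returns the last value observed; *spins receives the number of re-reads.
 */
static uint32_t wait_pending_bounded(_Atomic uint32_t *lockword, int *spins)
{
	uint32_t val = atomic_load_explicit(lockword, memory_order_relaxed);
	int n = 0;

	while (val == Q_PENDING_VAL && n < Q_PENDING_LOOPS) {
		relax();
		val = atomic_load_explicit(lockword, memory_order_relaxed);
		n++;
	}

	*spins = n;
	return val;
}
```

If the word never leaves the pending state, the sketch stops after exactly
512 re-reads; if it is not pending to begin with, it does not spin at all.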

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index e07cc206919d4..8b1ba1607091c 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -5,6 +5,8 @@ 
 #include <asm-generic/qspinlock_types.h>
 #include <asm/paravirt.h>
 
+#define _Q_PENDING_LOOPS	(1 << 9)
+
 #define	queued_spin_unlock queued_spin_unlock
 /**
  * queued_spin_unlock - release a queued spinlock