
[RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

Message ID 56A0A4ED.3070308@huawei.com
State New

Commit Message

Ding Tianhong Jan. 21, 2016, 9:29 a.m. UTC
I built a script that creates several processes doing ioctl calls in a loop;
the ioctl ends up calling a kernel function shaped roughly like:
xx_ioctl {
	...
	rtnl_lock();
	function();
	rtnl_unlock();
	...
}
The function may sleep for several ms, but it will not hang. At the same time,
another user service may call ifconfig to change the state of the ethernet
device, and after several hours the hung task watchdog reported the problem
shown in the log below.
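
For reference, a minimal user-space sketch of such a reproducer might look
like the following. It is hypothetical, not the original script: the
interface name and the SIOCGIFFLAGS/SIOCSIFFLAGS ioctls are stand-ins for
the driver-private ioctl (SIOCSIFFLAGS is used because it also takes
rtnl_lock in dev_ioctl()), and it needs root to run:

/*
 * Hypothetical reproducer sketch: several child processes hammer
 * RTNL-serialized ioctls on one interface, competing with ifconfig
 * for rtnl_lock.
 */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int i;

	for (i = 0; i < 8; i++) {
		if (fork() != 0)
			continue;

		int fd = socket(AF_INET, SOCK_DGRAM, 0);
		struct ifreq ifr;

		memset(&ifr, 0, sizeof(ifr));
		strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

		for (;;) {
			ioctl(fd, SIOCGIFFLAGS, &ifr);	/* read current flags */
			ioctl(fd, SIOCSIFFLAGS, &ifr);	/* write them back; takes rtnl_lock */
		}
	}

	pause();	/* parent just waits; kill the process group to stop */
	return 0;
}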

-- 
2.5.0

Comments

Ding Tianhong Jan. 24, 2016, 8:03 a.m. UTC | #1
On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:

>> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:

>>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:

>>>

>>>> There might be other details, but this is the one that stood out.

>>> I think this also does the wrong thing for use_ww_ctx.

>> Something like so?

> 

> I think that should work. My only minor concern is that putting the waiter spinner at the end of the OSQ means it will take longer to get the lock, but that shouldn't be a big issue.

> 

>> ---
>>   kernel/locking/mutex.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c219c40e..070a0ac34aa7 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       struct task_struct *task = current;
>>       struct mutex_waiter waiter;
>>       unsigned long flags;
>> +    bool acquired;
>>       int ret;
>>
>>       preempt_disable();
>> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       lock_contended(&lock->dep_map, ip);
>>
>>       for (;;) {
>> +        acquired = false;
>>           /*
>>            * Lets try to take the lock again - this is needed even if
>>            * we get here for the first time (shortly after failing to
>> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           /* didn't get the lock, go to sleep: */
>>           spin_unlock_mutex(&lock->wait_lock, flags);
>>           schedule_preempt_disabled();
>> +
>> +        if (mutex_is_locked(lock))
>> +            acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>> +
>>           spin_lock_mutex(&lock->wait_lock, flags);
>> +
>> +        if (acquired) {
>> +            atomic_set(&lock->count, -1);
>> +            break;
>> +        }
>>       }
>>       __set_task_state(task, TASK_RUNNING);
>>
>> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           atomic_set(&lock->count, 0);
>>       debug_mutex_free_waiter(&waiter);
>>
>> +    if (acquired)
>> +        goto unlock;
>> +
>>   skip_wait:
>>       /* got the lock - cleanup and rejoice! */
>>       lock_acquired(&lock->dep_map, ip);
>> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           ww_mutex_set_context_slowpath(ww, ww_ctx);
>>       }
>>
>> +unlock:
>>       spin_unlock_mutex(&lock->wait_lock, flags);
>>       preempt_enable();
>>       return 0;

> 

> Cheers,

> Longman

> 


Looks good to me. I will try this solution and report the result, thanks everyone.

Ding

> .

>
Ding Tianhong Jan. 30, 2016, 1:18 a.m. UTC | #2
On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:

> 

>> Looks good to me. I will try this solution and report the result, thanks everyone.

> 

> Did you get a chance to run with this?

> 

> .

> 


I backported this patch to the 3.10 LTS kernel without changing any logic. So far the patch has worked fine for me and nothing needed to be changed,
so I think this patch is fine. Could you formally submit it for the latest kernel? :)

Thanks.
Ding
Ding Tianhong Feb. 3, 2016, 7:10 a.m. UTC | #3
On 2016/2/3 5:19, Davidlohr Bueso wrote:
> On Mon, 01 Feb 2016, Peter Zijlstra wrote:

> 

>> Subject: locking/mutex: Avoid spinner vs waiter starvation

>> From: Peter Zijlstra <peterz@infradead.org>

>> Date: Fri, 22 Jan 2016 12:06:53 +0100

>>

>> Ding Tianhong reported that under his load the optimistic spinners

>> would totally starve a task that ended up on the wait list.

>>

>> Fix this by ensuring the top waiter also partakes in the optimistic

>> spin queue.

>>

>> There are a few subtle differences between the assumed state of

>> regular optimistic spinners and those already on the wait list, which

>> result in the @acquired complication of the acquire path.

>>

>> Most notable are:

>>

>> - waiters are on the wait list and need to be taken off

>> - mutex_optimistic_spin() sets the lock->count to 0 on acquire

>>   even though there might be more tasks on the wait list.

> 

> Right, the main impact I see with these complications is that the window
> between a waiter taking the lock via spinning and acquiring the wait_lock
> to remove itself from the list will allow an unlocking thread to set the
> lock as available in the fastpath, which could in turn allow a third
> thread to steal the lock. With high contention, this window will obviously
> become larger as we contend for the wait_lock.

> 

> CPU-0                                  CPU-1            CPU-3
> __mutex_lock_common              mutex_optimistic_spin
>   (->count now 0)
>             __mutex_fastpath_unlock
>             (->count now 1)                 __mutex_fastpath_lock
>                                       (stolen)
>
> spin_lock_mutex(&lock->wait_lock, flags);

> 

> But we've always been bad when it comes to counter and waiters.

> 


Agreed, but this patch is going to help the waiter on the wait list get the lock; the probability of your scenario looks
very low to me, and I don't think it is a problem.

Thanks
Ding


> Thanks,

> Davidlohr

> 

> .

>
Ding Tianhong Feb. 4, 2016, 1:20 a.m. UTC | #4
On 2016/2/4 3:24, Davidlohr Bueso wrote:
> On Wed, 03 Feb 2016, Ding Tianhong wrote:

> 

>> Agreed, but this patch is going to help the waiter on the wait list get the lock; the probability of your scenario looks
>> very low to me, and I don't think it is a problem.

> 

> Sure, I was in fact implying it's not the end of the world, although it
> will be interesting to see the impact on different (non-pathological)
> workloads, even if it only affects a single waiter. Also, technically this
> issue can also affect rwsems if only using writers, but that's obviously
> pretty idiotic, so I wouldn't worry about it.

> 

> Thanks,

> Davidlohr

> 


Hi Davidlohr, Peter:

Following Davidlohr's suggestion, I used several VMs to test Peter's patch, and sadly one VM
still hit the hung task for this problem, so I think we still need to think more about this solution.

Thanks.
Ding

> .

>

Patch

========================================================================
[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I got the vmcore and found that ifconfig had already been on the wait_list of
the rtnl_lock for 120 seconds, while my process could take and release the
rtnl_lock normally several times per second. This means my process kept
jumping the queue and ifconfig could never get the rtnl_lock. I checked the
mutex slow path and found that a mutex may spin on the owner regardless of
whether the wait list is empty, which lets spinners keep cutting in line
ahead of the tasks on the wait list. So add a test for the wait list in
mutex_can_spin_on_owner() to avoid this problem.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@  static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@  static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to