From patchwork Thu Feb 11 09:26:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lee Jones X-Patchwork-Id: 380895 Delivered-To: patch@linaro.org Received: by 2002:a02:b18a:0:0:0:0:0 with SMTP id t10csp1909677jah; Thu, 11 Feb 2021 01:31:56 -0800 (PST) X-Google-Smtp-Source: ABdhPJxTMOZ1Nhi60iFTd6eGQcfVQeHy6QjYwvmT9/FG14PbfzMuWsOeKPcWgMlgXX+/Db5V57Lb X-Received: by 2002:a05:6402:4389:: with SMTP id o9mr7516134edc.164.1613035916182; Thu, 11 Feb 2021 01:31:56 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613035916; cv=none; d=google.com; s=arc-20160816; b=witFTm3OR7v0QEVUsU9eP+lzhEWvj30+FtnBaCXCjBAnfTkUhTOwL+fd/Bjo8pDPix 0ziRg1dbvTJXfv1wfVQxto+ghvowEpQwDDKmWCBVN66fSGvhb9PO3rZF6IW5pZKunCu3 Hbcab63knXLxoJeaVqPHImW/a4NSsYn6xTiALWvvShVaI6/CpulcaPDnU3HZPaXI+jXe KmoUvcXtpwtqi6+sl36B67VFAJbnMqyDFqv4FLmyTxgDFI6p/mjQcI2HzS1FSRD5DTQz 4a1BDi56q6YhwZziGvsugT86i2FyPH3woBuRB4QH0y0Q7UdqMgXrtF2nMfSXG8WrE3pk YrWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=zT+iHjphKsz597MUGj9bT/k1CO7ELZX3vL6R9H/hLig=; b=KCL3o/2mcqQGcouOngiCC87pp08qO3QWK6YyXHO4Mnk9bOelqz12wHvkdntKTQCim9 aE1SNDUPAHhKByaiX4apC1/GZqUgTe1sRbrVcvZ7NaQhf+/IplFszqrpaUfJKnS/Aese SZjZF4F1USM7XLV2YYLNrBc23q9gb8r1wOK0BJu4uqSqNl+9Z7w//22LqV5eZgTJlJ6t HikDVfgiSA8ADBNSrbyeq5T0pqIvuzKLBkGKE5zVT0WJY1DFfNtn6uYVhLkeumbuTOaF +TKuZf2JvlM6PZLpQDWLwnh9ZMcPmIXL9M24h5y18+aHHGQHL6LSR4OJEpZGajqY710L sW9w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=oKugZmpm; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t20si3394112eju.239.2021.02.11.01.31.56; Thu, 11 Feb 2021 01:31:56 -0800 (PST) Received-SPF: pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=oKugZmpm; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230011AbhBKJaS (ORCPT + 13 others); Thu, 11 Feb 2021 04:30:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60218 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230078AbhBKJ2H (ORCPT ); Thu, 11 Feb 2021 04:28:07 -0500 Received: from mail-wm1-x32b.google.com (mail-wm1-x32b.google.com [IPv6:2a00:1450:4864:20::32b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 338FBC06178A for ; Thu, 11 Feb 2021 01:27:27 -0800 (PST) Received: by mail-wm1-x32b.google.com with SMTP id o24so4923979wmh.5 for ; Thu, 11 Feb 2021 01:27:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zT+iHjphKsz597MUGj9bT/k1CO7ELZX3vL6R9H/hLig=; b=oKugZmpmDl9cZwSHBc8RVeKi3ctio/G8zbXcQ4lticc87CjidNgPGN8QvGRNCcvAh1 D3p+kTfKYjbcN0W8ps1pAHIWVUikHI3/WLZbUUBfjGl+AwGNjQT6LqlXhPCEHpP8O3lK JeMdNrClt9lKbTmvbgBNhb4wgb7XzjAd8Y4aG+qhhjj6KNZjpcCRbfBArGM1IUW2m40y hFYYWYgRi0gYf2KgZST7rDVeefHr4Atc0JAZIghXjlCg3cl+phSekj9A6UTwihiA3lb1 1cJFVn7829RPzDT4ESBcJJQBCUxsy32dGJ5/5hI/1vW3zqs3uJPKwj+x8QmYkMdudhKV sEWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zT+iHjphKsz597MUGj9bT/k1CO7ELZX3vL6R9H/hLig=; b=RNlgakDFvpvM93FCP1GKTf8XUtIPry9t7jB7NUJIfJgdSl6zSgC1g7LhpuMhDjH0ND tyFg1D6YI9Vtqjsqx6ZB1FQKsCwoflhIgLVXP8BgRgFsZ3YOMgXgWl6SNS7YRqhPxgFy 7PYs8qbZnJsRKyMLA1aG6cigvKPBZVXE20qOKHxbfuwN3EzhIccVW/wSvxUEM90VH2ud jtR700YEpcXvbSQ6EEfQMmU+4Jl8Vu6XCS0NO9GJx8PfvDt1KKO3hN9vHWF/fAupq+ZM aEyTI95CDt87mEUFtGk8OAJbyqlWdoXaPMB+yvhXSgprnGXtndfsNlGTLHmLbZ3u/rNj VaFg== X-Gm-Message-State: AOAM532wmJ4gspctqiMpkIgAScrzefDIZbwt2NdbynMslPp/ADwBNvVB jKnZJpQLcpnFkE/CmC9sdeDW3/cbyyOwXA== X-Received: by 2002:a05:600c:2056:: with SMTP id p22mr4279066wmg.12.1613035645562; Thu, 11 Feb 2021 01:27:25 -0800 (PST) Received: from dell.default ([91.110.221.159]) by smtp.gmail.com with ESMTPSA id y5sm8335124wmi.10.2021.02.11.01.27.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Feb 2021 01:27:14 -0800 (PST) From: Lee Jones To: stable@vger.kernel.org Cc: zhengyejian@foxmail.com, Thomas Gleixner , Peter Zijlstra , Greg Kroah-Hartman , Lee Jones Subject: [PATCH 1/3] futex: Ensure the correct return value from futex_lock_pi() Date: Thu, 11 Feb 2021 09:26:58 +0000 Message-Id: <20210211092700.11772-2-lee.jones@linaro.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210211092700.11772-1-lee.jones@linaro.org> References: <20210211092700.11772-1-lee.jones@linaro.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Thomas Gleixner commit 12bb3f7f1b03d5913b3f9d4236a488aa7774dfe9 upstream In case that futex_lock_pi() was aborted by a signal or a timeout and the task returned without acquiring the rtmutex, but is the designated owner of the futex due to a concurrent futex_unlock_pi() fixup_owner() is invoked to establish consistent state. In that case it invokes fixup_pi_state_owner() which in turn tries to acquire the rtmutex again. If that succeeds then it does not propagate this success to fixup_owner() and futex_lock_pi() returns -EINTR or -ETIMEOUT despite having the futex locked. Return success from fixup_pi_state_owner() in all cases where the current task owns the rtmutex and therefore the futex and propagate it correctly through fixup_owner(). Fixup the other callsite which does not expect a positive return value. Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex") Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman [Lee: Back-ported in support of a previous futex attempt] Signed-off-by: Lee Jones --- kernel/futex.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) -- 2.25.1 diff --git a/kernel/futex.c b/kernel/futex.c index 83db5787c67ef..a43cf67c2fe91 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2322,7 +2322,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, } if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) { - /* We got the lock after all, nothing to fix. */ + /* We got the lock. pi_state is correct. Tell caller. */ return 1; } @@ -2364,7 +2364,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, */ pi_state_update_owner(pi_state, newowner); - return 0; + return argowner == current; /* * To handle the page fault we need to drop the hash bucket @@ -2447,8 +2447,6 @@ static long futex_wait_restart(struct restart_block *restart); */ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) { - int ret = 0; - if (locked) { /* * Got the lock. We might not be the anticipated owner if we @@ -2459,8 +2457,8 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) * stable state, anything else needs more attention. */ if (q->pi_state->owner != current) - ret = fixup_pi_state_owner(uaddr, q, current); - goto out; + return fixup_pi_state_owner(uaddr, q, current); + return 1; } /* @@ -2471,10 +2469,8 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) * Another speculative read; pi_state->owner == current is unstable * but needs our attention. */ - if (q->pi_state->owner == current) { - ret = fixup_pi_state_owner(uaddr, q, NULL); - goto out; - } + if (q->pi_state->owner == current) + return fixup_pi_state_owner(uaddr, q, NULL); /* * Paranoia check. If we did not take the lock, then we should not be @@ -2483,8 +2479,7 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) if (WARN_ON_ONCE(rt_mutex_owner(&q->pi_state->pi_mutex) == current)) return fixup_pi_state_owner(uaddr, q, current); -out: - return ret ? ret : locked; + return 0; } /** @@ -3106,6 +3101,11 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags, */ put_pi_state(q.pi_state); spin_unlock(q.lock_ptr); + /* + * Adjust the return value. It's either -EFAULT or + * success (1) but the caller expects 0 for success. + */ + ret = ret < 0 ? ret : 0; } } else { struct rt_mutex *pi_mutex; From patchwork Thu Feb 11 09:26:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lee Jones X-Patchwork-Id: 380896 Delivered-To: patch@linaro.org Received: by 2002:a02:b18a:0:0:0:0:0 with SMTP id t10csp1909687jah; Thu, 11 Feb 2021 01:31:57 -0800 (PST) X-Google-Smtp-Source: ABdhPJx3TVfVzJI0Smkxrx6kh1ZUBB1I3e6WkJAiPNslNvbMTPTbFND6MdFyzTzF/X3VPvV7RUBG X-Received: by 2002:a50:bf47:: with SMTP id g7mr7330844edk.323.1613035917137; Thu, 11 Feb 2021 01:31:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613035917; cv=none; d=google.com; s=arc-20160816; b=OYGma3p1+R33ZzSZDicrxToUBlNs306fixj6zCfp9VbL9OrZdm1jViuxjJsJcV1sZG WoCPCDoF/4dbRgtI3Vm1iCmbwFry36av1OPz596yUjJKw5oP6oR5d6kiPnqXTqIB/F5O 0xs1420SvVMAsiAlGCukGt+HPW6ydjbhYu0wwxZoNmPINSVR/yhDae194bXafy8nKSUI UWEywfNy24i4gnoM2K6nPLFuBJfrgtjaUgA1ekgZXECfOk6JQBMRJQmr2f2m3wbqC4Ws ch9bfEgj2kDlk/QyVFrBlWyzxoFlbY1jX9teXb2IVUI6F8D9+CBnn18roXoUQqcBs92E l8BQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=CxJjMXkMj/WyIEge7vE6JXU1t1YR/vT3MHOJ75WJKbg=; b=zL+K8kdOtCf3xFTVazwEiGY9wyoFHOf5w/Ofjp70rfblvFNZPXp8QC4NsUOzeNnCB0 fGvcgbODTUM/onJmZm2DUF7utiRsVxBDVvef9ehDSg1HBjBFIvZvauTTXOuuwTkPnwzA NgmzSzpqqya/6EAL8XP3t5TXNF07HuRNsoGhvbbWTM5EVQkfHe84hpFSdZufy3YjXchm tJMC0sN9Ka1SCSyOakO1Q/Dd+GdH5+pUySbvxbAlCGYuK0lYfxQu72hFkqgEHFCW2vsd h29EujBfUCIt8b2gamTlBIeDlrv2DfHb3IKnriFiYw8bIuG99vJ9WBKMHphbxbUS4AyK vQfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=MP5Ab8cg; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t20si3394112eju.239.2021.02.11.01.31.56; Thu, 11 Feb 2021 01:31:57 -0800 (PST) Received-SPF: pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=MP5Ab8cg; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230063AbhBKJaZ (ORCPT + 13 others); Thu, 11 Feb 2021 04:30:25 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60222 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230080AbhBKJ2O (ORCPT ); Thu, 11 Feb 2021 04:28:14 -0500 Received: from mail-wm1-x330.google.com (mail-wm1-x330.google.com [IPv6:2a00:1450:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9492C06178B for ; Thu, 11 Feb 2021 01:27:28 -0800 (PST) Received: by mail-wm1-x330.google.com with SMTP id w4so4775582wmi.4 for ; Thu, 11 Feb 2021 01:27:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=CxJjMXkMj/WyIEge7vE6JXU1t1YR/vT3MHOJ75WJKbg=; b=MP5Ab8cgakCP4eYLpA3NexSZkhYH29ZLk5DAd1LojwAt8he8j7sWlmhbtLRrJR3tjR J7Zp3E65m52drtm9s/if8lfD2KMahbAkr4+Czhnz2zcuYdSqFQ524hHI4bm5FoA+5KeN 0Y9yutoN9qQX9Cmvc5+DsCai5MEV8AuNTb0P8zW/cfS35oN/47gGQwLe3cZ2m6JjCH0Y Jgsx1cqfnn8J4PKPWC2yBIo94TnhXBK/lvYQmc2W3oqItex/xV19aYM9QMAGhVyo//UR SmK7a/1W3wDSJe7vDIIW55olJ63lzBCVj8ep1UUwApxZng5Htn2fxRtCEkWNizLaW4jx aFtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=CxJjMXkMj/WyIEge7vE6JXU1t1YR/vT3MHOJ75WJKbg=; b=Uy9usiQ7g+PT1nHnX4dar6RI8SYWBS8g1X8MJtWLLb1zQW5gaQf+qhd8kpU6yjyboW tAcUr+vEU8+9L9biHow2WnXwJInxsueu97YYFjumFF/icsSuOFbUQuINkNBD5UOdkL9o DA2ZZQ3ZczA31jh6OfAoy3djpC2qyhog2VKXOjZcqhcoZet8Zgbg9rASA9/N8+/yEyav K6BRYQPfMJQdPzCWbGv48EWuhf+m6EtaOrGmc4MVyeOEsv1X61HeRgoP5X8HgPZelg+0 UNx4OfED1O17rQT7a7stPTovj2swZzWCSMlgG2Sy21spxH9+dwSL96e7sg+o1veY++Qs cRaw== X-Gm-Message-State: AOAM531uqHVnnST28UtQ7x4W6WaBLJCCvfDhIyEyZi0Spu8+epH++UMT 1OUicHpGl6WpirA9jVCSzL2T6Ls+o5FKhw== X-Received: by 2002:a7b:c193:: with SMTP id y19mr4132390wmi.23.1613035647285; Thu, 11 Feb 2021 01:27:27 -0800 (PST) Received: from dell.default ([91.110.221.159]) by smtp.gmail.com with ESMTPSA id y5sm8335124wmi.10.2021.02.11.01.27.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Feb 2021 01:27:26 -0800 (PST) From: Lee Jones To: stable@vger.kernel.org Cc: zhengyejian@foxmail.com, Peter Zijlstra , juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jdesfossez@efficios.com, dvhart@infradead.org, bristot@redhat.com, Thomas Gleixner , Lee Jones Subject: [PATCH 2/3] futex: Change locking rules Date: Thu, 11 Feb 2021 09:26:59 +0000 Message-Id: <20210211092700.11772-3-lee.jones@linaro.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210211092700.11772-1-lee.jones@linaro.org> References: <20210211092700.11772-1-lee.jones@linaro.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Peter Zijlstra Currently futex-pi relies on hb->lock to serialize everything. But hb->lock creates another set of problems, especially priority inversions on RT where hb->lock becomes a rt_mutex itself. The rt_mutex::wait_lock is the most obvious protection for keeping the futex user space value and the kernel internal pi_state in sync. Rework and document the locking so rt_mutex::wait_lock is held accross all operations which modify the user space value and the pi state. This allows to invoke rt_mutex_unlock() (including deboost) without holding hb->lock as a next step. Nothing yet relies on the new locking rules. Signed-off-by: Peter Zijlstra (Intel) Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170322104151.751993333@infradead.org Signed-off-by: Thomas Gleixner [Lee: Back-ported in support of a previous futex back-port attempt] Signed-off-by: Lee Jones --- kernel/futex.c | 138 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 112 insertions(+), 26 deletions(-) -- 2.25.1 diff --git a/kernel/futex.c b/kernel/futex.c index a43cf67c2fe91..829e897c8883b 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1019,6 +1019,39 @@ static void exit_pi_state_list(struct task_struct *curr) * [10] There is no transient state which leaves owner and user space * TID out of sync. Except one error case where the kernel is denied * write access to the user address, see fixup_pi_state_owner(). + * + * + * Serialization and lifetime rules: + * + * hb->lock: + * + * hb -> futex_q, relation + * futex_q -> pi_state, relation + * + * (cannot be raw because hb can contain arbitrary amount + * of futex_q's) + * + * pi_mutex->wait_lock: + * + * {uval, pi_state} + * + * (and pi_mutex 'obviously') + * + * p->pi_lock: + * + * p->pi_state_list -> pi_state->list, relation + * + * pi_state->refcount: + * + * pi_state lifetime + * + * + * Lock order: + * + * hb->lock + * pi_mutex->wait_lock + * p->pi_lock + * */ /* @@ -1026,10 +1059,12 @@ static void exit_pi_state_list(struct task_struct *curr) * the pi_state against the user space value. If correct, attach to * it. */ -static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state, +static int attach_to_pi_state(u32 __user *uaddr, u32 uval, + struct futex_pi_state *pi_state, struct futex_pi_state **ps) { pid_t pid = uval & FUTEX_TID_MASK; + int ret, uval2; /* * Userspace might have messed up non-PI and PI futexes [3] @@ -1037,8 +1072,33 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state, if (unlikely(!pi_state)) return -EINVAL; + /* + * We get here with hb->lock held, and having found a + * futex_top_waiter(). This means that futex_lock_pi() of said futex_q + * has dropped the hb->lock in between queue_me() and unqueue_me_pi(), + * which in turn means that futex_lock_pi() still has a reference on + * our pi_state. + */ WARN_ON(!atomic_read(&pi_state->refcount)); + /* + * Now that we have a pi_state, we can acquire wait_lock + * and do the state validation. + */ + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + + /* + * Since {uval, pi_state} is serialized by wait_lock, and our current + * uval was read without holding it, it can have changed. Verify it + * still is what we expect it to be, otherwise retry the entire + * operation. + */ + if (get_futex_value_locked(&uval2, uaddr)) + goto out_efault; + + if (uval != uval2) + goto out_eagain; + /* * Handle the owner died case: */ @@ -1054,11 +1114,11 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state, * is not 0. Inconsistent state. [5] */ if (pid) - return -EINVAL; + goto out_einval; /* * Take a ref on the state and return success. [4] */ - goto out_state; + goto out_attach; } /* @@ -1070,14 +1130,14 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state, * Take a ref on the state and return success. [6] */ if (!pid) - goto out_state; + goto out_attach; } else { /* * If the owner died bit is not set, then the pi_state * must have an owner. [7] */ if (!pi_state->owner) - return -EINVAL; + goto out_einval; } /* @@ -1086,11 +1146,29 @@ static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state, * user space TID. [9/10] */ if (pid != task_pid_vnr(pi_state->owner)) - return -EINVAL; -out_state: + goto out_einval; + +out_attach: atomic_inc(&pi_state->refcount); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); *ps = pi_state; return 0; + +out_einval: + ret = -EINVAL; + goto out_error; + +out_eagain: + ret = -EAGAIN; + goto out_error; + +out_efault: + ret = -EFAULT; + goto out_error; + +out_error: + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); + return ret; } /** @@ -1183,6 +1261,9 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, /* * No existing pi state. First waiter. [2] + * + * This creates pi_state, we have hb->lock held, this means nothing can + * observe this state, wait_lock is irrelevant. */ pi_state = alloc_pi_state(); @@ -1207,7 +1288,8 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, return 0; } -static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, +static int lookup_pi_state(u32 __user *uaddr, u32 uval, + struct futex_hash_bucket *hb, union futex_key *key, struct futex_pi_state **ps, struct task_struct **exiting) { @@ -1218,7 +1300,7 @@ static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb, * attach to the pi_state when the validation succeeds. */ if (match) - return attach_to_pi_state(uval, match->pi_state, ps); + return attach_to_pi_state(uaddr, uval, match->pi_state, ps); /* * We are the first waiter - try to look up the owner based on @@ -1237,7 +1319,7 @@ static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval) if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))) return -EFAULT; - /*If user space value changed, let the caller retry */ + /* If user space value changed, let the caller retry */ return curval != uval ? -EAGAIN : 0; } @@ -1301,7 +1383,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb, */ match = futex_top_waiter(hb, key); if (match) - return attach_to_pi_state(uval, match->pi_state, ps); + return attach_to_pi_state(uaddr, uval, match->pi_state, ps); /* * No waiter and user TID is 0. We are here because the @@ -1441,6 +1523,7 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) { ret = -EFAULT; + } else if (curval != uval) { /* * If a unconditional UNLOCK_PI operation (user space did not @@ -1977,7 +2060,7 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, * If that call succeeds then we have pi_state and an * initial refcount on it. */ - ret = lookup_pi_state(ret, hb2, &key2, + ret = lookup_pi_state(uaddr2, ret, hb2, &key2, &pi_state, &exiting); } @@ -2282,7 +2365,6 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, int err = 0; oldowner = pi_state->owner; - /* Owner died? */ if (!pi_state->owner) newtid |= FUTEX_OWNER_DIED; @@ -2305,11 +2387,10 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, * because we can fault here. Imagine swapped out pages or a fork * that marked all the anonymous memory readonly for cow. * - * Modifying pi_state _before_ the user space value would - * leave the pi_state in an inconsistent state when we fault - * here, because we need to drop the hash bucket lock to - * handle the fault. This might be observed in the PID check - * in lookup_pi_state. + * Modifying pi_state _before_ the user space value would leave the + * pi_state in an inconsistent state when we fault here, because we + * need to drop the locks to handle the fault. This might be observed + * in the PID check in lookup_pi_state. */ retry: if (!argowner) { @@ -2367,21 +2448,26 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, return argowner == current; /* - * To handle the page fault we need to drop the hash bucket - * lock here. That gives the other task (either the highest priority - * waiter itself or the task which stole the rtmutex) the - * chance to try the fixup of the pi_state. So once we are - * back from handling the fault we need to check the pi_state - * after reacquiring the hash bucket lock and before trying to - * do another fixup. When the fixup has been done already we - * simply return. + * To handle the page fault we need to drop the locks here. That gives + * the other task (either the highest priority waiter itself or the + * task which stole the rtmutex) the chance to try the fixup of the + * pi_state. So once we are back from handling the fault we need to + * check the pi_state after reacquiring the locks and before trying to + * do another fixup. When the fixup has been done already we simply + * return. + * + * Note: we hold both hb->lock and pi_mutex->wait_lock. We can safely + * drop hb->lock since the caller owns the hb -> futex_q relation. + * Dropping the pi_mutex->wait_lock requires the state revalidate. */ handle_fault: + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(q->lock_ptr); err = fault_in_user_writeable(uaddr); spin_lock(q->lock_ptr); + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); /* * Check if someone else fixed it for us: From patchwork Thu Feb 11 09:27:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lee Jones X-Patchwork-Id: 380897 Delivered-To: patch@linaro.org Received: by 2002:a02:b18a:0:0:0:0:0 with SMTP id t10csp1909805jah; Thu, 11 Feb 2021 01:32:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJwNW+YiITnM1vYdtDr0GjmARvYPBY/GIujUPSqteYK1zlyXHO4Y4XOX+yDwtZvSRZkGgh2d X-Received: by 2002:a17:906:388a:: with SMTP id q10mr7455253ejd.496.1613035927300; Thu, 11 Feb 2021 01:32:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1613035927; cv=none; d=google.com; s=arc-20160816; b=pBGbl+AmwU+DtbzZ2kYbC7g0cDFlTFGWQWE3Oh8xsRco8BBF+ZEnKmC6g3JIXxRja5 LMS8ATA+pOnmXI0/3W2c2NnrgFmgKwNlGI+X3VEQbmCktfZPC5LPF46WvJIYC1tYKSn8 sn1nFfptqk9RLHyR/cYO5PLnydrqYW0SHwl3u2tCXugVi3Gj0TkDcJUu6wbdhJHgKuOG ovXGKZOP6j2hCYzPbQVTg7XehDr5UZ05hMjZJrWkPW42Vf2bDxWaTDZgTGN4aFDTxwnf OFw/dq8AJtvhsjPqIGKcIriLOMe7Ok1OUN+izmyY6Fr0/NKEkklr90pTj43Ot1aFbVBo 6MkA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=mfHbazlOCX/UlXyuDmE+5dGDxJaOZ2y4lG5YFAFcBjU=; b=oPSRp3lnx4Q/a081YnhpsAcYJWSFgvpVS5a5ozhfEEnlda/xGYXZDaDetfnaXRTxQJ wZEh8hjY6oCJkcM2kGTxF2Ufk2X8xG2PV3eMuWQxSYTyBRhN7l+0lh4ygQNikKCfW2Hf ErmeKL5Ra6AWb9pE8nuKaOMYr0gM3oyEvgvpCe7JBO8j0RB/+vSlaUyZULUBX7OCrJ82 xDTnB8deW+lPZxeMZE0nqNJw+3GKrGTAK40//ji6EQ4ha0JDF9vtX0wg3nwvzlGEi8ZK lxVBy8n9PhICvrxITfRzKeFYpBrTWBABI8muoCvw1LxCtAdBnQ90qW4FM9ZLyH0OBJcp MSuA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=y8mTOVEk; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id t20si3394112eju.239.2021.02.11.01.32.03; Thu, 11 Feb 2021 01:32:07 -0800 (PST) Received-SPF: pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=y8mTOVEk; spf=pass (google.com: domain of stable-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=stable-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230118AbhBKJbo (ORCPT + 13 others); Thu, 11 Feb 2021 04:31:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60396 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229892AbhBKJ3A (ORCPT ); Thu, 11 Feb 2021 04:29:00 -0500 Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EDF6EC06178C for ; Thu, 11 Feb 2021 01:27:43 -0800 (PST) Received: by mail-wm1-x329.google.com with SMTP id m1so4930436wml.2 for ; Thu, 11 Feb 2021 01:27:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=mfHbazlOCX/UlXyuDmE+5dGDxJaOZ2y4lG5YFAFcBjU=; b=y8mTOVEkO/To5S7VtQcSqAXJCP374GI9JpGeBTUHMva/ovyzkwuK1s3bE9lNqkdSO5 S5gbnCbsuQftPnedu528OJ/9MtkV+c3OVSlTP7Jv16Mv4TeZJ7RRieB0RhY01vV5Pm+z 7BhW9Nz/yeIC+Lyj8crI04EsugQljn2sAaymCGqwFSjyhNORRrdXmx2NslpDW2t5lc1i MmmMAv9PhZLyGp6Pbo9xG06Oc9/9moaYiFRh02gH9mBx4tZZTrJQ75VdTo+OzjdkctK8 xSto6sy8u+ZLdHV508c9pViOAg5lDLz3h1WmV/SYq9HOW0YWY+58yAUlQsc6xqm77LT2 kvoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=mfHbazlOCX/UlXyuDmE+5dGDxJaOZ2y4lG5YFAFcBjU=; b=PfSKfHWVf9BfLYEIw+MfbXoNe9GEvffYLwQVMzMSjWddZcZRafHZOXsEpeOIbCC65D bYFw6y1cmM2Wzv91OlC9JOBSqChM8GJMdov5rjdoVIGo/J/ijzpxjlK31lScXEj4DA/g UK0C18TtthJ5G59JGfNTEKUvYifd4gO5hvafqsw5yGXec+h7bW7qFMQ4Qj2B4/dS8Y9U 0lzfCCb6tRGKp49aPFTpg7oIjjyl4olMTViYrHOCNgNkvCgJbUgG2sftZtz1nO5ptjMz nuzNX88Bl5Q/MpefCpviZwq772xN3mUys4iFqDFWscc+Qr0nh5dFAJnNZT9QWryOsFVB J5Vw== X-Gm-Message-State: AOAM530bACwZgxOwrZfLYyTbvkaeOnH4jeg2GVwjOsHD4ooupKxR+OGo GMfqmzte3WxTj4X3fpbLa4wpqDOe6tmutg== X-Received: by 2002:a7b:c942:: with SMTP id i2mr4300829wml.187.1613035662379; Thu, 11 Feb 2021 01:27:42 -0800 (PST) Received: from dell.default ([91.110.221.159]) by smtp.gmail.com with ESMTPSA id y5sm8335124wmi.10.2021.02.11.01.27.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Feb 2021 01:27:31 -0800 (PST) From: Lee Jones To: stable@vger.kernel.org Cc: zhengyejian@foxmail.com, Thomas Gleixner , Stefan Liebler , Peter Zijlstra , Heiko Carstens , Darren Hart , Ingo Molnar , Sasha Levin , Sudip Mukherjee , Greg Kroah-Hartman , Lee Jones Subject: [PATCH 3/3] futex: Cure exit race Date: Thu, 11 Feb 2021 09:27:00 +0000 Message-Id: <20210211092700.11772-4-lee.jones@linaro.org> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210211092700.11772-1-lee.jones@linaro.org> References: <20210211092700.11772-1-lee.jones@linaro.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Thomas Gleixner commit da791a667536bf8322042e38ca85d55a78d3c273 upstream. Stefan reported, that the glibc tst-robustpi4 test case fails occasionally. That case creates the following race between sys_exit() and sys_futex_lock_pi(): CPU0 CPU1 sys_exit() sys_futex() do_exit() futex_lock_pi() exit_signals(tsk) No waiters: tsk->flags |= PF_EXITING; *uaddr == 0x00000PID mm_release(tsk) Set waiter bit exit_robust_list(tsk) { *uaddr = 0x80000PID; Set owner died attach_to_pi_owner() { *uaddr = 0xC0000000; tsk = get_task(PID); } if (!tsk->flags & PF_EXITING) { ... attach(); tsk->flags |= PF_EXITPIDONE; } else { if (!(tsk->flags & PF_EXITPIDONE)) return -EAGAIN; return -ESRCH; <--- FAIL } ESRCH is returned all the way to user space, which triggers the glibc test case assert. Returning ESRCH unconditionally is wrong here because the user space value has been changed by the exiting task to 0xC0000000, i.e. the FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This is a valid state and the kernel has to handle it, i.e. taking the futex. Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE is set in the task which 'owns' the futex. If the value has changed, let the kernel retry the operation, which includes all regular sanity checks and correctly handles the FUTEX_OWNER_DIED case. If it hasn't changed, then return ESRCH as there is no way to distinguish this case from malfunctioning user space. This happens when the exiting task did not have a robust list, the robust list was corrupted or the user space value in the futex was simply bogus. Reported-by: Stefan Liebler Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra Cc: Heiko Carstens Cc: Darren Hart Cc: Ingo Molnar Cc: Sasha Levin Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467 Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman [Lee: Required to satisfy functional dependency from futex back-port. Re-add the missing handle_exit_race() parts from: 3d4775df0a89 ("futex: Replace PF_EXITPIDONE with a state")] Signed-off-by: Lee Jones --- kernel/futex.c | 71 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 65 insertions(+), 6 deletions(-) -- 2.25.1 diff --git a/kernel/futex.c b/kernel/futex.c index 829e897c8883b..b65dbb5d60bb1 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1201,11 +1201,67 @@ static void wait_for_owner_exiting(int ret, struct task_struct *exiting) put_task_struct(exiting); } +static int handle_exit_race(u32 __user *uaddr, u32 uval, + struct task_struct *tsk) +{ + u32 uval2; + + /* + * If the futex exit state is not yet FUTEX_STATE_DEAD, wait + * for it to finish. + */ + if (tsk && tsk->futex_state != FUTEX_STATE_DEAD) + return -EAGAIN; + + /* + * Reread the user space value to handle the following situation: + * + * CPU0 CPU1 + * + * sys_exit() sys_futex() + * do_exit() futex_lock_pi() + * futex_lock_pi_atomic() + * exit_signals(tsk) No waiters: + * tsk->flags |= PF_EXITING; *uaddr == 0x00000PID + * mm_release(tsk) Set waiter bit + * exit_robust_list(tsk) { *uaddr = 0x80000PID; + * Set owner died attach_to_pi_owner() { + * *uaddr = 0xC0000000; tsk = get_task(PID); + * } if (!tsk->flags & PF_EXITING) { + * ... attach(); + * tsk->futex_state = } else { + * FUTEX_STATE_DEAD; if (tsk->futex_state != + * FUTEX_STATE_DEAD) + * return -EAGAIN; + * return -ESRCH; <--- FAIL + * } + * + * Returning ESRCH unconditionally is wrong here because the + * user space value has been changed by the exiting task. + * + * The same logic applies to the case where the exiting task is + * already gone. + */ + if (get_futex_value_locked(&uval2, uaddr)) + return -EFAULT; + + /* If the user space value has changed, try again. */ + if (uval2 != uval) + return -EAGAIN; + + /* + * The exiting task did not have a robust list, the robust list was + * corrupted or the user space value in *uaddr is simply bogus. + * Give up and tell user space. + */ + return -ESRCH; +} + /* * Lookup the task for the TID provided from user space and attach to * it after doing proper sanity checks. */ -static int attach_to_pi_owner(u32 uval, union futex_key *key, +static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key, struct futex_pi_state **ps, struct task_struct **exiting) { @@ -1216,12 +1272,15 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, /* * We are the first waiter - try to look up the real owner and attach * the new pi_state to it, but bail out when TID = 0 [1] + * + * The !pid check is paranoid. None of the call sites should end up + * with pid == 0, but better safe than sorry. Let the caller retry */ if (!pid) - return -ESRCH; + return -EAGAIN; p = futex_find_get_task(pid); if (!p) - return -ESRCH; + return handle_exit_race(uaddr, uval, NULL); if (unlikely(p->flags & PF_KTHREAD)) { put_task_struct(p); @@ -1240,7 +1299,7 @@ static int attach_to_pi_owner(u32 uval, union futex_key *key, * FUTEX_STATE_DEAD, we know that the task has finished * the cleanup: */ - int ret = (p->futex_state = FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN; + int ret = handle_exit_race(uaddr, uval, p); raw_spin_unlock_irq(&p->pi_lock); /* @@ -1306,7 +1365,7 @@ static int lookup_pi_state(u32 __user *uaddr, u32 uval, * We are the first waiter - try to look up the owner based on * @uval and attach to it. */ - return attach_to_pi_owner(uval, key, ps, exiting); + return attach_to_pi_owner(uaddr, uval, key, ps, exiting); } static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval) @@ -1422,7 +1481,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb, * attach to the owner. If that fails, no harm done, we only * set the FUTEX_WAITERS bit in the user space variable. */ - return attach_to_pi_owner(uval, key, ps, exiting); + return attach_to_pi_owner(uaddr, newval, key, ps, exiting); } /**