From patchwork Fri May 13 14:24:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kroah-Hartman X-Patchwork-Id: 572509 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F97CC433EF for ; Fri, 13 May 2022 14:31:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381019AbiEMOby (ORCPT ); Fri, 13 May 2022 10:31:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46500 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1381135AbiEMOa5 (ORCPT ); Fri, 13 May 2022 10:30:57 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1866C994EA; Fri, 13 May 2022 07:28:31 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6AA3E61F99; Fri, 13 May 2022 14:28:30 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 43FEAC34100; Fri, 13 May 2022 14:28:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1652452109; bh=SZDidThTxrSwy7TK6gVo0MGXhtuEdWJ6gJKEZ+IvH8o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=vCT/7Gq6CyZDRyC5dP/I/EwNyIqx8o+a1jyqAoROdrav9/kb6FBkIhAQ/T3tMUy9i 5eTIc9gErtY66ApecnXN2LxGFA1UnfauE+Cqyzoqt0c3oXJtYJoEErjykS0k1fHPsh 9eDNYmByxz0u47vzmr7z+oxfH+I5+FL7pXaGL+qA= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Naoya Horiguchi , Youquan Song , Tony Luck , Andrew Morton , Linus Torvalds Subject: [PATCH 5.15 19/21] mm/hwpoison: fix error page recovered but reported "not recovered" Date: Fri, 13 May 2022 16:24:01 +0200 Message-Id: <20220513142230.431675712@linuxfoundation.org> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220513142229.874949670@linuxfoundation.org> References: <20220513142229.874949670@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Naoya Horiguchi commit 046545a661af2beec21de7b90ca0e35f05088a81 upstream. When an uncorrected memory error is consumed there is a race between the CMCI from the memory controller reporting an uncorrected error with a UCNA signature, and the core reporting and SRAR signature machine check when the data is about to be consumed. If the CMCI wins that race, the page is marked poisoned when uc_decode_notifier() calls memory_failure() and the machine check processing code finds the page already poisoned. It calls kill_accessing_process() to make sure a SIGBUS is sent. But returns the wrong error code. Console log looks like this: mce: Uncorrected hardware memory error in user-access at 3710b3400 Memory failure: 0x3710b3: recovery action for dirty LRU page: Recovered Memory failure: 0x3710b3: already hardware poisoned Memory failure: 0x3710b3: Sending SIGBUS to einj_mem_uc:361438 due to hardware memory corruption mce: Memory error not recovered kill_accessing_process() is supposed to return -EHWPOISON to notify that SIGBUS is already set to the process and kill_me_maybe() doesn't have to send it again. But current code simply fails to do this, so fix it to make sure to work as intended. This change avoids the noise message "Memory error not recovered" and skips duplicate SIGBUSs. [tony.luck@intel.com: reword some parts of commit message] Link: https://lkml.kernel.org/r/20220113231117.1021405-1-naoya.horiguchi@linux.dev Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Naoya Horiguchi Reported-by: Youquan Song Cc: Tony Luck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/memory-failure.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -705,8 +705,10 @@ static int kill_accessing_process(struct (void *)&priv); if (ret == 1 && priv.tk.addr) kill_proc(&priv.tk, pfn, flags); + else + ret = 0; mmap_read_unlock(p->mm); - return ret ? -EFAULT : -EHWPOISON; + return ret > 0 ? -EHWPOISON : -EFAULT; } static const char *action_name[] = {