From patchwork Thu Sep 28 14:33:05 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gabriele Paoloni
X-Patchwork-Id: 114452
From: Gabriele Paoloni
Subject: [PATCH v3] PCIe AER: report uncorrectable errors only to the functions that logged the errors
Date: Thu, 28 Sep 2017 15:33:05 +0100
Message-ID: <1506609185-8800-1-git-send-email-gabriele.paoloni@huawei.com>
X-Mailer: git-send-email 2.7.1.windows.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List:
linux-kernel@vger.kernel.org

Currently, if an uncorrectable error is reported by an EP, the AER driver
walks over all the devices connected to the upstream port bus and in turn
calls the report_error_detected() callback for each of them. If any of the
devices connected to the bus does not implement
dev->driver->err_handler->error_detected(), do_recovery() will fail,
leaving all the devices in the bus hierarchy unrecovered.

According to section "6.2.2.2.2. Non-Fatal Errors" of the PCIe specs
<<
Non-fatal errors are uncorrectable errors which cause a particular
transaction to be unreliable but the Link is otherwise fully functional.
Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in
a device or system management software the opportunity to recover from the
error without resetting the components on the Link and disturbing other
transactions in progress. Devices not associated with the transaction in
error are not impacted by the error.
>>
therefore, for non-fatal errors the PCIe link should not be considered
compromised, and it makes sense to report the error only to the functions
that logged an error.

This patch implements this new behaviour for non-fatal errors.

This patch also fixes the bug filed at the link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni
Signed-off-by: Dongdong Liu
---
Changes from v2:
- no functional changes
- Added reference in the commit log to the bugzilla ticket
- Added reference in the commit log to the commit that this patch fixes
- Added reference in the commit log to the PCIe specs for Non-fatal error
  handling rules

Changes from v1:
- now errors are reported only to the functions that logged the error
  instead of all the functions in the same device.
- the patch subject has changed to match the new implementation
---
 drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--
2.7.4

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 890efcc..7448052 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -390,7 +390,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if (state == pci_channel_io_normal)
+			/*
+			 * the error is non fatal so the bus is ok, just invoke
+			 * the callback for the function that logged the error.
+			 */
+			cb(dev, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
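
For readers who want to see the effect of the hunk above outside the kernel,
the dispatch rule can be modeled in user space. This is only a minimal sketch
under simplifying assumptions, not kernel code: every toy_* name is
hypothetical, the bus is a flat array, and the result merging is reduced to
"any function without an error_detected handler makes the whole broadcast
fail", which is the failure mode described in the commit log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); none of these are kernel types or APIs. */
enum toy_state  { TOY_IO_NORMAL, TOY_IO_FROZEN };   /* non-fatal vs. fatal */
enum toy_result { TOY_CAN_RECOVER, TOY_NO_AER_DRIVER };

struct toy_dev {
	bool has_error_detected;  /* driver provides an error_detected hook */
	int  notified;            /* number of times the callback ran */
};

struct toy_bus {
	struct toy_dev *devs;
	size_t ndev;
};

/* Callback: a function without a handler spoils the merged result. */
static void toy_report(struct toy_dev *dev, enum toy_result *result)
{
	dev->notified++;
	if (!dev->has_error_detected)
		*result = TOY_NO_AER_DRIVER;
}

/* Behaviour before the patch: always walk every function on the bus. */
static enum toy_result toy_broadcast_old(struct toy_bus *bus)
{
	enum toy_result result = TOY_CAN_RECOVER;

	for (size_t i = 0; i < bus->ndev; i++)
		toy_report(&bus->devs[i], &result);
	return result;
}

/*
 * Behaviour after the patch: a non-fatal error is reported only to the
 * function that logged it; a fatal error still walks the whole bus.
 */
static enum toy_result toy_broadcast_new(struct toy_bus *bus, size_t reporter,
					 enum toy_state state)
{
	enum toy_result result = TOY_CAN_RECOVER;

	assert(bus != NULL && reporter < bus->ndev);

	if (state == TOY_IO_NORMAL) {
		toy_report(&bus->devs[reporter], &result);
	} else {
		for (size_t i = 0; i < bus->ndev; i++)
			toy_report(&bus->devs[i], &result);
	}
	return result;
}
```

With a two-function bus where only function 0 has a handler, the old walk
fails even when function 0 logged a non-fatal error, while the patched rule
notifies function 0 alone and lets recovery proceed; a fatal error behaves
as before in both versions.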