From: Robert Richter
To: Borislav Petkov, Tony Luck, James Morse, Mauro Carvalho Chehab
CC: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Robert Richter
Date: Wed, 29 May 2019 08:44:45 +0000
Subject: [PATCH 19/21] EDAC, ghes: Identify dimm by node, card, module and handle
Message-ID: <20190529084344.28562-20-rrichter@marvell.com>

According to the UEFI Spec. 2.7 (N.2.5 Memory Error Section), a failing
DIMM (module or rank number) can be identified by its error location,
which consists of node, card and module. The module handle maps the
error to one of the DIMMs listed in the DMI table (SMBIOS Type 17
Memory Device). Collect all of this data from the error record and
select the DIMM accordingly.

Inconsistent error records are reported, i.e. when the same DIMM
handle reports errors with a different node, card or module.

This change allows per-layer reporting based on node, card and module
to be enabled in the next patch.
Signed-off-by: Robert Richter
---
 drivers/edac/ghes_edac.c | 74 +++++++++++++++++++++++++++++++++-------
 1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 4bac643d3404..07c847ed7315 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -83,8 +83,11 @@ struct memarr_dmi_entry {
 
 struct ghes_dimm_info {
 	struct dimm_info dimm_info;
+	struct dimm_info *dimm;
 	int idx;
 	int numa_node;
+	int card;
+	int module;
 	phys_addr_t start;
 	phys_addr_t end;
 	u16 phys_handle;
@@ -119,6 +122,8 @@ static void ghes_dimm_info_init(void)
 	for_each_dimm(dimm) {
 		dimm->idx = idx;
 		dimm->numa_node = NUMA_NO_NODE;
+		dimm->card = -1;
+		dimm->module = -1;
 		idx++;
 	}
 }
@@ -401,6 +406,13 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
 
 		if (*dmi_dimm->label)
 			strcpy(mci_dimm->label, dmi_dimm->label);
+
+		/*
+		 * From here on, do not use &dimm->dimm_info any longer;
+		 * use the mci's dimm info instead, which might contain
+		 * updated data such as the label.
+		 */
+		dimm->dimm = mci_dimm;
 	}
 
 	if (index != mci->tot_dimms)
@@ -408,24 +420,46 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
 			index, mci->tot_dimms);
 }
 
-static struct mem_ctl_info *get_mc_by_node(int nid)
+/* Requires ghes_lock to be held. */
+static struct ghes_dimm_info *
+get_and_prepare_dimm_info(int nid, int card, int module, int handle)
 {
-	struct mem_ctl_info *mci = edac_mc_find(nid);
+	struct ghes_dimm_info *dimm;
+	struct dimm_info *di;
 
-	if (mci)
-		return mci;
+	/*
+	 * Per-layer reporting requires the error report to carry the
+	 * SMBIOS handle (smbios_handle) of the Type 17 Memory Device
+	 * Structure that represents the memory module.
+	 */
+	for_each_dimm(dimm) {
+		di = dimm->dimm;
+		if (di->smbios_handle == handle)
+			goto found;
+	}
 
-	if (num_possible_nodes() > 1) {
-		edac_mc_printk(fallback, KERN_WARNING,
-			"Invalid or no node information, falling back to first node: %s",
-			fallback->dev_name);
+	return NULL;
+found:
+	if (dimm->card < 0 && card >= 0)
+		dimm->card = card;
+	if (dimm->module < 0 && module >= 0)
+		dimm->module = module;
+
+	if ((num_possible_nodes() > 1 && di->mci->mc_idx != nid) ||
+	    (card >= 0 && card != dimm->card) ||
+	    (module >= 0 && module != dimm->module)) {
+		edac_mc_printk(di->mci, KERN_WARNING,
+			"Inconsistent error report (nid/card/module): %d/%d/%d (dimm%d: %d/%d/%d)",
+			nid, card, module, di->idx,
+			di->mci->mc_idx, dimm->card, dimm->module);
 	}
 
-	return fallback;
+	return dimm;
 }
 
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
+	struct ghes_dimm_info *dimm;
 	struct dimm_info *dimm_info;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
@@ -434,6 +468,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	unsigned long flags;
 	char *p;
 	int nid = NUMA_NO_NODE;
+	int card = -1;
+	int module = -1;
+	int handle = -1;
 
 	/* We need at least one mc */
 	if (WARN_ON_ONCE(!fallback))
@@ -449,10 +486,23 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	/* select the node's mc device */
 	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
 		nid = mem_err->node;
-	mci = get_mc_by_node(nid);
+	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
+		card = mem_err->card;
+	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE)
+		module = mem_err->module;
+	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE)
+		handle = mem_err->mem_dev_handle;
+
+	dimm = get_and_prepare_dimm_info(nid, card, module, handle);
+	if (dimm)
+		mci = dimm->dimm->mci;
+	else
+		mci = edac_mc_find(nid);
+	if (!mci)
+		mci = fallback;
+
 	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
@@ -670,7 +720,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
-	dimm_info = edac_get_dimm_by_index(mci, e->top_layer);
+	dimm_info = dimm ? dimm->dimm : NULL;
 
 	edac_raw_mc_handle_error(type, mci, dimm_info, e, -1, -1);
 
-- 
2.20.1