From patchwork Wed May 29 08:44:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robert Richter
X-Patchwork-Id: 165343
From: Robert Richter
To: Borislav Petkov, Tony Luck, James Morse, Mauro Carvalho Chehab
CC: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Robert Richter
Subject: [PATCH 16/21] EDAC, ghes: Create one memory controller device per node
Date: Wed, 29 May 2019 08:44:39 +0000
Message-ID: <20190529084344.28562-17-rrichter@marvell.com>
References: <20190529084344.28562-1-rrichter@marvell.com>
In-Reply-To: <20190529084344.28562-1-rrichter@marvell.com>
X-Mailer: git-send-email 2.20.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On most systems there is one EDAC memory controller device per node.
Implement the same for the ghes driver: create multiple mc devices
and map the DIMMs to them based on the node id. At least one mc
device is needed as a fallback for error reports that carry no node
information.

Here is a complete and consistent error report from a ThunderX2
system (zero counter values dropped):

 # find /sys/devices/system/edac/mc/ -name \*count | sort -V | xargs grep . \
   | sed -e '/:0/d'
 /sys/devices/system/edac/mc/mc0/ce_count:11
 /sys/devices/system/edac/mc/mc0/ce_noinfo_count:1
 /sys/devices/system/edac/mc/mc0/csrow2/ce_count:5
 /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:5
 /sys/devices/system/edac/mc/mc0/csrow3/ce_count:3
 /sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:3
 /sys/devices/system/edac/mc/mc0/csrow4/ce_count:2
 /sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:2
 /sys/devices/system/edac/mc/mc0/dimm2/dimm_ce_count:5
 /sys/devices/system/edac/mc/mc0/dimm3/dimm_ce_count:3
 /sys/devices/system/edac/mc/mc0/dimm4/dimm_ce_count:2
 /sys/devices/system/edac/mc/mc1/ce_count:7
 /sys/devices/system/edac/mc/mc1/csrow2/ce_count:4
 /sys/devices/system/edac/mc/mc1/csrow2/ch0_ce_count:4
 /sys/devices/system/edac/mc/mc1/csrow3/ce_count:1
 /sys/devices/system/edac/mc/mc1/csrow3/ch0_ce_count:1
 /sys/devices/system/edac/mc/mc1/csrow6/ce_count:2
 /sys/devices/system/edac/mc/mc1/csrow6/ch0_ce_count:2
 /sys/devices/system/edac/mc/mc1/dimm2/dimm_ce_count:4
 /sys/devices/system/edac/mc/mc1/dimm3/dimm_ce_count:1
 /sys/devices/system/edac/mc/mc1/dimm6/dimm_ce_count:2

Signed-off-by: Robert Richter
---
 drivers/edac/ghes_edac.c | 126 ++++++++++++++++++++++++++++++++-------
 1 file changed, 104 insertions(+), 22 deletions(-)

-- 
2.20.1

diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index c39cdfdfb8db..e5fa977bcfd9 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -18,6 +18,7 @@
 #include
 
 struct ghes_edac_pvt {
+	struct device dev;
 	struct list_head list;
 	struct ghes *ghes;
 	struct mem_ctl_info *mci;
@@ -28,7 +29,7 @@ struct ghes_edac_pvt {
 };
 
 static atomic_t ghes_init = ATOMIC_INIT(0);
-static struct ghes_edac_pvt *ghes_pvt;
+struct mem_ctl_info *fallback;
 
 /*
  * Sync with other, potentially concurrent callers of
@@ -161,15 +162,15 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
 	}
 }
 
-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
 {
-	struct mem_ctl_info *mci = ghes_pvt->mci;
 	struct dimm_info *dimm;
 
 	mci_for_each_dimm(mci, dimm) {
 		if (dimm->smbios_handle == handle)
 			return dimm->idx;
 	}
+
 	return -1;
 }
 
@@ -370,6 +371,9 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
 	int index = 0;
 
 	for_each_dimm(dimm) {
+		if (mci->mc_idx != dimm->numa_node)
+			continue;
+
 		dmi_dimm = &dimm->dimm_info;
 		mci_dimm = edac_get_dimm_by_index(mci, index);
 
@@ -390,17 +394,35 @@ static void mci_add_dimm_info(struct mem_ctl_info *mci)
 		index, mci->tot_dimms);
 }
 
+static struct mem_ctl_info *get_mc_by_node(int nid)
+{
+	struct mem_ctl_info *mci = edac_mc_find(nid);
+
+	if (mci)
+		return mci;
+
+	if (num_possible_nodes() > 1) {
+		edac_mc_printk(fallback, KERN_WARNING,
+			"Invalid or no node information, falling back to first node: %s",
+			fallback->dev_name);
+	}
+
+	return fallback;
+}
+
 void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 {
 	struct dimm_info *dimm_info;
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = ghes_pvt;
+	struct ghes_edac_pvt *pvt;
 	unsigned long flags;
 	char *p;
+	int nid = NUMA_NO_NODE;
 
-	if (!pvt)
+	/* We need at least one mc */
+	if (WARN_ON_ONCE(!fallback))
 		return;
 
 	/*
@@ -413,7 +435,11 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
-	mci = pvt->mci;
+	/* select the node's mc device */
+	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
+		nid = mem_err->node;
+	mci = get_mc_by_node(nid);
+	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
 	/* Cleans the error report buffer */
@@ -546,7 +572,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 				     mem_err->mem_dev_handle);
 
-		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
 		if (index >= 0)
 			e->top_layer = index;
 	}
@@ -645,15 +671,29 @@ static struct acpi_platform_list plat_list[] = {
 	{ } /* End */
 };
 
+void ghes_edac_release(struct device *dev)
+{
+	struct ghes_edac_pvt *ghes_pvt;
+	struct mem_ctl_info *mci;
+
+	ghes_pvt = container_of(dev, struct ghes_edac_pvt, dev);
+
+	mci = ghes_pvt->mci;
+	edac_mc_del_mc(mci->pdev);
+	edac_mc_free(mci);
+}
+
 static int
 ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 {
+	struct device *dev;
+	struct ghes_edac_pvt *ghes_pvt;
 	int rc;
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 
 	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
-	layers[0].size = mem_info.num_dimm;
+	layers[0].size = mem_info.num_per_node[nid];
 	layers[0].is_virt_csrow = true;
 
 	mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers,
@@ -667,43 +707,69 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 	ghes_pvt->ghes = ghes;
 	ghes_pvt->mci = mci;
 
-	mci->pdev = parent;
+	dev = &ghes_pvt->dev;
+	dev->parent = parent;
+	dev->release = ghes_edac_release;
+	dev_set_name(dev, "ghes_mc%d", nid);
+
+	rc = device_register(dev);
+	if (rc) {
+		pr_err("Can't create EDAC device (%d)\n", rc);
+		goto fail;
+	}
+
+	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
 	mci->edac_ctl_cap = EDAC_FLAG_NONE;
 	mci->edac_cap = EDAC_FLAG_NONE;
 	mci->mod_name = "ghes_edac.c";
-	mci->ctl_name = "ghes_edac";
-	mci->dev_name = "ghes";
+	mci->ctl_name = "ghes_mc";
+	mci->dev_name = dev_name(dev);
 
 	mci_add_dimm_info(mci);
 
 	rc = edac_mc_add_mc(mci);
 	if (rc < 0) {
-		pr_err("Can't register at EDAC core\n");
-		edac_mc_free(mci);
-		return -ENODEV;
+		pr_err("Can't register at EDAC core (%d)\n", rc);
+		goto fail;
 	}
+
 	return 0;
+fail:
+	put_device(dev);
+	return rc;
+}
+
+static void ghes_edac_unregister_one(struct mem_ctl_info *mci)
+{
+	struct ghes_edac_pvt *pvt = mci->pvt_info;
+
+	put_device(&pvt->dev);
 }
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
+	int nid;
 
-	if (!ghes_pvt)
-		return;
-
-	mci = ghes_pvt->mci;
-	edac_mc_del_mc(mci->pdev);
-	edac_mc_free(mci);
+	for_each_node(nid) {
+		mci = edac_mc_find(nid);
+		/* stop fallback at last */
+		if (mci && mci != fallback)
+			ghes_edac_unregister_one(mci);
+	}
+	ghes_edac_unregister_one(fallback);
+	fallback = NULL;
 
 	kfree(mem_info.dimms);
+	atomic_dec(&ghes_init);
 }
 
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
 	int rc;
+	int nid;
 	int idx = -1;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -743,7 +809,23 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		pr_info("This system has %d DIMM sockets.\n", mem_info.num_dimm);
 	}
 
-	rc = ghes_edac_register_one(0, ghes, dev);
+	for_each_node(nid) {
+		if (!mem_info.num_per_node[nid])
+			continue;
 
-	return rc;
+		rc = ghes_edac_register_one(nid, ghes, dev);
+		if (rc) {
+			ghes_edac_unregister(ghes);
+			return rc;
+		}
+
+		/*
+		 * use the first node's mc as fallback in case we can
+		 * not detect the node from the error information
+		 */
+		if (!fallback)
+			fallback = edac_mc_find(nid);
+	}
+
+	return 0;
 }
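
As an illustration only (not part of the patch), the per-node selection with a fallback that the commit message describes can be modelled in a few lines of userspace C. The names `struct mc`, `mc_table`, `register_mc()`, `find_mc()`, `select_mc()` and `report_ce()` are hypothetical stand-ins, loosely mirroring `struct mem_ctl_info`, the EDAC core's controller list, `edac_mc_find()` and `get_mc_by_node()`; `NO_NODE` stands in for `NUMA_NO_NODE`. It is a sketch under those assumptions and leaves out locking, the device model and the warning message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4
#define NO_NODE   (-1)	/* stand-in for NUMA_NO_NODE */

/* Hypothetical, simplified stand-in for struct mem_ctl_info. */
struct mc {
	int idx;		/* controller index, equal to the node id */
	unsigned int ce_count;	/* corrected errors accounted to this mc */
};

/* One slot per node; NULL means no controller was registered there. */
static struct mc *mc_table[MAX_NODES];
static struct mc *fallback;	/* first registered controller */

/* Register a controller for @nid; the first one becomes the fallback. */
static void register_mc(struct mc *mc, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	mc->idx = nid;
	mc->ce_count = 0;
	mc_table[nid] = mc;
	if (!fallback)
		fallback = mc;
}

/* Look up the controller of @nid, or NULL if there is none. */
static struct mc *find_mc(int nid)
{
	if (nid < 0 || nid >= MAX_NODES)
		return NULL;
	return mc_table[nid];
}

/* Pick the node's controller; use the fallback if the node is unknown. */
static struct mc *select_mc(int nid)
{
	struct mc *mc = find_mc(nid);

	return mc ? mc : fallback;
}

/* Account one corrected error reported for @nid and return its mc. */
static struct mc *report_ce(int nid)
{
	struct mc *mc = select_mc(nid);

	assert(mc);	/* at least one controller must be registered */
	mc->ce_count++;
	return mc;
}
```

With controllers registered for nodes 0 and 1, `select_mc(1)` returns node 1's controller, while `select_mc(NO_NODE)` and a lookup for an unpopulated node both return node 0's controller, the first one registered.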