From: Robert Richter
To: Borislav Petkov, James Morse, Mauro Carvalho Chehab
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Robert Richter
Date: Mon, 24 Jun 2019 15:09:37 +0000
Subject: [PATCH v2 21/24] EDAC, ghes: Enable per-layer reporting based on card/module
Message-ID: <20190624150758.6695-22-rrichter@marvell.com>
In-Reply-To: <20190624150758.6695-1-rrichter@marvell.com>

This patch enables per-layer reporting in the GHES driver based on node,
card and module. A DIMM is uniquely identified by these three identifiers.
The MC device is selected by the node ID, so each GHES EDAC memory
controller device has a two-dimensional layer hierarchy based on card and
module, as most other drivers do.
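The split described above (node selects the controller, card and module select the DIMM inside it) can be modeled in a few lines of user-space C. This is a hypothetical sketch, not the driver's code: the layer sizes N_NODES, CARDS_PER_NODE and MODULES_PER_CARD, struct dimm_pos and locate_dimm() are invented for illustration, and the row-major slot numbering is an assumption in the style of the generic EDAC layer layout. The patch itself leaves layers[].size at 0 and allocates each controller from a per-node DIMM count, so its real indexing may differ.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code: one memory controller per node,
 * each with a fixed-size two-layer (card, module) hierarchy.
 */
#define N_NODES          2	/* hypothetical */
#define CARDS_PER_NODE   4	/* hypothetical */
#define MODULES_PER_CARD 2	/* hypothetical */

static_assert(CARDS_PER_NODE > 0 && MODULES_PER_CARD > 0,
	      "layer sizes must be positive");

struct dimm_pos {
	int mc;		/* memory controller index (= node id), or -1 */
	int slot;	/* flat DIMM index inside that controller, or -1 */
};

/*
 * The node selects the controller; card (top layer) and module (mid layer)
 * select the DIMM, stored row-major: slot = card * MODULES_PER_CARD + module.
 * A negative or out-of-range identifier leaves the corresponding field at -1.
 */
static struct dimm_pos locate_dimm(int node, int card, int module)
{
	struct dimm_pos p = { .mc = -1, .slot = -1 };

	if (node < 0 || node >= N_NODES)
		return p;
	p.mc = node;

	if (card >= 0 && card < CARDS_PER_NODE &&
	    module >= 0 && module < MODULES_PER_CARD)
		p.slot = card * MODULES_PER_CARD + module;

	return p;
}
```

With the values from the example log (node 1, card 3, module 0), locate_dimm(1, 3, 0) selects controller 1 and slot 6 under these assumed sizes; a missing card still selects the controller but leaves the slot unknown, which corresponds to the patch's "unknown memory" case.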

An error log now looks as follows:

[ 8902.592060] {4}[Hardware Error]: Error 6, type: corrected
[ 8902.597534] {4}[Hardware Error]: section_type: memory error
[ 8902.603267] {4}[Hardware Error]: error_status: 0x0000000000000400
[ 8902.609522] {4}[Hardware Error]: physical_address: 0x000000b3bb7d3000
[ 8902.616126] {4}[Hardware Error]: node: 1 card: 3 module: 0 rank: 1 bank: 771 column: 14 bit_position: 16
[ 8902.625854] {4}[Hardware Error]: DIMM location: N1 DIMM_L0
[ 8902.807783] EDAC MC1: 1 CE ghes_mc on N1 DIMM_L0 (card:3 module:0 page:0xb3bb7d3 offset:0x0 grain:0 syndrome:0x0 - APEI location: node:1 card:3 module:0 rank:1 bank:771 col:14 bit_pos:16 handle:0x0052 status(0x0000000000000400): Storage error in DRAM memory)

GHES error reports are now similar to edac_mc reports. This patch moves
the code common to ghes and edac_mc into edac_raw_mc_handle_error().

Signed-off-by: Robert Richter
---
 drivers/edac/edac_mc.c   | 45 ++++++++++++++----------
 drivers/edac/ghes_edac.c | 76 ++++++++++++++----------------------
 include/linux/edac.h     |  2 ++
 3 files changed, 63 insertions(+), 60 deletions(-)

-- 
2.20.1

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 3a40496a1973..9383a1179b83 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -915,11 +915,13 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 EXPORT_SYMBOL_GPL(edac_mc_find_csrow_by_page);
 
 const char *edac_layer_name[] = {
-	[EDAC_MC_LAYER_BRANCH] = "branch",
-	[EDAC_MC_LAYER_CHANNEL] = "channel",
-	[EDAC_MC_LAYER_SLOT] = "slot",
-	[EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
-	[EDAC_MC_LAYER_ALL_MEM] = "memory",
+	[EDAC_MC_LAYER_BRANCH]		= "branch",
+	[EDAC_MC_LAYER_CHANNEL]		= "channel",
+	[EDAC_MC_LAYER_SLOT]		= "slot",
+	[EDAC_MC_LAYER_CHIP_SELECT]	= "csrow",
+	[EDAC_MC_LAYER_ALL_MEM]		= "memory",
+	[EDAC_MC_LAYER_CARD]		= "card",
+	[EDAC_MC_LAYER_MODULE]		= "module",
 };
 EXPORT_SYMBOL_GPL(edac_layer_name);
 
@@ -1046,7 +1048,26 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
 			      int row, int chan)
 {
 	char detail[80];
+	int idx;
+	int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer,
+				     e->low_layer };
 	u8 grain_bits;
+	char *p;
+
+	/* Fill the RAM location data */
+	p = e->location;
+
+	for (idx = 0; idx < mci->n_layers; idx++) {
+		if (pos[idx] < 0)
+			continue;
+
+		p += sprintf(p, "%s:%d ",
+			     edac_layer_name[mci->layers[idx].type],
+			     pos[idx]);
+	}
+
+	if (p > e->location)
+		*(p - 1) = '\0';
 
 	/*
 	 * We expect the hw to report a reasonable grain, fallback to
@@ -1233,20 +1254,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	else if (!*e->label)
 		strcpy(e->label, "unknown memory");
 
-	/* Fill the RAM location data */
-	p = e->location;
-
-	for (i = 0; i < mci->n_layers; i++) {
-		if (pos[i] < 0)
-			continue;
-
-		p += sprintf(p, "%s:%d ",
-			     edac_layer_name[mci->layers[i].type],
-			     pos[i]);
-	}
-	if (p > e->location)
-		*(p - 1) = '\0';
-
 	dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
 	edac_raw_mc_handle_error(type, mci, dimm, e, row, chan);
 
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 689841c5c84d..fb5a54e27917 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -178,18 +178,6 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
 	}
 }
 
-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
-{
-	struct dimm_info *dimm;
-
-	mci_for_each_dimm(mci, dimm) {
-		if (dimm->smbios_handle == handle)
-			return dimm->idx;
-	}
-
-	return -1;
-}
-
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 {
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
@@ -500,11 +488,13 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
+	edac_dbg(3, "MC%d\n", mci->mc_idx);
+
 	/* Cleans the error report buffer */
 	memset(e, 0, sizeof (*e));
+
 	e->error_count = 1;
 	e->grain = 1;
-	strcpy(e->label, "unknown label");
 	e->top_layer = -1;
 	e->mid_layer = -1;
 	e->low_layer = -1;
@@ -514,6 +504,25 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	*pvt->msg = '\0';
 	*pvt->other_detail = '\0';
 
+	if (dimm) {
+		/* The DIMM could be identified. */
+		e->top_layer = dimm->card;
+		e->mid_layer = dimm->module;
+		strcpy(e->label, dimm->dimm->label);
+	} else if (nid >= 0 || card >= 0 || module >= 0 || handle >= 0) {
+		/*
+		 * We have at least some information and can do a
+		 * per-layer reporting, but the exact location is
+		 * unknown.
+		 */
+		e->top_layer = card;
+		e->mid_layer = module;
+		strcpy(e->label, "unknown memory");
+	} else {
+		/* No error location at all. */
+		strcpy(e->label, "any memory");
+	}
+
 	switch (sev) {
 	case GHES_SEV_CORRECTED:
 		type = HW_EVENT_ERR_CORRECTED;
@@ -533,8 +542,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		 (long long)mem_err->validation_bits);
 
 	/* Error type, mapped on e->msg */
+	p = pvt->msg;
+	p += sprintf(p, "%s", mci->ctl_name);
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
-		p = pvt->msg;
+		p += sprintf(p, ": ");
 		switch (mem_err->error_type) {
 		case 0:
 			p += sprintf(p, "Unknown");
@@ -588,8 +599,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			p += sprintf(p, "reserved error (%d)",
 				     mem_err->error_type);
 		}
-	} else {
-		strcpy(pvt->msg, "unknown error");
 	}
 
 	/* Error address */
@@ -602,8 +611,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
 		e->grain = ~mem_err->physical_addr_mask + 1;
 
-	/* Memory error location, mapped on e->location */
-	p = e->location;
+	/* Memory error location, mapped on e->other_detail */
+	p = pvt->other_detail;
+	p += snprintf(p, sizeof(pvt->other_detail), "APEI location: ");
 	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
 		p += sprintf(p, "node:%d ", mem_err->node);
 	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
@@ -621,27 +631,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
 		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		const char *bank = NULL, *device = NULL;
-		int index = -1;
-
-		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
-			p += sprintf(p, "DIMM location:%s %s ", bank, device);
-		else
-			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
-				     mem_err->mem_dev_handle);
-
-		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
-		if (index >= 0)
-			e->top_layer = index;
+		p += sprintf(p, "handle:0x%.4x ", handle);
 	}
-	if (p > e->location)
-		*(p - 1) = '\0';
-
-	/* All other fields are mapped on e->other_detail */
-	p = pvt->other_detail;
-	p += snprintf(p, sizeof(pvt->other_detail),
-		      "APEI location: %s ", e->location);
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
 		u64 status = mem_err->error_status;
 
@@ -749,11 +740,14 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 	struct ghes_edac_pvt *ghes_pvt;
 	int rc;
 	struct mem_ctl_info *mci;
-	struct edac_mc_layer layers[1];
+	struct edac_mc_layer layers[2];
 
-	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+	layers[0].type = EDAC_MC_LAYER_CARD;
 	layers[0].size = 0;
-	layers[0].is_virt_csrow = true;
+	layers[0].is_virt_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_MODULE;
+	layers[1].size = 0;
+	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc_by_dimm(nid, mem_info.dimms_per_node[nid],
 				    ARRAY_SIZE(layers), layers,
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 4dcf075e9dff..40e7da735e48 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -336,6 +336,8 @@ enum edac_mc_layer_type {
 	EDAC_MC_LAYER_SLOT,
 	EDAC_MC_LAYER_CHIP_SELECT,
 	EDAC_MC_LAYER_ALL_MEM,
+	EDAC_MC_LAYER_CARD,	/* SMBIOS Type 16 Memory Array */
+	EDAC_MC_LAYER_MODULE,	/* SMBIOS Type 17 Memory Device */
 };
 
 /**
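The location-string loop that this patch moves from edac_mc_handle_error() into edac_raw_mc_handle_error() can be exercised in user space. The sketch below is a hedged approximation, not the kernel code: enum layer_type, layer_name[] and fill_location() are hypothetical stand-ins for edac_layer_name[] and the loop over mci->layers, and the unbounded sprintf() calls are swapped for snprintf() so the buffer cannot overrun.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the layer types; mirrors edac_layer_name[]. */
enum layer_type { LAYER_ALL_MEM, LAYER_CARD, LAYER_MODULE };

static const char *const layer_name[] = {
	[LAYER_ALL_MEM] = "memory",
	[LAYER_CARD]    = "card",
	[LAYER_MODULE]  = "module",
};

/*
 * Write "name:pos name:pos" for every layer whose position is known
 * (pos >= 0), then drop the trailing separator, like the kernel loop.
 */
static void fill_location(char *buf, size_t len,
			  const enum layer_type *types, const int *pos,
			  int n_layers)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';

	for (int i = 0; i < n_layers; i++) {
		int n;

		if (pos[i] < 0)		/* layer position unknown: skip it */
			continue;

		n = snprintf(buf + used, len - used, "%s:%d ",
			     layer_name[types[i]], pos[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* output truncated: stop here */
		used += (size_t)n;
	}

	/* Drop the trailing space, as *(p - 1) = '\0' does in the patch. */
	used = strlen(buf);
	if (used > 0 && buf[used - 1] == ' ')
		buf[used - 1] = '\0';
}
```

With the card and module layers that this patch registers for GHES and positions {3, 0}, the result is "card:3 module:0", the same prefix as in the EDAC MC1 line of the example log; an unknown card (-1) leaves only the module entry.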