From patchwork Wed May 29 08:44:47 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Robert Richter
X-Patchwork-Id: 165340
From: Robert Richter
To: Borislav Petkov, Tony Luck, James Morse, Mauro Carvalho Chehab
CC: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, Robert Richter
Subject: [PATCH 20/21] EDAC, ghes: Enable per-layer reporting based on card/module
Date: Wed, 29 May 2019 08:44:47 +0000
Message-ID: <20190529084344.28562-21-rrichter@marvell.com>
References: <20190529084344.28562-1-rrichter@marvell.com>
In-Reply-To: <20190529084344.28562-1-rrichter@marvell.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

This patch enables per-layer reporting in the GHES driver based on
node, card and module. A DIMM can be uniquely identified by those
three identifiers. The mc device is selected by the node id. Thus,
each ghes edac memory controller device has a 2-dimensional layer
hierarchy based on card and module, in the same way as most other
drivers have.
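
To illustrate the 2-dimensional card/module hierarchy, here is a minimal
user-space sketch of how a location string such as "card:3 module:0" can
be assembled from per-layer positions. It is not the kernel code: the
names layer_type, layer_name and format_location() are hypothetical
stand-ins, and it uses a bounded snprintf() where the kernel loop uses
sprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's layer types and names. */
enum layer_type { LAYER_CARD, LAYER_MODULE };

static const char *const layer_name[] = {
	[LAYER_CARD]   = "card",
	[LAYER_MODULE] = "module",
};

/*
 * Build a location string such as "card:3 module:0" from per-layer
 * positions. A negative position means "unknown" and is skipped.
 * Returns the number of characters written, excluding the NUL.
 */
static size_t format_location(char *buf, size_t size,
			      const enum layer_type *types,
			      const int *pos, size_t n_layers)
{
	size_t len = 0;
	size_t i;

	assert(buf && size > 0);
	buf[0] = '\0';

	for (i = 0; i < n_layers; i++) {
		int n;

		if (pos[i] < 0)
			continue;

		/* Separate entries with a single space, none in front. */
		n = snprintf(buf + len, size - len, "%s%s:%d",
			     len ? " " : "", layer_name[types[i]], pos[i]);
		if (n < 0 || (size_t)n >= size - len) {
			/* Out of space: drop the partial entry and stop. */
			buf[len] = '\0';
			break;
		}
		len += (size_t)n;
	}

	return len;
}
```

Calling format_location() with types { LAYER_CARD, LAYER_MODULE } and
positions { 3, 0 } yields the same "card:3 module:0" prefix that shows
up in the EDAC report line of the log.
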

An error log looks as follows now:

 [ 8902.592060] {4}[Hardware Error]: Error 6, type: corrected
 [ 8902.597534] {4}[Hardware Error]: section_type: memory error
 [ 8902.603267] {4}[Hardware Error]: error_status: 0x0000000000000400
 [ 8902.609522] {4}[Hardware Error]: physical_address: 0x000000b3bb7d3000
 [ 8902.616126] {4}[Hardware Error]: node: 1 card: 3 module: 0 rank: 1 bank: 771 column: 14 bit_position: 16
 [ 8902.625854] {4}[Hardware Error]: DIMM location: N1 DIMM_L0
 [ 8902.807783] EDAC MC1: 1 CE ghes_mc on N1 DIMM_L0 (card:3 module:0 page:0xb3bb7d3 offset:0x0 grain:0 syndrome:0x0 - APEI location: node:1 card:3 module:0 rank:1 bank:771 col:14 bit_pos:16 handle:0x0052 status(0x0000000000000400): Storage error in DRAM memory)

GHES error reports are now similar to edac_mc reports. This patch
moves common code of ghes and edac_mc to edac_raw_mc_handle_error().

Signed-off-by: Robert Richter
---
 drivers/edac/edac_mc.c   | 45 ++++++++++++++----------
 drivers/edac/ghes_edac.c | 76 ++++++++++++++++++----------------------
 include/linux/edac.h     |  2 ++
 3 files changed, 63 insertions(+), 60 deletions(-)

-- 
2.20.1

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index bdeb9fd08249..c159bb3c77e0 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -915,11 +915,13 @@ int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci, unsigned long page)
 EXPORT_SYMBOL_GPL(edac_mc_find_csrow_by_page);
 
 const char *edac_layer_name[] = {
-	[EDAC_MC_LAYER_BRANCH] = "branch",
-	[EDAC_MC_LAYER_CHANNEL] = "channel",
-	[EDAC_MC_LAYER_SLOT] = "slot",
-	[EDAC_MC_LAYER_CHIP_SELECT] = "csrow",
-	[EDAC_MC_LAYER_ALL_MEM] = "memory",
+	[EDAC_MC_LAYER_BRANCH]		= "branch",
+	[EDAC_MC_LAYER_CHANNEL]		= "channel",
+	[EDAC_MC_LAYER_SLOT]		= "slot",
+	[EDAC_MC_LAYER_CHIP_SELECT]	= "csrow",
+	[EDAC_MC_LAYER_ALL_MEM]		= "memory",
+	[EDAC_MC_LAYER_CARD]		= "card",
+	[EDAC_MC_LAYER_MODULE]		= "module",
 };
 EXPORT_SYMBOL_GPL(edac_layer_name);
 
@@ -1046,7 +1048,26 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
 			      int row, int chan)
 {
 	char detail[80];
+	int idx;
+	int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer,
+				     e->low_layer };
 	u8 grain_bits;
+	char *p;
+
+	/* Fill the RAM location data */
+	p = e->location;
+
+	for (idx = 0; idx < mci->n_layers; idx++) {
+		if (pos[idx] < 0)
+			continue;
+
+		p += sprintf(p, "%s:%d ",
+			     edac_layer_name[mci->layers[idx].type],
+			     pos[idx]);
+	}
+
+	if (p > e->location)
+		*(p - 1) = '\0';
 
 	/* Report the error via the trace interface */
 	grain_bits = fls_long(e->grain) + 1;
@@ -1228,20 +1249,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	else if (!*e->label)
 		strcpy(e->label, "unknown memory");
 
-	/* Fill the RAM location data */
-	p = e->location;
-
-	for (i = 0; i < mci->n_layers; i++) {
-		if (pos[i] < 0)
-			continue;
-
-		p += sprintf(p, "%s:%d ",
-			     edac_layer_name[mci->layers[i].type],
-			     pos[i]);
-	}
-	if (p > e->location)
-		*(p - 1) = '\0';
-
 	dimm = edac_get_dimm(mci, top_layer, mid_layer, low_layer);
 
 	edac_raw_mc_handle_error(type, mci, dimm, e, row, chan);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 07c847ed7315..67e962159653 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -167,18 +167,6 @@ static void ghes_edac_set_nid(const struct dmi_header *dh, void *arg)
 	}
 }
 
-static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
-{
-	struct dimm_info *dimm;
-
-	mci_for_each_dimm(mci, dimm) {
-		if (dimm->smbios_handle == handle)
-			return dimm->idx;
-	}
-
-	return -1;
-}
-
 static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 {
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
@@ -506,10 +494,12 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	pvt = mci->pvt_info;
 	e = &mci->error_desc;
 
+	edac_dbg(3, "MC%d\n", mci->mc_idx);
+
 	/* Cleans the error report buffer */
 	memset(e, 0, sizeof (*e));
+
 	e->error_count = 1;
-	strcpy(e->label, "unknown label");
 	e->top_layer = -1;
 	e->mid_layer = -1;
 	e->low_layer = -1;
@@ -519,6 +509,25 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	*pvt->msg = '\0';
 	*pvt->other_detail = '\0';
 
+	if (dimm) {
+		/* The DIMM could be identified. */
+		e->top_layer = dimm->card;
+		e->mid_layer = dimm->module;
+		strcpy(e->label, dimm->dimm->label);
+	} else if (nid >= 0 || card >= 0 || module >= 0 || handle >= 0) {
+		/*
+		 * We have at least some information and can do a
+		 * per-layer reporting, but the exact location is
+		 * unknown.
+		 */
+		e->top_layer = card;
+		e->mid_layer = module;
+		strcpy(e->label, "unknown memory");
+	} else {
+		/* No error location at all. */
+		strcpy(e->label, "any memory");
+	}
+
 	switch (sev) {
 	case GHES_SEV_CORRECTED:
 		type = HW_EVENT_ERR_CORRECTED;
@@ -538,8 +547,10 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		 (long long)mem_err->validation_bits);
 
 	/* Error type, mapped on e->msg */
+	p = pvt->msg;
+	p += sprintf(p, "%s", mci->ctl_name);
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE) {
-		p = pvt->msg;
+		p += sprintf(p, ": ");
 		switch (mem_err->error_type) {
 		case 0:
 			p += sprintf(p, "Unknown");
@@ -593,8 +604,6 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 			p += sprintf(p, "reserved error (%d)",
 				     mem_err->error_type);
 		}
-	} else {
-		strcpy(pvt->msg, "unknown error");
 	}
 
 	/* Error address */
@@ -607,8 +616,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
 		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
 
-	/* Memory error location, mapped on e->location */
-	p = e->location;
+	/* Memory error location, mapped on e->other_detail */
+	p = pvt->other_detail;
+	p += snprintf(p, sizeof(pvt->other_detail), "APEI location: ");
 	if (mem_err->validation_bits & CPER_MEM_VALID_NODE)
 		p += sprintf(p, "node:%d ", mem_err->node);
 	if (mem_err->validation_bits & CPER_MEM_VALID_CARD)
@@ -626,27 +636,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (mem_err->validation_bits & CPER_MEM_VALID_BIT_POSITION)
 		p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos);
 	if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) {
-		const char *bank = NULL, *device = NULL;
-		int index = -1;
-
-		dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device);
-		if (bank != NULL && device != NULL)
-			p += sprintf(p, "DIMM location:%s %s ", bank, device);
-		else
-			p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
-				     mem_err->mem_dev_handle);
-
-		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
-		if (index >= 0)
-			e->top_layer = index;
+		p += sprintf(p, "handle:0x%.4x ", handle);
 	}
-	if (p > e->location)
-		*(p - 1) = '\0';
-
-	/* All other fields are mapped on e->other_detail */
-	p = pvt->other_detail;
-	p += snprintf(p, sizeof(pvt->other_detail),
-		      "APEI location: %s ", e->location);
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
 		u64 status = mem_err->error_status;
 
@@ -754,11 +745,14 @@ ghes_edac_register_one(int nid, struct ghes *ghes, struct device *parent)
 	struct ghes_edac_pvt *ghes_pvt;
 	int rc;
 	struct mem_ctl_info *mci;
-	struct edac_mc_layer layers[1];
+	struct edac_mc_layer layers[2];
 
-	layers[0].type = EDAC_MC_LAYER_ALL_MEM;
+	layers[0].type = EDAC_MC_LAYER_CARD;
 	layers[0].size = 0;
-	layers[0].is_virt_csrow = true;
+	layers[0].is_virt_csrow = false;
+	layers[1].type = EDAC_MC_LAYER_MODULE;
+	layers[1].size = 0;
+	layers[1].is_virt_csrow = false;
 
 	mci = edac_mc_alloc_by_dimm(nid, mem_info.num_per_node[nid],
 				    ARRAY_SIZE(layers), layers,
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 4dcf075e9dff..40e7da735e48 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -336,6 +336,8 @@ enum edac_mc_layer_type {
 	EDAC_MC_LAYER_SLOT,
 	EDAC_MC_LAYER_CHIP_SELECT,
 	EDAC_MC_LAYER_ALL_MEM,
+	EDAC_MC_LAYER_CARD, /* SMBIOS Type 16 Memory Array */
+	EDAC_MC_LAYER_MODULE, /* SMBIOS Type 17 Memory Device */
 };
 
 /**
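
The three-way label fallback that the ghes_edac_report_mem_error() hunk
introduces can be sketched in user space roughly as follows. This is an
illustration only: struct report and fill_report() are hypothetical,
the DIMM lookup is reduced to a label pointer that is NULL when no DIMM
matched, and snprintf() is swapped in for the strcpy() calls of the
patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified view of one reported error. */
struct report {
	int top_layer;		/* card, or -1 if unknown */
	int mid_layer;		/* module, or -1 if unknown */
	char label[32];
};

/*
 * Pick layers and label in three steps:
 *  1. DIMM identified: use its card/module and its label.
 *  2. Only partial location data: report per layer, label unknown.
 *  3. Nothing known: the error may be anywhere.
 * dimm_label is NULL when no DIMM matched; negative ids mean "not valid".
 */
static void fill_report(struct report *r, const char *dimm_label,
			int dimm_card, int dimm_module,
			int nid, int card, int module, int handle)
{
	assert(r);
	r->top_layer = -1;
	r->mid_layer = -1;

	if (dimm_label) {
		r->top_layer = dimm_card;
		r->mid_layer = dimm_module;
		snprintf(r->label, sizeof(r->label), "%s", dimm_label);
	} else if (nid >= 0 || card >= 0 || module >= 0 || handle >= 0) {
		r->top_layer = card;
		r->mid_layer = module;
		snprintf(r->label, sizeof(r->label), "%s", "unknown memory");
	} else {
		snprintf(r->label, sizeof(r->label), "%s", "any memory");
	}
}
```

With a matched DIMM labelled "N1 DIMM_L0" on card 3, module 0, the
sketch fills in the same label and layer positions as the EDAC line in
the log from the commit message.
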