From patchwork Fri Oct 25 17:13:49 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiju Jose X-Patchwork-Id: 838433 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D319E21620A; Fri, 25 Oct 2024 17:14:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729876492; cv=none; b=brDxeeVD5pUW3gQ4idNwJPhHr+UH6zIrL7LYG5cMA3siOS/kO6v180nhRoWfZrZIpmg6v83d+/9/6HJUejpUvdRC9BgsbYy4FoHgkKmQUW9SwwgjVrmcfFNIHvCI9AImFQPXALeRVF/alpbS33S+nBoz0W0LsSQe3cgP75U9zaw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1729876492; c=relaxed/simple; bh=VR/vt1ze42h+9UhrNwhIw7ID0VMJKhS7nqypXeDzQKw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=PVIIp0xWagwbj//hhO7mkrhbEQLpCs6wgJT8bVrl3UtZu7FsRrEN72O68tiuzRBGzZJikNWktkv9BukxS5SMKL/PtPuBwge2+8HVhUBPPawsf3N+v/EerM32qYOT82GwtyC4Dt1+8nbboNGqoW8/j4QxpPsJmaxz1JEodkWjn6k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4XZq632Rzlz6LCgg; Sat, 26 Oct 2024 01:10:03 +0800 (CST) Received: from frapeml500007.china.huawei.com (unknown [7.182.85.172]) by mail.maildlp.com (Postfix) with ESMTPS id C68261402C7; Sat, 26 Oct 2024 01:14:46 +0800 (CST) Received: from P_UKIT01-A7bmah.china.huawei.com (10.48.151.104) by frapeml500007.china.huawei.com (7.182.85.172) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Fri, 25 Oct 2024 19:14:44 +0200 From: To: , , , , CC: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: [PATCH v14 08/14] cxl/memfeature: Add CXL memory device ECS control feature Date: Fri, 25 Oct 2024 18:13:49 +0100 Message-ID: <20241025171356.1377-9-shiju.jose@huawei.com> X-Mailer: git-send-email 2.43.0.windows.1 In-Reply-To: <20241025171356.1377-1-shiju.jose@huawei.com> References: <20241025171356.1377-1-shiju.jose@huawei.com> Precedence: bulk X-Mailing-List: linux-acpi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-ClientProxiedBy: lhrpeml500002.china.huawei.com (7.191.160.78) To frapeml500007.china.huawei.com (7.182.85.172) From: Shiju Jose CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check Scrub) control feature. The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts. The ECS control allows the requester to change the log entry type, the ECS threshold count (provided the request falls within the limits specified in DDR5 mode registers), switch between codeword mode and row count mode, and reset the ECS counter. Register with EDAC device driver, which retrieves the ECS attribute descriptors from the EDAC ECS and exposes the ECS control attributes to userspace via sysfs. For example, the ECS control for the memory media FRU0 in CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/ Signed-off-by: Shiju Jose --- drivers/cxl/core/memfeature.c | 368 +++++++++++++++++++++++++++++++++- 1 file changed, 365 insertions(+), 3 deletions(-) diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c index 8fff00f62f8c..26ba360165db 100644 --- a/drivers/cxl/core/memfeature.c +++ b/drivers/cxl/core/memfeature.c @@ -19,7 +19,7 @@ #include #include -#define CXL_DEV_NUM_RAS_FEATURES 1 +#define CXL_DEV_NUM_RAS_FEATURES 2 #define CXL_DEV_HOUR_IN_SECS 3600 #define CXL_SCRUB_NAME_LEN 128 @@ -306,15 +306,340 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = { .set_cycle_duration = cxl_patrol_scrub_write_scrub_cycle, }; +/* CXL DDR5 ECS control definitions */ +static const uuid_t cxl_ecs_uuid = + UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e, 0x89, 0x33, 0x86); + +struct cxl_ecs_context { + u16 num_media_frus; + u16 get_feat_size; + u16 set_feat_size; + u8 get_version; + u8 set_version; + u16 set_effects; + struct cxl_memdev *cxlmd; +}; + +enum { + CXL_ECS_PARAM_LOG_ENTRY_TYPE, + CXL_ECS_PARAM_THRESHOLD, + CXL_ECS_PARAM_MODE, + CXL_ECS_PARAM_RESET_COUNTER, +}; + +#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0) +#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0) +#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0) +#define CXL_ECS_COUNT_MODE_MASK BIT(3) +#define CXL_ECS_RESET_COUNTER_MASK BIT(4) + +enum { + ECS_THRESHOLD_256 = 3, + ECS_THRESHOLD_1024 = 4, + ECS_THRESHOLD_4096 = 5, +}; + +static const u16 ecs_supp_threshold[] = { + [ECS_THRESHOLD_256] = 256, + [ECS_THRESHOLD_1024] = 1024, + [ECS_THRESHOLD_4096] = 4096, +}; + +enum { + ECS_LOG_ENTRY_TYPE_DRAM = 0x0, + ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1, +}; + +enum cxl_ecs_count_mode { + ECS_MODE_COUNTS_ROWS = 0, + ECS_MODE_COUNTS_CODEWORDS = 1, +}; + +/** + * struct cxl_ecs_params - CXL memory DDR5 ECS parameter data structure. + * @log_entry_type: ECS log entry type, per DRAM or per memory media FRU. + * @threshold: ECS threshold count per GB of memory cells. + * @count_mode: codeword/row count mode + * 0 : ECS counts rows with errors + * 1 : ECS counts codeword with errors + * @reset_counter: [IN] reset ECC counter to default value. + */ +struct cxl_ecs_params { + u8 log_entry_type; + u16 threshold; + enum cxl_ecs_count_mode count_mode; + u8 reset_counter; +}; + +struct cxl_ecs_fru_rd_attrs { + u8 ecs_cap; + __le16 ecs_config; + u8 ecs_flags; +} __packed; + +struct cxl_ecs_rd_attrs { + u8 ecs_log_cap; + struct cxl_ecs_fru_rd_attrs fru_attrs[]; +} __packed; + +struct cxl_ecs_fru_wr_attrs { + __le16 ecs_config; +} __packed; + +struct cxl_ecs_wr_attrs { + u8 ecs_log_cap; + struct cxl_ecs_fru_wr_attrs fru_attrs[]; +} __packed; + +/* CXL DDR5 ECS control functions */ +static int cxl_mem_ecs_get_attrs(struct device *dev, void *drv_data, int fru_id, + struct cxl_ecs_params *params) +{ + struct cxl_ecs_context *cxl_ecs_ctx = drv_data; + struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd; + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); + struct cxl_ecs_fru_rd_attrs *fru_rd_attrs; + size_t rd_data_size; + u8 threshold_index; + size_t data_size; + + rd_data_size = cxl_ecs_ctx->get_feat_size; + + struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) = + kmalloc(rd_data_size, GFP_KERNEL); + if (!rd_attrs) + return -ENOMEM; + + params->log_entry_type = 0; + params->threshold = 0; + params->count_mode = 0; + data_size = cxl_get_feature(mds, cxl_ecs_uuid, + CXL_GET_FEAT_SEL_CURRENT_VALUE, + rd_attrs, rd_data_size); + if (!data_size) + return -EIO; + + fru_rd_attrs = rd_attrs->fru_attrs; + params->log_entry_type = FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK, + rd_attrs->ecs_log_cap); + threshold_index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, + fru_rd_attrs[fru_id].ecs_config); + params->threshold = ecs_supp_threshold[threshold_index]; + params->count_mode = FIELD_GET(CXL_ECS_COUNT_MODE_MASK, + fru_rd_attrs[fru_id].ecs_config); + return 0; +} + +static int cxl_mem_ecs_set_attrs(struct device *dev, void *drv_data, int fru_id, + struct cxl_ecs_params *params, u8 param_type) +{ + struct cxl_ecs_context *cxl_ecs_ctx = drv_data; + struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd; + struct cxl_dev_state *cxlds = cxlmd->cxlds; + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); + struct cxl_ecs_fru_rd_attrs *fru_rd_attrs; + struct cxl_ecs_fru_wr_attrs *fru_wr_attrs; + size_t rd_data_size, wr_data_size; + u16 num_media_frus, count; + size_t data_size; + int ret; + + num_media_frus = cxl_ecs_ctx->num_media_frus; + rd_data_size = cxl_ecs_ctx->get_feat_size; + wr_data_size = cxl_ecs_ctx->set_feat_size; + struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) = + kmalloc(rd_data_size, GFP_KERNEL); + if (!rd_attrs) + return -ENOMEM; + + data_size = cxl_get_feature(mds, cxl_ecs_uuid, + CXL_GET_FEAT_SEL_CURRENT_VALUE, + rd_attrs, rd_data_size); + if (!data_size) + return -EIO; + + struct cxl_ecs_wr_attrs *wr_attrs __free(kfree) = + kmalloc(wr_data_size, GFP_KERNEL); + if (!wr_attrs) + return -ENOMEM; + + /* + * Fill writable attributes from the current attributes read + * for all the media FRUs. + */ + fru_rd_attrs = rd_attrs->fru_attrs; + fru_wr_attrs = wr_attrs->fru_attrs; + wr_attrs->ecs_log_cap = rd_attrs->ecs_log_cap; + for (count = 0; count < num_media_frus; count++) + fru_wr_attrs[count].ecs_config = fru_rd_attrs[count].ecs_config; + + /* Fill attribute to be set for the media FRU */ + switch (param_type) { + case CXL_ECS_PARAM_LOG_ENTRY_TYPE: + if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM && + params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU) { + dev_err(dev, + "Invalid CXL ECS scrub log entry type(%d) to set\n", + params->log_entry_type); + dev_err(dev, + "Log Entry Type 0: per DRAM 1: per Memory Media FRU\n"); + return -EINVAL; + } + wr_attrs->ecs_log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK, + params->log_entry_type); + break; + case CXL_ECS_PARAM_THRESHOLD: + fru_wr_attrs[fru_id].ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK; + switch (params->threshold) { + case 256: + fru_wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK, + ECS_THRESHOLD_256); + break; + case 1024: + fru_wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK, + ECS_THRESHOLD_1024); + break; + case 4096: + fru_wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK, + ECS_THRESHOLD_4096); + break; + default: + dev_err(dev, + "Invalid CXL ECS scrub threshold count(%d) to set\n", + params->threshold); + dev_err(dev, + "Supported scrub threshold counts: %u, %u, %u\n", + ecs_supp_threshold[ECS_THRESHOLD_256], + ecs_supp_threshold[ECS_THRESHOLD_1024], + ecs_supp_threshold[ECS_THRESHOLD_4096]); + return -EINVAL; + } + break; + case CXL_ECS_PARAM_MODE: + if (params->count_mode != ECS_MODE_COUNTS_ROWS && + params->count_mode != ECS_MODE_COUNTS_CODEWORDS) { + dev_err(dev, + "Invalid CXL ECS scrub mode(%d) to set\n", + params->count_mode); + dev_err(dev, + "Supported ECS Modes: 0: ECS counts rows with errors," + " 1: ECS counts codewords with errors\n"); + return -EINVAL; + } + fru_wr_attrs[fru_id].ecs_config &= ~CXL_ECS_COUNT_MODE_MASK; + fru_wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_COUNT_MODE_MASK, + params->count_mode); + break; + case CXL_ECS_PARAM_RESET_COUNTER: + if (params->reset_counter != 1) + return -EINVAL; + + fru_wr_attrs[fru_id].ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK; + fru_wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK, + params->reset_counter); + break; + default: + dev_err(dev, "Invalid CXL ECS parameter to set\n"); + return -EINVAL; + } + + ret = cxl_set_feature(mds, cxl_ecs_uuid, cxl_ecs_ctx->set_version, + wr_attrs, wr_data_size, + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET); + if (ret) { + dev_err(dev, "CXL ECS set feature failed ret=%d\n", ret); + return ret; + } + + return 0; +} + +#define CXL_ECS_READ_ATTR(attrib) \ +static int cxl_ecs_get_##attrib(struct device *dev, void *drv_data, \ + int fru_id, u32 *val) \ +{ \ + struct cxl_ecs_params params; \ + int ret; \ + \ + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms); \ + if (ret) \ + return ret; \ + \ + *val = params.attrib; \ + \ + return 0; \ +} + +CXL_ECS_READ_ATTR(log_entry_type) +CXL_ECS_READ_ATTR(count_mode) +CXL_ECS_READ_ATTR(threshold) + +#define CXL_ECS_GET_ATTR(attrib, data, attr_type) \ +static int cxl_ecs_get_##attrib(struct device *dev, void *drv_data, \ + int fru_id, u32 *val) \ +{ \ + struct cxl_ecs_params params; \ + int ret; \ + \ + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms); \ + if (ret) \ + return ret; \ + \ + if (params.data == (attr_type)) \ + *val = 1; \ + else \ + *val = 0; \ + \ + return 0; \ +} + +CXL_ECS_GET_ATTR(log_entry_type_per_dram, log_entry_type, ECS_LOG_ENTRY_TYPE_DRAM) +CXL_ECS_GET_ATTR(log_entry_type_per_memory_media, log_entry_type, ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU) +CXL_ECS_GET_ATTR(mode_counts_rows, count_mode, ECS_MODE_COUNTS_ROWS) +CXL_ECS_GET_ATTR(mode_counts_codewords, count_mode, ECS_MODE_COUNTS_CODEWORDS) + +#define CXL_ECS_WRITE_ATTR(attrib, param_type) \ +static int cxl_ecs_set_##attrib(struct device *dev, void *drv_data, \ + int fru_id, u32 val) \ +{ \ + struct cxl_ecs_params params = { \ + .attrib = val, \ + }; \ + \ + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id, ¶ms, (param_type)); \ +} +CXL_ECS_WRITE_ATTR(log_entry_type, CXL_ECS_PARAM_LOG_ENTRY_TYPE) +CXL_ECS_WRITE_ATTR(count_mode, CXL_ECS_PARAM_MODE) +CXL_ECS_WRITE_ATTR(reset_counter, CXL_ECS_PARAM_RESET_COUNTER) +CXL_ECS_WRITE_ATTR(threshold, CXL_ECS_PARAM_THRESHOLD) + +static const struct edac_ecs_ops cxl_ecs_ops = { + .get_log_entry_type = cxl_ecs_get_log_entry_type, + .set_log_entry_type = cxl_ecs_set_log_entry_type, + .get_log_entry_type_per_dram = cxl_ecs_get_log_entry_type_per_dram, + .get_log_entry_type_per_memory_media = + cxl_ecs_get_log_entry_type_per_memory_media, + .get_mode = cxl_ecs_get_count_mode, + .set_mode = cxl_ecs_set_count_mode, + .get_mode_counts_codewords = cxl_ecs_get_mode_counts_codewords, + .get_mode_counts_rows = cxl_ecs_get_mode_counts_rows, + .reset = cxl_ecs_set_reset_counter, + .get_threshold = cxl_ecs_get_threshold, + .set_threshold = cxl_ecs_set_threshold, +}; + int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr) { struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES]; struct cxl_patrol_scrub_context *cxl_ps_ctx; char cxl_dev_name[CXL_SCRUB_NAME_LEN]; + struct cxl_ecs_context *cxl_ecs_ctx; struct cxl_feat_entry feat_entry; struct cxl_memdev_state *mds; struct cxl_dev_state *cxlds; int num_ras_features = 0; + int num_media_frus; u8 scrub_inst = 0; int rc, i; @@ -341,10 +666,10 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr) rc = cxl_get_supported_feature_entry(mds, &cxl_patrol_scrub_uuid, &feat_entry); if (rc < 0) - return rc; + goto feat_scrub_done; if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE)) - return -EOPNOTSUPP; + goto feat_scrub_done; } cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL); @@ -375,6 +700,43 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr) ras_features[num_ras_features].ctx = cxl_ps_ctx; num_ras_features++; +feat_scrub_done: + if (!cxlr) { + rc = cxl_get_supported_feature_entry(mds, &cxl_ecs_uuid, + &feat_entry); + if (rc < 0) + goto feat_ecs_done; + + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE)) + goto feat_ecs_done; + num_media_frus = (feat_entry.get_feat_size - sizeof(struct cxl_ecs_rd_attrs)) / + sizeof(struct cxl_ecs_fru_rd_attrs); + if (!num_media_frus) + goto feat_ecs_done; + + cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), + GFP_KERNEL); + if (!cxl_ecs_ctx) + goto feat_ecs_done; + *cxl_ecs_ctx = (struct cxl_ecs_context) { + .get_feat_size = feat_entry.get_feat_size, + .set_feat_size = feat_entry.set_feat_size, + .get_version = feat_entry.get_feat_ver, + .set_version = feat_entry.set_feat_ver, + .set_effects = feat_entry.set_effects, + .num_media_frus = num_media_frus, + .cxlmd = cxlmd, + }; + + ras_features[num_ras_features].ft_type = RAS_FEAT_ECS; + ras_features[num_ras_features].ecs_ops = &cxl_ecs_ops; + ras_features[num_ras_features].ctx = cxl_ecs_ctx; + ras_features[num_ras_features].ecs_info.num_media_frus = + num_media_frus; + num_ras_features++; + } + +feat_ecs_done: return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL, num_ras_features, ras_features); }