Message ID | 1ca1274f57fac69eda2f193de53077e8254a70fe.1720679234.git.mchehab+huawei@kernel.org |
---|---|
State | Superseded |
Headers | show |
Series | [v2,1/5] RAS: Report all ARM processor CPER information to userspace | expand |
On Thu, Jul 11, 2024 at 08:28:55AM +0200, Mauro Carvalho Chehab wrote: > Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor > was defined simply as: > > Type at byte offset 4: > > - Cache error > - TLB Error > - Bus Error > - Micro-architectural Error > All other values are reserved > > Yet, there was no information about how this would be encoded. > > Spec 2.9A errata corrected it by defining: > > - Bit 1 - Cache Error > - Bit 2 - TLB Error > - Bit 3 - Bus Error > - Bit 4 - Micro-architectural Error > All other values are reserved > > That actually aligns with the values already defined on older > versions at N.2.4.1. Generic Processor Error Section. > > Spec 2.10 also preserve the same encoding as 2.9A. > > Adjust CPER and GHES handling code for both generic and ARM > processors to properly handle UEFI 2.9A and 2.10 encoding. > > Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > --- > drivers/acpi/apei/ghes.c | 15 ++++++---- > drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++----------------- > include/linux/cper.h | 10 +++---- > 3 files changed, 38 insertions(+), 37 deletions(-) How was this thing ever tested?!?! Geez. drivers/acpi/apei/ghes.c: In function ‘ghes_handle_arm_hw_error’: drivers/acpi/apei/ghes.c:565:34: error: implicit declaration of function ‘FIELD_GET’ [-Werror=implicit-function-declaration] 565 | FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type), | ^~~~~~~~~ cc1: some warnings being treated as errors make[5]: *** [scripts/Makefile.build:244: drivers/acpi/apei/ghes.o] Error 1 make[4]: *** [scripts/Makefile.build:485: drivers/acpi/apei] Error 2 make[4]: *** Waiting for unfinished jobs.... make[3]: *** [scripts/Makefile.build:485: drivers/acpi] Error 2 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [scripts/Makefile.build:485: drivers] Error 2 make[1]: *** [/mnt/kernel/kernel/2nd/linux/Makefile:1925: .] Error 2 make: *** [Makefile:224: __sub-make] Error 2
Em Mon, 2 Sep 2024 17:27:55 +0200 Borislav Petkov <bp@alien8.de> escreveu: > On Thu, Jul 11, 2024 at 08:28:55AM +0200, Mauro Carvalho Chehab wrote: > > Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor > > was defined simply as: > > > > Type at byte offset 4: > > > > - Cache error > > - TLB Error > > - Bus Error > > - Micro-architectural Error > > All other values are reserved > > > > Yet, there was no information about how this would be encoded. > > > > Spec 2.9A errata corrected it by defining: > > > > - Bit 1 - Cache Error > > - Bit 2 - TLB Error > > - Bit 3 - Bus Error > > - Bit 4 - Micro-architectural Error > > All other values are reserved > > > > That actually aligns with the values already defined on older > > versions at N.2.4.1. Generic Processor Error Section. > > > > Spec 2.10 also preserve the same encoding as 2.9A. > > > > Adjust CPER and GHES handling code for both generic and ARM > > processors to properly handle UEFI 2.9A and 2.10 encoding. > > > > Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > --- > > drivers/acpi/apei/ghes.c | 15 ++++++---- > > drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++----------------- > > include/linux/cper.h | 10 +++---- > > 3 files changed, 38 insertions(+), 37 deletions(-) > > How was this thing ever tested?!?! It was tested on ARM, together with QEMU error injection code I'm working there. Currently, only GPIO and SEA notification types are supported, so no x86 support yet (I'm trying to add SCI too, allowing to test on x86 as well[1]). It sounds that bitfield.h is indirectly included from arch/arm64/include/asm/sysreg.h when compiled on such arch. Anyway, I'll add an include for it there. [1] https://gitlab.com/mchehab_kernel/qemu/-/commits/TODO_x86_ghes?ref_type=heads code is not working yet. I'm stuck with an issue related with EDK2. on x86_64, it is failing to create a new fw_cfg file needed to store HEST error structures. Regards, Mauro
On Wed, Sep 04, 2024 at 06:45:49AM +0200, Mauro Carvalho Chehab wrote: > It was tested on ARM, together with QEMU error injection code I'm working > there. Currently, only GPIO and SEA notification types are supported, so > no x86 support yet (I'm trying to add SCI too, allowing to test on x86 > as well[1]). > > It sounds that bitfield.h is indirectly included from > arch/arm64/include/asm/sysreg.h when compiled on such arch. Before you send patches, make sure you *at* *least* build-test them on the affected architectures. I don't have to tell you this. Thx.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 06d9351a9abc..d7d78d546f68 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -533,6 +533,7 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, { struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata); int flags = sync ? MF_ACTION_REQUIRED : 0; + char error_type[120]; bool queued = false; int sec_sev, i; char *p; @@ -545,9 +546,8 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, p = (char *)(err + 1); for (i = 0; i < err->err_info_num; i++) { struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p; - bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR); + bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR; bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR); - const char *error_type = "unknown error"; /* * The field (err_info->error_info & BIT(26)) is fixed to set to @@ -561,12 +561,15 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, continue; } - if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs)) - error_type = cper_proc_error_type_strs[err_info->type]; + cper_bits_to_str(error_type, sizeof(error_type), + FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type), + cper_proc_error_type_strs, + ARRAY_SIZE(cper_proc_error_type_strs)); pr_warn_ratelimited(FW_WARN GHES_PFX - "Unhandled processor error type: %s\n", - error_type); + "Unhandled processor error type 0x%02x: %s%s\n", + err_info->type, error_type, + (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : ""); p += err_info->length; } diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c index eb7ee6af55f2..52d18490b59e 100644 --- a/drivers/firmware/efi/cper-arm.c +++ b/drivers/firmware/efi/cper-arm.c @@ -93,15 +93,11 @@ static void cper_print_arm_err_info(const char *pfx, u32 type, bool proc_context_corrupt, corrected, precise_pc, restartable_pc; bool time_out, access_mode; - /* If the type is unknown, bail. */ - if (type > CPER_ARM_MAX_TYPE) - return; - /* * Vendor type errors have error information values that are vendor * specific. */ - if (type == CPER_ARM_VENDOR_ERROR) + if (type & CPER_ARM_VENDOR_ERROR) return; if (error_info & CPER_ARM_ERR_VALID_TRANSACTION_TYPE) { @@ -116,43 +112,38 @@ static void cper_print_arm_err_info(const char *pfx, u32 type, if (error_info & CPER_ARM_ERR_VALID_OPERATION_TYPE) { op_type = ((error_info >> CPER_ARM_ERR_OPERATION_SHIFT) & CPER_ARM_ERR_OPERATION_MASK); - switch (type) { - case CPER_ARM_CACHE_ERROR: + if (type & CPER_ARM_CACHE_ERROR) { if (op_type < ARRAY_SIZE(arm_cache_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%scache error, operation type: %s\n", pfx, arm_cache_err_op_strs[op_type]); } - break; - case CPER_ARM_TLB_ERROR: + } + if (type & CPER_ARM_TLB_ERROR) { if (op_type < ARRAY_SIZE(arm_tlb_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%sTLB error, operation type: %s\n", pfx, arm_tlb_err_op_strs[op_type]); } - break; - case CPER_ARM_BUS_ERROR: + } + if (type & CPER_ARM_BUS_ERROR) { if (op_type < ARRAY_SIZE(arm_bus_err_op_strs)) { - printk("%soperation type: %s\n", pfx, + printk("%sbus error, operation type: %s\n", pfx, arm_bus_err_op_strs[op_type]); } - break; } } if (error_info & CPER_ARM_ERR_VALID_LEVEL) { level = ((error_info >> CPER_ARM_ERR_LEVEL_SHIFT) & CPER_ARM_ERR_LEVEL_MASK); - switch (type) { - case CPER_ARM_CACHE_ERROR: + if (type & CPER_ARM_CACHE_ERROR) printk("%scache level: %d\n", pfx, level); - break; - case CPER_ARM_TLB_ERROR: + + if (type & CPER_ARM_TLB_ERROR) printk("%sTLB level: %d\n", pfx, level); - break; - case CPER_ARM_BUS_ERROR: + + if (type & CPER_ARM_BUS_ERROR) printk("%saffinity level at which the bus error occurred: %d\n", pfx, level); - break; - } } if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) { @@ -241,6 +232,7 @@ void cper_print_proc_arm(const char *pfx, struct cper_arm_err_info *err_info; struct cper_arm_ctx_info *ctx_info; char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1]; + char error_type[120]; printk("%sMIDR: 0x%016llx\n", pfx, proc->midr); @@ -289,9 +281,15 @@ void cper_print_proc_arm(const char *pfx, newpfx); } - printk("%serror_type: %d, %s\n", newpfx, err_info->type, - err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ? - cper_proc_error_type_strs[err_info->type] : "unknown"); + cper_bits_to_str(error_type, sizeof(error_type), + FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type), + cper_proc_error_type_strs, + ARRAY_SIZE(cper_proc_error_type_strs)); + + printk("%serror_type: 0x%02x: %s%s\n", newpfx, err_info->type, + error_type, + (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : ""); + if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) { printk("%serror_info: 0x%016llx\n", newpfx, err_info->error_info); diff --git a/include/linux/cper.h b/include/linux/cper.h index 25858a7608b7..3670b866ac11 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -293,11 +293,11 @@ enum { #define CPER_ARM_INFO_FLAGS_PROPAGATED BIT(2) #define CPER_ARM_INFO_FLAGS_OVERFLOW BIT(3) -#define CPER_ARM_CACHE_ERROR 0 -#define CPER_ARM_TLB_ERROR 1 -#define CPER_ARM_BUS_ERROR 2 -#define CPER_ARM_VENDOR_ERROR 3 -#define CPER_ARM_MAX_TYPE CPER_ARM_VENDOR_ERROR +#define CPER_ARM_ERR_TYPE_MASK GENMASK(4,1) +#define CPER_ARM_CACHE_ERROR BIT(1) +#define CPER_ARM_TLB_ERROR BIT(2) +#define CPER_ARM_BUS_ERROR BIT(3) +#define CPER_ARM_VENDOR_ERROR BIT(4) #define CPER_ARM_ERR_VALID_TRANSACTION_TYPE BIT(0) #define CPER_ARM_ERR_VALID_OPERATION_TYPE BIT(1)