diff mbox series

[v2,4/5] efi/cper: align ARM CPER type with UEFI 2.9A/2.10 specs

Message ID 1ca1274f57fac69eda2f193de53077e8254a70fe.1720679234.git.mchehab+huawei@kernel.org
State Superseded
Headers show
Series [v2,1/5] RAS: Report all ARM processor CPER information to userspace | expand

Commit Message

Mauro Carvalho Chehab July 11, 2024, 6:28 a.m. UTC
Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
was defined simply as:

Type at byte offset 4:

	- Cache error
	- TLB Error
	- Bus Error
	- Micro-architectural Error
	All other values are reserved

Yet, there was no information about how this would be encoded.

Spec 2.9A errata corrected it by defining:

	- Bit 1 - Cache Error
	- Bit 2 - TLB Error
	- Bit 3 - Bus Error
	- Bit 4 - Micro-architectural Error
	All other values are reserved

That actually aligns with the values already defined on older
versions at N.2.4.1. Generic Processor Error Section.

Spec 2.10 also preserve the same encoding as 2.9A.

Adjust CPER and GHES handling code for both generic and ARM
processors to properly handle UEFI 2.9A and 2.10 encoding.

Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 drivers/acpi/apei/ghes.c        | 15 ++++++----
 drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++-----------------
 include/linux/cper.h            | 10 +++----
 3 files changed, 38 insertions(+), 37 deletions(-)

Comments

Borislav Petkov Sept. 2, 2024, 3:27 p.m. UTC | #1
On Thu, Jul 11, 2024 at 08:28:55AM +0200, Mauro Carvalho Chehab wrote:
> Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
> was defined simply as:
> 
> Type at byte offset 4:
> 
> 	- Cache error
> 	- TLB Error
> 	- Bus Error
> 	- Micro-architectural Error
> 	All other values are reserved
> 
> Yet, there was no information about how this would be encoded.
> 
> Spec 2.9A errata corrected it by defining:
> 
> 	- Bit 1 - Cache Error
> 	- Bit 2 - TLB Error
> 	- Bit 3 - Bus Error
> 	- Bit 4 - Micro-architectural Error
> 	All other values are reserved
> 
> That actually aligns with the values already defined on older
> versions at N.2.4.1. Generic Processor Error Section.
> 
> Spec 2.10 also preserve the same encoding as 2.9A.
> 
> Adjust CPER and GHES handling code for both generic and ARM
> processors to properly handle UEFI 2.9A and 2.10 encoding.
> 
> Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  drivers/acpi/apei/ghes.c        | 15 ++++++----
>  drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++-----------------
>  include/linux/cper.h            | 10 +++----
>  3 files changed, 38 insertions(+), 37 deletions(-)

How was this thing ever tested?!?!

Geez.

drivers/acpi/apei/ghes.c: In function ‘ghes_handle_arm_hw_error’:
drivers/acpi/apei/ghes.c:565:34: error: implicit declaration of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
  565 |                                  FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
      |                                  ^~~~~~~~~
cc1: some warnings being treated as errors
make[5]: *** [scripts/Makefile.build:244: drivers/acpi/apei/ghes.o] Error 1
make[4]: *** [scripts/Makefile.build:485: drivers/acpi/apei] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [scripts/Makefile.build:485: drivers/acpi] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:485: drivers] Error 2
make[1]: *** [/mnt/kernel/kernel/2nd/linux/Makefile:1925: .] Error 2
make: *** [Makefile:224: __sub-make] Error 2
Mauro Carvalho Chehab Sept. 4, 2024, 4:45 a.m. UTC | #2
Em Mon, 2 Sep 2024 17:27:55 +0200
Borislav Petkov <bp@alien8.de> escreveu:

> On Thu, Jul 11, 2024 at 08:28:55AM +0200, Mauro Carvalho Chehab wrote:
> > Up to UEFI spec 2.9, the type byte of CPER struct for ARM processor
> > was defined simply as:
> > 
> > Type at byte offset 4:
> > 
> > 	- Cache error
> > 	- TLB Error
> > 	- Bus Error
> > 	- Micro-architectural Error
> > 	All other values are reserved
> > 
> > Yet, there was no information about how this would be encoded.
> > 
> > Spec 2.9A errata corrected it by defining:
> > 
> > 	- Bit 1 - Cache Error
> > 	- Bit 2 - TLB Error
> > 	- Bit 3 - Bus Error
> > 	- Bit 4 - Micro-architectural Error
> > 	All other values are reserved
> > 
> > That actually aligns with the values already defined on older
> > versions at N.2.4.1. Generic Processor Error Section.
> > 
> > Spec 2.10 also preserve the same encoding as 2.9A.
> > 
> > Adjust CPER and GHES handling code for both generic and ARM
> > processors to properly handle UEFI 2.9A and 2.10 encoding.
> > 
> > Link: https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/acpi/apei/ghes.c        | 15 ++++++----
> >  drivers/firmware/efi/cper-arm.c | 50 ++++++++++++++++-----------------
> >  include/linux/cper.h            | 10 +++----
> >  3 files changed, 38 insertions(+), 37 deletions(-)  
> 
> How was this thing ever tested?!?!

It was tested on ARM, together with QEMU error injection code I'm working
there. Currently, only GPIO and SEA notification types are supported, so
no x86 support yet (I'm trying to add SCI too, allowing to test on x86
as well[1]).

It sounds that bitfield.h is indirectly included from
arch/arm64/include/asm/sysreg.h when compiled on such arch.

Anyway, I'll add an include for it there.

[1] https://gitlab.com/mchehab_kernel/qemu/-/commits/TODO_x86_ghes?ref_type=heads
    code is not working yet. I'm stuck with an issue related with EDK2.
    on x86_64, it is failing to create a new fw_cfg file needed to
    store HEST error structures.

Regards,
Mauro
Borislav Petkov Sept. 4, 2024, 10:25 a.m. UTC | #3
On Wed, Sep 04, 2024 at 06:45:49AM +0200, Mauro Carvalho Chehab wrote:
> It was tested on ARM, together with QEMU error injection code I'm working
> there. Currently, only GPIO and SEA notification types are supported, so
> no x86 support yet (I'm trying to add SCI too, allowing to test on x86
> as well[1]).
> 
> It sounds that bitfield.h is indirectly included from
> arch/arm64/include/asm/sysreg.h when compiled on such arch.

Before you send patches, make sure you *at* *least* build-test them on the
affected architectures. I don't have to tell you this.

Thx.
diff mbox series

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 06d9351a9abc..d7d78d546f68 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -533,6 +533,7 @@  static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 {
 	struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
 	int flags = sync ? MF_ACTION_REQUIRED : 0;
+	char error_type[120];
 	bool queued = false;
 	int sec_sev, i;
 	char *p;
@@ -545,9 +546,8 @@  static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 	p = (char *)(err + 1);
 	for (i = 0; i < err->err_info_num; i++) {
 		struct cper_arm_err_info *err_info = (struct cper_arm_err_info *)p;
-		bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
+		bool is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
 		bool has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
-		const char *error_type = "unknown error";
 
 		/*
 		 * The field (err_info->error_info & BIT(26)) is fixed to set to
@@ -561,12 +561,15 @@  static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
 			continue;
 		}
 
-		if (err_info->type < ARRAY_SIZE(cper_proc_error_type_strs))
-			error_type = cper_proc_error_type_strs[err_info->type];
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
 
 		pr_warn_ratelimited(FW_WARN GHES_PFX
-				    "Unhandled processor error type: %s\n",
-				    error_type);
+				    "Unhandled processor error type 0x%02x: %s%s\n",
+				    err_info->type, error_type,
+				    (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
 		p += err_info->length;
 	}
 
diff --git a/drivers/firmware/efi/cper-arm.c b/drivers/firmware/efi/cper-arm.c
index eb7ee6af55f2..52d18490b59e 100644
--- a/drivers/firmware/efi/cper-arm.c
+++ b/drivers/firmware/efi/cper-arm.c
@@ -93,15 +93,11 @@  static void cper_print_arm_err_info(const char *pfx, u32 type,
 	bool proc_context_corrupt, corrected, precise_pc, restartable_pc;
 	bool time_out, access_mode;
 
-	/* If the type is unknown, bail. */
-	if (type > CPER_ARM_MAX_TYPE)
-		return;
-
 	/*
 	 * Vendor type errors have error information values that are vendor
 	 * specific.
 	 */
-	if (type == CPER_ARM_VENDOR_ERROR)
+	if (type & CPER_ARM_VENDOR_ERROR)
 		return;
 
 	if (error_info & CPER_ARM_ERR_VALID_TRANSACTION_TYPE) {
@@ -116,43 +112,38 @@  static void cper_print_arm_err_info(const char *pfx, u32 type,
 	if (error_info & CPER_ARM_ERR_VALID_OPERATION_TYPE) {
 		op_type = ((error_info >> CPER_ARM_ERR_OPERATION_SHIFT)
 			   & CPER_ARM_ERR_OPERATION_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_cache_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%scache error, operation type: %s\n", pfx,
 				       arm_cache_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_TLB_ERROR:
+		}
+		if (type & CPER_ARM_TLB_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_tlb_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sTLB error, operation type: %s\n", pfx,
 				       arm_tlb_err_op_strs[op_type]);
 			}
-			break;
-		case CPER_ARM_BUS_ERROR:
+		}
+		if (type & CPER_ARM_BUS_ERROR) {
 			if (op_type < ARRAY_SIZE(arm_bus_err_op_strs)) {
-				printk("%soperation type: %s\n", pfx,
+				printk("%sbus error, operation type: %s\n", pfx,
 				       arm_bus_err_op_strs[op_type]);
 			}
-			break;
 		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_LEVEL) {
 		level = ((error_info >> CPER_ARM_ERR_LEVEL_SHIFT)
 			 & CPER_ARM_ERR_LEVEL_MASK);
-		switch (type) {
-		case CPER_ARM_CACHE_ERROR:
+		if (type & CPER_ARM_CACHE_ERROR)
 			printk("%scache level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_TLB_ERROR:
+
+		if (type & CPER_ARM_TLB_ERROR)
 			printk("%sTLB level: %d\n", pfx, level);
-			break;
-		case CPER_ARM_BUS_ERROR:
+
+		if (type & CPER_ARM_BUS_ERROR)
 			printk("%saffinity level at which the bus error occurred: %d\n",
 			       pfx, level);
-			break;
-		}
 	}
 
 	if (error_info & CPER_ARM_ERR_VALID_PROC_CONTEXT_CORRUPT) {
@@ -241,6 +232,7 @@  void cper_print_proc_arm(const char *pfx,
 	struct cper_arm_err_info *err_info;
 	struct cper_arm_ctx_info *ctx_info;
 	char newpfx[64], infopfx[ARRAY_SIZE(newpfx) + 1];
+	char error_type[120];
 
 	printk("%sMIDR: 0x%016llx\n", pfx, proc->midr);
 
@@ -289,9 +281,15 @@  void cper_print_proc_arm(const char *pfx,
 				       newpfx);
 		}
 
-		printk("%serror_type: %d, %s\n", newpfx, err_info->type,
-			err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ?
-			cper_proc_error_type_strs[err_info->type] : "unknown");
+		cper_bits_to_str(error_type, sizeof(error_type),
+				 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+				 cper_proc_error_type_strs,
+				 ARRAY_SIZE(cper_proc_error_type_strs));
+
+		printk("%serror_type: 0x%02x: %s%s\n", newpfx, err_info->type,
+		       error_type,
+		       (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
+
 		if (err_info->validation_bits & CPER_ARM_INFO_VALID_ERR_INFO) {
 			printk("%serror_info: 0x%016llx\n", newpfx,
 			       err_info->error_info);
diff --git a/include/linux/cper.h b/include/linux/cper.h
index 25858a7608b7..3670b866ac11 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -293,11 +293,11 @@  enum {
 #define CPER_ARM_INFO_FLAGS_PROPAGATED		BIT(2)
 #define CPER_ARM_INFO_FLAGS_OVERFLOW		BIT(3)
 
-#define CPER_ARM_CACHE_ERROR			0
-#define CPER_ARM_TLB_ERROR			1
-#define CPER_ARM_BUS_ERROR			2
-#define CPER_ARM_VENDOR_ERROR			3
-#define CPER_ARM_MAX_TYPE			CPER_ARM_VENDOR_ERROR
+#define CPER_ARM_ERR_TYPE_MASK			GENMASK(4,1)
+#define CPER_ARM_CACHE_ERROR			BIT(1)
+#define CPER_ARM_TLB_ERROR			BIT(2)
+#define CPER_ARM_BUS_ERROR			BIT(3)
+#define CPER_ARM_VENDOR_ERROR			BIT(4)
 
 #define CPER_ARM_ERR_VALID_TRANSACTION_TYPE	BIT(0)
 #define CPER_ARM_ERR_VALID_OPERATION_TYPE	BIT(1)