Message ID | 20240228-cxl-cper3-v1-2-6aa3f1343c6c@intel.com |
---|---|
State | New |
Headers | show |
Series | efi/cxl-cper: Report CXL CPER events through tracing | expand |
Ira Weiny wrote: > BIOS can configure memory devices as firmware first. This will send CXL > events to the firmware instead of the OS. The firmware can then send > these events to the OS via UEFI. Currently a configuration such as this > will trace a non standard event in the log. Using the specific CXL > trace points with the additional information CXL can provide is much > more useful to users. Specifically, future support can be added to CXL > provide the DPA to HPA mapping configured at the time of the event. One could argue that support should have happened first and taken the event all the way to EDAC, so this needs to merged on faith that that those patches are in flight. > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) > format for CXL Component Events. The format is mostly the same as the > CXL Common Event Record Format. The difference is the use of a GUID in > the Section Type rather than a UUID as part of the event itself. > > Add GHES support to detect CXL CPER records and call into the CXL code > to process the event. > > Multiple methods were considered for the call into the CXL code. A > notifier chain was considered for the callback but the complexity did > not justify the use case. Not sure what this adds. If you want to talk about alternatives and tradeoffs, great, but that should be a comparative analysis in support of the chosen direction. > Furthermore, the CXL code is required to be called from process > context as it needs to take a device lock so a simple callback > register proved difficult. Dan Williams suggested using 2 work items > as an atomic way of switching between a callback being registered and > not. This allows the callback to run without any locking.[1] > > Note that a local work item is required to dump any messages seen during > a race between any check done in cxl_cper_post_event() and the > scheduling of work. That said, no attempt is made to stop the addition > of messages into the kfifo because this local work item provides a hook > to add a local CXL CPER trace point in a future patch. > > This new combined patch addresses the report by Dan Carpenter[2]. Thus > the reported by tag. > > [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch/ > [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/ > > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Borislav Petkov <bp@alien8.de> > Reported-by: Dan Carpenter <dan.carpenter@linaro.org> checkpatch will whine about a missing Closes: tag. > Suggested-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > --- > [djbw: use kfifo for record data] > [djbw: Use work struct for sync between cxl reg and ghes code] > --- > drivers/acpi/apei/ghes.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/cxl-event.h | 18 +++++++ > 2 files changed, 145 insertions(+) > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > index ab2a82cb1b0b..f433f4eae888 100644 > --- a/drivers/acpi/apei/ghes.c > +++ b/drivers/acpi/apei/ghes.c > @@ -26,6 +26,7 @@ > #include <linux/interrupt.h> > #include <linux/timer.h> > #include <linux/cper.h> > +#include <linux/cxl-event.h> > #include <linux/platform_device.h> > #include <linux/mutex.h> > #include <linux/ratelimit.h> > @@ -33,6 +34,7 @@ > #include <linux/irq_work.h> > #include <linux/llist.h> > #include <linux/genalloc.h> > +#include <linux/kfifo.h> > #include <linux/pci.h> > #include <linux/pfn.h> > #include <linux/aer.h> > @@ -673,6 +675,116 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata, > schedule_work(&entry->work); > } > > +/* CXL Event record UUIDs are formated as GUIDs and reported in section type */ > + > +/* > + * General Media Event Record > + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43 > + */ > +#define CPER_SEC_CXL_GEN_MEDIA_GUID \ > + GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \ > + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6) > + > +/* > + * DRAM Event Record > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > + */ > +#define CPER_SEC_CXL_DRAM_GUID \ > + GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \ > + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24) > + > +/* > + * Memory Module Event Record > + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45 > + */ > +#define CPER_SEC_CXL_MEM_MODULE_GUID \ > + GUID_INIT(0xfe927475, 0xdd59, 0x4339, \ > + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74) > + > +struct cxl_cper_work_data { > + enum cxl_event_type event_type; > + struct cxl_cper_event_rec rec; > +}; > + > +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32); > +static DEFINE_SPINLOCK(cxl_cper_read_lock); > +static DEFINE_SPINLOCK(cxl_cper_write_lock); > + > +static cxl_cper_callback cper_callback; > +/* cb function dumps the records */ > +static void cxl_cper_cb_fn(struct work_struct *work) > +{ > + struct cxl_cper_work_data wd; > + > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, > + &cxl_cper_read_lock)) { Why is this taking the lock on retrieval? The work item is single-threaded. The only potential contention is between cxl_cper_local_fn() and cxl_cper_cb_fn() collision, but that can be handled by a cancel_work_sync(&cxl_local_work) on registration to pair with the cancel_work_sync(&cxl_cb_work) on unregistration. ...but I am not sure 2 work items are needed unless some default processing is going to happen in the "local" case. > + cper_callback(wd.event_type, &wd.rec); > + } > +} > +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn); > + > +static void cxl_cper_local_fn(struct work_struct *work) > +{ > + struct cxl_cper_work_data wd; > + > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, > + &cxl_cper_read_lock)) { This just looks like open coded / less efficient kfifo_reset_out(). > + /* drop msg */ If the proposal is to do nothing when no callback is registered then no need to have have cxl_local_work at all. > + } > +} > +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn); > + > +/* Pointer for atomic switch of record processing */ > +struct work_struct *cxl_cper_work = &cxl_local_work; > + > +static void cxl_cper_post_event(enum cxl_event_type event_type, > + struct cxl_cper_event_rec *rec) > +{ > + struct cxl_cper_work_data wd; > + > + if (rec->hdr.length <= sizeof(rec->hdr) || > + rec->hdr.length > sizeof(*rec)) { > + pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n", > + rec->hdr.length); > + return; > + } > + > + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) { > + pr_err(FW_WARN "CXL CPER invalid event\n"); > + return; > + } > + > + wd.event_type = event_type; > + memcpy(&wd.rec, rec, sizeof(wd.rec)); Unfortunate to have a double copy of the record into the stack variable and then again into the kfifo, but I can not immediately think of a way around that. > + > + kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock); > + schedule_work(cxl_cper_work); I think you don't need 2 work items if you write it this way: spin_lock_irqsave(&cxl_cper_write_lock, flags); if (cxl_cper_work) { if (kfifo_put(&cxl_cper_fifo, &wd)) schedule_work(cxl_cper_work); else pr_err_ratelimited( "buffer overflow when queuing CXL CPER record\n"); } spin_lock_irqrestore(&cxl_cper_write_lock, flags); ...and then take the write lock when modifying that cxl_cper_work pointer between NULL and non-NULL.
Dan Williams wrote: > Ira Weiny wrote: > > BIOS can configure memory devices as firmware first. This will send CXL > > events to the firmware instead of the OS. The firmware can then send > > these events to the OS via UEFI. Currently a configuration such as this > > will trace a non standard event in the log. Using the specific CXL > > trace points with the additional information CXL can provide is much ^^^^^^^^^^^^^^^^^^^^^^^^ See below. > > more useful to users. > > Specifically, future support can be added to CXL > > provide the DPA to HPA mapping configured at the time of the event. > > One could argue that support should have happened first and taken the > event all the way to EDAC, so this needs to merged on faith that that > those patches are in flight. Sure but then you also don't get the additional information already provided by the CXL trace points. Reading this back I see that my use of 'Specifically' masked the fact that the CXL code already has better support to decode these records and we want to leverage that. > > > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) > > format for CXL Component Events. The format is mostly the same as the > > CXL Common Event Record Format. The difference is the use of a GUID in > > the Section Type rather than a UUID as part of the event itself. > > > > Add GHES support to detect CXL CPER records and call into the CXL code > > to process the event. > > > > Multiple methods were considered for the call into the CXL code. A > > notifier chain was considered for the callback but the complexity did > > not justify the use case. > > Not sure what this adds. If you want to talk about alternatives and > tradeoffs, great, but that should be a comparative analysis in support > of the chosen direction. I'll remove it. This was carried from the previous versions. > > > Furthermore, the CXL code is required to be called from process > > context as it needs to take a device lock so a simple callback > > register proved difficult. Dan Williams suggested using 2 work items > > as an atomic way of switching between a callback being registered and > > not. This allows the callback to run without any locking.[1] > > > > Note that a local work item is required to dump any messages seen during > > a race between any check done in cxl_cper_post_event() and the > > scheduling of work. That said, no attempt is made to stop the addition > > of messages into the kfifo because this local work item provides a hook > > to add a local CXL CPER trace point in a future patch. > > > > This new combined patch addresses the report by Dan Carpenter[2]. Thus > > the reported by tag. > > > > [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch/ > > [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/ > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org> > > Cc: Tony Luck <tony.luck@intel.com> > > Cc: Borislav Petkov <bp@alien8.de> > > Reported-by: Dan Carpenter <dan.carpenter@linaro.org> > > checkpatch will whine about a missing Closes: tag. I know. How about Suggested-by? I want to give Dan credit because your revert already closed the actual bug. [snip] > > + > > +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32); > > +static DEFINE_SPINLOCK(cxl_cper_read_lock); > > +static DEFINE_SPINLOCK(cxl_cper_write_lock); > > + > > +static cxl_cper_callback cper_callback; > > +/* cb function dumps the records */ > > +static void cxl_cper_cb_fn(struct work_struct *work) > > +{ > > + struct cxl_cper_work_data wd; > > + > > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, > > + &cxl_cper_read_lock)) { > > Why is this taking the lock on retrieval? The work item is > single-threaded. The only potential contention is between > cxl_cper_local_fn() and cxl_cper_cb_fn() collision, but that can be ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Because of this collision, Yes. > handled by a cancel_work_sync(&cxl_local_work) on registration to pair > with the cancel_work_sync(&cxl_cb_work) on unregistration. Ok yea I could use cancel_work_sync(). > > ...but I am not sure 2 work items are needed unless some default > processing is going to happen in the "local" case. I thought about merging patch 3 in this one and that would make the use of this work function more clear. > > > + cper_callback(wd.event_type, &wd.rec); > > + } > > +} > > +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn); > > + > > +static void cxl_cper_local_fn(struct work_struct *work) > > +{ > > + struct cxl_cper_work_data wd; > > + > > + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, > > + &cxl_cper_read_lock)) { > > This just looks like open coded / less efficient kfifo_reset_out(). > > > + /* drop msg */ > > If the proposal is to do nothing when no callback is registered then no > need to have have cxl_local_work at all. Patch 4 makes use of this. > > > + } > > +} > > +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn); > > + > > +/* Pointer for atomic switch of record processing */ > > +struct work_struct *cxl_cper_work = &cxl_local_work; > > + > > +static void cxl_cper_post_event(enum cxl_event_type event_type, > > + struct cxl_cper_event_rec *rec) > > +{ > > + struct cxl_cper_work_data wd; > > + > > + if (rec->hdr.length <= sizeof(rec->hdr) || > > + rec->hdr.length > sizeof(*rec)) { > > + pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n", > > + rec->hdr.length); > > + return; > > + } > > + > > + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) { > > + pr_err(FW_WARN "CXL CPER invalid event\n"); > > + return; > > + } > > + > > + wd.event_type = event_type; > > + memcpy(&wd.rec, rec, sizeof(wd.rec)); > > Unfortunate to have a double copy of the record into the stack variable > and then again into the kfifo, but I can not immediately think of a way > around that. Yep. I could not either. > > > + > > + kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock); > > + schedule_work(cxl_cper_work); > > I think you don't need 2 work items if you write it this way: > > spin_lock_irqsave(&cxl_cper_write_lock, flags); > if (cxl_cper_work) { > if (kfifo_put(&cxl_cper_fifo, &wd)) > schedule_work(cxl_cper_work); > else > pr_err_ratelimited( > "buffer overflow when queuing CXL CPER record\n"); > } > spin_lock_irqrestore(&cxl_cper_write_lock, flags); > > ...and then take the write lock when modifying that cxl_cper_work > pointer between NULL and non-NULL. I'll merge patch 4 here. Then this will be used. Ira
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index ab2a82cb1b0b..f433f4eae888 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -26,6 +26,7 @@ #include <linux/interrupt.h> #include <linux/timer.h> #include <linux/cper.h> +#include <linux/cxl-event.h> #include <linux/platform_device.h> #include <linux/mutex.h> #include <linux/ratelimit.h> @@ -33,6 +34,7 @@ #include <linux/irq_work.h> #include <linux/llist.h> #include <linux/genalloc.h> +#include <linux/kfifo.h> #include <linux/pci.h> #include <linux/pfn.h> #include <linux/aer.h> @@ -673,6 +675,116 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata, schedule_work(&entry->work); } +/* CXL Event record UUIDs are formated as GUIDs and reported in section type */ + +/* + * General Media Event Record + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43 + */ +#define CPER_SEC_CXL_GEN_MEDIA_GUID \ + GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \ + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6) + +/* + * DRAM Event Record + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 + */ +#define CPER_SEC_CXL_DRAM_GUID \ + GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \ + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24) + +/* + * Memory Module Event Record + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45 + */ +#define CPER_SEC_CXL_MEM_MODULE_GUID \ + GUID_INIT(0xfe927475, 0xdd59, 0x4339, \ + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74) + +struct cxl_cper_work_data { + enum cxl_event_type event_type; + struct cxl_cper_event_rec rec; +}; + +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, 32); +static DEFINE_SPINLOCK(cxl_cper_read_lock); +static DEFINE_SPINLOCK(cxl_cper_write_lock); + +static cxl_cper_callback cper_callback; +/* cb function dumps the records */ +static void cxl_cper_cb_fn(struct work_struct *work) +{ + struct cxl_cper_work_data wd; + + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, + &cxl_cper_read_lock)) { + cper_callback(wd.event_type, &wd.rec); + } +} +static DECLARE_WORK(cxl_cb_work, cxl_cper_cb_fn); + +static void cxl_cper_local_fn(struct work_struct *work) +{ + struct cxl_cper_work_data wd; + + while (kfifo_out_spinlocked(&cxl_cper_fifo, &wd, 1, + &cxl_cper_read_lock)) { + /* drop msg */ + } +} +static DECLARE_WORK(cxl_local_work, cxl_cper_local_fn); + +/* Pointer for atomic switch of record processing */ +struct work_struct *cxl_cper_work = &cxl_local_work; + +static void cxl_cper_post_event(enum cxl_event_type event_type, + struct cxl_cper_event_rec *rec) +{ + struct cxl_cper_work_data wd; + + if (rec->hdr.length <= sizeof(rec->hdr) || + rec->hdr.length > sizeof(*rec)) { + pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n", + rec->hdr.length); + return; + } + + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) { + pr_err(FW_WARN "CXL CPER invalid event\n"); + return; + } + + wd.event_type = event_type; + memcpy(&wd.rec, rec, sizeof(wd.rec)); + + kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_write_lock); + schedule_work(cxl_cper_work); +} + +int cxl_cper_register_callback(cxl_cper_callback callback) +{ + if (cper_callback) + return -EINVAL; + cper_callback = callback; + /* Atomic switch back to callback processing */ + cxl_cper_work = &cxl_cb_work; + return 0; +} +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL); + +int cxl_cper_unregister_callback(cxl_cper_callback callback) +{ + if (callback != cper_callback) + return -EINVAL; + + /* Atomic switch back to ghes processing */ + cxl_cper_work = &cxl_local_work; + cancel_work_sync(&cxl_cb_work); + cper_callback = NULL; + return 0; +} +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL); + static bool ghes_do_proc(struct ghes *ghes, const struct acpi_hest_generic_status *estatus) { @@ -707,6 +819,21 @@ static bool ghes_do_proc(struct ghes *ghes, } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) { queued = ghes_handle_arm_hw_error(gdata, sev, sync); + } + else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) { + struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata); + + cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec); + } + else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) { + struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata); + + cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec); + } + else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) { + struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata); + + cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec); } else { void *err = acpi_hest_get_payload(gdata); diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h index 812ed16ffc2f..4834cf23656e 100644 --- a/include/linux/cxl-event.h +++ b/include/linux/cxl-event.h @@ -143,4 +143,22 @@ struct cxl_cper_event_rec { union cxl_event event; } __packed; +typedef void (*cxl_cper_callback)(enum cxl_event_type type, + struct cxl_cper_event_rec *rec); + +#ifdef CONFIG_ACPI_APEI_GHES +int cxl_cper_register_callback(cxl_cper_callback callback); +int cxl_cper_unregister_callback(cxl_cper_callback callback); +#else +static inline int cxl_cper_register_callback(cxl_cper_callback callback) +{ + return 0; +} + +static inline int cxl_cper_unregister_callback(cxl_cper_callback callback) +{ + return 0; +} +#endif + #endif /* _LINUX_CXL_EVENT_H */
BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to the OS via UEFI. Currently a configuration such as this will trace a non standard event in the log. Using the specific CXL trace points with the additional information CXL can provide is much more useful to users. Specifically, future support can be added to CXL provide the DPA to HPA mapping configured at the time of the event. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference is the use of a GUID in the Section Type rather than a UUID as part of the event itself. Add GHES support to detect CXL CPER records and call into the CXL code to process the event. Multiple methods were considered for the call into the CXL code. A notifier chain was considered for the callback but the complexity did not justify the use case. Furthermore, the CXL code is required to be called from process context as it needs to take a device lock so a simple callback register proved difficult. Dan Williams suggested using 2 work items as an atomic way of switching between a callback being registered and not. This allows the callback to run without any locking.[1] Note that a local work item is required to dump any messages seen during a race between any check done in cxl_cper_post_event() and the scheduling of work. That said, no attempt is made to stop the addition of messages into the kfifo because this local work item provides a hook to add a local CXL CPER trace point in a future patch. This new combined patch addresses the report by Dan Carpenter[2]. Thus the reported by tag. [1] https://lore.kernel.org/all/65d111eb87115_6c745294ac@dwillia2-xfh.jf.intel.com.notmuch/ [2] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/ Cc: Ard Biesheuvel <ardb@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> --- [djbw: use kfifo for record data] [djbw: Use work struct for sync between cxl reg and ghes code] --- drivers/acpi/apei/ghes.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++ include/linux/cxl-event.h | 18 +++++++ 2 files changed, 145 insertions(+)