Message ID | 20240326212640.96920-3-john.allen@amd.com |
---|---|
State | Superseded |
Series | PRM handler direct call interface |
On 3/26/24 17:26, John Allen wrote:
> Future AMD platforms will provide a UEFI PRM module that implements a
> number of address translation PRM handlers. This will provide an
> interface for the OS to call platform specific code without requiring
> the use of SMM or other heavy firmware operations.
>
> AMD Zen-based systems report memory error addresses through Machine
> Check banks representing Unified Memory Controllers (UMCs) in the form
> of UMC relative "normalized" addresses. A normalized address must be
> converted to a system physical address to be usable by the OS.
>
> Add support for the normalized to system physical address translation
> PRM handler in the AMD Address Translation Library and prefer it over
> native code if available. The GUID and parameter buffer structure are
> specific to the normalized to system physical address handler provided
> by the address translation PRM module included in future AMD systems.
>
> The address translation PRM module is documented in chapter 22 of the
> publicly available "AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
> ACPI v6.5 Porting Guide":
> https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/58088-0.75-pub.pdf
>
> Signed-off-by: John Allen <john.allen@amd.com>
> ---
>  drivers/ras/amd/atl/Makefile   |  1 +
>  drivers/ras/amd/atl/internal.h |  2 ++
>  drivers/ras/amd/atl/prm.c      | 61 ++++++++++++++++++++++++++++++++++
>  drivers/ras/amd/atl/umc.c      |  5 +++
>  4 files changed, 69 insertions(+)
>  create mode 100644 drivers/ras/amd/atl/prm.c
>
> diff --git a/drivers/ras/amd/atl/Makefile b/drivers/ras/amd/atl/Makefile
> index 4acd5f05bd9c..8f1afa793e3b 100644
> --- a/drivers/ras/amd/atl/Makefile
> +++ b/drivers/ras/amd/atl/Makefile
> @@ -14,5 +14,6 @@ amd_atl-y += denormalize.o
>  amd_atl-y += map.o
>  amd_atl-y += system.o
>  amd_atl-y += umc.o
> +amd_atl-y += prm.o
>
>  obj-$(CONFIG_AMD_ATL) += amd_atl.o
> diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
> index 5de69e0bb0f9..f739dcada126 100644
> --- a/drivers/ras/amd/atl/internal.h
> +++ b/drivers/ras/amd/atl/internal.h
> @@ -234,6 +234,8 @@ int dehash_address(struct addr_ctx *ctx);
>  unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr);
>  unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
>
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr);
> +
>  /*
>   * Make a gap in @data that is @num_bits long starting at @bit_num.
>   * e.g. data = 11111111'b
> diff --git a/drivers/ras/amd/atl/prm.c b/drivers/ras/amd/atl/prm.c
> new file mode 100644
> index 000000000000..54a69e660eb5
> --- /dev/null
> +++ b/drivers/ras/amd/atl/prm.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * AMD Address Translation Library
> + *
> + * prm.c : Plumbing code to UEFI Platform Runtime Mechanism (PRM)
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: John Allen <john.allen@amd.com>
> + */
> +
> +#include "internal.h"
> +
> +#if defined(CONFIG_ACPI_PRMT)
> +
> +#include <linux/prmt.h>
> +
> +struct prm_umc_param_buffer_norm {
> +	u64 norm_addr;
> +	u8 socket;
> +	u64 umc_bank_inst_id;
> +	void *output_buffer;
> +} __packed;
> +
> +const guid_t norm_to_sys_prm_handler_guid = GUID_INIT(0xE7180659, 0xA65D,

Use the static keyword since this is only used in the current file.

> +						      0x451D, 0x92, 0xCD,
> +						      0x2B, 0x56, 0xF1, 0x2B,
> +						      0xEB, 0xA6);
> +
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
> +{
> +	struct prm_umc_param_buffer_norm param_buffer;
> +	unsigned long ret_addr;
> +	int ret;
> +
> +	param_buffer.norm_addr = addr;
> +	param_buffer.socket = socket_id;
> +	param_buffer.umc_bank_inst_id = umc_bank_inst_id;
> +	param_buffer.output_buffer = &ret_addr;
> +
> +	ret = acpi_call_prm_handler(norm_to_sys_prm_handler_guid, &param_buffer);
> +	if (!ret)
> +		return ret_addr;
> +
> +	if (ret == -ENODEV)
> +		pr_info("PRM module/handler not available\n");

Make this a pr_debug(). I don't think this is something a user could do
anything about. And one goal of this library is to abstract how the
functions work. So "trying different backends" is a library developer
concern.

> +	else
> +		pr_info("PRM address translation failed\n");

Make this a pr_notice_once(). If the handler is available and fails, then
this is likely a bug. It should be reported to the system vendor. And it
may be possible for the user to update the PRM handler. This could be
through a BIOS update or the runtime update option for PRM.

Aside: is the runtime update option implemented?

"Notice" is between info and warning. I think we'd want the user to
notice, but this isn't so severe as to need a warning.

Also, *_once() will prevent duplicate messages in the case of multiple
memory errors in the system. The handler shouldn't fail on any valid
input, so a single notice is enough. Especially if the message doesn't
have any error/context-specific details.

Another aside: it's possible to have invalid input. This can happen in
"software/simulated" MCA errors, i.e. the user provides an arbitrary value
for MCA_ADDR. But this would be a user error. I don't think it's worth
trying to filter out this case. An expert user could provide valid inputs,
and they may want to test the full flow. And this isn't an issue just for
PRM but the ATL overall.

I hit this myself while testing another feature. I used a signature for
MCA_ADDR (0xC001C0DE01ABCDEF ?) and the translation failed. But I was
more interested in the signature than the real value. :)

> +
> +	return ret;
> +}
> +
> +#else /* ACPI_PRMT */
> +
> +unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif
> diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
> index 59b6169093f7..954cbe6bf465 100644
> --- a/drivers/ras/amd/atl/umc.c
> +++ b/drivers/ras/amd/atl/umc.c
> @@ -333,9 +333,14 @@ unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
>  	u8 coh_st_inst_id = get_coh_st_inst_id(err);
>  	unsigned long addr = get_addr(err->addr);
>  	u8 die_id = get_die_id(err);
> +	unsigned long ret_addr;
>
>  	pr_debug("socket_id=0x%x die_id=0x%x coh_st_inst_id=0x%x addr=0x%016lx",
>  		 socket_id, die_id, coh_st_inst_id, addr);
>
> +	ret_addr = prm_umc_norm_to_sys_addr(socket_id, err->ipid, addr);
> +	if (!IS_ERR_VALUE(ret_addr))
> +		return ret_addr;
> +
>  	return norm_to_sys_addr(socket_id, die_id, coh_st_inst_id, addr);
>  }

Thanks,
Yazen
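
For readers following the log-level discussion, here is a minimal userspace sketch of the two suggestions in this review: stay quiet when the backend is simply absent, and print a failure notice only once. It is not kernel code. notice_once(), debug(), report_prm_result() and notices_emitted are hypothetical stand-ins invented for illustration; the real helpers would be pr_debug() and pr_notice_once().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts notices actually printed so the "once" behaviour can be checked. */
static int notices_emitted;

/* Hypothetical userspace stand-in for pr_notice_once(): prints at most once. */
static void notice_once(const char *msg)
{
	static bool printed;

	if (printed)
		return;
	printed = true;
	notices_emitted++;
	fprintf(stderr, "notice: %s\n", msg);
}

/* Hypothetical stand-in for pr_debug(): silent unless built with -DDEBUG. */
static void debug(const char *msg)
{
#ifdef DEBUG
	fprintf(stderr, "debug: %s\n", msg);
#else
	(void)msg;
#endif
}

/*
 * Mirrors the error branch of prm_umc_norm_to_sys_addr() with the
 * log levels suggested in the review. Returns ret unchanged.
 */
static int report_prm_result(int ret)
{
	if (!ret)
		return 0;

	if (ret == -ENODEV)
		debug("PRM module/handler not available");
	else
		notice_once("PRM address translation failed");

	return ret;
}
```

Calling report_prm_result(-EIO) repeatedly, as would happen with multiple memory errors, prints one notice in total, while -ENODEV prints nothing in a non-debug build.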
On Sun, Apr 07, 2024 at 10:17:26AM -0400, Yazen Ghannam wrote:
> Aside: is the runtime update option implemented?
AFAICS, no. I haven't looked at this flow in detail. Is that something
we would want to try and tackle with this series?
Thanks,
John
diff --git a/drivers/ras/amd/atl/Makefile b/drivers/ras/amd/atl/Makefile
index 4acd5f05bd9c..8f1afa793e3b 100644
--- a/drivers/ras/amd/atl/Makefile
+++ b/drivers/ras/amd/atl/Makefile
@@ -14,5 +14,6 @@ amd_atl-y += denormalize.o
 amd_atl-y += map.o
 amd_atl-y += system.o
 amd_atl-y += umc.o
+amd_atl-y += prm.o

 obj-$(CONFIG_AMD_ATL) += amd_atl.o
diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
index 5de69e0bb0f9..f739dcada126 100644
--- a/drivers/ras/amd/atl/internal.h
+++ b/drivers/ras/amd/atl/internal.h
@@ -234,6 +234,8 @@ int dehash_address(struct addr_ctx *ctx);
 unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr);
 unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err);

+unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr);
+
 /*
  * Make a gap in @data that is @num_bits long starting at @bit_num.
  * e.g. data = 11111111'b
diff --git a/drivers/ras/amd/atl/prm.c b/drivers/ras/amd/atl/prm.c
new file mode 100644
index 000000000000..54a69e660eb5
--- /dev/null
+++ b/drivers/ras/amd/atl/prm.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AMD Address Translation Library
+ *
+ * prm.c : Plumbing code to UEFI Platform Runtime Mechanism (PRM)
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: John Allen <john.allen@amd.com>
+ */
+
+#include "internal.h"
+
+#if defined(CONFIG_ACPI_PRMT)
+
+#include <linux/prmt.h>
+
+struct prm_umc_param_buffer_norm {
+	u64 norm_addr;
+	u8 socket;
+	u64 umc_bank_inst_id;
+	void *output_buffer;
+} __packed;
+
+const guid_t norm_to_sys_prm_handler_guid = GUID_INIT(0xE7180659, 0xA65D,
+						      0x451D, 0x92, 0xCD,
+						      0x2B, 0x56, 0xF1, 0x2B,
+						      0xEB, 0xA6);
+
+unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
+{
+	struct prm_umc_param_buffer_norm param_buffer;
+	unsigned long ret_addr;
+	int ret;
+
+	param_buffer.norm_addr = addr;
+	param_buffer.socket = socket_id;
+	param_buffer.umc_bank_inst_id = umc_bank_inst_id;
+	param_buffer.output_buffer = &ret_addr;
+
+	ret = acpi_call_prm_handler(norm_to_sys_prm_handler_guid, &param_buffer);
+	if (!ret)
+		return ret_addr;
+
+	if (ret == -ENODEV)
+		pr_info("PRM module/handler not available\n");
+	else
+		pr_info("PRM address translation failed\n");
+
+	return ret;
+}
+
+#else /* ACPI_PRMT */
+
+unsigned long prm_umc_norm_to_sys_addr(u8 socket_id, u64 umc_bank_inst_id, unsigned long addr)
+{
+	return -ENODEV;
+}
+
+#endif
diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
index 59b6169093f7..954cbe6bf465 100644
--- a/drivers/ras/amd/atl/umc.c
+++ b/drivers/ras/amd/atl/umc.c
@@ -333,9 +333,14 @@ unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
 	u8 coh_st_inst_id = get_coh_st_inst_id(err);
 	unsigned long addr = get_addr(err->addr);
 	u8 die_id = get_die_id(err);
+	unsigned long ret_addr;

 	pr_debug("socket_id=0x%x die_id=0x%x coh_st_inst_id=0x%x addr=0x%016lx",
 		 socket_id, die_id, coh_st_inst_id, addr);

+	ret_addr = prm_umc_norm_to_sys_addr(socket_id, err->ipid, addr);
+	if (!IS_ERR_VALUE(ret_addr))
+		return ret_addr;
+
 	return norm_to_sys_addr(socket_id, die_id, coh_st_inst_id, addr);
 }
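
As an aside on the parameter buffer in prm.c, the following userspace sketch shows what the packed attribute does to the struct layout, which matters because the firmware handler reads the buffer at fixed byte offsets. It assumes GCC or Clang, where the kernel's __packed expands to the attribute used here. struct unpacked_param_buffer and the two helper functions are hypothetical and exist only for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copy of the patch's parameter buffer, packed as in the patch. */
struct prm_umc_param_buffer_norm {
	uint64_t norm_addr;
	uint8_t socket;
	uint64_t umc_bank_inst_id;
	void *output_buffer;
} __attribute__((packed));

/* Hypothetical: same fields without packing, for comparison only. */
struct unpacked_param_buffer {
	uint64_t norm_addr;
	uint8_t socket;
	uint64_t umc_bank_inst_id;
	void *output_buffer;
};

/* Byte offset of umc_bank_inst_id in the packed layout. */
static size_t packed_bank_offset(void)
{
	return offsetof(struct prm_umc_param_buffer_norm, umc_bank_inst_id);
}

/* Byte offset of umc_bank_inst_id when the compiler may insert padding. */
static size_t unpacked_bank_offset(void)
{
	return offsetof(struct unpacked_param_buffer, umc_bank_inst_id);
}
```

With packing, umc_bank_inst_id follows the one-byte socket field directly at offset 9 and output_buffer starts at offset 17. Without it, the compiler pads after socket so the 64-bit field is naturally aligned, and the offsets no longer match a byte-exact firmware layout.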
Future AMD platforms will provide a UEFI PRM module that implements a
number of address translation PRM handlers. This will provide an
interface for the OS to call platform specific code without requiring
the use of SMM or other heavy firmware operations.

AMD Zen-based systems report memory error addresses through Machine
Check banks representing Unified Memory Controllers (UMCs) in the form
of UMC relative "normalized" addresses. A normalized address must be
converted to a system physical address to be usable by the OS.

Add support for the normalized to system physical address translation
PRM handler in the AMD Address Translation Library and prefer it over
native code if available. The GUID and parameter buffer structure are
specific to the normalized to system physical address handler provided
by the address translation PRM module included in future AMD systems.

The address translation PRM module is documented in chapter 22 of the
publicly available "AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide":
https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/programmer-references/58088-0.75-pub.pdf

Signed-off-by: John Allen <john.allen@amd.com>
---
 drivers/ras/amd/atl/Makefile   |  1 +
 drivers/ras/amd/atl/internal.h |  2 ++
 drivers/ras/amd/atl/prm.c      | 61 ++++++++++++++++++++++++++++++++++
 drivers/ras/amd/atl/umc.c      |  5 +++
 4 files changed, 69 insertions(+)
 create mode 100644 drivers/ras/amd/atl/prm.c
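
The "prefer PRM over native code if available" flow described above can be sketched in userspace as follows. This is only an illustration of the control flow in convert_umc_mca_addr_to_sys_addr(), under stated assumptions: is_err_value() imitates the kernel's IS_ERR_VALUE() check (the top 4095 values of unsigned long are treated as errors), and prm_translate(), native_translate(), translate(), prm_available and the address offsets are all made up for the example.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: same bound the kernel uses for error-encoded values. */
#define MAX_ERRNO 4095UL

/* Userspace imitation of IS_ERR_VALUE(): true for the top MAX_ERRNO values. */
static int is_err_value(unsigned long x)
{
	return x >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical switch simulating whether the platform ships the PRM handler. */
static int prm_available;

/* Hypothetical PRM backend: returns an error-encoded value when absent. */
static unsigned long prm_translate(unsigned long norm_addr)
{
	if (!prm_available)
		return (unsigned long)-ENODEV;

	return norm_addr + 0x1000UL;	/* made-up mapping */
}

/* Hypothetical native backend, always available. */
static unsigned long native_translate(unsigned long norm_addr)
{
	return norm_addr + 0x2000UL;	/* made-up mapping */
}

/* Try the PRM backend first and fall back to native code on any error. */
static unsigned long translate(unsigned long norm_addr)
{
	unsigned long ret = prm_translate(norm_addr);

	if (!is_err_value(ret))
		return ret;

	return native_translate(norm_addr);
}
```

Returning a negative errno through an unsigned long works because the conversion places it in the top range that the check treats as an error, so callers can tell a failure from a translated address without a separate status value.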