Message ID | 20220817143458.335938-6-justin.he@arm.com |
---|---|
State | New |
Headers | show |
Series | Modularize ghes_edac driver | expand |
On Wednesday, August 17, 2022 8:35 AM, Jia He wrote: > Previous, there is only one edac can be registering in the EDAC core. After > ghes_edac is modularized, the registering of ghes devices may be deferred > when ghes_edac is loaded. Prevent the chipset-specific edac drivers from > loading after ghes_edac is registered, which allow the edac drivers to > get selected in e.g. HPE platform. ... > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > *ghes_dev) > platform_set_drvdata(ghes_dev, ghes); > > ghes->dev = &ghes_dev->dev; > + set_ghes_devs_registered(false); This does not look right to me. The condition of using ghes_edac is (ghes-present) && (ghes-preferred), where: - ghes-present is latched on ghes_probe() - ghes-preferred is true if the platform-check passes. ghes_get_device() introduced in the previous patch works as the ghes-preferred check. We cannot use ghes_edac registration as the ghes-present check in this scheme since it is deferred to module_init(). I'd suggest the following changes: - Update ghes_get_device() to check a flag latched on ghes_probe(). - Remove 'force' argument from ghes_get_device(). ghes_edac_init() is too late to set this flag. Also, other edac drivers should not be able to control this flag. I think this force_load kernel option needs to belong to ghes with this scheme. - ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers, which should be internal to ghes. ghes_get_device() may be renamed to something like ghes_edac_preferred() which returns true / false. Toshi
Hi Toshi, > -----Original Message----- > From: Kani, Toshi <toshi.kani@hpe.com> > Sent: Friday, August 19, 2022 9:30 AM > To: Justin He <Justin.He@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len > Brown <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck > <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho > Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert > Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; > Yazen Ghannam <yazen.ghannam@amd.com> > Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; > linux-edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd <nd@arm.com> > Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac from > loading after ghes_edac is registered > > On Wednesday, August 17, 2022 8:35 AM, Jia He wrote: > > Previous, there is only one edac can be registering in the EDAC core. > > After ghes_edac is modularized, the registering of ghes devices may be > > deferred when ghes_edac is loaded. Prevent the chipset-specific edac > > drivers from loading after ghes_edac is registered, which allow the > > edac drivers to get selected in e.g. HPE platform. > ... > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > > *ghes_dev) > > platform_set_drvdata(ghes_dev, ghes); > > > > ghes->dev = &ghes_dev->dev; > > + set_ghes_devs_registered(false); > > This does not look right to me. > > The condition of using ghes_edac is (ghes-present) && (ghes-preferred), > where: > - ghes-present is latched on ghes_probe() > - ghes-preferred is true if the platform-check passes. > > ghes_get_device() introduced in the previous patch works as the > ghes-preferred check. > > We cannot use ghes_edac registration as the ghes-present check in this > scheme since it is deferred to module_init(). What is the logic for ghes-present check? In this patch, I assumed it is equal to "ghes_edac devices have been registered". Seems it is not correct. But should we consider the case as follows: What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)? > > I'd suggest the following changes: > - Update ghes_get_device() to check a flag latched on ghes_probe(). Still need your elaborating about the details of the flag. i.e. When is this flag latched? When is it unlatched? > - Remove 'force' argument from ghes_get_device(). ghes_edac_init() is too > late to set this flag. Also, other edac drivers should not be able to control this > flag. I think this force_load kernel option needs to belong to ghes with this > scheme. Agree, I will remove force_load in ghes_get_device(), and put all check/update of force_load in ghes > - ghes_get_device() exposes 'ghes_devs' to chipset-specific edac drivers, which > should be internal to ghes. ghes_get_device() may be renamed to something > like ghes_edac_preferred() which returns true / false. > Okay, agree -- Cheers, Justin (Jia He)
On Friday, August 19, 2022 4:35 AM, Justin He wrote: > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device > > > *ghes_dev) > > > platform_set_drvdata(ghes_dev, ghes); > > > > > > ghes->dev = &ghes_dev->dev; > > > + set_ghes_devs_registered(false); > > > > This does not look right to me. > > > > The condition of using ghes_edac is (ghes-present) && (ghes-preferred), > > where: > > - ghes-present is latched on ghes_probe() > > - ghes-preferred is true if the platform-check passes. > > > > ghes_get_device() introduced in the previous patch works as the > > ghes-preferred check. > > > > We cannot use ghes_edac registration as the ghes-present check in this > > scheme since it is deferred to module_init(). > > What is the logic for ghes-present check? In this patch, I assumed it is equal to > "ghes_edac devices have been registered". Seems it is not correct. Using (ghes_edac-registered) is a wrong check in this scheme since ghes_edac registration is deferred. This check is fine in the current scheme since ghes_edac gets registered before any other chipset-specific edac drivers. > But should we consider the case as follows: > What if sbridge_init () is invoked before ghes_edac_init()? i.e. Should we get > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on HPE)? No. The point is that ghes_edac driver needs to be selected, "regardless of the module ordering", on a system with GHES present & preferred. Note that this new scheme leads to the following condition change: - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered) - New: (ghes-present) && (ghes-preferred) The option I suggested previously keeps the current condition, but this new scheme does not for better modularity. What this means is that if ghes_edac is not enabled (but ghes is enabled) on a system with GHES present & preferred, no edac driver gets registered. This change is fine from my (HPE) perspective and should be fine for other GHES systems. GHES systems have chipset-specific edac driver in FW. OS-based chipset-specific edac driver is not necessary and may lead to a conflict of chipset register ownership. > > I'd suggest the following changes: > > - Update ghes_get_device() to check a flag latched on ghes_probe(). > > Still need your elaborating about the details of the flag. i.e. When is this flag > latched? When is it unlatched? This flag is a static variable, say ghes_present, which is set to false initially. ghes_probe() sets it to true. ghes_edac_preferred() (aka. ghes_get_device) checks this flag at beginning and returns false if this flag is false. It does not get unlatched since ACPI GHES table is static. Toshi
On Fri, Aug 19, 2022 at 05:48:39PM +0000, Kani, Toshi wrote: > This flag is a static variable, say ghes_present, which is set to > false initially. ghes_probe() sets it to true. ghes_edac_preferred() > (aka. ghes_get_device) checks this flag at beginning and returns false > if this flag is false. It does not get unlatched since ACPI GHES table > is static. What is that flag needed for at all? There are two possibilities: 1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use ghes_get_devices() to know when to load. 2. ghes_probe() fails and that is caught during platform testing of all those platforms who wish to use ghes_edac. BIOS is fixed and goto 1. No need for funky flags whatsoever.
On Friday, August 19, 2022 12:30 PM, Borislav Petkov wrote: > > This flag is a static variable, say ghes_present, which is set to > > false initially. ghes_probe() sets it to true. ghes_edac_preferred() > > (aka. ghes_get_device) checks this flag at beginning and returns false > > if this flag is false. It does not get unlatched since ACPI GHES table > > is static. > > What is that flag needed for at all? Because ghes_get_device() always returns &ghes_devs on Arm, which is !NULL. It does not check if GHES is present. > There are two possibilities: > > 1. ghes_probe() succeeds. ghes_edac loads properly and other drivers use > ghes_get_devices() to know when to load. > > 2. ghes_probe() fails and that is caught during platform testing of all > those platforms who wish to use ghes_edac. BIOS is fixed and goto 1. > > No need for funky flags whatsoever. 3. ghes_probe() is not called, but ghes_get_device() is called from other edac drivers. Toshi
On Fri, Aug 19, 2022 at 06:46:34PM +0000, Kani, Toshi wrote:
> 3. ghes_probe() is not called,
When is ghes_probe() not called on ARM?
On Friday, August 19, 2022 12:50 PM, Borislav Petkov wrote: > > 3. ghes_probe() is not called, > > When is ghes_probe() not called on ARM? When the system does not implement ACPI GHES table, which I suppose is most of the case on ARM. Toshi
> -----Original Message----- > From: Kani, Toshi <toshi.kani@hpe.com> > Sent: Friday, August 19, 2022 12:49 PM > To: Justin He <Justin.He@arm.com> > Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux- > edac@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki > <rafael@kernel.org>; Shuai Xue <xueshuai@linux.alibaba.com>; Jarkko > Sakkinen <jarkko@kernel.org>; linux-efi@vger.kernel.org; nd > <nd@arm.com>; Ard Biesheuvel <ardb@kernel.org>; Len Brown > <lenb@kernel.org>; James Morse <James.Morse@arm.com>; Tony Luck > <tony.luck@intel.com>; Borislav Petkov <bp@alien8.de>; Mauro Carvalho > Chehab <mchehab@kernel.org>; Robert Richter <rric@kernel.org>; Robert > Moore <robert.moore@intel.com>; Qiuxu Zhuo <qiuxu.zhuo@intel.com>; > Yazen Ghannam <yazen.ghannam@amd.com> > Subject: RE: [PATCH v2 5/7] EDAC/ghes: Prevent chipset-specific edac > from loading after ghes_edac is registered > > On Friday, August 19, 2022 4:35 AM, Justin He wrote: > > > > @@ -1382,6 +1395,7 @@ static int ghes_probe(struct > platform_device > > > > *ghes_dev) > > > > platform_set_drvdata(ghes_dev, ghes); > > > > > > > > ghes->dev = &ghes_dev->dev; > > > > + set_ghes_devs_registered(false); > > > > > > This does not look right to me. > > > > > > The condition of using ghes_edac is (ghes-present) && (ghes- > preferred), > > > where: > > > - ghes-present is latched on ghes_probe() > > > - ghes-preferred is true if the platform-check passes. > > > > > > ghes_get_device() introduced in the previous patch works as the > > > ghes-preferred check. > > > > > > We cannot use ghes_edac registration as the ghes-present check in > this > > > scheme since it is deferred to module_init(). > > > > What is the logic for ghes-present check? In this patch, I assumed it > is equal to > > "ghes_edac devices have been registered". Seems it is not correct. > > Using (ghes_edac-registered) is a wrong check in this scheme > since ghes_edac registration is deferred. This check is fine in > the current scheme since ghes_edac gets registered before > any other chipset-specific edac drivers. > > > But should we consider the case as follows: > > What if sbridge_init () is invoked before ghes_edac_init()? i.e. > Should we get > > sb_edac driver selected when ghes_edac is not loaded yet (e.g. on > HPE)? > > No. The point is that ghes_edac driver needs to be selected, > "regardless of the module ordering", on a system with GHES > present & preferred. > > Note that this new scheme leads to the following condition > change: > - Current: (ghes-present) && (ghes-preferred) && (ghes_edac registered) > - New: (ghes-present) && (ghes-preferred) > > The option I suggested previously keeps the current condition, > but this new scheme does not for better modularity. > > What this means is that if ghes_edac is not enabled (but ghes > is enabled) on a system with GHES present & preferred, no edac > driver gets registered. This change is fine from my (HPE) perspective > and should be fine for other GHES systems. GHES systems have > chipset-specific edac driver in FW. OS-based chipset-specific edac > driver is not necessary and may lead to a conflict of chipset register > ownership. Currently, running with this on the kernel command line ghes.disable causes the ACPI ghes driver to quit early in acpi_ghes_init(): /* * This driver isn't really modular, however for the time being, * continuing to use module_param is the easiest way to remain * compatible with existing boot arg use cases. */ bool ghes_disable; module_param_named(disable, ghes_disable, bool, 0); which results in the skx_edac module assuming it should run: [ 33.628140] calling skx_init+0x0/0xe5a [skx_edac] @ 1444 [ 33.628531] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:36:0a.0 (INTERRUPT) [ 33.641432] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:36:0c.0 (INTERRUPT) [ 33.653256] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) [ 33.665055] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) [ 33.676801] initcall skx_init+0x0/0xe5a [skx_edac] returned 0 after 36343 usecs We might need to differentiate between the system ROM really not offering GHES vs. the ghes module not running.
On Fri, Aug 19, 2022 at 06:53:41PM +0000, Kani, Toshi wrote: > When the system does not implement ACPI GHES table, > which I suppose is most of the case on ARM. That should be detected in ghes_get_devices() - just like on x86.
On Fri, Aug 19, 2022 at 06:57:11PM +0000, Elliott, Robert (Servers) wrote: > We might need to differentiate between the system ROM really not > offering GHES vs. the ghes module not running. All detectable in ghes_get_devices().
On Friday, August 19, 2022 1:31 PM, Borislav Petkov wrote: > > When the system does not implement ACPI GHES table, > > which I suppose is most of the case on ARM. > > That should be detected in ghes_get_devices() - just like on x86. Agreed. And that is the check to ghes_present flag... Toshi
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 9c52183e3ad9..9272d963b57d 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -95,6 +95,7 @@ #endif static ATOMIC_NOTIFIER_HEAD(ghes_report_chain); +static bool devs_registered; static inline bool is_hest_type_generic_v2(struct ghes *ghes) { @@ -1266,6 +1267,18 @@ static int apei_sdei_unregister_ghes(struct ghes *ghes) return sdei_unregister_ghes(ghes); } +void set_ghes_devs_registered(bool flag) +{ + devs_registered = flag; +} +EXPORT_SYMBOL_GPL(set_ghes_devs_registered); + +bool ghes_devs_registered(void) +{ + return devs_registered; +} +EXPORT_SYMBOL_GPL(ghes_devs_registered); + static int ghes_probe(struct platform_device *ghes_dev) { struct acpi_hest_generic *generic; @@ -1382,6 +1395,7 @@ static int ghes_probe(struct platform_device *ghes_dev) platform_set_drvdata(ghes_dev, ghes); ghes->dev = &ghes_dev->dev; + set_ghes_devs_registered(false); mutex_lock(&ghes_devs_mutex); list_add_tail(&ghes->elist, &ghes_devs); diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c index e4eaf6668feb..f48507fa7b94 100644 --- a/drivers/edac/amd64_edac.c +++ b/drivers/edac/amd64_edac.c @@ -4329,7 +4329,7 @@ static int __init amd64_edac_init(void) int err = -ENODEV; int i; - if (ghes_get_devices(0)) + if (ghes_get_devices(0) && ghes_devs_registered()) return -EBUSY; owner = edac_get_owner(); diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 3bdf8469882d..d26644b3bc47 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -564,6 +564,7 @@ static int __init ghes_edac_init(void) list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) { ghes_edac_register(g->dev); } + set_ghes_devs_registered(true); return 0; } @@ -576,6 +577,7 @@ static void __exit ghes_edac_exit(void) list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) { ghes_edac_unregister(g); } + set_ghes_devs_registered(false); } module_exit(ghes_edac_exit); diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c index 73f2ba0e64e3..66d89d00c4b3 100644 --- a/drivers/edac/pnd2_edac.c +++ b/drivers/edac/pnd2_edac.c @@ -1528,7 +1528,7 @@ static int __init pnd2_init(void) edac_dbg(2, "\n"); - if (ghes_get_devices(0)) + if (ghes_get_devices(0) && ghes_devs_registered()) return -EBUSY; owner = edac_get_owner(); diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c index 1d0520a16840..a3a89e366a74 100644 --- a/drivers/edac/sb_edac.c +++ b/drivers/edac/sb_edac.c @@ -3506,7 +3506,7 @@ static int __init sbridge_init(void) edac_dbg(2, "\n"); - if (ghes_get_devices(0)) + if (ghes_get_devices(0) && ghes_devs_registered()) return -EBUSY; owner = edac_get_owner(); diff --git a/drivers/edac/skx_base.c b/drivers/edac/skx_base.c index fe267f8543f5..efa1ae79c35a 100644 --- a/drivers/edac/skx_base.c +++ b/drivers/edac/skx_base.c @@ -653,7 +653,7 @@ static int __init skx_init(void) edac_dbg(2, "\n"); - if (ghes_get_devices(0)) + if (ghes_get_devices(0) && ghes_devs_registered()) return -EBUSY; owner = edac_get_owner(); diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 150c0b9500d6..21b9d4d4c3e9 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -72,8 +72,12 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb); */ void ghes_unregister_vendor_record_notifier(struct notifier_block *nb); struct list_head *ghes_get_devices(bool force); +bool ghes_devs_registered(void); +void set_ghes_devs_registered(bool flag); #else static inline struct list_head *ghes_get_devices(bool force) { return NULL; } +static inline bool ghes_devs_registered(void) { return false; } +static inline void set_ghes_devs_registered(bool flag) { return; } #endif int ghes_estatus_pool_init(int num_ghes);
Previous, there is only one edac can be registering in the EDAC core. After ghes_edac is modularized, the registering of ghes devices may be deferred when ghes_edac is loaded. Prevent the chipset-specific edac drivers from loading after ghes_edac is registered, which allow the edac drivers to get selected in e.g. HPE platform. Signed-off-by: Jia He <justin.he@arm.com> --- drivers/acpi/apei/ghes.c | 14 ++++++++++++++ drivers/edac/amd64_edac.c | 2 +- drivers/edac/ghes_edac.c | 2 ++ drivers/edac/pnd2_edac.c | 2 +- drivers/edac/sb_edac.c | 2 +- drivers/edac/skx_base.c | 2 +- include/acpi/ghes.h | 4 ++++ 7 files changed, 24 insertions(+), 4 deletions(-)