Message ID | 20220822154048.188253-1-justin.he@arm.com |
---|---|
Headers | show |
Series | Make ghes_edac a proper module | expand |
Hi,
Sorry for resending the v3.
There is an exceptional interrupt when I tried to post v3 at the first time.
Maybe it is caused by a comma "," inside the mail name.
E.g.
Signed-off-by: Some, one <someone@site.com>
Looks like a git sendemail issue?
Anyway, sorry for the inconvenience.
--
Cheers,
Justin (Jia He)
On Tue, Aug 23, 2022 at 3:50 AM Justin He <Justin.He@arm.com> wrote: > > Hi, > Sorry for resending the v3. > There is an exceptional interrupt when I tried to post v3 at the first time. > Maybe it is caused by a comma "," inside the mail name. > E.g. > Signed-off-by: Some, one <someone@site.com> > Looks like a git sendemail issue? > > Anyway, sorry for the inconvenience. I've received it twice, but no problem. I need Boris to tell me what to do with this series.
On Mon, Aug 22, 2022 at 03:40:42PM +0000, Jia He wrote: > Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in > apci_init()") introduced a bug that ghes_edac_register() would be invoked > before edac_init(). Because at that time, the bus "edac" hadn't been even > registered, this created sysfs /devices/mc0 instead of > /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. > > To remove the dependency of ghes_edac on ghes, make it a proper module. Use > a list to save the probing devices in ghes_probe(), and defer the > ghes_edac_register() to module_init() of the new ghes_edac module by > iterating over the devices list. > > Co-developed-by: Borislav Petkov <bp@alien8.de> > Signed-off-by: Borislav Petkov <bp@alien8.de> > Signed-off-by: Jia He <justin.he@arm.com> > Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") > Cc: stable@kernel.org Why is this marked for stable? The prerequisite patches are needed too. I guess this needs to be communicated to stable folks somehow by doing Cc: stable@kernel.org # needs commits X, Y, ... but I guess the committer needs to do that because only at commit time will X and Y be known... So, is there any particular reason why this should be in stable? > @@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device *ghes_dev) > > ghes_fini(ghes); > > - ghes_edac_unregister(ghes); > + mutex_lock(&ghes_devs_mutex); > + list_del_rcu(&ghes->elist); Is that list RCU-protected? > + mutex_unlock(&ghes_devs_mutex); > > kfree(ghes); ... > @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes) > unlock: > mutex_unlock(&ghes_reg_mutex); > } > + > +static int __init ghes_edac_init(void) > +{ > + struct ghes *g, *g_tmp; > + > + if (!IS_ENABLED(CONFIG_X86)) > + force_load = true; No, this is not how this works. > + ghes_devs = ghes_get_devices(force_load); > + if (!ghes_devs) > + return -ENODEV; You simply need to check force_load here. > + list_for_each_entry_safe(g, g_tmp, ghes_devs, elist) { > + ghes_edac_register(g->dev); > + } > + > + return 0; > +}
On Monday, August 22, 2022 9:41 AM, Jia He wrote: > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > index e17e0ee8f842..327386f3cf33 100644 > --- a/drivers/acpi/apei/ghes.c > +++ b/drivers/acpi/apei/ghes.c > @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = { > { } /* End */ > }; > > -struct list_head *ghes_get_devices(void) > +bool ghes_edac_preferred(void) > { > int idx = -1; > > if (IS_ENABLED(CONFIG_X86)) { > idx = acpi_match_platform_list(plat_list); > if (idx < 0 && !ghes_edac_force) > - return NULL; > + return false; > } > > + return true; > +} > +EXPORT_SYMBOL_GPL(ghes_edac_preferred); > + > +struct list_head *ghes_get_devices(void) > +{ > + if (!ghes_edac_preferred()) > + return NULL; > + > return &ghes_devs; > } ghes_get_devices() changing multiple times in the series is confusing to me. Can you simply introduce ghes_get_devices() and ghes_preferred() in the right state in a patch? Perhaps, patch #2, #5, #6 can collapse to introduce the two funcs? The rest of patch #5 adding the call to ghes_edac_preferred() into other edac drivers can remain as a separate patch. Toshi
> -----Original Message----- > From: Kani, Toshi <toshi.kani@hpe.com> > Sent: Thursday, August 25, 2022 7:04 AM > To: Justin He <Justin.He@arm.com>; Len Brown <lenb@kernel.org>; James > Morse <James.Morse@arm.com>; Tony Luck <tony.luck@intel.com>; Borislav > Petkov <bp@alien8.de>; Mauro Carvalho Chehab <mchehab@kernel.org>; > Robert Richter <rric@kernel.org>; Robert Moore <robert.moore@intel.com>; > Qiuxu Zhuo <qiuxu.zhuo@intel.com>; Yazen Ghannam > <yazen.ghannam@amd.com>; Jonathan Corbet <corbet@lwn.net>; Jan > Luebbe <jlu@pengutronix.de>; Khuong Dinh > <khuong@os.amperecomputing.com> > Cc: Ard Biesheuvel <ardb@kernel.org>; linux-acpi@vger.kernel.org; > linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; devel@acpica.org; > Rafael J . Wysocki <rafael@kernel.org>; Shuai Xue > <xueshuai@linux.alibaba.com>; Jarkko Sakkinen <jarkko@kernel.org>; > linux-efi@vger.kernel.org; nd <nd@arm.com>; Paul E. McKenney > <paulmck@kernel.org>; Andrew Morton <akpm@linux-foundation.org>; > Neeraj Upadhyay <quic_neeraju@quicinc.com>; Randy Dunlap > <rdunlap@infradead.org>; Damien Le Moal > <damien.lemoal@opensource.wdc.com>; Muchun Song > <songmuchun@bytedance.com>; linux-doc@vger.kernel.org > Subject: RE: [RESEND PATCH v3 5/9] EDAC: Don't load chipset-specific edac > drivers when ghes_edac is preferred > > On Monday, August 22, 2022 9:41 AM, Jia He wrote: > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index > > e17e0ee8f842..327386f3cf33 100644 > > --- a/drivers/acpi/apei/ghes.c > > +++ b/drivers/acpi/apei/ghes.c > > @@ -1537,16 +1537,25 @@ static struct acpi_platform_list plat_list[] = { > > { } /* End */ > > }; > > > > -struct list_head *ghes_get_devices(void) > > +bool ghes_edac_preferred(void) > > { > > int idx = -1; > > > > if (IS_ENABLED(CONFIG_X86)) { > > idx = acpi_match_platform_list(plat_list); > > if (idx < 0 && !ghes_edac_force) > > - return NULL; > > + return false; > > } > > > > + return true; > > +} > > +EXPORT_SYMBOL_GPL(ghes_edac_preferred); > > + > > +struct list_head *ghes_get_devices(void) { > > + if (!ghes_edac_preferred()) > > + return NULL; > > + > > return &ghes_devs; > > } > > ghes_get_devices() changing multiple times in the series is > confusing to me. Can you simply introduce ghes_get_devices() > and ghes_preferred() in the right state in a patch? Perhaps, patch #2, #5, #6 > can collapse to introduce the two funcs? > My purpose was to make it easy for review. I am ok to merge these patches into one. > The rest of patch #5 adding the call to ghes_edac_preferred() into other edac > drivers can remain as a separate patch. Okay, I assume you mean to merge all of the ghes_edac_preferred() checking for intel and Arm edac drivers into 1 separate patch, am I understanding well? -- Cheers, Justin (Jia He)
Hi Borislav > -----Original Message----- > On Mon, Aug 22, 2022 at 03:40:42PM +0000, Jia He wrote: > > Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in > > apci_init()") introduced a bug that ghes_edac_register() would be > > invoked before edac_init(). Because at that time, the bus "edac" > > hadn't been even registered, this created sysfs /devices/mc0 instead > > of > > /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. > > > > To remove the dependency of ghes_edac on ghes, make it a proper > > module. Use a list to save the probing devices in ghes_probe(), and > > defer the > > ghes_edac_register() to module_init() of the new ghes_edac module by > > iterating over the devices list. > > > > Co-developed-by: Borislav Petkov <bp@alien8.de> > > Signed-off-by: Borislav Petkov <bp@alien8.de> > > Signed-off-by: Jia He <justin.he@arm.com> > > Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in > > apci_init()") > > Cc: stable@kernel.org > > Why is this marked for stable? > > The prerequisite patches are needed too. I guess this needs to be > communicated to stable folks somehow by doing > > Cc: stable@kernel.org # needs commits X, Y, ... > > but I guess the committer needs to do that because only at commit time will X > and Y be known... > > So, is there any particular reason why this should be in stable? Okay, I am fine with removing the stable line if dc4e8c07e9e2 will not be included in any stable tree branch. > > > @@ -1442,7 +1449,9 @@ static int ghes_remove(struct platform_device > > *ghes_dev) > > > > ghes_fini(ghes); > > > > - ghes_edac_unregister(ghes); > > + mutex_lock(&ghes_devs_mutex); > > + list_del_rcu(&ghes->elist); > > Is that list RCU-protected? No, I will remove the "rcu" suffix since I use list_add_tail. > > > + mutex_unlock(&ghes_devs_mutex); > > > > kfree(ghes); > > ... > > > @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes) > > unlock: > > mutex_unlock(&ghes_reg_mutex); > > } > > + > > +static int __init ghes_edac_init(void) { > > + struct ghes *g, *g_tmp; > > + > > + if (!IS_ENABLED(CONFIG_X86)) > > + force_load = true; > > No, this is not how this works. > > > + ghes_devs = ghes_get_devices(force_load); > > + if (!ghes_devs) > > + return -ENODEV; > > You simply need to check force_load here. > Okay, hence should I export the *ghes_devs* in ghes? -- Cheers, Justin (Jia He)
On Thursday, August 25, 2022 3:46 AM, Justin He wrote: > > ghes_get_devices() changing multiple times in the series is > > confusing to me. Can you simply introduce ghes_get_devices() > > and ghes_preferred() in the right state in a patch? Perhaps, patch #2, #5, > > #6 can collapse to introduce the two funcs? > > My purpose was to make it easy for review. I am ok to merge these patches > into one. This series starts with your original patchset and then has additional patches to address the issues with the original patchset. The number of the patches continues to increase in this way... You do not need to record the history of discussions and design changes in the series. > > The rest of patch #5 adding the call to ghes_edac_preferred() into other edac > > drivers can remain as a separate patch. > > Okay, I assume you mean to merge all of the ghes_edac_preferred() > checking for intel and > Arm edac drivers into 1 separate patch, am I understanding well? No, that's not what I meant. ghes_get_devices() and ghes_edac_preferred() look good to me after all patches are applied. So, instead of introducing the original design of ghes_get_devices() and then changing its design multiple times in the series, please simply add the final version of ghes_get_devices() and ghes_edac_preferred() in a single patch. They are small functions anyway. That is, your series includes something like: - PATCH: Add ghes_get_devices() and ghes_edac_preferred() interfaces - PATCH: Add ghes_edac_preferred check to chipset-specific edac drivers Toshi
On Thursday, August 25, 2022 6:21 AM, Justin He wrote: > > > @@ -566,3 +549,35 @@ void ghes_edac_unregister(struct ghes *ghes) > > > unlock: > > > mutex_unlock(&ghes_reg_mutex); > > > } > > > + > > > +static int __init ghes_edac_init(void) { > > > + struct ghes *g, *g_tmp; > > > + > > > + if (!IS_ENABLED(CONFIG_X86)) > > > + force_load = true; > > > > No, this is not how this works. > > > > > + ghes_devs = ghes_get_devices(force_load); > > > + if (!ghes_devs) > > > + return -ENODEV; > > > > You simply need to check force_load here. > > > > Okay, hence should I export the *ghes_devs* in ghes? It does not matter. This series then moves the force_load check to ghes_edac_preferred(). It is confusing for reviewers... Please divide patches based on the final design. Toshi