Message ID | 20241115035014.1339256-1-tanxiaofei@huawei.com |
---|---|
State | New |
Series | acpi: Fix hed module initialization order when it is built-in |
On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
> When the module hed is built-in, the init order is determined by
> Makefile order.

Are you sure?

> That order violates expectations. Because the module
> hed init is behind evged. RAS records can't be handled in the
> special time window that evged has initialized while hed not.
> If the number of such RAS records is more than the APEI HEST error
> source number, the HEST resources could be occupied all, and then
> could affect subsequent RAS error reporting.

Well, the problem is real, but does the change really prevent it from
happening or does it just increase the likelihood of success?

In the latter case, and generally speaking too, it would be better to
add explicit synchronization between evged and hed.
On Fri, 15 Nov 2024 11:50:14 +0800, Xiaofei Tan <tanxiaofei@huawei.com> wrote:

Please always copy my @kernel.org address for upstream work.

> When the module hed is built-in, the init order is determined by
> Makefile order. That order violates expectations. Because the module
> hed init is behind evged. RAS records can't be handled in the
> special time window that evged has initialized while hed not.
> If the number of such RAS records is more than the APEI HEST error
> source number, the HEST resources could be occupied all, and then
> could affect subsequent RAS error reporting.

IMO, it is a lot better to use a late init call. Please see:
include/linux/init.h

This could be done, for instance, by using late_initcall().

Now, what we have is:

acpi-y += evged.o
obj-$(CONFIG_ACPI_HED) += hed.o

where ACPI_HED is a tristate.

It sounds to me that, even with your patch, if you build
HED as a module, you'll still have a problem.

Shouldn't ACPI_HED be changed from tristate to bool?

Regards,
Mauro
Hi Rafael,

On 2024/12/11 1:59, Rafael J. Wysocki wrote:
> On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>> When the module hed is built-in, the init order is determined by
>> Makefile order.
> Are you sure?

Yes.

>> That order violates expectations. Because the module
>> hed init is behind evged. RAS records can't be handled in the
>> special time window that evged has initialized while hed not.
>> If the number of such RAS records is more than the APEI HEST error
>> source number, the HEST resources could be occupied all, and then
>> could affect subsequent RAS error reporting.
> Well, the problem is real, but does the change really prevent it from
> happening or does it just increase the likelihood of success?

It solves the problem completely when the driver is built in. If HED is
built as a module, it does not.

>
> In the latter case, and generally speaking too, it would be better to
> add explicit synchronization between evged and hed.
Hi Mauro,

On 2024/12/12 0:22, Mauro Carvalho Chehab wrote:
> On Fri, 15 Nov 2024 11:50:14 +0800,
> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
> Please always copy my @kernel.org address for upstream work.

OK.

>> When the module hed is built-in, the init order is determined by
>> Makefile order. That order violates expectations. Because the module
>> hed init is behind evged. RAS records can't be handled in the
>> special time window that evged has initialized while hed not.
>> If the number of such RAS records is more than the APEI HEST error
>> source number, the HEST resources could be occupied all, and then
>> could affect subsequent RAS error reporting.
> IMO, it is a lot better to use a late init call. Please see:
> include/linux/init.h
>
> This could be done, for instance, by using late_initcall().
>
> Now, what we have is:
>
> acpi-y += evged.o
> obj-$(CONFIG_ACPI_HED) += hed.o
>
> where ACPI_HED is a tristate.
>
> It sounds to me that, even with your patch, if you build
> HED as a module, you'll still have a problem.

Yes, and in that case it is also affected by the loading order of HED and
GHES. Either way, the risk remains.

>
> Shouldn't ACPI_HED be changed from tristate to bool?

Agreed. @Rafael, could you please check whether we can make this change?
Thanks.

>
> Regards,
> Mauro
On Mon, 23 Dec 2024 17:31:08 +0800
Xiaofei Tan <tanxiaofei@huawei.com> wrote:

> Hi Rafael,
>
> On 2024/12/11 1:59, Rafael J. Wysocki wrote:
> > On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
> >> When the module hed is built-in, the init order is determined by
> >> Makefile order.
> > Are you sure?
>
> Yes.

We had a similar fix in CXL recently (which is why I suggested this approach
internally when tanxiaofei mentioned the problem).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/cxl?id=6575b268157f37929948a8d1f3bafb3d7c055bc1

The discussion of that CXL patch was the first time I'd come across this
solution to load ordering for built-in cases.

>
> >> That order violates expectations. Because the module
> >> hed init is behind evged. RAS records can't be handled in the
> >> special time window that evged has initialized while hed not.
> >> If the number of such RAS records is more than the APEI HEST error
> >> source number, the HEST resources could be occupied all, and then
> >> could affect subsequent RAS error reporting.
> > Well, the problem is real, but does the change really prevent it from
> > happening or does it just increase the likelihood of success?
>
> It solves the problem completely when the driver is built in. If HED is
> built as a module, it does not.

Can we prevent that condition with an appropriate Kconfig restriction?
It's annoying to restrict build options, but if that is needed to make it
work, it is better than not working!

Jonathan

>
> >
> > In the latter case, and generally speaking too, it would be better to
> > add explicit synchronization between evged and hed.
Hi Jonathan,

On 2024/12/24 3:33, Jonathan Cameron wrote:
> On Mon, 23 Dec 2024 17:31:08 +0800
> Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>
>> Hi Rafael,
>>
>> On 2024/12/11 1:59, Rafael J. Wysocki wrote:
>>> On Fri, Nov 15, 2024 at 4:56 AM Xiaofei Tan <tanxiaofei@huawei.com> wrote:
>>>> When the module hed is built-in, the init order is determined by
>>>> Makefile order.
>>> Are you sure?
>> Yes.
> We had a similar fix in CXL recently (which is why I suggested this approach
> internally when tanxiaofei mentioned the problem).
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/cxl?id=6575b268157f37929948a8d1f3bafb3d7c055bc1
>
> The discussion of that CXL patch was the first time I'd come across this
> solution to load ordering for built-in cases.
>

Yes :)

>>>> That order violates expectations. Because the module
>>>> hed init is behind evged. RAS records can't be handled in the
>>>> special time window that evged has initialized while hed not.
>>>> If the number of such RAS records is more than the APEI HEST error
>>>> source number, the HEST resources could be occupied all, and then
>>>> could affect subsequent RAS error reporting.
>>> Well, the problem is real, but does the change really prevent it from
>>> happening or does it just increase the likelihood of success?
>> It solves the problem completely when the driver is built in. If HED is
>> built as a module, it does not.
> Can we prevent that condition with an appropriate Kconfig restriction?
> It's annoying to restrict build options, but if that is needed to make it
> work, it is better than not working!

Agreed. I will change ACPI_HED from tristate to bool if there are no other
comments. Thanks.

>
> Jonathan
>
>
>>> In the latter case, and generally speaking too, it would be better to
>>> add explicit synchronization between evged and hed.
```diff
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 61ca4afe83dc..54f60b7922ad 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -15,6 +15,13 @@ endif
 
 obj-$(CONFIG_ACPI) += tables.o
 
+#
+# The hed.o needs to be in front of evged.o to avoid the problem that
+# RAS errors cannot be handled in the special time window of startup
+# phase that evged has initialized while hed not.
+#
+obj-$(CONFIG_ACPI_HED) += hed.o
+
 #
 # ACPI Core Subsystem (Interpreter)
 #
@@ -95,7 +102,6 @@ obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
 obj-$(CONFIG_ACPI_BATTERY) += battery.o
 obj-$(CONFIG_ACPI_SBS) += sbshc.o
 obj-$(CONFIG_ACPI_SBS) += sbs.o
-obj-$(CONFIG_ACPI_HED) += hed.o
 obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
 obj-$(CONFIG_ACPI_BGRT) += bgrt.o
 obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
```