mbox series

[v3.1,0/4] arm64: kexec,kdump: fix boot failures on acpi-only system

Message ID 20180709234229.20181-1-takahiro.akashi@linaro.org
Headers show
Series arm64: kexec,kdump: fix boot failures on acpi-only system | expand

Message

AKASHI Takahiro July 9, 2018, 11:42 p.m. UTC
This patch series is a set of bug fixes to address kexec/kdump
failures which are sometimes observed on ACPI-only system and reported
in LAK-ML before.

In short, the phenomena are:
1. kexec'ed kernel can fail to boot because some ACPI table is corrupted
   by a new kernel (or other data) being loaded into System RAM. Currently
   kexec may possibly allocate space ignoring such "reserved" regions.
   We will see no messages after "Bye!"

2. crash dump (kdump) kernel can fail to boot and get into panic due to
   an alignment fault when accessing ACPI tables. This can happen because
   those tables are not always properly aligned while they are mapped
   non-cacheable (ioremap'ed) as they are not recognized as part of System
   RAM under the current implementation.

After discussing several possibilities to address those issues,
the agreed approach, in my understanding, is
* to add resource entries for every "reserved", i.e. memblock_reserve(),
  regions to /proc/iomem.
  (NOMAP regions, also marked as "reserved," remains at top-level for
  backward compatibility. User-space can tell the difference between
  reserved-system-ram and reserved-address-space.)
* For case (1), user space (kexec-tools) should rule out such regions
  in searching for free space for loaded data.
* For case (2), the kernel should access ACPI tables by mapping
  them with appropriate memory attributes described in UEFI memory map.
  (This means that it doesn't require any changes in /proc/iomem, and
  hence user space.)

Please find past discussions about /proc/iomem in [1].
--- more words from James ---
Our attempts to fix this just in the kernel reached a dead end, because Kdump
needs to include reserved-system-ram, whereas kexec has to avoid it. User-space
needs to be able to tell reserved-system-ram and reserved-address-space apart.
Hence we need to expose that information, and pick it up in user-space.

Patched-kernel and unpatch-user-space will work the same way it does today, as
the additional reserved regions are ignored by user-space.

Unpatched-kernel and patched-user-space will also work the same way it does
today as the additional reserved regions are missing.
--->8---

Patch#1 addresses kexec case, for which you are also required to update
user space. See necessary patches in [2]. If you want to review Patch#1,
please also take a look at and review [2].

Patch#2, #3 and #4 address the kdump case above.


Changes in v3.1 (2018, July 10, 2018)
* add Ard's patch[4] to this patch set

Changes in v3 (2018, July 9, 2018)
* drop the v2's patch#3, preferring [4]

Changes in v2 (2018, June 19, 2018)
* re-organise v1's patch#2 and #3 into v2's #2, #3 and #4
  not to break bisect

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565980.html
[2] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/resv_mem
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/573655.html
[4] https://marc.info/?l=linux-efi&m=152930773507524&w=2

AKASHI Takahiro (2):
  efi/arm: map UEFI memory map even w/o runtime services enabled
  arm64: acpi: fix alignment fault in accessing ACPI

Ard Biesheuvel (1):
  efi/arm: preserve early mapping of UEFI memory map longer for BGRT

James Morse (1):
  arm64: export memblock_reserve()d regions via /proc/iomem

 arch/arm64/include/asm/acpi.h      | 23 ++++++++++++------
 arch/arm64/kernel/acpi.c           | 11 +++------
 arch/arm64/kernel/setup.c          | 38 ++++++++++++++++++++++++++++++
 drivers/firmware/efi/arm-init.c    |  1 -
 drivers/firmware/efi/arm-runtime.c | 16 +++++++------
 5 files changed, 66 insertions(+), 23 deletions(-)

-- 
2.17.0

Comments

Will Deacon July 12, 2018, 4:49 p.m. UTC | #1
Hi Akashi,

On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:
> This patch series is a set of bug fixes to address kexec/kdump

> failures which are sometimes observed on ACPI-only system and reported

> in LAK-ML before.


I tried picking this up, along with Ard's fixup, but I'm seeing a build
failure for allmodconfig:

arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':
acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

I didn't investigate further. Please can you fix this?

Will
AKASHI Takahiro July 13, 2018, 12:34 a.m. UTC | #2
On Thu, Jul 12, 2018 at 05:49:19PM +0100, Will Deacon wrote:
> Hi Akashi,

> 

> On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:

> > This patch series is a set of bug fixes to address kexec/kdump

> > failures which are sometimes observed on ACPI-only system and reported

> > in LAK-ML before.

> 

> I tried picking this up, along with Ard's fixup, but I'm seeing a build

> failure for allmodconfig:

> 

> arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':

> acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

> 

> I didn't investigate further. Please can you fix this?


Because CONFIG_ACPI is on and CONFIG_EFI is off.

This can happen in allmodconfig as CONFIG_EFI depends on
!CONFIG_CPU_BIG_ENDIAN, which is actually on in this case.

Looking at __acpi_get_mem_attributes(), since there is no information
available on memory attributes, what we can do at best is
  * return PAGE_KERNEL (= cacheable) for mapped memory,
  * return DEVICE_nGnRnE (= non-cacheable) otherwise
(See a hunk to be applied on top of my patch#4.)

I think that, after applying, acpi_os_ioremap() would work almost
in the same way as the original before my patchset given that
MAP memblock attribute is used only under CONFIG_EFI for now.

Make sense?

-Takahiro AKASHI

> Will

---8<---
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index ed46dc188b22..cad3ed2666ef 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -238,6 +238,7 @@ void __init acpi_boot_table_init(void)
 
 pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
 {
+#ifdef CONFIG_EFI
 	/*
 	 * According to "Table 8 Map: EFI memory types to AArch64 memory
 	 * types" of UEFI 2.5 section 2.3.6.1, each EFI memory type is
@@ -255,5 +256,9 @@ pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
 		return __pgprot(PROT_NORMAL_WT);
 	if (attr & EFI_MEMORY_WC)
 		return __pgprot(PROT_NORMAL_NC);
+#else
+	if (memblock_is_map_memory(addr))
+		return PAGE_KERNEL;
+#endif
 	return __pgprot(PROT_DEVICE_nGnRnE);
 }
Ard Biesheuvel July 13, 2018, 5:49 a.m. UTC | #3
On 13 July 2018 at 02:34, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:
> On Thu, Jul 12, 2018 at 05:49:19PM +0100, Will Deacon wrote:

>> Hi Akashi,

>>

>> On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:

>> > This patch series is a set of bug fixes to address kexec/kdump

>> > failures which are sometimes observed on ACPI-only system and reported

>> > in LAK-ML before.

>>

>> I tried picking this up, along with Ard's fixup, but I'm seeing a build

>> failure for allmodconfig:

>>

>> arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':

>> acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

>>

>> I didn't investigate further. Please can you fix this?

>

> Because CONFIG_ACPI is on and CONFIG_EFI is off.

>

> This can happen in allmodconfig as CONFIG_EFI depends on

> !CONFIG_CPU_BIG_ENDIAN, which is actually on in this case.

>


Allowing both CONFIG_ACPI and CONFIG_CPU_BIG_ENDIAN to be configured
makes no sense at all. Things will surely break if you start using BE
memory accesses while parsing ACPI tables.

Allowing CONFIG_ACPI without CONFIG_EFI makes no sense either, since
on arm64, the only way to find the ACPI tables is through a UEFI
configuration table.

> Looking at __acpi_get_mem_attributes(), since there is no information

> available on memory attributes, what we can do at best is

>   * return PAGE_KERNEL (= cacheable) for mapped memory,

>   * return DEVICE_nGnRnE (= non-cacheable) otherwise

> (See a hunk to be applied on top of my patch#4.)

>

> I think that, after applying, acpi_os_ioremap() would work almost

> in the same way as the original before my patchset given that

> MAP memblock attribute is used only under CONFIG_EFI for now.

>

> Make sense?

>


Let's keep your code as is but fix the Kconfig dependencies instead.
AKASHI Takahiro July 17, 2018, 5:12 a.m. UTC | #4
Will,

On Fri, Jul 13, 2018 at 07:49:45AM +0200, Ard Biesheuvel wrote:
> On 13 July 2018 at 02:34, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:

> > On Thu, Jul 12, 2018 at 05:49:19PM +0100, Will Deacon wrote:

> >> Hi Akashi,

> >>

> >> On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:

> >> > This patch series is a set of bug fixes to address kexec/kdump

> >> > failures which are sometimes observed on ACPI-only system and reported

> >> > in LAK-ML before.

> >>

> >> I tried picking this up, along with Ard's fixup, but I'm seeing a build

> >> failure for allmodconfig:

> >>

> >> arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':

> >> acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

> >>

> >> I didn't investigate further. Please can you fix this?

> >

> > Because CONFIG_ACPI is on and CONFIG_EFI is off.

> >

> > This can happen in allmodconfig as CONFIG_EFI depends on

> > !CONFIG_CPU_BIG_ENDIAN, which is actually on in this case.

> >

> 

> Allowing both CONFIG_ACPI and CONFIG_CPU_BIG_ENDIAN to be configured

> makes no sense at all. Things will surely break if you start using BE

> memory accesses while parsing ACPI tables.

> 

> Allowing CONFIG_ACPI without CONFIG_EFI makes no sense either, since

> on arm64, the only way to find the ACPI tables is through a UEFI

> configuration table.



Do you agree to this?

-Takahiro AKASHI

> > Looking at __acpi_get_mem_attributes(), since there is no information

> > available on memory attributes, what we can do at best is

> >   * return PAGE_KERNEL (= cacheable) for mapped memory,

> >   * return DEVICE_nGnRnE (= non-cacheable) otherwise

> > (See a hunk to be applied on top of my patch#4.)

> >

> > I think that, after applying, acpi_os_ioremap() would work almost

> > in the same way as the original before my patchset given that

> > MAP memblock attribute is used only under CONFIG_EFI for now.

> >

> > Make sense?

> >

> 

> Let's keep your code as is but fix the Kconfig dependencies instead.
Will Deacon July 23, 2018, 1:35 p.m. UTC | #5
On Tue, Jul 17, 2018 at 02:12:23PM +0900, AKASHI Takahiro wrote:
> On Fri, Jul 13, 2018 at 07:49:45AM +0200, Ard Biesheuvel wrote:

> > On 13 July 2018 at 02:34, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:

> > > On Thu, Jul 12, 2018 at 05:49:19PM +0100, Will Deacon wrote:

> > >> Hi Akashi,

> > >>

> > >> On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:

> > >> > This patch series is a set of bug fixes to address kexec/kdump

> > >> > failures which are sometimes observed on ACPI-only system and reported

> > >> > in LAK-ML before.

> > >>

> > >> I tried picking this up, along with Ard's fixup, but I'm seeing a build

> > >> failure for allmodconfig:

> > >>

> > >> arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':

> > >> acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

> > >>

> > >> I didn't investigate further. Please can you fix this?

> > >

> > > Because CONFIG_ACPI is on and CONFIG_EFI is off.

> > >

> > > This can happen in allmodconfig as CONFIG_EFI depends on

> > > !CONFIG_CPU_BIG_ENDIAN, which is actually on in this case.

> > >

> > 

> > Allowing both CONFIG_ACPI and CONFIG_CPU_BIG_ENDIAN to be configured

> > makes no sense at all. Things will surely break if you start using BE

> > memory accesses while parsing ACPI tables.

> > 

> > Allowing CONFIG_ACPI without CONFIG_EFI makes no sense either, since

> > on arm64, the only way to find the ACPI tables is through a UEFI

> > configuration table.

> 

> 

> Do you agree to this?


Yes; please post a new series which resolves these dependencies and includes
Ard's fixup from before.

Thanks,

Will
Bhupesh Sharma July 24, 2018, 5:19 a.m. UTC | #6
Hi Will,

On Mon, Jul 23, 2018 at 7:05 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Jul 17, 2018 at 02:12:23PM +0900, AKASHI Takahiro wrote:

>> On Fri, Jul 13, 2018 at 07:49:45AM +0200, Ard Biesheuvel wrote:

>> > On 13 July 2018 at 02:34, AKASHI Takahiro <takahiro.akashi@linaro.org> wrote:

>> > > On Thu, Jul 12, 2018 at 05:49:19PM +0100, Will Deacon wrote:

>> > >> Hi Akashi,

>> > >>

>> > >> On Tue, Jul 10, 2018 at 08:42:25AM +0900, AKASHI Takahiro wrote:

>> > >> > This patch series is a set of bug fixes to address kexec/kdump

>> > >> > failures which are sometimes observed on ACPI-only system and reported

>> > >> > in LAK-ML before.

>> > >>

>> > >> I tried picking this up, along with Ard's fixup, but I'm seeing a build

>> > >> failure for allmodconfig:

>> > >>

>> > >> arch/arm64/kernel/acpi.o: In function `__acpi_get_mem_attribute':

>> > >> acpi.c:(.text+0x60): undefined reference to `efi_mem_attributes'

>> > >>

>> > >> I didn't investigate further. Please can you fix this?

>> > >

>> > > Because CONFIG_ACPI is on and CONFIG_EFI is off.

>> > >

>> > > This can happen in allmodconfig as CONFIG_EFI depends on

>> > > !CONFIG_CPU_BIG_ENDIAN, which is actually on in this case.

>> > >

>> >

>> > Allowing both CONFIG_ACPI and CONFIG_CPU_BIG_ENDIAN to be configured

>> > makes no sense at all. Things will surely break if you start using BE

>> > memory accesses while parsing ACPI tables.

>> >

>> > Allowing CONFIG_ACPI without CONFIG_EFI makes no sense either, since

>> > on arm64, the only way to find the ACPI tables is through a UEFI

>> > configuration table.

>>

>>

>> Do you agree to this?

>

> Yes; please post a new series which resolves these dependencies and includes

> Ard's fixup from before.


I see that Akashi has already posted a v4 here with the suggested fixes:
https://lkml.org/lkml/2018/7/22/321

Thanks,
Bhupesh