
[Resend] ARM: kdump: makes second kernel use strict pfn_valid

Message ID 1400464443-34816-1-git-send-email-wangnan0@huawei.com
State New

Commit Message

Wang Nan May 19, 2014, 1:54 a.m. UTC
When SPARSEMEM and CRASH_DUMP are both selected, the simple pfn_valid prevents
the second kernel from ioremapping the first kernel's memory if the address
falls into a section used by the second kernel. This limitation requires the
second kernel to occupy a full section, and elfcorehdr must reside in another
section.

This patch makes the crash dump kernel use the strict pfn_valid, which removes
this limitation.

For example:

  For a platform with SECTION_SIZE_BITS == 28 (256MiB) and
  crashkernel=128M@0x28000000 on the kernel cmdline, the second
  kernel is loaded at 0x28000000. Kexec puts elfcorehdr at
  0x2ff00000, and passes 'elfcorehdr=0x2ff00000 mem=130048K' to the
  second kernel. When the second kernel starts, it tries to use
  ioremap to retrieve its elfcorehdr. In this case, elfcorehdr is in the
  same section as the second kernel, so pfn_valid will recognize
  the page as valid and ioremap will refuse to map it.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Geng Hui <hui.geng@huawei.com>
---

I sent this patch once before but got no response. Resending with an updated
commit message.

---
 arch/arm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Will Deacon May 19, 2014, 4:09 p.m. UTC | #1
On Mon, May 19, 2014 at 02:54:03AM +0100, Wang Nan wrote:
> When SPARSEMEM and CRASH_DUMP both selected, simple pfn_valid prevents
> the second kernel ioremap first kernel's memory if the address falls
> into second kernel section. This limitation requires the second kernel
> occupies a full section, and elfcorehdr must resides in another section.
> 
> This patch makes crash dump kernel use strict pfn_valid, removes such
> limitation.
> 
> For example:
> 
>   For a platform with SECTION_SIZE_BITS == 28 (256MiB) and
>   crashkernel=128M@0x28000000 in kernel cmdline, the second
>   kernel is loaded at 0x28000000. Kexec puts elfcorehdr at
>   0x2ff00000, and passes 'elfcorehdr=0x2ff00000 mem=130048K' to
>   second kernel. When second kernel start, it tries to use
>   ioremap to retrive its elfcorehrd. In this case, elfcodehdr is at the
>   same section of the second kernel, pfn_valid will recongnize
>   the page as valid, so ioremap will refuse to map it.

So isn't the issue here that you're passing an incorrect mem= parameter
to the crash kernel?

Will
Wang Nan May 20, 2014, 3:22 a.m. UTC | #2
On 2014/5/20 0:09, Will Deacon wrote:
> On Mon, May 19, 2014 at 02:54:03AM +0100, Wang Nan wrote:
>> When SPARSEMEM and CRASH_DUMP both selected, simple pfn_valid prevents
>> the second kernel ioremap first kernel's memory if the address falls
>> into second kernel section. This limitation requires the second kernel
>> occupies a full section, and elfcorehdr must resides in another section.
>>
>> This patch makes crash dump kernel use strict pfn_valid, removes such
>> limitation.
>>
>> For example:
>>
>>   For a platform with SECTION_SIZE_BITS == 28 (256MiB) and
>>   crashkernel=128M@0x28000000 in kernel cmdline, the second
>>   kernel is loaded at 0x28000000. Kexec puts elfcorehdr at
>>   0x2ff00000, and passes 'elfcorehdr=0x2ff00000 mem=130048K' to
>>   second kernel. When second kernel start, it tries to use
>>   ioremap to retrive its elfcorehrd. In this case, elfcodehdr is at the
>>   same section of the second kernel, pfn_valid will recongnize
>>   the page as valid, so ioremap will refuse to map it.
> 
> So isn't the issue here that you're passing an incorrect mem= parameter
> to the crash kernel?
> 

The mem= parameter is generated by kexec-tools according to /proc/iomem; it is the size
of the reserved memory minus 1MiB. So I think what you mean is that I am passing an incorrect
crashkernel= parameter?

I'll explain the limitations on crash kernel reserved memory when SPARSEMEM is
enabled, and show how *impractical* the 'correct' crashkernel would be.

Take the realview board as an example.

Limitation 1: the crash kernel reservation must be aligned to 0x08000000 (128MiB).

  This is because zImage determines the final kernel address by (pc & 0xf8000000). If,
  for example, crashkernel=64M@0x29000000 is set, then the second kernel itself
  overwrites the first kernel's memory. We'll lose some memory in /proc/vmcore.

Limitation 2: the crash kernel must reside in a different section from the first kernel.

  This is because the second kernel uses ioremap to access the first kernel's memory,
  and arm prevents a valid pfn from being ioremapped. This means a whole section must be reserved
  for the second kernel. On realview, that is 256MiB.

Limitation 3: the last 1MiB of reserved memory must be ioremappable.

  This is because the second kernel depends on kexec-tools passing an ELF header as
  'elfcorehdr' to instruct it how to generate /proc/vmcore. See fs/proc/vmcore.c. Kexec-tools
  simply uses the last 1MiB for it. The second kernel uses ioremap to access it, which forces
  the header to be put in another section.

On the realview board, the only possible correct setting would be 'crashkernel=257M@0x20000000'.
However, realview has only 1GiB of memory, so the crash kernel consumes a quarter of it plus 1MiB. In addition, even
with this parameter set, the crash kernel is still unusable because:

  crashkernel reservation failed - memory is in use (0x20000000)

Wang Nan May 22, 2014, 1:53 a.m. UTC | #3
Hi Will,

What's your opinion about my explanation?

Thanks!

AKASHI Takahiro May 29, 2014, 4:39 a.m. UTC | #4
Wang, Will

I'm now working on kdump support for arm64 on top of Geoff's kexec patch.

On 05/20/2014 12:22 PM, Wang Nan wrote:
> On 2014/5/20 0:09, Will Deacon wrote:
>> On Mon, May 19, 2014 at 02:54:03AM +0100, Wang Nan wrote:
>>> When SPARSEMEM and CRASH_DUMP both selected, simple pfn_valid prevents
>>> the second kernel ioremap first kernel's memory if the address falls
>>> into second kernel section. This limitation requires the second kernel
>>> occupies a full section, and elfcorehdr must resides in another section.
>>>
>>> This patch makes crash dump kernel use strict pfn_valid, removes such
>>> limitation.
>>>
>>> For example:
>>>
>>>    For a platform with SECTION_SIZE_BITS == 28 (256MiB) and
>>>    crashkernel=128M@0x28000000 in kernel cmdline, the second
>>>    kernel is loaded at 0x28000000. Kexec puts elfcorehdr at
>>>    0x2ff00000, and passes 'elfcorehdr=0x2ff00000 mem=130048K' to
>>>    second kernel. When second kernel start, it tries to use
>>>    ioremap to retrive its elfcorehrd. In this case, elfcodehdr is at the
>>>    same section of the second kernel, pfn_valid will recongnize
>>>    the page as valid, so ioremap will refuse to map it.
>>
>> So isn't the issue here that you're passing an incorrect mem= parameter
>> to the crash kernel?
>>
>
> mem= parameter is generated by kexec-tools according to /proc/iomem, it is the size
> of reserved memory minus 1MiB. So I think what you mean is I passing an incorrect
> crashkernel= parameter?

Just FYI, kexec-tools doesn't seem to support device-tree properly.
Once device-tree is handled correctly, we won't need to pass the "mem=" parameter.
(Of course, only on machines that support device-tree.)

> I'll explain limitations on crash kernel reserved memory in the case of SPARSEMEM
> enabled, and show how *impractical* the 'correct' crashkernel will be.
>
> Use realview board for example.
>
> Limitation 1: crash kernel reservation kernel must be aligned with 0x08000000 (128MiB).
>
>    This is because zImage determine final kernel address by (pc & 0xf8000000). If,
>    for example, set crashkernel=64M@0x29000000, then the second kernel itself
>    overwrites first kernel's memory. We'll lost some memory in /proc/vmcore.
>
> Limitation 2: crash kernel must resides in different section with the first kernel.
>
>    This is because the second kernel use ioremap for accessing first kernel's memory,
>    and arm prevent a valid pfn be ioremapped. Which means a whole section must be reserved
>    for the secton kernel. On realview, which is 256MiB.
>
> Limitation 3: the last 1MiB of reserved memory must be ioremappable.
>
>    This is because the second kernel depeneds kexec-tools passing an elfheader as
>    'elfcorehdr' to instructs it generating /proc/vmcore. See fs/proc/vmcore.c. Kexec-tools
>    simply uses the last 1MiB for it. The second kernel use ioremap to access it, force
>    the header be put in another section.

We can avoid "Limitation 3" just by implementing arm's own elfcorehdr_read() with memcpy().
I can submit a patch, but can't test it for now.

-Takahiro AKASHI


Wang Nan May 29, 2014, 10:09 a.m. UTC | #5
On 2014/5/29 12:39, AKASHI Takahiro wrote:
> Wang, Will
> 
> I'm now working on kdump support for arm64 on top of Geoff's kexec patch.
> 
> On 05/20/2014 12:22 PM, Wang Nan wrote:
>> On 2014/5/20 0:09, Will Deacon wrote:
>>> On Mon, May 19, 2014 at 02:54:03AM +0100, Wang Nan wrote:
>>>> When SPARSEMEM and CRASH_DUMP both selected, simple pfn_valid prevents
>>>> the second kernel ioremap first kernel's memory if the address falls
>>>> into second kernel section. This limitation requires the second kernel
>>>> occupies a full section, and elfcorehdr must resides in another section.
>>>>
>>>> This patch makes crash dump kernel use strict pfn_valid, removes such
>>>> limitation.
>>>>
>>>> For example:
>>>>
>>>>    For a platform with SECTION_SIZE_BITS == 28 (256MiB) and
>>>>    crashkernel=128M@0x28000000 in kernel cmdline, the second
>>>>    kernel is loaded at 0x28000000. Kexec puts elfcorehdr at
>>>>    0x2ff00000, and passes 'elfcorehdr=0x2ff00000 mem=130048K' to
>>>>    second kernel. When second kernel start, it tries to use
>>>>    ioremap to retrive its elfcorehrd. In this case, elfcodehdr is at the
>>>>    same section of the second kernel, pfn_valid will recongnize
>>>>    the page as valid, so ioremap will refuse to map it.
>>>
>>> So isn't the issue here that you're passing an incorrect mem= parameter
>>> to the crash kernel?
>>>
>>
>> mem= parameter is generated by kexec-tools according to /proc/iomem, it is the size
>> of reserved memory minus 1MiB. So I think what you mean is I passing an incorrect
>> crashkernel= parameter?
> 
> Just FYI, kexec-tools doesn't seem to be implemented in proper way to support device-tree.
> Once device-tree is handled correctly, we don't need to pass "mem=" parameter.
> (Of course, only on machines that support device-tree.)
> 
>> I'll explain limitations on crash kernel reserved memory in the case of SPARSEMEM
>> enabled, and show how *impractical* the 'correct' crashkernel will be.
>>
>> Use realview board for example.
>>
>> Limitation 1: crash kernel reservation kernel must be aligned with 0x08000000 (128MiB).
>>
>>    This is because zImage determine final kernel address by (pc & 0xf8000000). If,
>>    for example, set crashkernel=64M@0x29000000, then the second kernel itself
>>    overwrites first kernel's memory. We'll lost some memory in /proc/vmcore.
>>
>> Limitation 2: crash kernel must resides in different section with the first kernel.
>>
>>    This is because the second kernel use ioremap for accessing first kernel's memory,
>>    and arm prevent a valid pfn be ioremapped. Which means a whole section must be reserved
>>    for the secton kernel. On realview, which is 256MiB.
>>
>> Limitation 3: the last 1MiB of reserved memory must be ioremappable.
>>
>>    This is because the second kernel depeneds kexec-tools passing an elfheader as
>>    'elfcorehdr' to instructs it generating /proc/vmcore. See fs/proc/vmcore.c. Kexec-tools
>>    simply uses the last 1MiB for it. The second kernel use ioremap to access it, force
>>    the header be put in another section.
> 
> We can avoid "Limitation 3" just by implementing arm's own elfcorehdr_read() with memcpy().
> I can submit a patch, but can't test it for now.
> 
> -Takahiro AKASHI
> 

However, you still need pfn_valid to check whether elfcorehdr resides in a valid area.

Furthermore, simply replacing ioremap with memcpy seems to break things: configurations that work
before the replacement will fail. In the end you will find you still need the strict pfn_valid to
check whether to use ioremap or memcpy.


Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index db3c541..795b1d4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1800,7 +1800,7 @@  config ARCH_SELECT_MEMORY_MODEL
 	def_bool ARCH_SPARSEMEM_ENABLE
 
 config HAVE_ARCH_PFN_VALID
-	def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM
+	def_bool ARCH_HAS_HOLES_MEMORYMODEL || !SPARSEMEM || CRASH_DUMP
 
 config HIGHMEM
 	bool "High Memory Support"