mbox series

[v2,00/23] x86_64: Improvements at compressed kernel stage

Message ID cover.1666705333.git.baskov@ispras.ru
Headers show
Series x86_64: Improvements at compressed kernel stage | expand

Message

Evgeniy Baskov Oct. 25, 2022, 2:12 p.m. UTC
This patchset is aimed
* to improve UEFI compatibility of compressed kernel code for x86_64
* to setup proper memory access attributes for code and rodata sections
* to implement W^X protection policy throughout the whole execution 
  of compressed kernel for EFISTUB code path. 

Kernel is made to be more compatible with PE image specification [3],
allowing it to be successfully loaded by stricter PE loader
implementations like the one from [2]. There is at least one
known implementation that uses that loader in production [4].
There are also ongoing efforts to upstream these changes.

Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
EFI specification since version 2.10, as a better alternative to
using DXE services for memory protection attributes manipulation,
since it is defined by the UEFI specification itself and not UEFI PI
specification. This protocol is not widely available so the code
using DXE services is kept in place as a fallback in case specific
implementation does not support the new protocol.
One of EFI implementations that already support
EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
 
Kernel image generation tool (tools/build.c) is refactored as a part
of changes that makes PE image more compatible.
   
The patchset implements memory protection for compressed kernel
code while executing both inside EFI boot services and outside of
them. For EFISTUB code path W^X protection policy is maintained
throughout the whole execution of compressed kernel. The latter
is achieved by extracting the kernel directly from EFI environment
and jumping to it's head immediately after exiting EFI boot services.
As a side effect of this change one page table rebuild and a copy of
the kernel image is removed.

Direct extraction can be toggled using CONFIG_EFI_STUB_EXTRACT_DIRECT.
Memory protection inside EFI environment is controlled by the
CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
protection attributes of PE sections and not only DXE services as the
name might suggest.

Changes in v2:
 * Fix spelling.
 * Rebase code to current master.
 * Split huge patches into smaller ones.
 * Remove unneeded forward declarations.
 * Make direct extraction unconditional.
   * Also make it work for x86_32.
   * Reduce lower limit of KASLR to 64M.
 * Make callback interface more logically consistent.
 * Actually declare callbacks structure before using it.
 * Mention effect on x86_32 in commit message of 
   "x86/build: Remove RWX sections and align on 4KB".
 * Clarify commit message of
   "x86/boot: Increase boot page table size".
 * Remove "startup32_" prefix on startup32_enable_nx_if_supported.
 * Move linker generated sections outside of function scope.
 * Drop some unintended changes.
 * Drop generating 2 reloc entries.
   (as I've misread the documentation and there's no need for this change.)
 * Set has_nx from enable_nx_if_supported correctly.
 * Move ELF header check to build time.
 * Set WP at the same time as PG in trampoline code,
   as it is more logically consistent.
 * Put x86-specific EFISTUB definitions in x86-stub.h header.
 * Catch presence of ELF segments violating W^X during build.
 * Move PE definitions from build.c to a new header file.
 * Fix generation of PE '.compat' section.

I decided to keep protection of compressed kernel blob and '.rodata'
separate from '.text' for now, since it does not really have a lot
of overhead.

Otherwise, all comments on v1 seems to be addressed. Please also apply
Peter's patch [6] on top of this series.

Patch "x86/boot: Support 4KB pages for identity mapping" needs review
from x86/mm team.

[1] https://lkml.org/lkml/2022/8/1/1314
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[4] https://www.ispras.ru/en/technologies/asperitas/
[5] https://github.com/microsoft/mu_tiano_platforms
[6] https://lkml.org/lkml/2022/10/18/1178

Evgeniy Baskov (23):
  x86/boot: Align vmlinuz sections on page size
  x86/build: Remove RWX sections and align on 4KB
  x86/boot: Set cr0 to known state in trampoline
  x86/boot: Increase boot page table size
  x86/boot: Support 4KB pages for identity mapping
  x86/boot: Setup memory protection for bzImage code
  x86/build: Check W^X of vmlinux during build
  x86/boot: Map memory explicitly
  x86/boot: Remove mapping from page fault handler
  efi/libstub: Move helper function to related file
  x86/boot: Make console interface more abstract
  x86/boot: Make kernel_add_identity_map() a pointer
  x86/boot: Split trampoline and pt init code
  x86/boot: Add EFI kernel extraction interface
  efi/x86: Support extracting kernel from libstub
  x86/boot: Reduce lower limit of physical KASLR
  x86/boot: Reduce size of the DOS stub
  tools/include: Add simplified version of pe.h
  x86/build: Cleanup tools/build.c
  x86/build: Make generated PE more spec compliant
  efi/x86: Explicitly set sections memory attributes
  efi/libstub: Add memory attribute protocol definitions
  efi/libstub: Use memory attribute protocol

 arch/x86/boot/Makefile                        |   2 +-
 arch/x86/boot/compressed/Makefile             |   8 +-
 arch/x86/boot/compressed/acpi.c               |  21 +-
 arch/x86/boot/compressed/efi.c                |  19 +-
 arch/x86/boot/compressed/head_32.S            |  44 +-
 arch/x86/boot/compressed/head_64.S            |  76 ++-
 arch/x86/boot/compressed/ident_map_64.c       | 122 ++--
 arch/x86/boot/compressed/kaslr.c              |   8 +-
 arch/x86/boot/compressed/misc.c               | 279 ++++-----
 arch/x86/boot/compressed/misc.h               |  23 +-
 arch/x86/boot/compressed/pgtable.h            |  20 -
 arch/x86/boot/compressed/pgtable_64.c         |  75 ++-
 arch/x86/boot/compressed/putstr.c             | 130 ++++
 arch/x86/boot/compressed/sev.c                |   6 +-
 arch/x86/boot/compressed/vmlinux.lds.S        |   6 +
 arch/x86/boot/header.S                        | 110 +---
 arch/x86/boot/tools/build.c                   | 573 +++++++++++-------
 arch/x86/include/asm/boot.h                   |  26 +-
 arch/x86/include/asm/efi.h                    |   7 +
 arch/x86/include/asm/init.h                   |   1 +
 arch/x86/include/asm/shared/extract.h         |  27 +
 arch/x86/include/asm/shared/pgtable.h         |  29 +
 arch/x86/kernel/vmlinux.lds.S                 |  15 +-
 arch/x86/mm/ident_map.c                       | 185 +++++-
 drivers/firmware/efi/Kconfig                  |   2 +
 drivers/firmware/efi/libstub/Makefile         |   2 +-
 drivers/firmware/efi/libstub/efistub.h        |  26 +
 drivers/firmware/efi/libstub/mem.c            | 190 ++++++
 .../firmware/efi/libstub/x86-extract-direct.c | 204 +++++++
 drivers/firmware/efi/libstub/x86-stub.c       | 231 ++-----
 drivers/firmware/efi/libstub/x86-stub.h       |  11 +
 include/linux/efi.h                           |   1 +
 tools/include/linux/pe.h                      | 150 +++++
 33 files changed, 1848 insertions(+), 781 deletions(-)
 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/boot/compressed/putstr.c
 create mode 100644 arch/x86/include/asm/shared/extract.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h
 create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
 create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
 create mode 100644 tools/include/linux/pe.h

Comments

Mario Limonciello Nov. 4, 2022, 6:21 p.m. UTC | #1
On 10/25/2022 09:12, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution
>    of compressed kernel for EFISTUB code path.
> 
> Kernel is made to be more compatible with PE image specification [3],
> allowing it to be successfully loaded by stricter PE loader
> implementations like the one from [2]. There is at least one
> known implementation that uses that loader in production [4].
> There are also ongoing efforts to upstream these changes.
> 
> Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
> EFI specification since version 2.10, as a better alternative to
> using DXE services for memory protection attributes manipulation,
> since it is defined by the UEFI specification itself and not UEFI PI
> specification. This protocol is not widely available so the code
> using DXE services is kept in place as a fallback in case specific
> implementation does not support the new protocol.
> One of EFI implementations that already support
> EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
>   
> Kernel image generation tool (tools/build.c) is refactored as a part
> of changes that makes PE image more compatible.
>     
> The patchset implements memory protection for compressed kernel
> code while executing both inside EFI boot services and outside of
> them. For EFISTUB code path W^X protection policy is maintained
> throughout the whole execution of compressed kernel. The latter
> is achieved by extracting the kernel directly from EFI environment
> and jumping to it's head immediately after exiting EFI boot services.
> As a side effect of this change one page table rebuild and a copy of
> the kernel image is removed.
> 
> Direct extraction can be toggled using CONFIG_EFI_STUB_EXTRACT_DIRECT.
> Memory protection inside EFI environment is controlled by the
> CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
> option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
> protection attributes of PE sections and not only DXE services as the
> name might suggest.
> 
> Changes in v2:
>   * Fix spelling.
>   * Rebase code to current master.
>   * Split huge patches into smaller ones.
>   * Remove unneeded forward declarations.
>   * Make direct extraction unconditional.
>     * Also make it work for x86_32.
>     * Reduce lower limit of KASLR to 64M.
>   * Make callback interface more logically consistent.
>   * Actually declare callbacks structure before using it.
>   * Mention effect on x86_32 in commit message of
>     "x86/build: Remove RWX sections and align on 4KB".
>   * Clarify commit message of
>     "x86/boot: Increase boot page table size".
>   * Remove "startup32_" prefix on startup32_enable_nx_if_supported.
>   * Move linker generated sections outside of function scope.
>   * Drop some unintended changes.
>   * Drop generating 2 reloc entries.
>     (as I've misread the documentation and there's no need for this change.)
>   * Set has_nx from enable_nx_if_supported correctly.
>   * Move ELF header check to build time.
>   * Set WP at the same time as PG in trampoline code,
>     as it is more logically consistent.
>   * Put x86-specific EFISTUB definitions in x86-stub.h header.
>   * Catch presence of ELF segments violating W^X during build.
>   * Move PE definitions from build.c to a new header file.
>   * Fix generation of PE '.compat' section.
> 
> I decided to keep protection of compressed kernel blob and '.rodata'
> separate from '.text' for now, since it does not really have a lot
> of overhead.
> 
> Otherwise, all comments on v1 seems to be addressed. Please also apply
> Peter's patch [6] on top of this series.
> 
> Patch "x86/boot: Support 4KB pages for identity mapping" needs review
> from x86/mm team.
> 
> [1] https://lkml.org/lkml/2022/8/1/1314
> [2] https://github.com/acidanthera/audk/tree/secure_pe
> [3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
> [4] https://www.ispras.ru/en/technologies/asperitas/
> [5] https://github.com/microsoft/mu_tiano_platforms
> [6] https://lkml.org/lkml/2022/10/18/1178
> 
> Evgeniy Baskov (23):
>    x86/boot: Align vmlinuz sections on page size
>    x86/build: Remove RWX sections and align on 4KB
>    x86/boot: Set cr0 to known state in trampoline
>    x86/boot: Increase boot page table size
>    x86/boot: Support 4KB pages for identity mapping
>    x86/boot: Setup memory protection for bzImage code
>    x86/build: Check W^X of vmlinux during build
>    x86/boot: Map memory explicitly
>    x86/boot: Remove mapping from page fault handler
>    efi/libstub: Move helper function to related file
>    x86/boot: Make console interface more abstract
>    x86/boot: Make kernel_add_identity_map() a pointer
>    x86/boot: Split trampoline and pt init code
>    x86/boot: Add EFI kernel extraction interface
>    efi/x86: Support extracting kernel from libstub
>    x86/boot: Reduce lower limit of physical KASLR
>    x86/boot: Reduce size of the DOS stub
>    tools/include: Add simplified version of pe.h
>    x86/build: Cleanup tools/build.c
>    x86/build: Make generated PE more spec compliant
>    efi/x86: Explicitly set sections memory attributes
>    efi/libstub: Add memory attribute protocol definitions
>    efi/libstub: Use memory attribute protocol
> 
>   arch/x86/boot/Makefile                        |   2 +-
>   arch/x86/boot/compressed/Makefile             |   8 +-
>   arch/x86/boot/compressed/acpi.c               |  21 +-
>   arch/x86/boot/compressed/efi.c                |  19 +-
>   arch/x86/boot/compressed/head_32.S            |  44 +-
>   arch/x86/boot/compressed/head_64.S            |  76 ++-
>   arch/x86/boot/compressed/ident_map_64.c       | 122 ++--
>   arch/x86/boot/compressed/kaslr.c              |   8 +-
>   arch/x86/boot/compressed/misc.c               | 279 ++++-----
>   arch/x86/boot/compressed/misc.h               |  23 +-
>   arch/x86/boot/compressed/pgtable.h            |  20 -
>   arch/x86/boot/compressed/pgtable_64.c         |  75 ++-
>   arch/x86/boot/compressed/putstr.c             | 130 ++++
>   arch/x86/boot/compressed/sev.c                |   6 +-
>   arch/x86/boot/compressed/vmlinux.lds.S        |   6 +
>   arch/x86/boot/header.S                        | 110 +---
>   arch/x86/boot/tools/build.c                   | 573 +++++++++++-------
>   arch/x86/include/asm/boot.h                   |  26 +-
>   arch/x86/include/asm/efi.h                    |   7 +
>   arch/x86/include/asm/init.h                   |   1 +
>   arch/x86/include/asm/shared/extract.h         |  27 +
>   arch/x86/include/asm/shared/pgtable.h         |  29 +
>   arch/x86/kernel/vmlinux.lds.S                 |  15 +-
>   arch/x86/mm/ident_map.c                       | 185 +++++-
>   drivers/firmware/efi/Kconfig                  |   2 +
>   drivers/firmware/efi/libstub/Makefile         |   2 +-
>   drivers/firmware/efi/libstub/efistub.h        |  26 +
>   drivers/firmware/efi/libstub/mem.c            | 190 ++++++
>   .../firmware/efi/libstub/x86-extract-direct.c | 204 +++++++
>   drivers/firmware/efi/libstub/x86-stub.c       | 231 ++-----
>   drivers/firmware/efi/libstub/x86-stub.h       |  11 +
>   include/linux/efi.h                           |   1 +
>   tools/include/linux/pe.h                      | 150 +++++
>   33 files changed, 1848 insertions(+), 781 deletions(-)
>   delete mode 100644 arch/x86/boot/compressed/pgtable.h
>   create mode 100644 arch/x86/boot/compressed/putstr.c
>   create mode 100644 arch/x86/include/asm/shared/extract.h
>   create mode 100644 arch/x86/include/asm/shared/pgtable.h
>   create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
>   create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
>   create mode 100644 tools/include/linux/pe.h
> 

Hi,

I was talking to Peter Jones recently about what was still missing for 
NX support in the kernel and he pointed me at this series.

So I had a try with this series on top of:

ee6050c8af96 ("Merge tag 'ata-6.1-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata")

Unfortunately I can't boot the system with this series applied.
This is not on a system that enforces NX pre-boot (but that was my goal 
after I could prove booting on something that doesn't).
I didn't apply Peter's patch 6 you referenced in your cover letter, but 
I don't expect that's the reason for the failure.

I get:

"Failed to allocate space for tmp_cmdline"

    -- System Halted

This is early enough [1] that I don't have anything else output to a 
serial log from the kernel.

https://github.com/torvalds/linux/blob/d4013bc4d49f6da8178a340348369bb9920225c9/arch/x86/boot/compressed/kaslr.c#L268

Since this is only in the kaslr path, I tried to turn that off with 
'nokaslr' on the kernel command line.

I then get a failure of:

"Out of memory while allocating zstd_dctx"

   -- System Halted

This kernel was booted from the following path:
-> Insyde BIOS
--> shim (from Fedora 36 repository)
---> GRUB (from Peter for Fedora 36 w/ some level NX support)
----> kernel binary (self-built)

The BIOS on this system doesn't validate NX, but also the shim binary 
did not have the NX bit set in the PE header.

Your cover letter referenced CONFIG_EFI_STUB_EXTRACT_DIRECT but I didn't 
find this option in the series.  I also tried both with 
CONFIG_EFI_DXE_MEM_ATTRIBUTES=y or unset, same result.
Evgeniy Baskov Nov. 8, 2022, 7:01 a.m. UTC | #2
On 2022-11-04 21:21, Limonciello, Mario wrote:
> On 10/25/2022 09:12, Evgeniy Baskov wrote:
>> ...
>> 
> 
> Hi,
> 
> I was talking to Peter Jones recently about what was still missing for
> NX support in the kernel and he pointed me at this series.
> 
> So I had a try with this series on top of:
> 
> ee6050c8af96 ("Merge tag 'ata-6.1-rc4' of
> git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata")
> 
> Unfortunately I can't boot the system with this series applied.
> This is not on a system that enforces NX pre-boot (but that was my
> goal after I could prove booting on something that doesn't).
> I didn't apply Peter's patch 6 you referenced in your cover letter,
> but I don't expect that's the reason for the failure.
> 
> I get:
> 
> "Failed to allocate space for tmp_cmdline"
> 
>    -- System Halted
> 
> This is early enough [1] that I don't have anything else output to a
> serial log from the kernel.
> 
> https://github.com/torvalds/linux/blob/d4013bc4d49f6da8178a340348369bb9920225c9/arch/x86/boot/compressed/kaslr.c#L268
> 
> Since this is only in the kaslr path, I tried to turn that off with
> 'nokaslr' on the kernel command line.
> 
> I then get a failure of:
> 
> "Out of memory while allocating zstd_dctx"
> 
>   -- System Halted
> 
> This kernel was booted from the following path:
> -> Insyde BIOS
> --> shim (from Fedora 36 repository)
> ---> GRUB (from Peter for Fedora 36 w/ some level NX support)
> ----> kernel binary (self-built)
> 
> The BIOS on this system doesn't validate NX, but also the shim binary
> did not have the NX bit set in the PE header.
> 
> Your cover letter referenced CONFIG_EFI_STUB_EXTRACT_DIRECT but I
> didn't find this option in the series.  I also tried both with
> CONFIG_EFI_DXE_MEM_ATTRIBUTES=y or unset, same result.

Hi,

Thanks for your feedback!

CONFIG_EFI_STUB_EXTRACT_DIRECT option was removed in v2 of the series
and direct extraction is unconditional now.

You are getting really weird errors, which unfortunately I am unable
to reproduce yet. I've tried booting with fedora's grub and the series
applied on top of ee6050c8af96, but it did boot successfully.

 From the error messages it's some problem with malloc() implementation
of compressed kernel code. I suspect that malloc_ptr inside .bss is not
zeroed. This should not happen when booting via either non-UEFI
interface, or via UEFI (when kernel is properly loaded as PE image).
The problem, I think, arises when kernel is loaded as a blob, but EFI
handover protocol is used to start its execution. This is what grub
seems to be doing.

Can you please try booting with patches below applied on top?
If this fixes the problem, I'll include these changes in v3.

Thanks,
Evgeniy Baskov
--
diff --git a/arch/x86/boot/compressed/head_32.S 
b/arch/x86/boot/compressed/head_32.S
index 9871cc8466fb..69811d9ab4ce 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -152,6 +152,14 @@ SYM_FUNC_END(startup_32)

  #ifdef CONFIG_EFI_STUB
  SYM_FUNC_START(efi32_stub_entry)
+	/* Clear BSS */
+	xorl	%eax, %eax
+	leal	_bss@GOTOFF(%ebx), %edi
+	leal	_ebss@GOTOFF(%ebx), %ecx
+	subl	%edi, %ecx
+	shrl	$2, %ecx
+	rep	stosl
+
  	add	$0x4, %esp
  	movl	8(%esp), %esi	/* save boot_params pointer */
  	call	efi_main
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index 2bb0e6da08c0..384706d12354 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -553,8 +553,20 @@ SYM_CODE_END(startup_64)
  #ifdef CONFIG_EFI_STUB
  	.org 0x390
  SYM_FUNC_START(efi64_stub_entry)
+	/* Preserve first parameter */
+	movq	%rdi, %r10
+
+	/* Clear BSS */
+	xorl	%eax, %eax
+	leaq	_bss(%rip), %rdi
+	leaq	_ebss(%rip), %rcx
+	subq	%rdi, %rcx
+	shrq	$3, %rcx
+	rep	stosq
+
  	and	$~0xf, %rsp			/* realign the stack */
  	movq	%rdx, %rbx			/* save boot_params pointer */
+	movq	%r10, %rdi
  	call	efi_main

  	cld
diff --git a/drivers/firmware/efi/libstub/x86-stub.c 
b/drivers/firmware/efi/libstub/x86-stub.c
index 95a69c37518e..1a2c52e77c6e 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -24,7 +24,7 @@

  const efi_system_table_t *efi_system_table;
  extern u32 image_offset;
-static efi_loaded_image_t *image = NULL;
+static efi_loaded_image_t *image __section(".data");

  extern char _head[], _ehead[];
  extern char _compressed[], _ecompressed[];
Mario Limonciello Nov. 8, 2022, 6:17 p.m. UTC | #3
On 11/8/2022 01:01, Evgeniy Baskov wrote:
> On 2022-11-04 21:21, Limonciello, Mario wrote:
>> On 10/25/2022 09:12, Evgeniy Baskov wrote:
>>> ...
>>>
>>
>> Hi,
>>
>> I was talking to Peter Jones recently about what was still missing for
>> NX support in the kernel and he pointed me at this series.
>>
>> So I had a try with this series on top of:
>>
>> ee6050c8af96 ("Merge tag 'ata-6.1-rc4' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata")
>>
>> Unfortunately I can't boot the system with this series applied.
>> This is not on a system that enforces NX pre-boot (but that was my
>> goal after I could prove booting on something that doesn't).
>> I didn't apply Peter's patch 6 you referenced in your cover letter,
>> but I don't expect that's the reason for the failure.
>>
>> I get:
>>
>> "Failed to allocate space for tmp_cmdline"
>>
>>    -- System Halted
>>
>> This is early enough [1] that I don't have anything else output to a
>> serial log from the kernel.
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fblob%2Fd4013bc4d49f6da8178a340348369bb9920225c9%2Farch%2Fx86%2Fboot%2Fcompressed%2Fkaslr.c%23L268&data=05%7C01%7Cmario.limonciello%40amd.com%7C9280e92e85bc4e2b52cd08dac15704a5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638034876740462327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=oiCJUa8M3x%2FYCxVjJj98R7iU%2FzIj%2FQxdVOWbnqGWCNI%3D&reserved=0
>>
>> Since this is only in the kaslr path, I tried to turn that off with
>> 'nokaslr' on the kernel command line.
>>
>> I then get a failure of:
>>
>> "Out of memory while allocating zstd_dctx"
>>
>>   -- System Halted
>>
>> This kernel was booted from the following path:
>> -> Insyde BIOS
>> --> shim (from Fedora 36 repository)
>> ---> GRUB (from Peter for Fedora 36 w/ some level NX support)
>> ----> kernel binary (self-built)
>>
>> The BIOS on this system doesn't validate NX, but also the shim binary
>> did not have the NX bit set in the PE header.
>>
>> Your cover letter referenced CONFIG_EFI_STUB_EXTRACT_DIRECT but I
>> didn't find this option in the series.  I also tried both with
>> CONFIG_EFI_DXE_MEM_ATTRIBUTES=y or unset, same result.
> 
> Hi,
> 
> Thanks for your feedback!
> 

Sure!

> CONFIG_EFI_STUB_EXTRACT_DIRECT option was removed in v2 of the series
> and direct extraction is unconditional now.
> 
> You are getting really weird errors, which unfortunately I am unable
> to reproduce yet. I've tried booting with fedora's grub and the series
> applied on top of ee6050c8af96, but it did boot successfully.

Well so I expect the unique difference is that I'm using Peter's GRUB 
that has some NX support landed.  He has binaries for it here:

https://blog.uncooperative.org/~pjones/nx/repo/

*Theoretically* a BIOS that enforces NX should be able to boot a shim with
the NX compat bit set which should be able to boot that GRUB supporting 
NX which should be able to boot this series.

> 
>  From the error messages it's some problem with malloc() implementation
> of compressed kernel code. I suspect that malloc_ptr inside .bss is not
> zeroed. This should not happen when booting via either non-UEFI
> interface, or via UEFI (when kernel is properly loaded as PE image).
> The problem, I think, arises when kernel is loaded as a blob, but EFI
> handover protocol is used to start its execution. This is what grub
> seems to be doing.
> 
> Can you please try booting with patches below applied on top?
> If this fixes the problem, I'll include these changes in v3.

Yup, spot on.  I can the kernel from Peter's NX enabled GRUB now with:
* 6.1-rc4
* Your existing 23 patch series
* this new patch

Thanks!!

Would you mind CC me when you submit v3?  As I have an interest in 
seeing NX support I'd like to continue to follow along on the series.

Anything in the series you don't change in any material way from v2 
please feel free to include:

Tested-by: Mario Limonciello <mario.limonciello@amd.com>

> 
> Thanks,
> Evgeniy Baskov
> -- 
> diff --git a/arch/x86/boot/compressed/head_32.S 
> b/arch/x86/boot/compressed/head_32.S
> index 9871cc8466fb..69811d9ab4ce 100644
> --- a/arch/x86/boot/compressed/head_32.S
> +++ b/arch/x86/boot/compressed/head_32.S
> @@ -152,6 +152,14 @@ SYM_FUNC_END(startup_32)
> 
>   #ifdef CONFIG_EFI_STUB
>   SYM_FUNC_START(efi32_stub_entry)
> +    /* Clear BSS */
> +    xorl    %eax, %eax
> +    leal    _bss@GOTOFF(%ebx), %edi
> +    leal    _ebss@GOTOFF(%ebx), %ecx
> +    subl    %edi, %ecx
> +    shrl    $2, %ecx
> +    rep    stosl
> +
>       add    $0x4, %esp
>       movl    8(%esp), %esi    /* save boot_params pointer */
>       call    efi_main
> diff --git a/arch/x86/boot/compressed/head_64.S 
> b/arch/x86/boot/compressed/head_64.S
> index 2bb0e6da08c0..384706d12354 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -553,8 +553,20 @@ SYM_CODE_END(startup_64)
>   #ifdef CONFIG_EFI_STUB
>       .org 0x390
>   SYM_FUNC_START(efi64_stub_entry)
> +    /* Preserve first parameter */
> +    movq    %rdi, %r10
> +
> +    /* Clear BSS */
> +    xorl    %eax, %eax
> +    leaq    _bss(%rip), %rdi
> +    leaq    _ebss(%rip), %rcx
> +    subq    %rdi, %rcx
> +    shrq    $3, %rcx
> +    rep    stosq
> +
>       and    $~0xf, %rsp            /* realign the stack */
>       movq    %rdx, %rbx            /* save boot_params pointer */
> +    movq    %r10, %rdi
>       call    efi_main
> 
>       cld
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c 
> b/drivers/firmware/efi/libstub/x86-stub.c
> index 95a69c37518e..1a2c52e77c6e 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -24,7 +24,7 @@
> 
>   const efi_system_table_t *efi_system_table;
>   extern u32 image_offset;
> -static efi_loaded_image_t *image = NULL;
> +static efi_loaded_image_t *image __section(".data");
> 
>   extern char _head[], _ehead[];
>   extern char _compressed[], _ecompressed[];
Evgeniy Baskov Nov. 8, 2022, 11:49 p.m. UTC | #4
On 2022-11-08 21:17, Limonciello, Mario wrote:
> On 11/8/2022 01:01, Evgeniy Baskov wrote:
>> On 2022-11-04 21:21, Limonciello, Mario wrote:
>>> On 10/25/2022 09:12, Evgeniy Baskov wrote:
>>>> ...
>>>> 
>>> 
>>> Hi,
>>> 
>>> I was talking to Peter Jones recently about what was still missing 
>>> for
>>> NX support in the kernel and he pointed me at this series.
>>> 
>>> So I had a try with this series on top of:
>>> 
>>> ee6050c8af96 ("Merge tag 'ata-6.1-rc4' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata")
>>> 
>>> Unfortunately I can't boot the system with this series applied.
>>> This is not on a system that enforces NX pre-boot (but that was my
>>> goal after I could prove booting on something that doesn't).
>>> I didn't apply Peter's patch 6 you referenced in your cover letter,
>>> but I don't expect that's the reason for the failure.
>>> 
>>> I get:
>>> 
>>> "Failed to allocate space for tmp_cmdline"
>>> 
>>>    -- System Halted
>>> 
>>> This is early enough [1] that I don't have anything else output to a
>>> serial log from the kernel.
>>> 
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftorvalds%2Flinux%2Fblob%2Fd4013bc4d49f6da8178a340348369bb9920225c9%2Farch%2Fx86%2Fboot%2Fcompressed%2Fkaslr.c%23L268&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7C9280e92e85bc4e2b52cd08dac15704a5%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638034876740462327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=oiCJUa8M3x%2FYCxVjJj98R7iU%2FzIj%2FQxdVOWbnqGWCNI%3D&amp;reserved=0
>>> 
>>> Since this is only in the kaslr path, I tried to turn that off with
>>> 'nokaslr' on the kernel command line.
>>> 
>>> I then get a failure of:
>>> 
>>> "Out of memory while allocating zstd_dctx"
>>> 
>>>   -- System Halted
>>> 
>>> This kernel was booted from the following path:
>>> -> Insyde BIOS
>>> --> shim (from Fedora 36 repository)
>>> ---> GRUB (from Peter for Fedora 36 w/ some level NX support)
>>> ----> kernel binary (self-built)
>>> 
>>> The BIOS on this system doesn't validate NX, but also the shim binary
>>> did not have the NX bit set in the PE header.
>>> 
>>> Your cover letter referenced CONFIG_EFI_STUB_EXTRACT_DIRECT but I
>>> didn't find this option in the series.  I also tried both with
>>> CONFIG_EFI_DXE_MEM_ATTRIBUTES=y or unset, same result.
>> 
>> Hi,
>> 
>> Thanks for your feedback!
>> 
> 
> Sure!
> 
>> CONFIG_EFI_STUB_EXTRACT_DIRECT option was removed in v2 of the series
>> and direct extraction is unconditional now.
>> 
>> You are getting really weird errors, which unfortunately I am unable
>> to reproduce yet. I've tried booting with fedora's grub and the series
>> applied on top of ee6050c8af96, but it did boot successfully.
> 
> Well so I expect the unique difference is that I'm using Peter's GRUB
> that has some NX support landed.  He has binaries for it here:
> 
> https://blog.uncooperative.org/~pjones/nx/repo/
> 
> *Theoretically* a BIOS that enforces NX should be able to boot a shim 
> with
> the NX compat bit set which should be able to boot that GRUB
> supporting NX which should be able to boot this series.
> 
>> 
>>  From the error messages it's some problem with malloc() 
>> implementation
>> of compressed kernel code. I suspect that malloc_ptr inside .bss is 
>> not
>> zeroed. This should not happen when booting via either non-UEFI
>> interface, or via UEFI (when kernel is properly loaded as PE image).
>> The problem, I think, arises when kernel is loaded as a blob, but EFI
>> handover protocol is used to start its execution. This is what grub
>> seems to be doing.
>> 
>> Can you please try booting with patches below applied on top?
>> If this fixes the problem, I'll include these changes in v3.
> 
> Yup, spot on.  I can the kernel from Peter's NX enabled GRUB now with:
> * 6.1-rc4
> * Your existing 23 patch series
> * this new patch
> 



> Thanks!!
> 
> Would you mind CC me when you submit v3?  As I have an interest in
> seeing NX support I'd like to continue to follow along on the series.

Ok.
> 
> Anything in the series you don't change in any material way from v2
> please feel free to include:
> 
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> 

Great, thanks!
Lee, Chun-Yi Nov. 20, 2022, 1:49 a.m. UTC | #5
Hi Evgeniy,

Thanks for your effort!

On Tue, Oct 25, 2022 at 05:12:38PM +0300, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution 
>   of compressed kernel for EFISTUB code path. 
> 
> Kernel is made to be more compatible with PE image specification [3],
> allowing it to be successfully loaded by stricter PE loader
> implementations like the one from [2]. There is at least one
> known implementation that uses that loader in production [4].
> There are also ongoing efforts to upstream these changes.
> 
> Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
> EFI specification since version 2.10, as a better alternative to
> using DXE services for memory protection attributes manipulation,
> since it is defined by the UEFI specification itself and not UEFI PI
> specification. This protocol is not widely available so the code
> using DXE services is kept in place as a fallback in case specific
> implementation does not support the new protocol.
> One of EFI implementations that already support
> EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
>  

Because Peter Jones point out this patchset to me, so I tried it
on OVMF, and I set the EfiLoaderData in DXE memory protection policy:

Index: edk2/MdeModulePkg/MdeModulePkg.dec
===================================================================
--- edk2.orig/MdeModulePkg/MdeModulePkg.dec
+++ edk2/MdeModulePkg/MdeModulePkg.dec
@@ -1392,7 +1392,7 @@
   # e.g. 0x7BD4 can be used for all memory except Code and ACPINVS/Reserved. <BR>
   #
   # @Prompt Set DXE memory protection policy.
-  gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0x0000000|UINT64|0x00001048^M
+  gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0x0000004|UINT64|0x00001048^M
 
   ## PCI Serial Device Info. It is an array of Device, Function, and Power Management
   #  information that describes the path that contains zero or more PCI to PCI bridges


I applied this v2 patch set on top of v6.1-rc5 kernel, and boot with a shim which
set the PE NX-compatibility DLL Characteristic flag. I got a page fault exception:

Loading Linux 6.1.0-rc5-default+ ...
Loading initial ramdisk ...
!!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!
ExceptionData - 0000000000000011  I:1 R:0 U:0 W:0 P:1 PK:0 SS:0 SGX:0
RIP  - 0000000076A3C390, CS  - 0000000000000038, RFLAGS - 0000000000210202
RAX  - 000000007D8CCDF8, RCX - 0000000076A3C390, RDX - 000000007DE86000
RBX  - 0000000076A3C000, RSP - 000000007FF0D2C8, RBP - 000000007DE86000
RSI  - 000000007F9EE018, RDI - 000000007DFD1C18
R8   - 0000000076A3C000, R9  - 0000000000000190, R10 - 000000007FF1D658
R11  - 0000000000000004, R12 - 0000000000000190, R13 - 000000007D8CCE00
R14  - 000000007D8C76B4, R15 - 000000007BF0CBD5
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000076A3C390, CR3 - 000000007FC01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007F9DE000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007F2E9018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007FF0CF20
!!!! Find image based on IP(0x7BF0BAB5) /mnt/working/source_code-git/edk2/Build/OvmfX64/DEBUG_GCC5/X64/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableRuntimeDxe/DEBUG/VariableRuntimeDxe.dll (ImageBase=0000000000F40E7C, EntryPoint=0000000000F767B8) !!!!


My question is: Can I just set EfiLoaderData in DXE memory protection policy
in EDK2/OVMF to test this patchset? Or which platform (virtual or physical)
can we use for testing?

Thanks a lot!
Joey Lee