[v8,0/8] Add support for memory sealing

Message ID: 20250129172550.1119706-1-adhemerval.zanella@linaro.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: Jeff Xu <jeffxu@google.com>, Florian Weimer <fweimer@redhat.com>, "H . J . Lu" <hjl.tools@gmail.com>
Subject: [PATCH v8 0/8] Add support for memory sealing
Date: Wed, 29 Jan 2025 14:22:34 -0300

Adhemerval Zanella Jan. 29, 2025, 5:22 p.m. UTC

Linux 6.10 (commit 8be7258aad44) added the mseal syscall, which blocks
the following operations on a sealed VMA range:

 * Unmapping, moving to another location, or extending or shrinking
   the size, via munmap and mremap.
 * Moving or expanding a different VMA into the current range, via
   mremap.
 * Modifying the memory range with mmap along with flag MAP_FIXED.
 * Expanding the size with mremap.
 * Changing the protection flags with mprotect or pkey_mprotect.
 * Destructive behaviors on anonymous memory, such as madvise with
   MADV_DONTNEED.
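As a minimal sketch of the blocking behavior listed above (assuming a 64-bit Linux system; this is not part of the patchset), the helper below seals an anonymous page and checks that mprotect on it is rejected with EPERM. It issues the raw syscall because a libc wrapper may not be available; the fallback number 462 is an assumption that matches x86_64 and aarch64 but not every architecture, and try_seal_and_mprotect is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
# define __NR_mseal 462   /* Assumption: x86_64/aarch64 syscall number.  */
#endif

/* Hypothetical helper: map one page, seal it, then try to change its
   protection.  Returns 1 if the kernel rejected mprotect with EPERM,
   0 if mseal is unavailable (old kernel, 32-bit, or filtered), and
   -1 on any other outcome.  */
static int
try_seal_and_mprotect (void)
{
  long pagesize = sysconf (_SC_PAGESIZE);
  void *page = mmap (NULL, pagesize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED)
    return -1;

  /* mseal (addr, len, flags): flags must currently be 0.  */
  if (syscall (__NR_mseal, page, (size_t) pagesize, 0UL) != 0)
    {
      munmap (page, pagesize);   /* Not sealed, so it can still be unmapped.  */
      return 0;
    }

  /* The mapping is now sealed: changing its protection must fail.  */
  if (mprotect (page, pagesize, PROT_READ) == -1 && errno == EPERM)
    return 1;   /* The sealed page stays mapped until the process exits.  */
  return -1;
}
```

On a pre-6.10 kernel, on a 32-bit kernel, or under a seccomp policy that blocks mseal, the helper returns 0 instead of 1.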

Memory sealing is a hardening mechanism [1] that prevents the memory
segments set up by the dynamic loader from being remapped, or their
memory protection layout from being changed (for instance, undoing the
RELRO hardening).  OpenBSD supports a similar hardening with the
mimmutable syscall [2].

Memory sealing is an opt-in security feature that requires the new GNU
property GNU_PROPERTY_MEMORY_SEAL, defined in the Linux ABI [3] and
supported by binutils 2.44 [4].  A GNU property is preferable to a new
dynamic section tag (like the one proposed for DT_GNU_FLAGS_1) because
it can also be applied to ET_EXEC binaries (for instance, static
binaries).
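For illustration, a GNU property is an entry of the form (pr_type, pr_datasz, data) inside an NT_GNU_PROPERTY_TYPE_0 note with owner "GNU", stored in the PT_GNU_PROPERTY segment. The sketch below scans such a segment of a 64-bit object (8-byte property alignment) for the sealing marker. It is not glibc's parser: has_memory_seal_property is a hypothetical name, and the fallback value 3 for GNU_PROPERTY_MEMORY_SEAL is an assumption to be checked against the Linux ABI document [3].

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_GNU_PROPERTY_TYPE_0
# define NT_GNU_PROPERTY_TYPE_0 5
#endif
#ifndef GNU_PROPERTY_MEMORY_SEAL
# define GNU_PROPERTY_MEMORY_SEAL 3   /* Assumed value; check the Linux ABI.  */
#endif

static size_t
align_up (size_t x, size_t a)
{
  return (x + a - 1) & ~(a - 1);
}

/* Return 1 if the PT_GNU_PROPERTY segment SEG of LEN bytes (64-bit
   object, 8-byte aligned properties) carries GNU_PROPERTY_MEMORY_SEAL.  */
static int
has_memory_seal_property (const unsigned char *seg, size_t len)
{
  size_t off = 0;
  while (off + sizeof (Elf64_Nhdr) <= len)
    {
      Elf64_Nhdr nhdr;
      memcpy (&nhdr, seg + off, sizeof nhdr);
      size_t name_off = off + sizeof nhdr;
      size_t desc_off = name_off + align_up (nhdr.n_namesz, 4);
      if (desc_off + nhdr.n_descsz > len)
        return 0;                       /* Truncated note.  */

      if (nhdr.n_type == NT_GNU_PROPERTY_TYPE_0 && nhdr.n_namesz == 4
          && memcmp (seg + name_off, "GNU", 4) == 0)
        {
          /* Each property: u32 pr_type, u32 pr_datasz, data padded to 8.  */
          size_t p = 0;
          while (p + 8 <= nhdr.n_descsz)
            {
              uint32_t pr_type, pr_datasz;
              memcpy (&pr_type, seg + desc_off + p, 4);
              memcpy (&pr_datasz, seg + desc_off + p + 4, 4);
              if (pr_type == GNU_PROPERTY_MEMORY_SEAL)
                return 1;
              p += 8 + align_up (pr_datasz, 8);
            }
        }
      off = desc_off + align_up (nhdr.n_descsz, 8);
    }
  return 0;
}
```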

The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
supports the mseal syscall and how glibc is configured.  In the default
configuration, which supports older kernel releases, the memory sealing
attribute is taken as a hint.  If glibc is configured with a minimum
kernel version of 6.10, where mseal support can be assumed, sealing is
enforced.
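The enforcement rule above can be summarized as a small decision function. This is a hypothetical sketch of the described policy, not the glibc implementation; in particular, it assumes that in hint mode any mseal failure is ignored rather than only specific error codes.

```c
#include <stdbool.h>

/* Hypothetical outcome of an attempt to seal an object's segments.  */
enum seal_outcome { SEAL_APPLIED, SEAL_SKIPPED, SEAL_FATAL };

/* Sketch of the policy described above:
   - default build (older kernels supported): sealing is a hint, so a
     failed mseal (e.g. ENOSYS on a pre-6.10 kernel) is ignored;
   - build with a 6.10+ minimum kernel: mseal is assumed to exist, so a
     failure is treated as an error.  */
static enum seal_outcome
seal_policy (bool mseal_succeeded, bool min_kernel_has_mseal)
{
  if (mseal_succeeded)
    return SEAL_APPLIED;
  return min_kernel_has_mseal ? SEAL_FATAL : SEAL_SKIPPED;
}
```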

The first patch adds mseal support for Linux.  Although most programs
will not use it directly, some specific ones, like Chrome, intend to
use it.

The second and third patches are prerequisites for memory sealing on
executables: they add GNU property parsing for static binaries and for
the loader.

The fourth patch moves 'call_init_paths' after GNU property parsing,
so the loader can seal the rtld_malloc pages (since they are meant to
be immutable during process execution).

The fifth patch propagates the RTLD_NODELETE flag to the dependencies
of an object loaded with dlopen.  It is used to extend memory sealing
to those dependencies.

The sixth patch adds memory sealing support in multiple places where
the pages are supposed to be immutable during program execution:

 * All shared library dependencies of the binary, including the
   read-only segments after PT_GNU_RELRO setup.
 * The binary itself, whether dynamically or statically linked.  In
   both cases, it is up to either the binary or the loader to set up
   the sealing.
 * Any preloaded libraries.
 * Any library loaded with dlopen with the RTLD_NODELETE flag (including
   libgcc_s.so, which is loaded to support process unwinding and thread
   cancellation).
 * Audit modules.
 * The loader bump allocator.
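To make "sealing the binary itself" concrete: mseal works on whole pages, so each PT_LOAD segment has to be rounded out to page boundaries before sealing. The sketch below computes those ranges for the main program with dl_iterate_phdr, which reports the main program first. It only counts the ranges instead of sealing them; segment_page_range and count_main_loads are hypothetical names, and this is not how the loader itself is structured.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct seal_range { uintptr_t start, end; };

/* Round a PT_LOAD segment out to whole pages, since mseal operates on
   page granularity.  */
static struct seal_range
segment_page_range (uintptr_t load_bias, uintptr_t p_vaddr, size_t p_memsz,
                    size_t pagesize)
{
  uintptr_t mask = ~(uintptr_t) (pagesize - 1);
  uintptr_t lo = load_bias + p_vaddr;
  uintptr_t hi = lo + p_memsz;
  struct seal_range r = { lo & mask, (hi + pagesize - 1) & mask };
  return r;
}

/* dl_iterate_phdr callback: count the non-empty page ranges of the
   PT_LOAD segments of the first object reported (the main program).  */
static int
count_main_loads (struct dl_phdr_info *info, size_t size, void *data)
{
  (void) size;
  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
  int *count = data;
  for (int i = 0; i < info->dlpi_phnum; i++)
    {
      const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
      if (ph->p_type != PT_LOAD)
        continue;
      struct seal_range r = segment_page_range (info->dlpi_addr, ph->p_vaddr,
                                                ph->p_memsz, pagesize);
      /* A loader would call mseal (r.start, r.end - r.start, 0) here.  */
      if (r.start < r.end)
        ++*count;
    }
  return 1;   /* Nonzero stops the iteration after the main program.  */
}
```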

The seventh patch makes glibc enable memory sealing by default if the
linker supports the option (-Wl,-z,memory-seal).  A new configure
option, --disable-default-memory-seal, disables it.

The last patch adds memory sealing tests, which are enabled if the
linker supports the option.

This patchset does not delay RELRO activation until after the ELF
constructors have been executed, as suggested in the previous RFC for
mseal support.  It is not strictly required, and it would require
extensive changes in _dl_start_user to either make _dl_init call the
RELRO/sealing setup after the ctor/initarray execution is done, or to
call it after _dl_init.  There is also the question of whether to
apply RELRO/sealing per module after its ctor/initarray, or in bulk
after _dl_init.

I tested on both x86_64-linux-gnu and aarch64-linux-gnu with Linux
6.13, along with some testing on powerpc64le-linux-gnu and
s390x-linux-gnu VMs.

[1] https://blog.trailofbits.com/2024/10/25/a-deep-dive-into-linuxs-new-mseal-syscall/
[2] https://man.openbsd.org/mimmutable.2
[3] https://gitlab.com/x86-psABIs/Linux-ABI/-/commit/25a851b99665e7b22db5fabe818efaaa52466893
[4] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4d890484df4b2cf004f6f1f6d8c39a69fa39c875

Changes v4->v5:
* Removed the tunable.
* Rebased against GCS change to enable GNU attribute parsing on all architectures.

Changes v3->v4:
* Rebase against master (remove nios2 ABI update and handle f2326c2ec0a0a8db
  changes).
* Handle vvar_vclock mapping on tests.
 
Changes v2->v3:
* Make the option opt-in instead of opt-out.

Adhemerval Zanella (8):
  linux: Add mseal syscall support
  elf: Parse gnu properties for static linked binaries
  elf: Parse gnu properties for the loader
  rtld: Move call_init_paths after _dl_process_pt_gnu_property
  elf: Use RTLD_NODELETE for dependencies
  elf: Add support to memory sealing
  Enable memory sealing automatically
  linux: Add memory sealing tests

 INSTALL                                       |   5 +
 Makeconfig                                    |  19 +-
 Makerules                                     |   2 +
 NEWS                                          |  14 +-
 configure                                     |  58 ++++
 configure.ac                                  |  20 ++
 csu/libc-start.c                              |   4 +
 elf/Makefile                                  |  19 +-
 elf/dl-load.c                                 |   4 +
 elf/dl-map-segments.h                         |  16 +-
 elf/dl-minimal-malloc.c                       |   5 +
 elf/dl-open.c                                 |   7 +-
 elf/dl-reloc.c                                |  49 ++++
 elf/dl-support.c                              |  16 +
 elf/elf.h                                     |   2 +
 elf/rtld.c                                    |  26 +-
 elf/setup-vdso.h                              |   2 +
 include/link.h                                |   8 +
 manual/install.texi                           |   5 +
 manual/memory.texi                            |  66 +++++
 sysdeps/aarch64/dl-prop.h                     |   5 +
 sysdeps/generic/dl-mseal.h                    |  22 ++
 sysdeps/generic/dl-prop-mseal.h               |  34 +++
 sysdeps/generic/dl-prop.h                     |   5 +
 sysdeps/generic/ldsodefs.h                    |   9 +
 sysdeps/generic/libc-prop.h                   |  44 +++
 sysdeps/unix/sysv/linux/Makefile              | 102 +++++++
 sysdeps/unix/sysv/linux/Versions              |   3 +
 sysdeps/unix/sysv/linux/aarch64/libc-start.h  |  11 -
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/bits/mman-shared.h    |   8 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/dl-mseal.c            |  51 ++++
 sysdeps/unix/sysv/linux/dl-mseal.h            |  31 ++
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/kernel-features.h     |   8 +
 .../sysv/linux/loongarch/lp64/libc.abilist    |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   1 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/syscalls.list         |   1 +
 .../sysv/linux/tst-dl_mseal-auditmod-noseal.c |   1 +
 .../unix/sysv/linux/tst-dl_mseal-auditmod.c   |  23 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-1-1.c |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-1.c   |  19 ++
 .../linux/tst-dl_mseal-dlopen-2-1-noseal.c    |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-2-1.c |  19 ++
 .../sysv/linux/tst-dl_mseal-dlopen-2-noseal.c |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-dlopen-2.c   |  19 ++
 .../sysv/linux/tst-dl_mseal-mod-1-noseal.c    |  19 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-1.c  |  19 ++
 .../sysv/linux/tst-dl_mseal-mod-2-noseal.c    |  19 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-2.c  |  19 ++
 sysdeps/unix/sysv/linux/tst-dl_mseal-noseal.c |  80 +++++
 .../sysv/linux/tst-dl_mseal-preload-noseal.c  |   1 +
 .../unix/sysv/linux/tst-dl_mseal-preload.c    |  19 ++
 .../unix/sysv/linux/tst-dl_mseal-skeleton.c   | 276 ++++++++++++++++++
 .../sysv/linux/tst-dl_mseal-static-noseal.c   |  45 +++
 sysdeps/unix/sysv/linux/tst-dl_mseal-static.c |  42 +++
 sysdeps/unix/sysv/linux/tst-dl_mseal.c        |  78 +++++
 sysdeps/unix/sysv/linux/tst-mseal.c           |  67 +++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 sysdeps/x86/dl-prop.h                         |   8 +-
 86 files changed, 1514 insertions(+), 28 deletions(-)
 create mode 100644 sysdeps/generic/dl-mseal.h
 create mode 100644 sysdeps/generic/dl-prop-mseal.h
 create mode 100644 sysdeps/generic/libc-prop.h
 create mode 100644 sysdeps/unix/sysv/linux/dl-mseal.c
 create mode 100644 sysdeps/unix/sysv/linux/dl-mseal.h
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-auditmod-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-auditmod.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-1-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2-1-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-dlopen-2.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-1-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-1.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-2-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-mod-2.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-preload-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-preload.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-skeleton.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-static-noseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal-static.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-dl_mseal.c
 create mode 100644 sysdeps/unix/sysv/linux/tst-mseal.c

Florian Weimer Feb. 3, 2025, 6:03 p.m. UTC | #1

* Adhemerval Zanella:

> The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
> blocking some memory operations on the VMA range:
>
>  * Unmapping, moving to another location, extending or shrinking the
>    size, munmap, and mremap.
>  * Moving or expanding a different VMA into the current range, via
>    mremap.
>  * Modifying the memory range with mmap along with flag MAP_FIXED.
>  * Expanding the size with mremap.
>  * Change the protection flags with mprotect or pkey_mprotect.
>  * Destructive behaviors on anonymous memory, such as madvice with
>    MADV_DONTNEED.
>
> Memory sealing is a hardening mechanism [1] to avoid either remapping
> the memory segments or changing the memory protection segments layout
> by the dynamic loader (for instance, the RELRO hardening). The OpenBSD
> supports a similar hardening with the mimmutable syscall [2].

Have you checked that CRIU restore still works after these changes?  I
don't see how it can because the initial mappings can no longer be
unmapped.

I don't think the design is quite right.  The property flag (why not a
flag in a dynamic tag? these things are very complicated) should allow
flagging a program or initially loaded shared object, like CRIU or
ltrace, as incompatible with mseal.  If mseal is active for the process,
the mseal range should be controlled by a program header similar to
PT_GNU_RELRO, and not indiscriminately applied to the entire object.

I don't think we should apply this blindly to any NODELETE mapping.

Thanks,
Florian

Adhemerval Zanella Feb. 3, 2025, 6:35 p.m. UTC | #2

On 03/02/25 15:03, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> The Linux 6.10 (8be7258aad44) added the mseal syscall that allows
>> blocking some memory operations on the VMA range:
>>
>>  * Unmapping, moving to another location, extending or shrinking the
>>    size, munmap, and mremap.
>>  * Moving or expanding a different VMA into the current range, via
>>    mremap.
>>  * Modifying the memory range with mmap along with flag MAP_FIXED.
>>  * Expanding the size with mremap.
>>  * Change the protection flags with mprotect or pkey_mprotect.
>>  * Destructive behaviors on anonymous memory, such as madvice with
>>    MADV_DONTNEED.
>>
>> Memory sealing is a hardening mechanism [1] to avoid either remapping
>> the memory segments or changing the memory protection segments layout
>> by the dynamic loader (for instance, the RELRO hardening). The OpenBSD
>> supports a similar hardening with the mimmutable syscall [2].
> 
> Have you checked that CRIU restore still works after these changes?  I
> don't see how it can because the initial mappings can no longer be
> unmapped.

No, I will take a look, but I am not sure this applies since it is an
opt-in feature.  CRIU adds some extra semantic requirements that are
hard to take into consideration, similar to the defunct prelink support.

But from the CRIU discussion about a sealing extension for kernel
mappings (such as the vDSO), I got the impression that memory sealing
is not a problem for userland (I guess the restore either just ignores
sealing or applies it itself).

> 
> I don't think the design is quite right.  The property flag (why not a
> flag in a dynamic tag? these things are very complicated) should allow
> flagging a program or initiallly loaded shared object, like CRIU or
> ltrace, as incompatible with mseal.  If mseal is active for the process,
> the mseal range should be controlled by a program header similar to
> PT_GNU_RELRO, and not indiscriminately applied to the entire object.

A dynamic tag would add support only for ET_DYN, whereas a GNU attribute
works for ET_EXEC as well (similar to recent hardening mechanisms like
BTI, GCS, etc.).

And the range approach adds extra complexity for little gain that I can
see, since the idea is that PT_LOAD segments are *immutable* over process
execution, not selectively sealed (unlike PT_GNU_RELRO, where only part
of the memory range should be read-only).

A range-based approach would still allow a misconfigured binary to be
partially sealed, which defeats the whole idea of the hardening (similar
to partial RELRO).

> 
> I don't think we should apply this blindly to any NODELETE mapping.

It is not; it is controlled by the presence of the attribute.  NODELETE
just allows a shared library with the sealing attribute to be sealed
when it is loaded with dlopen.

> 
> Thanks,
> Florian
>

Florian Weimer Feb. 3, 2025, 7:05 p.m. UTC | #3

* Adhemerval Zanella Netto:

> On 03/02/25 15:03, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>> Have you checked that CRIU restore still works after these changes?  I
>> don't see how it can because the initial mappings can no longer be
>> unmapped.
>
> No, I will take a look but I am not if this applies since it is a opt-in
> feature.  The CRIU adds some extra semantic requirements that is hard
> to take in consideration, similar to the defunct prelink support.
>
> But from CRIU discussion about a sealing extension to kernel mappings
> (such as vDSO) I got the impression that memory sealing is not a problem
> for userland (I guess that restoring just either ignore sealing or
> apply itself).

CRIU needs to be able to unmap everything that was initially loaded by
the kernel and glibc.  This will stop working if we use mseal for glibc
itself.

> A dynamic tag would add support only for ET_DYN, where a GNU attribute
> allows for ET_EXEC as well (similar to recent hardening mechanism like
> BTI, GCS, etc.).

We don't need the compatibility indicator on static executables because
incompatible static binaries won't have the PT_* header that requests
sealing.

> An the range approach is an extra complexity that I don't really see
> much gain, since the idea of PT_LOAD are to being *immutable* over process
> execution and not selective applied (different than PT_GNU_RELRO where
> only part of memory range should be RO).
>
> A range-based approach will still allow a misconfigured binary to be 
> partially sealed, which defeats the whole idea of the hardening (similar
> to partial relro).

But partial sealing could be a feature, too.  For example, in libgcc_s,
we might want to keep the root of the dynamic unwinding data structure
in read-only memory most of the time, as a post-exploitation
countermeasure that makes it harder to run arbitrary DWARF programs.

Thanks,
Florian

Adhemerval Zanella Feb. 3, 2025, 7:25 p.m. UTC | #4

On 03/02/25 16:05, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> On 03/02/25 15:03, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>> Have you checked that CRIU restore still works after these changes?  I
>>> don't see how it can because the initial mappings can no longer be
>>> unmapped.
>>
>> No, I will take a look but I am not if this applies since it is a opt-in
>> feature.  The CRIU adds some extra semantic requirements that is hard
>> to take in consideration, similar to the defunct prelink support.
>>
>> But from CRIU discussion about a sealing extension to kernel mappings
>> (such as vDSO) I got the impression that memory sealing is not a problem
>> for userland (I guess that restoring just either ignore sealing or
>> apply itself).
> 
> CRIU needs to be able to unmap everything that was initially loaded by
> the kernel and glibc.  This will stop working if we use mseal for glibc
> itself.

So in this case the easiest way is to filter out mseal (with seccomp or
something similar) and disable sealing.  I don't have an easy solution.
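A minimal sketch of the seccomp workaround mentioned above, assuming x86_64 or aarch64 and the default hint-mode build described in the cover letter (a build that assumes a 6.10 kernel enforces sealing and may not tolerate the failure). The filter makes mseal fail with ENOSYS, as on an older kernel. Filters are inherited across fork and execve, so a launcher would have to install it before the target program starts; the helper names and the fallback syscall number 462 are assumptions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
# define __NR_mseal 462   /* Assumption: x86_64/aarch64 syscall number.  */
#endif

#if defined __x86_64__
# define FILTER_ARCH AUDIT_ARCH_X86_64   /* x32 syscall numbers not handled.  */
#elif defined __aarch64__
# define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
# error "this sketch only covers x86_64 and aarch64"
#endif

/* Install a seccomp filter that makes mseal fail with ENOSYS.  */
static int
install_mseal_enosys_filter (void)
{
  struct sock_filter filter[] = {
    /* Let calls from an unexpected ABI through untouched.  */
    BPF_STMT (BPF_LD | BPF_W | BPF_ABS, offsetof (struct seccomp_data, arch)),
    BPF_JUMP (BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /* mseal -> ENOSYS, everything else -> allow.  */
    BPF_STMT (BPF_LD | BPF_W | BPF_ABS, offsetof (struct seccomp_data, nr)),
    BPF_JUMP (BPF_JMP | BPF_JEQ | BPF_K, __NR_mseal, 0, 1),
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
    .len = (unsigned short) (sizeof filter / sizeof filter[0]),
    .filter = filter,
  };

  /* Unprivileged processes must set no_new_privs before installing a filter.  */
  if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
    return -1;
  return prctl (PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 0 if mseal (NULL, 0, 0) succeeds, otherwise the errno value.  */
static int
mseal_probe (void)
{
  return syscall (__NR_mseal, (void *) 0, (size_t) 0, 0UL) == 0 ? 0 : errno;
}
```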

> 
>> A dynamic tag would add support only for ET_DYN, where a GNU attribute
>> allows for ET_EXEC as well (similar to recent hardening mechanism like
>> BTI, GCS, etc.).
> 
> We don't need the compatibility indicator on static executables because
> incompatible static binaries won't have the PT_* header that requests
> sealing.

But we will still need to mark ET_EXEC binaries as opting in to sealing.
A dynamic tag would only be available for ET_DYN.

> 
>> An the range approach is an extra complexity that I don't really see
>> much gain, since the idea of PT_LOAD are to being *immutable* over process
>> execution and not selective applied (different than PT_GNU_RELRO where
>> only part of memory range should be RO).
>>
>> A range-based approach will still allow a misconfigured binary to be 
>> partially sealed, which defeats the whole idea of the hardening (similar
>> to partial relro).
> 
> But partial sealing could be a feature, too.  For example, in libgcc_s,
> we might want to keep the root of the dynamic unwinding data structure
> in read-only memory most of the time, as a post-exploitation
> countermeasure that makes it harder to run arbitrary DWARF programs.

Right, but I feel that this is out of scope for the currently proposed
approach, where the idea is only to mark the binary so that its loaded
ELF segments, which are already immutable and easy to locate, get sealed.

Is this allocated dynamically during process execution or defined at
build time?  Can't you put it in the RELRO segment, with a constructor to
do any required initialization?

Florian Weimer Feb. 3, 2025, 7:40 p.m. UTC | #5

* Adhemerval Zanella Netto:

>> CRIU needs to be able to unmap everything that was initially loaded by
>> the kernel and glibc.  This will stop working if we use mseal for glibc
>> itself.
>
> So in this case the easiest way it to filter of mseal (with seccomp or
> something related) and disable sealing.  I don't have a easy solution.

Please test with CRIU and ltrace and find a way to make them work again
if they are broken.

>>> A dynamic tag would add support only for ET_DYN, where a GNU attribute
>>> allows for ET_EXEC as well (similar to recent hardening mechanism like
>>> BTI, GCS, etc.).
>> 
>> We don't need the compatibility indicator on static executables because
>> incompatible static binaries won't have the PT_* header that requests
>> sealing.
>
> But we will still need to mark the ET_EXEC as opt-in for sealing.  A dynamic
> tag will be only enabled for ET_DYN.

If the PT_* address ranges are present, the binary opts in to sealing.

> Is this allocated dynamic during process execution of defined during build?
> Can't you put this on relro segment, with a constructor to do any required
> initialization?

Constructors currently run after RELRO has been applied.

Thanks,
Florian

Adhemerval Zanella Feb. 3, 2025, 8:04 p.m. UTC | #6

On 03/02/25 16:40, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>>> CRIU needs to be able to unmap everything that was initially loaded by
>>> the kernel and glibc.  This will stop working if we use mseal for glibc
>>> itself.
>>
>> So in this case the easiest way it to filter of mseal (with seccomp or
>> something related) and disable sealing.  I don't have a easy solution.
> 
> Please test with CRIU and trace and find a way to make them work again
> if they are broken.

I don't see a way to make CRIU work, since what it needs runs against the
immutability that memory sealing provides.  But I don't have much
knowledge of how CRIU works internally; maybe someone more familiar with
the project could help us out here.

> 
>>>> A dynamic tag would add support only for ET_DYN, where a GNU attribute
>>>> allows for ET_EXEC as well (similar to recent hardening mechanism like
>>>> BTI, GCS, etc.).
>>>
>>> We don't need the compatibility indicator on static executables because
>>> incompatible static binaries won't have the PT_* header that requests
>>> sealing.
>>
>> But we will still need to mark the ET_EXEC as opt-in for sealing.  A dynamic
>> tag will be only enabled for ET_DYN.
> 
> If the PT_* address ranges are present, the binary opts in to sealing.

But you still need an opt-in mark for memory sealing; that's the GNU
attribute marking.  A dynamic tag would not be present on ET_EXEC
(although with PIE becoming the norm, such binaries are less common), so
we still need something else to mark the binary as opting in to memory
sealing.

> 
>> Is this allocated dynamic during process execution of defined during build?
>> Can't you put this on relro segment, with a constructor to do any required
>> initialization?
> 
> Constructors currently run after RELRO has been applied.

Right, so is this some metadata in .data that is initialized after or
during process execution?  If so, I think for this case it would be
better to have a special section, like ".gnu.mseal", where the idea of
adding a sealing range makes sense.  But again, I think this is a
different proposal and not really a blocker for this hardening.

Cristian Rodríguez Feb. 4, 2025, 2:11 a.m. UTC | #7

On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Adhemerval Zanella Netto:
>
> >> CRIU needs to be able to unmap everything that was initially loaded by
> >> the kernel and glibc.  This will stop working if we use mseal for glibc
> >> itself.
> >
> > So in this case the easiest way it to filter of mseal (with seccomp or
> > something related) and disable sealing.  I don't have a easy solution.
>
> Please test with CRIU and trace and find a way to make them work again
> if they are broken.

That is a kernel problem, AFAIK.  Why does libc have to care about this limitation?

Andrei Vagin Feb. 6, 2025, 9:15 a.m. UTC | #8

On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Adhemerval Zanella Netto:
> >
> > >> CRIU needs to be able to unmap everything that was initially loaded by
> > >> the kernel and glibc.  This will stop working if we use mseal for glibc
> > >> itself.
> > >
> > > So in this case the easiest way it to filter of mseal (with seccomp or
> > > something related) and disable sealing.  I don't have a easy solution.
> >
> > Please test with CRIU and trace and find a way to make them work again
> > if they are broken.
> 
> that is a kernel problem afaik..

Could you please provide more details on why you think that is the
kernel issue?

btw: this reminds me of another discussion about mseal on lkml:
https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/

> .why libc has to care about this limitation ?

CRIU has worked with glibc for many years... It's not just about CRIU;
other projects, such as gVisor and UML, are also likely to be affected.

Thanks,
Andrei

Adhemerval Zanella Feb. 6, 2025, 2:16 p.m. UTC | #9

On 03/02/25 17:04, Adhemerval Zanella Netto wrote:
> 
> 
> On 03/02/25 16:40, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>>
>>>> CRIU needs to be able to unmap everything that was initially loaded by
>>>> the kernel and glibc.  This will stop working if we use mseal for glibc
>>>> itself.
>>>
>>> So in this case the easiest way it to filter of mseal (with seccomp or
>>> something related) and disable sealing.  I don't have a easy solution.
>>
>> Please test with CRIU and trace and find a way to make them work again
>> if they are broken.
> 
> I don't see a way to make CRIU work where it is orthogonal of the idea of
> immutability of memory sealing.  But I don't have much knowledge of how CRIU
> work internally, maybe someone more well versed in this project could help us
> out here.
> 
>>
>>>>> A dynamic tag would add support only for ET_DYN, where a GNU attribute
>>>>> allows for ET_EXEC as well (similar to recent hardening mechanism like
>>>>> BTI, GCS, etc.).
>>>>
>>>> We don't need the compatibility indicator on static executables because
>>>> incompatible static binaries won't have the PT_* header that requests
>>>> sealing.
>>>
>>> But we will still need to mark the ET_EXEC as opt-in for sealing.  A dynamic
>>> tag will be only enabled for ET_DYN.
>>
>> If the PT_* address ranges are present, the binary opts in to sealing.
> 
> But you still need a opt-in mark to be memory sealed, that's the gnu attribute
> marking.  A dynamic tag would not be present on ET_EXEC (although with PIE
> being more common it might not be that common), so we still need something
> else to mark the binary as opt-in to memory sealing.
> 
>>
>>> Is this allocated dynamic during process execution of defined during build?
>>> Can't you put this on relro segment, with a constructor to do any required
>>> initialization?
>>
>> Constructors currently run after RELRO has been applied.
> 
> Right, so is this some metadata in .data that is initialized after or during
> process execution?  If so, I think for this case it would be better to have
> something like a special section, like ".gnu.mseal"; where the idea of
> adding sealing range makes sense.  But again, I think this is different
> proposal and not really a blocker for this hardening.
> 

OK, I thought more about it and I am still not sure a range-based
approach is the ideal solution for this.  The whole idea of memory
sealing is the assumption that all PT_LOAD segments of the binary should
be *immutable* after process startup, meaning that if the program
intends to change anything, it should be explicitly marked as such.

OpenBSD's memory sealing support was added as a default security
hardening (no opt-out, unlike this proposal), and later they found the
need to add PT_OPENBSD_MUTABLE to handle some malloc metadata, similar to
the problem you raise.

I think we could instead add a PT_GNU_SEAL with the expected range for sealing,
so that in the future a binary can adjust the range if required.  It is somewhat
more complex, since we would need some extra constraints: the VirtAddr should
be page-aligned, there should be no gaps, and the end should be in an allocated
VMA.  There is also the extra complication that we would need a different strategy
for non-contiguous mappings (as done by the kernel in some cases), which adds even
more complexity.
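The constraints listed above could be checked roughly as in the following C sketch. PT_GNU_SEAL is only a proposal, so the `struct range` type and the `seal_range_valid` function are hypothetical. The mapped ranges are assumed to be sorted by address and non-overlapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end).  */
struct range
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical check for a proposed PT_GNU_SEAL entry at VADDR with
   length LEN: VADDR must be page-aligned, and every page up to the
   page-rounded end must be covered by MAPS (sorted, non-overlapping
   mapped ranges) without any gap.  */
static bool
seal_range_valid (uintptr_t vaddr, size_t len, uintptr_t page,
                  const struct range *maps, size_t n)
{
  if (len == 0 || vaddr % page != 0)
    return false;
  uintptr_t end = vaddr + (len + page - 1) / page * page;
  uintptr_t cur = vaddr;  /* lowest address not yet known to be mapped */

  for (size_t i = 0; i < n && cur < end; i++)
    {
      if (maps[i].end <= cur)
        continue;         /* this mapping lies entirely below CUR */
      if (maps[i].start > cur)
        return false;     /* gap: nothing is mapped at CUR */
      cur = maps[i].end;
    }
  return cur >= end;      /* false if the range runs past the last VMA */
}
```

This only covers the single-range case. Non-contiguous mappings would need a list of ranges rather than one entry.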

Another solution, once we do require a way to mark memory as mutable (so the
loader excludes it during sealing), is to add a PT_GNU_NOSEAL, similar to
what OpenBSD does.  It makes more sense to me because if a binary does require
it, it knows exactly which range needs to be mutable, by placing it in a different
section (.gnu.mutable or something like it).
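With such an exclusion marker, the loader would seal each range minus its mutable part. The following is a minimal C sketch. It assumes one page-aligned mutable range per sealed range, and PT_GNU_NOSEAL, `struct range`, and `exclude_mutable` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end).  */
struct range
{
  uintptr_t start;
  uintptr_t end;
};

/* Remove the mutable range MUT (e.g. from a hypothetical PT_GNU_NOSEAL
   entry) from the range SEAL that the loader would otherwise seal.
   Both ranges are assumed to be page-aligned.  Stores up to two ranges
   that still need sealing in OUT and returns how many were stored.  */
static size_t
exclude_mutable (struct range seal, struct range mut, struct range out[2])
{
  size_t n = 0;

  if (mut.end <= seal.start || mut.start >= seal.end)
    {
      out[n++] = seal;  /* no overlap: seal the whole range */
      return n;
    }
  if (seal.start < mut.start)
    out[n++] = (struct range) { seal.start, mut.start };  /* part below */
  if (mut.end < seal.end)
    out[n++] = (struct range) { mut.end, seal.end };      /* part above */
  return n;
}
```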

Adhemerval Zanella Feb. 6, 2025, 2:25 p.m. UTC | #10

On 06/02/25 06:15, Andrei Vagin wrote:
> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>>>
>>> * Adhemerval Zanella Netto:
>>>
>>>>> CRIU needs to be able to unmap everything that was initially loaded by
>>>>> the kernel and glibc.  This will stop working if we use mseal for glibc
>>>>> itself.
>>>>
>>>> So in this case the easiest way is to filter out mseal (with seccomp or
>>>> something related) and disable sealing.  I don't have an easy solution.
>>>
>>> Please test with CRIU and trace and find a way to make them work again
>>> if they are broken.
>>
>> That is a kernel problem, AFAIK.
> 
> Could you please provide more details on why you think that is the
> kernel issue?
> 
> btw: this reminds me of another discussion about mseal on lkml:
> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
> 
>> Why does libc have to care about this limitation?
> 
> CRIU has worked with glibc for many years... It's not just about CRIU;
> other projects, such as gVisor and UML, are also likely to be affected.

The current proposal is an opt-in feature, but without a way to disable it
(similar to how RELRO is enabled).

I don't have much experience with how CRIU or gVisor work internally, but if
either needs to change any metadata (munmap, mprotect) of the PT_LOAD ELF
segments after startup, this basically defeats the whole idea of the memory
sealing hardening.

I don't see a way to support both semantics without some extra kernel support,
where either you can mark a process with extra credentials to do the
required VMA operations (like process_madvise, etc.) or disable sealing during
the snapshot.

The mseal usage was primarily intended for program loaders, similar to
mimmutable on OpenBSD; but it seems that some programs also intend to
use the syscall directly for internal hardening (like Chrome).  How would
CRIU/gVisor handle such scenarios?
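For programs that call the syscall directly, the effect on a tool like CRIU can be seen with a short C sketch that seals one anonymous page and then tries to unmap it. It goes through `syscall(2)` rather than a libc wrapper. The fallback number 462 for `SYS_mseal` is an assumption that holds for x86_64 and architectures using the generic syscall table, and the helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
# define SYS_mseal 462  /* assumption: x86_64 / generic syscall table */
#endif

/* Map one anonymous page, seal it, then try to unmap it.
   Returns the errno from munmap (EPERM is expected on a sealed VMA),
   0 if munmap unexpectedly succeeded, or -1 if mseal is unavailable
   (e.g. ENOSYS before Linux 6.10) or another step failed.  */
static int
try_unmap_sealed_page (void)
{
  long page = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  if (syscall (SYS_mseal, p, (size_t) page, 0UL) != 0)
    {
      munmap (p, page);  /* not sealed, so cleanup still works */
      return -1;
    }
  if (munmap (p, page) == 0)
    return 0;
  return errno;  /* the sealed page stays mapped until process exit */
}
```

On a 6.10 or newer kernel the expected result is EPERM. Nothing in the process, including a restorer running inside it, can remove that mapping afterwards.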

Aleksandr Mikhalitsyn Feb. 6, 2025, 6:03 p.m. UTC | #11

Dear friends,

I've quickly read a patchset [PATCH v8 0/8] Add support for memory
sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
and noticed that on
https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
it's said:
>The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
>supports the mseal syscall and how glibc is configured.  On the default
>configuration that aims to support older kernel releases, the memory
>sealing attribute is taken as a hint. If glibc is configured with a
>minimum kernel of 6.10, where mseal is implied to be supported,
>sealing is enforced.

=> If I understand it right, this makes memory sealing enabled by
default if the kernel supports it, even without a linker flag, right?

I don't really understand what "glibc is configured with a minimum
kernel of 6.10" means from the user perspective.
I'm not very familiar with glibc internals, so can somebody shed some
light on this, please?

I don't see how this could break the CRIU dump for us (I believe it
shouldn't, but it's still worth checking), but for CRIU restore it's
definitely a problem, and it reminds me of the rseq() & CRIU story we
had a few years ago.  My current understanding is:

*during CRIU restore*
0. somehow disable mseal for the CRIU binary itself, to make sure that
when CRIU does clone() we don't get any mappings sealed
1. restore all memory mappings of the restorable process without
mseal() applied to them
2. at a later CRIU restore stage, go over them and apply mseal()

I have a bad feeling that I'm still missing something, but even step 0 is a
problem right now if we go with the current approach from this
patch series, isn't it?

Kind regards,
Alex

Adhemerval Zanella Feb. 6, 2025, 7:47 p.m. UTC | #12

On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
> Dear friends,
> 
> I've quickly read a patchset [PATCH v8 0/8] Add support for memory
> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
> and noticed that on
> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
> it's said:
>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
>> supports the mseal syscall and how glibc is configured.  On the default
>> configuration that aims to support older kernel releases, the memory
>> sealing attribute is taken as a hint. If glibc is configured with a
>> minimum kernel of 6.10, where mseal is implied to be supported,
>> sealing is enforced.
> 
> => If I understand it right, this makes memory sealing enabled by
> default if the kernel supports it, even without a linker flag, right?
> 
> I don't really understand what "glibc is configured with a minimum
> kernel of 6.10" means from the user perspective.
> I'm not very familiar with glibc internals, so can somebody shed some
> light on this, please?

glibc has a minimum supported kernel version of 3.2, but some
architectures override it (either because the ABI was added in newer
versions, or for some other reason).

We also have an option to build glibc assuming it will always run on
at least a specific kernel version (--enable-kernel=x.y.z).  In
previous releases this was enforced by checking the kernel version at load
time, but currently glibc only uses it to assume that certain syscalls are
always present (so there is no need for fallbacks or ENOSYS handling).

So if you build glibc with --enable-kernel=6.10, mseal is expected
to always be usable, ENOSYS is not possible, and thus any
syscall failure is treated as an error (assuming that we are passing
valid arguments).

If --enable-kernel is not used, glibc may run on a kernel
without mseal, in which case memory sealing cannot be applied (we could still
enforce it, but since there is already a way to enforce it with
--enable-kernel, I don't think there is an urgent need).

In any case, memory sealing will only be applied in the presence
of GNU_PROPERTY_MEMORY_SEAL.
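The resulting behavior can be summarized as a small decision function. This C sketch is a hypothetical model of the description above, not glibc's loader code. `assume_mseal` stands for a build with `--enable-kernel=6.10` or later. Treating every failure as ignorable in the default build is an assumption based on the "hint" wording of the cover letter.

```c
#include <stdbool.h>

enum seal_action
{
  SEAL_APPLIED,  /* mseal succeeded */
  SEAL_SKIPPED,  /* object left unsealed */
  SEAL_FATAL     /* loading must fail */
};

/* Hypothetical model of the policy described above, not glibc code.
   HAS_PROPERTY: the object carries GNU_PROPERTY_MEMORY_SEAL.
   ASSUME_MSEAL: glibc was built with --enable-kernel=6.10 or later.
   ERR: errno from the mseal call, or 0 if it succeeded.  */
static enum seal_action
seal_policy (bool has_property, bool assume_mseal, int err)
{
  if (!has_property)
    return SEAL_SKIPPED;  /* sealing is opt-in, decided per object */
  if (err == 0)
    return SEAL_APPLIED;
  if (!assume_mseal)
    return SEAL_SKIPPED;  /* assumption: a hint, e.g. ENOSYS before 6.10 */
  return SEAL_FATAL;      /* mseal must work, so any failure is an error */
}
```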

> 
> I don't see how this could break the CRIU dump for us (I believe it
> shouldn't, but it's still worth checking), but for CRIU restore it's
> definitely a problem, and it reminds me of the rseq() & CRIU story we
> had a few years ago.  My current understanding is:
> 
> *during CRIU restore*
> 0. somehow disable mseal for the CRIU binary itself, to make sure that
> when CRIU does clone() we don't get any mappings sealed
> 1. restore all memory mappings of the restorable process without
> mseal() applied to them
> 2. at a later CRIU restore stage, go over them and apply mseal()
> 
> I have a bad feeling that I'm still missing something, but even step 0 is a
> problem right now if we go with the current approach from this
> patch series, isn't it?

I am not familiar with how CRIU snapshot/restore is done, and who is
responsible for each step.  Is the kernel involved in any dump step,
meaning that you need to start the process with some IPC, or is it
just done in userland (with ptrace or some other way to stop the process,
plus reading /proc/<pid>/mem)?

And on restore, how is this accomplished?

Andrei Vagin Feb. 7, 2025, 12:47 a.m. UTC | #13

On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
> In any case, memory sealing will only be applied in the presence
> of GNU_PROPERTY_MEMORY_SEAL.

But this flag is evaluated separately for the binary and for each of its
libraries.  If libc is built with GNU_PROPERTY_MEMORY_SEAL, all binaries that
load this libc will have sealed mappings, regardless of whether the
binary itself has the flag or not.

I compiled glibc with the patches and performed a simple experiment:

```
[root@bc2868439161 install]# cat test.c
int main() {
	return 0;
}
[root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
[root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
mseal(0x7fda54b59000, 8192, 0)          = 0
openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
mseal(0x7fda5496c000, 12288, 0)         = 0
mseal(0x7fda5496f000, 1998928, 0)       = 0
mseal(0x7fda54b61000, 163665, 0)        = 0
mseal(0x7fda54b89000, 45544, 0)         = 0
mseal(0x7fda54b95000, 13096, 0)         = 0
+++ exited with 0 +++
```

The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
However, we can see that all glibc mappings have been sealed. The
initial mapping is sealed even before libc.so is loaded, likely because
ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.

To operate, CRIU needs to be able to unmap all of its own mappings, which is
essential for restoring process address spaces.  This means we need to
build CRIU so that its process doesn't have any sealed mappings.

The same requirement applies to gVisor and UML, which both use stub
processes to manage guest address spaces.  Basically, the main process
forks a new process, unmaps all existing mappings in the forked process,
and then populates it with guest mappings.
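The stub-process pattern breaks because, as the thread suggests, a fork()ed child inherits the sealing bit of its parent's mappings. The following C sketch checks this on a 6.10 or newer kernel. The helper name is hypothetical. The fallback syscall number 462 is an assumption for x86_64 and generic-table architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_mseal
# define SYS_mseal 462  /* assumption: x86_64 / generic syscall table */
#endif

/* Seal a page in the parent, fork, and let the child try to unmap it,
   as a stub process would before installing guest mappings.
   Returns the child's munmap errno (EPERM if the seal was inherited),
   0 if the child could unmap it, or -1 if mseal is unavailable or
   another step failed.  */
static int
child_unmap_after_seal (void)
{
  long page = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  if (syscall (SYS_mseal, p, (size_t) page, 0UL) != 0)
    {
      munmap (p, page);
      return -1;
    }

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (munmap (p, page) == 0 ? 0 : errno);  /* child: report result */

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```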

> I am not familiar with how CRIU snapshot/restore is done, and who is
> responsible for each step.  Is the kernel involved in any dump step,
> meaning that you need to start the process with some IPC, or is it
> just done in userland (with ptrace or some other way to stop the process,
> plus reading /proc/<pid>/mem)?

It is done in userland.  CRIU uses ptrace and /proc, and even injects a small
piece of binary code into the target process to collect all the information
required to restore the process to the same state later.
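Part of that information is the list of mappings in /proc/<pid>/maps. Each line starts with a hexadecimal `start-end` range. A restorer later has to unmap its own entries from the same kind of list. The following is a minimal C sketch of reading it for the current process. The helper name `count_mappings` is hypothetical, and CRIU's real parser handles far more fields.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read /proc/self/maps and count the mappings.  *FOUND is set to 1 if
   ADDR lies inside one of them.  Returns the number of mappings, or -1
   if the file can't be opened.  */
static long
count_mappings (uintptr_t addr, int *found)
{
  FILE *f = fopen ("/proc/self/maps", "r");
  if (f == NULL)
    return -1;

  char *line = NULL;  /* getline grows this buffer as needed */
  size_t cap = 0;
  long n = 0;
  *found = 0;
  while (getline (&line, &cap, f) != -1)
    {
      unsigned long start, end;
      if (sscanf (line, "%lx-%lx", &start, &end) != 2)
        continue;
      n++;
      if (addr >= start && addr < end)
        *found = 1;
    }
  free (line);
  fclose (f);
  return n;
}
```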

> 
> And on restore, how is this accomplished? 

The process is a bit more complicated, but for a basic understanding, it
involves the following steps: fork a new process; restore all mappings;
unmap all CRIU mappings; remap the restored mappings to the correct
addresses; and finally, resume the process.

Thanks,
Andrei

Adhemerval Zanella Feb. 7, 2025, 12:10 p.m. UTC | #14

On 06/02/25 21:47, Andrei Vagin wrote:
> But this flag is evaluated separately for the binary and for each of its
> libraries.  If libc is built with GNU_PROPERTY_MEMORY_SEAL, all binaries that
> load this libc will have sealed mappings, regardless of whether the
> binary itself has the flag or not.
> 
> I compiled glibc with the patches and performed a simple experiment:
> 
> ```
> [root@bc2868439161 install]# cat test.c
> int main() {
> 	return 0;
> }
> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
> mseal(0x7fda54b59000, 8192, 0)          = 0
> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
> mseal(0x7fda5496c000, 12288, 0)         = 0
> mseal(0x7fda5496f000, 1998928, 0)       = 0
> mseal(0x7fda54b61000, 163665, 0)        = 0
> mseal(0x7fda54b89000, 45544, 0)         = 0
> mseal(0x7fda54b95000, 13096, 0)         = 0
> +++ exited with 0 +++
> ```
> 
> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
> However, we can see that all glibc mappings have been sealed. The
> initial mapping is sealed even before libc.so is loaded, likely because
> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.

Yes, this is controlled by a new configure flag [1], which is enabled by
default.  With --disable-default-memory-seal you can disable sealing for
glibc itself.

[1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/

> 
> To operate, CRIU needs to be able to unmap all of its own mappings, which is
> essential for restoring process address spaces.  This means we need to
> build CRIU so that its process doesn't have any sealed mappings.
> 
> The same requirement applies to gVisor and UML, which both use stub
> processes to manage guest address spaces.  Basically, the main process
> forks a new process, unmaps all existing mappings in the forked process,
> and then populates it with guest mappings.

The main problem here is that memory sealing is a hardening mechanism meant to
prevent exactly this kind of operation.  And disabling it does not help either
if the program uses mseal directly, as Chrome and maybe others intend to do.
How do you intend to handle these scenarios?

In previous iterations of this patch I had a tunable to disable sealing,
where GNU_PROPERTY_MEMORY_SEAL is simply ignored.  I removed it because it is
basically a way to bypass the security hardening, and it also does help in the
fork case.

I still think we need some kernel help here, where a process can configure
itself (with a prctl or something similar) so that a fork()ed child does not
inherit the sealing bit, to properly fix this without making this hardening
an opt-out feature (which defeats the whole idea).

Ideally it would require a new clone flag, and most likely a new fork symbol,
to avoid concurrency issues (where multiple threads set a global state).


Adhemerval Zanella Feb. 7, 2025, 12:17 p.m. UTC | #15

On 07/02/25 09:10, Adhemerval Zanella Netto wrote:
> 
> 
> On 06/02/25 21:47, Andrei Vagin wrote:
>> On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
>>>> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto
>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 06/02/25 06:15, Andrei Vagin wrote:
>>>>>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
>>>>>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>>>>>>>>
>>>>>>>> * Adhemerval Zanella Netto:
>>>>>>>>
>>>>>>>>>> CRIU needs to be able to unmap everything that was initially loaded by
>>>>>>>>>> the kernel and glibc.  This will stop working if we use mseal for glibc
>>>>>>>>>> itself.
>>>>>>>>>
>>>>>>>>> So in this case the easiest way it to filter of mseal (with seccomp or
>>>>>>>>> something related) and disable sealing.  I don't have a easy solution.
>>>>>>>>
>>>>>>>> Please test with CRIU and trace and find a way to make them work again
>>>>>>>> if they are broken.
>>>>>>>
>>>>>>> that is a kernel problem afaik..
>>>>>>
>>>>>> Could you please provide more details on why you think that is the
>>>>>> kernel issue?
>>>>>>
>>>>>> btw: this reminds me another discussion about mseal on lkml:
>>>>>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/
>>>>>>
>>>>>>> Why does libc have to care about this limitation?
>>>>>>
>>>>>> CRIU has worked with glibc for many years... It's not just about CRIU;
>>>>>> other projects, such as gVisor and UML, are also likely to be affected.
>>>>>
>>>>> The current proposal is an opt-in feature, but also without a way to disable it
>>>>> (similar to how RELRO is enabled).
>>>>>
>>>>> I don't have much experience with how CRIU or gVisor work internally, but if
>>>>> either requires changing any metadata (munmap, mprotect) of the PT_LOAD ELF
>>>>> segments after startup, this basically defeats the whole idea of the memory
>>>>> sealing hardening.
>>>>>
>>>>> I don't see a way to support both semantics without some extra kernel support,
>>>>> where either a process can be marked with extra credentials to do the
>>>>> required VMA operations (like process_madvise, etc.) or sealing can be
>>>>> disabled during the snapshot.
>>>>>
>>>>> The mseal usage idea was primarily for program loaders, similar to
>>>>> mimmutable on OpenBSD; but it seems that some programs also intend to
>>>>> use the syscall directly for internal hardening (like Chrome). How
>>>>> would CRIU/gVisor handle such scenarios?
>>>>
>>>> Dear friends,
>>>>
>>>> I've quickly read a patchset [PATCH v8 0/8] Add support for memory
>>>> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html)
>>>> and noticed that on
>>>> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html
>>>> it's said:
>>>>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel
>>>>> supports the mseal syscall and how glibc is configured.  On the default
>>>>> configuration that aims to support older kernel releases, the memory
>>>>> sealing attribute is taken as a hint. If glibc is configured with a
>>>>> minimum kernel of 6.10, where mseal is implied to be supported,
>>>>> sealing is enforced.
>>>>
>>>> => if I understand it right, this makes memory sealing enabled by
>>>> default if the kernel supports it, even without a linker flag, right?
>>>>
>>>> I don't really understand what "glibc is configured with a minimum
>>>> kernel of 6.10" means from the user perspective.
>>>> I'm not very familiar with glibc internals, so can somebody shed some
>>>> light on this, please?
>>>
>>> glibc has a minimum supported kernel version of 3.2, but some
>>> architectures override it (either because the ABI was added in newer
>>> versions, or due to some other reason).
>>>
>>> We also have an option where you can build glibc assuming it will
>>> always run on at least a specific kernel version (--enable-kernel=x.y.z).  In
>>> previous releases this was enforced by checking the kernel version at load
>>> time, but currently glibc only uses it to assume that certain syscalls are
>>> always present (so there is no need to use fallbacks or handle ENOSYS).
>>>
>>> So if you build glibc with --enable-kernel=6.10 it means that mseal
>>> is expected to be always usable, ENOSYS is not possible, and thus any
>>> syscall failure is expected to be an error (assuming that we are passing 
>>> valid arguments).
>>>
>>> If --enable-kernel is not used, it means that glibc can run on a kernel
>>> without mseal, and thus memory sealing cannot always be applied (we could
>>> still enforce it, but since there is already a way to enforce it with
>>> --enable-kernel, I think there is no urgent need for it).
>>>
>>> In any case, memory sealing will be only applied in the presence
>>> of GNU_PROPERTY_MEMORY_SEAL.
>>
>> But this flag is considered for a binary and its libraries separately.
>> If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that
>> load this libc will have sealed mappings, regardless of whether the
>> binary itself has the flag or not.
>>
>> I compiled glibc with the patches and performed a simple experiment:
>>
>> ```
>> [root@bc2868439161 install]# cat test.c
>> int main() {
>> 	return 0;
>> }
>> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
>> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
>> mseal(0x7fda54b59000, 8192, 0)          = 0
>> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
>> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
>> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
>> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
>> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
>> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
>> mseal(0x7fda5496c000, 12288, 0)         = 0
>> mseal(0x7fda5496f000, 1998928, 0)       = 0
>> mseal(0x7fda54b61000, 163665, 0)        = 0
>> mseal(0x7fda54b89000, 45544, 0)         = 0
>> mseal(0x7fda54b95000, 13096, 0)         = 0
>> +++ exited with 0 +++
>> ```
>>
>> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
>> However, we can see that all glibc mappings have been sealed. The
>> initial mapping is sealed even before libc.so is loaded, likely because
>> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.
> 
> Yes, this is controlled by a new configure flag [1], which is enabled by
> default.  With --disable-default-memory-seal you can disable sealing for
> glibc itself.
> 
> [1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/
> 
>>
>> To operate, CRIU needs to be able to unmap all its mappings, which is
>> essential for restoring process address spaces.  This means we need to
>> compile CRIU so that its process doesn't have any sealed mappings.
>>
>> The same requirement applies to gVisor and UML, which both use stub
>> processes to manage guest address spaces.  Basically, the main process
>> forks a new process, unmaps all existing mappings in the forked process,
>> and then populates it with guest mappings.
> 
> The main problem here is that memory sealing is a hardening mechanism meant
> to prevent exactly this kind of operation.  Disabling it in glibc also does
> not help if the program uses mseal directly, as Chrome and maybe others
> intend to do.  How do you intend to handle these scenarios?
> 
> In previous iterations of this patch I had a tunable to disable sealing,
> where GNU_PROPERTY_MEMORY_SEAL is simply ignored.  I removed it because it
> is a way to bypass the security hardening, and it also does help in the fork case.
> 
> I still think we need some kernel help here, where a process can configure
> itself (with a prctl or something related) so that a process created by
> fork() does not inherit the sealing bit, to properly fix it without making
> this hardening an opt-out feature (which I think defeats the whole idea).
> 
> Ideally it would require a new clone flag, and most likely a new fork symbol,
> to avoid concurrency issues (where multiple threads set a global state).

I am assuming here that restore can happen at any time, in an API-like manner
(I am not sure I understand how CRIU/UML/gVisor work in all cases).

If the idea is just to have a wrapper binary, linked against glibc, that only
does the restore, maybe a simple solution like filtering the mseal syscall
(so it acts as a no-op) might work better.

Florian Weimer Feb. 11, 2025, 8:19 a.m. UTC | #16

* Cristian Rodríguez:

> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Adhemerval Zanella Netto:
>>
>> >> CRIU needs to be able to unmap everything that was initially loaded by
>> >> the kernel and glibc.  This will stop working if we use mseal for glibc
>> >> itself.
>> >
>> > So in this case the easiest way is to filter out mseal (with seccomp or
>> > something related) and disable sealing.  I don't have an easy solution.
>>
>> Please test with CRIU and trace and find a way to make them work again
>> if they are broken.
>
> that is a kernel problem afaik. Why does libc have to care about this limitation?

We want CRIU and similar tools to be able to build with glibc 2.42 and
later, without distributions having to provide two separate glibc
versions.

Thanks,
Florian

Jeff Xu Feb. 11, 2025, 5:25 p.m. UTC | #17

On Tue, Feb 11, 2025 at 12:19 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Cristian Rodríguez:
>
> > On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Adhemerval Zanella Netto:
> >>
> >> >> CRIU needs to be able to unmap everything that was initially loaded by
> >> >> the kernel and glibc.  This will stop working if we use mseal for glibc
> >> >> itself.
> >> >
> >> > So in this case the easiest way is to filter out mseal (with seccomp or
> >> > something related) and disable sealing.  I don't have an easy solution.
> >>
> >> Please test with CRIU and trace and find a way to make them work again
> >> if they are broken.
> >
> > that is a kernel problem afaik. Why does libc have to care about this limitation?
>
> We want CRIU and similar tools to be able to build with glibc 2.42 and
> later, without distributions having to provide two separate glibc
> versions.
>
Statically linking with glibc can be a workaround.

Thanks
-Jeff


> Thanks,
> Florian
>
