Message ID | 20250129172550.1119706-1-adhemerval.zanella@linaro.org |
---|---|
Series | Add support for memory sealing |
* Adhemerval Zanella:

> Linux 6.10 (8be7258aad44) added the mseal syscall that allows blocking some memory operations on the VMA range:
>
> * Unmapping, moving to another location, extending or shrinking the size, munmap, and mremap.
> * Moving or expanding a different VMA into the current range, via mremap.
> * Modifying the memory range with mmap along with flag MAP_FIXED.
> * Expanding the size with mremap.
> * Changing the protection flags with mprotect or pkey_mprotect.
> * Destructive behaviors on anonymous memory, such as madvise with MADV_DONTNEED.
>
> Memory sealing is a hardening mechanism [1] to avoid either remapping the memory segments or changing the memory protection segments layout by the dynamic loader (for instance, the RELRO hardening). OpenBSD supports a similar hardening with the mimmutable syscall [2].

Have you checked that CRIU restore still works after these changes? I don't see how it can, because the initial mappings can no longer be unmapped.

I don't think the design is quite right. The property flag (why not a flag in a dynamic tag? these things are very complicated) should allow flagging a program or initially loaded shared object, like CRIU or ltrace, as incompatible with mseal. If mseal is active for the process, the mseal range should be controlled by a program header similar to PT_GNU_RELRO, and not indiscriminately applied to the entire object.

I don't think we should apply this blindly to any NODELETE mapping.

Thanks,
Florian
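The blocked operations listed above can be observed directly. The following is a minimal sketch, assuming a 64-bit Linux system; it calls the raw syscall because older C library headers may not provide a wrapper, and the fallback number 462 is an assumption that holds for x86-64, arm64, and other architectures using the generic syscall table. It seals one anonymous page and checks that munmap and mprotect are then rejected with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Older headers may lack SYS_mseal.  Assumption: 462 is the number
   in the generic table (x86-64, arm64, ...); other ABIs may differ.  */
#ifndef SYS_mseal
# define SYS_mseal 462
#endif

/* Raw mseal(2) call; the flags argument must currently be 0.  */
static int
raw_mseal (void *addr, size_t len)
{
  return (int) syscall (SYS_mseal, addr, len, 0UL);
}

/* Seal one anonymous page and check that munmap and mprotect are
   rejected afterwards.  Returns 1 if both fail with EPERM, 0 if
   mseal itself is unavailable, and -1 on any unexpected result.  */
static int
seal_blocks_munmap_and_mprotect (void)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;

  if (raw_mseal (p, page) != 0)
    {
      /* ENOSYS: kernel older than 6.10.  EPERM: sealing unsupported
         (e.g. 32-bit) or denied by a seccomp policy.  */
      int err = errno;
      munmap (p, page);
      return (err == ENOSYS || err == EPERM) ? 0 : -1;
    }

  int unmap_blocked = munmap (p, page) == -1 && errno == EPERM;
  int mprotect_blocked = mprotect (p, page, PROT_READ) == -1 && errno == EPERM;

  /* A sealed page can never be unmapped; it stays until process exit.  */
  return (unmap_blocked && mprotect_blocked) ? 1 : -1;
}
```

On a 6.10 or later kernel the function is expected to return 1; on older kernels it returns 0 because the syscall does not exist.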
On 03/02/25 15:03, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> Linux 6.10 (8be7258aad44) added the mseal syscall that allows blocking some memory operations on the VMA range:
>> [...]
>
> Have you checked that CRIU restore still works after these changes? I don't see how it can, because the initial mappings can no longer be unmapped.

No, I will take a look, but I am not sure this applies, since it is an opt-in feature. CRIU adds some extra semantic requirements that are hard to take into consideration, similar to the defunct prelink support.

But from the CRIU discussion about a sealing extension for kernel mappings (such as the vDSO), I got the impression that memory sealing is not a problem for userland (I guess that restoring either just ignores sealing or applies it itself).

> I don't think the design is quite right. The property flag (why not a flag in a dynamic tag? these things are very complicated) should allow flagging a program or initially loaded shared object, like CRIU or ltrace, as incompatible with mseal. If mseal is active for the process, the mseal range should be controlled by a program header similar to PT_GNU_RELRO, and not indiscriminately applied to the entire object.
A dynamic tag would add support only for ET_DYN, whereas a GNU attribute allows for ET_EXEC as well (similar to recent hardening mechanisms like BTI, GCS, etc.). And the range approach adds extra complexity for little gain, as I see it, since the idea is for PT_LOAD segments to be *immutable* over process execution, not selectively sealed (unlike PT_GNU_RELRO, where only part of the memory range should be read-only).

A range-based approach would still allow a misconfigured binary to be partially sealed, which defeats the whole idea of the hardening (similar to partial RELRO).

> I don't think we should apply this blindly to any NODELETE mapping.

It is not; it is controlled by the presence of the attribute. NODELETE just allows a shared library with the sealing attribute to be sealed when it is loaded with dlopen.
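To make the whole-object versus header-controlled-range distinction concrete, the sketch below uses glibc's dl_iterate_phdr to print, for every loaded object, the page-rounded span of its PT_LOAD segments (roughly what sealing "the entire object" would cover) next to its PT_GNU_RELRO range (the existing example of a range described by a program header). `struct seal_survey` and `survey_object` are illustrative names; the sketch only reads program headers and seals nothing.

```c
#define _GNU_SOURCE
#include <inttypes.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Counters filled in by the callback below.  */
struct seal_survey
{
  int objects;      /* objects with at least one PT_LOAD segment */
  int with_relro;   /* of those, objects that also have PT_GNU_RELRO */
};

/* For each loaded object, print the page-aligned span of its PT_LOAD
   segments next to its PT_GNU_RELRO range.  The span may contain
   holes between segments, which a range-based sealer would have to
   skip.  */
static int
survey_object (struct dl_phdr_info *info, size_t size, void *data)
{
  struct seal_survey *s = data;
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  uintptr_t lo = UINTPTR_MAX, hi = 0, relro_lo = 0, relro_hi = 0;
  (void) size;

  for (int i = 0; i < info->dlpi_phnum; i++)
    {
      const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
      uintptr_t start = info->dlpi_addr + ph->p_vaddr;
      uintptr_t end = start + ph->p_memsz;
      if (ph->p_type == PT_LOAD)
        {
          if (start < lo)
            lo = start;
          if (end > hi)
            hi = end;
        }
      else if (ph->p_type == PT_GNU_RELRO)
        {
          relro_lo = start;
          relro_hi = end;
        }
    }
  if (hi == 0)
    return 0;                           /* nothing loadable: skip */

  lo &= ~(page - 1);                    /* round outward to whole pages */
  hi = (hi + page - 1) & ~(page - 1);
  s->objects++;
  if (relro_hi != 0)
    s->with_relro++;

  printf ("%-30s load %#" PRIxPTR "-%#" PRIxPTR
          "  relro %#" PRIxPTR "-%#" PRIxPTR "\n",
          (info->dlpi_name && info->dlpi_name[0])
            ? info->dlpi_name : "(main program)",
          lo, hi, relro_lo, relro_hi);
  return 0;                             /* continue with the next object */
}
```

Calling `dl_iterate_phdr (survey_object, &s)` prints one line per loaded object. The RELRO range is typically a strict subset of the load span, which is the kind of partial coverage the range-based proposal would generalize.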
* Adhemerval Zanella Netto:

> On 03/02/25 15:03, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> [...]
>> Have you checked that CRIU restore still works after these changes? I don't see how it can, because the initial mappings can no longer be unmapped.
>
> No, I will take a look, but I am not sure this applies, since it is an opt-in feature. CRIU adds some extra semantic requirements that are hard to take into consideration, similar to the defunct prelink support.
>
> But from the CRIU discussion about a sealing extension for kernel mappings (such as the vDSO), I got the impression that memory sealing is not a problem for userland (I guess that restoring either just ignores sealing or applies it itself).

CRIU needs to be able to unmap everything that was initially loaded by the kernel and glibc. This will stop working if we use mseal for glibc itself.

> A dynamic tag would add support only for ET_DYN, whereas a GNU attribute allows for ET_EXEC as well (similar to recent hardening mechanisms like BTI, GCS, etc.).
We don't need the compatibility indicator on static executables, because incompatible static binaries won't have the PT_* header that requests sealing.

> And the range approach adds extra complexity for little gain, as I see it, since the idea is for PT_LOAD segments to be *immutable* over process execution, not selectively sealed (unlike PT_GNU_RELRO, where only part of the memory range should be read-only).
>
> A range-based approach would still allow a misconfigured binary to be partially sealed, which defeats the whole idea of the hardening (similar to partial RELRO).

But partial sealing could be a feature, too. For example, in libgcc_s, we might want to keep the root of the dynamic unwinding data structure in read-only memory most of the time, as a post-exploitation countermeasure that makes it harder to run arbitrary DWARF programs.

Thanks,
Florian
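The libgcc_s example relies on toggling page protections at run time. A minimal sketch of that "write-rarely" pattern follows; `struct unwind_root`, `root_init`, and `root_register` are hypothetical names, not libgcc_s code, the sketch does no locking, and the root lives on a separate anonymous page for portability. If the same page were part of a sealed PT_LOAD segment, both mprotect calls would fail with EPERM, which is the conflict with whole-object sealing discussed here.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical root of a registry of unwind tables.  */
struct unwind_root
{
  const void *last_table;   /* most recently registered table */
  size_t count;             /* number of registrations so far */
};

static struct unwind_root *root;
static size_t root_len;

/* Allocate the root on its own page and make it read-only.  */
static int
root_init (void)
{
  root_len = (size_t) sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, root_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  root = p;
  return mprotect (root, root_len, PROT_READ);
}

/* Open a short write window, update, and close it again.  If this
   page were sealed, both mprotect calls would fail with EPERM.  */
static int
root_register (const void *table)
{
  if (mprotect (root, root_len, PROT_READ | PROT_WRITE) != 0)
    return -1;
  root->last_table = table;
  root->count++;
  return mprotect (root, root_len, PROT_READ);
}
```

Between registrations, any stray write to `*root` faults, which is the post-exploitation benefit described above.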
On 03/02/25 16:05, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
>
>> On 03/02/25 15:03, Florian Weimer wrote:
>> [...]
>
> CRIU needs to be able to unmap everything that was initially loaded by the kernel and glibc. This will stop working if we use mseal for glibc itself.

So in this case the easiest way is to filter out mseal (with seccomp or something similar), which disables sealing. I don't have an easy solution.
>> A dynamic tag would add support only for ET_DYN, whereas a GNU attribute allows for ET_EXEC as well (similar to recent hardening mechanisms like BTI, GCS, etc.).
>
> We don't need the compatibility indicator on static executables, because incompatible static binaries won't have the PT_* header that requests sealing.

But we will still need to mark ET_EXEC binaries as opting in to sealing. A dynamic tag would only be available for ET_DYN.

>> [...]
>> A range-based approach would still allow a misconfigured binary to be partially sealed, which defeats the whole idea of the hardening (similar to partial RELRO).
>
> But partial sealing could be a feature, too. For example, in libgcc_s, we might want to keep the root of the dynamic unwinding data structure in read-only memory most of the time, as a post-exploitation countermeasure that makes it harder to run arbitrary DWARF programs.

Right, but I feel that this is out of the scope of the currently proposed approach, where the idea is only to mark the binary so that its loaded ELF segments, which are already immutable and easy to locate, get sealed. Is this data allocated dynamically during process execution, or defined at build time? Can't you put it in the RELRO segment, with a constructor to do any required initialization?
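The seccomp suggestion could look roughly like the following minimal sketch (not a tested CRIU workaround). It installs a classic BPF filter that makes mseal fail with ENOSYS, assuming the loader treats ENOSYS as "sealing unavailable"; per the configuration details given later in the thread, that holds for a default glibc build but not for one configured with --enable-kernel=6.10, where any mseal failure is an error. Seccomp filters are inherited across fork and execve, so a launcher would install it before starting the target. `install_no_mseal_filter` is a hypothetical name, and the fallback syscall number 462 is an assumption for architectures using the generic table.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
# define SYS_mseal 462   /* assumption: generic syscall table number */
#endif

/* Make mseal fail with ENOSYS in this process and everything it later
   execs.  Sketch only: a production filter should also validate
   seccomp_data.arch against the expected AUDIT_ARCH_* value.  */
static int
install_no_mseal_filter (void)
{
  struct sock_filter insns[] = {
    /* Load the syscall number.  */
    BPF_STMT (BPF_LD | BPF_W | BPF_ABS, offsetof (struct seccomp_data, nr)),
    /* If it is mseal, fall through to the ERRNO return; else skip it.  */
    BPF_JUMP (BPF_JMP | BPF_JEQ | BPF_K, SYS_mseal, 0, 1),
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
    .len = sizeof insns / sizeof insns[0],
    .filter = insns,
  };

  /* Required so that an unprivileged process may install a filter.  */
  if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
    return -1;
  return prctl (PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Unlike a kernel-side exemption, this only works if the loader degrades gracefully when mseal is missing, which is why the glibc configuration matters.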
* Adhemerval Zanella Netto:

>> CRIU needs to be able to unmap everything that was initially loaded by the kernel and glibc. This will stop working if we use mseal for glibc itself.
>
> So in this case the easiest way is to filter out mseal (with seccomp or something similar), which disables sealing. I don't have an easy solution.

Please test with CRIU and trace and find a way to make them work again if they are broken.

>>> A dynamic tag would add support only for ET_DYN, whereas a GNU attribute allows for ET_EXEC as well (similar to recent hardening mechanisms like BTI, GCS, etc.).
>>
>> We don't need the compatibility indicator on static executables, because incompatible static binaries won't have the PT_* header that requests sealing.
>
> But we will still need to mark ET_EXEC binaries as opting in to sealing. A dynamic tag would only be available for ET_DYN.

If the PT_* address ranges are present, the binary opts in to sealing.

> Is this data allocated dynamically during process execution, or defined at build time? Can't you put it in the RELRO segment, with a constructor to do any required initialization?

Constructors currently run after RELRO has been applied.

Thanks,
Florian
On 03/02/25 16:40, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> [...]
> Please test with CRIU and trace and find a way to make them work again if they are broken.

I don't see a way to make CRIU work here, since what it needs is at odds with the immutability that memory sealing provides. But I don't have much knowledge of how CRIU works internally; maybe someone more versed in this project could help us out here.

> [...]
> If the PT_* address ranges are present, the binary opts in to sealing.

But you still need an opt-in mark for the binary to be memory sealed, which is what the GNU attribute marking provides. A dynamic tag would not be present on ET_EXEC (although with PIE being more common, ET_EXEC binaries may not be that common), so we still need something else to mark the binary as opting in to memory sealing.

> [...]
> Constructors currently run after RELRO has been applied.

Right, so is this some metadata in .data that is initialized after or during process execution?
If so, I think for this case it would be better to have something like a special section, such as ".gnu.mseal", where the idea of adding a sealing range makes sense. But again, I think this is a different proposal and not really a blocker for this hardening.
On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Adhemerval Zanella Netto:
>
> >> CRIU needs to be able to unmap everything that was initially loaded by the kernel and glibc. This will stop working if we use mseal for glibc itself.
> >
> > So in this case the easiest way is to filter out mseal (with seccomp or something similar), which disables sealing. I don't have an easy solution.
>
> Please test with CRIU and trace and find a way to make them work again if they are broken.

That is a kernel problem, AFAIK. Why does libc have to care about this limitation?
On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote:
> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote:
> > [...]
> > Please test with CRIU and trace and find a way to make them work again if they are broken.
>
> That is a kernel problem, AFAIK.

Could you please provide more details on why you think this is a kernel issue?

BTW, this reminds me of another discussion about mseal on LKML: https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/

> Why does libc have to care about this limitation?

CRIU has worked with glibc for many years... It's not just about CRIU; other projects, such as gVisor and UML, are also likely to be affected.

Thanks,
Andrei
On 03/02/25 17:04, Adhemerval Zanella Netto wrote:
> [...]
>> Constructors currently run after RELRO has been applied.
>
> Right, so is this some metadata in .data that is initialized after or during process execution?
> If so, I think for this case it would be better to have something like a special section, such as ".gnu.mseal", where the idea of adding a sealing range makes sense. But again, I think this is a different proposal and not really a blocker for this hardening.

OK, I thought more about it, and I am still not sure a range-based approach is the ideal solution for this. The whole idea of memory sealing is the assumption that all PT_LOAD segments of the binary should be *immutable* after process startup; if the program intends to change anything, it is better to mark that explicitly. OpenBSD added memory sealing as a default security hardening (with no opt-out, unlike this proposal), and later found the need to add PT_OPENBSD_MUTABLE to handle some malloc metadata, similar to the problem you bring up.

We could add a PT_GNU_SEAL with the expected range for sealing instead, so that in the future a binary can adjust the range if required. I think this is somewhat more complex, since we will need some extra constraints: the VirtAddr should be page-aligned, there should be no gaps, and the end should lie within an allocated VMA. There is also the extra complication that we will need a different strategy for non-contiguous mappings (as created by the kernel in some cases), which adds even more complexity.

Another solution, once we do require a way to mark memory as mutable (so that the loader skips it during sealing), is to add a PT_GNU_NOSEAL, similar to what OpenBSD does. This makes more sense to me, because a binary that requires it knows exactly which range must stay mutable and can mark it by placing it in a different section (.gnu.mutable or something like it).
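The explicit-mutable-section alternative can be sketched at the source level. Neither a `.gnu.mutable` section nor PT_GNU_NOSEAL exists today; the section name `gnu_mutable`, the `MUTABLE_DATA` macro, and `mutable_range` are hypothetical. The sketch only shows how such data could be grouped on page boundaries so that a single program header could describe it, which is the page-alignment constraint raised above. It assumes GNU ld or lld, which synthesize `__start_`/`__stop_` symbols for sections whose names are valid C identifiers (hence no leading dot).

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical marker for data that must stay writable after startup.
   The 4096-byte alignment assumes 4 KiB pages; a real scheme would
   have to use the largest page size the binary supports.  */
#define MUTABLE_DATA \
  __attribute__ ((section ("gnu_mutable"), aligned (4096), used))

static int allocator_state MUTABLE_DATA = 1;

/* Provided by the linker for the "gnu_mutable" output section.  */
extern char __start_gnu_mutable[];
extern char __stop_gnu_mutable[];

/* Page range that a PT_GNU_NOSEAL-style header would have to describe
   so that a loader could leave it unsealed.  */
static void
mutable_range (uintptr_t *lo, uintptr_t *hi)
{
  uintptr_t page = (uintptr_t) sysconf (_SC_PAGESIZE);
  *lo = (uintptr_t) __start_gnu_mutable & ~(page - 1);
  *hi = ((uintptr_t) __stop_gnu_mutable + page - 1) & ~(page - 1);
}
```

Keeping such data in its own aligned section means a single exclusion range suffices, rather than the per-segment range bookkeeping a PT_GNU_SEAL would need.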
On 06/02/25 06:15, Andrei Vagin wrote:
> [...]
> CRIU has worked with glibc for many years... It's not just about CRIU; other projects, such as gVisor and UML, are also likely to be affected.

The current proposal is an opt-in feature, but without a way to disable it (similar to how RELRO is enabled).

I don't have much experience with how CRIU or gVisor work internally, but if either requires changing the mappings (munmap, mprotect) of the PT_LOAD ELF segments after startup, this basically defeats the whole idea of the memory sealing hardening.

I don't see a way to support both semantics without some extra kernel support, where either you can mark some process with extra credentials to do the required VMA operations (like process_madvise, etc.) or disable sealing during the snapshot.

The mseal usage was primarily intended for program loaders, similar to mimmutable on OpenBSD; but it seems that some programs also intend to use the syscall directly for internal hardening (like Chrome). How would CRIU/gVisor handle such scenarios?
On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
> [...]
> The mseal usage was primarily intended for program loaders, similar to mimmutable on OpenBSD; but it seems that some programs also intend to use the syscall directly for internal hardening (like Chrome). How would CRIU/gVisor handle such scenarios?

Dear friends,

I've quickly read the patchset [PATCH v8 0/8] Add support for memory sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html) and noticed that https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html says:

> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel supports the mseal syscall and how glibc is configured. On the default configuration that aims to support older kernel releases, the memory sealing attribute is taken as a hint. If glibc is configured with a minimum kernel of 6.10, where mseal is implied to be supported, sealing is enforced.

=> If I understand it right, this makes memory sealing enabled by default if the kernel supports it, even without a linker flag, right?

I don't really understand what "glibc is configured with a minimum kernel of 6.10" means from the user's perspective. I'm not very familiar with glibc internals, so could somebody shed some light on this, please?

I can't see how this could break CRIU dump for us (I believe it shouldn't, but it's still worth checking), but for CRIU restore it's definitely a problem, and it reminds me of the rseq() & CRIU story we had a few years ago. My current understanding is:

*during CRIU restore*

0. somehow disable mseal for the CRIU binary itself, to make sure that when CRIU does clone() we don't get any mappings sealed
1. restore all memory mappings of the process being restored without mseal() applied to them
2. at a later CRIU restore stage, go over them and apply mseal()

I have a bad feeling that I'm still missing something, but even step 0 is a problem right now if we go with the current approach from this patch series, isn't it?

Kind regards,
Alex
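The restore plan above hinges on ordering: every mapping must be placed and filled before anything is sealed. A minimal sketch of steps 1 and 2 follows (step 0, keeping the restorer itself unsealed, is outside what a snippet can show). `struct restored_vma`, `restore_vma`, and `reseal_all` are hypothetical names, not CRIU code, and the fallback syscall number assumes the generic table.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mseal
# define SYS_mseal 462   /* assumption: generic syscall table number */
#endif

/* Hypothetical description of one VMA taken from a checkpoint image.  */
struct restored_vma
{
  void *addr;           /* page-aligned address to restore at */
  size_t len;           /* page-aligned length */
  int prot;             /* final protection */
  int was_sealed;       /* sealed at dump time? */
  const void *contents; /* saved bytes, len long */
};

/* Step 1: place a mapping at its original address and fill it.  This
   must happen while the range is still unsealed, because MAP_FIXED
   over a sealed range fails with EPERM.  */
static int
restore_vma (const struct restored_vma *v)
{
  void *p = mmap (v->addr, v->len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  memcpy (p, v->contents, v->len);
  return mprotect (p, v->len, v->prot);
}

/* Step 2: once everything is in place, re-apply mseal where the dumped
   VMA was sealed.  Returns how many ranges were sealed, stopping early
   without error if mseal is unavailable (ENOSYS) or denied (EPERM).  */
static int
reseal_all (const struct restored_vma *v, size_t n)
{
  int sealed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!v[i].was_sealed)
        continue;
      if (syscall (SYS_mseal, v[i].addr, v[i].len, 0UL) != 0)
        return (errno == ENOSYS || errno == EPERM) ? sealed : -1;
      sealed++;
    }
  return sealed;
}
```

Reversing the two steps would make every `restore_vma` call on a sealed range fail, which is the same failure the restorer hits today if the loader has already sealed those ranges before restore begins.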
On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote:
> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote:
>> [...]
> => If I understand it right, this makes memory sealing enabled by default if the kernel supports it, even without a linker flag, right?
>
> I don't really understand what "glibc is configured with a minimum kernel of 6.10" means from the user's perspective. I'm not very familiar with glibc internals, so could somebody shed some light on this, please?

glibc has a minimum supported kernel version of 3.2, but some architectures override it (either because the ABI was added in newer kernel versions, or for some other reason).

We also have a configure option to build glibc assuming it will always run on at least a given kernel version (--enable-kernel=x.y.z). In previous releases this was enforced by checking the kernel version at load time, but currently glibc only uses it to assume that certain syscalls are always present (so there is no need for fallbacks or for handling ENOSYS).
So if you build glibc with --enable-kernel=6.10, mseal is expected to always be usable, ENOSYS is not possible, and thus any syscall failure is treated as an error (assuming that we are passing valid arguments).

If --enable-kernel is not used, glibc can run on a kernel without mseal, and thus memory sealing cannot always be applied (we could still enforce it, but since we already have a way to enforce it with --enable-kernel, I see no urgent need for that).

In any case, memory sealing will only be applied in the presence of GNU_PROPERTY_MEMORY_SEAL.

> I can't see how this could break CRIU dump for us (I believe it shouldn't, but it's still worth checking), but for CRIU restore it's definitely a problem, and it reminds me of the rseq() & CRIU story we had a few years ago.
> [...]
> I have a bad feeling that I'm still missing something, but even step 0 is a problem right now if we go with the current approach from this patch series, isn't it?

I am not familiar with how CRIU snapshot/restore is done, and who is responsible for each step. Is the kernel involved in any dump step, meaning that you need to start the process with some IPC, or is it all done in userland (with ptrace or some other way to stop the process, plus reading /proc/mem)? And on restore, how is this accomplished?
On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote:
> [...]
On > previous releases we enforced by checking the kernel version at loading > time, but currently glibc only uses to assume that certain syscall are > always present (so there is no need to use fallbacks or handle ENOSYS). > > So if you build glibc with --enable-kernel=6.10 it means that mseal > is expected to be always usable, ENOSYS is not possible, and thus any > syscall failure is expected to be an error (assuming that we are passing > valid arguments). > > If --enable-kernel is not used, it means that glibc can run on a kernel > without mseal, and thus memory sealing can not be applied (we still might > enforce it, but I think since we do have a way to enforce with > --enable-kernel there is no urgent need for it). > > In any case, memory sealing will be only applied in the presence > of GNU_PROPERTY_MEMORY_SEAL. But this flag is considered for a binary and its libraries separately. If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that load this libc will have sealed mappings, regardless of whether the binary itself has the flag or not. 
I compiled glibc with the patches and performed a simple experiment:

```
[root@bc2868439161 install]# cat test.c
int main() {
    return 0;
}
[root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
[root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
mseal(0x7fda54b59000, 8192, 0) = 0
openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
mseal(0x7fda5496c000, 12288, 0) = 0
mseal(0x7fda5496f000, 1998928, 0) = 0
mseal(0x7fda54b61000, 163665, 0) = 0
mseal(0x7fda54b89000, 45544, 0) = 0
mseal(0x7fda54b95000, 13096, 0) = 0
+++ exited with 0 +++
```

The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag. However, we can see that all glibc mappings have been sealed. The initial mapping is sealed even before libc.so is loaded, likely because ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag. To operate, CRIU needs to be able to unmap all of its own mappings, which is essential for restoring process address spaces. This means we need to compile CRIU so that its process doesn't have any sealed mappings. The same requirement applies to gVisor and UML, which both use stub processes to manage guest address spaces.
Basically, the main process forks a new process, unmaps all existing mappings in the forked process, and then populates it with guest mappings. > > > > > I can't see how this can break the CRIU dump for us (I believe it > > shouldn't but still worth checking), but for CRIU restore it's > > definitely a problem > > and reminds me of the rseq()&CRIU story we had a few years ago. My > > current understanding is: > > > > *during CRIU restore* > > 0. somehow disable mseal for CRIU binary itself, to make sure that > > when CRIU do clone() we don't get any mappings sealed > > 1. restore all memory mappings of the restorable process without > > mseal() applied to them > > 2. at the later criu restore stage go over them and apply mseal() > > > > I have a bad feeling that I still miss something, but even step 0 is a > > problem right now if we go with the current approach from this > > patch series, isn't it? > > I am not familiar on how CRIU snapshot/restore is done, and how is > responsible to do each step. Is the kernel involved in any dump step, > meaning that you need either to start the process with some IPC, or it > just done in userland (with ptrace or other way to stop the process > plus reading /proc/mem)? It is done in userland. CRIU uses ptrace and /proc, and even injects a small piece of binary code into the target process to collect all the information required to restore the process to the same state later. > > And on restore, how is this accomplished? The process is a bit more complicated, but for a basic understanding, it involves the following steps: fork a new process; restore all mappings; unmap all CRIU mappings; remap the restored mappings to the correct addresses; and finally, resume the process. Thanks, Andrei
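The stub-process pattern described in this message depends on the forked child being able to unmap everything it inherited. A minimal sketch of the failure mode, assuming Linux 6.10+ semantics: the parent maps one anonymous page and seals it with the raw mseal syscall, then a forked child tries to munmap it. We assume the seal is carried over to the child (VMA flags are copied on fork), so the child's munmap should fail with EPERM. `child_unmap_after_seal` and its enum are hypothetical names, and the fallback number 462 is used only when the headers lack `SYS_mseal`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifdef SYS_mseal
# define SKETCH_NR_mseal SYS_mseal
#else
# define SKETCH_NR_mseal 462 /* assumption: mseal's number since Linux 6.10 */
#endif

enum fork_result {
    FORK_SEAL_UNAVAILABLE,   /* mseal refused here (e.g. ENOSYS, seccomp) */
    FORK_SEAL_INHERITED,     /* child's munmap failed with EPERM */
    FORK_SEAL_NOT_INHERITED, /* child could unmap the sealed page */
    FORK_ERROR               /* mmap, fork or waitpid failed */
};

/* Seal a page in the parent, then let a child try to unmap it, as a stub
   process would before installing guest or restored mappings. */
static enum fork_result child_unmap_after_seal(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return FORK_ERROR;

    /* mseal(addr, len, flags); flags must currently be 0. */
    if (syscall(SKETCH_NR_mseal, p, (size_t) page, 0UL) != 0) {
        munmap(p, page);
        return FORK_SEAL_UNAVAILABLE;
    }

    pid_t pid = fork();
    if (pid < 0)
        return FORK_ERROR;
    if (pid == 0) {
        /* Child: exit status 1 means the inherited seal blocked munmap. */
        if (munmap(p, page) == -1 && errno == EPERM)
            _exit(1);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return FORK_ERROR;
    return WEXITSTATUS(status) == 1 ? FORK_SEAL_INHERITED
                                    : FORK_SEAL_NOT_INHERITED;
}
```

On a kernel with mseal the expected outcome is `FORK_SEAL_INHERITED`; on older kernels the function only reports `FORK_SEAL_UNAVAILABLE`. The sealed page is deliberately left mapped in the parent, since it cannot be unmapped.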
On 06/02/25 21:47, Andrei Vagin wrote: > On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote: >> >> >> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote: >>> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto >>> <adhemerval.zanella@linaro.org> wrote: >>>> >>>> >>>> >>>> On 06/02/25 06:15, Andrei Vagin wrote: >>>>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote: >>>>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote: >>>>>>> >>>>>>> * Adhemerval Zanella Netto: >>>>>>> >>>>>>>>> CRIU needs to be able to unmap everything that was initially loaded by >>>>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc >>>>>>>>> itself. >>>>>>>> >>>>>>>> So in this case the easiest way it to filter of mseal (with seccomp or >>>>>>>> something related) and disable sealing. I don't have a easy solution. >>>>>>> >>>>>>> Please test with CRIU and trace and find a way to make them work again >>>>>>> if they are broken. >>>>>> >>>>>> that is a kernel problem afaik.. >>>>> >>>>> Could you please provide more details on why you think that is the >>>>> kernel issue? >>>>> >>>>> btw: this reminds me another discussion about mseal on lkml: >>>>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/ >>>>> >>>>>> .why libc has to care about this limitation ? >>>>> >>>>> CRIU has worked with glibc for many years... It's not just about CRIU; >>>>> other projects, such as gVisor and UML, are also likely to be affected. >>>> >>>> The current proposal is a opt-in feature, but also without a way to disable it >>>> (similar to how RELRO is enableD). >>>> >>>> I don't have much experience on how CRIU or gVisor works internally, but if >>>> any requires to change any metadata (munmap, mprotect) of the PT_LOAD elf >>>> segments after startup this basically defeats the whole idea of the memory >>>> sealing hardening. 
>>>> >>>> I don't see a way to support both semantics without some extra kernel support, >>>> where either you can mark some process with extra credentials to do the >>>> required VMA operations (like process_madvise, etc.) or disable sealing during >>>> the snapshot. >>>> >>>> The mseal usage idea was primarily for program loaders, similar to how >>>> mimmutable for OpenBSD; but it seems that some programs also intend to >>>> use the syscall directly for some internal hardening (like Chrome). How >>>> CRIU/gVisor would handle such scenarios? >>> >>> Dear friends, >>> >>> I've quickly read a patchset [PATCH v8 0/8] Add support for memory >>> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html) >>> and noticed that on >>> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html >>> it's said: >>>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel >>>> supports the mseal syscall and how glibc is configured. On the default >>>> configuration that aims to support older kernel releases, the memory >>>> sealing attribute is taken as a hint. If glibc is configured with a >>>> minimum kernel of 6.10, where mseal is implied to be supported, >>>> sealing is enforced. >>> >>> => if I understand it right, it makes memory sealing to be enabled by >>> default if the kernel supports it even without a linker flag, right? >>> >>> I don't really understand what "glibc is configured with a minimum >>> kernel of 6.10" means from the user perspective. >>> I'm not very familiar with glibc internals, so can somebody put some >>> light on this, please? >> >> On glibc has a minimum support kernel version of 3.2; but some >> architectures override it (either because the ABI was added in newer >> versions, or due some other reason). >> >> We also have an option on where you can build glibc assuming it will >> always run on a specific kernel version (--enable-kernel=x.y.z). 
On >> previous releases we enforced by checking the kernel version at loading >> time, but currently glibc only uses to assume that certain syscall are >> always present (so there is no need to use fallbacks or handle ENOSYS). >> >> So if you build glibc with --enable-kernel=6.10 it means that mseal >> is expected to be always usable, ENOSYS is not possible, and thus any >> syscall failure is expected to be an error (assuming that we are passing >> valid arguments). >> >> If --enable-kernel is not used, it means that glibc can run on a kernel >> without mseal, and thus memory sealing can not be applied (we still might >> enforce it, but I think since we do have a way to enforce with >> --enable-kernel there is no urgent need for it). >> >> In any case, memory sealing will be only applied in the presence >> of GNU_PROPERTY_MEMORY_SEAL. > > But this flag is considered for a binary and its libraries separately. > If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that > load this libc will have sealed mappings, regardless of whether the > binary itself has the flag or not. 
>
> I compiled glibc with the patches and performed a simple experiment:
>
> ```
> [root@bc2868439161 install]# cat test.c
> int main() {
>     return 0;
> }
> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
> mseal(0x7fda54b59000, 8192, 0) = 0
> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
> mseal(0x7fda5496c000, 12288, 0) = 0
> mseal(0x7fda5496f000, 1998928, 0) = 0
> mseal(0x7fda54b61000, 163665, 0) = 0
> mseal(0x7fda54b89000, 45544, 0) = 0
> mseal(0x7fda54b95000, 13096, 0) = 0
> +++ exited with 0 +++
> ```
>
> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
> However, we can see that all glibc mappings have been sealed. The
> initial mapping is sealed even before libc.so is loaded, likely because
> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.

Yes, this is controlled by a new configure flag [1], which is enabled by default. With --disable-default-memory-seal you can disable sealing for glibc itself.
[1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/ > > For operation, CRIU needs to be able to unmap all its mappings, which is > essential for restoring process address spaces. This means we need to > compile CRIU so that its process doesn't have any sealed mappings. > > The same requirement applies to gVisor and UML, which both use stub > processes to manage guest address spaces. Basically, the main process > forks a new process, unmaps all existing mappings in the forked process, > and then populates it with guest mappings. The main problem here is that memory sealing is a hardening mechanism intended to prevent exactly this kind of operation. This also does not help if the program uses mseal directly, as Chrome (and maybe others) intends to do. How do you intend to handle these scenarios? In previous iterations of this patch I had a tunable to disable sealing, where GNU_PROPERTY_MEMORY_SEAL is simply ignored. I removed it because it is a way to bypass the security hardening, and it also does help in the fork case. I still think we need some kernel help here, where a process can configure itself (with a prctl or something similar) so that a fork()ed process does not inherit the sealing bit, to fix this properly without making this hardening an opt-out feature (which I think defeats the whole idea). Ideally it would require a new clone flag, and most likely a new fork symbol, to avoid concurrency issues (where multiple threads set a global state). > >> >>> >>> I can't see how this can break the CRIU dump for us (I believe it >>> shouldn't but still worth checking), but for CRIU restore it's >>> definitely a problem >>> and reminds me of the rseq()&CRIU story we had a few years ago. My >>> current understanding is: >>> >>> *during CRIU restore* >>> 0. somehow disable mseal for CRIU binary itself, to make sure that >>> when CRIU do clone() we don't get any mappings sealed >>> 1.
restore all memory mappings of the restorable process without >>> mseal() applied to them >>> 2. at the later criu restore stage go over them and apply mseal() >>> >>> I have a bad feeling that I still miss something, but even step 0 is a >>> problem right now if we go with the current approach from this >>> patch series, isn't it? >> >> I am not familiar on how CRIU snapshot/restore is done, and how is >> responsible to do each step. Is the kernel involved in any dump step, >> meaning that you need either to start the process with some IPC, or it >> just done in userland (with ptrace or other way to stop the process >> plus reading /proc/mem)? > > It is done in userland. CRIU uses ptrace, proc and even injects a small > binary code in a target process to collect all required information to > be able to restore the process in the same state later. > >> >> And on restore, how is this accomplished? > > The process is a bit more complicated, but for a basic understanding, it > involves the following steps: fork a new process; restore all mappings; > unmap all CRIU mappings; remap the restored mappings to the correct > addresses; and finally, resume the process. > > Thanks, > Andrei
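The restore step quoted above, "remap the restored mappings to the correct addresses", can be done with mremap. The sketch below is not CRIU's code; `move_to_target` and `remap_demo` are hypothetical names. It fills a staging page, reserves the final address with a PROT_NONE placeholder, and moves the page onto it with `MREMAP_MAYMOVE | MREMAP_FIXED`, which replaces whatever was mapped at the target. If either range were sealed, mseal is documented to make such an mremap fail with EPERM, which is the conflict discussed in this thread.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Move a restored mapping onto its final address, replacing any
   placeholder mapped there.  Returns the new address or MAP_FAILED. */
static void *move_to_target(void *restored, size_t len, void *target)
{
    return mremap(restored, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
}

/* Fill a staging page, move it onto a reserved target and check that the
   contents moved with it.  Returns 0 on success, -1 on failure. */
static int remap_demo(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *staging = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    /* Reserve the "correct address" with an inaccessible placeholder. */
    char *target = mmap(NULL, page, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (staging == MAP_FAILED || target == MAP_FAILED)
        return -1;

    memcpy(staging, "restored", 9);

    char *moved = move_to_target(staging, page, target);
    if (moved != target)
        return -1;

    /* The pages keep their data and protection; staging is now unmapped. */
    int ok = memcmp(moved, "restored", 9) == 0;
    munmap(moved, page);
    return ok ? 0 : -1;
}
```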
On 07/02/25 09:10, Adhemerval Zanella Netto wrote: > > > On 06/02/25 21:47, Andrei Vagin wrote: >> On Thu, Feb 06, 2025 at 04:47:32PM -0300, Adhemerval Zanella Netto wrote: >>> >>> >>> On 06/02/25 15:03, Aleksandr Mikhalitsyn wrote: >>>> On Thu, Feb 6, 2025 at 3:25 PM Adhemerval Zanella Netto >>>> <adhemerval.zanella@linaro.org> wrote: >>>>> >>>>> >>>>> >>>>> On 06/02/25 06:15, Andrei Vagin wrote: >>>>>> On Mon, Feb 03, 2025 at 11:11:56PM -0300, Cristian Rodríguez wrote: >>>>>>> On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote: >>>>>>>> >>>>>>>> * Adhemerval Zanella Netto: >>>>>>>> >>>>>>>>>> CRIU needs to be able to unmap everything that was initially loaded by >>>>>>>>>> the kernel and glibc. This will stop working if we use mseal for glibc >>>>>>>>>> itself. >>>>>>>>> >>>>>>>>> So in this case the easiest way it to filter of mseal (with seccomp or >>>>>>>>> something related) and disable sealing. I don't have a easy solution. >>>>>>>> >>>>>>>> Please test with CRIU and trace and find a way to make them work again >>>>>>>> if they are broken. >>>>>>> >>>>>>> that is a kernel problem afaik.. >>>>>> >>>>>> Could you please provide more details on why you think that is the >>>>>> kernel issue? >>>>>> >>>>>> btw: this reminds me another discussion about mseal on lkml: >>>>>> https://lore.kernel.org/lkml/htdv44tqzi4jl2b7dwutsdwnh4tgrxq6xdvumi5wwu3hnh7sgw@tfwlal74ukx6/ >>>>>> >>>>>>> .why libc has to care about this limitation ? >>>>>> >>>>>> CRIU has worked with glibc for many years... It's not just about CRIU; >>>>>> other projects, such as gVisor and UML, are also likely to be affected. >>>>> >>>>> The current proposal is a opt-in feature, but also without a way to disable it >>>>> (similar to how RELRO is enableD). 
>>>>> >>>>> I don't have much experience on how CRIU or gVisor works internally, but if >>>>> any requires to change any metadata (munmap, mprotect) of the PT_LOAD elf >>>>> segments after startup this basically defeats the whole idea of the memory >>>>> sealing hardening. >>>>> >>>>> I don't see a way to support both semantics without some extra kernel support, >>>>> where either you can mark some process with extra credentials to do the >>>>> required VMA operations (like process_madvise, etc.) or disable sealing during >>>>> the snapshot. >>>>> >>>>> The mseal usage idea was primarily for program loaders, similar to how >>>>> mimmutable for OpenBSD; but it seems that some programs also intend to >>>>> use the syscall directly for some internal hardening (like Chrome). How >>>>> CRIU/gVisor would handle such scenarios? >>>> >>>> Dear friends, >>>> >>>> I've quickly read a patchset [PATCH v8 0/8] Add support for memory >>>> sealing (https://sourceware.org/pipermail/libc-alpha/2025-January/164361.html) >>>> and noticed that on >>>> https://sourceware.org/pipermail/libc-alpha/2025-January/164368.html >>>> it's said: >>>>> The GNU_PROPERTY_MEMORY_SEAL enforcement depends on whether the kernel >>>>> supports the mseal syscall and how glibc is configured. On the default >>>>> configuration that aims to support older kernel releases, the memory >>>>> sealing attribute is taken as a hint. If glibc is configured with a >>>>> minimum kernel of 6.10, where mseal is implied to be supported, >>>>> sealing is enforced. >>>> >>>> => if I understand it right, it makes memory sealing to be enabled by >>>> default if the kernel supports it even without a linker flag, right? >>>> >>>> I don't really understand what "glibc is configured with a minimum >>>> kernel of 6.10" means from the user perspective. >>>> I'm not very familiar with glibc internals, so can somebody put some >>>> light on this, please? 
>>> >>> On glibc has a minimum support kernel version of 3.2; but some >>> architectures override it (either because the ABI was added in newer >>> versions, or due some other reason). >>> >>> We also have an option on where you can build glibc assuming it will >>> always run on a specific kernel version (--enable-kernel=x.y.z). On >>> previous releases we enforced by checking the kernel version at loading >>> time, but currently glibc only uses to assume that certain syscall are >>> always present (so there is no need to use fallbacks or handle ENOSYS). >>> >>> So if you build glibc with --enable-kernel=6.10 it means that mseal >>> is expected to be always usable, ENOSYS is not possible, and thus any >>> syscall failure is expected to be an error (assuming that we are passing >>> valid arguments). >>> >>> If --enable-kernel is not used, it means that glibc can run on a kernel >>> without mseal, and thus memory sealing can not be applied (we still might >>> enforce it, but I think since we do have a way to enforce with >>> --enable-kernel there is no urgent need for it). >>> >>> In any case, memory sealing will be only applied in the presence >>> of GNU_PROPERTY_MEMORY_SEAL. >> >> But this flag is considered for a binary and its libraries separately. >> If libc is compiled with GNU_PROPERTY_MEMORY_SEAL, all binaries that >> load this libc will have sealed mappings, regardless of whether the >> binary itself has the flag or not. 
>>
>> I compiled glibc with the patches and performed a simple experiment:
>>
>> ```
>> [root@bc2868439161 install]# cat test.c
>> int main() {
>>     return 0;
>> }
>> [root@bc2868439161 install]# gcc -Wl,-dynamic-linker,/mnt/glibc/install/lib/ld-linux-x86-64.so.2 -Wl,-z,nomemory-seal test.c
>> [root@bc2868439161 install]# strace -e mseal,openat,mmap ./a.out
>> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda54b59000
>> mseal(0x7fda54b59000, 8192, 0) = 0
>> openat(AT_FDCWD, "/mnt/glibc/install/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 2001, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda54b58000
>> openat(AT_FDCWD, "/mnt/glibc/install/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> mmap(NULL, 1998928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda5496f000
>> mmap(0x7fda54ace000, 483328, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15f000) = 0x7fda54ace000
>> mmap(0x7fda54b44000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d5000) = 0x7fda54b44000
>> mmap(0x7fda54b4a000, 53328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda54b4a000
>> mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda5496c000
>> mseal(0x7fda5496c000, 12288, 0) = 0
>> mseal(0x7fda5496f000, 1998928, 0) = 0
>> mseal(0x7fda54b61000, 163665, 0) = 0
>> mseal(0x7fda54b89000, 45544, 0) = 0
>> mseal(0x7fda54b95000, 13096, 0) = 0
>> +++ exited with 0 +++
>> ```
>>
>> The test binary was compiled without the GNU_PROPERTY_MEMORY_SEAL flag.
>> However, we can see that all glibc mappings have been sealed. The
>> initial mapping is sealed even before libc.so is loaded, likely because
>> ld.so also has the GNU_PROPERTY_MEMORY_SEAL flag.
>
> Yes, this is controlled by a new configure flags [1], which is enabled by
> default. With --disable-default-memory-seal you can disable sealing for
> glibc itself.
> > [1] https://patchwork.sourceware.org/project/glibc/patch/20250129172550.1119706-8-adhemerval.zanella@linaro.org/ > >> >> For operation, CRIU needs to be able to unmap all its mappings, which is >> essential for restoring process address spaces. This means we need to >> compile CRIU so that its process doesn't have any sealed mappings. >> >> The same requirement applies to gVisor and UML, which both use stub >> processes to manage guest address spaces. Basically, the main process >> forks a new process, unmaps all existing mappings in the forked process, >> and then populates it with guest mappings. > > The main problem here is memory sealing idea is hardening mechanism to prevent > exactly this kind of operation. And this does not help also if the program > uses mseal directly, like Chrome and maybe other intends to do. How do intend > to work with these scenarios? > > On previous iterations of this patch I have a tunable to disable sealing, > where GNU_PROPERTY_MEMORY_SEAL is simple ignored. I removed because this > is way a bypass the security hardening, and it also does help on fork case. > > I still think we need some kernel help here, where a process can configure > itself (with a prctl or something related) to make a fork() process not > inherit the sealing bit to proper fix it without making this hardening > a opt-out feature (which I defeats the whole idea). > > Ideally it would require a new clone flag, and most likely a new fork symbol, > to avoid concurrent issues (where multiple thread sets a global state). I am assuming here that restore can happen at any time, in an API-like manner (I am not sure I understand how CRIU/UML/gVisor work in all cases). If the idea is just to have a wrapper binary, linked against glibc, that only does the restore, maybe a simple solution like filtering the mseal syscall (so it acts as a no-op) might work better.
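The "filter mseal so it acts as a no-op" idea can be sketched with a classic BPF seccomp filter whose action is `SECCOMP_RET_ERRNO` with data 0: the kernel skips the syscall and returns 0, so glibc sees success in both the hint and the enforced configuration. This is an illustration under stated assumptions, not something the patch series provides: `install_mseal_noop_filter` is a hypothetical helper, only x86_64 and aarch64 are handled, and the fallback number 462 is used only when `SYS_mseal` is missing from the headers.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef SYS_mseal
# define SKETCH_NR_mseal SYS_mseal
#else
# define SKETCH_NR_mseal 462 /* assumption: mseal's number since Linux 6.10 */
#endif

#if defined(__x86_64__)
# define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
# define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
# error "this sketch only handles x86_64 and aarch64"
#endif

/* Hypothetical helper: make mseal a successful no-op for this process and
   its descendants.  Not a security filter: other ABIs and all other
   syscalls pass through unchanged.  Returns 0 on success. */
static int install_mseal_noop_filter(void)
{
    struct sock_filter filter[] = {
        /* Load the audit architecture; allow anything from another ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        /* Load the syscall number and compare it with mseal. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_NR_mseal, 0, 1),
        /* ERRNO with data 0: the syscall is skipped and returns 0. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short) (sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Two caveats follow from how seccomp works. Since ld.so seals the initial mappings during startup, before main runs, the filter would have to be installed by a wrapper before it execs the real tool. And because seccomp filters are inherited across fork() and kept across execve(), every descendant, including a restored process, would also see mseal as a no-op.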
* Cristian Rodríguez: > On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Adhemerval Zanella Netto: >> >> >> CRIU needs to be able to unmap everything that was initially loaded by >> >> the kernel and glibc. This will stop working if we use mseal for glibc >> >> itself. >> > >> > So in this case the easiest way it to filter of mseal (with seccomp or >> > something related) and disable sealing. I don't have a easy solution. >> >> Please test with CRIU and trace and find a way to make them work again >> if they are broken. > > that is a kernel problem afaik.. .why libc has to care about this limitation ? We want CRIU and similar tools to be able to build with glibc 2.42 and later, without distributions having to provide two separate glibc versions. Thanks, Florian
On Tue, Feb 11, 2025 at 12:19 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Cristian Rodríguez: > > > On Mon, Feb 3, 2025 at 4:40 PM Florian Weimer <fweimer@redhat.com> wrote: > >> > >> * Adhemerval Zanella Netto: > >> > >> >> CRIU needs to be able to unmap everything that was initially loaded by > >> >> the kernel and glibc. This will stop working if we use mseal for glibc > >> >> itself. > >> > > >> > So in this case the easiest way it to filter of mseal (with seccomp or > >> > something related) and disable sealing. I don't have a easy solution. > >> > >> Please test with CRIU and trace and find a way to make them work again > >> if they are broken. > > > > that is a kernel problem afaik.. .why libc has to care about this limitation ? > > We want CRIU and similar tools to be able to build with glibc 2.42 and > later, without distributions having to provide two separate glibc > versions. > Statically linking against glibc could be a workaround. Thanks -Jeff > Thanks, > Florian >