Message ID | 20240513191544.94754-1-pobrn@protonmail.com |
---|---|
State | New |
Headers | show |
Series | [v1] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING` | expand |
On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote: > > `MFD_NOEXEC_SEAL` should remove the executable bits and set > `F_SEAL_EXEC` to prevent further modifications to the executable > bits as per the comment in the uapi header file: > > not executable and sealed to prevent changing to executable > > However, currently, it also unsets `F_SEAL_SEAL`, essentially > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies > that it should be so, and indeed up until the second version > of the of the patchset[0] that introduced `MFD_EXEC` and > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it > was changed in the third revision of the patchset[1] without > a clear explanation. > > This behaviour is suprising for application developers, > there is no documentation that would reveal that `MFD_NOEXEC_SEAL` > has the additional effect of `MFD_ALLOW_SEALING`. > Ya, I agree that there should be documentation, such as a man page. I will work on that. > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested. > This is technically an ABI break, but it seems very unlikely that an > application would depend on this behaviour (unless by accident). > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ > > Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") > Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> > --- > > Or did I miss the explanation as to why MFD_NOEXEC_SEAL should > imply MFD_ALLOW_SEALING? If so, please direct me to it and > sorry for the noise. > Previously I might be thinking MFD_NOEXEC_SEAL implies MFD_ALLOW_SEALING because MFD_NOEXEC_SEAL seals F_SEAL_EXEC, and sealing is added only when MFD_ALLOW_SEALING is set. I agree your patch handles this better, e.g. mfd_create(MFD_NOEXEC_SEAL) will have F_SEAL_SEAL and F_SEAL_EXEC mfd_create(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) will have F_SEAL_EXEC > --- > mm/memfd.c | 9 ++++----- > tools/testing/selftests/memfd/memfd_test.c | 2 +- > 2 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/mm/memfd.c b/mm/memfd.c > index 7d8d3ab3fa37..8b7f6afee21d 100644 > --- a/mm/memfd.c > +++ b/mm/memfd.c > @@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create, > > inode->i_mode &= ~0111; > file_seals = memfd_file_seals_ptr(file); > - if (file_seals) { > - *file_seals &= ~F_SEAL_SEAL; > + if (file_seals) > *file_seals |= F_SEAL_EXEC; > - } > - } else if (flags & MFD_ALLOW_SEALING) { > - /* MFD_EXEC and MFD_ALLOW_SEALING are set */ > + } > + > + if (flags & MFD_ALLOW_SEALING) { > file_seals = memfd_file_seals_ptr(file); > if (file_seals) > *file_seals &= ~F_SEAL_SEAL; > diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c > index 18f585684e20..b6a7ad68c3c1 100644 > --- a/tools/testing/selftests/memfd/memfd_test.c > +++ b/tools/testing/selftests/memfd/memfd_test.c > @@ -1151,7 +1151,7 @@ static void test_noexec_seal(void) > mfd_def_size, > MFD_CLOEXEC | MFD_NOEXEC_SEAL); > mfd_assert_mode(fd, 0666); > - mfd_assert_has_seals(fd, F_SEAL_EXEC); > + mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC); > mfd_fail_chmod(fd, 0777); > close(fd); > } > -- > 2.45.0 > Reviewed-by: Jeff Xu <jeffxu@google.com> Thanks! -Jeff
On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote: > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote: > > > > `MFD_NOEXEC_SEAL` should remove the executable bits and set > > `F_SEAL_EXEC` to prevent further modifications to the executable > > bits as per the comment in the uapi header file: > > > > not executable and sealed to prevent changing to executable > > > > However, currently, it also unsets `F_SEAL_SEAL`, essentially > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies > > that it should be so, and indeed up until the second version > > of the of the patchset[0] that introduced `MFD_EXEC` and > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it > > was changed in the third revision of the patchset[1] without > > a clear explanation. > > > > This behaviour is suprising for application developers, > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL` > > has the additional effect of `MFD_ALLOW_SEALING`. > > > Ya, I agree that there should be documentation, such as a man page. I will > work on that. > > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested. > > This is technically an ABI break, but it seems very unlikely that an > > application would depend on this behaviour (unless by accident). > > > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ > > ... > > Reviewed-by: Jeff Xu <jeffxu@google.com> It's a change to a userspace API, yes? Please let's have a detailed description of why this is OK. Why it won't affect any existing users. Also, please let's give consideration to a -stable backport so that all kernel versions will eventually behave in the same manner.
Hi 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton <akpm@linux-foundation.org> írta: > On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote: > > > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote: > > > > > > `MFD_NOEXEC_SEAL` should remove the executable bits and set > > > `F_SEAL_EXEC` to prevent further modifications to the executable > > > bits as per the comment in the uapi header file: > > > > > > not executable and sealed to prevent changing to executable > > > > > > However, currently, it also unsets `F_SEAL_SEAL`, essentially > > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies > > > that it should be so, and indeed up until the second version > > > of the of the patchset[0] that introduced `MFD_EXEC` and > > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it > > > was changed in the third revision of the patchset[1] without > > > a clear explanation. > > > > > > This behaviour is suprising for application developers, > > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL` > > > has the additional effect of `MFD_ALLOW_SEALING`. > > > > > Ya, I agree that there should be documentation, such as a man page. I will > > work on that. > > > > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested. > > > This is technically an ABI break, but it seems very unlikely that an > > > application would depend on this behaviour (unless by accident). > > > > > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ > > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ > > > > ... > > > > Reviewed-by: Jeff Xu <jeffxu@google.com> > > It's a change to a userspace API, yes? Please let's have a detailed > description of why this is OK. Why it won't affect any existing users. Yes, it is a uAPI change. To trigger user visible change, a program has to - create a memfd - with MFD_NOEXEC_SEAL, - without MFD_ALLOW_SEALING; - try to add seals / check the seals. This change in essence reverts the kernel's behaviour to that of Linux <6.3, where only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly on those kernels, it will likely work correctly after this change. I have looked through Debian Code Search and GitHub, searching for `MFD_NOEXEC_SEAL`. And I could find only a single breakage that this change would case: dbus-broker has its own memfd_create() wrapper that is aware of this implicit `MFD_ALLOW_SEALING` behaviour[0], and tries to work around it. This workaround will break. Luckily, however, as far as I could tell this only affects the test suite of dbus-broker, not its normal operations, so I believe it should be fine. I have prepared a PR with a fix[1]. > > Also, please let's give consideration to a -stable backport so that all > kernel versions will eventually behave in the same manner. > > I think that is a good idea, should I resend this with the `Cc: stable@...` tag or what should I do? Regards, Barnabás Pőcze [0]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114 [1]: https://github.com/bus1/dbus-broker/pull/366
On Wed, May 22, 2024 at 4:23 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote: > > > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote: > > > > > > `MFD_NOEXEC_SEAL` should remove the executable bits and set > > > `F_SEAL_EXEC` to prevent further modifications to the executable > > > bits as per the comment in the uapi header file: > > > > > > not executable and sealed to prevent changing to executable > > > > > > However, currently, it also unsets `F_SEAL_SEAL`, essentially > > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies > > > that it should be so, and indeed up until the second version > > > of the of the patchset[0] that introduced `MFD_EXEC` and > > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it > > > was changed in the third revision of the patchset[1] without > > > a clear explanation. > > > > > > This behaviour is suprising for application developers, > > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL` > > > has the additional effect of `MFD_ALLOW_SEALING`. > > > > > Ya, I agree that there should be documentation, such as a man page. I will > > work on that. > > > > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested. > > > This is technically an ABI break, but it seems very unlikely that an > > > application would depend on this behaviour (unless by accident). > > > > > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ > > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ > > > > ... > > > > Reviewed-by: Jeff Xu <jeffxu@google.com> > > It's a change to a userspace API, yes? Please let's have a detailed > description of why this is OK. Why it won't affect any existing users. > Unfortunately, this is a breaking change that might break a application if they do below: memfd_create("", MFD_NOEXEC_SEAL) fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in new semantics, due to mfd not being sealable. However, I still think the new semantics is a better, the reason is due to the sysctl: memfd_noexec_scope Currently, when the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd becomes sealable. E.g. When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL The app calls memfd_create("",0) application will get sealable memfd, which might be a surprise to application. If the app doesn't want this behavior, they will need one of two below in current implementation. 1> set the sysctl: memfd_noexec_scope to 0. So the kernel doesn't overwrite the mdmfd_create 2> modify their code to get non-sealable NOEXEC memfd. memfd_create("", MEMFD_NOEXEC_SCOPE_NOEXEC) fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) The new semantics works better with the sysctl. Since memfd noexec is new, maybe there is no application using the MFD_NOEXEC_SEAL to create sealable memfd. They mostly likely use memfd(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead. I think it might benefit in the long term with the new semantics. If breaking change is not recommended, the alternative is to introduce a new flag. MFD_NOEXEC_SEAL_SEAL. (I can't find a better name...) What do you think ? > Also, please let's give consideration to a -stable backport so that all > kernel versions will eventually behave in the same manner. > Yes. If the new semantics is acceptable, backport is needed as bugfix to all kernel versions. I can do that if someone helps me with the process. And sorry about this bug that I created.
Hi On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote: > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton > <akpm@linux-foundation.org> írta: >> It's a change to a userspace API, yes? Please let's have a detailed >> description of why this is OK. Why it won't affect any existing users. > > Yes, it is a uAPI change. To trigger user visible change, a program has to > > - create a memfd > - with MFD_NOEXEC_SEAL, > - without MFD_ALLOW_SEALING; > - try to add seals / check the seals. > > This change in essence reverts the kernel's behaviour to that of Linux > <6.3, where > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly > on those > kernels, it will likely work correctly after this change. > > I have looked through Debian Code Search and GitHub, searching for > `MFD_NOEXEC_SEAL`. > And I could find only a single breakage that this change would case: > dbus-broker > has its own memfd_create() wrapper that is aware of this implicit > `MFD_ALLOW_SEALING` > behaviour[0], and tries to work around it. This workaround will break. > Luckily, > however, as far as I could tell this only affects the test suite of > dbus-broker, > not its normal operations, so I believe it should be fine. I have > prepared a PR > with a fix[1]. We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite. Previous discussion was in: [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels. Reviewed-by: David Rheinsberg <david@readahead.eu> Thanks David
On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote: > > Hi > > On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote: > > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton > > <akpm@linux-foundation.org> írta: > >> It's a change to a userspace API, yes? Please let's have a detailed > >> description of why this is OK. Why it won't affect any existing users. > > > > Yes, it is a uAPI change. To trigger user visible change, a program has to > > > > - create a memfd > > - with MFD_NOEXEC_SEAL, > > - without MFD_ALLOW_SEALING; > > - try to add seals / check the seals. > > > > This change in essence reverts the kernel's behaviour to that of Linux > > <6.3, where > > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly > > on those > > kernels, it will likely work correctly after this change. > > > > I have looked through Debian Code Search and GitHub, searching for > > `MFD_NOEXEC_SEAL`. > > And I could find only a single breakage that this change would case: > > dbus-broker > > has its own memfd_create() wrapper that is aware of this implicit > > `MFD_ALLOW_SEALING` > > behaviour[0], and tries to work around it. This workaround will break. > > Luckily, > > however, as far as I could tell this only affects the test suite of > > dbus-broker, > > not its normal operations, so I believe it should be fine. I have > > prepared a PR > > with a fix[1]. > > We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite. > > Previous discussion was in: > > [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC > https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ > > Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels. > Also with vm.memfd_noexec=1. I think that problem must be addressed either with this patch, or with a new flag. Regarding vm.memfd_noexec, on another topic. I think in addition to vm.memfd_noexec = 1 and 2, there still could be another state: 3 =0. Do nothing. =1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or MFD_NOEXE_SEAL (to help with the migration) =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole system doesn't allow executable memfd) =3: Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or else it will be rejected. 3 is useful because it lets applications choose what to use, and forces applications to migrate to new semantics (this is what 2 did before 9876cfe8). The caveat is 3 is less restrictive than 2, so must document it clearly. -Jeff > Reviewed-by: David Rheinsberg <david@readahead.eu> > > Thanks > David
On Thu, May 23, 2024 at 9:20 AM Jeff Xu <jeffxu@google.com> wrote: > > On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote: > > > > Hi > > > > On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote: > > > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton > > > <akpm@linux-foundation.org> írta: > > >> It's a change to a userspace API, yes? Please let's have a detailed > > >> description of why this is OK. Why it won't affect any existing users. > > > > > > Yes, it is a uAPI change. To trigger user visible change, a program has to > > > > > > - create a memfd > > > - with MFD_NOEXEC_SEAL, > > > - without MFD_ALLOW_SEALING; > > > - try to add seals / check the seals. > > > > > > This change in essence reverts the kernel's behaviour to that of Linux > > > <6.3, where > > > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly > > > on those > > > kernels, it will likely work correctly after this change. > > > > > > I have looked through Debian Code Search and GitHub, searching for > > > `MFD_NOEXEC_SEAL`. > > > And I could find only a single breakage that this change would case: > > > dbus-broker > > > has its own memfd_create() wrapper that is aware of this implicit > > > `MFD_ALLOW_SEALING` > > > behaviour[0], and tries to work around it. This workaround will break. > > > Luckily, > > > however, as far as I could tell this only affects the test suite of > > > dbus-broker, > > > not its normal operations, so I believe it should be fine. I have > > > prepared a PR > > > with a fix[1]. > > > > We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite. > > memfd is by default not sealable, and file is by default sealable, right ? that makes the memfd semantics different from other objects in linux. I wonder what is the original reason to have memfd this way? Another solution is to change memfd to be by-default sealable, although that will be an api change, but what side effect will it be ? If we are worried about the memfd being sealed by an attacker, the malicious code could also overwrite the content since memfd is not sealed. > > Previous discussion was in: > > > > [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC > > https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ > > > > Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels. > > > Also with vm.memfd_noexec=1. > I think that problem must be addressed either with this patch, or with > a new flag. > > Regarding vm.memfd_noexec, on another topic. > I think in addition to vm.memfd_noexec = 1 and 2, there still could > be another state: 3 > > =0. Do nothing. > =1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or > MFD_NOEXE_SEAL (to help with the migration) > =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole > system doesn't allow executable memfd) > =3: Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or > else it will be rejected. > > 3 is useful because it lets applications choose what to use, and > forces applications to migrate to new semantics (this is what 2 did > before 9876cfe8). > The caveat is 3 is less restrictive than 2, so must document it clearly. > > -Jeff > > > Reviewed-by: David Rheinsberg <david@readahead.eu> > > > > Thanks > > David
2024. május 23., csütörtök 22:44 keltezéssel, Jeff Xu <jeffxu@google.com> írta: > Hi Barnabás > > Is that OK that I work on V2 ? It will be based on your V1 change and > I will also add more test cases. Sure, please go ahead. At the very end of this letter you'll find the commit message that I would have sent in v2, maybe you can salvage some of it. Regards, Barnabás Pőcze > > Thanks > -Jeff > > - > > On Thu, May 23, 2024 at 12:45 PM Andrew Morton > <akpm@linux-foundation.org> wrote: > > > > On Wed, 22 May 2024 19:32:35 -0700 Jeff Xu <jeffxu@google.com> wrote: > > > > > > > > > > It's a change to a userspace API, yes? Please let's have a detailed > > > > description of why this is OK. Why it won't affect any existing users. > > > > > > > Unfortunately, this is a breaking change that might break a > > > application if they do below: > > > memfd_create("", MFD_NOEXEC_SEAL) > > > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in new > > > semantics, due to mfd not being sealable. > > > > > > However, I still think the new semantics is a better, the reason is > > > due to the sysctl: memfd_noexec_scope > > > Currently, when the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL > > > kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd becomes sealable. > > > E.g. > > > When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL > > > The app calls memfd_create("",0) > > > application will get sealable memfd, which might be a surprise to application. > > > > > > If the app doesn't want this behavior, they will need one of two below > > > in current implementation. > > > 1> > > > set the sysctl: memfd_noexec_scope to 0. > > > So the kernel doesn't overwrite the mdmfd_create > > > > > > 2> > > > modify their code to get non-sealable NOEXEC memfd. > > > memfd_create("", MEMFD_NOEXEC_SCOPE_NOEXEC) > > > fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL) > > > > > > The new semantics works better with the sysctl. > > > > > > Since memfd noexec is new, maybe there is no application using the > > > MFD_NOEXEC_SEAL to create > > > sealable memfd. They mostly likely use > > > memfd(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead. > > > I think it might benefit in the long term with the new semantics. > > > > Yes, it's new so I expect any damage will be small. Please prepare a > > v2 which fully explains/justifies the thinking for this > > non-backward-compatible change and which include the cc:stable. > > > > > --- memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING` `MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC` to prevent further modifications to the executable bits as per the comment in the uapi header file: not executable and sealed to prevent changing to executable However, currently, it also unsets `F_SEAL_SEAL`, essentially acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies that it should be so, and indeed up until the second version of the of the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it was changed in the third revision of the patchset[1] without a clear explanation. This behaviour is surprising for application developers, there is no documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional effect of `MFD_ALLOW_SEALING`. Additionally, combined with `vm.memfd_noexec=2` it has the effect of making all memfds initially sealable. So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested, thereby returning to the pre-Linux 6.3 behaviour of only allowing sealing when `MFD_ALLOW_SEALING` is specified. Now, this is technically a uAPI break. However, the damage is expected to be minimal. To trigger user visible change, a program has to do the following steps: - create memfd: - with `MFD_NOEXEC_SEAL`, - without `MFD_ALLOW_SEALING`; - try to add seals / check the seals. But that seems unlikely to happen intentionally since this change essentially reverts the kernel's behaviour to that of Linux <6.3, so if a program worked correctly on those older kernels, it will likely work correctly after this change. I have used Debian Code Search and GitHub to try to find potential breakages, and I could only find a single one. dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behaviour, and tries to work around it[2]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[3]. There was also a previous attempt to address this peculiarity by introducing a new flag[4]. [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ [2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114 [3]: https://github.com/bus1/dbus-broker/pull/366 [4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ Cc: stable@vger.kernel.org Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> Reviewed-by: Jeff Xu <jeffxu@google.com> Reviewed-by: David Rheinsberg <david@readahead.eu> ---
On 2024-05-23, Jeff Xu <jeffxu@google.com> wrote: > On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote: > > > > Hi > > > > On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote: > > > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton > > > <akpm@linux-foundation.org> írta: > > >> It's a change to a userspace API, yes? Please let's have a detailed > > >> description of why this is OK. Why it won't affect any existing users. > > > > > > Yes, it is a uAPI change. To trigger user visible change, a program has to > > > > > > - create a memfd > > > - with MFD_NOEXEC_SEAL, > > > - without MFD_ALLOW_SEALING; > > > - try to add seals / check the seals. > > > > > > This change in essence reverts the kernel's behaviour to that of Linux > > > <6.3, where > > > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly > > > on those > > > kernels, it will likely work correctly after this change. > > > > > > I have looked through Debian Code Search and GitHub, searching for > > > `MFD_NOEXEC_SEAL`. > > > And I could find only a single breakage that this change would case: > > > dbus-broker > > > has its own memfd_create() wrapper that is aware of this implicit > > > `MFD_ALLOW_SEALING` > > > behaviour[0], and tries to work around it. This workaround will break. > > > Luckily, > > > however, as far as I could tell this only affects the test suite of > > > dbus-broker, > > > not its normal operations, so I believe it should be fine. I have > > > prepared a PR > > > with a fix[1]. > > > > We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite. > > > > Previous discussion was in: > > > > [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC > > https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ > > > > Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels. > > > Also with vm.memfd_noexec=1. > I think that problem must be addressed either with this patch, or with > a new flag. > > Regarding vm.memfd_noexec, on another topic. > I think in addition to vm.memfd_noexec = 1 and 2, there still could > be another state: 3 > > =0. Do nothing. > =1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or > MFD_NOEXE_SEAL (to help with the migration) > =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole > system doesn't allow executable memfd) > =3: Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or > else it will be rejected. > > 3 is useful because it lets applications choose what to use, and > forces applications to migrate to new semantics (this is what 2 did > before 9876cfe8). > The caveat is 3 is less restrictive than 2, so must document it clearly. As discussed at the time, "you must use this flag" is not a useful setting for a general purpose operating system because it explicitly disables backwards compatibility (breaking any application that was written in the past 10 years!) for no reason other than "new is better". As I suggested when we fixed the semantics of vm.memfd_noexec, if you really want to block a particular flag from not being set, seccomp lets you do this incredibly easily without acting as a footgun for admins. Yes, vm.memfd_noexec can break programs that use executable memfds, but that is the point of the sysctl -- making vm.memfd_noexec break programs that don't use executable memfds (they are only guilty of being written before mid-2023) is not useful. In addition, making 3 less restrictive than 2 would make the original restriction mechanism useless. A malicious process could raise the setting to 3 and disable the "protection" (as discussed before, I really don't understand the threat model here, but making it possible to disable easily is pretty clearly). You could change the policy, but now you're adding more complexity for a feature that IMO doesn't really make sense in the first place. > -Jeff > > > Reviewed-by: David Rheinsberg <david@readahead.eu> > > > > Thanks > > David
diff --git a/mm/memfd.c b/mm/memfd.c index 7d8d3ab3fa37..8b7f6afee21d 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create, inode->i_mode &= ~0111; file_seals = memfd_file_seals_ptr(file); - if (file_seals) { - *file_seals &= ~F_SEAL_SEAL; + if (file_seals) *file_seals |= F_SEAL_EXEC; - } - } else if (flags & MFD_ALLOW_SEALING) { - /* MFD_EXEC and MFD_ALLOW_SEALING are set */ + } + + if (flags & MFD_ALLOW_SEALING) { file_seals = memfd_file_seals_ptr(file); if (file_seals) *file_seals &= ~F_SEAL_SEAL; diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 18f585684e20..b6a7ad68c3c1 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -1151,7 +1151,7 @@ static void test_noexec_seal(void) mfd_def_size, MFD_CLOEXEC | MFD_NOEXEC_SEAL); mfd_assert_mode(fd, 0666); - mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); }
`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC` to prevent further modifications to the executable bits as per the comment in the uapi header file: not executable and sealed to prevent changing to executable However, currently, it also unsets `F_SEAL_SEAL`, essentially acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies that it should be so, and indeed up until the second version of the of the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it was changed in the third revision of the patchset[1] without a clear explanation. This behaviour is suprising for application developers, there is no documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional effect of `MFD_ALLOW_SEALING`. So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested. This is technically an ABI break, but it seems very unlikely that an application would depend on this behaviour (unless by accident). [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com> --- Or did I miss the explanation as to why MFD_NOEXEC_SEAL should imply MFD_ALLOW_SEALING? If so, please direct me to it and sorry for the noise. --- mm/memfd.c | 9 ++++----- tools/testing/selftests/memfd/memfd_test.c | 2 +- 2 files changed, 5 insertions(+), 6 deletions(-)