diff mbox series

[v1] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`

Message ID 20240513191544.94754-1-pobrn@protonmail.com
State New
Headers show
Series [v1] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING` | expand

Commit Message

Barnabás Pőcze May 13, 2024, 7:15 p.m. UTC
`MFD_NOEXEC_SEAL` should remove the executable bits and set
`F_SEAL_EXEC` to prevent further modifications to the executable
bits as per the comment in the uapi header file:

  not executable and sealed to prevent changing to executable

However, currently, it also unsets `F_SEAL_SEAL`, essentially
acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
that it should be so, and indeed up until the second version
of the of the patchset[0] that introduced `MFD_EXEC` and
`MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
was changed in the third revision of the patchset[1] without
a clear explanation.

This behaviour is suprising for application developers,
there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
has the additional effect of `MFD_ALLOW_SEALING`.

So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
This is technically an ABI break, but it seems very unlikely that an
application would depend on this behaviour (unless by accident).

[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/

Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
---

Or did I miss the explanation as to why MFD_NOEXEC_SEAL should
imply MFD_ALLOW_SEALING? If so, please direct me to it and
sorry for the noise.

---
 mm/memfd.c                                 | 9 ++++-----
 tools/testing/selftests/memfd/memfd_test.c | 2 +-
 2 files changed, 5 insertions(+), 6 deletions(-)

Comments

Jeff Xu May 16, 2024, 6:11 a.m. UTC | #1
On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote:
>
> `MFD_NOEXEC_SEAL` should remove the executable bits and set
> `F_SEAL_EXEC` to prevent further modifications to the executable
> bits as per the comment in the uapi header file:
>
>   not executable and sealed to prevent changing to executable
>
> However, currently, it also unsets `F_SEAL_SEAL`, essentially
> acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
> that it should be so, and indeed up until the second version
> of the of the patchset[0] that introduced `MFD_EXEC` and
> `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
> was changed in the third revision of the patchset[1] without
> a clear explanation.
>
> This behaviour is suprising for application developers,
> there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
> has the additional effect of `MFD_ALLOW_SEALING`.
>
Ya, I agree that there should be documentation, such as a man page. I will
work on that.

> So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
> This is technically an ABI break, but it seems very unlikely that an
> application would depend on this behaviour (unless by accident).
>
> [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
> [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
>
> Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
> Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
> ---
>
> Or did I miss the explanation as to why MFD_NOEXEC_SEAL should
> imply MFD_ALLOW_SEALING? If so, please direct me to it and
> sorry for the noise.
>
Previously I might be thinking  MFD_NOEXEC_SEAL implies
MFD_ALLOW_SEALING because MFD_NOEXEC_SEAL seals F_SEAL_EXEC, and
sealing is added only when MFD_ALLOW_SEALING is set.

I agree your patch handles this better, e.g.
mfd_create(MFD_NOEXEC_SEAL) will have F_SEAL_SEAL and F_SEAL_EXEC
mfd_create(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) will have F_SEAL_EXEC


> ---
>  mm/memfd.c                                 | 9 ++++-----
>  tools/testing/selftests/memfd/memfd_test.c | 2 +-
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memfd.c b/mm/memfd.c
> index 7d8d3ab3fa37..8b7f6afee21d 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
>
>                 inode->i_mode &= ~0111;
>                 file_seals = memfd_file_seals_ptr(file);
> -               if (file_seals) {
> -                       *file_seals &= ~F_SEAL_SEAL;
> +               if (file_seals)
>                         *file_seals |= F_SEAL_EXEC;
> -               }
> -       } else if (flags & MFD_ALLOW_SEALING) {
> -               /* MFD_EXEC and MFD_ALLOW_SEALING are set */
> +       }
> +
> +       if (flags & MFD_ALLOW_SEALING) {
>                 file_seals = memfd_file_seals_ptr(file);
>                 if (file_seals)
>                         *file_seals &= ~F_SEAL_SEAL;
> diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
> index 18f585684e20..b6a7ad68c3c1 100644
> --- a/tools/testing/selftests/memfd/memfd_test.c
> +++ b/tools/testing/selftests/memfd/memfd_test.c
> @@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
>                             mfd_def_size,
>                             MFD_CLOEXEC | MFD_NOEXEC_SEAL);
>         mfd_assert_mode(fd, 0666);
> -       mfd_assert_has_seals(fd, F_SEAL_EXEC);
> +       mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
>         mfd_fail_chmod(fd, 0777);
>         close(fd);
>  }
> --
> 2.45.0
>

Reviewed-by: Jeff Xu <jeffxu@google.com>

Thanks!
-Jeff
Andrew Morton May 22, 2024, 11:23 p.m. UTC | #2
On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote:

> On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote:
> >
> > `MFD_NOEXEC_SEAL` should remove the executable bits and set
> > `F_SEAL_EXEC` to prevent further modifications to the executable
> > bits as per the comment in the uapi header file:
> >
> >   not executable and sealed to prevent changing to executable
> >
> > However, currently, it also unsets `F_SEAL_SEAL`, essentially
> > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
> > that it should be so, and indeed up until the second version
> > of the of the patchset[0] that introduced `MFD_EXEC` and
> > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
> > was changed in the third revision of the patchset[1] without
> > a clear explanation.
> >
> > This behaviour is suprising for application developers,
> > there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
> > has the additional effect of `MFD_ALLOW_SEALING`.
> >
> Ya, I agree that there should be documentation, such as a man page. I will
> work on that.
> 
> > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
> > This is technically an ABI break, but it seems very unlikely that an
> > application would depend on this behaviour (unless by accident).
> >
> > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
> > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
> 
> ...
>
> Reviewed-by: Jeff Xu <jeffxu@google.com>

It's a change to a userspace API, yes?  Please let's have a detailed
description of why this is OK.  Why it won't affect any existing users.

Also, please let's give consideration to a -stable backport so that all
kernel versions will eventually behave in the same manner.
Barnabás Pőcze May 23, 2024, 2:25 a.m. UTC | #3
Hi


2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton <akpm@linux-foundation.org> írta:

> On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote:
> 
> > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote:
> > >
> > > `MFD_NOEXEC_SEAL` should remove the executable bits and set
> > > `F_SEAL_EXEC` to prevent further modifications to the executable
> > > bits as per the comment in the uapi header file:
> > >
> > >   not executable and sealed to prevent changing to executable
> > >
> > > However, currently, it also unsets `F_SEAL_SEAL`, essentially
> > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
> > > that it should be so, and indeed up until the second version
> > > of the of the patchset[0] that introduced `MFD_EXEC` and
> > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
> > > was changed in the third revision of the patchset[1] without
> > > a clear explanation.
> > >
> > > This behaviour is suprising for application developers,
> > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
> > > has the additional effect of `MFD_ALLOW_SEALING`.
> > >
> > Ya, I agree that there should be documentation, such as a man page. I will
> > work on that.
> >
> > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
> > > This is technically an ABI break, but it seems very unlikely that an
> > > application would depend on this behaviour (unless by accident).
> > >
> > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
> > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
> >
> > ...
> >
> > Reviewed-by: Jeff Xu <jeffxu@google.com>
> 
> It's a change to a userspace API, yes?  Please let's have a detailed
> description of why this is OK.  Why it won't affect any existing users.

Yes, it is a uAPI change. To trigger user visible change, a program has to

 - create a memfd
   - with MFD_NOEXEC_SEAL,
   - without MFD_ALLOW_SEALING;
 - try to add seals / check the seals.

This change in essence reverts the kernel's behaviour to that of Linux <6.3, where
only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly on those
kernels, it will likely work correctly after this change.

I have looked through Debian Code Search and GitHub, searching for `MFD_NOEXEC_SEAL`.
And I could find only a single breakage that this change would case: dbus-broker
has its own memfd_create() wrapper that is aware of this implicit `MFD_ALLOW_SEALING`
behaviour[0], and tries to work around it. This workaround will break. Luckily,
however, as far as I could tell this only affects the test suite of dbus-broker,
not its normal operations, so I believe it should be fine. I have prepared a PR
with a fix[1].


> 
> Also, please let's give consideration to a -stable backport so that all
> kernel versions will eventually behave in the same manner.
> 
> 

I think that is a good idea, should I resend this with the `Cc: stable@...` tag or
what should I do?


Regards,
Barnabás Pőcze


[0]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
[1]: https://github.com/bus1/dbus-broker/pull/366
Jeff Xu May 23, 2024, 2:32 a.m. UTC | #4
On Wed, May 22, 2024 at 4:23 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 15 May 2024 23:11:12 -0700 Jeff Xu <jeffxu@google.com> wrote:
>
> > On Mon, May 13, 2024 at 12:15 PM Barnabás Pőcze <pobrn@protonmail.com> wrote:
> > >
> > > `MFD_NOEXEC_SEAL` should remove the executable bits and set
> > > `F_SEAL_EXEC` to prevent further modifications to the executable
> > > bits as per the comment in the uapi header file:
> > >
> > >   not executable and sealed to prevent changing to executable
> > >
> > > However, currently, it also unsets `F_SEAL_SEAL`, essentially
> > > acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies
> > > that it should be so, and indeed up until the second version
> > > of the of the patchset[0] that introduced `MFD_EXEC` and
> > > `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however it
> > > was changed in the third revision of the patchset[1] without
> > > a clear explanation.
> > >
> > > This behaviour is suprising for application developers,
> > > there is no documentation that would reveal that `MFD_NOEXEC_SEAL`
> > > has the additional effect of `MFD_ALLOW_SEALING`.
> > >
> > Ya, I agree that there should be documentation, such as a man page. I will
> > work on that.
> >
> > > So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested.
> > > This is technically an ABI break, but it seems very unlikely that an
> > > application would depend on this behaviour (unless by accident).
> > >
> > > [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
> > > [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
> >
> > ...
> >
> > Reviewed-by: Jeff Xu <jeffxu@google.com>
>
> It's a change to a userspace API, yes?  Please let's have a detailed
> description of why this is OK.  Why it won't affect any existing users.
>
Unfortunately, this is a breaking change that might break a
application if they do below:
memfd_create("", MFD_NOEXEC_SEAL)
fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in new
semantics, due to mfd not being sealable.

However, I still think the new semantics is a better, the reason is
due to  the sysctl: memfd_noexec_scope
Currently, when the sysctl  is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd  becomes sealable.
E.g.
When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
The app calls memfd_create("",0)
application will get sealable memfd, which might be a surprise to application.

If the app doesn't want this behavior, they will need one of two below
in current implementation.
1>
set the sysctl: memfd_noexec_scope to 0.
So the kernel doesn't overwrite the mdmfd_create

2>
modify their code  to get non-sealable NOEXEC memfd.
memfd_create("", MEMFD_NOEXEC_SCOPE_NOEXEC)
fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL)

The new semantics works better with the sysctl.

Since memfd noexec is new, maybe there is no application using the
MFD_NOEXEC_SEAL to create
sealable memfd. They mostly likely use
memfd(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead.
I think it might benefit in the long term with the new semantics.

If breaking change is not recommended,  the alternative is to
introduce a new flag.
MFD_NOEXEC_SEAL_SEAL. (I can't find a better name...)

What do you think ?

> Also, please let's give consideration to a -stable backport so that all
> kernel versions will eventually behave in the same manner.
>
Yes. If the new semantics is acceptable, backport is needed as bugfix
to all kernel versions.
I can do that if someone helps me with the process.

And sorry about this bug that I created.
David Rheinsberg May 23, 2024, 8:24 a.m. UTC | #5
Hi

On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote:
> 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton 
> <akpm@linux-foundation.org> írta:
>> It's a change to a userspace API, yes?  Please let's have a detailed
>> description of why this is OK.  Why it won't affect any existing users.
>
> Yes, it is a uAPI change. To trigger user visible change, a program has to
>
>  - create a memfd
>    - with MFD_NOEXEC_SEAL,
>    - without MFD_ALLOW_SEALING;
>  - try to add seals / check the seals.
>
> This change in essence reverts the kernel's behaviour to that of Linux 
> <6.3, where
> only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly 
> on those
> kernels, it will likely work correctly after this change.
>
> I have looked through Debian Code Search and GitHub, searching for 
> `MFD_NOEXEC_SEAL`.
> And I could find only a single breakage that this change would case: 
> dbus-broker
> has its own memfd_create() wrapper that is aware of this implicit 
> `MFD_ALLOW_SEALING`
> behaviour[0], and tries to work around it. This workaround will break. 
> Luckily,
> however, as far as I could tell this only affects the test suite of 
> dbus-broker,
> not its normal operations, so I believe it should be fine. I have 
> prepared a PR
> with a fix[1].

We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite.

Previous discussion was in:

    [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC
    https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/

Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels.

Reviewed-by: David Rheinsberg <david@readahead.eu>

Thanks
David
Jeff Xu May 23, 2024, 4:20 p.m. UTC | #6
On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote:
>
> Hi
>
> On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote:
> > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton
> > <akpm@linux-foundation.org> írta:
> >> It's a change to a userspace API, yes?  Please let's have a detailed
> >> description of why this is OK.  Why it won't affect any existing users.
> >
> > Yes, it is a uAPI change. To trigger user visible change, a program has to
> >
> >  - create a memfd
> >    - with MFD_NOEXEC_SEAL,
> >    - without MFD_ALLOW_SEALING;
> >  - try to add seals / check the seals.
> >
> > This change in essence reverts the kernel's behaviour to that of Linux
> > <6.3, where
> > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly
> > on those
> > kernels, it will likely work correctly after this change.
> >
> > I have looked through Debian Code Search and GitHub, searching for
> > `MFD_NOEXEC_SEAL`.
> > And I could find only a single breakage that this change would case:
> > dbus-broker
> > has its own memfd_create() wrapper that is aware of this implicit
> > `MFD_ALLOW_SEALING`
> > behaviour[0], and tries to work around it. This workaround will break.
> > Luckily,
> > however, as far as I could tell this only affects the test suite of
> > dbus-broker,
> > not its normal operations, so I believe it should be fine. I have
> > prepared a PR
> > with a fix[1].
>
> We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite.
>
> Previous discussion was in:
>
>     [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC
>     https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
>
> Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels.
>
Also with vm.memfd_noexec=1.
I think that problem must be addressed either with this patch, or with
a new flag.

Regarding vm.memfd_noexec, on another topic.
I think in addition to  vm.memfd_noexec = 1 and 2,  there still could
be another state: 3

=0. Do nothing.
=1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or
MFD_NOEXE_SEAL (to help with the migration)
=2: This will reject all calls without MFD_NOEXEC_SEAL (the whole
system doesn't allow executable memfd)
=3:  Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or
else it will be rejected.

3 is useful because it lets applications choose what to use, and
forces applications to migrate to new semantics (this is what 2 did
before 9876cfe8).
The caveat is 3 is less restrictive than 2, so must document it clearly.

-Jeff

> Reviewed-by: David Rheinsberg <david@readahead.eu>
>
> Thanks
> David
Jeff Xu May 23, 2024, 4:55 p.m. UTC | #7
On Thu, May 23, 2024 at 9:20 AM Jeff Xu <jeffxu@google.com> wrote:
>
> On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote:
> >
> > Hi
> >
> > On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote:
> > > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton
> > > <akpm@linux-foundation.org> írta:
> > >> It's a change to a userspace API, yes?  Please let's have a detailed
> > >> description of why this is OK.  Why it won't affect any existing users.
> > >
> > > Yes, it is a uAPI change. To trigger user visible change, a program has to
> > >
> > >  - create a memfd
> > >    - with MFD_NOEXEC_SEAL,
> > >    - without MFD_ALLOW_SEALING;
> > >  - try to add seals / check the seals.
> > >
> > > This change in essence reverts the kernel's behaviour to that of Linux
> > > <6.3, where
> > > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly
> > > on those
> > > kernels, it will likely work correctly after this change.
> > >
> > > I have looked through Debian Code Search and GitHub, searching for
> > > `MFD_NOEXEC_SEAL`.
> > > And I could find only a single breakage that this change would case:
> > > dbus-broker
> > > has its own memfd_create() wrapper that is aware of this implicit
> > > `MFD_ALLOW_SEALING`
> > > behaviour[0], and tries to work around it. This workaround will break.
> > > Luckily,
> > > however, as far as I could tell this only affects the test suite of
> > > dbus-broker,
> > > not its normal operations, so I believe it should be fine. I have
> > > prepared a PR
> > > with a fix[1].
> >
> > We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite.
> >
memfd is by default not sealable, and file is by default sealable,
right ? that makes the memfd  semantics different from other objects
in linux.
I wonder what is the original reason to have memfd  this way?

Another solution is to change memfd to be by-default sealable,
although that will be an api change, but what side effect  will it be
?
If we are worried about the memfd being sealed by an attacker, the
malicious code could also overwrite the content since memfd is not
sealed.

> > Previous discussion was in:
> >
> >     [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC
> >     https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
> >
> > Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels.
> >
> Also with vm.memfd_noexec=1.
> I think that problem must be addressed either with this patch, or with
> a new flag.
>
> Regarding vm.memfd_noexec, on another topic.
> I think in addition to  vm.memfd_noexec = 1 and 2,  there still could
> be another state: 3
>
> =0. Do nothing.
> =1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or
> MFD_NOEXE_SEAL (to help with the migration)
> =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole
> system doesn't allow executable memfd)
> =3:  Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or
> else it will be rejected.
>
> 3 is useful because it lets applications choose what to use, and
> forces applications to migrate to new semantics (this is what 2 did
> before 9876cfe8).
> The caveat is 3 is less restrictive than 2, so must document it clearly.
>
> -Jeff
>
> > Reviewed-by: David Rheinsberg <david@readahead.eu>
> >
> > Thanks
> > David
Barnabás Pőcze May 23, 2024, 8:50 p.m. UTC | #8
2024. május 23., csütörtök 22:44 keltezéssel, Jeff Xu <jeffxu@google.com> írta:

> Hi Barnabás
> 
> Is that OK that I work on V2 ? It will be based on your V1 change and
> I will also add more test cases.

Sure, please go ahead. At the very end of this letter you'll find
the commit message that I would have sent in v2, maybe you can salvage
some of it.


Regards,
Barnabás Pőcze


> 
> Thanks
> -Jeff
> 
> -
> 
> On Thu, May 23, 2024 at 12:45 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Wed, 22 May 2024 19:32:35 -0700 Jeff Xu <jeffxu@google.com> wrote:
> >
> > > >
> > > > It's a change to a userspace API, yes?  Please let's have a detailed
> > > > description of why this is OK.  Why it won't affect any existing users.
> > > >
> > > Unfortunately, this is a breaking change that might break a
> > > application if they do below:
> > > memfd_create("", MFD_NOEXEC_SEAL)
> > > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); <-- this will fail in new
> > > semantics, due to mfd not being sealable.
> > >
> > > However, I still think the new semantics is a better, the reason is
> > > due to  the sysctl: memfd_noexec_scope
> > > Currently, when the sysctl  is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> > > kernel adds MFD_NOEXEC_SEAL to memfd_create, and the memfd  becomes sealable.
> > > E.g.
> > > When the sysctl is set to MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL
> > > The app calls memfd_create("",0)
> > > application will get sealable memfd, which might be a surprise to application.
> > >
> > > If the app doesn't want this behavior, they will need one of two below
> > > in current implementation.
> > > 1>
> > > set the sysctl: memfd_noexec_scope to 0.
> > > So the kernel doesn't overwrite the mdmfd_create
> > >
> > > 2>
> > > modify their code  to get non-sealable NOEXEC memfd.
> > > memfd_create("", MEMFD_NOEXEC_SCOPE_NOEXEC)
> > > fcntl(fd, F_ADD_SEALS, F_SEAL_SEAL)
> > >
> > > The new semantics works better with the sysctl.
> > >
> > > Since memfd noexec is new, maybe there is no application using the
> > > MFD_NOEXEC_SEAL to create
> > > sealable memfd. They mostly likely use
> > > memfd(MFD_NOEXEC_SEAL|MFD_ALLOW_SEALING) instead.
> > > I think it might benefit in the long term with the new semantics.
> >
> > Yes, it's new so I expect any damage will be small.  Please prepare a
> > v2 which fully explains/justifies the thinking for this
> > non-backward-compatible change and which include the cc:stable.
> >
> >
> 

---

memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`

`MFD_NOEXEC_SEAL` should remove the executable bits and
set `F_SEAL_EXEC` to prevent further modifications to the
executable bits as per the comment in the uapi header file:

  not executable and sealed to prevent changing to executable

However, currently, it also unsets `F_SEAL_SEAL`, essentially
acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies that
it should be so, and indeed up until the second version of the of
the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`,
`F_SEAL_SEAL` was not removed, however it was changed in the
third revision of the patchset[1] without a clear explanation.

This behaviour is surprising for application developers, there
is no documentation that would reveal that `MFD_NOEXEC_SEAL`
has the additional effect of `MFD_ALLOW_SEALING`.
Additionally, combined with `vm.memfd_noexec=2` it has
the effect of making all memfds initially sealable.

So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is
requested, thereby returning to the pre-Linux 6.3 behaviour of
only allowing sealing when `MFD_ALLOW_SEALING` is specified.

Now, this is technically a uAPI break. However, the
damage is expected to be minimal. To trigger user
visible change, a program has to do the following steps:

 - create memfd:
   - with `MFD_NOEXEC_SEAL`,
   - without `MFD_ALLOW_SEALING`;
 - try to add seals / check the seals.

But that seems unlikely to happen intentionally since this
change essentially reverts the kernel's behaviour to that of
Linux <6.3, so if a program worked correctly on those older
kernels, it will likely work correctly after this change.

I have used Debian Code Search and GitHub to try to find potential
breakages, and I could only find a single one. dbus-broker's
memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING`
behaviour, and tries to work around it[2]. This workaround will
break. Luckily, this only affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[3].

There was also a previous attempt to address
this peculiarity by introducing a new flag[4].

[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
[2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a8784cb/src/util/misc.c#L114
[3]: https://github.com/bus1/dbus-broker/pull/366
[4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/

Cc: stable@vger.kernel.org
Fixes: 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Barnabás Pőcze <pobrn@protonmail.com>
Reviewed-by: Jeff Xu <jeffxu@google.com>
Reviewed-by: David Rheinsberg <david@readahead.eu>

---
Aleksa Sarai May 24, 2024, 4:12 p.m. UTC | #9
On 2024-05-23, Jeff Xu <jeffxu@google.com> wrote:
> On Thu, May 23, 2024 at 1:24 AM David Rheinsberg <david@readahead.eu> wrote:
> >
> > Hi
> >
> > On Thu, May 23, 2024, at 4:25 AM, Barnabás Pőcze wrote:
> > > 2024. május 23., csütörtök 1:23 keltezéssel, Andrew Morton
> > > <akpm@linux-foundation.org> írta:
> > >> It's a change to a userspace API, yes?  Please let's have a detailed
> > >> description of why this is OK.  Why it won't affect any existing users.
> > >
> > > Yes, it is a uAPI change. To trigger user visible change, a program has to
> > >
> > >  - create a memfd
> > >    - with MFD_NOEXEC_SEAL,
> > >    - without MFD_ALLOW_SEALING;
> > >  - try to add seals / check the seals.
> > >
> > > This change in essence reverts the kernel's behaviour to that of Linux
> > > <6.3, where
> > > only `MFD_ALLOW_SEALING` enabled sealing. If a program works correctly
> > > on those
> > > kernels, it will likely work correctly after this change.
> > >
> > > I have looked through Debian Code Search and GitHub, searching for
> > > `MFD_NOEXEC_SEAL`.
> > > And I could find only a single breakage that this change would case:
> > > dbus-broker
> > > has its own memfd_create() wrapper that is aware of this implicit
> > > `MFD_ALLOW_SEALING`
> > > behaviour[0], and tries to work around it. This workaround will break.
> > > Luckily,
> > > however, as far as I could tell this only affects the test suite of
> > > dbus-broker,
> > > not its normal operations, so I believe it should be fine. I have
> > > prepared a PR
> > > with a fix[1].
> >
> > We asked for exactly this fix before, so I very much support this. Our test-suite in `dbus-broker` merely verifies what the current kernel behavior is (just like the kernel selftests). I am certainly ok if the kernel breaks it. I will gladly adapt the test-suite.
> >
> > Previous discussion was in:
> >
> >     [PATCH] memfd: support MFD_NOEXEC alongside MFD_EXEC
> >     https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
> >
> > Note that this fix is particularly important in combination with `vm.memfd_noexec=2`, since this breaks existing user-space by enabling sealing on all memfds unconditionally. I also encourage backporting to stable kernels.
> >
> Also with vm.memfd_noexec=1.
> I think that problem must be addressed either with this patch, or with
> a new flag.
> 
> Regarding vm.memfd_noexec, on another topic.
> I think in addition to  vm.memfd_noexec = 1 and 2,  there still could
> be another state: 3
> 
> =0. Do nothing.
> =1. This will add MFD_NOEXEC_SEAL if application didn't set EXEC or
> MFD_NOEXE_SEAL (to help with the migration)
> =2: This will reject all calls without MFD_NOEXEC_SEAL (the whole
> system doesn't allow executable memfd)
> =3:  Application must set MFD_EXEC or MFD_NOEXEC_SEAL explicitly, or
> else it will be rejected.
> 
> 3 is useful because it lets applications choose what to use, and
> forces applications to migrate to new semantics (this is what 2 did
> before 9876cfe8).
> The caveat is 3 is less restrictive than 2, so must document it clearly.

As discussed at the time, "you must use this flag" is not a useful
setting for a general purpose operating system because it explicitly
disables backwards compatibility (breaking any application that was
written in the past 10 years!) for no reason other than "new is better".

As I suggested when we fixed the semantics of vm.memfd_noexec, if you
really want to block a particular flag from not being set, seccomp lets
you do this incredibly easily without acting as a footgun for admins.
Yes, vm.memfd_noexec can break programs that use executable memfds, but
that is the point of the sysctl -- making vm.memfd_noexec break programs
that don't use executable memfds (they are only guilty of being written
before mid-2023) is not useful.

In addition, making 3 less restrictive than 2 would make the original
restriction mechanism useless. A malicious process could raise the
setting to 3 and disable the "protection" (as discussed before, I really
don't understand the threat model here, but making it possible to
disable easily is pretty clearly). You could change the policy, but now
you're adding more complexity for a feature that IMO doesn't really make
sense in the first place.

> -Jeff
> 
> > Reviewed-by: David Rheinsberg <david@readahead.eu>
> >
> > Thanks
> > David
diff mbox series

Patch

diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@  SYSCALL_DEFINE2(memfd_create,
 
 		inode->i_mode &= ~0111;
 		file_seals = memfd_file_seals_ptr(file);
-		if (file_seals) {
-			*file_seals &= ~F_SEAL_SEAL;
+		if (file_seals)
 			*file_seals |= F_SEAL_EXEC;
-		}
-	} else if (flags & MFD_ALLOW_SEALING) {
-		/* MFD_EXEC and MFD_ALLOW_SEALING are set */
+	}
+
+	if (flags & MFD_ALLOW_SEALING) {
 		file_seals = memfd_file_seals_ptr(file);
 		if (file_seals)
 			*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 18f585684e20..b6a7ad68c3c1 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@  static void test_noexec_seal(void)
 			    mfd_def_size,
 			    MFD_CLOEXEC | MFD_NOEXEC_SEAL);
 	mfd_assert_mode(fd, 0666);
-	mfd_assert_has_seals(fd, F_SEAL_EXEC);
+	mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
 	mfd_fail_chmod(fd, 0777);
 	close(fd);
 }