[0/6] Memory Mapping (VMA) protection using PKU - set 1

Message ID 20230515130553.2311248-1-jeffxu@chromium.org

Message

Jeff Xu May 15, 2023, 1:05 p.m. UTC
From: Jeff Xu <jeffxu@google.com>

This is the first set of memory mapping (VMA) protection patches using PKU.

* * * 

Background:

As discussed previously on the kernel mailing list [1], V8 CFI [2] uses
PKU to protect memory, and Stephen Röttger proposed extending PKU
protection to memory mappings [3].

We're using PKU for in-process isolation to enforce control-flow integrity
for a JIT compiler. In our threat model, an attacker exploits a
vulnerability and gains arbitrary read/write access to the whole process
space, concurrently with other running threads. This attacker can
manipulate the arguments that some threads pass to syscalls.

Under such a powerful attack, we want to create a “safe/isolated”
thread environment. We assign dedicated pkeys to this thread and use
them to protect the thread's runtime environment. The thread has
exclusive access to its runtime memory; this includes changing the
protection of its memory mappings and unmapping them after use. Other
threads cannot access that memory or modify the memory mappings (VMAs)
belonging to the thread.

* * * 

Proposed changes:

This patch series introduces a new flag, PKEY_ENFORCE_API, for the
pkey_alloc() syscall. When a pkey is created with this flag, any thread
that wants to change the memory mappings (e.g. via mprotect) covered by
that pkey must have write access to the pkey. Pkeys created without
this flag continue to work as they do now, for backward compatibility.

Only pkeys created from user space can have the new flag set; pkeys
allocated internally by the kernel will not have it. In other words,
ARCH_DEFAULT_PKEY (0) and the execute_only_pkey won't have this flag set,
and they continue to work as they do today.
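
For illustration, here is a minimal user-space sketch of the intended
usage, assuming a kernel with this series applied. PKEY_ENFORCE_API is
not in released uapi headers, so its value below is a placeholder, and
the EACCES failure is the behavior this series proposes:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef PKEY_ENFORCE_API
#define PKEY_ENFORCE_API 0x1	/* proposed by this series; placeholder value */
#endif

int main(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);

	/* Allocate a pkey whose write access also gates VMA changes. */
	int pkey = pkey_alloc(PKEY_ENFORCE_API, 0);
	if (pkey < 0) {
		perror("pkey_alloc");
		return 1;
	}

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED || pkey_mprotect(p, len, PROT_READ | PROT_WRITE, pkey)) {
		perror("mmap/pkey_mprotect");
		return 1;
	}

	/* Drop write access to the pkey in this thread: mprotect()/munmap()
	 * on the covered mapping should now be denied by the kernel. */
	pkey_set(pkey, PKEY_DISABLE_WRITE);
	if (mprotect(p, len, PROT_READ) != 0 && errno == EACCES)
		printf("mprotect denied, as intended\n");

	/* Restore write access; mapping changes are permitted again. */
	pkey_set(pkey, 0);
	return munmap(p, len);
}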

This flag is checked only at syscall entry, e.g. for mprotect/munmap in
this set of patches; it does not apply to other call paths. In other
words, if the kernel wants to change the attributes of a VMA for some
reason, it is free to do so and is not affected by this new flag.
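
To make that concrete, here is a simplified kernel-side sketch; this is
not the actual patch hunk, and the helper's signature and placement are
assumptions based on the patch titles:

/*
 * Simplified sketch of the check at syscall entry (not the actual
 * patch hunk). arch_check_pkey_enforce_api() is the helper added in
 * patch 2; its signature and placement here are assumptions, and the
 * real check has to take the mmap lock before walking VMAs.
 */
SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
		unsigned long, prot)
{
	int error;

	/*
	 * Deny the call if an affected VMA is covered by a pkey that
	 * was allocated with PKEY_ENFORCE_API and the calling thread
	 * lacks write access to that pkey. In-kernel VMA changes do
	 * not enter through this path and are unaffected.
	 */
	error = arch_check_pkey_enforce_api(current->mm, start, len);
	if (error)
		return error;

	return do_mprotect_pkey(start, len, prot, -1);
}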

This set of patches covers mprotect/munmap; I plan to cover other
syscalls in follow-up work.

* * * 

Testing:

I have tested this patch on Linux kernels 5.15, 6.1, and 6.4-rc1;
a new selftest is added: pkey_enforce_api.c

* * * 

Discussion:

We believe that this patch provides a valuable security feature. 
It allows us to create “safe/isolated” thread environments that are 
protected from attackers with arbitrary read/write access to 
the process space.

We believe that neither the interface change nor the patch introduces
a backwards-compatibility risk.

We would like to discuss this patch with the Linux kernel community
and ask for feedback and support.

* * * 

Reference:

[1] https://lore.kernel.org/all/202208221331.71C50A6F@keescook/
[2] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit?usp=sharing
[3] https://docs.google.com/document/d/1qqVoVfRiF2nRylL3yjZyCQvzQaej1HRPh3f5wj1AS9I/edit


Best Regards,
-Jeff Xu

Jeff Xu (6):
  PKEY: Introduce PKEY_ENFORCE_API flag
  PKEY: Add arch_check_pkey_enforce_api()
  PKEY: Apply PKEY_ENFORCE_API to mprotect
  PKEY: selftest pkey_enforce_api for mprotect
  PKEY: Apply PKEY_ENFORCE_API to munmap
  PKEY: selftest pkey_enforce_api for munmap

 arch/powerpc/include/asm/pkeys.h              |   19 +-
 arch/x86/include/asm/mmu.h                    |    7 +
 arch/x86/include/asm/pkeys.h                  |   92 +-
 arch/x86/mm/pkeys.c                           |    2 +-
 include/linux/mm.h                            |    2 +-
 include/linux/pkeys.h                         |   18 +-
 include/uapi/linux/mman.h                     |    5 +
 mm/mmap.c                                     |   34 +-
 mm/mprotect.c                                 |   31 +-
 mm/mremap.c                                   |    6 +-
 tools/testing/selftests/mm/Makefile           |    1 +
 tools/testing/selftests/mm/pkey_enforce_api.c | 1312 +++++++++++++++++
 12 files changed, 1507 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pkey_enforce_api.c


base-commit: ba0ad6ed89fd5dada3b7b65ef2b08e95d449d4ab

Comments

Jeff Xu May 31, 2023, 11:02 p.m. UTC | #1
Hi Dave,

Regarding siglongjmp:

On Thu, May 18, 2023 at 8:37 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/17/23 16:48, Jeff Xu wrote:
> > However, there are a few challenges I have not yet worked through.
> > First, the code needs to track when the first signaling entry occurs
> > (saving the PKRU register to the thread struct) and when it is last
> > returned (restoring the PKRU register from the thread struct).
>
> Would tracking signal "depth" work in the face of things like siglongjmp?
>
siglongjmp is interesting, thanks for bringing this up.

With siglongjmp, the thread doesn't go back to the place where the
signal was raised; indeed, this idea of tracking the first signal
entry doesn't work well with siglongjmp.
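
For reference, a minimal user-space illustration of the asymmetry,
using only standard APIs: the handler exits via siglongjmp(), so the
kernel's sigreturn path never runs, and any per-entry save/restore
depth counter would be left unbalanced.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf env;

static void handler(int sig)
{
	(void)sig;
	siglongjmp(env, 1);	/* leaves the handler without going through sigreturn */
}

int main(void)
{
	signal(SIGUSR1, handler);
	if (sigsetjmp(env, 1) == 0)
		raise(SIGUSR1);	/* enters the handler; control never returns here */
	else
		printf("resumed via siglongjmp, not sigreturn\n");
	return 0;
}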

Thanks for your insight!
-Jeff


Jeff Xu June 1, 2023, 1:39 a.m. UTC | #2
Hi Dave,
Thanks for the feedback; regarding sigaltstack:

On Thu, May 18, 2023 at 2:04 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > Agreed on signaling handling is a tough part: what do you think about
> > the approach (modifying PKRU from saved stack after XSAVE), is there a
> > blocker ?
>
> Yes, signal entry and sigreturn are not necessarily symmetric so you
> can't really have a stack.
>

To clarify, I mean this option:
- before get_sigframe(), save PKRU => tmp
- modify the thread's PKRU so it can write to the sigframe
- XSAVE
- save tmp => sigframe

I believe you proposed this in a previous discussion [1], which I
quote here:
"There's a delicate point when building the stack frame that the
kernel would need to move over to the new PKRU value to build the
frame before it writes the *OLD* value to the frame.  But, it's far
from impossible."

sigreturn will restore the thread's original PKRU from the sigframe.
In the case of asymmetry caused by siglongjmp, user space doesn't call
sigreturn; the application needs to set the desired PKRU before calling
siglongjmp.

I think this solution should work; a rough sketch follows below.
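
A rough kernel-style sketch of that ordering. Illustrative only: apart
from read_pkru()/write_pkru(), none of these helpers are real kernel
symbols, and the actual change would live in the x86 FPU/signal code.

static int setup_sigframe_pkru(void __user *frame)
{
	u32 orig_pkru = read_pkru();	/* save PKRU => tmp */
	int err;

	/* Switch to a PKRU value that permits writing the sigframe. */
	write_pkru(pkru_for_sigframe());

	/* XSAVE the extended state (with the temporary PKRU) into the
	 * user sigframe. */
	err = xsave_to_sigframe(frame);
	if (err)
		return err;

	/*
	 * Overwrite the PKRU slot in the frame with the *old* value, so
	 * sigreturn's XRSTOR restores the thread's original PKRU.
	 */
	return put_sigframe_pkru(frame, orig_pkru);
}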

[1] https://lore.kernel.org/lkml/b4f0dca5-1d15-67f7-4600-9a0a91e9d0bd@intel.com/

Best regards,
-Jeff