mbox series

[v3,0/8] add UFFDIO_POISON to simulate memory poisoning with UFFD

Message ID 20230706225037.1164380-1-axelrasmussen@google.com
Headers show
Series add UFFDIO_POISON to simulate memory poisoning with UFFD | expand

Message

Axel Rasmussen July 6, 2023, 10:50 p.m. UTC
This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4
for a detailed description of the feature.

The series is based on Linus master (partial 6.5 merge window), and
structured like this:

- Patches 1-3 are preparation / refactoring
- Patches 4-6 implement and advertise the new feature
- Patches 7-8 implement a unit test for the new feature

Changelog:

v2 -> v3:
 - Rebase onto current Linus master.
 - Don't overwrite existing PTE markers for non-hugetlb UFFDIO_POISON.
   Before, non-hugetlb would override them, but hugetlb would not. I don't
   think there's a use case where we *want* to override a UFFD_WP marker
   for example, so take the more conservative behavior for all kinds of
   memory.
 - [Peter] Drop hugetlb mfill atomic refactoring, since it isn't needed
   for this series (we don't touch that code directly anyway).
 - [Peter] Switch to re-using PTE_MARKER_SWAPIN_ERROR instead of defining
   new PTE_MARKER_UFFD_POISON.
 - [Peter] Extract start / len range overflow check into existing
   validate_range helper; this fixes the style issue of unnecessary braces
   in the UFFDIO_POISON implementation, because this code is just deleted.
 - [Peter] Extract file size check out into a new helper.
 - [Peter] Defer actually "enabling" the new feature until the last commit
   in the series; combine this with adding the documentation. As a
   consequence, move the selftest commits after this one.
 - [Randy] Fix typo in documentation.

v1 -> v2:
 - [Peter] Return VM_FAULT_HWPOISON not VM_FAULT_SIGBUS, to yield the
   correct behavior for KVM (guest MCE).
 - [Peter] Rename UFFDIO_SIGBUS to UFFDIO_POISON.
 - [Peter] Implement hugetlbfs support for UFFDIO_POISON.

Axel Rasmussen (8):
  mm: make PTE_MARKER_SWAPIN_ERROR more general
  mm: userfaultfd: check for start + len overflow in validate_range
  mm: userfaultfd: extract file size check out into a helper
  mm: userfaultfd: add new UFFDIO_POISON ioctl
  mm: userfaultfd: support UFFDIO_POISON for hugetlbfs
  mm: userfaultfd: document and enable new UFFDIO_POISON feature
  selftests/mm: refactor uffd_poll_thread to allow custom fault handlers
  selftests/mm: add uffd unit test for UFFDIO_POISON

 Documentation/admin-guide/mm/userfaultfd.rst |  15 +++
 fs/userfaultfd.c                             |  73 ++++++++++--
 include/linux/mm_inline.h                    |  19 +++
 include/linux/swapops.h                      |  10 +-
 include/linux/userfaultfd_k.h                |   4 +
 include/uapi/linux/userfaultfd.h             |  25 +++-
 mm/hugetlb.c                                 |  51 ++++++--
 mm/madvise.c                                 |   2 +-
 mm/memory.c                                  |  15 ++-
 mm/mprotect.c                                |   4 +-
 mm/shmem.c                                   |   4 +-
 mm/swapfile.c                                |   2 +-
 mm/userfaultfd.c                             |  83 ++++++++++---
 tools/testing/selftests/mm/uffd-common.c     |   5 +-
 tools/testing/selftests/mm/uffd-common.h     |   3 +
 tools/testing/selftests/mm/uffd-stress.c     |  12 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 117 +++++++++++++++++++
 17 files changed, 377 insertions(+), 67 deletions(-)

--
2.41.0.255.g8b1d071c50-goog

Comments

Peter Xu July 7, 2023, 1:10 p.m. UTC | #1
On Thu, Jul 06, 2023 at 03:50:29PM -0700, Axel Rasmussen wrote:
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 886f06066622..59e954586e2a 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -660,7 +660,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  				free_swap_and_cache(entry);
>  				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>  			} else if (is_hwpoison_entry(entry) ||
> -				   is_swapin_error_entry(entry)) {
> +				   is_error_swp_entry(entry)) {

is_error_swp_entry() made me think whether we should call this marker as
ERROR at all - it's too generic, is_error_swp_entry() can be easily read as
"this swap entry is not legal".  Can we come up with a less generic one?

PTE_MARKER_PAGE_POISONED / PTE_MARKER_POISONED / PTE_MARKER_PAGE_ERR / ...?

We do use poisoned only in real bad physical pages before, but I think we
can still keep using it when applying it to a pte.  I think it's fine to
call a pte poisoned if it's not legal to access, just like a poisoned page.

>  				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>  			}
>  			continue;

The code change across the whole patch look good to me otherwise, thanks.
Peter Xu July 7, 2023, 1:15 p.m. UTC | #2
On Thu, Jul 06, 2023 at 03:50:31PM -0700, Axel Rasmussen wrote:
> This code is already duplicated twice, and UFFDIO_POISON will do the
> same check a third time. So, it's worth extracting into a helper to save
> repetitive lines of code.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>
Peter Xu July 7, 2023, 1:39 p.m. UTC | #3
On Thu, Jul 06, 2023 at 03:50:33PM -0700, Axel Rasmussen wrote:
> The behavior here is the same as it is for anon/shmem. This is done
> separately because hugetlb pte marker handling is a bit different.
> 
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>

Acked-by: Peter Xu <peterx@redhat.com>
Peter Xu July 7, 2023, 1:42 p.m. UTC | #4
On Thu, Jul 06, 2023 at 03:50:35PM -0700, Axel Rasmussen wrote:
> @@ -247,9 +245,13 @@ static int userfaultfd_stress(void)
>  {
>  	void *area;
>  	unsigned long nr;
> -	struct uffd_args args[nr_cpus];
> +	struct uffd_args *args;
>  	uint64_t mem_size = nr_pages * page_size;
>  
> +	args = calloc(nr_cpus, sizeof(struct uffd_args));
> +	if (!args)
> +		err("allocating args array failed");

This is trivial, but I think I requested a "free" (or keep it allocate on
stack) in previous version but it didn't get a response on why we cannot
and it kept going..  could you help explain?
Axel Rasmussen July 7, 2023, 5:03 p.m. UTC | #5
On Fri, Jul 7, 2023 at 6:42 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Thu, Jul 06, 2023 at 03:50:35PM -0700, Axel Rasmussen wrote:
> > @@ -247,9 +245,13 @@ static int userfaultfd_stress(void)
> >  {
> >       void *area;
> >       unsigned long nr;
> > -     struct uffd_args args[nr_cpus];
> > +     struct uffd_args *args;
> >       uint64_t mem_size = nr_pages * page_size;
> >
> > +     args = calloc(nr_cpus, sizeof(struct uffd_args));
> > +     if (!args)
> > +             err("allocating args array failed");
>
> This is trivial, but I think I requested a "free" (or keep it allocate on
> stack) in previous version but it didn't get a response on why we cannot
> and it kept going..  could you help explain?

Oh, sorry! I had meant to change this after our discussion, and simply
overlooked it while reworking the patches.

I'll include this change in a v4 which also addresses e.g. the
comments on commit 1.

>
> --
> Peter Xu
>
Axel Rasmussen July 7, 2023, 8:38 p.m. UTC | #6
On Fri, Jul 7, 2023 at 10:03 AM Axel Rasmussen <axelrasmussen@google.com> wrote:
>
> On Fri, Jul 7, 2023 at 6:42 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Thu, Jul 06, 2023 at 03:50:35PM -0700, Axel Rasmussen wrote:
> > > @@ -247,9 +245,13 @@ static int userfaultfd_stress(void)
> > >  {
> > >       void *area;
> > >       unsigned long nr;
> > > -     struct uffd_args args[nr_cpus];
> > > +     struct uffd_args *args;
> > >       uint64_t mem_size = nr_pages * page_size;
> > >
> > > +     args = calloc(nr_cpus, sizeof(struct uffd_args));
> > > +     if (!args)
> > > +             err("allocating args array failed");
> >
> > This is trivial, but I think I requested a "free" (or keep it allocate on
> > stack) in previous version but it didn't get a response on why we cannot
> > and it kept going..  could you help explain?
>
> Oh, sorry! I had meant to change this after our discussion, and simply
> overlooked it while reworking the patches.
>
> I'll include this change in a v4 which also addresses e.g. the
> comments on commit 1.

Ah, so I tried switching back to the {0} initializer, and was reminded
why I didn't do that in v1. :) Ignoring the missing braces warning I
talked about before, using {0} here is actually an error
("variable-sized object may not be initialized") because this is a
variable sized array (nr_cpus isn't constant). So, that option is out.

I'm not a huge fan of adding the free() cleanup and dealing with all
of the err() calls this function has.

Originally I switched to calloc() because I'm not a big fan of VLAs
anyway. But, as a compromise in v4 I'll leave it a VLA, and switch to
memset() for initializing it.

>
> >
> > --
> > Peter Xu
> >
Peter Xu July 7, 2023, 9 p.m. UTC | #7
On Fri, Jul 07, 2023 at 01:38:16PM -0700, Axel Rasmussen wrote:
> Ah, so I tried switching back to the {0} initializer, and was reminded
> why I didn't do that in v1. :) Ignoring the missing braces warning I
> talked about before, using {0} here is actually an error
> ("variable-sized object may not be initialized") because this is a
> variable sized array (nr_cpus isn't constant). So, that option is out.
> 
> I'm not a huge fan of adding the free() cleanup and dealing with all
> of the err() calls this function has.

Oh, that's definitely not needed - as long as we know we're going to quit,
we let kernel clean everything is fine.

I just worry in the future there can be a loop of userfaultfd_stress() so
it can OOM a host even if no err() hit but by looping.  I hope I explained
what I meant.. so it's still good we make sure things freed properly when
in success paths and when we're at it.

> 
> Originally I switched to calloc() because I'm not a big fan of VLAs
> anyway. But, as a compromise in v4 I'll leave it a VLA, and switch to
> memset() for initializing it.

That'll be good enough to me.  Thanks a lot,