mbox series

[v1,0/5] arm64/mm: uffd write-protect and soft-dirty tracking

Message ID 20240419074344.2643212-1-ryan.roberts@arm.com
Headers show
Series arm64/mm: uffd write-protect and soft-dirty tracking | expand

Message

Ryan Roberts April 19, 2024, 7:43 a.m. UTC
Hi All,

This series adds uffd write-protect and soft-dirty tracking support for arm64. I
consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.

Previous attempts to add these features have failed because of a perceived lack
of available PTE SW bits. However it actually turns out that there are 2
available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
but it does not currently indicate "reserved for SW use" like it does for the
other SW bits. I've confirmed with the spec owner that this is an oversight; the
bit is intended to be reserved for SW use and the spec will clarify this in a
future update.

So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
for, and run on arm64.

That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
for future use. soft-dirty is only used for CRIU to my knowledge, and it is
thought that their use case could be solved with the more generic uffd-wp. So
unless somebody makes a clear case for the inclusion of soft-dirty support, we
are probably better off dropping patches 3 and 4 and keeping bit 63 for future
use. Although note that the most recent attempt to add soft-dirty for arm64 was
last month [1] so I'd like to give Shivansh Vij the opportunity to make the
case.

---8<---
As an appendix, I've also experimented with adding an "extended SW bits" region
linked by the `struct ptdesc` (which you can always find from the `pte_t *`). If
demonstrated to work, this would act as an insurance policy in case we ever need
more SW bits in future, giving us confidence to merge soft-dirty now.
Unfortunately this approach suffers from 2 problems; 1) its slow; my fork()
microbenchmark takes 40% longer in the worst case. 2) it is not possible to read
the HW pte and the extended SW bits atomically so it is impossible to implement
ptep_get_lockess() in its current form. So I've abandoned this experiment. (I
can provide more details if there is interest).
---8<---

[1] https://lore.kernel.org/linux-arm-kernel/MW4PR12MB687563EFB56373E8D55DDEABB92B2@MW4PR12MB6875.namprd12.prod.outlook.com/

Thanks,
Ryan


Ryan Roberts (5):
  arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
  arm64/mm: Add uffd write-protect support
  arm64/mm: Add soft-dirty page tracking support
  selftests/mm: Enable soft-dirty tests on arm64
  selftests/mm: soft-dirty should fail if a testcase fails

 arch/arm64/Kconfig                         |   2 +
 arch/arm64/include/asm/pgtable-prot.h      |  20 +++-
 arch/arm64/include/asm/pgtable.h           | 118 +++++++++++++++++++--
 arch/arm64/mm/contpte.c                    |   6 +-
 arch/arm64/mm/fault.c                      |   3 +-
 arch/arm64/mm/hugetlbpage.c                |   6 +-
 tools/testing/selftests/mm/Makefile        |   5 +-
 tools/testing/selftests/mm/madv_populate.c |  26 +----
 tools/testing/selftests/mm/run_vmtests.sh  |   5 +-
 tools/testing/selftests/mm/soft-dirty.c    |   2 +-
 10 files changed, 141 insertions(+), 52 deletions(-)

--
2.25.1

Comments

Shivansh Vij April 19, 2024, 8:33 a.m. UTC | #1
(Sorry about the previous HTML email, accidentally used the wrong email client) 

Hey All,

>On 19/04/2024 08:43, Ryan Roberts wrote:
>> Hi All,
>>
>> This series adds uffd write-protect and soft-dirty tracking support for arm64. I
>> consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
>>
>> Previous attempts to add these features have failed because of a perceived lack
>> of available PTE SW bits. However it actually turns out that there are 2
>> available but they are hidden. PTE_PROT_NONE was previously occupying a SW bit,
>> but it only applies when PTE_VALID is clear, so this is moved to overlay PTE_UXN
>> in patch 1, freeing up the SW bit. Bit 63 is marked as "IGNORED" in the Arm ARM,
>> but it does not currently indicate "reserved for SW use" like it does for the
>> other SW bits. I've confirmed with the spec owner that this is an oversight; the
>> bit is intended to be reserved for SW use and the spec will clarify this in a
>> future update.
>>
>> So we have our two bits; patch 2 enables uffd-wp, patch 3 enables soft-dirty and
>> patches 4 and 5 sort out the selftests so that the soft-dirty tests are compiled
>> for, and run on arm64.
>>
>> That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
>> for future use. soft-dirty is only used for CRIU to my knowledge, and it is
>> thought that their use case could be solved with the more generic uffd-wp. So
>> unless somebody makes a clear case for the inclusion of soft-dirty support, we
>> are probably better off dropping patches 3 and 4 and keeping bit 63 for future
>> use. Although note that the most recent attempt to add soft-dirty for arm64 was
>> last month [1] so I'd like to give Shivansh Vij the opportunity to make the
>> case.
>
> Ugh, forgot to mention that this applies on top of v6.9-rc3, and all the uffd-wp
> and soft-dirty tests in the mm selftests suite run and pass. And no regressions
> are observed in any of the other selftests.

Appreciate the opportunity to provide input here. 

I personally don't know of any applications other than CRIU that make heavy use of soft-dirty, and my use case is specifically focused on adding live-migration support to CRIU on ARM. 

Cloud providers like AWS have pretty massive discounts for ARM-based spot instances (90% last time I checked), and having live-migration in CRIU would allow more applications to take advantage of that.

As Ryan mentioned, there are two ways to achieve this - add dirty tracking to ARM (Patch 3/4), or tear out the existing dirty tracking code in CRIU and replace it with uffd-wp. 

I picked option one (dirty tracking in arm) because it seems to be the simplest way to move forward, whereas it would be a relatively heavy effort to add uffd-wp support to CRIU.
Mike Rapoport April 19, 2024, 4:30 p.m. UTC | #2
On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
> On 19.04.24 10:33, Shivansh Vij wrote:
> > > On 19/04/2024 08:43, Ryan Roberts wrote:
> > > > Hi All,
> > > > 
> > > > This series adds uffd write-protect and soft-dirty tracking support for arm64. I
> > > > consider the soft-dirty support (patches 3 and 4) as RFC - see rationale below.
> > > > 
> > > > That said, these are the last 2 SW bits and we may want to keep 1 bit in reserve
> > > > for future use. soft-dirty is only used for CRIU to my knowledge, and it is
> > > > thought that their use case could be solved with the more generic uffd-wp. So
> > > > unless somebody makes a clear case for the inclusion of soft-dirty support, we
> > > > are probably better off dropping patches 3 and 4 and keeping bit 63 for future
> > > > use. Although note that the most recent attempt to add soft-dirty for arm64 was
> > > > last month [1] so I'd like to give Shivansh Vij the opportunity to make the
> > > > case.
> > 
> > Appreciate the opportunity to provide input here.
> > 
> > I picked option one (dirty tracking in arm) because it seems to be the
> > simplest way to move forward, whereas it would be a relatively heavy
> > effort to add uffd-wp support to CRIU.
> > 
> > From a performance perspective I am also a little worried that uffd
> > will be slower than just tracking the dirty bits asynchronously with
> > sw dirty, but maybe that's not as much of a concern with the addition
> > of uffd-wp async.
> > 
> > With all this being said, I'll defer to the wisdom of the crowd about
> > which approach makes more sense - after all, with this patch we should
> > get uffd-wp support on arm so at least there will be _a_ way forward
> > for CRIU (albeit one requiring slightly more work).
> 
> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
> iterative container migration"[1] at LPC
> 
> Some key points are still true I think:
> (1) More flexible and robust than soft-dirty
> (2) May obsolete soft-dirty
> 
> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
> [2], because getting soft-dirty return reliable results in some cases turned
> out rather hard to fix.
> 
> We might still have to optimize that approach for some very sparse large
> VMAs, but that should be solvable.
> 
>  "The major defect of this approach of dirty tracking is we need to
>  populate the pgtables when tracking starts. Soft-dirty doesn't do it
>  like that. It's unwanted in the case where the range of memory to track
>  is huge and unpopulated (e.g., tracking updates on a 10G file with
>  mmap() on top, without having any page cache installed yet). One way to
>  improve this is to allow pte markers exist for larger than PTE level
>  for PMD+. That will not change the interface if to implemented, so we
>  can leave that for later.")[3]
> 
> 
> If we can avoid adding soft-dirty on arm64 that would be great. This will
> require work on the CRIU side. One downside of uffd-wp is that it is
> currently not as avilable on architectures as soft-dirty.

Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
CRIU side and probably on the kernel side too.

And as of now we'll anyway have to maintain soft-dirty because powerpc and
s390 don't have uffd-wp.

With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
soft-dirty probably doesn't exist, but we won't know for sure until
somebody will try.

But there were other limitations, the most prominent was checkpointing an
application that uses uffd. If CRIU is to use uffd-wp for tracking of the
dirty pages, there should be some support for multiple uffd contexts for a
VMA and that's surely a lot of work.

> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
> exist at the same time in the same process (or the VMA?). In theory, we

For instance to have dirty memory tracking in CRIU for an application that
uses uffd-wp :) 

> could have a VMA flag that defines the semantics of the bit and simply have
> arch code use a single, abstracted PTE bit. Requires a bit more work,
> though, but the benfit would be that architecturs that do support soft-dirty
> could support uffd-wp.
> 
> [1] https://blog.linuxplumbersconf.org/2017/ocw//system/presentations/4724/original/Memory%20tracking%20for%20iterative%20container%20migration.pdf
> [2] https://lore.kernel.org/all/20230821141518.870589-1-usama.anjum@collabora.com/
> [3] https://lore.kernel.org/all/20230821141518.870589-2-usama.anjum@collabora.com/
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
David Hildenbrand April 23, 2024, 9:02 p.m. UTC | #3
>>
>> Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
>> supports checkpointing an app that uses uffd?
> 
> I do not speak for CRIU - I'm just a user (and hopefully a future contributor), but not a maintainer or owner. I can however comment on whether CRIU supports checkpointing an app that uses UFFD - it doesn't. Looking through both the implementation of CRIU (specifically how they restore memory [1]), and at recently filed Github issues [2], it's pretty clear that CRIU doesn't support processes using UFFD - that they do not currently have plans to [3].

Thanks for all these pointers!

> 
> [1] https://github.com/checkpoint-restore/criu/blob/criu-2.x-stable/criu/mem.c#L683
> [2] https://github.com/checkpoint-restore/criu/issues/2021
> [3] https://github.com/checkpoint-restore/criu/issues/2021#issuecomment-1346971967
> 
>>>
>>> Further ... isn't CRIU already using uffd in some cases? ...documentation
>>> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At least
>>> if the documentation is correct and its actually implemented.
>>>
>>
>> Shivansh, same question - do you know the current CRIU status/plans for using
>> uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has no
>> current plans to, how can we guage interest in making a plan?
>>
> 
> While I cannot gauge whether the maintainers or main contributors of CRIU plan on using uffd-wp instead of soft-dirty in the future, I can tell you that there is no currently open issue to track that work, and whenever anyone in the past has asked about ARM64 pre-dump support to CRIU (which is the feature that uses soft-dirty/would use uffd-wp), they've always just said it's not supported - but that they do want the feature [4].
> 
> So in summary, they want the feature, but no one is working on implementing it (either with soft-dirty or with uffd-wp).
> 
> I doubt that CRIU would have any issues with adding the feature via soft-dirty (since, as shown in [4] they're interested in it), but as for using uffd-wp they definitely haven't shown any interest thus far. Based on the fact that it would be a very significant amount of work and it would really only be for ARM64 support (which they're already fine without), I'd be very surprised if they were interested in pursuing it.
> 

Of course, nobody wants to do the work. But that doesn't mean that the 
kernel has to do the work :)

If there are some major challenges why it cannot possible be done with 
uffd-wp (unfixable), that's a different story.

> [4] https://github.com/checkpoint-restore/criu/issues/1859#issuecomment-1972674047
> 
>>>
>>>>
>>>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>>>> exist at the same time in the same process (or the VMA?). In theory, we
>>
>> My instinct is that MUXing a PTE bit like this will lead to some subtle problems
>> that won't appear on arches that support either one or both of the features
>> independently and unconditionally. Surely better to limit ourselves to either
>> "arm64 will only support uffd-wp" or "arm64 will support both uffd-wp and
>> soft-dirty". That way, we could move ahead with reviewing/merging the uffd-wp
>> support asynchronously to deciding whether we want to support soft-dirty.
>>
> 
> My personal preference is having both approaches supported - especially in the context of CRIU since I doubt they'll be willing to rewrite all of the dumping and restore logic just for ARM64 support.

Sure, nobody does any work unless they are forced to.

But this is something that arm64 maintainers will have to decide.

Let's start with uffd-wp that has other well-known users that could 
benefit (e.g., QEMU background snapshots).