[v3,0/6] Optimize mremap during mutual alignment within PMD

Message ID 20230524153239.3036507-1-joel@joelfernandes.org

Joel Fernandes May 24, 2023, 3:32 p.m. UTC
Hello!

Here is v3 of the mremap start address optimization / fix for exec warning.

The main changes are:
1. Take care with moves that happen purely within a VMA; in other words, this check
   in can_align_down():
    if (vma->vm_start <= addr_masked)
            return false;

    As an example of why this is needed:
    Consider the following range, which is 2MB aligned and is
    part of a larger 10MB range that is not shown. Each
    character below represents 256KB, making the source and
    destination 2MB each. The lower case letters are moved (s to d)
    and the upper case letters are not moved.

    |DDDDddddSSSSssss|

    If we align down 'ssss' to start from the 'SSSS', we will end up destroying
    SSSS. The above if statement prevents that and I verified it.

    I also added a test for this in the last patch.

2. Handle the stack case separately. We do not care about #1 for stack movement
   because the 'SSSS' does not matter during this move. Further, we need to do this
   to prevent the stack move warning.

    if (!for_stack && vma->vm_start <= addr_masked)
            return false;
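
To make the diagram concrete, here is a minimal userspace sketch of the align-down arithmetic, assuming a 2MB PMD. The helper names (sketch_align_down(), sketch_masked_addr_in_vma()) and the SKETCH_MB() macro are hypothetical and only model the v3 check quoted above; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MB(x) ((unsigned long)(x) << 20)

/* Round addr down to a pmd_size boundary; pmd_size must be a power of two. */
static unsigned long sketch_align_down(unsigned long addr, unsigned long pmd_size)
{
	assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
	return addr & ~(pmd_size - 1);
}

/*
 * Models the v3 check: returns true when the aligned-down address still
 * lies inside the VMA, i.e. when aligning down would drag along part of
 * the mapping that was not supposed to move.
 */
static bool sketch_masked_addr_in_vma(unsigned long vm_start, unsigned long addr,
				      unsigned long pmd_size)
{
	return vm_start <= sketch_align_down(addr, pmd_size);
}
```

With the range starting at a 2MB-aligned base, 'ssss' starts at base + 3MB and aligns down to base + 2MB, which is the start of 'SSSS'. Since the VMA starts at or below base, the check reports that the masked address is inside the VMA and the alignment is refused.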

History of patches
==================
v2->v3:
1. The masked address was stored in an int; changed it to unsigned long to avoid truncation.
2. We now handle moves happening purely within a VMA; a new test is added to cover this.
3. More code comments.
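
As an illustration of the truncation in item 1, here is a small userspace sketch. It assumes 64-bit addresses and a 2MB PMD, and it uses uint32_t as a stand-in for the old int so that the narrowing conversion is well defined in C. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask kept at full width, as after the fix. */
static uint64_t sketch_mask_wide(uint64_t addr, uint64_t mask)
{
	return addr & mask;
}

/* Mask squeezed through a 32-bit variable: the upper address bits are lost. */
static uint64_t sketch_mask_narrow(uint64_t addr, uint64_t mask)
{
	uint32_t narrow = (uint32_t)(addr & mask);

	return narrow;
}

/* Example with a typical 47-bit userspace address. */
static void sketch_truncation_demo(void)
{
	const uint64_t pmd_mask = ~((UINT64_C(2) << 20) - 1);
	const uint64_t addr = UINT64_C(0x00007ffff7e12345);

	assert(sketch_mask_wide(addr, pmd_mask) == UINT64_C(0x00007ffff7e00000));
	assert(sketch_mask_narrow(addr, pmd_mask) == UINT64_C(0x00000000f7e00000));
}
```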

v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
that it works correctly.

2. Fix issue with bogus return value found by Linus if we broke out of the
above loop for the first PMD itself.

v1: Initial RFC.

Description of patches
======================
These patches optimize the start addresses in move_page_tables() and test the
changes. They address a warning [1] that occurs due to a downward, overlapping
move on a mutually-aligned offset within a PMD during exec. By initiating the
copy process at the PMD level when such alignment is present, we can prevent
this warning and speed up the copying process at the same time. Linus Torvalds
suggested this idea.
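
For readers unfamiliar with the term, "mutually aligned" here is taken to mean that the source and destination addresses have the same offset within a PMD. Below is a minimal userspace sketch of that test, under the assumption that the PMD size is a power of two; the function names are hypothetical and this is not the code from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* True if both addresses sit at the same offset inside their PMD. */
static bool sketch_mutually_aligned(unsigned long old_addr, unsigned long new_addr,
				    unsigned long pmd_size)
{
	unsigned long offset_mask;

	assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
	offset_mask = pmd_size - 1;
	return (old_addr & offset_mask) == (new_addr & offset_mask);
}

/*
 * Realigning only helps when the source is not already PMD-aligned and
 * the two addresses are mutually aligned, so that both can be rounded
 * down by the same amount.
 */
static bool sketch_worth_realigning(unsigned long old_addr, unsigned long new_addr,
				    unsigned long pmd_size)
{
	if ((old_addr & (pmd_size - 1)) == 0)
		return false;	/* already aligned, nothing to gain */
	return sketch_mutually_aligned(old_addr, new_addr, pmd_size);
}
```

When this holds, both addresses can be masked down to their PMD boundaries and the copy can start at the PMD level, provided the aligned-down ranges do not run into other mappings.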

Please check the individual patches for more details.

thanks,

 - Joel

[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/

Joel Fernandes (Google) (6):
mm/mremap: Optimize the start addresses in move_page_tables()
mm/mremap: Allow moves within the same VMA
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after existing mapping
selftests: mm: Add a test for remapping within a range

fs/exec.c                                |   2 +-
include/linux/mm.h                       |   2 +-
mm/mremap.c                              |  69 ++++++++++-
tools/testing/selftests/mm/mremap_test.c | 148 +++++++++++++++++++++--
4 files changed, 209 insertions(+), 12 deletions(-)

--
2.40.1.698.g37aff9b760-goog

Comments

Joel Fernandes May 25, 2023, 7:51 p.m. UTC | #1
Hi Linus,

On Wed, May 24, 2023 at 7:23 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. I'm still quite unhappy about your can_align_down().
>
> On Wed, May 24, 2023 at 8:32 AM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > +       /* If the masked address is within vma, we cannot align the address down. */
> > +       if (vma->vm_start <= addr_masked)
> > +               return false;
>
> I don't think this test is right.
>
> The test should not be "is the mapping still there at the point we
> aligned down to".
>
> No, the test should be whether there is any part of the mapping below
> the point we're starting with:
>
>         if (vma->vm_start < addr_to_align)
>                 return false;
>
> because we can do the "expand the move down" *only* if it's the
> beginning of the vma (because otherwise we'd be moving part of the vma
> that precedes the address!)

You are right, I missed that. Funny I did think about this case you
mentioned. I will fix it in the next revision, thanks.

> (Alternatively, just make that "<" be "!=" - we're basically saying
> that we can expand moving ptes to a pmd boundary *only* if this vma
> starts at that point. No?).

Yes, I prefer the "!=" check. I will use that.

>
> > +       cur = find_vma_prev(vma->vm_mm, vma->vm_start, &prev);
> > +       if (!cur || cur != vma || !prev)
> > +               return false;
>
> I've mentioned this test before, and I still find it actively misleading.
>
> First off, the "!cur || cur != vma" test is clearly redundant. We know
> 'vma' isn't NULL (we just dereferenced it!). So "cur != vma" already
> includes the "!cur" test.
>
> So that "!cur" part of the test simply *cannot* be sensible.

Ok, I agree with you now.

> And the "!prev" test still makes no sense to me. You tried to explain
> it to me earlier, and I clearly didn't get it. It seems actively
> wrong. I still think "!prev" should return true.

Yes, ok. Sounds good.

> You seemed to think that "!prev" couldn't actually happen and would
> be a sign of some VM problem, but that doesn't make any sense to me.
> Of course !prev can happen - if "vma" is the first vma in the VM and
> there is no previous.
>
> It may be *rare*, but I still don't understand why you'd make that
> "there is no vma below us" mean "we cannot expand the move below us
> because there's something there".
>
> So I continue to think that this test should just be
>
>         if (WARN_ON_ONCE(cur != vma))
>                 return false;

I agree with this now.

>
> because if it ever returns something that *isn't* the same as vma,
> then we do indeed have serious problems. But that WARN_ON_ONCE() shows
> that that's a "cannot happen" thing, not some kind of "if this happens
> then don't do it" test.
>
> and then the *real* test for "can we align down" should just be
>
>         return !prev || prev->vm_end <= addr_masked;

Agreed, that's cleaner.

> Because while I think your code _works_, it really doesn't seem to
> make much sense as it stands in your patch. The tests are actively
> misleading. No?

True, your approach makes me want to focus on writing cleaner code rather than
being excessively paranoid. So thank you for that.

These patches have been tricky to get right so thank you for your
continued input and quick feedback.

I will add a test for the case you mentioned above, where the address
to realign is not at the beginning of the VMA.
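
To summarize the checks agreed on above, here is a rough userspace model, not the actual next revision. struct sketch_vma and its prev pointer are hypothetical stand-ins for vm_area_struct and the find_vma_prev() lookup, so the WARN_ON_ONCE(cur != vma) sanity check has no counterpart here. Keeping the for_stack exemption from the cover letter is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: [vm_start, vm_end) plus the VMA below it. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	const struct sketch_vma *prev;	/* NULL if this is the lowest VMA */
};

static bool sketch_can_align_down(const struct sketch_vma *vma,
				  unsigned long addr_to_align,
				  unsigned long mask, bool for_stack)
{
	unsigned long addr_masked = addr_to_align & mask;

	assert(vma != NULL);

	/*
	 * Expanding the move down is only safe if the VMA starts exactly at
	 * the address being aligned; otherwise part of the VMA that precedes
	 * the address would be moved too. The stack move is exempt.
	 */
	if (!for_stack && vma->vm_start != addr_to_align)
		return false;

	/* No VMA below us, or it ends at or before the aligned-down address. */
	return !vma->prev || vma->prev->vm_end <= addr_masked;
}
```

Note that a missing previous VMA now allows the alignment instead of blocking it.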

thanks,

- Joel