diff mbox series

[1/4] mm: allow guard regions in file-backed and read-only mappings

Message ID d885cb259174736c2830a5dfe07f81b214ef3faa.1739469950.git.lorenzo.stoakes@oracle.com
State New
Headers show
Series mm: permit guard regions for file-backed/shmem mappings | expand

Commit Message

Lorenzo Stoakes Feb. 13, 2025, 6:17 p.m. UTC
There is no reason to disallow guard regions in file-backed mappings -
readahead and fault-around both function correctly in the presence of PTE
markers, equally other operations relating to memory-mapped files function
correctly.

Additionally, read-only mappings if introducing guard-regions, only
restrict the mapping further, which means there is no violation of any
access rights by permitting this to be so.

Removing this restriction allows for read-only mapped files (such as
executable files) correctly which would otherwise not be permitted.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/madvise.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

David Hildenbrand Feb. 18, 2025, 4:01 p.m. UTC | #1
On 13.02.25 19:17, Lorenzo Stoakes wrote:
> There is no reason to disallow guard regions in file-backed mappings -
> readahead and fault-around both function correctly in the presence of PTE
> markers, equally other operations relating to memory-mapped files function
> correctly.
> 
> Additionally, read-only mappings if introducing guard-regions, only
> restrict the mapping further, which means there is no violation of any
> access rights by permitting this to be so.
> 
> Removing this restriction allows for read-only mapped files (such as
> executable files) correctly which would otherwise not be permitted.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>   mm/madvise.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6ecead476a80..e01e93e179a8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>   	if (!allow_locked)
>   		disallowed |= VM_LOCKED;
>   
> -	if (!vma_is_anonymous(vma))
> -		return false;
> -
> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> -		return false;
> -
> -	return true;
> +	return !(vma->vm_flags & disallowed);
>   }
>   
>   static bool is_guard_pte_marker(pte_t ptent)

Acked-by: David Hildenbrand <david@redhat.com>

I assume these markers cannot completely prevent us from allocating 
pages/folios for these underlying file/pageache ranges of these markers 
in case of shmem during page faults, right?
Lorenzo Stoakes Feb. 18, 2025, 4:12 p.m. UTC | #2
On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > There is no reason to disallow guard regions in file-backed mappings -
> > readahead and fault-around both function correctly in the presence of PTE
> > markers, equally other operations relating to memory-mapped files function
> > correctly.
> >
> > Additionally, read-only mappings if introducing guard-regions, only
> > restrict the mapping further, which means there is no violation of any
> > access rights by permitting this to be so.
> >
> > Removing this restriction allows for read-only mapped files (such as
> > executable files) correctly which would otherwise not be permitted.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >   mm/madvise.c | 8 +-------
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6ecead476a80..e01e93e179a8 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> >   	if (!allow_locked)
> >   		disallowed |= VM_LOCKED;
> > -	if (!vma_is_anonymous(vma))
> > -		return false;
> > -
> > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > -		return false;
> > -
> > -	return true;
> > +	return !(vma->vm_flags & disallowed);
> >   }
> >   static bool is_guard_pte_marker(pte_t ptent)
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

>
> I assume these markers cannot completely prevent us from allocating
> pages/folios for these underlying file/pageache ranges of these markers in
> case of shmem during page faults, right?

If the markers are in place, then page faulting will result in a
segfault. If we faulted in a shmem page then installed markers (which would
zap the range), then the page cache will be populated, but obviously
subject to standard reclaim.

If we perform synchronous readahead prior to a guard region that includes
(partially or fully) a guard region we might major fault entries into the
page cache that are then not accessable _from that mapping_, this is rather
unavoidable as this doesn't account for page table mappings and should be
largely trivial overhead (also these folios are reclaimable).

>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 18, 2025, 4:17 p.m. UTC | #3
On 18.02.25 17:12, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>> There is no reason to disallow guard regions in file-backed mappings -
>>> readahead and fault-around both function correctly in the presence of PTE
>>> markers, equally other operations relating to memory-mapped files function
>>> correctly.
>>>
>>> Additionally, read-only mappings if introducing guard-regions, only
>>> restrict the mapping further, which means there is no violation of any
>>> access rights by permitting this to be so.
>>>
>>> Removing this restriction allows for read-only mapped files (such as
>>> executable files) correctly which would otherwise not be permitted.
>>>
>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> ---
>>>    mm/madvise.c | 8 +-------
>>>    1 file changed, 1 insertion(+), 7 deletions(-)
>>>
>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>> index 6ecead476a80..e01e93e179a8 100644
>>> --- a/mm/madvise.c
>>> +++ b/mm/madvise.c
>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>    	if (!allow_locked)
>>>    		disallowed |= VM_LOCKED;
>>> -	if (!vma_is_anonymous(vma))
>>> -		return false;
>>> -
>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>> -		return false;
>>> -
>>> -	return true;
>>> +	return !(vma->vm_flags & disallowed);
>>>    }
>>>    static bool is_guard_pte_marker(pte_t ptent)
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
>>
>> I assume these markers cannot completely prevent us from allocating
>> pages/folios for these underlying file/pageache ranges of these markers in
>> case of shmem during page faults, right?
> 
> If the markers are in place, then page faulting will result in a
> segfault. If we faulted in a shmem page then installed markers (which would
> zap the range), then the page cache will be populated, but obviously
> subject to standard reclaim.

Well, yes, (a) if there is swap and (b), if the noswap option was not 
specified for tmpfs.

Okay, so installing a guard entry might require punshing a hole to get 
rid of any already-existing memory. But readahead (below) might mess it up.

> 
> If we perform synchronous readahead prior to a guard region that includes
> (partially or fully) a guard region we might major fault entries into the
> page cache that are then not accessable _from that mapping_, this is rather
> unavoidable as this doesn't account for page table mappings and should be
> largely trivial overhead (also these folios are reclaimable).

Right, that's what I had in mind: assume I have a single marker in a 
PMD, shmem might allocate a PMD THP to back that region, ignoring the 
marker hint. (so I think)

Thanks!
David Hildenbrand Feb. 18, 2025, 4:27 p.m. UTC | #4
On 18.02.25 17:21, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
>> On 18.02.25 17:12, Lorenzo Stoakes wrote:
>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>>>> There is no reason to disallow guard regions in file-backed mappings -
>>>>> readahead and fault-around both function correctly in the presence of PTE
>>>>> markers, equally other operations relating to memory-mapped files function
>>>>> correctly.
>>>>>
>>>>> Additionally, read-only mappings if introducing guard-regions, only
>>>>> restrict the mapping further, which means there is no violation of any
>>>>> access rights by permitting this to be so.
>>>>>
>>>>> Removing this restriction allows for read-only mapped files (such as
>>>>> executable files) correctly which would otherwise not be permitted.
>>>>>
>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>> ---
>>>>>     mm/madvise.c | 8 +-------
>>>>>     1 file changed, 1 insertion(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>> index 6ecead476a80..e01e93e179a8 100644
>>>>> --- a/mm/madvise.c
>>>>> +++ b/mm/madvise.c
>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>>>     	if (!allow_locked)
>>>>>     		disallowed |= VM_LOCKED;
>>>>> -	if (!vma_is_anonymous(vma))
>>>>> -		return false;
>>>>> -
>>>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>>>> -		return false;
>>>>> -
>>>>> -	return true;
>>>>> +	return !(vma->vm_flags & disallowed);
>>>>>     }
>>>>>     static bool is_guard_pte_marker(pte_t ptent)
>>>>
>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>
>>> Thanks!
>>>
>>>>
>>>> I assume these markers cannot completely prevent us from allocating
>>>> pages/folios for these underlying file/pageache ranges of these markers in
>>>> case of shmem during page faults, right?
>>>
>>> If the markers are in place, then page faulting will result in a
>>> segfault. If we faulted in a shmem page then installed markers (which would
>>> zap the range), then the page cache will be populated, but obviously
>>> subject to standard reclaim.
>>
>> Well, yes, (a) if there is swap and (b), if the noswap option was not
>> specified for tmpfs.
>>
> 
> Right, yeah if you don't have it set up such that dropping a reference to the
> folio doesn't drop the page altogether.
> 
> I think this matches expectation though in that you'd get the same results from
> an MADV_DONTNEED followed by faulting the page again.

It might make sense to document that: installing a guard behaves just 
like MADV_DONTNEED; in case of a file, that means that the pagecache is 
left untouched.

> 
>> Okay, so installing a guard entry might require punshing a hole to get rid
>> of any already-existing memory. But readahead (below) might mess it up.
> 
> Only if you are so concerned about avoiding the page cache being populated there
> that you want to do this :)
> 
> Readahead I think will not readahead into a holepunched region as the hole
> punching extends to the fs layer _I believe_ I have not checked the code for
> this, but I believe it actually changes the underlying file too right to say
> 'this part of the file is empty'?

Well, we are talking about shmem here ... not your ordinary fs backed by 
an actual file :)
David Hildenbrand Feb. 18, 2025, 5 p.m. UTC | #5
On 18.02.25 17:49, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote:
>> On 18.02.25 17:21, Lorenzo Stoakes wrote:
>>> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
>>>> On 18.02.25 17:12, Lorenzo Stoakes wrote:
>>>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>>>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>>>>>> There is no reason to disallow guard regions in file-backed mappings -
>>>>>>> readahead and fault-around both function correctly in the presence of PTE
>>>>>>> markers, equally other operations relating to memory-mapped files function
>>>>>>> correctly.
>>>>>>>
>>>>>>> Additionally, read-only mappings if introducing guard-regions, only
>>>>>>> restrict the mapping further, which means there is no violation of any
>>>>>>> access rights by permitting this to be so.
>>>>>>>
>>>>>>> Removing this restriction allows for read-only mapped files (such as
>>>>>>> executable files) correctly which would otherwise not be permitted.
>>>>>>>
>>>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>>>> ---
>>>>>>>      mm/madvise.c | 8 +-------
>>>>>>>      1 file changed, 1 insertion(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>>>> index 6ecead476a80..e01e93e179a8 100644
>>>>>>> --- a/mm/madvise.c
>>>>>>> +++ b/mm/madvise.c
>>>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>>>>>      	if (!allow_locked)
>>>>>>>      		disallowed |= VM_LOCKED;
>>>>>>> -	if (!vma_is_anonymous(vma))
>>>>>>> -		return false;
>>>>>>> -
>>>>>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>>>>>> -		return false;
>>>>>>> -
>>>>>>> -	return true;
>>>>>>> +	return !(vma->vm_flags & disallowed);
>>>>>>>      }
>>>>>>>      static bool is_guard_pte_marker(pte_t ptent)
>>>>>>
>>>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> I assume these markers cannot completely prevent us from allocating
>>>>>> pages/folios for these underlying file/pageache ranges of these markers in
>>>>>> case of shmem during page faults, right?
>>>>>
>>>>> If the markers are in place, then page faulting will result in a
>>>>> segfault. If we faulted in a shmem page then installed markers (which would
>>>>> zap the range), then the page cache will be populated, but obviously
>>>>> subject to standard reclaim.
>>>>
>>>> Well, yes, (a) if there is swap and (b), if the noswap option was not
>>>> specified for tmpfs.
>>>>
>>>
>>> Right, yeah if you don't have it set up such that dropping a reference to the
>>> folio doesn't drop the page altogether.
>>>
>>> I think this matches expectation though in that you'd get the same results from
>>> an MADV_DONTNEED followed by faulting the page again.
>>
>> It might make sense to document that: installing a guard behaves just like
>> MADV_DONTNEED; in case of a file, that means that the pagecache is left
>> untouched.
> 
> More docs noooo! :P I will update the man pages when this is more obviously
> heading for landing in 6.15 accordingly.
> 
> Current man page documentation on this is:
> 
> 'If the region maps memory pages those mappings will be replaced as part of
> the operation'
> 
> I think something like:
> 
> 'If the region maps pages those mappings will be replaced as part of the
> operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting
> in the page will behave as if that region had MADV_DONTNEED applied to it,
> that is anonymous ranges will be backed by newly allocated zeroed pages and
> file-backed ranges will be backed by the underlying file pages.'
> 
> Probably something less wordy than this...

Yeah, but sounds good to me.

> 
>>
>>>
>>>> Okay, so installing a guard entry might require punshing a hole to get rid
>>>> of any already-existing memory. But readahead (below) might mess it up.
>>>
>>> Only if you are so concerned about avoiding the page cache being populated there
>>> that you want to do this :)
>>>
>>> Readahead I think will not readahead into a holepunched region as the hole
>>> punching extends to the fs layer _I believe_ I have not checked the code for
>>> this, but I believe it actually changes the underlying file too right to say
>>> 'this part of the file is empty'?
>>
>> Well, we are talking about shmem here ... not your ordinary fs backed by an
>> actual file :)
> 
> I am talking about both, I multitask ;)

For !shmem, we should indeed not be messing with a sparse file structure.
Lorenzo Stoakes Feb. 18, 2025, 5:04 p.m. UTC | #6
On Tue, Feb 18, 2025 at 06:00:36PM +0100, David Hildenbrand wrote:
> On 18.02.25 17:49, Lorenzo Stoakes wrote:
> > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote:
> > > On 18.02.25 17:21, Lorenzo Stoakes wrote:
> > > > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
> > > > > On 18.02.25 17:12, Lorenzo Stoakes wrote:
> > > > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> > > > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > > > > > > > There is no reason to disallow guard regions in file-backed mappings -
> > > > > > > > readahead and fault-around both function correctly in the presence of PTE
> > > > > > > > markers, equally other operations relating to memory-mapped files function
> > > > > > > > correctly.
> > > > > > > >
> > > > > > > > Additionally, read-only mappings if introducing guard-regions, only
> > > > > > > > restrict the mapping further, which means there is no violation of any
> > > > > > > > access rights by permitting this to be so.
> > > > > > > >
> > > > > > > > Removing this restriction allows for read-only mapped files (such as
> > > > > > > > executable files) correctly which would otherwise not be permitted.
> > > > > > > >
> > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > > > > > ---
> > > > > > > >      mm/madvise.c | 8 +-------
> > > > > > > >      1 file changed, 1 insertion(+), 7 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > > > index 6ecead476a80..e01e93e179a8 100644
> > > > > > > > --- a/mm/madvise.c
> > > > > > > > +++ b/mm/madvise.c
> > > > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> > > > > > > >      	if (!allow_locked)
> > > > > > > >      		disallowed |= VM_LOCKED;
> > > > > > > > -	if (!vma_is_anonymous(vma))
> > > > > > > > -		return false;
> > > > > > > > -
> > > > > > > > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > > > > > > > -		return false;
> > > > > > > > -
> > > > > > > > -	return true;
> > > > > > > > +	return !(vma->vm_flags & disallowed);
> > > > > > > >      }
> > > > > > > >      static bool is_guard_pte_marker(pte_t ptent)
> > > > > > >
> > > > > > > Acked-by: David Hildenbrand <david@redhat.com>
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >
> > > > > > > I assume these markers cannot completely prevent us from allocating
> > > > > > > pages/folios for these underlying file/pageache ranges of these markers in
> > > > > > > case of shmem during page faults, right?
> > > > > >
> > > > > > If the markers are in place, then page faulting will result in a
> > > > > > segfault. If we faulted in a shmem page then installed markers (which would
> > > > > > zap the range), then the page cache will be populated, but obviously
> > > > > > subject to standard reclaim.
> > > > >
> > > > > Well, yes, (a) if there is swap and (b), if the noswap option was not
> > > > > specified for tmpfs.
> > > > >
> > > >
> > > > Right, yeah if you don't have it set up such that dropping a reference to the
> > > > folio doesn't drop the page altogether.
> > > >
> > > > I think this matches expectation though in that you'd get the same results from
> > > > an MADV_DONTNEED followed by faulting the page again.
> > >
> > > It might make sense to document that: installing a guard behaves just like
> > > MADV_DONTNEED; in case of a file, that means that the pagecache is left
> > > untouched.
> >
> > More docs noooo! :P I will update the man pages when this is more obviously
> > heading for landing in 6.15 accordingly.
> >
> > Current man page documentation on this is:
> >
> > 'If the region maps memory pages those mappings will be replaced as part of
> > the operation'
> >
> > I think something like:
> >
> > 'If the region maps pages those mappings will be replaced as part of the
> > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting
> > in the page will behave as if that region had MADV_DONTNEED applied to it,
> > that is anonymous ranges will be backed by newly allocated zeroed pages and
> > file-backed ranges will be backed by the underlying file pages.'
> >
> > Probably something less wordy than this...
>
> Yeah, but sounds good to me.
>
> >
> > >
> > > >
> > > > > Okay, so installing a guard entry might require punshing a hole to get rid
> > > > > of any already-existing memory. But readahead (below) might mess it up.
> > > >
> > > > Only if you are so concerned about avoiding the page cache being populated there
> > > > that you want to do this :)
> > > >
> > > > Readahead I think will not readahead into a holepunched region as the hole
> > > > punching extends to the fs layer _I believe_ I have not checked the code for
> > > > this, but I believe it actually changes the underlying file too right to say
> > > > 'this part of the file is empty'?
> > >
> > > Well, we are talking about shmem here ... not your ordinary fs backed by an
> > > actual file :)
> >
> > I am talking about both, I multitask ;)
>
> For !shmem, we should indeed not be messing with a sparse file structure.

Ah right I get you, sorry missed your point here. Yeah MADV_REMOVE for shmem
will just explicitly drop the pages which doesn't have any underlying file to
impact as with actually file-backed memory.

>
>
> --
> Cheers,
>
> David / dhildenb
>
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 6ecead476a80..e01e93e179a8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1051,13 +1051,7 @@  static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
 	if (!allow_locked)
 		disallowed |= VM_LOCKED;
 
-	if (!vma_is_anonymous(vma))
-		return false;
-
-	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
-		return false;
-
-	return true;
+	return !(vma->vm_flags & disallowed);
 }
 
 static bool is_guard_pte_marker(pte_t ptent)