Message ID | d885cb259174736c2830a5dfe07f81b214ef3faa.1739469950.git.lorenzo.stoakes@oracle.com |
---|---|
State | New |
Headers | show |
Series | mm: permit guard regions for file-backed/shmem mappings | expand |
On 13.02.25 19:17, Lorenzo Stoakes wrote: > There is no reason to disallow guard regions in file-backed mappings - > readahead and fault-around both function correctly in the presence of PTE > markers, equally other operations relating to memory-mapped files function > correctly. > > Additionally, read-only mappings if introducing guard-regions, only > restrict the mapping further, which means there is no violation of any > access rights by permitting this to be so. > > Removing this restriction allows for read-only mapped files (such as > executable files) correctly which would otherwise not be permitted. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > mm/madvise.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 6ecead476a80..e01e93e179a8 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > if (!allow_locked) > disallowed |= VM_LOCKED; > > - if (!vma_is_anonymous(vma)) > - return false; > - > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > - return false; > - > - return true; > + return !(vma->vm_flags & disallowed); > } > > static bool is_guard_pte_marker(pte_t ptent) Acked-by: David Hildenbrand <david@redhat.com> I assume these markers cannot completely prevent us from allocating pages/folios for these underlying file/pageache ranges of these markers in case of shmem during page faults, right?
On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > There is no reason to disallow guard regions in file-backed mappings - > > readahead and fault-around both function correctly in the presence of PTE > > markers, equally other operations relating to memory-mapped files function > > correctly. > > > > Additionally, read-only mappings if introducing guard-regions, only > > restrict the mapping further, which means there is no violation of any > > access rights by permitting this to be so. > > > > Removing this restriction allows for read-only mapped files (such as > > executable files) correctly which would otherwise not be permitted. > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > --- > > mm/madvise.c | 8 +------- > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > index 6ecead476a80..e01e93e179a8 100644 > > --- a/mm/madvise.c > > +++ b/mm/madvise.c > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > if (!allow_locked) > > disallowed |= VM_LOCKED; > > - if (!vma_is_anonymous(vma)) > > - return false; > > - > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > - return false; > > - > > - return true; > > + return !(vma->vm_flags & disallowed); > > } > > static bool is_guard_pte_marker(pte_t ptent) > > Acked-by: David Hildenbrand <david@redhat.com> Thanks! > > I assume these markers cannot completely prevent us from allocating > pages/folios for these underlying file/pageache ranges of these markers in > case of shmem during page faults, right? If the markers are in place, then page faulting will result in a segfault. If we faulted in a shmem page then installed markers (which would zap the range), then the page cache will be populated, but obviously subject to standard reclaim. If we perform synchronous readahead prior to a guard region that includes (partially or fully) a guard region we might major fault entries into the page cache that are then not accessable _from that mapping_, this is rather unavoidable as this doesn't account for page table mappings and should be largely trivial overhead (also these folios are reclaimable). > > -- > Cheers, > > David / dhildenb >
On 18.02.25 17:12, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>> There is no reason to disallow guard regions in file-backed mappings - >>> readahead and fault-around both function correctly in the presence of PTE >>> markers, equally other operations relating to memory-mapped files function >>> correctly. >>> >>> Additionally, read-only mappings if introducing guard-regions, only >>> restrict the mapping further, which means there is no violation of any >>> access rights by permitting this to be so. >>> >>> Removing this restriction allows for read-only mapped files (such as >>> executable files) correctly which would otherwise not be permitted. >>> >>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>> --- >>> mm/madvise.c | 8 +------- >>> 1 file changed, 1 insertion(+), 7 deletions(-) >>> >>> diff --git a/mm/madvise.c b/mm/madvise.c >>> index 6ecead476a80..e01e93e179a8 100644 >>> --- a/mm/madvise.c >>> +++ b/mm/madvise.c >>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>> if (!allow_locked) >>> disallowed |= VM_LOCKED; >>> - if (!vma_is_anonymous(vma)) >>> - return false; >>> - >>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>> - return false; >>> - >>> - return true; >>> + return !(vma->vm_flags & disallowed); >>> } >>> static bool is_guard_pte_marker(pte_t ptent) >> >> Acked-by: David Hildenbrand <david@redhat.com> > > Thanks! > >> >> I assume these markers cannot completely prevent us from allocating >> pages/folios for these underlying file/pageache ranges of these markers in >> case of shmem during page faults, right? > > If the markers are in place, then page faulting will result in a > segfault. If we faulted in a shmem page then installed markers (which would > zap the range), then the page cache will be populated, but obviously > subject to standard reclaim. Well, yes, (a) if there is swap and (b), if the noswap option was not specified for tmpfs. Okay, so installing a guard entry might require punshing a hole to get rid of any already-existing memory. But readahead (below) might mess it up. > > If we perform synchronous readahead prior to a guard region that includes > (partially or fully) a guard region we might major fault entries into the > page cache that are then not accessable _from that mapping_, this is rather > unavoidable as this doesn't account for page table mappings and should be > largely trivial overhead (also these folios are reclaimable). Right, that's what I had in mind: assume I have a single marker in a PMD, shmem might allocate a PMD THP to back that region, ignoring the marker hint. (so I think) Thanks!
On 18.02.25 17:21, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: >> On 18.02.25 17:12, Lorenzo Stoakes wrote: >>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >>>> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>>>> There is no reason to disallow guard regions in file-backed mappings - >>>>> readahead and fault-around both function correctly in the presence of PTE >>>>> markers, equally other operations relating to memory-mapped files function >>>>> correctly. >>>>> >>>>> Additionally, read-only mappings if introducing guard-regions, only >>>>> restrict the mapping further, which means there is no violation of any >>>>> access rights by permitting this to be so. >>>>> >>>>> Removing this restriction allows for read-only mapped files (such as >>>>> executable files) correctly which would otherwise not be permitted. >>>>> >>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>>>> --- >>>>> mm/madvise.c | 8 +------- >>>>> 1 file changed, 1 insertion(+), 7 deletions(-) >>>>> >>>>> diff --git a/mm/madvise.c b/mm/madvise.c >>>>> index 6ecead476a80..e01e93e179a8 100644 >>>>> --- a/mm/madvise.c >>>>> +++ b/mm/madvise.c >>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>>>> if (!allow_locked) >>>>> disallowed |= VM_LOCKED; >>>>> - if (!vma_is_anonymous(vma)) >>>>> - return false; >>>>> - >>>>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>>>> - return false; >>>>> - >>>>> - return true; >>>>> + return !(vma->vm_flags & disallowed); >>>>> } >>>>> static bool is_guard_pte_marker(pte_t ptent) >>>> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>> >>> Thanks! >>> >>>> >>>> I assume these markers cannot completely prevent us from allocating >>>> pages/folios for these underlying file/pageache ranges of these markers in >>>> case of shmem during page faults, right? >>> >>> If the markers are in place, then page faulting will result in a >>> segfault. If we faulted in a shmem page then installed markers (which would >>> zap the range), then the page cache will be populated, but obviously >>> subject to standard reclaim. >> >> Well, yes, (a) if there is swap and (b), if the noswap option was not >> specified for tmpfs. >> > > Right, yeah if you don't have it set up such that dropping a reference to the > folio doesn't drop the page altogether. > > I think this matches expectation though in that you'd get the same results from > an MADV_DONTNEED followed by faulting the page again. It might make sense to document that: installing a guard behaves just like MADV_DONTNEED; in case of a file, that means that the pagecache is left untouched. > >> Okay, so installing a guard entry might require punshing a hole to get rid >> of any already-existing memory. But readahead (below) might mess it up. > > Only if you are so concerned about avoiding the page cache being populated there > that you want to do this :) > > Readahead I think will not readahead into a holepunched region as the hole > punching extends to the fs layer _I believe_ I have not checked the code for > this, but I believe it actually changes the underlying file too right to say > 'this part of the file is empty'? Well, we are talking about shmem here ... not your ordinary fs backed by an actual file :)
On 18.02.25 17:49, Lorenzo Stoakes wrote: > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote: >> On 18.02.25 17:21, Lorenzo Stoakes wrote: >>> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: >>>> On 18.02.25 17:12, Lorenzo Stoakes wrote: >>>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: >>>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote: >>>>>>> There is no reason to disallow guard regions in file-backed mappings - >>>>>>> readahead and fault-around both function correctly in the presence of PTE >>>>>>> markers, equally other operations relating to memory-mapped files function >>>>>>> correctly. >>>>>>> >>>>>>> Additionally, read-only mappings if introducing guard-regions, only >>>>>>> restrict the mapping further, which means there is no violation of any >>>>>>> access rights by permitting this to be so. >>>>>>> >>>>>>> Removing this restriction allows for read-only mapped files (such as >>>>>>> executable files) correctly which would otherwise not be permitted. >>>>>>> >>>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>>>>>> --- >>>>>>> mm/madvise.c | 8 +------- >>>>>>> 1 file changed, 1 insertion(+), 7 deletions(-) >>>>>>> >>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c >>>>>>> index 6ecead476a80..e01e93e179a8 100644 >>>>>>> --- a/mm/madvise.c >>>>>>> +++ b/mm/madvise.c >>>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) >>>>>>> if (!allow_locked) >>>>>>> disallowed |= VM_LOCKED; >>>>>>> - if (!vma_is_anonymous(vma)) >>>>>>> - return false; >>>>>>> - >>>>>>> - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) >>>>>>> - return false; >>>>>>> - >>>>>>> - return true; >>>>>>> + return !(vma->vm_flags & disallowed); >>>>>>> } >>>>>>> static bool is_guard_pte_marker(pte_t ptent) >>>>>> >>>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>>> >>>>> Thanks! >>>>> >>>>>> >>>>>> I assume these markers cannot completely prevent us from allocating >>>>>> pages/folios for these underlying file/pageache ranges of these markers in >>>>>> case of shmem during page faults, right? >>>>> >>>>> If the markers are in place, then page faulting will result in a >>>>> segfault. If we faulted in a shmem page then installed markers (which would >>>>> zap the range), then the page cache will be populated, but obviously >>>>> subject to standard reclaim. >>>> >>>> Well, yes, (a) if there is swap and (b), if the noswap option was not >>>> specified for tmpfs. >>>> >>> >>> Right, yeah if you don't have it set up such that dropping a reference to the >>> folio doesn't drop the page altogether. >>> >>> I think this matches expectation though in that you'd get the same results from >>> an MADV_DONTNEED followed by faulting the page again. >> >> It might make sense to document that: installing a guard behaves just like >> MADV_DONTNEED; in case of a file, that means that the pagecache is left >> untouched. > > More docs noooo! :P I will update the man pages when this is more obviously > heading for landing in 6.15 accordingly. > > Current man page documentation on this is: > > 'If the region maps memory pages those mappings will be replaced as part of > the operation' > > I think something like: > > 'If the region maps pages those mappings will be replaced as part of the > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting > in the page will behave as if that region had MADV_DONTNEED applied to it, > that is anonymous ranges will be backed by newly allocated zeroed pages and > file-backed ranges will be backed by the underlying file pages.' > > Probably something less wordy than this... Yeah, but sounds good to me. > >> >>> >>>> Okay, so installing a guard entry might require punshing a hole to get rid >>>> of any already-existing memory. But readahead (below) might mess it up. >>> >>> Only if you are so concerned about avoiding the page cache being populated there >>> that you want to do this :) >>> >>> Readahead I think will not readahead into a holepunched region as the hole >>> punching extends to the fs layer _I believe_ I have not checked the code for >>> this, but I believe it actually changes the underlying file too right to say >>> 'this part of the file is empty'? >> >> Well, we are talking about shmem here ... not your ordinary fs backed by an >> actual file :) > > I am talking about both, I multitask ;) For !shmem, we should indeed not be messing with a sparse file structure.
On Tue, Feb 18, 2025 at 06:00:36PM +0100, David Hildenbrand wrote: > On 18.02.25 17:49, Lorenzo Stoakes wrote: > > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote: > > > On 18.02.25 17:21, Lorenzo Stoakes wrote: > > > > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote: > > > > > On 18.02.25 17:12, Lorenzo Stoakes wrote: > > > > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote: > > > > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote: > > > > > > > > There is no reason to disallow guard regions in file-backed mappings - > > > > > > > > readahead and fault-around both function correctly in the presence of PTE > > > > > > > > markers, equally other operations relating to memory-mapped files function > > > > > > > > correctly. > > > > > > > > > > > > > > > > Additionally, read-only mappings if introducing guard-regions, only > > > > > > > > restrict the mapping further, which means there is no violation of any > > > > > > > > access rights by permitting this to be so. > > > > > > > > > > > > > > > > Removing this restriction allows for read-only mapped files (such as > > > > > > > > executable files) correctly which would otherwise not be permitted. > > > > > > > > > > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > > > > > > --- > > > > > > > > mm/madvise.c | 8 +------- > > > > > > > > 1 file changed, 1 insertion(+), 7 deletions(-) > > > > > > > > > > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c > > > > > > > > index 6ecead476a80..e01e93e179a8 100644 > > > > > > > > --- a/mm/madvise.c > > > > > > > > +++ b/mm/madvise.c > > > > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) > > > > > > > > if (!allow_locked) > > > > > > > > disallowed |= VM_LOCKED; > > > > > > > > - if (!vma_is_anonymous(vma)) > > > > > > > > - return false; > > > > > > > > - > > > > > > > > - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) > > > > > > > > - return false; > > > > > > > > - > > > > > > > > - return true; > > > > > > > > + return !(vma->vm_flags & disallowed); > > > > > > > > } > > > > > > > > static bool is_guard_pte_marker(pte_t ptent) > > > > > > > > > > > > > > Acked-by: David Hildenbrand <david@redhat.com> > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > I assume these markers cannot completely prevent us from allocating > > > > > > > pages/folios for these underlying file/pageache ranges of these markers in > > > > > > > case of shmem during page faults, right? > > > > > > > > > > > > If the markers are in place, then page faulting will result in a > > > > > > segfault. If we faulted in a shmem page then installed markers (which would > > > > > > zap the range), then the page cache will be populated, but obviously > > > > > > subject to standard reclaim. > > > > > > > > > > Well, yes, (a) if there is swap and (b), if the noswap option was not > > > > > specified for tmpfs. > > > > > > > > > > > > > Right, yeah if you don't have it set up such that dropping a reference to the > > > > folio doesn't drop the page altogether. > > > > > > > > I think this matches expectation though in that you'd get the same results from > > > > an MADV_DONTNEED followed by faulting the page again. > > > > > > It might make sense to document that: installing a guard behaves just like > > > MADV_DONTNEED; in case of a file, that means that the pagecache is left > > > untouched. > > > > More docs noooo! :P I will update the man pages when this is more obviously > > heading for landing in 6.15 accordingly. > > > > Current man page documentation on this is: > > > > 'If the region maps memory pages those mappings will be replaced as part of > > the operation' > > > > I think something like: > > > > 'If the region maps pages those mappings will be replaced as part of the > > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting > > in the page will behave as if that region had MADV_DONTNEED applied to it, > > that is anonymous ranges will be backed by newly allocated zeroed pages and > > file-backed ranges will be backed by the underlying file pages.' > > > > Probably something less wordy than this... > > Yeah, but sounds good to me. > > > > > > > > > > > > > > > Okay, so installing a guard entry might require punshing a hole to get rid > > > > > of any already-existing memory. But readahead (below) might mess it up. > > > > > > > > Only if you are so concerned about avoiding the page cache being populated there > > > > that you want to do this :) > > > > > > > > Readahead I think will not readahead into a holepunched region as the hole > > > > punching extends to the fs layer _I believe_ I have not checked the code for > > > > this, but I believe it actually changes the underlying file too right to say > > > > 'this part of the file is empty'? > > > > > > Well, we are talking about shmem here ... not your ordinary fs backed by an > > > actual file :) > > > > I am talking about both, I multitask ;) > > For !shmem, we should indeed not be messing with a sparse file structure. Ah right I get you, sorry missed your point here. Yeah MADV_REMOVE for shmem will just explicitly drop the pages which doesn't have any underlying file to impact as with actually file-backed memory. > > > -- > Cheers, > > David / dhildenb >
diff --git a/mm/madvise.c b/mm/madvise.c index 6ecead476a80..e01e93e179a8 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked) if (!allow_locked) disallowed |= VM_LOCKED; - if (!vma_is_anonymous(vma)) - return false; - - if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE) - return false; - - return true; + return !(vma->vm_flags & disallowed); } static bool is_guard_pte_marker(pte_t ptent)
There is no reason to disallow guard regions in file-backed mappings - readahead and fault-around both function correctly in the presence of PTE markers, equally other operations relating to memory-mapped files function correctly. Additionally, read-only mappings if introducing guard-regions, only restrict the mapping further, which means there is no violation of any access rights by permitting this to be so. Removing this restriction allows for read-only mapped files (such as executable files) correctly which would otherwise not be permitted. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> --- mm/madvise.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)