[3/3] mm: migrate: Stabilise page count when migrating transparent hugepages

Message ID 1496771916-28203-4-git-send-email-will.deacon@arm.com
State Superseded
Series mm: huge pages: Misc fixes for issues found during fuzzing

Commit Message

Will Deacon June 6, 2017, 5:58 p.m. UTC
When migrating a transparent hugepage, migrate_misplaced_transhuge_page
guards itself against a concurrent fastgup of the page by checking that
the page count is equal to 2 before and after installing the new pmd.

If the page count changes, then the pmd is reverted to the original
entry; however, there is a small window where the new (possibly writable)
pmd is installed and the underlying page could be written by userspace.
Restoring the old pmd could therefore result in loss of data.

This patch fixes the problem by freezing the page count whilst updating
the page tables, which protects against a concurrent fastgup without the
need to restore the old pmd in the failure case (since the page count can
no longer change under our feet).

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 mm/migrate.c | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

-- 
2.1.4
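
For reference, the freeze/unfreeze primitives used by the patch behave
roughly as follows (a simplified sketch of include/linux/page_ref.h from
this era; tracepoints and debug assertions omitted):

    /* Drop the refcount from 'count' to 0 iff nobody else holds a
     * reference; while frozen, speculative lookups cannot take one. */
    static inline int page_ref_freeze(struct page *page, int count)
    {
            return likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
    }

    /* Restore the refcount once the page tables are consistent again. */
    static inline void page_ref_unfreeze(struct page *page, int count)
    {
            atomic_set(&page->_refcount, count);
    }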

Comments

Kirill A. Shutemov June 8, 2017, 10:47 a.m. UTC | #1
On Tue, Jun 06, 2017 at 06:58:36PM +0100, Will Deacon wrote:
> When migrating a transparent hugepage, migrate_misplaced_transhuge_page
> guards itself against a concurrent fastgup of the page by checking that
> the page count is equal to 2 before and after installing the new pmd.
> 
> If the page count changes, then the pmd is reverted back to the original
> entry, however there is a small window where the new (possibly writable)
> pmd is installed and the underlying page could be written by userspace.
> Restoring the old pmd could therefore result in loss of data.
> 
> This patch fixes the problem by freezing the page count whilst updating
> the page tables, which protects against a concurrent fastgup without the
> need to restore the old pmd in the failure case (since the page count can
> no longer change under our feet).
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Looks correct to me.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>


-- 
 Kirill A. Shutemov
Vlastimil Babka June 8, 2017, 10:52 a.m. UTC | #2
On 06/06/2017 07:58 PM, Will Deacon wrote:
> When migrating a transparent hugepage, migrate_misplaced_transhuge_page
> guards itself against a concurrent fastgup of the page by checking that
> the page count is equal to 2 before and after installing the new pmd.
> 
> If the page count changes, then the pmd is reverted back to the original
> entry, however there is a small window where the new (possibly writable)
> pmd is installed and the underlying page could be written by userspace.
> Restoring the old pmd could therefore result in loss of data.
> 
> This patch fixes the problem by freezing the page count whilst updating
> the page tables, which protects against a concurrent fastgup without the
> need to restore the old pmd in the failure case (since the page count can
> no longer change under our feet).
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  mm/migrate.c | 15 ++-------------
>  1 file changed, 2 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 89a0a1707f4c..8b21f1b1ec6e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  	int page_lru = page_is_file_cache(page);
>  	unsigned long mmun_start = address & HPAGE_PMD_MASK;
>  	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
> -	pmd_t orig_entry;
>  
>  	/*
>  	 * Rate-limit the amount of data that is being migrated to a node.
> @@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  	/* Recheck the target PMD */
>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>  	ptl = pmd_lock(mm, pmd);
> -	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
> -fail_putback:
> +	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
>  		spin_unlock(ptl);
>  		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>  
> @@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  		goto out_unlock;
>  	}
>  
> -	orig_entry = *pmd;
>  	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
>  	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>  
> @@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,

There's a comment above this:

       /*
        * Clear the old entry under pagetable lock and establish the new PTE.
        * Any parallel GUP will either observe the old page blocking on the
        * page lock, block on the page table lock or observe the new page.
        * The SetPageUptodate on the new page and page_add_new_anon_rmap
        * guarantee the copy is visible before the pagetable update.
        */

Is it still correct? Didn't the freezing prevent some of the cases above?

>  	set_pmd_at(mm, mmun_start, pmd, entry);
>  	update_mmu_cache_pmd(vma, address, &entry);
>  
> -	if (page_count(page) != 2) {

BTW, how did the old code recognize that page count would increase and then
decrease back?

> -		set_pmd_at(mm, mmun_start, pmd, orig_entry);
> -		flush_pmd_tlb_range(vma, mmun_start, mmun_end);
> -		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
> -		update_mmu_cache_pmd(vma, address, &entry);
> -		page_remove_rmap(new_page, true);
> -		goto fail_putback;
> -	}
> -
> +	page_ref_unfreeze(page, 2);
>  	mlock_migrate_page(new_page, page);
>  	page_remove_rmap(page, true);
>  	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
>
Will Deacon June 8, 2017, 12:07 p.m. UTC | #3
On Thu, Jun 08, 2017 at 12:52:07PM +0200, Vlastimil Babka wrote:
> On 06/06/2017 07:58 PM, Will Deacon wrote:
> > When migrating a transparent hugepage, migrate_misplaced_transhuge_page
> > guards itself against a concurrent fastgup of the page by checking that
> > the page count is equal to 2 before and after installing the new pmd.
> > 
> > If the page count changes, then the pmd is reverted back to the original
> > entry, however there is a small window where the new (possibly writable)
> > pmd is installed and the underlying page could be written by userspace.
> > Restoring the old pmd could therefore result in loss of data.
> > 
> > This patch fixes the problem by freezing the page count whilst updating
> > the page tables, which protects against a concurrent fastgup without the
> > need to restore the old pmd in the failure case (since the page count can
> > no longer change under our feet).
> > 
> > Cc: Mel Gorman <mgorman@suse.de>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  mm/migrate.c | 15 ++-------------
> >  1 file changed, 2 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 89a0a1707f4c..8b21f1b1ec6e 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >  	int page_lru = page_is_file_cache(page);
> >  	unsigned long mmun_start = address & HPAGE_PMD_MASK;
> >  	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
> > -	pmd_t orig_entry;
> >  
> >  	/*
> >  	 * Rate-limit the amount of data that is being migrated to a node.
> > @@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >  	/* Recheck the target PMD */
> >  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> >  	ptl = pmd_lock(mm, pmd);
> > -	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
> > -fail_putback:
> > +	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
> >  		spin_unlock(ptl);
> >  		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
> >  
> > @@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> >  		goto out_unlock;
> >  	}
> >  
> > -	orig_entry = *pmd;
> >  	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
> >  	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
> >  
> > @@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
> 
> There's a comment above this:
> 
>        /*
>         * Clear the old entry under pagetable lock and establish the new PTE.
>         * Any parallel GUP will either observe the old page blocking on the
>         * page lock, block on the page table lock or observe the new page.
>         * The SetPageUptodate on the new page and page_add_new_anon_rmap
>         * guarantee the copy is visible before the pagetable update.
>         */
> 
> Is it still correct? Didn't the freezing prevent some of the cases above?

I don't think the comment needs to change; the freezing is just doing
correctly what the code tried to do before. Granted, the blocking might come
about because of the count momentarily being set to 0 (and
page_cache_add_speculative bailing), but that's just fastGUP implementation
details, I think.
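
For reference, the speculative reference taken on the fastGUP path boils
down to something like the following (a simplified sketch; the exact call
chain varies by kernel version and configuration), which is why a count
frozen at zero makes it bail to the slow path:

    /* No reference is taken if _refcount is 0, i.e. while frozen. */
    static inline int get_page_unless_zero(struct page *page)
    {
            return atomic_inc_not_zero(&page->_refcount);
    }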

> 
> >  	set_pmd_at(mm, mmun_start, pmd, entry);
> >  	update_mmu_cache_pmd(vma, address, &entry);
> >  
> > -	if (page_count(page) != 2) {
> 
> BTW, how did the old code recognize that page count would increase and then
> decrease back?

I'm not sure that case matters because the inc/dec would happen before the
new PMD is put in place (otherwise it wouldn't be reachable via the
fastGUP).
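
An illustrative interleaving with the freeze in place (a sketch, not taken
from the patch itself):

    CPU 0 (migration)                      CPU 1 (fastGUP)
    pmd_lock(); pmd_same() succeeds
    page_ref_freeze(page, 2) succeeds      get_page_unless_zero() fails (0)
    set_pmd_at(new entry)                  falls back to the slow path,
    page_ref_unfreeze(page, 2)             blocks on ptl or sees the new page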

Will
zhong jiang June 9, 2017, 8:25 a.m. UTC | #4
On 2017/6/8 20:07, Will Deacon wrote:
> On Thu, Jun 08, 2017 at 12:52:07PM +0200, Vlastimil Babka wrote:
>> On 06/06/2017 07:58 PM, Will Deacon wrote:
>>> When migrating a transparent hugepage, migrate_misplaced_transhuge_page
>>> guards itself against a concurrent fastgup of the page by checking that
>>> the page count is equal to 2 before and after installing the new pmd.
>>>
>>> If the page count changes, then the pmd is reverted back to the original
>>> entry, however there is a small window where the new (possibly writable)
>>> pmd is installed and the underlying page could be written by userspace.
>>> Restoring the old pmd could therefore result in loss of data.
>>>
>>> This patch fixes the problem by freezing the page count whilst updating
>>> the page tables, which protects against a concurrent fastgup without the
>>> need to restore the old pmd in the failure case (since the page count can
>>> no longer change under our feet).
>>>
>>> Cc: Mel Gorman <mgorman@suse.de>
>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>> ---
>>>  mm/migrate.c | 15 ++-------------
>>>  1 file changed, 2 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 89a0a1707f4c..8b21f1b1ec6e 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	int page_lru = page_is_file_cache(page);
>>>  	unsigned long mmun_start = address & HPAGE_PMD_MASK;
>>>  	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
>>> -	pmd_t orig_entry;
>>>  
>>>  	/*
>>>  	 * Rate-limit the amount of data that is being migrated to a node.
>>> @@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	/* Recheck the target PMD */
>>>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>>>  	ptl = pmd_lock(mm, pmd);
>>> -	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
>>> -fail_putback:
>>> +	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
>>>  		spin_unlock(ptl);
>>>  		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>>>  
>>> @@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  		goto out_unlock;
>>>  	}
>>>  
>>> -	orig_entry = *pmd;
>>>  	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
>>>  	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>>>  
>>> @@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>> There's a comment above this:
>>
>>        /*
>>         * Clear the old entry under pagetable lock and establish the new PTE.
>>         * Any parallel GUP will either observe the old page blocking on the
>>         * page lock, block on the page table lock or observe the new page.
>>         * The SetPageUptodate on the new page and page_add_new_anon_rmap
>>         * guarantee the copy is visible before the pagetable update.
>>         */
>>
>> Is it still correct? Didn't the freezing prevent some of the cases above?
> I don't think the comment needs to change, the freezing is just doing
> correctly what the code tried to do before. Granted, the blocking might come
> about because of the count momentarily being set to 0 (and
> page_cache_add_speculative bailing), but that's just fastGUP implementation
> details, I think.

  The pagetable lock will prevent the fastgup by userspace, so I do not
  understand where the race comes from. Can you explain it, please?

 Thanks
 zhongjiang
>>>  	set_pmd_at(mm, mmun_start, pmd, entry);
>>>  	update_mmu_cache_pmd(vma, address, &entry);
>>>  
>>> -	if (page_count(page) != 2) {
>> BTW, how did the old code recognize that page count would increase and then
>> decrease back?
> I'm not sure that case matters because the inc/dec would happen before the
> new PMD is put in place (otherwise it wouldn't be reachable via the
> fastGUP).
>
> Will

zhong jiang June 9, 2017, 9:16 a.m. UTC | #5
On 2017/6/8 20:07, Will Deacon wrote:
> On Thu, Jun 08, 2017 at 12:52:07PM +0200, Vlastimil Babka wrote:
>> On 06/06/2017 07:58 PM, Will Deacon wrote:
>>> When migrating a transparent hugepage, migrate_misplaced_transhuge_page
>>> guards itself against a concurrent fastgup of the page by checking that
>>> the page count is equal to 2 before and after installing the new pmd.
>>>
>>> If the page count changes, then the pmd is reverted back to the original
>>> entry, however there is a small window where the new (possibly writable)
>>> pmd is installed and the underlying page could be written by userspace.
>>> Restoring the old pmd could therefore result in loss of data.
>>>
>>> This patch fixes the problem by freezing the page count whilst updating
>>> the page tables, which protects against a concurrent fastgup without the
>>> need to restore the old pmd in the failure case (since the page count can
>>> no longer change under our feet).
>>>
>>> Cc: Mel Gorman <mgorman@suse.de>
>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>> ---
>>>  mm/migrate.c | 15 ++-------------
>>>  1 file changed, 2 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 89a0a1707f4c..8b21f1b1ec6e 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	int page_lru = page_is_file_cache(page);
>>>  	unsigned long mmun_start = address & HPAGE_PMD_MASK;
>>>  	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
>>> -	pmd_t orig_entry;
>>>  
>>>  	/*
>>>  	 * Rate-limit the amount of data that is being migrated to a node.
>>> @@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	/* Recheck the target PMD */
>>>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>>>  	ptl = pmd_lock(mm, pmd);
>>> -	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
>>> -fail_putback:
>>> +	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
>>>  		spin_unlock(ptl);
>>>  		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>>>  
>>> @@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  		goto out_unlock;
>>>  	}
>>>  
>>> -	orig_entry = *pmd;
>>>  	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
>>>  	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>>>  
>>> @@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>> There's a comment above this:
>>
>>        /*
>>         * Clear the old entry under pagetable lock and establish the new PTE.
>>         * Any parallel GUP will either observe the old page blocking on the
>>         * page lock, block on the page table lock or observe the new page.
>>         * The SetPageUptodate on the new page and page_add_new_anon_rmap
>>         * guarantee the copy is visible before the pagetable update.
>>         */
>>
>> Is it still correct? Didn't the freezing prevent some of the cases above?
> I don't think the comment needs to change, the freezing is just doing
> correctly what the code tried to do before. Granted, the blocking might come
> about because of the count momentarily being set to 0 (and
> page_cache_add_speculative bailing), but that's just fastGUP implementation
> details, I think.

 I think it holds off the speculative lookup, but what about the fastgup by userspace?
>>>  	set_pmd_at(mm, mmun_start, pmd, entry);
>>>  	update_mmu_cache_pmd(vma, address, &entry);
>>>  
>>> -	if (page_count(page) != 2) {
>> BTW, how did the old code recognize that page count would increase and then
>> decrease back?
> I'm not sure that case matters because the inc/dec would happen before the
> new PMD is put in place (otherwise it wouldn't be reachable via the
> fastGUP).
>
> Will


Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 89a0a1707f4c..8b21f1b1ec6e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int page_lru = page_is_file_cache(page);
 	unsigned long mmun_start = address & HPAGE_PMD_MASK;
 	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
-	pmd_t orig_entry;
 
 	/*
 	 * Rate-limit the amount of data that is being migrated to a node.
@@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	/* Recheck the target PMD */
 	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	ptl = pmd_lock(mm, pmd);
-	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
-fail_putback:
+	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
 		spin_unlock(ptl);
 		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
@@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		goto out_unlock;
 	}
 
-	orig_entry = *pmd;
 	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
 	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 
@@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	set_pmd_at(mm, mmun_start, pmd, entry);
 	update_mmu_cache_pmd(vma, address, &entry);
 
-	if (page_count(page) != 2) {
-		set_pmd_at(mm, mmun_start, pmd, orig_entry);
-		flush_pmd_tlb_range(vma, mmun_start, mmun_end);
-		mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
-		update_mmu_cache_pmd(vma, address, &entry);
-		page_remove_rmap(new_page, true);
-		goto fail_putback;
-	}
-
+	page_ref_unfreeze(page, 2);
 	mlock_migrate_page(new_page, page);
 	page_remove_rmap(page, true);
 	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
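
With the patch applied, the rechecked path reads as follows (reconstructed
from the diff above; unrelated context elided with ...):

	/* Recheck the target PMD */
	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
	ptl = pmd_lock(mm, pmd);
	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
		spin_unlock(ptl);
		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
		...
		goto out_unlock;
	}

	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
	...
	set_pmd_at(mm, mmun_start, pmd, entry);
	update_mmu_cache_pmd(vma, address, &entry);

	page_ref_unfreeze(page, 2);
	mlock_migrate_page(new_page, page);
	page_remove_rmap(page, true);
	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);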