
[2/3] mm/page_ref: Ensure page_ref_unfreeze is ordered against prior accesses

Message ID 1496771916-28203-3-git-send-email-will.deacon@arm.com
State New
Series mm: huge pages: Misc fixes for issues found during fuzzing

Commit Message

Will Deacon June 6, 2017, 5:58 p.m. UTC
page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
wrapping a critical section where struct pages can be modified without
having to worry about consistency for a concurrent fast-GUP.

Whilst page_ref_freeze has full barrier semantics due to its use of
atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
doesn't provide any barrier semantics and allows the operation to be
reordered with respect to page modifications in the critical section.

This patch ensures that page_ref_unfreeze is ordered after any critical
section updates, by invoking smp_mb__before_atomic() prior to the
atomic_set.

Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Acked-by: Steve Capper <steve.capper@arm.com>

Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 include/linux/page_ref.h | 1 +
 1 file changed, 1 insertion(+)

-- 
2.1.4
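
For context, the two helpers under discussion look roughly like this before
the patch (a paraphrase of include/linux/page_ref.h with the tracepoint hooks
omitted, not a verbatim copy):

	/* Freeze succeeds only if the refcount is exactly 'count'; the
	 * atomic_cmpxchg() gives it full barrier semantics. */
	static inline int page_ref_freeze(struct page *page, int count)
	{
		return likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
	}

	/* Unfreeze, pre-patch: a plain atomic_set(), i.e. an ordinary store
	 * with no ordering against the critical section that preceded it. */
	static inline void page_ref_unfreeze(struct page *page, int count)
	{
		VM_BUG_ON_PAGE(page_count(page) != 0, page);
		VM_BUG_ON(count == 0);

		atomic_set(&page->_refcount, count);
	}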

Comments

Vlastimil Babka June 8, 2017, 9:38 a.m. UTC | #1
On 06/06/2017 07:58 PM, Will Deacon wrote:
> page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
> wrapping a critical section where struct pages can be modified without
> having to worry about consistency for a concurrent fast-GUP.
> 
> Whilst page_ref_freeze has full barrier semantics due to its use of
> atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
> doesn't provide any barrier semantics and allows the operation to be
> reordered with respect to page modifications in the critical section.
> 
> This patch ensures that page_ref_unfreeze is ordered after any critical
> section updates, by invoking smp_mb__before_atomic() prior to the
> atomic_set.
> 
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Acked-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Undecided if it's really needed. This is IMHO not the classical case
from Documentation/core-api/atomic_ops.rst where we have to make
modifications visible before we let others see them? Here the one who is
freezing is doing it so others can't get their page pin and interfere
with the freezer's work. But maybe there are some (documented or not)
consistency guarantees to expect once you obtain the pin, that can be
violated, or they might be added later, so it would be safer to add the
barrier?

> ---
>  include/linux/page_ref.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> index 610e13271918..74d32d7905cb 100644
> --- a/include/linux/page_ref.h
> +++ b/include/linux/page_ref.h
> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
>  	VM_BUG_ON(count == 0);
>  
> +	smp_mb__before_atomic();
>  	atomic_set(&page->_refcount, count);
>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
>  		__page_ref_unfreeze(page, count);
>
Will Deacon June 8, 2017, 10:34 a.m. UTC | #2
On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
> On 06/06/2017 07:58 PM, Will Deacon wrote:
> > page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
> > wrapping a critical section where struct pages can be modified without
> > having to worry about consistency for a concurrent fast-GUP.
> > 
> > Whilst page_ref_freeze has full barrier semantics due to its use of
> > atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
> > doesn't provide any barrier semantics and allows the operation to be
> > reordered with respect to page modifications in the critical section.
> > 
> > This patch ensures that page_ref_unfreeze is ordered after any critical
> > section updates, by invoking smp_mb__before_atomic() prior to the
> > atomic_set.
> > 
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Acked-by: Steve Capper <steve.capper@arm.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> Undecided if it's really needed. This is IMHO not the classical case
> from Documentation/core-api/atomic_ops.rst where we have to make
> modifications visible before we let others see them? Here the one who is
> freezing is doing it so others can't get their page pin and interfere
> with the freezer's work. But maybe there are some (documented or not)
> consistency guarantees to expect once you obtain the pin, that can be
> violated, or they might be added later, so it would be safer to add the
> barrier?

The problem comes if the unfreeze is reordered so that it happens before the
freezer has performed its work. For example, in
migrate_huge_page_move_mapping:


	if (!page_ref_freeze(page, expected_count)) {
		spin_unlock_irq(&mapping->tree_lock);
		return -EAGAIN;
	}

	newpage->index = page->index;
	newpage->mapping = page->mapping;

	get_page(newpage);

	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);

	page_ref_unfreeze(page, expected_count - 1);


then there's nothing stopping the CPU (and potentially the compiler) from
reordering the unfreeze call so that it effectively becomes:


	if (!page_ref_freeze(page, expected_count)) {
		spin_unlock_irq(&mapping->tree_lock);
		return -EAGAIN;
	}

	page_ref_unfreeze(page, expected_count - 1);

	newpage->index = page->index;
	newpage->mapping = page->mapping;

	get_page(newpage);

	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);


which then means that the freezer's work is carried out without the page
being frozen.

Will
Kirill A. Shutemov June 8, 2017, 10:40 a.m. UTC | #3
On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
> On 06/06/2017 07:58 PM, Will Deacon wrote:
> > page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
> > wrapping a critical section where struct pages can be modified without
> > having to worry about consistency for a concurrent fast-GUP.
> > 
> > Whilst page_ref_freeze has full barrier semantics due to its use of
> > atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
> > doesn't provide any barrier semantics and allows the operation to be
> > reordered with respect to page modifications in the critical section.
> > 
> > This patch ensures that page_ref_unfreeze is ordered after any critical
> > section updates, by invoking smp_mb__before_atomic() prior to the
> > atomic_set.
> > 
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Acked-by: Steve Capper <steve.capper@arm.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> Undecided if it's really needed. This is IMHO not the classical case
> from Documentation/core-api/atomic_ops.rst where we have to make
> modifications visible before we let others see them? Here the one who is
> freezing is doing it so others can't get their page pin and interfere
> with the freezer's work.

Hm.. I'm not sure I'm getting what you are talking about. 

What would guarantee others to see changes to page before seeing page
unfreezed?

> >  include/linux/page_ref.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > index 610e13271918..74d32d7905cb 100644
> > --- a/include/linux/page_ref.h
> > +++ b/include/linux/page_ref.h
> > @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
> >  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
> >  	VM_BUG_ON(count == 0);
> >  
> > +	smp_mb__before_atomic();
> >  	atomic_set(&page->_refcount, count);

I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is
not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU
would happily reorder.

> >  	if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
> >  		__page_ref_unfreeze(page, count);
> >
>

-- 
 Kirill A. Shutemov
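
For reference, the definition Kirill alludes to: on x86 atomic_set() is just a
plain store (roughly as below, paraphrased from arch/x86/include/asm/atomic.h
of that era), so nothing in the call itself orders it against earlier writes;
whether that matters then depends on the architecture's ordering rules for
plain stores, which is what the rest of the thread debates.

	static __always_inline void atomic_set(atomic_t *v, int i)
	{
		WRITE_ONCE(v->counter, i);	/* plain store; no fence implied */
	}
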
Vlastimil Babka June 8, 2017, 11:02 a.m. UTC | #4
On 06/08/2017 12:34 PM, Will Deacon wrote:
> On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
>>
>> Undecided if it's really needed. This is IMHO not the classical case
>> from Documentation/core-api/atomic_ops.rst where we have to make
>> modifications visible before we let others see them? Here the one who is
>> freezing is doing it so others can't get their page pin and interfere
>> with the freezer's work. But maybe there are some (documented or not)
>> consistency guarantees to expect once you obtain the pin, that can be
>> violated, or they might be added later, so it would be safer to add the
>> barrier?
> 
> The problem comes if the unfreeze is reordered so that it happens before the
> freezer has performed its work. For example, in
> migrate_huge_page_move_mapping:
> 
> 	if (!page_ref_freeze(page, expected_count)) {
> 		spin_unlock_irq(&mapping->tree_lock);
> 		return -EAGAIN;
> 	}
> 
> 	newpage->index = page->index;
> 	newpage->mapping = page->mapping;
> 
> 	get_page(newpage);
> 
> 	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
> 
> 	page_ref_unfreeze(page, expected_count - 1);
> 
> then there's nothing stopping the CPU (and potentially the compiler) from
> reordering the unfreeze call so that it effectively becomes:
> 
> 	if (!page_ref_freeze(page, expected_count)) {
> 		spin_unlock_irq(&mapping->tree_lock);
> 		return -EAGAIN;
> 	}
> 
> 	page_ref_unfreeze(page, expected_count - 1);
> 
> 	newpage->index = page->index;
> 	newpage->mapping = page->mapping;
> 
> 	get_page(newpage);
> 
> 	radix_tree_replace_slot(&mapping->page_tree, pslot, newpage);
> 
> which then means that the freezer's work is carried out without the page
> being frozen.

But in this example the modifications are for newpage and freezing is
for page, so I think it doesn't apply. But I get the point.

Vlastimil Babka June 8, 2017, 11:07 a.m. UTC | #5
On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote:
> On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
>> On 06/06/2017 07:58 PM, Will Deacon wrote:
>>> page_ref_freeze and page_ref_unfreeze are designed to be used as a pair,
>>> wrapping a critical section where struct pages can be modified without
>>> having to worry about consistency for a concurrent fast-GUP.
>>>
>>> Whilst page_ref_freeze has full barrier semantics due to its use of
>>> atomic_cmpxchg, page_ref_unfreeze is implemented using atomic_set, which
>>> doesn't provide any barrier semantics and allows the operation to be
>>> reordered with respect to page modifications in the critical section.
>>>
>>> This patch ensures that page_ref_unfreeze is ordered after any critical
>>> section updates, by invoking smp_mb__before_atomic() prior to the
>>> atomic_set.
>>>
>>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Acked-by: Steve Capper <steve.capper@arm.com>
>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>
>> Undecided if it's really needed. This is IMHO not the classical case
>> from Documentation/core-api/atomic_ops.rst where we have to make
>> modifications visible before we let others see them? Here the one who is
>> freezing is doing it so others can't get their page pin and interfere
>> with the freezer's work.
> 
> Hm.. I'm not sure I'm getting what you are talking about.
> 
> What would guarantee others to see changes to page before seeing page
> unfreezed?

My point was that we do the freezing for other reasons than to guarantee
this, but it can be needed too.

>>>  include/linux/page_ref.h | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
>>> index 610e13271918..74d32d7905cb 100644
>>> --- a/include/linux/page_ref.h
>>> +++ b/include/linux/page_ref.h
>>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
>>>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
>>>  	VM_BUG_ON(count == 0);
>>>  
>>> +	smp_mb__before_atomic();
>>>  	atomic_set(&page->_refcount, count);
> 
> I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is
> not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU
> would happily reorder.

Yeah but there are compile barriers, and x86 is TSO, so that's enough?
Also I found other instances by git grep (not a proof, though :)

>>>  	if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
>>>  		__page_ref_unfreeze(page, count);
>>>
>>
>
Will Deacon June 8, 2017, 11:24 a.m. UTC | #6
[+ PeterZ]

On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote:
> On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote:
> > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
> >> On 06/06/2017 07:58 PM, Will Deacon wrote:
> >>>  include/linux/page_ref.h | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> >>> index 610e13271918..74d32d7905cb 100644
> >>> --- a/include/linux/page_ref.h
> >>> +++ b/include/linux/page_ref.h
> >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
> >>>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
> >>>  	VM_BUG_ON(count == 0);
> >>>  
> >>> +	smp_mb__before_atomic();
> >>>  	atomic_set(&page->_refcount, count);
> > 
> > I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is
> > not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU
> > would happily reorder.
> 
> Yeah but there are compile barriers, and x86 is TSO, so that's enough?
> Also I found other instances by git grep (not a proof, though :)

I think it boils down to whether:

	smp_mb__before_atomic();
	atomic_set();

should have the same memory ordering semantics as:

	smp_mb();
	atomic_set();

which it doesn't with the x86 implementation AFAICT.

The horribly out-of-date atomic_ops.txt isn't so useful:

| If a caller requires memory barrier semantics around an atomic_t
| operation which does not return a value, a set of interfaces are
| defined which accomplish this::
| 
| 	void smp_mb__before_atomic(void);
| 	void smp_mb__after_atomic(void);
| 
| For example, smp_mb__before_atomic() can be used like so::
| 
| 	obj->dead = 1;
| 	smp_mb__before_atomic();
| 	atomic_dec(&obj->ref_count);
| 
| It makes sure that all memory operations preceding the atomic_dec()
| call are strongly ordered with respect to the atomic counter
| operation.  In the above example, it guarantees that the assignment of
| "1" to obj->dead will be globally visible to other cpus before the
| atomic counter decrement.
| 
| Without the explicit smp_mb__before_atomic() call, the
| implementation could legally allow the atomic counter update visible
| to other cpus before the "obj->dead = 1;" assignment.

which makes it sound more like the barrier is ordering all prior accesses
against the atomic operation itself (without going near cumulativity...),
and not with respect to anything later in program order.

Anyway, I think that's sufficient for what we want here, but we should
probably iron out the semantics of this thing.

Will
Peter Zijlstra June 8, 2017, 12:16 p.m. UTC | #7
On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote:
> [+ PeterZ]
> 
> On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote:
> > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote:
> > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
> > >> On 06/06/2017 07:58 PM, Will Deacon wrote:
> > >>>  include/linux/page_ref.h | 1 +
> > >>>  1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > >>> index 610e13271918..74d32d7905cb 100644
> > >>> --- a/include/linux/page_ref.h
> > >>> +++ b/include/linux/page_ref.h
> > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
> > >>>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
> > >>>  	VM_BUG_ON(count == 0);
> > >>>  
> > >>> +	smp_mb__before_atomic();
> > >>>  	atomic_set(&page->_refcount, count);

Yeah, that's broken. smp_mb__{before,after}_atomic() goes with the
atomic RmW ops that do not already imply barriers, such like
atomic_add() and atomic_fetch_add_relaxed().

atomic_set() and atomic_read() are not RmW ops, they are plain
write/read ops respectively.

> > > 
> > > I *think* it should be smp_mb(), not __before_atomic(). atomic_set() is
> > > not really atomic. For instance on x86 it's plain WRITE_ONCE() which CPU
> > > would happily reorder.
> > 
> > Yeah but there are compile barriers, and x86 is TSO, so that's enough?
> > Also I found other instances by git grep (not a proof, though :)
> 
> I think it boils down to whether:
> 
> 	smp_mb__before_atomic();
> 	atomic_set();
> 
> should have the same memory ordering semantics as:
> 
> 	smp_mb();
> 	atomic_set();
> 
> which it doesn't with the x86 implementation AFAICT.

Correct, it doesn't.

The smp_mb__{before,after}_atomic() are to provide those barriers that
are required to upgrade the atomic RmW primitive of the architecture.

x86 has LOCK prefix instructions that are SC and therefore don't need
any upgrading.

ARM OTOH has unordered LL/SC and will need full DMB(ISH) for both.

MIPS has pretty much all variants under the sun, strongly ordered LL/SC,
half ordered LL/SC and weakly ordered LL/SC..

> The horribly out-of-date atomic_ops.txt isn't so useful:
> 
> | If a caller requires memory barrier semantics around an atomic_t
> | operation which does not return a value, a set of interfaces are
> | defined which accomplish this::
> | 
> | 	void smp_mb__before_atomic(void);
> | 	void smp_mb__after_atomic(void);
> | 
> | For example, smp_mb__before_atomic() can be used like so::
> | 
> | 	obj->dead = 1;
> | 	smp_mb__before_atomic();
> | 	atomic_dec(&obj->ref_count);
> | 
> | It makes sure that all memory operations preceding the atomic_dec()
> | call are strongly ordered with respect to the atomic counter
> | operation.  In the above example, it guarantees that the assignment of
> | "1" to obj->dead will be globally visible to other cpus before the
> | atomic counter decrement.
> | 
> | Without the explicit smp_mb__before_atomic() call, the
> | implementation could legally allow the atomic counter update visible
> | to other cpus before the "obj->dead = 1;" assignment.
> 
> which makes it sound more like the barrier is ordering all prior accesses
> against the atomic operation itself (without going near cumulativity...),
> and not with respect to anything later in program order.

This is correct.

> Anyway, I think that's sufficient for what we want here, but we should
> probably iron out the semantics of this thing.

s/smp_mb__\(before\|after\)_atomic/smp_mb/g

should not change the semantics of the code in _any_ way, just make it
slower on architectures that already have SC atomic primitives (like
x86).
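
Concretely, the "upgrade" Peter describes means the helper expands differently
per architecture; a rough sketch of the relevant headers at the time (not
verbatim, and the exact spellings have since changed):

	/* arch/x86/include/asm/barrier.h: LOCK-prefixed RmW atomics are
	 * already fully ordered, so only a compiler barrier is needed. */
	#define __smp_mb__before_atomic()	barrier()

	/* include/asm-generic/barrier.h (picked up by arm64): LL/SC atomics
	 * are unordered, so the helper must be a real barrier (DMB ISH). */
	#define __smp_mb__before_atomic()	__smp_mb()
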
Peter Zijlstra June 8, 2017, 12:19 p.m. UTC | #8
On Thu, Jun 08, 2017 at 02:16:41PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote:
> 
> > The horribly out-of-date atomic_ops.txt isn't so useful:
> > 
> > | If a caller requires memory barrier semantics around an atomic_t
> > | operation which does not return a value, a set of interfaces are
> > | defined which accomplish this::
> > | 
> > | 	void smp_mb__before_atomic(void);
> > | 	void smp_mb__after_atomic(void);
> > | 
> > | For example, smp_mb__before_atomic() can be used like so::
> > | 
> > | 	obj->dead = 1;
> > | 	smp_mb__before_atomic();
> > | 	atomic_dec(&obj->ref_count);
> > | 
> > | It makes sure that all memory operations preceding the atomic_dec()
> > | call are strongly ordered with respect to the atomic counter
> > | operation.  In the above example, it guarantees that the assignment of
> > | "1" to obj->dead will be globally visible to other cpus before the
> > | atomic counter decrement.
> > | 
> > | Without the explicit smp_mb__before_atomic() call, the
> > | implementation could legally allow the atomic counter update visible
> > | to other cpus before the "obj->dead = 1;" assignment.
> > 
> > which makes it sound more like the barrier is ordering all prior accesses
> > against the atomic operation itself (without going near cumulativity...),
> > and not with respect to anything later in program order.
> 
> This is correct.

Ah, my bad, It orders against everything later, the first of which is
(obviously) the atomic op itself.

It being a full barrier means both the Read and the Write of the RmW
must happen _after_ everything preceding.

> > Anyway, I think that's sufficient for what we want here, but we should
> > probably iron out the semantics of this thing.
> 
> s/smp_mb__\(before\|after\)_atomic/smp_mb/g
> 
> should not change the semantics of the code in _any_ way, just make it
> slower on architectures that already have SC atomic primitives (like
> x86).
>
Peter Zijlstra June 8, 2017, 12:50 p.m. UTC | #9
On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote:
> [+ PeterZ]
> 
> On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote:
> > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote:
> > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:
> > >> On 06/06/2017 07:58 PM, Will Deacon wrote:
> > >>>  include/linux/page_ref.h | 1 +
> > >>>  1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
> > >>> index 610e13271918..74d32d7905cb 100644
> > >>> --- a/include/linux/page_ref.h
> > >>> +++ b/include/linux/page_ref.h
> > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)
> > >>>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);
> > >>>  	VM_BUG_ON(count == 0);
> > >>>  
> > >>> +	smp_mb__before_atomic();
> > >>>  	atomic_set(&page->_refcount, count);

So depending on what it actually required, we do have
atomic_set_release() (atomic_t equivalent to smp_store_release()).
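
For illustration, the alternative Peter is pointing at would make publishing
the new refcount a release operation; a sketch only (the follow-up below shows
Will preferring a full barrier instead):

	static inline void page_ref_unfreeze(struct page *page, int count)
	{
		VM_BUG_ON_PAGE(page_count(page) != 0, page);
		VM_BUG_ON(count == 0);

		/* Release semantics: all prior loads/stores in the critical
		 * section are ordered before the store of the new refcount. */
		atomic_set_release(&page->_refcount, count);
		if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
			__page_ref_unfreeze(page, count);
	}
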
Will Deacon June 9, 2017, 10:05 a.m. UTC | #10
On Thu, Jun 08, 2017 at 02:50:59PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 08, 2017 at 12:24:33PM +0100, Will Deacon wrote:

> > [+ PeterZ]

> > 

> > On Thu, Jun 08, 2017 at 01:07:02PM +0200, Vlastimil Babka wrote:

> > > On 06/08/2017 12:40 PM, Kirill A. Shutemov wrote:

> > > > On Thu, Jun 08, 2017 at 11:38:21AM +0200, Vlastimil Babka wrote:

> > > >> On 06/06/2017 07:58 PM, Will Deacon wrote:

> > > >>>  include/linux/page_ref.h | 1 +

> > > >>>  1 file changed, 1 insertion(+)

> > > >>>

> > > >>> diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h

> > > >>> index 610e13271918..74d32d7905cb 100644

> > > >>> --- a/include/linux/page_ref.h

> > > >>> +++ b/include/linux/page_ref.h

> > > >>> @@ -174,6 +174,7 @@ static inline void page_ref_unfreeze(struct page *page, int count)

> > > >>>  	VM_BUG_ON_PAGE(page_count(page) != 0, page);

> > > >>>  	VM_BUG_ON(count == 0);

> > > >>>  

> > > >>> +	smp_mb__before_atomic();

> > > >>>  	atomic_set(&page->_refcount, count);

> 

> So depending on what it actually required, we do have

> atomic_set_release() (atomic_t equivalent to smp_store_release()).


Yeah, I was wondering about that this morning. I think it should do the
trick here, but smp_mb() would be a better fit for the other parts of this
API (page_ref_freeze uses atomic_cmpxchg and page_cache_get_speculative
uses atomic_add_unless).

I'll send a v2 with the full barrier.

Will
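
Based on that closing comment, the v2 presumably replaces the
smp_mb__before_atomic() in the hunk below with a full smp_mb(), i.e. roughly:

	/* v2 (sketch): use a full barrier, matching the implied ordering of
	 * the cmpxchg in page_ref_freeze and the add_unless in
	 * page_cache_get_speculative. */
	smp_mb();
	atomic_set(&page->_refcount, count);
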

Patch

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 610e13271918..74d32d7905cb 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -174,6 +174,7 @@  static inline void page_ref_unfreeze(struct page *page, int count)
 	VM_BUG_ON_PAGE(page_count(page) != 0, page);
 	VM_BUG_ON(count == 0);
 
+	smp_mb__before_atomic();
 	atomic_set(&page->_refcount, count);
 	if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
 		__page_ref_unfreeze(page, count);