documentation: Fix two-CPU control-dependency example

Message ID	0d60072f-3848-a2b7-ce18-e1ed317b5c09@gmail.com
State	New
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> From: Akira Yokosawa <akiyks@gmail.com> Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Akira Yokosawa <akiyks@gmail.com> Subject: [PATCH] documentation: Fix two-CPU control-dependency example Message-ID: <0d60072f-3848-a2b7-ce18-e1ed317b5c09@gmail.com> Date: Mon, 17 Jul 2017 17:24:42 +0900 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk

Akira Yokosawa July 17, 2017, 8:24 a.m. UTC

From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
From: Akira Yokosawa <akiyks@gmail.com>

Date: Mon, 17 Jul 2017 16:25:33 +0900
Subject: [PATCH] documentation: Fix two-CPU control-dependency example

In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
no-transitivity example"), the operator in "if" statement of
the two-CPU example was modified from ">=" to ">".
Now the example misses the point because there is no party
who will modify "x" nor "y". So each CPU performs only the
READ_ONCE().

The point of this example is to use control dependency for ordering,
and the WRITE_ONCE() should always be executed.

So it was correct prior to the above mentioned commit. Partial
revert of the commit (with context adjustments regarding other
changes thereafter) restores the point.

Note that the three-CPU example demonstrating the lack of
transitivity stands regardless of this partial revert.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

---
 Documentation/memory-barriers.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.7.4

Paul E. McKenney July 19, 2017, 5:43 p.m. UTC | #1

On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> From: Akira Yokosawa <akiyks@gmail.com>

> Date: Mon, 17 Jul 2017 16:25:33 +0900

> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> 

> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> no-transitivity example"), the operator in "if" statement of

> the two-CPU example was modified from ">=" to ">".

> Now the example misses the point because there is no party

> who will modify "x" nor "y". So each CPU performs only the

> READ_ONCE().

> 

> The point of this example is to use control dependency for ordering,

> and the WRITE_ONCE() should always be executed.

> 

> So it was correct prior to the above mentioned commit. Partial

> revert of the commit (with context adjustments regarding other

> changes thereafter) restores the point.

> 

> Note that the three-CPU example demonstrating the lack of

> transitivity stands regardless of this partial revert.

> 

> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>


Hello, Akira,

You are quite right that many compilers would generate straightforward
code for the code fragment below, and in that case, the assertion could
never trigger due to either TSO or control dependencies, depending on
the architecture this was running on.

However, if the compiler was too smart for our good, it could figure
out that "x" and "y" only take on the values zero and one, so that
the "if" would always be taken.  At that point, the compiler could
simply ignore the "if" with the result shown below.

> ---

>  Documentation/memory-barriers.txt | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> index c4ddfcd..c1ebe99 100644

> --- a/Documentation/memory-barriers.txt

> +++ b/Documentation/memory-barriers.txt

> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

>  	CPU 0                     CPU 1

>  	=======================   =======================

>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> -	if (r1 > 0)               if (r2 > 0)

> +	if (r1 >= 0)              if (r2 >= 0)

>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

>  	assert(!(r1 == 1 && r2 == 1));


Original program:

 	CPU 0                     CPU 1
 	=======================   =======================
 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	if (r1 >= 0)              if (r2 >= 0)
 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

 	assert(!(r1 == 1 && r2 == 1));

Enthusiastically optimized program:

 	CPU 0                     CPU 1
 	=======================   =======================
 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
 	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

 	assert(!(r1 == 1 && r2 == 1));

This optimized version could trigger the assertion.

So we do need to stick with the ">" comparison.

That said, I very much welcome critical reviews of memory-barriers.txt,
so please do feel free to continue doing that!

							Thanx, Paul

Akira Yokosawa July 19, 2017, 9:33 p.m. UTC | #2

On 2017/07/20 2:43, Paul E. McKenney wrote:
> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

>> From: Akira Yokosawa <akiyks@gmail.com>

>> Date: Mon, 17 Jul 2017 16:25:33 +0900

>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

>>

>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

>> no-transitivity example"), the operator in "if" statement of

>> the two-CPU example was modified from ">=" to ">".

>> Now the example misses the point because there is no party

>> who will modify "x" nor "y". So each CPU performs only the

>> READ_ONCE().

>>

>> The point of this example is to use control dependency for ordering,

>> and the WRITE_ONCE() should always be executed.

>>

>> So it was correct prior to the above mentioned commit. Partial

>> revert of the commit (with context adjustments regarding other

>> changes thereafter) restores the point.

>>

>> Note that the three-CPU example demonstrating the lack of

>> transitivity stands regardless of this partial revert.

>>

>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> 

> Hello, Akira,

> 

> You are quite right that many compilers would generate straightforward

> code for the code fragment below, and in that case, the assertion could

> never trigger due to either TSO or control dependencies, depending on

> the architecture this was running on.

> 

> However, if the compiler was too smart for our good, it could figure

> out that "x" and "y" only take on the values zero and one, so that

> the "if" would always be taken.  At that point, the compiler could

> simply ignore the "if" with the result shown below.

> 

>> ---

>>  Documentation/memory-barriers.txt | 2 +-

>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>

>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

>> index c4ddfcd..c1ebe99 100644

>> --- a/Documentation/memory-barriers.txt

>> +++ b/Documentation/memory-barriers.txt

>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

>>  	CPU 0                     CPU 1

>>  	=======================   =======================

>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>> -	if (r1 > 0)               if (r2 > 0)

>> +	if (r1 >= 0)              if (r2 >= 0)

>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>

>>  	assert(!(r1 == 1 && r2 == 1));

> 

> Original program:

> 

>  	CPU 0                     CPU 1

>  	=======================   =======================

>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> 	if (r1 >= 0)              if (r2 >= 0)

>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

>  	assert(!(r1 == 1 && r2 == 1));

> 

> Enthusiastically optimized program:

> 

>  	CPU 0                     CPU 1

>  	=======================   =======================

>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

>  	assert(!(r1 == 1 && r2 == 1));

> 

> This optimized version could trigger the assertion.

> 

> So we do need to stick with the ">" comparison.


Well but,

Current example:

 	CPU 0                     CPU 1
 	=======================   =======================
 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	if (r1 > 0)               if (r2 > 0)
 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

	assert(!(r1 == 1 && r2 == 1));

Such a clever compiler might be able to prove that "x" and "y"
are never modified and end up in the following:

 	CPU 0                     CPU 1
 	=======================   =======================
 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

	assert(!(r1 == 1 && r2 == 1));

This means it is impossible to describe this example in C,
doesn't it?

What am I missing here?

            Thanks, Akira

> That said, I very much welcome critical reviews of memory-barriers.txt,

> so please do feel free to continue doing that!

> 

> 							Thanx, Paul

> 

>

Paul E. McKenney July 19, 2017, 9:56 p.m. UTC | #3

On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 2:43, Paul E. McKenney wrote:

> > On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> >> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> >> From: Akira Yokosawa <akiyks@gmail.com>

> >> Date: Mon, 17 Jul 2017 16:25:33 +0900

> >> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> >>

> >> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> >> no-transitivity example"), the operator in "if" statement of

> >> the two-CPU example was modified from ">=" to ">".

> >> Now the example misses the point because there is no party

> >> who will modify "x" nor "y". So each CPU performs only the

> >> READ_ONCE().

> >>

> >> The point of this example is to use control dependency for ordering,

> >> and the WRITE_ONCE() should always be executed.

> >>

> >> So it was correct prior to the above mentioned commit. Partial

> >> revert of the commit (with context adjustments regarding other

> >> changes thereafter) restores the point.

> >>

> >> Note that the three-CPU example demonstrating the lack of

> >> transitivity stands regardless of this partial revert.

> >>

> >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> > 

> > Hello, Akira,

> > 

> > You are quite right that many compilers would generate straightforward

> > code for the code fragment below, and in that case, the assertion could

> > never trigger due to either TSO or control dependencies, depending on

> > the architecture this was running on.

> > 

> > However, if the compiler was too smart for our good, it could figure

> > out that "x" and "y" only take on the values zero and one, so that

> > the "if" would always be taken.  At that point, the compiler could

> > simply ignore the "if" with the result shown below.

> > 

> >> ---

> >>  Documentation/memory-barriers.txt | 2 +-

> >>  1 file changed, 1 insertion(+), 1 deletion(-)

> >>

> >> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> >> index c4ddfcd..c1ebe99 100644

> >> --- a/Documentation/memory-barriers.txt

> >> +++ b/Documentation/memory-barriers.txt

> >> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> >>  	CPU 0                     CPU 1

> >>  	=======================   =======================

> >>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >> -	if (r1 > 0)               if (r2 > 0)

> >> +	if (r1 >= 0)              if (r2 >= 0)

> >>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>

> >>  	assert(!(r1 == 1 && r2 == 1));

> > 

> > Original program:

> > 

> >  	CPU 0                     CPU 1

> >  	=======================   =======================

> >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > 	if (r1 >= 0)              if (r2 >= 0)

> >  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > 

> >  	assert(!(r1 == 1 && r2 == 1));

> > 

> > Enthusiastically optimized program:

> > 

> >  	CPU 0                     CPU 1

> >  	=======================   =======================

> >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > 

> >  	assert(!(r1 == 1 && r2 == 1));

> > 

> > This optimized version could trigger the assertion.

> > 

> > So we do need to stick with the ">" comparison.

> 

> Well but,

> 

> Current example:

> 

>  	CPU 0                     CPU 1

>  	=======================   =======================

>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> 	if (r1 > 0)               if (r2 > 0)

>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

> 	assert(!(r1 == 1 && r2 == 1));

> 

> Such a clever compiler might be able to prove that "x" and "y"

> are never modified and end up in the following:

> 

>  	CPU 0                     CPU 1

>  	=======================   =======================

>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> 

> 	assert(!(r1 == 1 && r2 == 1));

> 

> This means it is impossible to describe this example in C,

> doesn't it?


That is a matter of some debate, but it has gotten increasingly more
difficult to get C to do one's bidding over the decades.

> What am I missing here?


The compiler has to work harder in your example case, so it is probably
just a matter of time.  In the original with ">=", all the compiler needed
to do was look at all the assignments and the initial value.  In the
original, it also had to do reasoning based on control dependencies
(which are not yet part of the C standard).

But yes, the degree to which a compiler can optimize atomics is a matter
of some debate.  Here is a proposal to let the developer choose:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

What are your thoughts on this?

							Thanx, Paul

>             Thanks, Akira

> 

> > That said, I very much welcome critical reviews of memory-barriers.txt,

> > so please do feel free to continue doing that!

> > 

> > 							Thanx, Paul

> > 

> > 

>

Boqun Feng July 20, 2017, 1:31 a.m. UTC | #4

On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

> > On 2017/07/20 2:43, Paul E. McKenney wrote:

> > > On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> > >> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> > >> From: Akira Yokosawa <akiyks@gmail.com>

> > >> Date: Mon, 17 Jul 2017 16:25:33 +0900

> > >> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> > >>

> > >> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> > >> no-transitivity example"), the operator in "if" statement of

> > >> the two-CPU example was modified from ">=" to ">".

> > >> Now the example misses the point because there is no party

> > >> who will modify "x" nor "y". So each CPU performs only the

> > >> READ_ONCE().

> > >>

> > >> The point of this example is to use control dependency for ordering,

> > >> and the WRITE_ONCE() should always be executed.

> > >>

> > >> So it was correct prior to the above mentioned commit. Partial

> > >> revert of the commit (with context adjustments regarding other

> > >> changes thereafter) restores the point.

> > >>

> > >> Note that the three-CPU example demonstrating the lack of

> > >> transitivity stands regardless of this partial revert.

> > >>

> > >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> > > 

> > > Hello, Akira,

> > > 

> > > You are quite right that many compilers would generate straightforward

> > > code for the code fragment below, and in that case, the assertion could

> > > never trigger due to either TSO or control dependencies, depending on

> > > the architecture this was running on.

> > > 

> > > However, if the compiler was too smart for our good, it could figure

> > > out that "x" and "y" only take on the values zero and one, so that

> > > the "if" would always be taken.  At that point, the compiler could

> > > simply ignore the "if" with the result shown below.

> > > 

> > >> ---

> > >>  Documentation/memory-barriers.txt | 2 +-

> > >>  1 file changed, 1 insertion(+), 1 deletion(-)

> > >>

> > >> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> > >> index c4ddfcd..c1ebe99 100644

> > >> --- a/Documentation/memory-barriers.txt

> > >> +++ b/Documentation/memory-barriers.txt

> > >> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> > >>  	CPU 0                     CPU 1

> > >>  	=======================   =======================

> > >>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > >> -	if (r1 > 0)               if (r2 > 0)

> > >> +	if (r1 >= 0)              if (r2 >= 0)

> > >>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > >>

> > >>  	assert(!(r1 == 1 && r2 == 1));

> > > 

> > > Original program:

> > > 

> > >  	CPU 0                     CPU 1

> > >  	=======================   =======================

> > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > 	if (r1 >= 0)              if (r2 >= 0)

> > >  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > 

> > >  	assert(!(r1 == 1 && r2 == 1));

> > > 

> > > Enthusiastically optimized program:

> > > 

> > >  	CPU 0                     CPU 1

> > >  	=======================   =======================

> > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > >  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > 

> > >  	assert(!(r1 == 1 && r2 == 1));

> > > 

> > > This optimized version could trigger the assertion.

> > > 

> > > So we do need to stick with the ">" comparison.

> > 

> > Well but,

> > 

> > Current example:

> > 

> >  	CPU 0                     CPU 1

> >  	=======================   =======================

> >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > 	if (r1 > 0)               if (r2 > 0)

> >  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > 

> > 	assert(!(r1 == 1 && r2 == 1));

> > 

> > Such a clever compiler might be able to prove that "x" and "y"

> > are never modified and end up in the following:

> > 


Hi Akira,

I wouldn't call that compiler a clever one, it's a broken one ;-)

So here is the thing: READ_ONCE() is a *volatile* load, which means the
compiler has to generate code that actually does a load, so the values
of r1 and r2 depend on the loads, therefore, a sane compiler will not 
optimize the if()s out because the volatile semantics of READ_ONCE().
Otherwise, I think we have a lot more to worry about than this case.

> >  	CPU 0                     CPU 1

> >  	=======================   =======================

> >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > 

> > 	assert(!(r1 == 1 && r2 == 1));

> > 

> > This means it is impossible to describe this example in C,

> > doesn't it?

> 

> That is a matter of some debate, but it has gotten increasingly more

> difficult to get C to do one's bidding over the decades.

> 

> > What am I missing here?

> 

> The compiler has to work harder in your example case, so it is probably

> just a matter of time.  In the original with ">=", all the compiler needed

> to do was look at all the assignments and the initial value.  In the

> original, it also had to do reasoning based on control dependencies

> (which are not yet part of the C standard).

> 

> But yes, the degree to which a compiler can optimize atomics is a matter

> of some debate.  Here is a proposal to let the developer choose:

> 


Hi Paul,

I know the compiler could optimize atomics in very interesting ways, but
this case is about volatile, so I guess our case is still fine? ;-)

> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> 


Great material to wake up mind in the morning! Thanks.

Regards,
Boqun

> What are your thoughts on this?

> 

> 							Thanx, Paul

> 

> >             Thanks, Akira

> > 

> > > That said, I very much welcome critical reviews of memory-barriers.txt,

> > > so please do feel free to continue doing that!

> > > 

> > > 							Thanx, Paul

> > > 

> > > 

> > 

>

Paul E. McKenney July 20, 2017, 5:47 a.m. UTC | #5

On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

> > On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

> > > On 2017/07/20 2:43, Paul E. McKenney wrote:

> > > > On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> > > >> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> > > >> From: Akira Yokosawa <akiyks@gmail.com>

> > > >> Date: Mon, 17 Jul 2017 16:25:33 +0900

> > > >> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> > > >>

> > > >> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> > > >> no-transitivity example"), the operator in "if" statement of

> > > >> the two-CPU example was modified from ">=" to ">".

> > > >> Now the example misses the point because there is no party

> > > >> who will modify "x" nor "y". So each CPU performs only the

> > > >> READ_ONCE().

> > > >>

> > > >> The point of this example is to use control dependency for ordering,

> > > >> and the WRITE_ONCE() should always be executed.

> > > >>

> > > >> So it was correct prior to the above mentioned commit. Partial

> > > >> revert of the commit (with context adjustments regarding other

> > > >> changes thereafter) restores the point.

> > > >>

> > > >> Note that the three-CPU example demonstrating the lack of

> > > >> transitivity stands regardless of this partial revert.

> > > >>

> > > >> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> > > > 

> > > > Hello, Akira,

> > > > 

> > > > You are quite right that many compilers would generate straightforward

> > > > code for the code fragment below, and in that case, the assertion could

> > > > never trigger due to either TSO or control dependencies, depending on

> > > > the architecture this was running on.

> > > > 

> > > > However, if the compiler was too smart for our good, it could figure

> > > > out that "x" and "y" only take on the values zero and one, so that

> > > > the "if" would always be taken.  At that point, the compiler could

> > > > simply ignore the "if" with the result shown below.

> > > > 

> > > >> ---

> > > >>  Documentation/memory-barriers.txt | 2 +-

> > > >>  1 file changed, 1 insertion(+), 1 deletion(-)

> > > >>

> > > >> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> > > >> index c4ddfcd..c1ebe99 100644

> > > >> --- a/Documentation/memory-barriers.txt

> > > >> +++ b/Documentation/memory-barriers.txt

> > > >> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> > > >>  	CPU 0                     CPU 1

> > > >>  	=======================   =======================

> > > >>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > >> -	if (r1 > 0)               if (r2 > 0)

> > > >> +	if (r1 >= 0)              if (r2 >= 0)

> > > >>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > >>

> > > >>  	assert(!(r1 == 1 && r2 == 1));

> > > > 

> > > > Original program:

> > > > 

> > > >  	CPU 0                     CPU 1

> > > >  	=======================   =======================

> > > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > > 	if (r1 >= 0)              if (r2 >= 0)

> > > >  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > > 

> > > >  	assert(!(r1 == 1 && r2 == 1));

> > > > 

> > > > Enthusiastically optimized program:

> > > > 

> > > >  	CPU 0                     CPU 1

> > > >  	=======================   =======================

> > > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > >  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > > 

> > > >  	assert(!(r1 == 1 && r2 == 1));

> > > > 

> > > > This optimized version could trigger the assertion.

> > > > 

> > > > So we do need to stick with the ">" comparison.

> > > 

> > > Well but,

> > > 

> > > Current example:

> > > 

> > >  	CPU 0                     CPU 1

> > >  	=======================   =======================

> > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > 	if (r1 > 0)               if (r2 > 0)

> > >  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> > > 

> > > 	assert(!(r1 == 1 && r2 == 1));

> > > 

> > > Such a clever compiler might be able to prove that "x" and "y"

> > > are never modified and end up in the following:

> > > 

> 

> Hi Akira,

> 

> I wouldn't call that compiler a clever one, it's a broken one ;-)

> 

> So here is the thing: READ_ONCE() is a *volatile* load, which means the

> compiler has to generate code that actually does a load, so the values

> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

> optimize the if()s out because the volatile semantics of READ_ONCE().

> Otherwise, I think we have a lot more to worry about than this case.

> 

> > >  	CPU 0                     CPU 1

> > >  	=======================   =======================

> > >  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> > > 

> > > 	assert(!(r1 == 1 && r2 == 1));

> > > 

> > > This means it is impossible to describe this example in C,

> > > doesn't it?

> > 

> > That is a matter of some debate, but it has gotten increasingly more

> > difficult to get C to do one's bidding over the decades.

> > 

> > > What am I missing here?

> > 

> > The compiler has to work harder in your example case, so it is probably

> > just a matter of time.  In the original with ">=", all the compiler needed

> > to do was look at all the assignments and the initial value.  In the

> > original, it also had to do reasoning based on control dependencies

> > (which are not yet part of the C standard).

> > 

> > But yes, the degree to which a compiler can optimize atomics is a matter

> > of some debate.  Here is a proposal to let the developer choose:

> > 

> 

> Hi Paul,

> 

> I know the compiler could optimize atomics in very interesting ways, but

> this case is about volatile, so I guess our case is still fine? ;-)


Hello, Boqun,

When I asked that question, the answer I got was "the compiler must
emit the load instruction, but is under no obligation to actually use the
value loaded".

I don't happen to like that answer, by the way.  ;-)

							Thanx, Paul

> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> > 

> 

> Great material to wake up mind in the morning! Thanks.

> 

> Regards,

> Boqun

> 

> > What are your thoughts on this?

> > 

> > 							Thanx, Paul

> > 

> > >             Thanks, Akira

> > > 

> > > > That said, I very much welcome critical reviews of memory-barriers.txt,

> > > > so please do feel free to continue doing that!

> > > > 

> > > > 							Thanx, Paul

> > > > 

> > > > 

> > > 

> > 

>

Boqun Feng July 20, 2017, 6:14 a.m. UTC | #6

On Wed, Jul 19, 2017 at 10:47:04PM -0700, Paul E. McKenney wrote:
[...]
> > Hi Paul,

> > 

> > I know the compiler could optimize atomics in very interesting ways, but

> > this case is about volatile, so I guess our case is still fine? ;-)

> 

> Hello, Boqun,

> 

> When I asked that question, the answer I got was "the compiler must

> emit the load instruction, but is under no obligation to actually use the

> value loaded".

> 

> I don't happen to like that answer, by the way.  ;-)

> 


Me neither, seems to me the kernel happens to work well at
compiler-optimization's mercy ;-/

With claim like that, compiler could do optimization as turning:

	struct task_struct *owner;

	for (;;) {
		owner = READ_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;
		/* ... */

into:

	struct task_struct *owner;

	owner = READ_ONCE(lock->owner);

	for (;;READ_ONCE(lock->owner)) {
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;
		/* ... */

Because the load executed in every loop, and they just happen to choose
not to use the values. And this is within their rights!

Regards,
Boqun

> 							Thanx, Paul

> 

> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> > > 

> > 

> > Great material to wake up mind in the morning! Thanks.

> > 

> > Regards,

> > Boqun

> > 

> > > What are your thoughts on this?

> > > 

> > > 							Thanx, Paul

> > > 

> > > >             Thanks, Akira

> > > > 

> > > > > That said, I very much welcome critical reviews of memory-barriers.txt,

> > > > > so please do feel free to continue doing that!

> > > > > 

> > > > > 							Thanx, Paul

> > > > > 

> > > > > 

> > > > 

> > > 

> > 

>

Paul E. McKenney July 20, 2017, 12:52 p.m. UTC | #7

On Thu, Jul 20, 2017 at 02:14:34PM +0800, Boqun Feng wrote:
> On Wed, Jul 19, 2017 at 10:47:04PM -0700, Paul E. McKenney wrote:

> [...]

> > > Hi Paul,

> > > 

> > > I know the compiler could optimize atomics in very interesting ways, but

> > > this case is about volatile, so I guess our case is still fine? ;-)

> > 

> > Hello, Boqun,

> > 

> > When I asked that question, the answer I got was "the compiler must

> > emit the load instruction, but is under no obligation to actually use the

> > value loaded".

> > 

> > I don't happen to like that answer, by the way.  ;-)

> > 

> 

> Me neither, seems to me the kernel happens to work well at

> compiler-optimization's mercy ;-/

> 

> With claim like that, compiler could do optimization as turning:

> 

> 	struct task_struct *owner;

> 

> 	for (;;) {

> 		owner = READ_ONCE(lock->owner);

> 		if (owner && !mutex_spin_on_owner(lock, owner))

> 			break;

> 		/* ... */

> 

> into:

> 

> 	struct task_struct *owner;

> 

> 	owner = READ_ONCE(lock->owner);

> 

> 	for (;;READ_ONCE(lock->owner)) {

> 		if (owner && !mutex_spin_on_owner(lock, owner))

> 			break;

> 		/* ... */

> 

> Because the load executed in every loop, and they just happen to choose

> not to use the values. And this is within their rights!


Well, this is one reason that I attend standards-committee meetings.
As does Will Deacon.  That way, there is someone there to protest when
people argue that the above behavior is just fine.  ;-)

							Thanx, Paul

> Regards,

> Boqun

> 

> > 							Thanx, Paul

> > 

> > > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> > > > 

> > > 

> > > Great material to wake up mind in the morning! Thanks.

> > > 

> > > Regards,

> > > Boqun

> > > 

> > > > What are your thoughts on this?

> > > > 

> > > > 							Thanx, Paul

> > > > 

> > > > >             Thanks, Akira

> > > > > 

> > > > > > That said, I very much welcome critical reviews of memory-barriers.txt,

> > > > > > so please do feel free to continue doing that!

> > > > > > 

> > > > > > 							Thanx, Paul

> > > > > > 

> > > > > > 

> > > > > 

> > > > 

> > > 

> > 

>

Akira Yokosawa July 20, 2017, 12:55 p.m. UTC | #8

On 2017/07/20 14:47, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

>>>>>> From: Akira Yokosawa <akiyks@gmail.com>

>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

>>>>>>

>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

>>>>>> no-transitivity example"), the operator in "if" statement of

>>>>>> the two-CPU example was modified from ">=" to ">".

>>>>>> Now the example misses the point because there is no party

>>>>>> who will modify "x" nor "y". So each CPU performs only the

>>>>>> READ_ONCE().

>>>>>>

>>>>>> The point of this example is to use control dependency for ordering,

>>>>>> and the WRITE_ONCE() should always be executed.

>>>>>>

>>>>>> So it was correct prior to the above mentioned commit. Partial

>>>>>> revert of the commit (with context adjustments regarding other

>>>>>> changes thereafter) restores the point.

>>>>>>

>>>>>> Note that the three-CPU example demonstrating the lack of

>>>>>> transitivity stands regardless of this partial revert.

>>>>>>

>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

>>>>>

>>>>> Hello, Akira,

>>>>>

>>>>> You are quite right that many compilers would generate straightforward

>>>>> code for the code fragment below, and in that case, the assertion could

>>>>> never trigger due to either TSO or control dependencies, depending on

>>>>> the architecture this was running on.

>>>>>

>>>>> However, if the compiler was too smart for our good, it could figure

>>>>> out that "x" and "y" only take on the values zero and one, so that

>>>>> the "if" would always be taken.  At that point, the compiler could

>>>>> simply ignore the "if" with the result shown below.

>>>>>

>>>>>> ---

>>>>>>  Documentation/memory-barriers.txt | 2 +-

>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>>

>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

>>>>>> index c4ddfcd..c1ebe99 100644

>>>>>> --- a/Documentation/memory-barriers.txt

>>>>>> +++ b/Documentation/memory-barriers.txt

>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

>>>>>>  	CPU 0                     CPU 1

>>>>>>  	=======================   =======================

>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>> -	if (r1 > 0)               if (r2 > 0)

>>>>>> +	if (r1 >= 0)              if (r2 >= 0)

>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>

>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>

>>>>> Original program:

>>>>>

>>>>>  	CPU 0                     CPU 1

>>>>>  	=======================   =======================

>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>> 	if (r1 >= 0)              if (r2 >= 0)

>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>

>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>

>>>>> Enthusiastically optimized program:

>>>>>

>>>>>  	CPU 0                     CPU 1

>>>>>  	=======================   =======================

>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>

>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>

>>>>> This optimized version could trigger the assertion.

>>>>>

>>>>> So we do need to stick with the ">" comparison.

>>>>

>>>> Well but,

>>>>

>>>> Current example:

>>>>

>>>>  	CPU 0                     CPU 1

>>>>  	=======================   =======================

>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>> 	if (r1 > 0)               if (r2 > 0)

>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>

>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>

>>>> Such a clever compiler might be able to prove that "x" and "y"

>>>> are never modified and end up in the following:

>>>>

>>

>> Hi Akira,

>>

>> I wouldn't call that compiler a clever one, it's a broken one ;-)

>>

>> So here is the thing: READ_ONCE() is a *volatile* load, which means the

>> compiler has to generate code that actually does a load, so the values

>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

>> optimize the if()s out because the volatile semantics of READ_ONCE().

>> Otherwise, I think we have a lot more to worry about than this case.

>>

>>>>  	CPU 0                     CPU 1

>>>>  	=======================   =======================

>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>

>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>

>>>> This means it is impossible to describe this example in C,

>>>> doesn't it?

>>>

>>> That is a matter of some debate, but it has gotten increasingly more

>>> difficult to get C to do one's bidding over the decades.

>>>

>>>> What am I missing here?

>>>

>>> The compiler has to work harder in your example case, so it is probably

>>> just a matter of time.  In the original with ">=", all the compiler needed

>>> to do was look at all the assignments and the initial value.  In the

>>> original, it also had to do reasoning based on control dependencies

>>> (which are not yet part of the C standard).

>>>

>>> But yes, the degree to which a compiler can optimize atomics is a matter

>>> of some debate.  Here is a proposal to let the developer choose:

>>>

>>

>> Hi Paul,

>>

>> I know the compiler could optimize atomics in very interesting ways, but

>> this case is about volatile, so I guess our case is still fine? ;-)

> 

> Hello, Boqun,

> 

> When I asked that question, the answer I got was "the compiler must

> emit the load instruction, but is under no obligation to actually use the

> value loaded".


So, it sounds like the following description found in memory-barriers.txt
just above the example of lack of transitivity:

> However, stores are not speculated.  This means that ordering -is- provided

> for load-store control dependencies, as in the following example:

> 

> 	q = READ_ONCE(a);

> 	if (q) {

> 		WRITE_ONCE(b, 1);

> 	}

> 

> Control dependencies pair normally with other types of barriers.

> That said, please note that neither READ_ONCE() nor WRITE_ONCE()

> are optional! Without the READ_ONCE(), the compiler might combine the

> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

> the compiler might combine the store to 'b' with other stores to 'b'.

> Either can result in highly counterintuitive effects on ordering.

> 

> Worse yet, if the compiler is able to prove (say) that the value of

> variable 'a' is always non-zero, it would be well within its rights

> to optimize the original example by eliminating the "if" statement

> as follows:

> 

> 	q = a;

> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

> 

> So don't leave out the READ_ONCE().


does not stand if the answer needs to be taken seriously.
The READ_ONCE() is not good enough to prevent the "if" statement
from being eliminated.

Hmm... If we do need to care about such an extreme optimization,
we should not rely on any control dependency in C source code
at least until the compiler gets educated about control dependency.

Is there some reasonable compromise?

           Thanks, Akira

> 

> I don't happen to like that answer, by the way.  ;-)

> 

> 							Thanx, Paul

> 

>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

>>>

>>

>> Great material to wake up mind in the morning! Thanks.

>>

>> Regards,

>> Boqun

>>

>>> What are your thoughts on this?

>>>

>>> 							Thanx, Paul

>>>

>>>>             Thanks, Akira

>>>>

>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

>>>>> so please do feel free to continue doing that!

>>>>>

>>>>> 							Thanx, Paul

>>>>>

>>>>>

>>>>

>>>

>>

> 

>

Paul E. McKenney July 20, 2017, 4:11 p.m. UTC | #9

On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:47, Paul E. McKenney wrote:

> > On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

> >> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

> >>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

> >>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

> >>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> >>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> >>>>>> From: Akira Yokosawa <akiyks@gmail.com>

> >>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

> >>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> >>>>>>

> >>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> >>>>>> no-transitivity example"), the operator in "if" statement of

> >>>>>> the two-CPU example was modified from ">=" to ">".

> >>>>>> Now the example misses the point because there is no party

> >>>>>> who will modify "x" nor "y". So each CPU performs only the

> >>>>>> READ_ONCE().

> >>>>>>

> >>>>>> The point of this example is to use control dependency for ordering,

> >>>>>> and the WRITE_ONCE() should always be executed.

> >>>>>>

> >>>>>> So it was correct prior to the above mentioned commit. Partial

> >>>>>> revert of the commit (with context adjustments regarding other

> >>>>>> changes thereafter) restores the point.

> >>>>>>

> >>>>>> Note that the three-CPU example demonstrating the lack of

> >>>>>> transitivity stands regardless of this partial revert.

> >>>>>>

> >>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> >>>>>

> >>>>> Hello, Akira,

> >>>>>

> >>>>> You are quite right that many compilers would generate straightforward

> >>>>> code for the code fragment below, and in that case, the assertion could

> >>>>> never trigger due to either TSO or control dependencies, depending on

> >>>>> the architecture this was running on.

> >>>>>

> >>>>> However, if the compiler was too smart for our good, it could figure

> >>>>> out that "x" and "y" only take on the values zero and one, so that

> >>>>> the "if" would always be taken.  At that point, the compiler could

> >>>>> simply ignore the "if" with the result shown below.

> >>>>>

> >>>>>> ---

> >>>>>>  Documentation/memory-barriers.txt | 2 +-

> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

> >>>>>>

> >>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> >>>>>> index c4ddfcd..c1ebe99 100644

> >>>>>> --- a/Documentation/memory-barriers.txt

> >>>>>> +++ b/Documentation/memory-barriers.txt

> >>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> >>>>>>  	CPU 0                     CPU 1

> >>>>>>  	=======================   =======================

> >>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>> -	if (r1 > 0)               if (r2 > 0)

> >>>>>> +	if (r1 >= 0)              if (r2 >= 0)

> >>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>

> >>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>

> >>>>> Original program:

> >>>>>

> >>>>>  	CPU 0                     CPU 1

> >>>>>  	=======================   =======================

> >>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>> 	if (r1 >= 0)              if (r2 >= 0)

> >>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>

> >>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>

> >>>>> Enthusiastically optimized program:

> >>>>>

> >>>>>  	CPU 0                     CPU 1

> >>>>>  	=======================   =======================

> >>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>

> >>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>

> >>>>> This optimized version could trigger the assertion.

> >>>>>

> >>>>> So we do need to stick with the ">" comparison.

> >>>>

> >>>> Well but,

> >>>>

> >>>> Current example:

> >>>>

> >>>>  	CPU 0                     CPU 1

> >>>>  	=======================   =======================

> >>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>> 	if (r1 > 0)               if (r2 > 0)

> >>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>

> >>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>

> >>>> Such a clever compiler might be able to prove that "x" and "y"

> >>>> are never modified and end up in the following:

> >>>>

> >>

> >> Hi Akira,

> >>

> >> I wouldn't call that compiler a clever one, it's a broken one ;-)

> >>

> >> So here is the thing: READ_ONCE() is a *volatile* load, which means the

> >> compiler has to generate code that actually does a load, so the values

> >> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

> >> optimize the if()s out because the volatile semantics of READ_ONCE().

> >> Otherwise, I think we have a lot more to worry about than this case.

> >>

> >>>>  	CPU 0                     CPU 1

> >>>>  	=======================   =======================

> >>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>

> >>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>

> >>>> This means it is impossible to describe this example in C,

> >>>> doesn't it?

> >>>

> >>> That is a matter of some debate, but it has gotten increasingly more

> >>> difficult to get C to do one's bidding over the decades.

> >>>

> >>>> What am I missing here?

> >>>

> >>> The compiler has to work harder in your example case, so it is probably

> >>> just a matter of time.  In the original with ">=", all the compiler needed

> >>> to do was look at all the assignments and the initial value.  In the

> >>> original, it also had to do reasoning based on control dependencies

> >>> (which are not yet part of the C standard).

> >>>

> >>> But yes, the degree to which a compiler can optimize atomics is a matter

> >>> of some debate.  Here is a proposal to let the developer choose:

> >>>

> >>

> >> Hi Paul,

> >>

> >> I know the compiler could optimize atomics in very interesting ways, but

> >> this case is about volatile, so I guess our case is still fine? ;-)

> > 

> > Hello, Boqun,

> > 

> > When I asked that question, the answer I got was "the compiler must

> > emit the load instruction, but is under no obligation to actually use the

> > value loaded".

> 

> So, it sounds like the following description found in memory-barriers.txt

> just above the example of lack of transitivity:

> 

> > However, stores are not speculated.  This means that ordering -is- provided

> > for load-store control dependencies, as in the following example:

> > 

> > 	q = READ_ONCE(a);

> > 	if (q) {

> > 		WRITE_ONCE(b, 1);

> > 	}

> > 

> > Control dependencies pair normally with other types of barriers.

> > That said, please note that neither READ_ONCE() nor WRITE_ONCE()

> > are optional! Without the READ_ONCE(), the compiler might combine the

> > load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

> > the compiler might combine the store to 'b' with other stores to 'b'.

> > Either can result in highly counterintuitive effects on ordering.

> > 

> > Worse yet, if the compiler is able to prove (say) that the value of

> > variable 'a' is always non-zero, it would be well within its rights

> > to optimize the original example by eliminating the "if" statement

> > as follows:

> > 

> > 	q = a;

> > 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

> > 

> > So don't leave out the READ_ONCE().

> 

> does not stand if the answer needs to be taken seriously.

> The READ_ONCE() is not good enough to prevent the "if" statement

> from being eliminated.

> 

> Hmm... If we do need to care about such an extreme optimization,

> we should not rely on any control dependency in C source code

> at least until the compiler gets educated about control dependency.

> 

> Is there some reasonable compromise?


Right now, what we have are the volatile loads such as from READ_ONCE()
and WRITE_ONCE().  My defense of them has been that they need to properly
handle MMIO.  But not all compiler writers agree with me, so we should
program at least a bit defensively.  Though we should probably also
be writing tests to verify that the compiler is doing the right thing,
and those litmus tests should probably include your ">=" litmus test.
But we should not encourage developers to rely on it quite yet.

My current position is that compiler writers are currently unlikely to
make their compilers do deep reasoning around the memory model, and that
they are currently unlikely to do cross-translation-unit assigned-value
analysis (though LTO might be changing that).  But given a static atomic
that is visible only within a translation unit, I would expect them
to do value-range analysis.  Hence my reluctance to present the ">="
variant as a pattern to follow.

							Thanx, Paul


>            Thanks, Akira

> 

> > 

> > I don't happen to like that answer, by the way.  ;-)

> > 

> > 							Thanx, Paul

> > 

> >>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> >>>

> >>

> >> Great material to wake up mind in the morning! Thanks.

> >>

> >> Regards,

> >> Boqun

> >>

> >>> What are your thoughts on this?

> >>>

> >>> 							Thanx, Paul

> >>>

> >>>>             Thanks, Akira

> >>>>

> >>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

> >>>>> so please do feel free to continue doing that!

> >>>>>

> >>>>> 							Thanx, Paul

> >>>>>

> >>>>>

> >>>>

> >>>

> >>

> > 

> > 

>

Akira Yokosawa July 20, 2017, 9:12 p.m. UTC | #10

On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:

>> On 2017/07/20 14:47, Paul E. McKenney wrote:

>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

>>>>>>>> From: Akira Yokosawa <akiyks@gmail.com>

>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

>>>>>>>>

>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

>>>>>>>> no-transitivity example"), the operator in "if" statement of

>>>>>>>> the two-CPU example was modified from ">=" to ">".

>>>>>>>> Now the example misses the point because there is no party

>>>>>>>> who will modify "x" nor "y". So each CPU performs only the

>>>>>>>> READ_ONCE().

>>>>>>>>

>>>>>>>> The point of this example is to use control dependency for ordering,

>>>>>>>> and the WRITE_ONCE() should always be executed.

>>>>>>>>

>>>>>>>> So it was correct prior to the above mentioned commit. Partial

>>>>>>>> revert of the commit (with context adjustments regarding other

>>>>>>>> changes thereafter) restores the point.

>>>>>>>>

>>>>>>>> Note that the three-CPU example demonstrating the lack of

>>>>>>>> transitivity stands regardless of this partial revert.

>>>>>>>>

>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

>>>>>>>

>>>>>>> Hello, Akira,

>>>>>>>

>>>>>>> You are quite right that many compilers would generate straightforward

>>>>>>> code for the code fragment below, and in that case, the assertion could

>>>>>>> never trigger due to either TSO or control dependencies, depending on

>>>>>>> the architecture this was running on.

>>>>>>>

>>>>>>> However, if the compiler was too smart for our good, it could figure

>>>>>>> out that "x" and "y" only take on the values zero and one, so that

>>>>>>> the "if" would always be taken.  At that point, the compiler could

>>>>>>> simply ignore the "if" with the result shown below.

>>>>>>>

>>>>>>>> ---

>>>>>>>>  Documentation/memory-barriers.txt | 2 +-

>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>>>>

>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

>>>>>>>> index c4ddfcd..c1ebe99 100644

>>>>>>>> --- a/Documentation/memory-barriers.txt

>>>>>>>> +++ b/Documentation/memory-barriers.txt

>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>  	=======================   =======================

>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>> -	if (r1 > 0)               if (r2 > 0)

>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)

>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>>

>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>

>>>>>>> Original program:

>>>>>>>

>>>>>>>  	CPU 0                     CPU 1

>>>>>>>  	=======================   =======================

>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)

>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>

>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>

>>>>>>> Enthusiastically optimized program:

>>>>>>>

>>>>>>>  	CPU 0                     CPU 1

>>>>>>>  	=======================   =======================

>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>

>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>

>>>>>>> This optimized version could trigger the assertion.

>>>>>>>

>>>>>>> So we do need to stick with the ">" comparison.

>>>>>>

>>>>>> Well but,

>>>>>>

>>>>>> Current example:

>>>>>>

>>>>>>  	CPU 0                     CPU 1

>>>>>>  	=======================   =======================

>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>> 	if (r1 > 0)               if (r2 > 0)

>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>

>>>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>>>

>>>>>> Such a clever compiler might be able to prove that "x" and "y"

>>>>>> are never modified and end up in the following:

>>>>>>

>>>>

>>>> Hi Akira,

>>>>

>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)

>>>>

>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the

>>>> compiler has to generate code that actually does a load, so the values

>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

>>>> optimize the if()s out because the volatile semantics of READ_ONCE().

>>>> Otherwise, I think we have a lot more to worry about than this case.

>>>>

>>>>>>  	CPU 0                     CPU 1

>>>>>>  	=======================   =======================

>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>

>>>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>>>

>>>>>> This means it is impossible to describe this example in C,

>>>>>> doesn't it?

>>>>>

>>>>> That is a matter of some debate, but it has gotten increasingly more

>>>>> difficult to get C to do one's bidding over the decades.

>>>>>

>>>>>> What am I missing here?

>>>>>

>>>>> The compiler has to work harder in your example case, so it is probably

>>>>> just a matter of time.  In the original with ">=", all the compiler needed

>>>>> to do was look at all the assignments and the initial value.  In the

>>>>> original, it also had to do reasoning based on control dependencies

>>>>> (which are not yet part of the C standard).

>>>>>

>>>>> But yes, the degree to which a compiler can optimize atomics is a matter

>>>>> of some debate.  Here is a proposal to let the developer choose:

>>>>>

>>>>

>>>> Hi Paul,

>>>>

>>>> I know the compiler could optimize atomics in very interesting ways, but

>>>> this case is about volatile, so I guess our case is still fine? ;-)

>>>

>>> Hello, Boqun,

>>>

>>> When I asked that question, the answer I got was "the compiler must

>>> emit the load instruction, but is under no obligation to actually use the

>>> value loaded".

>>

>> So, it sounds like the following description found in memory-barriers.txt

>> just above the example of lack of transitivity:

>>

>>> However, stores are not speculated.  This means that ordering -is- provided

>>> for load-store control dependencies, as in the following example:

>>>

>>> 	q = READ_ONCE(a);

>>> 	if (q) {

>>> 		WRITE_ONCE(b, 1);

>>> 	}

>>>

>>> Control dependencies pair normally with other types of barriers.

>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()

>>> are optional! Without the READ_ONCE(), the compiler might combine the

>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

>>> the compiler might combine the store to 'b' with other stores to 'b'.

>>> Either can result in highly counterintuitive effects on ordering.

>>>

>>> Worse yet, if the compiler is able to prove (say) that the value of

>>> variable 'a' is always non-zero, it would be well within its rights

>>> to optimize the original example by eliminating the "if" statement

>>> as follows:

>>>

>>> 	q = a;

>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

>>>

>>> So don't leave out the READ_ONCE().

>>

>> does not stand if the answer needs to be taken seriously.

>> The READ_ONCE() is not good enough to prevent the "if" statement

>> from being eliminated.

>>

>> Hmm... If we do need to care about such an extreme optimization,

>> we should not rely on any control dependency in C source code

>> at least until the compiler gets educated about control dependency.

>>

>> Is there some reasonable compromise?

> 

> Right now, what we have are the volatile loads such as from READ_ONCE()

> and WRITE_ONCE().  My defense of them has been that they need to properly

> handle MMIO.  But not all compiler writers agree with me, so we should

> program at least a bit defensively.  Though we should probably also

> be writing tests to verify that the compiler is doing the right thing,

> and those litmus tests should probably include your ">=" litmus test.

> But we should not encourage developers to rely on it quite yet.

> 

> My current position is that compiler writers are currently unlikely to

> make their compilers do deep reasoning around the memory model, and that

> they are currently unlikely to do cross-translation-unit assigned-value

> analysis (though LTO might be changing that).  But given a static atomic

> that is visible only within a translation unit, I would expect them

> to do value-range analysis.  Hence my reluctance to present the ">="

> variant as a pattern to follow.


So if we declare "x" and "y" are not static (not file local),
can the ">=" variant be exempted from the possible optimization?

For example:

	extern int x, y;

	CPU 0                     CPU 1
	=======================   =======================
	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	if (r1 >= 0)              if (r2 >= 0)
	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

	assert(!(r1 == 1 && r2 == 1));

This looks safe to me.

Thoughts?

            Thanks, Akira

> 

> 							Thanx, Paul

> 

> 

>>            Thanks, Akira

>>

>>>

>>> I don't happen to like that answer, by the way.  ;-)

>>>

>>> 							Thanx, Paul

>>>

>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

>>>>>

>>>>

>>>> Great material to wake up mind in the morning! Thanks.

>>>>

>>>> Regards,

>>>> Boqun

>>>>

>>>>> What are your thoughts on this?

>>>>>

>>>>> 							Thanx, Paul

>>>>>

>>>>>>             Thanks, Akira

>>>>>>

>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

>>>>>>> so please do feel free to continue doing that!

>>>>>>>

>>>>>>> 							Thanx, Paul

>>>>>>>

>>>>>>>

>>>>>>

>>>>>

>>>>

>>>

>>>

>>

> 

>

Paul E. McKenney July 20, 2017, 9:42 p.m. UTC | #11

On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:

> > On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:

> >> On 2017/07/20 14:47, Paul E. McKenney wrote:

> >>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

> >>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

> >>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

> >>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

> >>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> >>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> >>>>>>>> From: Akira Yokosawa <akiyks@gmail.com>

> >>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

> >>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> >>>>>>>>

> >>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> >>>>>>>> no-transitivity example"), the operator in "if" statement of

> >>>>>>>> the two-CPU example was modified from ">=" to ">".

> >>>>>>>> Now the example misses the point because there is no party

> >>>>>>>> who will modify "x" nor "y". So each CPU performs only the

> >>>>>>>> READ_ONCE().

> >>>>>>>>

> >>>>>>>> The point of this example is to use control dependency for ordering,

> >>>>>>>> and the WRITE_ONCE() should always be executed.

> >>>>>>>>

> >>>>>>>> So it was correct prior to the above mentioned commit. Partial

> >>>>>>>> revert of the commit (with context adjustments regarding other

> >>>>>>>> changes thereafter) restores the point.

> >>>>>>>>

> >>>>>>>> Note that the three-CPU example demonstrating the lack of

> >>>>>>>> transitivity stands regardless of this partial revert.

> >>>>>>>>

> >>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> >>>>>>>

> >>>>>>> Hello, Akira,

> >>>>>>>

> >>>>>>> You are quite right that many compilers would generate straightforward

> >>>>>>> code for the code fragment below, and in that case, the assertion could

> >>>>>>> never trigger due to either TSO or control dependencies, depending on

> >>>>>>> the architecture this was running on.

> >>>>>>>

> >>>>>>> However, if the compiler was too smart for our good, it could figure

> >>>>>>> out that "x" and "y" only take on the values zero and one, so that

> >>>>>>> the "if" would always be taken.  At that point, the compiler could

> >>>>>>> simply ignore the "if" with the result shown below.

> >>>>>>>

> >>>>>>>> ---

> >>>>>>>>  Documentation/memory-barriers.txt | 2 +-

> >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

> >>>>>>>>

> >>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> >>>>>>>> index c4ddfcd..c1ebe99 100644

> >>>>>>>> --- a/Documentation/memory-barriers.txt

> >>>>>>>> +++ b/Documentation/memory-barriers.txt

> >>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> >>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>  	=======================   =======================

> >>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>> -	if (r1 > 0)               if (r2 > 0)

> >>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)

> >>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>>

> >>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>

> >>>>>>> Original program:

> >>>>>>>

> >>>>>>>  	CPU 0                     CPU 1

> >>>>>>>  	=======================   =======================

> >>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>> 	if (r1 >= 0)              if (r2 >= 0)

> >>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>

> >>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>

> >>>>>>> Enthusiastically optimized program:

> >>>>>>>

> >>>>>>>  	CPU 0                     CPU 1

> >>>>>>>  	=======================   =======================

> >>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>

> >>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>

> >>>>>>> This optimized version could trigger the assertion.

> >>>>>>>

> >>>>>>> So we do need to stick with the ">" comparison.

> >>>>>>

> >>>>>> Well but,

> >>>>>>

> >>>>>> Current example:

> >>>>>>

> >>>>>>  	CPU 0                     CPU 1

> >>>>>>  	=======================   =======================

> >>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>> 	if (r1 > 0)               if (r2 > 0)

> >>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>

> >>>>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>>>

> >>>>>> Such a clever compiler might be able to prove that "x" and "y"

> >>>>>> are never modified and end up in the following:

> >>>>>>

> >>>>

> >>>> Hi Akira,

> >>>>

> >>>> I wouldn't call that compiler a clever one, it's a broken one ;-)

> >>>>

> >>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the

> >>>> compiler has to generate code that actually does a load, so the values

> >>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

> >>>> optimize the if()s out because the volatile semantics of READ_ONCE().

> >>>> Otherwise, I think we have a lot more to worry about than this case.

> >>>>

> >>>>>>  	CPU 0                     CPU 1

> >>>>>>  	=======================   =======================

> >>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>

> >>>>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>>>

> >>>>>> This means it is impossible to describe this example in C,

> >>>>>> doesn't it?

> >>>>>

> >>>>> That is a matter of some debate, but it has gotten increasingly more

> >>>>> difficult to get C to do one's bidding over the decades.

> >>>>>

> >>>>>> What am I missing here?

> >>>>>

> >>>>> The compiler has to work harder in your example case, so it is probably

> >>>>> just a matter of time.  In the original with ">=", all the compiler needed

> >>>>> to do was look at all the assignments and the initial value.  In the

> >>>>> original, it also had to do reasoning based on control dependencies

> >>>>> (which are not yet part of the C standard).

> >>>>>

> >>>>> But yes, the degree to which a compiler can optimize atomics is a matter

> >>>>> of some debate.  Here is a proposal to let the developer choose:

> >>>>>

> >>>>

> >>>> Hi Paul,

> >>>>

> >>>> I know the compiler could optimize atomics in very interesting ways, but

> >>>> this case is about volatile, so I guess our case is still fine? ;-)

> >>>

> >>> Hello, Boqun,

> >>>

> >>> When I asked that question, the answer I got was "the compiler must

> >>> emit the load instruction, but is under no obligation to actually use the

> >>> value loaded".

> >>

> >> So, it sounds like the following description found in memory-barriers.txt

> >> just above the example of lack of transitivity:

> >>

> >>> However, stores are not speculated.  This means that ordering -is- provided

> >>> for load-store control dependencies, as in the following example:

> >>>

> >>> 	q = READ_ONCE(a);

> >>> 	if (q) {

> >>> 		WRITE_ONCE(b, 1);

> >>> 	}

> >>>

> >>> Control dependencies pair normally with other types of barriers.

> >>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()

> >>> are optional! Without the READ_ONCE(), the compiler might combine the

> >>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

> >>> the compiler might combine the store to 'b' with other stores to 'b'.

> >>> Either can result in highly counterintuitive effects on ordering.

> >>>

> >>> Worse yet, if the compiler is able to prove (say) that the value of

> >>> variable 'a' is always non-zero, it would be well within its rights

> >>> to optimize the original example by eliminating the "if" statement

> >>> as follows:

> >>>

> >>> 	q = a;

> >>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

> >>>

> >>> So don't leave out the READ_ONCE().

> >>

> >> does not stand if the answer needs to be taken seriously.

> >> The READ_ONCE() is not good enough to prevent the "if" statement

> >> from being eliminated.

> >>

> >> Hmm... If we do need to care about such an extreme optimization,

> >> we should not rely on any control dependency in C source code

> >> at least until the compiler gets educated about control dependency.

> >>

> >> Is there some reasonable compromise?

> > 

> > Right now, what we have are the volatile loads such as from READ_ONCE()

> > and WRITE_ONCE().  My defense of them has been that they need to properly

> > handle MMIO.  But not all compiler writers agree with me, so we should

> > program at least a bit defensively.  Though we should probably also

> > be writing tests to verify that the compiler is doing the right thing,

> > and those litmus tests should probably include your ">=" litmus test.

> > But we should not encourage developers to rely on it quite yet.

> > 

> > My current position is that compiler writers are currently unlikely to

> > make their compilers do deep reasoning around the memory model, and that

> > they are currently unlikely to do cross-translation-unit assigned-value

> > analysis (though LTO might be changing that).  But given a static atomic

> > that is visible only within a translation unit, I would expect them

> > to do value-range analysis.  Hence my reluctance to present the ">="

> > variant as a pattern to follow.

> 

> So if we declare "x" and "y" are not static (not file local),

> can the ">=" variant be exempted from the possible optimization?

> 

> For example:

> 

> 	extern int x, y;

> 

> 	CPU 0                     CPU 1

> 	=======================   =======================

> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> 	if (r1 >= 0)              if (r2 >= 0)

> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

> 	assert(!(r1 == 1 && r2 == 1));

> 

> This looks safe to me.


For the compilers I know about at the present time, yes.

The underlying problem is that the standard says almost nothing about what
volatile does.  I usually argue that it was intended to be used for MMIO,
so any optimization that would prevent a device driver from working must
be prohibited on volatiles.  A lot of people really don't like volatile,
and would like to eliminate it, and these people also aren't very happy
about any attempt to better define volatile.

Keeps things entertaining.  ;-)

							Thanx, Paul

> Thoughts?

> 

>             Thanks, Akira

> 

> > 

> > 							Thanx, Paul

> > 

> > 

> >>            Thanks, Akira

> >>

> >>>

> >>> I don't happen to like that answer, by the way.  ;-)

> >>>

> >>> 							Thanx, Paul

> >>>

> >>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> >>>>>

> >>>>

> >>>> Great material to wake up mind in the morning! Thanks.

> >>>>

> >>>> Regards,

> >>>> Boqun

> >>>>

> >>>>> What are your thoughts on this?

> >>>>>

> >>>>> 							Thanx, Paul

> >>>>>

> >>>>>>             Thanks, Akira

> >>>>>>

> >>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

> >>>>>>> so please do feel free to continue doing that!

> >>>>>>>

> >>>>>>> 							Thanx, Paul

> >>>>>>>

> >>>>>>>

> >>>>>>

> >>>>>

> >>>>

> >>>

> >>>

> >>

> > 

> > 

>

Akira Yokosawa July 20, 2017, 10:52 p.m. UTC | #12

On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:

>> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:

>>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:

>>>> On 2017/07/20 14:47, Paul E. McKenney wrote:

>>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

>>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

>>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

>>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

>>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

>>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

>>>>>>>>>> From: Akira Yokosawa <akiyks@gmail.com>

>>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

>>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

>>>>>>>>>>

>>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

>>>>>>>>>> no-transitivity example"), the operator in "if" statement of

>>>>>>>>>> the two-CPU example was modified from ">=" to ">".

>>>>>>>>>> Now the example misses the point because there is no party

>>>>>>>>>> who will modify "x" nor "y". So each CPU performs only the

>>>>>>>>>> READ_ONCE().

>>>>>>>>>>

>>>>>>>>>> The point of this example is to use control dependency for ordering,

>>>>>>>>>> and the WRITE_ONCE() should always be executed.

>>>>>>>>>>

>>>>>>>>>> So it was correct prior to the above mentioned commit. Partial

>>>>>>>>>> revert of the commit (with context adjustments regarding other

>>>>>>>>>> changes thereafter) restores the point.

>>>>>>>>>>

>>>>>>>>>> Note that the three-CPU example demonstrating the lack of

>>>>>>>>>> transitivity stands regardless of this partial revert.

>>>>>>>>>>

>>>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

>>>>>>>>>

>>>>>>>>> Hello, Akira,

>>>>>>>>>

>>>>>>>>> You are quite right that many compilers would generate straightforward

>>>>>>>>> code for the code fragment below, and in that case, the assertion could

>>>>>>>>> never trigger due to either TSO or control dependencies, depending on

>>>>>>>>> the architecture this was running on.

>>>>>>>>>

>>>>>>>>> However, if the compiler was too smart for our good, it could figure

>>>>>>>>> out that "x" and "y" only take on the values zero and one, so that

>>>>>>>>> the "if" would always be taken.  At that point, the compiler could

>>>>>>>>> simply ignore the "if" with the result shown below.

>>>>>>>>>

>>>>>>>>>> ---

>>>>>>>>>>  Documentation/memory-barriers.txt | 2 +-

>>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>>>>>>

>>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

>>>>>>>>>> index c4ddfcd..c1ebe99 100644

>>>>>>>>>> --- a/Documentation/memory-barriers.txt

>>>>>>>>>> +++ b/Documentation/memory-barriers.txt

>>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

>>>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>>>  	=======================   =======================

>>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>>>> -	if (r1 > 0)               if (r2 > 0)

>>>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)

>>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>>>>

>>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>>>

>>>>>>>>> Original program:

>>>>>>>>>

>>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>>  	=======================   =======================

>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)

>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>>>

>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>>>

>>>>>>>>> Enthusiastically optimized program:

>>>>>>>>>

>>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>>  	=======================   =======================

>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>>>

>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

>>>>>>>>>

>>>>>>>>> This optimized version could trigger the assertion.

>>>>>>>>>

>>>>>>>>> So we do need to stick with the ">" comparison.

>>>>>>>>

>>>>>>>> Well but,

>>>>>>>>

>>>>>>>> Current example:

>>>>>>>>

>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>  	=======================   =======================

>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>> 	if (r1 > 0)               if (r2 > 0)

>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>>>>>>>

>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>>>>>

>>>>>>>> Such a clever compiler might be able to prove that "x" and "y"

>>>>>>>> are never modified and end up in the following:

>>>>>>>>

>>>>>>

>>>>>> Hi Akira,

>>>>>>

>>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)

>>>>>>

>>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the

>>>>>> compiler has to generate code that actually does a load, so the values

>>>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

>>>>>> optimize the if()s out because the volatile semantics of READ_ONCE().

>>>>>> Otherwise, I think we have a lot more to worry about than this case.

>>>>>>

>>>>>>>>  	CPU 0                     CPU 1

>>>>>>>>  	=======================   =======================

>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>>>>>>>>

>>>>>>>> 	assert(!(r1 == 1 && r2 == 1));

>>>>>>>>

>>>>>>>> This means it is impossible to describe this example in C,

>>>>>>>> doesn't it?

>>>>>>>

>>>>>>> That is a matter of some debate, but it has gotten increasingly more

>>>>>>> difficult to get C to do one's bidding over the decades.

>>>>>>>

>>>>>>>> What am I missing here?

>>>>>>>

>>>>>>> The compiler has to work harder in your example case, so it is probably

>>>>>>> just a matter of time.  In the original with ">=", all the compiler needed

>>>>>>> to do was look at all the assignments and the initial value.  In the

>>>>>>> original, it also had to do reasoning based on control dependencies

>>>>>>> (which are not yet part of the C standard).

>>>>>>>

>>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter

>>>>>>> of some debate.  Here is a proposal to let the developer choose:

>>>>>>>

>>>>>>

>>>>>> Hi Paul,

>>>>>>

>>>>>> I know the compiler could optimize atomics in very interesting ways, but

>>>>>> this case is about volatile, so I guess our case is still fine? ;-)

>>>>>

>>>>> Hello, Boqun,

>>>>>

>>>>> When I asked that question, the answer I got was "the compiler must

>>>>> emit the load instruction, but is under no obligation to actually use the

>>>>> value loaded".

>>>>

>>>> So, it sounds like the following description found in memory-barriers.txt

>>>> just above the example of lack of transitivity:

>>>>

>>>>> However, stores are not speculated.  This means that ordering -is- provided

>>>>> for load-store control dependencies, as in the following example:

>>>>>

>>>>> 	q = READ_ONCE(a);

>>>>> 	if (q) {

>>>>> 		WRITE_ONCE(b, 1);

>>>>> 	}

>>>>>

>>>>> Control dependencies pair normally with other types of barriers.

>>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()

>>>>> are optional! Without the READ_ONCE(), the compiler might combine the

>>>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

>>>>> the compiler might combine the store to 'b' with other stores to 'b'.

>>>>> Either can result in highly counterintuitive effects on ordering.

>>>>>

>>>>> Worse yet, if the compiler is able to prove (say) that the value of

>>>>> variable 'a' is always non-zero, it would be well within its rights

>>>>> to optimize the original example by eliminating the "if" statement

>>>>> as follows:

>>>>>

>>>>> 	q = a;

>>>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

>>>>>

>>>>> So don't leave out the READ_ONCE().

>>>>

>>>> does not stand if the answer needs to be taken seriously.

>>>> The READ_ONCE() is not good enough to prevent the "if" statement

>>>> from being eliminated.

>>>>

>>>> Hmm... If we do need to care about such an extreme optimization,

>>>> we should not rely on any control dependency in C source code

>>>> at least until the compiler gets educated about control dependency.

>>>>

>>>> Is there some reasonable compromise?

>>>

>>> Right now, what we have are the volatile loads such as from READ_ONCE()

>>> and WRITE_ONCE().  My defense of them has been that they need to properly

>>> handle MMIO.  But not all compiler writers agree with me, so we should

>>> program at least a bit defensively.  Though we should probably also

>>> be writing tests to verify that the compiler is doing the right thing,

>>> and those litmus tests should probably include your ">=" litmus test.

>>> But we should not encourage developers to rely on it quite yet.

>>>

>>> My current position is that compiler writers are currently unlikely to

>>> make their compilers do deep reasoning around the memory model, and that

>>> they are currently unlikely to do cross-translation-unit assigned-value

>>> analysis (though LTO might be changing that).  But given a static atomic

>>> that is visible only within a translation unit, I would expect them

>>> to do value-range analysis.  Hence my reluctance to present the ">="

>>> variant as a pattern to follow.

>>

>> So if we declare "x" and "y" are not static (not file local),

>> can the ">=" variant be exempted from the possible optimization?

>>

>> For example:

>>

>> 	extern int x, y;

>>

>> 	CPU 0                     CPU 1

>> 	=======================   =======================

>> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

>> 	if (r1 >= 0)              if (r2 >= 0)

>> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

>>

>> 	assert(!(r1 == 1 && r2 == 1));

>>

>> This looks safe to me.

> 

> For the compilers I know about at the present time, yes.


So if I respin the patch with the extern, would you still feel reluctant?

             Regards, Akira

> 

> The underlying problem is that the standard says almost nothing about what

> volatile does.  I usually argue that it was intended to be used for MMIO,

> so any optimization that would prevent a device driver from working must

> be prohibited on volatiles.  A lot of people really don't like volatile,

> and would like to eliminate it, and these people also aren't very happy

> about any attempt to better define volatile.

> 

> Keeps things entertaining.  ;-)

> 

> 							Thanx, Paul

> 

>> Thoughts?

>>

>>             Thanks, Akira

>>

>>>

>>> 							Thanx, Paul

>>>

>>>

>>>>            Thanks, Akira

>>>>

>>>>>

>>>>> I don't happen to like that answer, by the way.  ;-)

>>>>>

>>>>> 							Thanx, Paul

>>>>>

>>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

>>>>>>>

>>>>>>

>>>>>> Great material to wake up mind in the morning! Thanks.

>>>>>>

>>>>>> Regards,

>>>>>> Boqun

>>>>>>

>>>>>>> What are your thoughts on this?

>>>>>>>

>>>>>>> 							Thanx, Paul

>>>>>>>

>>>>>>>>             Thanks, Akira

>>>>>>>>

>>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

>>>>>>>>> so please do feel free to continue doing that!

>>>>>>>>>

>>>>>>>>> 							Thanx, Paul

>>>>>>>>>

>>>>>>>>>

>>>>>>>>

>>>>>>>

>>>>>>

>>>>>

>>>>>

>>>>

>>>

>>>

>>

> 

>

Paul E. McKenney July 20, 2017, 11:07 p.m. UTC | #13

On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:

> > On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:

> >> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:

> >>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:

> >>>> On 2017/07/20 14:47, Paul E. McKenney wrote:

> >>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:

> >>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:

> >>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:

> >>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:

> >>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:

> >>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001

> >>>>>>>>>> From: Akira Yokosawa <akiyks@gmail.com>

> >>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900

> >>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example

> >>>>>>>>>>

> >>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering

> >>>>>>>>>> no-transitivity example"), the operator in "if" statement of

> >>>>>>>>>> the two-CPU example was modified from ">=" to ">".

> >>>>>>>>>> Now the example misses the point because there is no party

> >>>>>>>>>> who will modify "x" nor "y". So each CPU performs only the

> >>>>>>>>>> READ_ONCE().

> >>>>>>>>>>

> >>>>>>>>>> The point of this example is to use control dependency for ordering,

> >>>>>>>>>> and the WRITE_ONCE() should always be executed.

> >>>>>>>>>>

> >>>>>>>>>> So it was correct prior to the above mentioned commit. Partial

> >>>>>>>>>> revert of the commit (with context adjustments regarding other

> >>>>>>>>>> changes thereafter) restores the point.

> >>>>>>>>>>

> >>>>>>>>>> Note that the three-CPU example demonstrating the lack of

> >>>>>>>>>> transitivity stands regardless of this partial revert.

> >>>>>>>>>>

> >>>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>

> >>>>>>>>>

> >>>>>>>>> Hello, Akira,

> >>>>>>>>>

> >>>>>>>>> You are quite right that many compilers would generate straightforward

> >>>>>>>>> code for the code fragment below, and in that case, the assertion could

> >>>>>>>>> never trigger due to either TSO or control dependencies, depending on

> >>>>>>>>> the architecture this was running on.

> >>>>>>>>>

> >>>>>>>>> However, if the compiler was too smart for our good, it could figure

> >>>>>>>>> out that "x" and "y" only take on the values zero and one, so that

> >>>>>>>>> the "if" would always be taken.  At that point, the compiler could

> >>>>>>>>> simply ignore the "if" with the result shown below.

> >>>>>>>>>

> >>>>>>>>>> ---

> >>>>>>>>>>  Documentation/memory-barriers.txt | 2 +-

> >>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)

> >>>>>>>>>>

> >>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

> >>>>>>>>>> index c4ddfcd..c1ebe99 100644

> >>>>>>>>>> --- a/Documentation/memory-barriers.txt

> >>>>>>>>>> +++ b/Documentation/memory-barriers.txt

> >>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of

> >>>>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>>>  	=======================   =======================

> >>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>>>> -	if (r1 > 0)               if (r2 > 0)

> >>>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)

> >>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>>>>

> >>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>>>

> >>>>>>>>> Original program:

> >>>>>>>>>

> >>>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>>  	=======================   =======================

> >>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)

> >>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>>>

> >>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>>>

> >>>>>>>>> Enthusiastically optimized program:

> >>>>>>>>>

> >>>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>>  	=======================   =======================

> >>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>>>

> >>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>>>

> >>>>>>>>> This optimized version could trigger the assertion.

> >>>>>>>>>

> >>>>>>>>> So we do need to stick with the ">" comparison.

> >>>>>>>>

> >>>>>>>> Well but,

> >>>>>>>>

> >>>>>>>> Current example:

> >>>>>>>>

> >>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>  	=======================   =======================

> >>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>> 	if (r1 > 0)               if (r2 > 0)

> >>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>>>>>>>

> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>>

> >>>>>>>> Such a clever compiler might be able to prove that "x" and "y"

> >>>>>>>> are never modified and end up in the following:

> >>>>>>>>

> >>>>>>

> >>>>>> Hi Akira,

> >>>>>>

> >>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)

> >>>>>>

> >>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the

> >>>>>> compiler has to generate code that actually does a load, so the values

> >>>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 

> >>>>>> optimize the if()s out because the volatile semantics of READ_ONCE().

> >>>>>> Otherwise, I think we have a lot more to worry about than this case.

> >>>>>>

> >>>>>>>>  	CPU 0                     CPU 1

> >>>>>>>>  	=======================   =======================

> >>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >>>>>>>>

> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));

> >>>>>>>>

> >>>>>>>> This means it is impossible to describe this example in C,

> >>>>>>>> doesn't it?

> >>>>>>>

> >>>>>>> That is a matter of some debate, but it has gotten increasingly more

> >>>>>>> difficult to get C to do one's bidding over the decades.

> >>>>>>>

> >>>>>>>> What am I missing here?

> >>>>>>>

> >>>>>>> The compiler has to work harder in your example case, so it is probably

> >>>>>>> just a matter of time.  In the original with ">=", all the compiler needed

> >>>>>>> to do was look at all the assignments and the initial value.  In the

> >>>>>>> original, it also had to do reasoning based on control dependencies

> >>>>>>> (which are not yet part of the C standard).

> >>>>>>>

> >>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter

> >>>>>>> of some debate.  Here is a proposal to let the developer choose:

> >>>>>>>

> >>>>>>

> >>>>>> Hi Paul,

> >>>>>>

> >>>>>> I know the compiler could optimize atomics in very interesting ways, but

> >>>>>> this case is about volatile, so I guess our case is still fine? ;-)

> >>>>>

> >>>>> Hello, Boqun,

> >>>>>

> >>>>> When I asked that question, the answer I got was "the compiler must

> >>>>> emit the load instruction, but is under no obligation to actually use the

> >>>>> value loaded".

> >>>>

> >>>> So, it sounds like the following description found in memory-barriers.txt

> >>>> just above the example of lack of transitivity:

> >>>>

> >>>>> However, stores are not speculated.  This means that ordering -is- provided

> >>>>> for load-store control dependencies, as in the following example:

> >>>>>

> >>>>> 	q = READ_ONCE(a);

> >>>>> 	if (q) {

> >>>>> 		WRITE_ONCE(b, 1);

> >>>>> 	}

> >>>>>

> >>>>> Control dependencies pair normally with other types of barriers.

> >>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()

> >>>>> are optional! Without the READ_ONCE(), the compiler might combine the

> >>>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),

> >>>>> the compiler might combine the store to 'b' with other stores to 'b'.

> >>>>> Either can result in highly counterintuitive effects on ordering.

> >>>>>

> >>>>> Worse yet, if the compiler is able to prove (say) that the value of

> >>>>> variable 'a' is always non-zero, it would be well within its rights

> >>>>> to optimize the original example by eliminating the "if" statement

> >>>>> as follows:

> >>>>>

> >>>>> 	q = a;

> >>>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */

> >>>>>

> >>>>> So don't leave out the READ_ONCE().

> >>>>

> >>>> does not stand if the answer needs to be taken seriously.

> >>>> The READ_ONCE() is not good enough to prevent the "if" statement

> >>>> from being eliminated.

> >>>>

> >>>> Hmm... If we do need to care about such an extreme optimization,

> >>>> we should not rely on any control dependency in C source code

> >>>> at least until the compiler gets educated about control dependency.

> >>>>

> >>>> Is there some reasonable compromise?

> >>>

> >>> Right now, what we have are the volatile loads such as from READ_ONCE()

> >>> and WRITE_ONCE().  My defense of them has been that they need to properly

> >>> handle MMIO.  But not all compiler writers agree with me, so we should

> >>> program at least a bit defensively.  Though we should probably also

> >>> be writing tests to verify that the compiler is doing the right thing,

> >>> and those litmus tests should probably include your ">=" litmus test.

> >>> But we should not encourage developers to rely on it quite yet.

> >>>

> >>> My current position is that compiler writers are currently unlikely to

> >>> make their compilers do deep reasoning around the memory model, and that

> >>> they are currently unlikely to do cross-translation-unit assigned-value

> >>> analysis (though LTO might be changing that).  But given a static atomic

> >>> that is visible only within a translation unit, I would expect them

> >>> to do value-range analysis.  Hence my reluctance to present the ">="

> >>> variant as a pattern to follow.

> >>

> >> So if we declare "x" and "y" are not static (not file local),

> >> can the ">=" variant be exempted from the possible optimization?

> >>

> >> For example:

> >>

> >> 	extern int x, y;

> >>

> >> 	CPU 0                     CPU 1

> >> 	=======================   =======================

> >> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> >> 	if (r1 >= 0)              if (r2 >= 0)

> >> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> >>

> >> 	assert(!(r1 == 1 && r2 == 1));

> >>

> >> This looks safe to me.

> > 

> > For the compilers I know about at the present time, yes.

> 

> So if I respin the patch with the extern, would you still feel reluctant?


Yes, because I am not seeing how this change helps.  What is this telling
the reader that the original did not, and how does it help the reader
generate good concurrent code?

							Thanx, Paul

>              Regards, Akira

> 

> > 

> > The underlying problem is that the standard says almost nothing about what

> > volatile does.  I usually argue that it was intended to be used for MMIO,

> > so any optimization that would prevent a device driver from working must

> > be prohibited on volatiles.  A lot of people really don't like volatile,

> > and would like to eliminate it, and these people also aren't very happy

> > about any attempt to better define volatile.

> > 

> > Keeps things entertaining.  ;-)

> > 

> > 							Thanx, Paul

> > 

> >> Thoughts?

> >>

> >>             Thanks, Akira

> >>

> >>>

> >>> 							Thanx, Paul

> >>>

> >>>

> >>>>            Thanks, Akira

> >>>>

> >>>>>

> >>>>> I don't happen to like that answer, by the way.  ;-)

> >>>>>

> >>>>> 							Thanx, Paul

> >>>>>

> >>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html

> >>>>>>>

> >>>>>>

> >>>>>> Great material to wake up mind in the morning! Thanks.

> >>>>>>

> >>>>>> Regards,

> >>>>>> Boqun

> >>>>>>

> >>>>>>> What are your thoughts on this?

> >>>>>>>

> >>>>>>> 							Thanx, Paul

> >>>>>>>

> >>>>>>>>             Thanks, Akira

> >>>>>>>>

> >>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,

> >>>>>>>>> so please do feel free to continue doing that!

> >>>>>>>>>

> >>>>>>>>> 							Thanx, Paul

> >>>>>>>>>

> >>>>>>>>>

> >>>>>>>>

> >>>>>>>

> >>>>>>

> >>>>>

> >>>>>

> >>>>

> >>>

> >>>

> >>

> > 

> > 

>

Boqun Feng July 21, 2017, 12:24 a.m. UTC | #14

On Thu, Jul 20, 2017 at 04:07:14PM -0700, Paul E. McKenney wrote:
[...]
> > 

> > So if I respin the patch with the extern, would you still feel reluctant?

> 

> Yes, because I am not seeing how this change helps.  What is this telling

> the reader that the original did not, and how does it help the reader

> generate good concurrent code?

> 

One thing I think we probably should do is to make READ_ONCE() semantics
more clear, i.e. call it out that in our conceptual model for kernel
programming we always rely on the compiler to be serious about the
return value of READ_ONCE(). I didn't find the comment before
READ_ONCE() or memory-barriers.txt talking about something similar.

Or am I the only one having this kinda semantics guarantee in mind?

Regards,
Boqun

> 							Thanx, Paul

>

Paul E. McKenney July 21, 2017, 4:31 p.m. UTC | #15

On Fri, Jul 21, 2017 at 08:24:40AM +0800, Boqun Feng wrote:
> On Thu, Jul 20, 2017 at 04:07:14PM -0700, Paul E. McKenney wrote:

> [...]

> > > 

> > > So if I respin the patch with the extern, would you still feel reluctant?

> > 

> > Yes, because I am not seeing how this change helps.  What is this telling

> > the reader that the original did not, and how does it help the reader

> > generate good concurrent code?

> 

> One thing I think we probably should do is to make READ_ONCE() semantics

> more clear, i.e. call it out that in our conceptual model for kernel

> programming we always rely on the compiler to be serious about the

> return value of READ_ONCE(). I didn't find the comment before

> READ_ONCE() or memory-barriers.txt talking about something similar.

> 

> Or am I the only one having this kinda semantics guarantee in mind?

That sounds like a good idea to me.  The C++ standards committee might
be reluctant to provide a good definition of "volatile", but we can at
least provide a definition of our own.  It should be possible to leverage
the need for "volatile" to support MMIO accesses, as there is general
agreement that this was and still is its main purpose.  Given those
guarantees, there is no reason we cannot apply them in other situations,
for example, interacting with irq handlers and preventing inappropriate
optimizations for concurrent code.

Even better, with tests to find compiler bugs in this area!

							Thanx, Paul

Akira Yokosawa July 21, 2017, 11:38 p.m. UTC | #16

On 2017/07/20 16:07:14 -0700, Paul E. McKenney wrote:
> On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:

>> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:

[...]
>>> For the compilers I know about at the present time, yes.

>>

>> So if I respin the patch with the extern, would you still feel reluctant?

> 

> Yes, because I am not seeing how this change helps.  What is this telling

> the reader that the original did not, and how does it help the reader

> generate good concurrent code?


Well, what bothers me in the ">" version of two-CPU example is the
explanation in memory-barriers.txt that follows:

> These two examples are the LB and WWC litmus tests from this paper:

> http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this

> site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.


I'm wondering if calling the ">" version as an "LB" litmus test is correct.
Because it always results in "r1 == 0 && r2 == 0", 100%.

An LB litmus test with full memory barriers would be:

	CPU 0                     CPU 1
	=======================   =======================
	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	smp_mb();                 smp_mb();
	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

	assert(!(r1 == 1 && r2 == 1));

and this will result in either of

        r1 == 0 && r2 == 0
        r1 == 0 && r2 == 1
        r1 == 1 && r2 == 0

but never "r1 == 1 && r2 == 1".

The difference in the behavior distracts me in reading this part
of memory-barriers.txt.

Your priority seemed to be in reducing the chance of the "if" statement
to be optimized away.  So I suggested to use "extern" as a compromise.

Another way would be to express the ">=" version in a pseudo-asm form.

	CPU 0                     CPU 1
	=======================   =======================
	r1 = LOAD x               r2 = LOAD y
	if (r1 >= 0)              if (r2 >= 0)
	  STORE y = 1               STORE x = 1

	assert(!(r1 == 1 && r2 == 1));

This should eliminate any concern of compiler optimization.
In this final part of CONTROL DEPENDENCIES section, separating the
problem of optimization and transitivity would clarify the point
(at least for me).

Thoughts?

            Regards, Akira

> 

> 							Thanx, Paul

> 

>>              Regards, Akira

>>

>>>

>>> The underlying problem is that the standard says almost nothing about what

>>> volatile does.  I usually argue that it was intended to be used for MMIO,

>>> so any optimization that would prevent a device driver from working must

>>> be prohibited on volatiles.  A lot of people really don't like volatile,

>>> and would like to eliminate it, and these people also aren't very happy

>>> about any attempt to better define volatile.

>>>

>>> Keeps things entertaining.  ;-)

>>>

>>> 							Thanx, Paul

>>>

Paul E. McKenney July 23, 2017, 4:43 a.m. UTC | #17

On Sat, Jul 22, 2017 at 08:38:57AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 16:07:14 -0700, Paul E. McKenney wrote:

> > On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:

> >> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:

> [...]

> >>> For the compilers I know about at the present time, yes.

> >>

> >> So if I respin the patch with the extern, would you still feel reluctant?

> > 

> > Yes, because I am not seeing how this change helps.  What is this telling

> > the reader that the original did not, and how does it help the reader

> > generate good concurrent code?

> 

> Well, what bothers me in the ">" version of two-CPU example is the

> explanation in memory-barriers.txt that follows:

> 

> > These two examples are the LB and WWC litmus tests from this paper:

> > http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this

> > site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.

> 

> I'm wondering if calling the ">" version as an "LB" litmus test is correct.

> Because it always results in "r1 == 0 && r2 == 0", 100%.


As it should, because nothing can become non-zero unless something was
already non-zero.  It is possible to create similarly single-outcome
tests with address dependencies.

But yes, converting to ">=" makes the stores unconditional, and thus
allows more outcomes.  Perhaps we need both?  Another approach is to
write a second value in an "else" statement, keeping the original ">".

I agree that some of these examples can be a bit hard on one's intuition,
but the plain fact is that when it comes to memory models, one's intuition
is not always one's friend.

> An LB litmus test with full memory barriers would be:

> 

> 	CPU 0                     CPU 1

> 	=======================   =======================

> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);

> 	smp_mb();                 smp_mb();

> 	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

> 

> 	assert(!(r1 == 1 && r2 == 1));

> 

> and this will result in either of

> 

>         r1 == 0 && r2 == 0

>         r1 == 0 && r2 == 1

>         r1 == 1 && r2 == 0

> 

> but never "r1 == 1 && r2 == 1".


Agreed, because unlike the control-dependency example, the WRITE_ONCE()s
happen unconditionally.

> The difference in the behavior distracts me in reading this part

> of memory-barriers.txt.


Then it likely needs more explanation.

> Your priority seemed to be in reducing the chance of the "if" statement

> to be optimized away.  So I suggested to use "extern" as a compromise.


If the various tools accept the "extern", this might not be a bad thing
to do.

But what this really means is that I need to take another tilt at
the "volatile" windmill in the committee.

> Another way would be to express the ">=" version in a pseudo-asm form.

> 

> 	CPU 0                     CPU 1

> 	=======================   =======================

> 	r1 = LOAD x               r2 = LOAD y

> 	if (r1 >= 0)              if (r2 >= 0)

> 	  STORE y = 1               STORE x = 1

> 

> 	assert(!(r1 == 1 && r2 == 1));

> 

> This should eliminate any concern of compiler optimization.

> In this final part of CONTROL DEPENDENCIES section, separating the

> problem of optimization and transitivity would clarify the point

> (at least for me).


The problem is that people really do use C-language control dependencies
in the Linux kernel, so we need to describe them.  Maybe someday it
will be necessary to convert them to asm, but I am hoping that we can
avoid that.

> Thoughts?


My hope is that the memory model can help here, but that will in any
case take time.

							Thanx, Paul

>             Regards, Akira

> 

> > 

> > 							Thanx, Paul

> > 

> >>              Regards, Akira

> >>

> >>>

> >>> The underlying problem is that the standard says almost nothing about what

> >>> volatile does.  I usually argue that it was intended to be used for MMIO,

> >>> so any optimization that would prevent a device driver from working must

> >>> be prohibited on volatiles.  A lot of people really don't like volatile,

> >>> and would like to eliminate it, and these people also aren't very happy

> >>> about any attempt to better define volatile.

> >>>

> >>> Keeps things entertaining.  ;-)

> >>>

> >>> 							Thanx, Paul

> >>>

>

Boqun Feng July 23, 2017, 3:39 p.m. UTC | #18

On Sat, Jul 22, 2017 at 09:43:00PM -0700, Paul E. McKenney wrote:
[...]
> > Your priority seemed to be in reducing the chance of the "if" statement

> > to be optimized away.  So I suggested to use "extern" as a compromise.

> 

Hi Akira,

The problem is that, such a compromise doesn't help *developers* write
good concurrent code. The document should serve as a reference book for
the developers, and with the compromise you suggest, the developers will
possibly add "extern" to their shared variables. This is not only
unrealistic but also wrong, because "extern" means external for
translation units(compiling units), not external for execution
units(CPUs).

And as I said, the proper semantics of READ_ONCE() should work well
without using "extern", if we find a 'volatile' load doesn't work, we
can find another way (writing in asm or use asm volatile("" : "+m"(var));
to indicate @var changed). And the compromise just changes the
semantics... To me, it's not worth changing the semantics because the
implementation might be broken in the feature ;-)

> If the various tools accept the "extern", this might not be a bad thing

> to do.

> 

> But what this really means is that I need to take another tilt at

> the "volatile" windmill in the committee.

> 

> > Another way would be to express the ">=" version in a pseudo-asm form.

> > 

> > 	CPU 0                     CPU 1

> > 	=======================   =======================

> > 	r1 = LOAD x               r2 = LOAD y

> > 	if (r1 >= 0)              if (r2 >= 0)

> > 	  STORE y = 1               STORE x = 1

> > 

> > 	assert(!(r1 == 1 && r2 == 1));

> > 

> > This should eliminate any concern of compiler optimization.

> > In this final part of CONTROL DEPENDENCIES section, separating the

> > problem of optimization and transitivity would clarify the point

> > (at least for me).

> 

> The problem is that people really do use C-language control dependencies

> in the Linux kernel, so we need to describe them.  Maybe someday it

> will be necessary to convert them to asm, but I am hoping that we can

> avoid that.

> 

> > Thoughts?

> 

> My hope is that the memory model can help here, but that will in any

> case take time.

Hi Paul,

I add some comments for READ_ONCE() to emphasize compilers should honor
the return value, in the future, we may need a separate document for the
use/definition of volatile in kernel, but I think the comment of
READ_ONCE() is good enough now?

Regards,
Boqun

----------------->8
Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored

READ_ONCE() is used around in kernel to provide a control dependency,
and to make the control dependency valid, we must 1) make the load of
READ_ONCE() actually happen and 2) make sure compilers take the return
value of READ_ONCE() serious. 1) is already done and commented,
and in current implementation, 2) is also considered done in the
same way as 1): a 'volatile' load.

Whereas, Akira Yokosawa recently reported a problem that would be
triggered if 2) is not achieved. Moreover, according to Paul Mckenney,
using volatile might not actually give us what we want for 2) depending
on compiler writers' definition of 'volatile'. Therefore it's necessary
to emphasize 2) as a part of the semantics of READ_ONCE(), this not only
fits the conceptual semantics we have been using, but also makes the
implementation requirement more accurate.

In the future, we can either make compiler writers accept our use of
'volatile', or(if that fails) find another way to provide this
guarantee.

Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

---
 include/linux/compiler.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

-- 
2.13.3diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 219f82f3ec1a..8094f594427c 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s
  * mutilate accesses that either do not require ordering or that interact
  * with an explicit memory barrier or atomic instruction that provides the
  * required ordering.
+ *
+ * The return value of READ_ONCE() should be honored by compilers, IOW,
+ * compilers must treat the return value of READ_ONCE() as an unknown value at
+ * compile time, i.e. no optimization should be done based on the value of a
+ * READ_ONCE(). For example, the following code snippet:
+ *
+ * 	int a = 0;
+ * 	int x = 0;
+ *
+ * 	void some_func() {
+ * 		int t = READ_ONCE(a);
+ * 		if (!t)
+ * 			WRITE_ONCE(x, 1);
+ * 	}
+ *
+ * , should never be optimized as:
+ *
+ * 	void some_func() {
+ * 		WRITE_ONCE(x, 1);
+ * 	}
+ *
+ * because the compiler is 'smart' enough to think the value of 'a' is never
+ * changed.
+ *
+ * We provide this guarantee by making READ_ONCE() a *volatile* load.
  */

 #define __READ_ONCE(x, check)						\

Akira Yokosawa July 24, 2017, 12:04 a.m. UTC | #19

On 2017/07/23 23:39:36 +0800, Boqun Feng wrote:
> On Sat, Jul 22, 2017 at 09:43:00PM -0700, Paul E. McKenney wrote:

> [...]

>>> Your priority seemed to be in reducing the chance of the "if" statement

>>> to be optimized away.  So I suggested to use "extern" as a compromise.

>>

> 

> Hi Akira,

> 


Hi Boqun,

> The problem is that, such a compromise doesn't help *developers* write

> good concurrent code. The document should serve as a reference book for

> the developers, and with the compromise you suggest, the developers will

> possibly add "extern" to their shared variables. This is not only

> unrealistic but also wrong, because "extern" means external for

> translation units(compiling units), not external for execution

> units(CPUs).


Yes, I suggested it regarding the situation when the tiny litmus test
is compiled in a translation unit. Also it might not be effective once
link time optimization becomes "smart" enough.

And I agree it was not appropriate for memory-barriers.txt.

> 

> And as I said, the proper semantics of READ_ONCE() should work well

> without using "extern", if we find a 'volatile' load doesn't work, we

> can find another way (writing in asm or use asm volatile("" : "+m"(var));

> to indicate @var changed). And the compromise just changes the

> semantics... To me, it's not worth changing the semantics because the

> implementation might be broken in the feature ;-)


I agree.

> 

> 

>> If the various tools accept the "extern", this might not be a bad thing

>> to do.

>>

>> But what this really means is that I need to take another tilt at

>> the "volatile" windmill in the committee.

>>

>>> Another way would be to express the ">=" version in a pseudo-asm form.

>>>

>>> 	CPU 0                     CPU 1

>>> 	=======================   =======================

>>> 	r1 = LOAD x               r2 = LOAD y

>>> 	if (r1 >= 0)              if (r2 >= 0)

>>> 	  STORE y = 1               STORE x = 1

>>>

>>> 	assert(!(r1 == 1 && r2 == 1));

>>>

>>> This should eliminate any concern of compiler optimization.

>>> In this final part of CONTROL DEPENDENCIES section, separating the

>>> problem of optimization and transitivity would clarify the point

>>> (at least for me).

>>

>> The problem is that people really do use C-language control dependencies

>> in the Linux kernel, so we need to describe them.  Maybe someday it

>> will be necessary to convert them to asm, but I am hoping that we can

>> avoid that.

>>

>>> Thoughts?

>>

>> My hope is that the memory model can help here, but that will in any

>> case take time.

> 

> Hi Paul,

> 

> I add some comments for READ_ONCE() to emphasize compilers should honor

> the return value, in the future, we may need a separate document for the

> use/definition of volatile in kernel, but I think the comment of

> READ_ONCE() is good enough now?

> 

> Regards,

> Boqun

> 

> ----------------->8

> Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored

> 

> READ_ONCE() is used around in kernel to provide a control dependency,

> and to make the control dependency valid, we must 1) make the load of

> READ_ONCE() actually happen and 2) make sure compilers take the return

> value of READ_ONCE() serious. 1) is already done and commented,

> and in current implementation, 2) is also considered done in the

> same way as 1): a 'volatile' load.

> 

> Whereas, Akira Yokosawa recently reported a problem that would be

> triggered if 2) is not achieved. 


To clarity the timeline, it was Paul who pointed out it would become
easier for compilers to optimize away the "if" statements in response
to my suggestion of partial revert (">" -> ">=").

>                                  Moreover, according to Paul Mckenney,

> using volatile might not actually give us what we want for 2) depending

> on compiler writers' definition of 'volatile'. Therefore it's necessary

> to emphasize 2) as a part of the semantics of READ_ONCE(), this not only

> fits the conceptual semantics we have been using, but also makes the

> implementation requirement more accurate.

> 

> In the future, we can either make compiler writers accept our use of

> 'volatile', or(if that fails) find another way to provide this

> guarantee.

> 

> Cc: Akira Yokosawa <akiyks@gmail.com>

> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

> ---

>  include/linux/compiler.h | 25 +++++++++++++++++++++++++

>  1 file changed, 25 insertions(+)

> 

> diff --git a/include/linux/compiler.h b/include/linux/compiler.h

> index 219f82f3ec1a..8094f594427c 100644

> --- a/include/linux/compiler.h

> +++ b/include/linux/compiler.h

> @@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s

>   * mutilate accesses that either do not require ordering or that interact

>   * with an explicit memory barrier or atomic instruction that provides the

>   * required ordering.

> + *

> + * The return value of READ_ONCE() should be honored by compilers, IOW,

> + * compilers must treat the return value of READ_ONCE() as an unknown value at

> + * compile time, i.e. no optimization should be done based on the value of a

> + * READ_ONCE(). For example, the following code snippet:

> + *

> + * 	int a = 0;

> + * 	int x = 0;

> + *

> + * 	void some_func() {

> + * 		int t = READ_ONCE(a);

> + * 		if (!t)

> + * 			WRITE_ONCE(x, 1);

> + * 	}

> + *

> + * , should never be optimized as:

> + *

> + * 	void some_func() {

> + * 		WRITE_ONCE(x, 1);

> + * 	}

READ_ONCE() should still be honored. so maybe the following?

+ * , should never be optimized as:
+ *
+ *	void some_func() {
+ *		int t = READ_ONCE(a);
+ *		WRITE_ONCE(x, 1);
+ *	}

         Thanks, Akira

> + *

> + * because the compiler is 'smart' enough to think the value of 'a' is never

> + * changed.

> + *

> + * We provide this guarantee by making READ_ONCE() a *volatile* load.

>   */

>  

>  #define __READ_ONCE(x, check)						\

>

Paul E. McKenney July 24, 2017, 4:36 a.m. UTC | #20

On Mon, Jul 24, 2017 at 09:04:57AM +0900, Akira Yokosawa wrote:
> On 2017/07/23 23:39:36 +0800, Boqun Feng wrote:

> > On Sat, Jul 22, 2017 at 09:43:00PM -0700, Paul E. McKenney wrote:

> > [...]

> >>> Your priority seemed to be in reducing the chance of the "if" statement

> >>> to be optimized away.  So I suggested to use "extern" as a compromise.

> >>

> > 

> > Hi Akira,

> > 

> 

> Hi Boqun,

> 

> > The problem is that, such a compromise doesn't help *developers* write

> > good concurrent code. The document should serve as a reference book for

> > the developers, and with the compromise you suggest, the developers will

> > possibly add "extern" to their shared variables. This is not only

> > unrealistic but also wrong, because "extern" means external for

> > translation units(compiling units), not external for execution

> > units(CPUs).

> 

> Yes, I suggested it regarding the situation when the tiny litmus test

> is compiled in a translation unit. Also it might not be effective once

> link time optimization becomes "smart" enough.

> 

> And I agree it was not appropriate for memory-barriers.txt.

> 

> > 

> > And as I said, the proper semantics of READ_ONCE() should work well

> > without using "extern", if we find a 'volatile' load doesn't work, we

> > can find another way (writing in asm or use asm volatile("" : "+m"(var));

> > to indicate @var changed). And the compromise just changes the

> > semantics... To me, it's not worth changing the semantics because the

> > implementation might be broken in the feature ;-)

> 

> I agree.

> 

> > 

> > 

> >> If the various tools accept the "extern", this might not be a bad thing

> >> to do.

> >>

> >> But what this really means is that I need to take another tilt at

> >> the "volatile" windmill in the committee.

> >>

> >>> Another way would be to express the ">=" version in a pseudo-asm form.

> >>>

> >>> 	CPU 0                     CPU 1

> >>> 	=======================   =======================

> >>> 	r1 = LOAD x               r2 = LOAD y

> >>> 	if (r1 >= 0)              if (r2 >= 0)

> >>> 	  STORE y = 1               STORE x = 1

> >>>

> >>> 	assert(!(r1 == 1 && r2 == 1));

> >>>

> >>> This should eliminate any concern of compiler optimization.

> >>> In this final part of CONTROL DEPENDENCIES section, separating the

> >>> problem of optimization and transitivity would clarify the point

> >>> (at least for me).

> >>

> >> The problem is that people really do use C-language control dependencies

> >> in the Linux kernel, so we need to describe them.  Maybe someday it

> >> will be necessary to convert them to asm, but I am hoping that we can

> >> avoid that.

> >>

> >>> Thoughts?

> >>

> >> My hope is that the memory model can help here, but that will in any

> >> case take time.

> > 

> > Hi Paul,

> > 

> > I add some comments for READ_ONCE() to emphasize compilers should honor

> > the return value, in the future, we may need a separate document for the

> > use/definition of volatile in kernel, but I think the comment of

> > READ_ONCE() is good enough now?

> > 

> > Regards,

> > Boqun

> > 

> > ----------------->8

> > Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored

> > 

> > READ_ONCE() is used around in kernel to provide a control dependency,

> > and to make the control dependency valid, we must 1) make the load of

> > READ_ONCE() actually happen and 2) make sure compilers take the return

> > value of READ_ONCE() serious. 1) is already done and commented,

> > and in current implementation, 2) is also considered done in the

> > same way as 1): a 'volatile' load.

> > 

> > Whereas, Akira Yokosawa recently reported a problem that would be

> > triggered if 2) is not achieved. 

> 

> To clarity the timeline, it was Paul who pointed out it would become

> easier for compilers to optimize away the "if" statements in response

> to my suggestion of partial revert (">" -> ">=").


Indeed I did.

And if nothing else, this discussion convinced me that I should push
harder on volatile.  It would be nice if we had more of a guarantee!

							Thanx, Paul

> >                                  Moreover, according to Paul Mckenney,

> > using volatile might not actually give us what we want for 2) depending

> > on compiler writers' definition of 'volatile'. Therefore it's necessary

> > to emphasize 2) as a part of the semantics of READ_ONCE(), this not only

> > fits the conceptual semantics we have been using, but also makes the

> > implementation requirement more accurate.

> > 

> > In the future, we can either make compiler writers accept our use of

> > 'volatile', or(if that fails) find another way to provide this

> > guarantee.

> > 

> > Cc: Akira Yokosawa <akiyks@gmail.com>

> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

> > ---

> >  include/linux/compiler.h | 25 +++++++++++++++++++++++++

> >  1 file changed, 25 insertions(+)

> > 

> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h

> > index 219f82f3ec1a..8094f594427c 100644

> > --- a/include/linux/compiler.h

> > +++ b/include/linux/compiler.h

> > @@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s

> >   * mutilate accesses that either do not require ordering or that interact

> >   * with an explicit memory barrier or atomic instruction that provides the

> >   * required ordering.

> > + *

> > + * The return value of READ_ONCE() should be honored by compilers, IOW,

> > + * compilers must treat the return value of READ_ONCE() as an unknown value at

> > + * compile time, i.e. no optimization should be done based on the value of a

> > + * READ_ONCE(). For example, the following code snippet:

> > + *

> > + * 	int a = 0;

> > + * 	int x = 0;

> > + *

> > + * 	void some_func() {

> > + * 		int t = READ_ONCE(a);

> > + * 		if (!t)

> > + * 			WRITE_ONCE(x, 1);

> > + * 	}

> > + *

> > + * , should never be optimized as:

> > + *

> > + * 	void some_func() {

> > + * 		WRITE_ONCE(x, 1);

> > + * 	}

> READ_ONCE() should still be honored. so maybe the following?

> 

> + * , should never be optimized as:

> + *

> + *	void some_func() {

> + *		int t = READ_ONCE(a);

> + *		WRITE_ONCE(x, 1);

> + *	}

> 

>          Thanks, Akira

> 

> > + *

> > + * because the compiler is 'smart' enough to think the value of 'a' is never

> > + * changed.

> > + *

> > + * We provide this guarantee by making READ_ONCE() a *volatile* load.

> >   */

> >  

> >  #define __READ_ONCE(x, check)						\

> > 

>

Boqun Feng July 24, 2017, 6:34 a.m. UTC | #21

On Mon, Jul 24, 2017 at 09:04:57AM +0900, Akira Yokosawa wrote:
[...]
> > 

> > ----------------->8

> > Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored

> > 

> > READ_ONCE() is used around in kernel to provide a control dependency,

> > and to make the control dependency valid, we must 1) make the load of

> > READ_ONCE() actually happen and 2) make sure compilers take the return

> > value of READ_ONCE() serious. 1) is already done and commented,

> > and in current implementation, 2) is also considered done in the

> > same way as 1): a 'volatile' load.

> > 

> > Whereas, Akira Yokosawa recently reported a problem that would be

> > triggered if 2) is not achieved. 

> 

> To clarity the timeline, it was Paul who pointed out it would become

> easier for compilers to optimize away the "if" statements in response

> to my suggestion of partial revert (">" -> ">=").

> 


Ah.. right, I missed that part. I will use proper sentences here like:

	During a recent discussion brought up by Akira Yokosawa on
	memory-barriers.txt, a problem is discovered, which would be
	triggered if 2) is not achieved.

Works with you?

> >                                  Moreover, according to Paul Mckenney,

> > using volatile might not actually give us what we want for 2) depending

> > on compiler writers' definition of 'volatile'. Therefore it's necessary

> > to emphasize 2) as a part of the semantics of READ_ONCE(), this not only

> > fits the conceptual semantics we have been using, but also makes the

> > implementation requirement more accurate.

> > 

> > In the future, we can either make compiler writers accept our use of

> > 'volatile', or(if that fails) find another way to provide this

> > guarantee.

> > 

> > Cc: Akira Yokosawa <akiyks@gmail.com>

> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

> > ---

> >  include/linux/compiler.h | 25 +++++++++++++++++++++++++

> >  1 file changed, 25 insertions(+)

> > 

> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h

> > index 219f82f3ec1a..8094f594427c 100644

> > --- a/include/linux/compiler.h

> > +++ b/include/linux/compiler.h

> > @@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s

> >   * mutilate accesses that either do not require ordering or that interact

> >   * with an explicit memory barrier or atomic instruction that provides the

> >   * required ordering.

> > + *

> > + * The return value of READ_ONCE() should be honored by compilers, IOW,

> > + * compilers must treat the return value of READ_ONCE() as an unknown value at

> > + * compile time, i.e. no optimization should be done based on the value of a

> > + * READ_ONCE(). For example, the following code snippet:

> > + *

> > + * 	int a = 0;

> > + * 	int x = 0;

> > + *

> > + * 	void some_func() {

> > + * 		int t = READ_ONCE(a);

> > + * 		if (!t)

> > + * 			WRITE_ONCE(x, 1);

> > + * 	}

> > + *

> > + * , should never be optimized as:

> > + *

> > + * 	void some_func() {

> > + * 		WRITE_ONCE(x, 1);

> > + * 	}

> READ_ONCE() should still be honored. so maybe the following?

> 


Make sense. Thanks!

Regaords,
Boqun

> + * , should never be optimized as:

> + *

> + *	void some_func() {

> + *		int t = READ_ONCE(a);

> + *		WRITE_ONCE(x, 1);

> + *	}

> 

>          Thanks, Akira

> 

> > + *

> > + * because the compiler is 'smart' enough to think the value of 'a' is never

> > + * changed.

> > + *

> > + * We provide this guarantee by making READ_ONCE() a *volatile* load.

> >   */

> >  

> >  #define __READ_ONCE(x, check)						\

> > 

>

Akira Yokosawa July 24, 2017, 10:47 a.m. UTC | #22

On 2017/07/24 14:34:07 +0800, Boqun Feng wrote:
> On Mon, Jul 24, 2017 at 09:04:57AM +0900, Akira Yokosawa wrote:

> [...]

>>>

>>> ----------------->8

>>> Subject: [PATCH] kernel: Emphasize the return value of READ_ONCE() is honored

>>>

>>> READ_ONCE() is used around in kernel to provide a control dependency,

>>> and to make the control dependency valid, we must 1) make the load of

>>> READ_ONCE() actually happen and 2) make sure compilers take the return

>>> value of READ_ONCE() serious. 1) is already done and commented,

>>> and in current implementation, 2) is also considered done in the

>>> same way as 1): a 'volatile' load.

>>>

>>> Whereas, Akira Yokosawa recently reported a problem that would be

>>> triggered if 2) is not achieved. 

>>

>> To clarity the timeline, it was Paul who pointed out it would become

>> easier for compilers to optimize away the "if" statements in response

>> to my suggestion of partial revert (">" -> ">=").

>>

> 

> Ah.. right, I missed that part. I will use proper sentences here like:

> 

> 	During a recent discussion brought up by Akira Yokosawa on

> 	memory-barriers.txt, a problem is discovered, which would be

> 	triggered if 2) is not achieved.

> 

> Works with you?


Looks fine. Thanks!

Akira

> 

>>>                                  Moreover, according to Paul Mckenney,

>>> using volatile might not actually give us what we want for 2) depending

>>> on compiler writers' definition of 'volatile'. Therefore it's necessary

>>> to emphasize 2) as a part of the semantics of READ_ONCE(), this not only

>>> fits the conceptual semantics we have been using, but also makes the

>>> implementation requirement more accurate.

>>>

>>> In the future, we can either make compiler writers accept our use of

>>> 'volatile', or(if that fails) find another way to provide this

>>> guarantee.

>>>

>>> Cc: Akira Yokosawa <akiyks@gmail.com>

>>> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

>>> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

>>> ---

>>>  include/linux/compiler.h | 25 +++++++++++++++++++++++++

>>>  1 file changed, 25 insertions(+)

>>>

>>> diff --git a/include/linux/compiler.h b/include/linux/compiler.h

>>> index 219f82f3ec1a..8094f594427c 100644

>>> --- a/include/linux/compiler.h

>>> +++ b/include/linux/compiler.h

>>> @@ -305,6 +305,31 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s

>>>   * mutilate accesses that either do not require ordering or that interact

>>>   * with an explicit memory barrier or atomic instruction that provides the

>>>   * required ordering.

>>> + *

>>> + * The return value of READ_ONCE() should be honored by compilers, IOW,

>>> + * compilers must treat the return value of READ_ONCE() as an unknown value at

>>> + * compile time, i.e. no optimization should be done based on the value of a

>>> + * READ_ONCE(). For example, the following code snippet:

>>> + *

>>> + * 	int a = 0;

>>> + * 	int x = 0;

>>> + *

>>> + * 	void some_func() {

>>> + * 		int t = READ_ONCE(a);

>>> + * 		if (!t)

>>> + * 			WRITE_ONCE(x, 1);

>>> + * 	}

>>> + *

>>> + * , should never be optimized as:

>>> + *

>>> + * 	void some_func() {

>>> + * 		WRITE_ONCE(x, 1);

>>> + * 	}

>> READ_ONCE() should still be honored. so maybe the following?

>>

> 

> Make sense. Thanks!

> 

> Regaords,

> Boqun

> 

>> + * , should never be optimized as:

>> + *

>> + *	void some_func() {

>> + *		int t = READ_ONCE(a);

>> + *		WRITE_ONCE(x, 1);

>> + *	}

>>

>>          Thanks, Akira

>>

>>> + *

>>> + * because the compiler is 'smart' enough to think the value of 'a' is never

>>> + * changed.

>>> + *

>>> + * We provide this guarantee by making READ_ONCE() a *volatile* load.

>>>   */

>>>  

>>>  #define __READ_ONCE(x, check)						\

>>>

>>

documentation: Fix two-CPU control-dependency example

Commit Message

Comments

Patch