[1/6] locking/atomic, asm-generic: instrument ordering variants

Message ID 20180504173937.25300-2-mark.rutland@arm.com
State New
Headers show
Series
  • arm64: add instrumented atomics
Related show

Commit Message

Mark Rutland May 4, 2018, 5:39 p.m.
Currently <asm-generic/atomic-instrumented.h> only instruments the fully
ordered variants of atomic functions, ignoring the {relaxed,acquire,release}
ordering variants.

This patch reworks the header to instrument all ordering variants of the atomic
functions, so that architectures implementing these are instrumented
appropriately.

To minimise repetition, a macro is used to generate each variant from a common
template. The {full,relaxed,acquire,release} order variants respectively are
then built using this template, where the architecture provides an
implementation.

To stick to an 80 column limit while keeping the templates legible, the return
type and function name of each template are split over two lines. For
consistency, this is done even when not strictly necessary.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/atomic-instrumented.h | 1195 ++++++++++++++++++++++++-----
 1 file changed, 1008 insertions(+), 187 deletions(-)

-- 
2.11.0

Comments

Peter Zijlstra May 4, 2018, 6:01 p.m. | #1
On Fri, May 04, 2018 at 06:39:32PM +0100, Mark Rutland wrote:
> Currently <asm-generic/atomic-instrumented.h> only instruments the fully

> ordered variants of atomic functions, ignoring the {relaxed,acquire,release}

> ordering variants.

> 

> This patch reworks the header to instrument all ordering variants of the atomic

> functions, so that architectures implementing these are instrumented

> appropriately.

> 

> To minimise repetition, a macro is used to generate each variant from a common

> template. The {full,relaxed,acquire,release} order variants respectively are

> then built using this template, where the architecture provides an

> implementation.

> 

> To stick to an 80 column limit while keeping the templates legible, the return

> type and function name of each template are split over two lines. For

> consistency, this is done even when not strictly necessary.

> 

> Signed-off-by: Mark Rutland <mark.rutland@arm.com>

> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>

> Cc: Boqun Feng <boqun.feng@gmail.com>

> Cc: Dmitry Vyukov <dvyukov@google.com>

> Cc: Ingo Molnar <mingo@kernel.org>

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Will Deacon <will.deacon@arm.com>

> ---

>  include/asm-generic/atomic-instrumented.h | 1195 ++++++++++++++++++++++++-----

>  1 file changed, 1008 insertions(+), 187 deletions(-)


Is there really no way to either generate or further macro compress this?

This is stupid repetitive, we just got rid of all that endless copy
paste crap in atomic implementations and now we're going back to that.

Adding or changing atomic bits becomes horrifically painful because of this.
Mark Rutland May 4, 2018, 6:09 p.m. | #2
On Fri, May 04, 2018 at 08:01:05PM +0200, Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 06:39:32PM +0100, Mark Rutland wrote:

> > Currently <asm-generic/atomic-instrumented.h> only instruments the fully

> > ordered variants of atomic functions, ignoring the {relaxed,acquire,release}

> > ordering variants.

> > 

> > This patch reworks the header to instrument all ordering variants of the atomic

> > functions, so that architectures implementing these are instrumented

> > appropriately.

> > 

> > To minimise repetition, a macro is used to generate each variant from a common

> > template. The {full,relaxed,acquire,release} order variants respectively are

> > then built using this template, where the architecture provides an

> > implementation.


> >  include/asm-generic/atomic-instrumented.h | 1195 ++++++++++++++++++++++++-----

> >  1 file changed, 1008 insertions(+), 187 deletions(-)

> 

> Is there really no way to either generate or further macro compress this?


I can definitely macro compress this somewhat, but the bulk of the
repetition will be the ifdeffery, which can't be macro'd away IIUC.

Generating this with a script is possible -- do we do anything like that
elsewhere?

> This is stupid repetitive, we just got rid of all that endless copy

> paste crap in atomic implementations and now we're going back to that.

> 

> Adding or changing atomic bits becomes horrifically painful because of this.


Sure thing; mangling it to its current state was a pain enough.

Thanks,
Mark.
Peter Zijlstra May 4, 2018, 6:24 p.m. | #3
On Fri, May 04, 2018 at 07:09:09PM +0100, Mark Rutland wrote:
> On Fri, May 04, 2018 at 08:01:05PM +0200, Peter Zijlstra wrote:

> > On Fri, May 04, 2018 at 06:39:32PM +0100, Mark Rutland wrote:


> > >  include/asm-generic/atomic-instrumented.h | 1195 ++++++++++++++++++++++++-----

> > >  1 file changed, 1008 insertions(+), 187 deletions(-)

> > 

> > Is there really no way to either generate or further macro compress this?

> 

> I can definitely macro compress this somewhat, but the bulk of the

> repetition will be the ifdeffery, which can't be macro'd away IIUC.


Right, much like what we already have in linux/atomic.h I suspect,
having to duplicating that isn't brilliant either.

> Generating this with a script is possible -- do we do anything like that

> elsewhere?


There's include/generated/ in your build directory. But nothing on this
scale I think.
Peter Zijlstra May 5, 2018, 8:47 a.m. | #4
On Sat, May 05, 2018 at 10:11:00AM +0200, Ingo Molnar wrote:

> Before:

> 

>  #ifndef atomic_fetch_dec_relaxed

> 

>  #ifndef atomic_fetch_dec

>  #define atomic_fetch_dec(v)	        atomic_fetch_sub(1, (v))

>  #define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

>  #define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

>  #define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

>  #else /* atomic_fetch_dec */

>  #define atomic_fetch_dec_relaxed	atomic_fetch_dec

>  #define atomic_fetch_dec_acquire	atomic_fetch_dec

>  #define atomic_fetch_dec_release	atomic_fetch_dec

>  #endif /* atomic_fetch_dec */

> 

>  #else /* atomic_fetch_dec_relaxed */

> 

>  #ifndef atomic_fetch_dec_acquire

>  #define atomic_fetch_dec_acquire(...)					\

> 	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>  #endif

> 

>  #ifndef atomic_fetch_dec_release

>  #define atomic_fetch_dec_release(...)					\

> 	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  #endif

> 

>  #ifndef atomic_fetch_dec

>  #define atomic_fetch_dec(...)						\

> 	__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>  #endif

>  #endif /* atomic_fetch_dec_relaxed */

> 

> After:

> 

>  #ifndef atomic_fetch_dec_relaxed

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>  # else

>  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>  #  define atomic_fetch_dec_release		atomic_fetch_dec

>  # endif

>  #else

>  # ifndef atomic_fetch_dec_acquire

>  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  # ifndef atomic_fetch_dec_release

>  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif

> 

> The new variant is readable at a glance, and the hierarchy of defines is very 

> obvious as well.


It wraps and looks hideous in my normal setup. And I do detest that indent
after # thing.

> And I think we could do even better - there's absolutely no reason why _every_ 

> operation has to be made conditional on a finegrained level - they are overriden 

> in API groups. In fact allowing individual override is arguably a fragility.

> 

> So we could do the following simplification on top of that:

> 

>  #ifndef atomic_fetch_dec_relaxed

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>  # else

>  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>  #  define atomic_fetch_dec_release		atomic_fetch_dec

>  # endif

>  #else

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif


This would disallow an architecture to override just fetch_dec_release for
instance.

I don't think there currently is any architecture that does that, but the
intent was to allow it to override anything and only provide defaults where it
does not.

None of this takes away the giant trainwreck that is the annotated atomic stuff
though.

And I seriously hate this one:

  ba1c9f83f633 ("locking/atomic/x86: Un-macro-ify atomic ops implementation")

and will likely undo that the moment I need to change anything there.

So no, don't like.
Ingo Molnar May 5, 2018, 9:04 a.m. | #5
* Peter Zijlstra <peterz@infradead.org> wrote:

> > So we could do the following simplification on top of that:

> > 

> >  #ifndef atomic_fetch_dec_relaxed

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))

> >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> >  # else

> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> >  # endif

> >  #else

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  #endif

> 

> This would disallow an architecture to override just fetch_dec_release for

> instance.


Couldn't such a crazy arch just define _all_ the 3 APIs in this group?
That's really a small price and makes the place pay the complexity
price that does the weirdness...

> I don't think there currently is any architecture that does that, but the

> intent was to allow it to override anything and only provide defaults where it

> does not.


I'd argue that if a new arch only defines one of these APIs that's probably a bug. 
If they absolutely want to do it, they still can - by defining all 3 APIs.

So there's no loss in arch flexibility.

> None of this takes away the giant trainwreck that is the annotated atomic stuff

> though.

> 

> And I seriously hate this one:

> 

>   ba1c9f83f633 ("locking/atomic/x86: Un-macro-ify atomic ops implementation")

> 

> and will likely undo that the moment I need to change anything there.


If it makes the code more readable then I don't object - the problem was that the 
instrumentation indirection made all that code much harder to follow.

Thanks,

	Ingo
Dmitry Vyukov May 5, 2018, 9:05 a.m. | #6
On Sat, May 5, 2018 at 10:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, May 05, 2018 at 10:11:00AM +0200, Ingo Molnar wrote:

>

>> Before:

>>

>>  #ifndef atomic_fetch_dec_relaxed

>>

>>  #ifndef atomic_fetch_dec

>>  #define atomic_fetch_dec(v)          atomic_fetch_sub(1, (v))

>>  #define atomic_fetch_dec_relaxed(v)  atomic_fetch_sub_relaxed(1, (v))

>>  #define atomic_fetch_dec_acquire(v)  atomic_fetch_sub_acquire(1, (v))

>>  #define atomic_fetch_dec_release(v)  atomic_fetch_sub_release(1, (v))

>>  #else /* atomic_fetch_dec */

>>  #define atomic_fetch_dec_relaxed     atomic_fetch_dec

>>  #define atomic_fetch_dec_acquire     atomic_fetch_dec

>>  #define atomic_fetch_dec_release     atomic_fetch_dec

>>  #endif /* atomic_fetch_dec */

>>

>>  #else /* atomic_fetch_dec_relaxed */

>>

>>  #ifndef atomic_fetch_dec_acquire

>>  #define atomic_fetch_dec_acquire(...)                                        \

>>       __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>>  #endif

>>

>>  #ifndef atomic_fetch_dec_release

>>  #define atomic_fetch_dec_release(...)                                        \

>>       __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>>  #endif

>>

>>  #ifndef atomic_fetch_dec

>>  #define atomic_fetch_dec(...)                                                \

>>       __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>>  #endif

>>  #endif /* atomic_fetch_dec_relaxed */

>>

>> After:

>>

>>  #ifndef atomic_fetch_dec_relaxed

>>  # ifndef atomic_fetch_dec

>>  #  define atomic_fetch_dec(v)                        atomic_fetch_sub(1, (v))

>>  #  define atomic_fetch_dec_relaxed(v)                atomic_fetch_sub_relaxed(1, (v))

>>  #  define atomic_fetch_dec_acquire(v)                atomic_fetch_sub_acquire(1, (v))

>>  #  define atomic_fetch_dec_release(v)                atomic_fetch_sub_release(1, (v))

>>  # else

>>  #  define atomic_fetch_dec_relaxed           atomic_fetch_dec

>>  #  define atomic_fetch_dec_acquire           atomic_fetch_dec

>>  #  define atomic_fetch_dec_release           atomic_fetch_dec

>>  # endif

>>  #else

>>  # ifndef atomic_fetch_dec_acquire

>>  #  define atomic_fetch_dec_acquire(...)      __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>>  # endif

>>  # ifndef atomic_fetch_dec_release

>>  #  define atomic_fetch_dec_release(...)      __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>>  # endif

>>  # ifndef atomic_fetch_dec

>>  #  define atomic_fetch_dec(...)              __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>>  # endif

>>  #endif

>>

>> The new variant is readable at a glance, and the hierarchy of defines is very

>> obvious as well.

>

> It wraps and looks hideous in my normal setup. And I do detest that indent

> after # thing.

>

>> And I think we could do even better - there's absolutely no reason why _every_

>> operation has to be made conditional on a finegrained level - they are overriden

>> in API groups. In fact allowing individual override is arguably a fragility.

>>

>> So we could do the following simplification on top of that:

>>

>>  #ifndef atomic_fetch_dec_relaxed

>>  # ifndef atomic_fetch_dec

>>  #  define atomic_fetch_dec(v)                        atomic_fetch_sub(1, (v))

>>  #  define atomic_fetch_dec_relaxed(v)                atomic_fetch_sub_relaxed(1, (v))

>>  #  define atomic_fetch_dec_acquire(v)                atomic_fetch_sub_acquire(1, (v))

>>  #  define atomic_fetch_dec_release(v)                atomic_fetch_sub_release(1, (v))

>>  # else

>>  #  define atomic_fetch_dec_relaxed           atomic_fetch_dec

>>  #  define atomic_fetch_dec_acquire           atomic_fetch_dec

>>  #  define atomic_fetch_dec_release           atomic_fetch_dec

>>  # endif

>>  #else

>>  # ifndef atomic_fetch_dec

>>  #  define atomic_fetch_dec(...)              __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>>  #  define atomic_fetch_dec_acquire(...)      __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>>  #  define atomic_fetch_dec_release(...)      __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>>  # endif

>>  #endif

>

> This would disallow an architecture to override just fetch_dec_release for

> instance.

>

> I don't think there currently is any architecture that does that, but the

> intent was to allow it to override anything and only provide defaults where it

> does not.

>

> None of this takes away the giant trainwreck that is the annotated atomic stuff

> though.

>

> And I seriously hate this one:

>

>   ba1c9f83f633 ("locking/atomic/x86: Un-macro-ify atomic ops implementation")

>

> and will likely undo that the moment I need to change anything there.

>

> So no, don't like.


That was asked by Ingo:
https://groups.google.com/d/msg/kasan-dev/3sNHjjb4GCI/Xz1uVWaaAAAJ

I think in the end all of current options suck in one way or another,
so we are just going in circles.
We either need something different (e.g. codegen), or settle on one
option for doing it.
Ingo Molnar May 5, 2018, 9:09 a.m. | #7
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, May 05, 2018 at 10:11:00AM +0200, Ingo Molnar wrote:

> 

> > Before:

> > 

> >  #ifndef atomic_fetch_dec_relaxed

> > 

> >  #ifndef atomic_fetch_dec

> >  #define atomic_fetch_dec(v)	        atomic_fetch_sub(1, (v))

> >  #define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> >  #define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> >  #define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> >  #else /* atomic_fetch_dec */

> >  #define atomic_fetch_dec_relaxed	atomic_fetch_dec

> >  #define atomic_fetch_dec_acquire	atomic_fetch_dec

> >  #define atomic_fetch_dec_release	atomic_fetch_dec

> >  #endif /* atomic_fetch_dec */

> > 

> >  #else /* atomic_fetch_dec_relaxed */

> > 

> >  #ifndef atomic_fetch_dec_acquire

> >  #define atomic_fetch_dec_acquire(...)					\

> > 	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> >  #endif

> > 

> >  #ifndef atomic_fetch_dec_release

> >  #define atomic_fetch_dec_release(...)					\

> > 	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> >  #endif

> > 

> >  #ifndef atomic_fetch_dec

> >  #define atomic_fetch_dec(...)						\

> > 	__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> >  #endif

> >  #endif /* atomic_fetch_dec_relaxed */

> > 

> > After:

> > 

> >  #ifndef atomic_fetch_dec_relaxed

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

> >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

> >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

> >  # else

> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> >  # endif

> >  #else

> >  # ifndef atomic_fetch_dec_acquire

> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  # ifndef atomic_fetch_dec_release

> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  #endif

> > 

> > The new variant is readable at a glance, and the hierarchy of defines is very 

> > obvious as well.

> 

> It wraps and looks hideous in my normal setup. And I do detest that indent

> after # thing.


You should use wider terminals if you take a look at such code - there's already 
numerous areas of the kernel that are not readable on 80x25 terminals.

_Please_ try the following experiment, for me:

Enter the 21st century temporarily and widen two of your terminals from 80 cols to 
100 cols - it's only ~20% wider.

Apply the 3 patches I sent and then open the new and the old atomic.h in the two 
terminals and compare them visually.

The new structure is _much_ more compact, it is nicer looking and much more 
readable.

Thanks,

	Ingo
Mark Rutland May 5, 2018, 9:12 a.m. | #8
On Fri, May 04, 2018 at 08:24:22PM +0200, Peter Zijlstra wrote:
> On Fri, May 04, 2018 at 07:09:09PM +0100, Mark Rutland wrote:

> > On Fri, May 04, 2018 at 08:01:05PM +0200, Peter Zijlstra wrote:

> > > On Fri, May 04, 2018 at 06:39:32PM +0100, Mark Rutland wrote:

> 

> > > >  include/asm-generic/atomic-instrumented.h | 1195 ++++++++++++++++++++++++-----

> > > >  1 file changed, 1008 insertions(+), 187 deletions(-)

> > > 

> > > Is there really no way to either generate or further macro compress this?

> > 

> > I can definitely macro compress this somewhat, but the bulk of the

> > repetition will be the ifdeffery, which can't be macro'd away IIUC.

> 

> Right, much like what we already have in linux/atomic.h I suspect,

> having to duplicating that isn't brilliant either.

> 

> > Generating this with a script is possible -- do we do anything like that

> > elsewhere?

> 

> There's include/generated/ in your build directory. But nothing on this

> scale I think.


Sure. I'm not familiar with how we generate those, so I'll go digging through
the build infrastructure.

Thanks,
Mark.
Peter Zijlstra May 5, 2018, 9:24 a.m. | #9
On Sat, May 05, 2018 at 11:04:03AM +0200, Ingo Molnar wrote:
> 

> * Peter Zijlstra <peterz@infradead.org> wrote:

> 

> > > So we could do the following simplification on top of that:

> > > 

> > >  #ifndef atomic_fetch_dec_relaxed

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))

> > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> > >  # else

> > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > >  # endif

> > >  #else

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  #endif

> > 

> > This would disallow an architecture to override just fetch_dec_release for

> > instance.

> 

> Couldn't such a crazy arch just define _all_ the 3 APIs in this group?

> That's really a small price and makes the place pay the complexity

> price that does the weirdness...


I would expect the pattern where it can do all 'release' and/or all
'acquire' variants special but cannot use the __atomic_op_*() wrappery.

> > I don't think there currently is any architecture that does that, but the

> > intent was to allow it to override anything and only provide defaults where it

> > does not.

> 

> I'd argue that if a new arch only defines one of these APIs that's probably a bug. 

> If they absolutely want to do it, they still can - by defining all 3 APIs.

> 

> So there's no loss in arch flexibility.


Ideally we'd generate the whole mess.. and then allowing these extra few
overrides is not a problem at all.

> > None of this takes away the giant trainwreck that is the annotated atomic stuff

> > though.

> > 

> > And I seriously hate this one:

> > 

> >   ba1c9f83f633 ("locking/atomic/x86: Un-macro-ify atomic ops implementation")

> > 

> > and will likely undo that the moment I need to change anything there.

> 

> If it makes the code more readable then I don't object - the problem was that the 

> instrumentation indirection made all that code much harder to follow.


Thing is, it is all the exact same loop, and bitrot mandates they drift
over time. When I cleaned up all the architectures I found plenty cases
where there were spurious differences between things.
Peter Zijlstra May 5, 2018, 9:29 a.m. | #10
On Sat, May 05, 2018 at 11:09:03AM +0200, Ingo Molnar wrote:
> > >  # ifndef atomic_fetch_dec_acquire

> > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  # ifndef atomic_fetch_dec_release

> > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  #endif

> > > 

> > > The new variant is readable at a glance, and the hierarchy of defines is very 

> > > obvious as well.

> > 

> > It wraps and looks hideous in my normal setup. And I do detest that indent

> > after # thing.

> 

> You should use wider terminals if you take a look at such code - there's already 

> numerous areas of the kernel that are not readable on 80x25 terminals.

> 

> _Please_ try the following experiment, for me:

> 

> Enter the 21st century temporarily and widen two of your terminals from 80 cols to 

> 100 cols - it's only ~20% wider.


Doesn't work that way. The only way I get more columns is if I shrink my
font further. I work with tiles per monitor (left/right obv.) and use
two columns per editor. This gets me a total of 4 columns.

On my desktop that is slightly over 100 characters per column, on my
laptop that is slightly below 100 -- mostly because I'm pixel limited on
fontsize on that thing (FullHD sucks).

If it wraps it wraps.
Peter Zijlstra May 5, 2018, 9:32 a.m. | #11
On Sat, May 05, 2018 at 11:05:51AM +0200, Dmitry Vyukov wrote:
> On Sat, May 5, 2018 at 10:47 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> > And I seriously hate this one:

> >

> >   ba1c9f83f633 ("locking/atomic/x86: Un-macro-ify atomic ops implementation")

> >

> > and will likely undo that the moment I need to change anything there.


> That was asked by Ingo:

> https://groups.google.com/d/msg/kasan-dev/3sNHjjb4GCI/Xz1uVWaaAAAJ

> 

> I think in the end all of current options suck in one way or another,

> so we are just going in circles.


Yeah, and I disagree with him, but didn't have the energy to fight at
that time (and still don't really, I'm just complaining).

> We either need something different (e.g. codegen), or settle on one

> option for doing it.


Codegen I think is the only sensible option at this point for all the
wrappers. The existing ones (without the annotation muck) were already
cumbersome, the annotation stuff just makes it completely horrid.
Ingo Molnar May 5, 2018, 9:38 a.m. | #12
* Ingo Molnar <mingo@kernel.org> wrote:

> * Peter Zijlstra <peterz@infradead.org> wrote:

> 

> > > So we could do the following simplification on top of that:

> > > 

> > >  #ifndef atomic_fetch_dec_relaxed

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))

> > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> > >  # else

> > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > >  # endif

> > >  #else

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  #endif

> > 

> > This would disallow an architecture to override just fetch_dec_release for

> > instance.

> 

> Couldn't such a crazy arch just define _all_ the 3 APIs in this group?

> That's really a small price and makes the place pay the complexity

> price that does the weirdness...

> 

> > I don't think there currently is any architecture that does that, but the

> > intent was to allow it to override anything and only provide defaults where it

> > does not.

> 

> I'd argue that if a new arch only defines one of these APIs that's probably a bug. 

> If they absolutely want to do it, they still can - by defining all 3 APIs.

> 

> So there's no loss in arch flexibility.


BTW., PowerPC for example is already in such a situation, it does not define 
atomic_cmpxchg_release(), only the other APIs:

#define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
#define atomic_cmpxchg_relaxed(v, o, n) \
	cmpxchg_relaxed(&((v)->counter), (o), (n))
#define atomic_cmpxchg_acquire(v, o, n) \
	cmpxchg_acquire(&((v)->counter), (o), (n))

Was it really the intention on the PowerPC side that the generic code falls back 
to cmpxchg(), i.e.:

#  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

Which after macro expansion becomes:

	smp_mb__before_atomic();
	atomic_cmpxchg_relaxed(v, o, n);

smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 
falls back to mb(), which on PowerPC is the 'sync' instruction.

Isn't this a inefficiency bug?

While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 
have the following basic structure:

full cmpxchg():

	PPC_ATOMIC_ENTRY_BARRIER # sync
	ldarx + stdcx
	PPC_ATOMIC_EXIT_BARRIER  # sync

cmpxchg_relaxed():

	ldarx + stdcx

cmpxchg_acquire():

	ldarx + stdcx
	PPC_ACQUIRE_BARRIER      # lwsync

The logical extension for cmpxchg_release() would be:

cmpxchg_release():

	PPC_RELEASE_BARRIER      # lwsync
	ldarx + stdcx

But instead we silently get the generic fallback, which does:

	smp_mb__before_atomic();
	atomic_cmpxchg_relaxed(v, o, n);

Which maps to:

	sync
	ldarx + stdcx

Note that it uses a full barrier instead of lwsync (does that stand for 
'lightweight sync'?).

Even if it turns out we need the full barrier, with the overly finegrained 
structure of the atomics this detail is totally undocumented and non-obvious.

Thanks,

	Ingo
Boqun Feng May 5, 2018, 10:16 a.m. | #13
On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:
> 

> * Ingo Molnar <mingo@kernel.org> wrote:

> 

> > * Peter Zijlstra <peterz@infradead.org> wrote:

> > 

> > > > So we could do the following simplification on top of that:

> > > > 

> > > >  #ifndef atomic_fetch_dec_relaxed

> > > >  # ifndef atomic_fetch_dec

> > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))

> > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> > > >  # else

> > > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > > >  # endif

> > > >  #else

> > > >  # ifndef atomic_fetch_dec

> > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > > >  # endif

> > > >  #endif

> > > 

> > > This would disallow an architecture to override just fetch_dec_release for

> > > instance.

> > 

> > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?

> > That's really a small price and makes the place pay the complexity

> > price that does the weirdness...

> > 

> > > I don't think there currently is any architecture that does that, but the

> > > intent was to allow it to override anything and only provide defaults where it

> > > does not.

> > 

> > I'd argue that if a new arch only defines one of these APIs that's probably a bug. 

> > If they absolutely want to do it, they still can - by defining all 3 APIs.

> > 

> > So there's no loss in arch flexibility.

> 

> BTW., PowerPC for example is already in such a situation, it does not define 

> atomic_cmpxchg_release(), only the other APIs:

> 

> #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))

> #define atomic_cmpxchg_relaxed(v, o, n) \

> 	cmpxchg_relaxed(&((v)->counter), (o), (n))

> #define atomic_cmpxchg_acquire(v, o, n) \

> 	cmpxchg_acquire(&((v)->counter), (o), (n))

> 

> Was it really the intention on the PowerPC side that the generic code falls back 

> to cmpxchg(), i.e.:

> 

> #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

> 


So ppc has its own definition __atomic_op_release() in
arch/powerpc/include/asm/atomic.h:

	#define __atomic_op_release(op, args...)				\
	({									\
		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
		op##_relaxed(args);						\
	})

, and PPC_RELEASE_BARRIER is lwsync, so we map to

	lwsync();
	atomic_cmpxchg_relaxed(v, o, n);

And the reason, why we don't define atomic_cmpxchg_release() but define
atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no
ordering guarantee if the cmp fails, we did this for
atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because
doing so may introduce a memory barrier inside a ll/sc critical section,
please see the comment before __cmpxchg_u32_acquire() in
arch/powerpc/include/asm/cmpxchg.h:

	/*
	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we
	 * can avoid superfluous barriers if we use assembly code to implement
	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for
	 * cmpxchg_release() because that will result in putting a barrier in the
	 * middle of a ll/sc loop, which is probably a bad idea. For example, this
	 * might cause the conditional store more likely to fail.
	 */

Regards,
Boqun


> Which after macro expansion becomes:

> 

> 	smp_mb__before_atomic();

> 	atomic_cmpxchg_relaxed(v, o, n);

> 

> smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 

> falls back to mb(), which on PowerPC is the 'sync' instruction.

> 

> Isn't this a inefficiency bug?

> 

> While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 

> have the following basic structure:

> 

> full cmpxchg():

> 

> 	PPC_ATOMIC_ENTRY_BARRIER # sync

> 	ldarx + stdcx

> 	PPC_ATOMIC_EXIT_BARRIER  # sync

> 

> cmpxchg_relaxed():

> 

> 	ldarx + stdcx

> 

> cmpxchg_acquire():

> 

> 	ldarx + stdcx

> 	PPC_ACQUIRE_BARRIER      # lwsync

> 

> The logical extension for cmpxchg_release() would be:

> 

> cmpxchg_release():

> 

> 	PPC_RELEASE_BARRIER      # lwsync

> 	ldarx + stdcx

> 

> But instead we silently get the generic fallback, which does:

> 

> 	smp_mb__before_atomic();

> 	atomic_cmpxchg_relaxed(v, o, n);

> 

> Which maps to:

> 

> 	sync

> 	ldarx + stdcx

> 

> Note that it uses a full barrier instead of lwsync (does that stand for 

> 'lightweight sync'?).

> 

> Even if it turns out we need the full barrier, with the overly finegrained 

> structure of the atomics this detail is totally undocumented and non-obvious.

> 

> Thanks,

> 

> 	Ingo
Boqun Feng May 5, 2018, 10:26 a.m. | #14
Hi Ingo,

On Sat, May 05, 2018 at 12:00:55PM +0200, Ingo Molnar wrote:
> 

> * Ingo Molnar <mingo@kernel.org> wrote:

> 

> > > So there's no loss in arch flexibility.

> > 

> > BTW., PowerPC for example is already in such a situation, it does not define 

> > atomic_cmpxchg_release(), only the other APIs:

> > 

> > #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))

> > #define atomic_cmpxchg_relaxed(v, o, n) \

> > 	cmpxchg_relaxed(&((v)->counter), (o), (n))

> > #define atomic_cmpxchg_acquire(v, o, n) \

> > 	cmpxchg_acquire(&((v)->counter), (o), (n))

> > 

> > Was it really the intention on the PowerPC side that the generic code falls back 

> > to cmpxchg(), i.e.:

> > 

> > #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

> > 

> > Which after macro expansion becomes:

> > 

> > 	smp_mb__before_atomic();

> > 	atomic_cmpxchg_relaxed(v, o, n);

> > 

> > smp_mb__before_atomic() on PowerPC falls back to the generic __smp_mb(), which 

> > falls back to mb(), which on PowerPC is the 'sync' instruction.

> > 

> > Isn't this a inefficiency bug?

> > 

> > While I'm pretty clueless about PowerPC low level cmpxchg atomics, they appear to 

> > have the following basic structure:

> > 

> > full cmpxchg():

> > 

> > 	PPC_ATOMIC_ENTRY_BARRIER # sync

> > 	ldarx + stdcx

> > 	PPC_ATOMIC_EXIT_BARRIER  # sync

> > 

> > cmpxchg_relaxed():

> > 

> > 	ldarx + stdcx

> > 

> > cmpxchg_acquire():

> > 

> > 	ldarx + stdcx

> > 	PPC_ACQUIRE_BARRIER      # lwsync

> > 

> > The logical extension for cmpxchg_release() would be:

> > 

> > cmpxchg_release():

> > 

> > 	PPC_RELEASE_BARRIER      # lwsync

> > 	ldarx + stdcx

> > 

> > But instead we silently get the generic fallback, which does:

> > 

> > 	smp_mb__before_atomic();

> > 	atomic_cmpxchg_relaxed(v, o, n);

> > 

> > Which maps to:

> > 

> > 	sync

> > 	ldarx + stdcx

> > 

> > Note that it uses a full barrier instead of lwsync (does that stand for 

> > 'lightweight sync'?).

> > 

> > Even if it turns out we need the full barrier, with the overly finegrained 

> > structure of the atomics this detail is totally undocumented and non-obvious.

> 

> The patch below fills in those bits and implements the optimized cmpxchg_release() 

> family of APIs. The end effect should be that cmpxchg_release() will now use 

> 'lwsync' instead of 'sync' on PowerPC, for the following APIs:

> 

>   cmpxchg_release()

>   cmpxchg64_release()

>   atomic_cmpxchg_release()

>   atomic64_cmpxchg_release()

> 

> I based this choice of the release barrier on an existing bitops low level PowerPC 

> method:

> 

>    DEFINE_BITOP(clear_bits_unlock, andc, PPC_RELEASE_BARRIER)

> 

> This clearly suggests that PPC_RELEASE_BARRIER is in active use and 'lwsync' is 

> the 'release barrier' instruction, if I interpreted that right.

> 


Thanks for looking into this, but as I said in other email:

	https://marc.info/?l=linux-kernel&m=152551511324210&w=2

, we actually generate light weight barriers for cmpxchg_release()
familiy.

The reason of the asymmetry between cmpxchg_acquire() and
cmpxchg_release() is that we want to save a barrier for
cmpxchg_acquire() if the cmp fails, but doing the similar for
cmpxchg_release() will introduce a scenario that puts a barrier in a
ll/sc loop, which may be a bad idea.

> But I know very little about PowerPC so this might be spectacularly wrong. It's 

> totally untested as well. I also pretty sick today so my mental capabilities are 

> significantly reduced ...

> 


Feel sorry about that, hope you well!

Please let me know if you think I should provide more document work to
make this more informative.

Regards,
Boqun

> So not signed off and such.

> 

> Thanks,

> 

> 	Ingo

> 

> ---

>  arch/powerpc/include/asm/atomic.h  |  4 ++

>  arch/powerpc/include/asm/cmpxchg.h | 81 ++++++++++++++++++++++++++++++++++++++

>  2 files changed, 85 insertions(+)

> 

> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h

> index 682b3e6a1e21..f7a6f29acb12 100644

> --- a/arch/powerpc/include/asm/atomic.h

> +++ b/arch/powerpc/include/asm/atomic.h

> @@ -213,6 +213,8 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)

>  	cmpxchg_relaxed(&((v)->counter), (o), (n))

>  #define atomic_cmpxchg_acquire(v, o, n) \

>  	cmpxchg_acquire(&((v)->counter), (o), (n))

> +#define atomic_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

>  

>  #define atomic_xchg(v, new) (xchg(&((v)->counter), new))

>  #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))

> @@ -519,6 +521,8 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)

>  	cmpxchg_relaxed(&((v)->counter), (o), (n))

>  #define atomic64_cmpxchg_acquire(v, o, n) \

>  	cmpxchg_acquire(&((v)->counter), (o), (n))

> +#define atomic64_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

>  

>  #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))

>  #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))

> diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h

> index 9b001f1f6b32..6e46310b1833 100644

> --- a/arch/powerpc/include/asm/cmpxchg.h

> +++ b/arch/powerpc/include/asm/cmpxchg.h

> @@ -213,10 +213,12 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)

>  CMPXCHG_GEN(u8, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");

>  CMPXCHG_GEN(u8, _local, , , "memory");

>  CMPXCHG_GEN(u8, _acquire, , PPC_ACQUIRE_BARRIER, "memory");

> +CMPXCHG_GEN(u8, _release, PPC_RELEASE_BARRIER, , "memory");

>  CMPXCHG_GEN(u8, _relaxed, , , "cc");

>  CMPXCHG_GEN(u16, , PPC_ATOMIC_ENTRY_BARRIER, PPC_ATOMIC_EXIT_BARRIER, "memory");

>  CMPXCHG_GEN(u16, _local, , , "memory");

>  CMPXCHG_GEN(u16, _acquire, , PPC_ACQUIRE_BARRIER, "memory");

> +CMPXCHG_GEN(u16, _release, PPC_RELEASE_BARRIER, , "memory");

>  CMPXCHG_GEN(u16, _relaxed, , , "cc");

>  

>  static __always_inline unsigned long

> @@ -314,6 +316,29 @@ __cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)

>  	return prev;

>  }

>  

> +static __always_inline unsigned long

> +__cmpxchg_u32_release(u32 *p, unsigned long old, unsigned long new)

> +{

> +	unsigned long prev;

> +

> +	__asm__ __volatile__ (

> +	PPC_RELEASE_BARRIER

> +"1:	lwarx	%0,0,%2		# __cmpxchg_u32_release\n"

> +"	cmpw	0,%0,%3\n"

> +"	bne-	2f\n"

> +	PPC405_ERR77(0, %2)

> +"	stwcx.	%4,0,%2\n"

> +"	bne-	1b\n"

> +	"\n"

> +"2:"

> +	: "=&r" (prev), "+m" (*p)

> +	: "r" (p), "r" (old), "r" (new)

> +	: "cc", "memory");

> +

> +	return prev;

> +}

> +

> +

>  #ifdef CONFIG_PPC64

>  static __always_inline unsigned long

>  __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)

> @@ -397,6 +422,27 @@ __cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)

>  

>  	return prev;

>  }

> +

> +static __always_inline unsigned long

> +__cmpxchg_u64_release(u64 *p, unsigned long old, unsigned long new)

> +{

> +	unsigned long prev;

> +

> +	__asm__ __volatile__ (

> +	PPC_RELEASE_BARRIER

> +"1:	ldarx	%0,0,%2		# __cmpxchg_u64_release\n"

> +"	cmpd	0,%0,%3\n"

> +"	bne-	2f\n"

> +"	stdcx.	%4,0,%2\n"

> +"	bne-	1b\n"

> +	"\n"

> +"2:"

> +	: "=&r" (prev), "+m" (*p)

> +	: "r" (p), "r" (old), "r" (new)

> +	: "cc", "memory");

> +

> +	return prev;

> +}

>  #endif

>  

>  static __always_inline unsigned long

> @@ -478,6 +524,27 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,

>  	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_acquire");

>  	return old;

>  }

> +

> +static __always_inline unsigned long

> +__cmpxchg_release(void *ptr, unsigned long old, unsigned long new,

> +		  unsigned int size)

> +{

> +	switch (size) {

> +	case 1:

> +		return __cmpxchg_u8_release(ptr, old, new);

> +	case 2:

> +		return __cmpxchg_u16_release(ptr, old, new);

> +	case 4:

> +		return __cmpxchg_u32_release(ptr, old, new);

> +#ifdef CONFIG_PPC64

> +	case 8:

> +		return __cmpxchg_u64_release(ptr, old, new);

> +#endif

> +	}

> +	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_release");

> +	return old;

> +}

> +

>  #define cmpxchg(ptr, o, n)						 \

>    ({									 \

>       __typeof__(*(ptr)) _o_ = (o);					 \

> @@ -512,6 +579,15 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,

>  			(unsigned long)_o_, (unsigned long)_n_,		\

>  			sizeof(*(ptr)));				\

>  })

> +

> +#define cmpxchg_release(ptr, o, n)					\

> +({									\

> +	__typeof__(*(ptr)) _o_ = (o);					\

> +	__typeof__(*(ptr)) _n_ = (n);					\

> +	(__typeof__(*(ptr))) __cmpxchg_release((ptr),			\

> +			(unsigned long)_o_, (unsigned long)_n_,		\

> +			sizeof(*(ptr)));				\

> +})

>  #ifdef CONFIG_PPC64

>  #define cmpxchg64(ptr, o, n)						\

>    ({									\

> @@ -533,6 +609,11 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,

>  	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\

>  	cmpxchg_acquire((ptr), (o), (n));				\

>  })

> +#define cmpxchg64_release(ptr, o, n)					\

> +({									\

> +	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\

> +	cmpxchg_release((ptr), (o), (n));				\

> +})

>  #else

>  #include <asm-generic/cmpxchg-local.h>

>  #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
Ingo Molnar May 5, 2018, 10:59 a.m. | #15
* Ingo Molnar <mingo@kernel.org> wrote:

> If that's too then there's a few more things we could do - for example the 

              ^--too much
> attached patch renames a (very minor) misnomer to a shorter name and thus saves on 

> the longest lines, the column histogram now looks like this:


Thanks,

	Ingo
Boqun Feng May 5, 2018, 11:28 a.m. | #16
On Sat, May 05, 2018 at 12:35:50PM +0200, Ingo Molnar wrote:
> 

> * Boqun Feng <boqun.feng@gmail.com> wrote:

> 

> > On Sat, May 05, 2018 at 11:38:29AM +0200, Ingo Molnar wrote:

> > > 

> > > * Ingo Molnar <mingo@kernel.org> wrote:

> > > 

> > > > * Peter Zijlstra <peterz@infradead.org> wrote:

> > > > 

> > > > > > So we could do the following simplification on top of that:

> > > > > > 

> > > > > >  #ifndef atomic_fetch_dec_relaxed

> > > > > >  # ifndef atomic_fetch_dec

> > > > > >  #  define atomic_fetch_dec(v)		atomic_fetch_sub(1, (v))

> > > > > >  #  define atomic_fetch_dec_relaxed(v)	atomic_fetch_sub_relaxed(1, (v))

> > > > > >  #  define atomic_fetch_dec_acquire(v)	atomic_fetch_sub_acquire(1, (v))

> > > > > >  #  define atomic_fetch_dec_release(v)	atomic_fetch_sub_release(1, (v))

> > > > > >  # else

> > > > > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > > > > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > > > > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > > > > >  # endif

> > > > > >  #else

> > > > > >  # ifndef atomic_fetch_dec

> > > > > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > > > > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > > > > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > > > > >  # endif

> > > > > >  #endif

> > > > > 

> > > > > This would disallow an architecture to override just fetch_dec_release for

> > > > > instance.

> > > > 

> > > > Couldn't such a crazy arch just define _all_ the 3 APIs in this group?

> > > > That's really a small price and makes the place pay the complexity

> > > > price that does the weirdness...

> > > > 

> > > > > I don't think there currently is any architecture that does that, but the

> > > > > intent was to allow it to override anything and only provide defaults where it

> > > > > does not.

> > > > 

> > > > I'd argue that if a new arch only defines one of these APIs that's probably a bug. 

> > > > If they absolutely want to do it, they still can - by defining all 3 APIs.

> > > > 

> > > > So there's no loss in arch flexibility.

> > > 

> > > BTW., PowerPC for example is already in such a situation, it does not define 

> > > atomic_cmpxchg_release(), only the other APIs:

> > > 

> > > #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))

> > > #define atomic_cmpxchg_relaxed(v, o, n) \

> > > 	cmpxchg_relaxed(&((v)->counter), (o), (n))

> > > #define atomic_cmpxchg_acquire(v, o, n) \

> > > 	cmpxchg_acquire(&((v)->counter), (o), (n))

> > > 

> > > Was it really the intention on the PowerPC side that the generic code falls back 

> > > to cmpxchg(), i.e.:

> > > 

> > > #  define atomic_cmpxchg_release(...)           __atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

> > > 

> > 

> > So ppc has its own definition __atomic_op_release() in

> > arch/powerpc/include/asm/atomic.h:

> > 

> > 	#define __atomic_op_release(op, args...)				\

> > 	({									\

> > 		__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\

> > 		op##_relaxed(args);						\

> > 	})

> > 

> > , and PPC_RELEASE_BARRIER is lwsync, so we map to

> > 

> > 	lwsync();

> > 	atomic_cmpxchg_relaxed(v, o, n);

> > 

> > And the reason, why we don't define atomic_cmpxchg_release() but define

> > atomic_cmpxchg_acquire() is that, atomic_cmpxchg_*() could provide no

> > ordering guarantee if the cmp fails, we did this for

> > atomic_cmpxchg_acquire() but not for atomic_cmpxchg_release(), because

> > doing so may introduce a memory barrier inside a ll/sc critical section,

> > please see the comment before __cmpxchg_u32_acquire() in

> > arch/powerpc/include/asm/cmpxchg.h:

> > 

> > 	/*

> > 	 * cmpxchg family don't have order guarantee if cmp part fails, therefore we

> > 	 * can avoid superfluous barriers if we use assembly code to implement

> > 	 * cmpxchg() and cmpxchg_acquire(), however we don't do the similar for

> > 	 * cmpxchg_release() because that will result in putting a barrier in the

> > 	 * middle of a ll/sc loop, which is probably a bad idea. For example, this

> > 	 * might cause the conditional store more likely to fail.

> > 	 */

> 

> Makes sense, thanks a lot for the explanation, missed that comment in the middle 

> of the assembly functions!

> 


;-) I could move it so somewhere else in the future.

> So the patch I sent is buggy, please disregard it.

> 

> May I suggest the patch below? No change in functionality, but it documents the 

> lack of the cmpxchg_release() APIs and maps them explicitly to the full cmpxchg() 

> version. (Which the generic code does now in a rather roundabout way.)

> 


Hmm.. cmpxchg_release() is actually lwsync() + cmpxchg_relaxed(), but
you just make it sync() + cmpxchg_relaxed() + sync() with the fallback,
and sync() is much heavier, so I don't think the fallback is correct.

I think maybe you can move powerpc's __atomic_op_{acqurie,release}()
from atomic.h to cmpxchg.h (in arch/powerpc/include/asm), and

	#define cmpxchg_release __atomic_op_release(cmpxchg, __VA_ARGS__);
	#define cmpxchg64_release __atomic_op_release(cmpxchg64, __VA_ARGS__);

I put a diff below to say what I mean (untested).

> Also, the change to arch/powerpc/include/asm/atomic.h has no functional effect 

> right now either, but should anyone add a _relaxed() variant in the future, with 

> this change atomic_cmpxchg_release() and atomic64_cmpxchg_release() will pick that 

> up automatically.

> 


You mean with your other modification in include/linux/atomic.h, right?
Because with the unmodified include/linux/atomic.h, we already pick that
autmatically. If so, I think that's fine.

Here is the diff for the modification for cmpxchg_release(), the idea is
we generate them in asm/cmpxchg.h other than linux/atomic.h for ppc, so
we keep the new linux/atomic.h working. Because if I understand
correctly, the next linux/atomic.h only accepts that

1)	architecture only defines fully ordered primitives

or

2)	architecture only defines _relaxed primitives

or

3)	architecture defines all four (fully, _relaxed, _acquire,
	_release) primitives

So powerpc needs to define all four primitives in its only
asm/cmpxchg.h.

Regards,
Boqun

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 682b3e6a1e21..0136be11c84f 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -13,24 +13,6 @@
 
 #define ATOMIC_INIT(i)		{ (i) }
 
-/*
- * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
- * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
- * on the platform without lwsync.
- */
-#define __atomic_op_acquire(op, args...)				\
-({									\
-	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
-	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
-	__ret;								\
-})
-
-#define __atomic_op_release(op, args...)				\
-({									\
-	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
-	op##_relaxed(args);						\
-})
-
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	int t;
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 9b001f1f6b32..9e20a942aff9 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -8,6 +8,24 @@
 #include <asm/asm-compat.h>
 #include <linux/bug.h>
 
+/*
+ * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
+ * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
+ * on the platform without lwsync.
+ */
+#define __atomic_op_acquire(op, args...)				\
+({									\
+	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\
+	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\
+	__ret;								\
+})
+
+#define __atomic_op_release(op, args...)				\
+({									\
+	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\
+	op##_relaxed(args);						\
+})
+
 #ifdef __BIG_ENDIAN
 #define BITOFF_CAL(size, off)	((sizeof(u32) - size - off) * BITS_PER_BYTE)
 #else
@@ -512,6 +530,8 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 			(unsigned long)_o_, (unsigned long)_n_,		\
 			sizeof(*(ptr)));				\
 })
+
+#define cmpxchg_release(ptr, o, n) __atomic_op_release(cmpxchg, __VA_ARGS__)
 #ifdef CONFIG_PPC64
 #define cmpxchg64(ptr, o, n)						\
   ({									\
@@ -533,6 +553,7 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
 	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
 	cmpxchg_acquire((ptr), (o), (n));				\
 })
+#define cmpxchg64_release(ptr, o, n) __atomic_op_release(cmpxchg64, __VA_ARGS__)
 #else
 #include <asm-generic/cmpxchg-local.h>
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
Boqun Feng May 5, 2018, 2:03 p.m. | #17
On Sat, May 05, 2018 at 03:27:51PM +0200, Ingo Molnar wrote:
> 

> * Boqun Feng <boqun.feng@gmail.com> wrote:

> 

> > > May I suggest the patch below? No change in functionality, but it documents the 

> > > lack of the cmpxchg_release() APIs and maps them explicitly to the full cmpxchg() 

> > > version. (Which the generic code does now in a rather roundabout way.)

> > > 

> > 

> > Hmm.. cmpxchg_release() is actually lwsync() + cmpxchg_relaxed(), but

> > you just make it sync() + cmpxchg_relaxed() + sync() with the fallback,

> > and sync() is much heavier, so I don't think the fallback is correct.

> 

> Indeed!

> 

> The bit I missed previously is that PowerPC provides its own __atomic_op_release() 

> method:

> 

>    #define __atomic_op_release(op, args...)                                \

>    ({                                                                      \

>            __asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");    \

>            op##_relaxed(args);                                             \

>    })

> 

> ... which maps to LWSYNC as you say, and my patch made that worse.

> 

> > I think maybe you can move powerpc's __atomic_op_{acqurie,release}()

> > from atomic.h to cmpxchg.h (in arch/powerpc/include/asm), and

> > 

> > 	#define cmpxchg_release __atomic_op_release(cmpxchg, __VA_ARGS__);

> > 	#define cmpxchg64_release __atomic_op_release(cmpxchg64, __VA_ARGS__);

> > 

> > I put a diff below to say what I mean (untested).

> > 

> > > Also, the change to arch/powerpc/include/asm/atomic.h has no functional effect 

> > > right now either, but should anyone add a _relaxed() variant in the future, with 

> > > this change atomic_cmpxchg_release() and atomic64_cmpxchg_release() will pick that 

> > > up automatically.

> > > 

> > 

> > You mean with your other modification in include/linux/atomic.h, right?

> > Because with the unmodified include/linux/atomic.h, we already pick that

> > autmatically. If so, I think that's fine.

> > 

> > Here is the diff for the modification for cmpxchg_release(), the idea is

> > we generate them in asm/cmpxchg.h other than linux/atomic.h for ppc, so

> > we keep the new linux/atomic.h working. Because if I understand

> > correctly, the next linux/atomic.h only accepts that

> > 

> > 1)	architecture only defines fully ordered primitives

> > 

> > or

> > 

> > 2)	architecture only defines _relaxed primitives

> > 

> > or

> > 

> > 3)	architecture defines all four (fully, _relaxed, _acquire,

> > 	_release) primitives

> > 

> > So powerpc needs to define all four primitives in its only

> > asm/cmpxchg.h.

> 

> Correct, although the new logic is still RFC, PeterZ didn't like the first version 

> I proposed and might NAK them.

> 


Understood. From my side, I don't have strong feelings for either way.
But since powerpc gets affected with the new logic, so I'm glad I could
help.

> Thanks for the patch - I have created the patch below from it and added your 

> Signed-off-by.

> 


Thanks ;-)

> The only change I made beyond a trivial build fix is that I also added the release 

> atomics variants explicitly:

> 

> +#define atomic_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

> +#define atomic64_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

> 

> It has passed a PowerPC cross-build test here, but no runtime tests.

> 


Do you have the commit at any branch in tip tree? I could pull it and
cross-build and check the assembly code of lib/atomic64_test.c, that way
I could verify whether we mess something up.

> Does this patch look good to you?

> 


Yep!

Regards,
Boqun

> (Still subject to PeterZ's Ack/NAK.)

> 

> Thanks,

> 

> 	Ingo

> 

> ======================>

> From: Boqun Feng <boqun.feng@gmail.com>

> Date: Sat, 5 May 2018 19:28:17 +0800

> Subject: [PATCH] locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs

> 

> Move PowerPC's __op_{acqurie,release}() from atomic.h to

> cmpxchg.h (in arch/powerpc/include/asm), plus use them to

> define these two methods:

> 

> 	#define cmpxchg_release __op_release(cmpxchg, __VA_ARGS__);

> 	#define cmpxchg64_release __op_release(cmpxchg64, __VA_ARGS__);

> 

> ... the idea is to generate all these methods in cmpxchg.h and to define the full

> array of atomic primitives, including the cmpxchg_release() methods which were

> defined by the generic code before.

> 

> Also define the atomic[64]_() variants explicitly.

> 

> This ensures that all these low level cmpxchg APIs are defined in

> PowerPC headers, with no generic header fallbacks.

> 

> No change in functionality or code generation.

> 

> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>

> Cc: Linus Torvalds <torvalds@linux-foundation.org>

> Cc: Mark Rutland <mark.rutland@arm.com>

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Thomas Gleixner <tglx@linutronix.de>

> Cc: aryabinin@virtuozzo.com

> Cc: catalin.marinas@arm.com

> Cc: dvyukov@google.com

> Cc: linux-arm-kernel@lists.infradead.org

> Cc: will.deacon@arm.com

> Link: http://lkml.kernel.org/r/20180505112817.ihrb726i37bwm4cj@tardis

> Signed-off-by: Ingo Molnar <mingo@kernel.org>

> ---

>  arch/powerpc/include/asm/atomic.h  | 22 ++++------------------

>  arch/powerpc/include/asm/cmpxchg.h | 24 ++++++++++++++++++++++++

>  2 files changed, 28 insertions(+), 18 deletions(-)

> 

> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h

> index 682b3e6a1e21..4e06955ec10f 100644

> --- a/arch/powerpc/include/asm/atomic.h

> +++ b/arch/powerpc/include/asm/atomic.h

> @@ -13,24 +13,6 @@

>  

>  #define ATOMIC_INIT(i)		{ (i) }

>  

> -/*

> - * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with

> - * a "bne-" instruction at the end, so an isync is enough as a acquire barrier

> - * on the platform without lwsync.

> - */

> -#define __atomic_op_acquire(op, args...)				\

> -({									\

> -	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\

> -	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\

> -	__ret;								\

> -})

> -

> -#define __atomic_op_release(op, args...)				\

> -({									\

> -	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\

> -	op##_relaxed(args);						\

> -})

> -

>  static __inline__ int atomic_read(const atomic_t *v)

>  {

>  	int t;

> @@ -213,6 +195,8 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)

>  	cmpxchg_relaxed(&((v)->counter), (o), (n))

>  #define atomic_cmpxchg_acquire(v, o, n) \

>  	cmpxchg_acquire(&((v)->counter), (o), (n))

> +#define atomic_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

>  

>  #define atomic_xchg(v, new) (xchg(&((v)->counter), new))

>  #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))

> @@ -519,6 +503,8 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v)

>  	cmpxchg_relaxed(&((v)->counter), (o), (n))

>  #define atomic64_cmpxchg_acquire(v, o, n) \

>  	cmpxchg_acquire(&((v)->counter), (o), (n))

> +#define atomic64_cmpxchg_release(v, o, n) \

> +	cmpxchg_release(&((v)->counter), (o), (n))

>  

>  #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))

>  #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))

> diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h

> index 9b001f1f6b32..e27a612b957f 100644

> --- a/arch/powerpc/include/asm/cmpxchg.h

> +++ b/arch/powerpc/include/asm/cmpxchg.h

> @@ -8,6 +8,24 @@

>  #include <asm/asm-compat.h>

>  #include <linux/bug.h>

>  

> +/*

> + * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with

> + * a "bne-" instruction at the end, so an isync is enough as a acquire barrier

> + * on the platform without lwsync.

> + */

> +#define __atomic_op_acquire(op, args...)				\

> +({									\

> +	typeof(op##_relaxed(args)) __ret  = op##_relaxed(args);		\

> +	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");	\

> +	__ret;								\

> +})

> +

> +#define __atomic_op_release(op, args...)				\

> +({									\

> +	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");	\

> +	op##_relaxed(args);						\

> +})

> +

>  #ifdef __BIG_ENDIAN

>  #define BITOFF_CAL(size, off)	((sizeof(u32) - size - off) * BITS_PER_BYTE)

>  #else

> @@ -512,6 +530,9 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,

>  			(unsigned long)_o_, (unsigned long)_n_,		\

>  			sizeof(*(ptr)));				\

>  })

> +

> +#define cmpxchg_release(...) __atomic_op_release(cmpxchg, __VA_ARGS__)

> +

>  #ifdef CONFIG_PPC64

>  #define cmpxchg64(ptr, o, n)						\

>    ({									\

> @@ -533,6 +554,9 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,

>  	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\

>  	cmpxchg_acquire((ptr), (o), (n));				\

>  })

> +

> +#define cmpxchg64_release(...) __atomic_op_release(cmpxchg64, __VA_ARGS__)

> +

>  #else

>  #include <asm-generic/cmpxchg-local.h>

>  #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
Benjamin Herrenschmidt May 6, 2018, 1:56 a.m. | #18
On Sat, 2018-05-05 at 12:00 +0200, Ingo Molnar wrote:
> This clearly suggests that PPC_RELEASE_BARRIER is in active use and 'lwsync' is 

> the 'release barrier' instruction, if I interpreted that right.


The closest to one we got.

The semantics are that it orders all load/store pairs to cachable
storage except store+load.

Cheers,
Ben.
Ingo Molnar May 6, 2018, 12:11 p.m. | #19
* Boqun Feng <boqun.feng@gmail.com> wrote:

> > The only change I made beyond a trivial build fix is that I also added the release 

> > atomics variants explicitly:

> > 

> > +#define atomic_cmpxchg_release(v, o, n) \

> > +	cmpxchg_release(&((v)->counter), (o), (n))

> > +#define atomic64_cmpxchg_release(v, o, n) \

> > +	cmpxchg_release(&((v)->counter), (o), (n))

> > 

> > It has passed a PowerPC cross-build test here, but no runtime tests.

> > 

> 

> Do you have the commit at any branch in tip tree? I could pull it and

> cross-build and check the assembly code of lib/atomic64_test.c, that way

> I could verify whether we mess something up.

> 

> > Does this patch look good to you?

> > 

> 

> Yep!


Great - I have pushed the commits out into the locking tree, they can be found in:

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core

The PowerPC preparatory commit from you is:

  0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs

Thanks,

	Ingo
Andrea Parri May 6, 2018, 2:12 p.m. | #20
Hi Ingo,

> From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001

> From: Ingo Molnar <mingo@kernel.org>

> Date: Sat, 5 May 2018 10:23:23 +0200

> Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

> 

> Before:

> 

>  #ifndef atomic_fetch_dec_relaxed

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>  # else

>  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>  #  define atomic_fetch_dec_release		atomic_fetch_dec

>  # endif

>  #else

>  # ifndef atomic_fetch_dec_acquire

>  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  # ifndef atomic_fetch_dec_release

>  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif

> 

> After:

> 

>  #ifndef atomic_fetch_dec_relaxed

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>  # else

>  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>  #  define atomic_fetch_dec_release		atomic_fetch_dec

>  # endif

>  #else

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif

> 

> The idea is that because we already group these APIs by certain defines

> such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary

> branches - we can do the same in the secondary branch as well.

> 

> ( Also remove some unnecessarily duplicate comments, as the API

>   group defines are now pretty much self-documenting. )

> 

> No change in functionality.

> 

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Linus Torvalds <torvalds@linux-foundation.org>

> Cc: Andrew Morton <akpm@linux-foundation.org>

> Cc: Thomas Gleixner <tglx@linutronix.de>

> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Cc: Will Deacon <will.deacon@arm.com>

> Cc: linux-kernel@vger.kernel.org

> Signed-off-by: Ingo Molnar <mingo@kernel.org>


This breaks compilation on RISC-V. (For some of its atomics, the arch
currently defines the _relaxed and the full variants and it relies on
the generic definitions for the _acquire and the _release variants.)

  Andrea


> ---

>  include/linux/atomic.h | 312 ++++++++++---------------------------------------

>  1 file changed, 62 insertions(+), 250 deletions(-)

> 

> diff --git a/include/linux/atomic.h b/include/linux/atomic.h

> index 67aaafba256b..352ecc72d7f5 100644

> --- a/include/linux/atomic.h

> +++ b/include/linux/atomic.h

> @@ -71,98 +71,66 @@

>  })

>  #endif

>  

> -/* atomic_add_return_relaxed() et al: */

> -

>  #ifndef atomic_add_return_relaxed

>  # define atomic_add_return_relaxed		atomic_add_return

>  # define atomic_add_return_acquire		atomic_add_return

>  # define atomic_add_return_release		atomic_add_return

>  #else

> -# ifndef atomic_add_return_acquire

> -#  define atomic_add_return_acquire(...)	__atomic_op_acquire(atomic_add_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic_add_return_release

> -#  define atomic_add_return_release(...)	__atomic_op_release(atomic_add_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic_add_return

>  #  define atomic_add_return(...)		__atomic_op_fence(atomic_add_return, __VA_ARGS__)

> +#  define atomic_add_return_acquire(...)	__atomic_op_acquire(atomic_add_return, __VA_ARGS__)

> +#  define atomic_add_return_release(...)	__atomic_op_release(atomic_add_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_inc_return_relaxed() et al: */

> -

>  #ifndef atomic_inc_return_relaxed

>  # define atomic_inc_return_relaxed		atomic_inc_return

>  # define atomic_inc_return_acquire		atomic_inc_return

>  # define atomic_inc_return_release		atomic_inc_return

>  #else

> -# ifndef atomic_inc_return_acquire

> -#  define atomic_inc_return_acquire(...)	__atomic_op_acquire(atomic_inc_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic_inc_return_release

> -#  define atomic_inc_return_release(...)	__atomic_op_release(atomic_inc_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic_inc_return

>  #  define atomic_inc_return(...)		__atomic_op_fence(atomic_inc_return, __VA_ARGS__)

> +#  define atomic_inc_return_acquire(...)	__atomic_op_acquire(atomic_inc_return, __VA_ARGS__)

> +#  define atomic_inc_return_release(...)	__atomic_op_release(atomic_inc_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_sub_return_relaxed() et al: */

> -

>  #ifndef atomic_sub_return_relaxed

>  # define atomic_sub_return_relaxed		atomic_sub_return

>  # define atomic_sub_return_acquire		atomic_sub_return

>  # define atomic_sub_return_release		atomic_sub_return

>  #else

> -# ifndef atomic_sub_return_acquire

> -#  define atomic_sub_return_acquire(...)	__atomic_op_acquire(atomic_sub_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic_sub_return_release

> -#  define atomic_sub_return_release(...)	__atomic_op_release(atomic_sub_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic_sub_return

>  #  define atomic_sub_return(...)		__atomic_op_fence(atomic_sub_return, __VA_ARGS__)

> +#  define atomic_sub_return_acquire(...)	__atomic_op_acquire(atomic_sub_return, __VA_ARGS__)

> +#  define atomic_sub_return_release(...)	__atomic_op_release(atomic_sub_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_dec_return_relaxed() et al: */

> -

>  #ifndef atomic_dec_return_relaxed

>  # define atomic_dec_return_relaxed		atomic_dec_return

>  # define atomic_dec_return_acquire		atomic_dec_return

>  # define atomic_dec_return_release		atomic_dec_return

>  #else

> -# ifndef atomic_dec_return_acquire

> -#  define atomic_dec_return_acquire(...)	__atomic_op_acquire(atomic_dec_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic_dec_return_release

> -#  define atomic_dec_return_release(...)	__atomic_op_release(atomic_dec_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic_dec_return

>  #  define atomic_dec_return(...)		__atomic_op_fence(atomic_dec_return, __VA_ARGS__)

> +#  define atomic_dec_return_acquire(...)	__atomic_op_acquire(atomic_dec_return, __VA_ARGS__)

> +#  define atomic_dec_return_release(...)	__atomic_op_release(atomic_dec_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_add_relaxed() et al: */

> -

>  #ifndef atomic_fetch_add_relaxed

>  # define atomic_fetch_add_relaxed		atomic_fetch_add

>  # define atomic_fetch_add_acquire		atomic_fetch_add

>  # define atomic_fetch_add_release		atomic_fetch_add

>  #else

> -# ifndef atomic_fetch_add_acquire

> -#  define atomic_fetch_add_acquire(...)		__atomic_op_acquire(atomic_fetch_add, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_add_release

> -#  define atomic_fetch_add_release(...)		__atomic_op_release(atomic_fetch_add, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_add

>  #  define atomic_fetch_add(...)			__atomic_op_fence(atomic_fetch_add, __VA_ARGS__)

> +#  define atomic_fetch_add_acquire(...)		__atomic_op_acquire(atomic_fetch_add, __VA_ARGS__)

> +#  define atomic_fetch_add_release(...)		__atomic_op_release(atomic_fetch_add, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_inc_relaxed() et al: */

> -

>  #ifndef atomic_fetch_inc_relaxed

>  # ifndef atomic_fetch_inc

>  #  define atomic_fetch_inc(v)			atomic_fetch_add(1, (v))

> @@ -175,37 +143,25 @@

>  #  define atomic_fetch_inc_release		atomic_fetch_inc

>  # endif

>  #else

> -# ifndef atomic_fetch_inc_acquire

> -#  define atomic_fetch_inc_acquire(...)		__atomic_op_acquire(atomic_fetch_inc, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_inc_release

> -#  define atomic_fetch_inc_release(...)		__atomic_op_release(atomic_fetch_inc, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_inc

>  #  define atomic_fetch_inc(...)			__atomic_op_fence(atomic_fetch_inc, __VA_ARGS__)

> +#  define atomic_fetch_inc_acquire(...)		__atomic_op_acquire(atomic_fetch_inc, __VA_ARGS__)

> +#  define atomic_fetch_inc_release(...)		__atomic_op_release(atomic_fetch_inc, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_sub_relaxed() et al: */

> -

>  #ifndef atomic_fetch_sub_relaxed

>  # define atomic_fetch_sub_relaxed		atomic_fetch_sub

>  # define atomic_fetch_sub_acquire		atomic_fetch_sub

>  # define atomic_fetch_sub_release		atomic_fetch_sub

>  #else

> -# ifndef atomic_fetch_sub_acquire

> -#  define atomic_fetch_sub_acquire(...)		__atomic_op_acquire(atomic_fetch_sub, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_sub_release

> -#  define atomic_fetch_sub_release(...)		__atomic_op_release(atomic_fetch_sub, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_sub

>  #  define atomic_fetch_sub(...)			__atomic_op_fence(atomic_fetch_sub, __VA_ARGS__)

> +#  define atomic_fetch_sub_acquire(...)		__atomic_op_acquire(atomic_fetch_sub, __VA_ARGS__)

> +#  define atomic_fetch_sub_release(...)		__atomic_op_release(atomic_fetch_sub, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_dec_relaxed() et al: */

> -

>  #ifndef atomic_fetch_dec_relaxed

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> @@ -218,127 +174,86 @@

>  #  define atomic_fetch_dec_release		atomic_fetch_dec

>  # endif

>  #else

> -# ifndef atomic_fetch_dec_acquire

> -#  define atomic_fetch_dec_acquire(...)		__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_dec_release

> -#  define atomic_fetch_dec_release(...)		__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_dec

>  #  define atomic_fetch_dec(...)			__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> +#  define atomic_fetch_dec_acquire(...)		__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> +#  define atomic_fetch_dec_release(...)		__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_or_relaxed() et al: */

> -

>  #ifndef atomic_fetch_or_relaxed

>  # define atomic_fetch_or_relaxed		atomic_fetch_or

>  # define atomic_fetch_or_acquire		atomic_fetch_or

>  # define atomic_fetch_or_release		atomic_fetch_or

>  #else

> -# ifndef atomic_fetch_or_acquire

> -#  define atomic_fetch_or_acquire(...)		__atomic_op_acquire(atomic_fetch_or, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_or_release

> -#  define atomic_fetch_or_release(...)		__atomic_op_release(atomic_fetch_or, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_or

>  #  define atomic_fetch_or(...)			__atomic_op_fence(atomic_fetch_or, __VA_ARGS__)

> +#  define atomic_fetch_or_acquire(...)		__atomic_op_acquire(atomic_fetch_or, __VA_ARGS__)

> +#  define atomic_fetch_or_release(...)		__atomic_op_release(atomic_fetch_or, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_fetch_and_relaxed() et al: */

> -

>  #ifndef atomic_fetch_and_relaxed

>  # define atomic_fetch_and_relaxed		atomic_fetch_and

>  # define atomic_fetch_and_acquire		atomic_fetch_and

>  # define atomic_fetch_and_release		atomic_fetch_and

>  #else

> -# ifndef atomic_fetch_and_acquire

> -#  define atomic_fetch_and_acquire(...)		__atomic_op_acquire(atomic_fetch_and, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_and_release

> -#  define atomic_fetch_and_release(...)		__atomic_op_release(atomic_fetch_and, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_and

>  #  define atomic_fetch_and(...)			__atomic_op_fence(atomic_fetch_and, __VA_ARGS__)

> +#  define atomic_fetch_and_acquire(...)		__atomic_op_acquire(atomic_fetch_and, __VA_ARGS__)

> +#  define atomic_fetch_and_release(...)		__atomic_op_release(atomic_fetch_and, __VA_ARGS__)

>  # endif

>  #endif

>  

>  #ifdef atomic_andnot

>  

> -/* atomic_fetch_andnot_relaxed() et al: */

> -

>  #ifndef atomic_fetch_andnot_relaxed

>  # define atomic_fetch_andnot_relaxed		atomic_fetch_andnot

>  # define atomic_fetch_andnot_acquire		atomic_fetch_andnot

>  # define atomic_fetch_andnot_release		atomic_fetch_andnot

>  #else

> -# ifndef atomic_fetch_andnot_acquire

> -#  define atomic_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic_fetch_andnot, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_andnot_release

> -#  define atomic_fetch_andnot_release(...)	__atomic_op_release(atomic_fetch_andnot, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_andnot

>  #  define atomic_fetch_andnot(...)		__atomic_op_fence(atomic_fetch_andnot, __VA_ARGS__)

> +#  define atomic_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic_fetch_andnot, __VA_ARGS__)

> +#  define atomic_fetch_andnot_release(...)	__atomic_op_release(atomic_fetch_andnot, __VA_ARGS__)

>  # endif

>  #endif

>  

>  #endif /* atomic_andnot */

>  

> -/* atomic_fetch_xor_relaxed() et al: */

> -

>  #ifndef atomic_fetch_xor_relaxed

>  # define atomic_fetch_xor_relaxed		atomic_fetch_xor

>  # define atomic_fetch_xor_acquire		atomic_fetch_xor

>  # define atomic_fetch_xor_release		atomic_fetch_xor

>  #else

> -# ifndef atomic_fetch_xor_acquire

> -#  define atomic_fetch_xor_acquire(...)		__atomic_op_acquire(atomic_fetch_xor, __VA_ARGS__)

> -# endif

> -# ifndef atomic_fetch_xor_release

> -#  define atomic_fetch_xor_release(...)		__atomic_op_release(atomic_fetch_xor, __VA_ARGS__)

> -# endif

>  # ifndef atomic_fetch_xor

>  #  define atomic_fetch_xor(...)			__atomic_op_fence(atomic_fetch_xor, __VA_ARGS__)

> +#  define atomic_fetch_xor_acquire(...)		__atomic_op_acquire(atomic_fetch_xor, __VA_ARGS__)

> +#  define atomic_fetch_xor_release(...)		__atomic_op_release(atomic_fetch_xor, __VA_ARGS__)

>  # endif

>  #endif

>  

> -

> -/* atomic_xchg_relaxed() et al: */

> -

>  #ifndef atomic_xchg_relaxed

>  #define atomic_xchg_relaxed			atomic_xchg

>  #define atomic_xchg_acquire			atomic_xchg

>  #define atomic_xchg_release			atomic_xchg

>  #else

> -# ifndef atomic_xchg_acquire

> -#  define atomic_xchg_acquire(...)		__atomic_op_acquire(atomic_xchg, __VA_ARGS__)

> -# endif

> -# ifndef atomic_xchg_release

> -#  define atomic_xchg_release(...)		__atomic_op_release(atomic_xchg, __VA_ARGS__)

> -# endif

>  # ifndef atomic_xchg

>  #  define atomic_xchg(...)			__atomic_op_fence(atomic_xchg, __VA_ARGS__)

> +#  define atomic_xchg_acquire(...)		__atomic_op_acquire(atomic_xchg, __VA_ARGS__)

> +#  define atomic_xchg_release(...)		__atomic_op_release(atomic_xchg, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic_cmpxchg_relaxed() et al: */

> -

>  #ifndef atomic_cmpxchg_relaxed

>  # define atomic_cmpxchg_relaxed			atomic_cmpxchg

>  # define atomic_cmpxchg_acquire			atomic_cmpxchg

>  # define atomic_cmpxchg_release			atomic_cmpxchg

>  #else

> -# ifndef atomic_cmpxchg_acquire

> -#  define atomic_cmpxchg_acquire(...)		__atomic_op_acquire(atomic_cmpxchg, __VA_ARGS__)

> -# endif

> -# ifndef atomic_cmpxchg_release

> -#  define atomic_cmpxchg_release(...)		__atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

> -# endif

>  # ifndef atomic_cmpxchg

>  #  define atomic_cmpxchg(...)			__atomic_op_fence(atomic_cmpxchg, __VA_ARGS__)

> +#  define atomic_cmpxchg_acquire(...)		__atomic_op_acquire(atomic_cmpxchg, __VA_ARGS__)

> +#  define atomic_cmpxchg_release(...)		__atomic_op_release(atomic_cmpxchg, __VA_ARGS__)

>  # endif

>  #endif

>  

> @@ -362,57 +277,39 @@

>  # define atomic_try_cmpxchg_release		atomic_try_cmpxchg

>  #endif

>  

> -/* cmpxchg_relaxed() et al: */

> -

>  #ifndef cmpxchg_relaxed

>  # define cmpxchg_relaxed			cmpxchg

>  # define cmpxchg_acquire			cmpxchg

>  # define cmpxchg_release			cmpxchg

>  #else

> -# ifndef cmpxchg_acquire

> -#  define cmpxchg_acquire(...)			__atomic_op_acquire(cmpxchg, __VA_ARGS__)

> -# endif

> -# ifndef cmpxchg_release

> -#  define cmpxchg_release(...)			__atomic_op_release(cmpxchg, __VA_ARGS__)

> -# endif

>  # ifndef cmpxchg

>  #  define cmpxchg(...)				__atomic_op_fence(cmpxchg, __VA_ARGS__)

> +#  define cmpxchg_acquire(...)			__atomic_op_acquire(cmpxchg, __VA_ARGS__)

> +#  define cmpxchg_release(...)			__atomic_op_release(cmpxchg, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* cmpxchg64_relaxed() et al: */

> -

>  #ifndef cmpxchg64_relaxed

>  # define cmpxchg64_relaxed			cmpxchg64

>  # define cmpxchg64_acquire			cmpxchg64

>  # define cmpxchg64_release			cmpxchg64

>  #else

> -# ifndef cmpxchg64_acquire

> -#  define cmpxchg64_acquire(...)		__atomic_op_acquire(cmpxchg64, __VA_ARGS__)

> -# endif

> -# ifndef cmpxchg64_release

> -#  define cmpxchg64_release(...)		__atomic_op_release(cmpxchg64, __VA_ARGS__)

> -# endif

>  # ifndef cmpxchg64

>  #  define cmpxchg64(...)			__atomic_op_fence(cmpxchg64, __VA_ARGS__)

> +#  define cmpxchg64_acquire(...)		__atomic_op_acquire(cmpxchg64, __VA_ARGS__)

> +#  define cmpxchg64_release(...)		__atomic_op_release(cmpxchg64, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* xchg_relaxed() et al: */

> -

>  #ifndef xchg_relaxed

>  # define xchg_relaxed				xchg

>  # define xchg_acquire				xchg

>  # define xchg_release				xchg

>  #else

> -# ifndef xchg_acquire

> -#  define xchg_acquire(...)			__atomic_op_acquire(xchg, __VA_ARGS__)

> -# endif

> -# ifndef xchg_release

> -#  define xchg_release(...)			__atomic_op_release(xchg, __VA_ARGS__)

> -# endif

>  # ifndef xchg

>  #  define xchg(...)				__atomic_op_fence(xchg, __VA_ARGS__)

> +#  define xchg_acquire(...)			__atomic_op_acquire(xchg, __VA_ARGS__)

> +#  define xchg_release(...)			__atomic_op_release(xchg, __VA_ARGS__)

>  # endif

>  #endif

>  

> @@ -569,98 +466,66 @@ static inline int atomic_dec_if_positive(atomic_t *v)

>  # define atomic64_set_release(v, i)		smp_store_release(&(v)->counter, (i))

>  #endif

>  

> -/* atomic64_add_return_relaxed() et al: */

> -

>  #ifndef atomic64_add_return_relaxed

>  # define atomic64_add_return_relaxed		atomic64_add_return

>  # define atomic64_add_return_acquire		atomic64_add_return

>  # define atomic64_add_return_release		atomic64_add_return

>  #else

> -# ifndef atomic64_add_return_acquire

> -#  define atomic64_add_return_acquire(...)	__atomic_op_acquire(atomic64_add_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_add_return_release

> -#  define atomic64_add_return_release(...)	__atomic_op_release(atomic64_add_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_add_return

>  #  define atomic64_add_return(...)		__atomic_op_fence(atomic64_add_return, __VA_ARGS__)

> +#  define atomic64_add_return_acquire(...)	__atomic_op_acquire(atomic64_add_return, __VA_ARGS__)

> +#  define atomic64_add_return_release(...)	__atomic_op_release(atomic64_add_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_inc_return_relaxed() et al: */

> -

>  #ifndef atomic64_inc_return_relaxed

>  # define atomic64_inc_return_relaxed		atomic64_inc_return

>  # define atomic64_inc_return_acquire		atomic64_inc_return

>  # define atomic64_inc_return_release		atomic64_inc_return

>  #else

> -# ifndef atomic64_inc_return_acquire

> -#  define atomic64_inc_return_acquire(...)	__atomic_op_acquire(atomic64_inc_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_inc_return_release

> -#  define atomic64_inc_return_release(...)	__atomic_op_release(atomic64_inc_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_inc_return

>  #  define atomic64_inc_return(...)		__atomic_op_fence(atomic64_inc_return, __VA_ARGS__)

> +#  define atomic64_inc_return_acquire(...)	__atomic_op_acquire(atomic64_inc_return, __VA_ARGS__)

> +#  define atomic64_inc_return_release(...)	__atomic_op_release(atomic64_inc_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_sub_return_relaxed() et al: */

> -

>  #ifndef atomic64_sub_return_relaxed

>  # define atomic64_sub_return_relaxed		atomic64_sub_return

>  # define atomic64_sub_return_acquire		atomic64_sub_return

>  # define atomic64_sub_return_release		atomic64_sub_return

>  #else

> -# ifndef atomic64_sub_return_acquire

> -#  define atomic64_sub_return_acquire(...)	__atomic_op_acquire(atomic64_sub_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_sub_return_release

> -#  define atomic64_sub_return_release(...)	__atomic_op_release(atomic64_sub_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_sub_return

>  #  define atomic64_sub_return(...)		__atomic_op_fence(atomic64_sub_return, __VA_ARGS__)

> +#  define atomic64_sub_return_acquire(...)	__atomic_op_acquire(atomic64_sub_return, __VA_ARGS__)

> +#  define atomic64_sub_return_release(...)	__atomic_op_release(atomic64_sub_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_dec_return_relaxed() et al: */

> -

>  #ifndef atomic64_dec_return_relaxed

>  # define atomic64_dec_return_relaxed		atomic64_dec_return

>  # define atomic64_dec_return_acquire		atomic64_dec_return

>  # define atomic64_dec_return_release		atomic64_dec_return

>  #else

> -# ifndef atomic64_dec_return_acquire

> -#  define atomic64_dec_return_acquire(...)	__atomic_op_acquire(atomic64_dec_return, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_dec_return_release

> -#  define atomic64_dec_return_release(...)	__atomic_op_release(atomic64_dec_return, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_dec_return

>  #  define atomic64_dec_return(...)		__atomic_op_fence(atomic64_dec_return, __VA_ARGS__)

> +#  define atomic64_dec_return_acquire(...)	__atomic_op_acquire(atomic64_dec_return, __VA_ARGS__)

> +#  define atomic64_dec_return_release(...)	__atomic_op_release(atomic64_dec_return, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_fetch_add_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_add_relaxed

>  # define atomic64_fetch_add_relaxed		atomic64_fetch_add

>  # define atomic64_fetch_add_acquire		atomic64_fetch_add

>  # define atomic64_fetch_add_release		atomic64_fetch_add

>  #else

> -# ifndef atomic64_fetch_add_acquire

> -#  define atomic64_fetch_add_acquire(...)	__atomic_op_acquire(atomic64_fetch_add, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_add_release

> -#  define atomic64_fetch_add_release(...)	__atomic_op_release(atomic64_fetch_add, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_add

>  #  define atomic64_fetch_add(...)		__atomic_op_fence(atomic64_fetch_add, __VA_ARGS__)

> +#  define atomic64_fetch_add_acquire(...)	__atomic_op_acquire(atomic64_fetch_add, __VA_ARGS__)

> +#  define atomic64_fetch_add_release(...)	__atomic_op_release(atomic64_fetch_add, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_fetch_inc_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_inc_relaxed

>  # ifndef atomic64_fetch_inc

>  #  define atomic64_fetch_inc(v)			atomic64_fetch_add(1, (v))

> @@ -673,37 +538,25 @@ static inline int atomic_dec_if_positive(atomic_t *v)

>  #  define atomic64_fetch_inc_release		atomic64_fetch_inc

>  # endif

>  #else

> -# ifndef atomic64_fetch_inc_acquire

> -#  define atomic64_fetch_inc_acquire(...)	__atomic_op_acquire(atomic64_fetch_inc, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_inc_release

> -#  define atomic64_fetch_inc_release(...)	__atomic_op_release(atomic64_fetch_inc, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_inc

>  #  define atomic64_fetch_inc(...)		__atomic_op_fence(atomic64_fetch_inc, __VA_ARGS__)

> +#  define atomic64_fetch_inc_acquire(...)	__atomic_op_acquire(atomic64_fetch_inc, __VA_ARGS__)

> +#  define atomic64_fetch_inc_release(...)	__atomic_op_release(atomic64_fetch_inc, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_fetch_sub_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_sub_relaxed

>  # define atomic64_fetch_sub_relaxed		atomic64_fetch_sub

>  # define atomic64_fetch_sub_acquire		atomic64_fetch_sub

>  # define atomic64_fetch_sub_release		atomic64_fetch_sub

>  #else

> -# ifndef atomic64_fetch_sub_acquire

> -#  define atomic64_fetch_sub_acquire(...)	__atomic_op_acquire(atomic64_fetch_sub, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_sub_release

> -#  define atomic64_fetch_sub_release(...)	__atomic_op_release(atomic64_fetch_sub, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_sub

>  #  define atomic64_fetch_sub(...)		__atomic_op_fence(atomic64_fetch_sub, __VA_ARGS__)

> +#  define atomic64_fetch_sub_acquire(...)	__atomic_op_acquire(atomic64_fetch_sub, __VA_ARGS__)

> +#  define atomic64_fetch_sub_release(...)	__atomic_op_release(atomic64_fetch_sub, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_fetch_dec_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_dec_relaxed

>  # ifndef atomic64_fetch_dec

>  #  define atomic64_fetch_dec(v)			atomic64_fetch_sub(1, (v))

> @@ -716,127 +569,86 @@ static inline int atomic_dec_if_positive(atomic_t *v)

>  #  define atomic64_fetch_dec_release		atomic64_fetch_dec

>  # endif

>  #else

> -# ifndef atomic64_fetch_dec_acquire

> -#  define atomic64_fetch_dec_acquire(...)	__atomic_op_acquire(atomic64_fetch_dec, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_dec_release

> -#  define atomic64_fetch_dec_release(...)	__atomic_op_release(atomic64_fetch_dec, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_dec

>  #  define atomic64_fetch_dec(...)		__atomic_op_fence(atomic64_fetch_dec, __VA_ARGS__)

> +#  define atomic64_fetch_dec_acquire(...)	__atomic_op_acquire(atomic64_fetch_dec, __VA_ARGS__)

> +#  define atomic64_fetch_dec_release(...)	__atomic_op_release(atomic64_fetch_dec, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_fetch_or_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_or_relaxed

>  # define atomic64_fetch_or_relaxed		atomic64_fetch_or

>  # define atomic64_fetch_or_acquire		atomic64_fetch_or

>  # define atomic64_fetch_or_release		atomic64_fetch_or

>  #else

> -# ifndef atomic64_fetch_or_acquire

> -#  define atomic64_fetch_or_acquire(...)	__atomic_op_acquire(atomic64_fetch_or, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_or_release

> -#  define atomic64_fetch_or_release(...)	__atomic_op_release(atomic64_fetch_or, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_or

>  #  define atomic64_fetch_or(...)		__atomic_op_fence(atomic64_fetch_or, __VA_ARGS__)

> +#  define atomic64_fetch_or_acquire(...)	__atomic_op_acquire(atomic64_fetch_or, __VA_ARGS__)

> +#  define atomic64_fetch_or_release(...)	__atomic_op_release(atomic64_fetch_or, __VA_ARGS__)

>  # endif

>  #endif

>  

> -

> -/* atomic64_fetch_and_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_and_relaxed

>  # define atomic64_fetch_and_relaxed		atomic64_fetch_and

>  # define atomic64_fetch_and_acquire		atomic64_fetch_and

>  # define atomic64_fetch_and_release		atomic64_fetch_and

>  #else

> -# ifndef atomic64_fetch_and_acquire

> -#  define atomic64_fetch_and_acquire(...)	__atomic_op_acquire(atomic64_fetch_and, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_and_release

> -#  define atomic64_fetch_and_release(...)	__atomic_op_release(atomic64_fetch_and, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_and

>  #  define atomic64_fetch_and(...)		__atomic_op_fence(atomic64_fetch_and, __VA_ARGS__)

> +#  define atomic64_fetch_and_acquire(...)	__atomic_op_acquire(atomic64_fetch_and, __VA_ARGS__)

> +#  define atomic64_fetch_and_release(...)	__atomic_op_release(atomic64_fetch_and, __VA_ARGS__)

>  # endif

>  #endif

>  

>  #ifdef atomic64_andnot

>  

> -/* atomic64_fetch_andnot_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_andnot_relaxed

>  # define atomic64_fetch_andnot_relaxed		atomic64_fetch_andnot

>  # define atomic64_fetch_andnot_acquire		atomic64_fetch_andnot

>  # define atomic64_fetch_andnot_release		atomic64_fetch_andnot

>  #else

> -# ifndef atomic64_fetch_andnot_acquire

> -#  define atomic64_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic64_fetch_andnot, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_andnot_release

> -#  define atomic64_fetch_andnot_release(...)	__atomic_op_release(atomic64_fetch_andnot, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_andnot

>  #  define atomic64_fetch_andnot(...)		__atomic_op_fence(atomic64_fetch_andnot, __VA_ARGS__)

> +#  define atomic64_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic64_fetch_andnot, __VA_ARGS__)

> +#  define atomic64_fetch_andnot_release(...)	__atomic_op_release(atomic64_fetch_andnot, __VA_ARGS__)

>  # endif

>  #endif

>  

>  #endif /* atomic64_andnot */

>  

> -/* atomic64_fetch_xor_relaxed() et al: */

> -

>  #ifndef atomic64_fetch_xor_relaxed

>  # define atomic64_fetch_xor_relaxed		atomic64_fetch_xor

>  # define atomic64_fetch_xor_acquire		atomic64_fetch_xor

>  # define atomic64_fetch_xor_release		atomic64_fetch_xor

>  #else

> -# ifndef atomic64_fetch_xor_acquire

> -#  define atomic64_fetch_xor_acquire(...)	__atomic_op_acquire(atomic64_fetch_xor, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_fetch_xor_release

> -#  define atomic64_fetch_xor_release(...)	__atomic_op_release(atomic64_fetch_xor, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_fetch_xor

>  #  define atomic64_fetch_xor(...)		__atomic_op_fence(atomic64_fetch_xor, __VA_ARGS__)

> +#  define atomic64_fetch_xor_acquire(...)	__atomic_op_acquire(atomic64_fetch_xor, __VA_ARGS__)

> +#  define atomic64_fetch_xor_release(...)	__atomic_op_release(atomic64_fetch_xor, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_xchg_relaxed() et al: */

> -

>  #ifndef atomic64_xchg_relaxed

>  # define atomic64_xchg_relaxed			atomic64_xchg

>  # define atomic64_xchg_acquire			atomic64_xchg

>  # define atomic64_xchg_release			atomic64_xchg

>  #else

> -# ifndef atomic64_xchg_acquire

> -#  define atomic64_xchg_acquire(...)		__atomic_op_acquire(atomic64_xchg, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_xchg_release

> -#  define atomic64_xchg_release(...)		__atomic_op_release(atomic64_xchg, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_xchg

>  #  define atomic64_xchg(...)			__atomic_op_fence(atomic64_xchg, __VA_ARGS__)

> +#  define atomic64_xchg_acquire(...)		__atomic_op_acquire(atomic64_xchg, __VA_ARGS__)

> +#  define atomic64_xchg_release(...)		__atomic_op_release(atomic64_xchg, __VA_ARGS__)

>  # endif

>  #endif

>  

> -/* atomic64_cmpxchg_relaxed() et al: */

> -

>  #ifndef atomic64_cmpxchg_relaxed

>  # define atomic64_cmpxchg_relaxed		atomic64_cmpxchg

>  # define atomic64_cmpxchg_acquire		atomic64_cmpxchg

>  # define atomic64_cmpxchg_release		atomic64_cmpxchg

>  #else

> -# ifndef atomic64_cmpxchg_acquire

> -#  define atomic64_cmpxchg_acquire(...)		__atomic_op_acquire(atomic64_cmpxchg, __VA_ARGS__)

> -# endif

> -# ifndef atomic64_cmpxchg_release

> -#  define atomic64_cmpxchg_release(...)		__atomic_op_release(atomic64_cmpxchg, __VA_ARGS__)

> -# endif

>  # ifndef atomic64_cmpxchg

>  #  define atomic64_cmpxchg(...)			__atomic_op_fence(atomic64_cmpxchg, __VA_ARGS__)

> +#  define atomic64_cmpxchg_acquire(...)		__atomic_op_acquire(atomic64_cmpxchg, __VA_ARGS__)

> +#  define atomic64_cmpxchg_release(...)		__atomic_op_release(atomic64_cmpxchg, __VA_ARGS__)

>  # endif

>  #endif

>
Andrea Parri May 6, 2018, 2:15 p.m. | #21
Hi Ingo,

> From f5efafa83af8c46b9e81b010b46caeeadb450179 Mon Sep 17 00:00:00 2001

> From: Ingo Molnar <mingo@kernel.org>

> Date: Sat, 5 May 2018 10:46:41 +0200

> Subject: [PATCH] locking/atomics: Combine the atomic_andnot() and atomic64_andnot() API definitions

> 

> The atomic_andnot() and atomic64_andnot() are defined in 4 separate groups

> spred out in the atomic.h header:

> 

>  #ifdef atomic_andnot

>  ...

>  #endif /* atomic_andnot */

>  ...

>  #ifndef atomic_andnot

>  ...

>  #endif

>  ...

>  #ifdef atomic64_andnot

>  ...

>  #endif /* atomic64_andnot */

>  ...

>  #ifndef atomic64_andnot

>  ...

>  #endif

> 

> Combine them into unify them into two groups:


Nit: "Combine them into unify them into"

  Andrea


> 

>  #ifdef atomic_andnot

>  #else

>  #endif

> 

>  ...

> 

>  #ifdef atomic64_andnot

>  #else

>  #endif

> 

> So that one API group is defined in a single place within the header.

> 

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Linus Torvalds <torvalds@linux-foundation.org>

> Cc: Andrew Morton <akpm@linux-foundation.org>

> Cc: Thomas Gleixner <tglx@linutronix.de>

> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Cc: Will Deacon <will.deacon@arm.com>

> Cc: linux-kernel@vger.kernel.org

> Signed-off-by: Ingo Molnar <mingo@kernel.org>

> ---

>  include/linux/atomic.h | 72 +++++++++++++++++++++++++-------------------------

>  1 file changed, 36 insertions(+), 36 deletions(-)

> 

> diff --git a/include/linux/atomic.h b/include/linux/atomic.h

> index 352ecc72d7f5..1176cf7c6f03 100644

> --- a/include/linux/atomic.h

> +++ b/include/linux/atomic.h

> @@ -205,22 +205,6 @@

>  # endif

>  #endif

>  

> -#ifdef atomic_andnot

> -

> -#ifndef atomic_fetch_andnot_relaxed

> -# define atomic_fetch_andnot_relaxed		atomic_fetch_andnot

> -# define atomic_fetch_andnot_acquire		atomic_fetch_andnot

> -# define atomic_fetch_andnot_release		atomic_fetch_andnot

> -#else

> -# ifndef atomic_fetch_andnot

> -#  define atomic_fetch_andnot(...)		__atomic_op_fence(atomic_fetch_andnot, __VA_ARGS__)

> -#  define atomic_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic_fetch_andnot, __VA_ARGS__)

> -#  define atomic_fetch_andnot_release(...)	__atomic_op_release(atomic_fetch_andnot, __VA_ARGS__)

> -# endif

> -#endif

> -

> -#endif /* atomic_andnot */

> -

>  #ifndef atomic_fetch_xor_relaxed

>  # define atomic_fetch_xor_relaxed		atomic_fetch_xor

>  # define atomic_fetch_xor_acquire		atomic_fetch_xor

> @@ -338,7 +322,22 @@ static inline int atomic_add_unless(atomic_t *v, int a, int u)

>  # define atomic_inc_not_zero(v)			atomic_add_unless((v), 1, 0)

>  #endif

>  

> -#ifndef atomic_andnot

> +#ifdef atomic_andnot

> +

> +#ifndef atomic_fetch_andnot_relaxed

> +# define atomic_fetch_andnot_relaxed		atomic_fetch_andnot

> +# define atomic_fetch_andnot_acquire		atomic_fetch_andnot

> +# define atomic_fetch_andnot_release		atomic_fetch_andnot

> +#else

> +# ifndef atomic_fetch_andnot

> +#  define atomic_fetch_andnot(...)		__atomic_op_fence(atomic_fetch_andnot, __VA_ARGS__)

> +#  define atomic_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic_fetch_andnot, __VA_ARGS__)

> +#  define atomic_fetch_andnot_release(...)	__atomic_op_release(atomic_fetch_andnot, __VA_ARGS__)

> +# endif

> +#endif

> +

> +#else /* !atomic_andnot: */

> +

>  static inline void atomic_andnot(int i, atomic_t *v)

>  {

>  	atomic_and(~i, v);

> @@ -363,7 +362,8 @@ static inline int atomic_fetch_andnot_release(int i, atomic_t *v)

>  {

>  	return atomic_fetch_and_release(~i, v);

>  }

> -#endif

> +

> +#endif /* !atomic_andnot */

>  

>  /**

>   * atomic_inc_not_zero_hint - increment if not null

> @@ -600,22 +600,6 @@ static inline int atomic_dec_if_positive(atomic_t *v)

>  # endif

>  #endif

>  

> -#ifdef atomic64_andnot

> -

> -#ifndef atomic64_fetch_andnot_relaxed

> -# define atomic64_fetch_andnot_relaxed		atomic64_fetch_andnot

> -# define atomic64_fetch_andnot_acquire		atomic64_fetch_andnot

> -# define atomic64_fetch_andnot_release		atomic64_fetch_andnot

> -#else

> -# ifndef atomic64_fetch_andnot

> -#  define atomic64_fetch_andnot(...)		__atomic_op_fence(atomic64_fetch_andnot, __VA_ARGS__)

> -#  define atomic64_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic64_fetch_andnot, __VA_ARGS__)

> -#  define atomic64_fetch_andnot_release(...)	__atomic_op_release(atomic64_fetch_andnot, __VA_ARGS__)

> -# endif

> -#endif

> -

> -#endif /* atomic64_andnot */

> -

>  #ifndef atomic64_fetch_xor_relaxed

>  # define atomic64_fetch_xor_relaxed		atomic64_fetch_xor

>  # define atomic64_fetch_xor_acquire		atomic64_fetch_xor

> @@ -672,7 +656,22 @@ static inline int atomic_dec_if_positive(atomic_t *v)

>  # define atomic64_try_cmpxchg_release		atomic64_try_cmpxchg

>  #endif

>  

> -#ifndef atomic64_andnot

> +#ifdef atomic64_andnot

> +

> +#ifndef atomic64_fetch_andnot_relaxed

> +# define atomic64_fetch_andnot_relaxed		atomic64_fetch_andnot

> +# define atomic64_fetch_andnot_acquire		atomic64_fetch_andnot

> +# define atomic64_fetch_andnot_release		atomic64_fetch_andnot

> +#else

> +# ifndef atomic64_fetch_andnot

> +#  define atomic64_fetch_andnot(...)		__atomic_op_fence(atomic64_fetch_andnot, __VA_ARGS__)

> +#  define atomic64_fetch_andnot_acquire(...)	__atomic_op_acquire(atomic64_fetch_andnot, __VA_ARGS__)

> +#  define atomic64_fetch_andnot_release(...)	__atomic_op_release(atomic64_fetch_andnot, __VA_ARGS__)

> +# endif

> +#endif

> +

> +#else /* !atomic64_andnot: */

> +

>  static inline void atomic64_andnot(long long i, atomic64_t *v)

>  {

>  	atomic64_and(~i, v);

> @@ -697,7 +696,8 @@ static inline long long atomic64_fetch_andnot_release(long long i, atomic64_t *v

>  {

>  	return atomic64_fetch_and_release(~i, v);

>  }

> -#endif

> +

> +#endif /* !atomic64_andnot */

>  

>  #define atomic64_cond_read_relaxed(v, c)	smp_cond_load_relaxed(&(v)->counter, (c))

>  #define atomic64_cond_read_acquire(v, c)	smp_cond_load_acquire(&(v)->counter, (c))
Ingo Molnar May 6, 2018, 2:57 p.m. | #22
* Andrea Parri <andrea.parri@amarulasolutions.com> wrote:

> Hi Ingo,

> 

> > From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001

> > From: Ingo Molnar <mingo@kernel.org>

> > Date: Sat, 5 May 2018 10:23:23 +0200

> > Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

> > 

> > Before:

> > 

> >  #ifndef atomic_fetch_dec_relaxed

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

> >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

> >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

> >  # else

> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> >  # endif

> >  #else

> >  # ifndef atomic_fetch_dec_acquire

> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  # ifndef atomic_fetch_dec_release

> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  #endif

> > 

> > After:

> > 

> >  #ifndef atomic_fetch_dec_relaxed

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

> >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

> >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

> >  # else

> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> >  # endif

> >  #else

> >  # ifndef atomic_fetch_dec

> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> >  # endif

> >  #endif

> > 

> > The idea is that because we already group these APIs by certain defines

> > such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary

> > branches - we can do the same in the secondary branch as well.

> > 

> > ( Also remove some unnecessarily duplicate comments, as the API

> >   group defines are now pretty much self-documenting. )

> > 

> > No change in functionality.

> > 

> > Cc: Peter Zijlstra <peterz@infradead.org>

> > Cc: Linus Torvalds <torvalds@linux-foundation.org>

> > Cc: Andrew Morton <akpm@linux-foundation.org>

> > Cc: Thomas Gleixner <tglx@linutronix.de>

> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> > Cc: Will Deacon <will.deacon@arm.com>

> > Cc: linux-kernel@vger.kernel.org

> > Signed-off-by: Ingo Molnar <mingo@kernel.org>

> 

> This breaks compilation on RISC-V. (For some of its atomics, the arch

> currently defines the _relaxed and the full variants and it relies on

> the generic definitions for the _acquire and the _release variants.)


I don't have cross-compilation for RISC-V, which is a relatively new arch.
(Is there any RISC-V set of cross-compilation tools on kernel.org somewhere?)

Could you please send a patch that defines those variants against Linus's tree, 
like the PowerPC patch that does something similar:

  0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs

?

... and I'll integrate it into the proper place to make it all bisectable, etc.

Thanks,

	Ingo
Boqun Feng May 7, 2018, 1:04 a.m. | #23
On Sun, May 6, 2018, at 8:11 PM, Ingo Molnar wrote:
> 

> * Boqun Feng <boqun.feng@gmail.com> wrote:

> 

> > > The only change I made beyond a trivial build fix is that I also added the release 

> > > atomics variants explicitly:

> > > 

> > > +#define atomic_cmpxchg_release(v, o, n) \

> > > +	cmpxchg_release(&((v)->counter), (o), (n))

> > > +#define atomic64_cmpxchg_release(v, o, n) \

> > > +	cmpxchg_release(&((v)->counter), (o), (n))

> > > 

> > > It has passed a PowerPC cross-build test here, but no runtime tests.

> > > 

> > 

> > Do you have the commit at any branch in tip tree? I could pull it and

> > cross-build and check the assembly code of lib/atomic64_test.c, that way

> > I could verify whether we mess something up.

> > 

> > > Does this patch look good to you?

> > > 

> > 

> > Yep!

> 

> Great - I have pushed the commits out into the locking tree, they can be 

> found in:

> 

>   git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 

> locking/core

> 


Thanks! My compile test told me that we need to remove the definitions of 
atomic_xchg and atomic64_xchg in ppc's asm/atomic.h: they are now
duplicate, and will prevent the generation of _release and _acquire in the
new logic.

If you need a updated patch for this from me, I could send later today.
(I don't have a  handy environment for patch sending now, so...)

Other than this, the modification looks fine, the lib/atomic64_test.c
generated the same asm before and after the patches.

Regards,
Boqun

> The PowerPC preparatory commit from you is:

> 

>   0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to asm/

> cmpxchg.h and define the full set of cmpxchg APIs

> 

> Thanks,

> 

> 	Ingo
Ingo Molnar May 7, 2018, 6:50 a.m. | #24
* Boqun Feng <boqun.feng@gmail.com> wrote:

> 

> 

> On Sun, May 6, 2018, at 8:11 PM, Ingo Molnar wrote:

> > 

> > * Boqun Feng <boqun.feng@gmail.com> wrote:

> > 

> > > > The only change I made beyond a trivial build fix is that I also added the release 

> > > > atomics variants explicitly:

> > > > 

> > > > +#define atomic_cmpxchg_release(v, o, n) \

> > > > +	cmpxchg_release(&((v)->counter), (o), (n))

> > > > +#define atomic64_cmpxchg_release(v, o, n) \

> > > > +	cmpxchg_release(&((v)->counter), (o), (n))

> > > > 

> > > > It has passed a PowerPC cross-build test here, but no runtime tests.

> > > > 

> > > 

> > > Do you have the commit at any branch in tip tree? I could pull it and

> > > cross-build and check the assembly code of lib/atomic64_test.c, that way

> > > I could verify whether we mess something up.

> > > 

> > > > Does this patch look good to you?

> > > > 

> > > 

> > > Yep!

> > 

> > Great - I have pushed the commits out into the locking tree, they can be 

> > found in:

> > 

> >   git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 

> > locking/core

> > 

> 

> Thanks! My compile test told me that we need to remove the definitions of 

> atomic_xchg and atomic64_xchg in ppc's asm/atomic.h: they are now

> duplicate, and will prevent the generation of _release and _acquire in the

> new logic.

> 

> If you need a updated patch for this from me, I could send later today.

> (I don't have a  handy environment for patch sending now, so...)


That would be cool, thanks! My own cross-build testing didn't trigger that build 
failure.

> Other than this, the modification looks fine, the lib/atomic64_test.c

> generated the same asm before and after the patches.


Cool, thanks for checking!

Thanks,

	Ingo
Andrea Parri May 7, 2018, 9:54 a.m. | #25
On Sun, May 06, 2018 at 04:57:27PM +0200, Ingo Molnar wrote:
> 

> * Andrea Parri <andrea.parri@amarulasolutions.com> wrote:

> 

> > Hi Ingo,

> > 

> > > From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001

> > > From: Ingo Molnar <mingo@kernel.org>

> > > Date: Sat, 5 May 2018 10:23:23 +0200

> > > Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

> > > 

> > > Before:

> > > 

> > >  #ifndef atomic_fetch_dec_relaxed

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> > >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

> > >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

> > >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

> > >  # else

> > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > >  # endif

> > >  #else

> > >  # ifndef atomic_fetch_dec_acquire

> > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  # ifndef atomic_fetch_dec_release

> > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  #endif

> > > 

> > > After:

> > > 

> > >  #ifndef atomic_fetch_dec_relaxed

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

> > >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

> > >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

> > >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

> > >  # else

> > >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

> > >  #  define atomic_fetch_dec_release		atomic_fetch_dec

> > >  # endif

> > >  #else

> > >  # ifndef atomic_fetch_dec

> > >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

> > >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

> > >  # endif

> > >  #endif

> > > 

> > > The idea is that because we already group these APIs by certain defines

> > > such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary

> > > branches - we can do the same in the secondary branch as well.

> > > 

> > > ( Also remove some unnecessarily duplicate comments, as the API

> > >   group defines are now pretty much self-documenting. )

> > > 

> > > No change in functionality.

> > > 

> > > Cc: Peter Zijlstra <peterz@infradead.org>

> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>

> > > Cc: Andrew Morton <akpm@linux-foundation.org>

> > > Cc: Thomas Gleixner <tglx@linutronix.de>

> > > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> > > Cc: Will Deacon <will.deacon@arm.com>

> > > Cc: linux-kernel@vger.kernel.org

> > > Signed-off-by: Ingo Molnar <mingo@kernel.org>

> > 

> > This breaks compilation on RISC-V. (For some of its atomics, the arch

> > currently defines the _relaxed and the full variants and it relies on

> > the generic definitions for the _acquire and the _release variants.)

> 

> I don't have cross-compilation for RISC-V, which is a relatively new arch.

> (Is there any RISC-V set of cross-compilation tools on kernel.org somewhere?)


I'm using the toolchain from:

  https://riscv.org/software-tools/

(adding Palmer and Albert in Cc:)


> 

> Could you please send a patch that defines those variants against Linus's tree, 

> like the PowerPC patch that does something similar:

> 

>   0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs

> 

> ?


Yes, please see below for a first RFC.

(BTW, get_maintainer.pl says that that patch missed Benjamin, Paul, Michael
 and linuxppc-dev@lists.ozlabs.org: FWIW, I'm Cc-ing the maintainers here.)

  Andrea


From 411f05a44e0b53a435331b977ff864fba7501a95 Mon Sep 17 00:00:00 2001
From: Andrea Parri <andrea.parri@amarulasolutions.com>

Date: Mon, 7 May 2018 10:59:20 +0200
Subject: [RFC PATCH] riscv/atomic: Defines _acquire/_release variants

In preparation for Ingo's renovation of the generic atomic.h header [1],
define the _acquire/_release variants in the arch's header.

No change in code generation.

[1] http://lkml.kernel.org/r/20180505081100.nsyrqrpzq2vd27bk@gmail.com
    http://lkml.kernel.org/r/20180505083635.622xmcvb42dw5xxh@gmail.com

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>

Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-riscv@lists.infradead.org
---
 arch/riscv/include/asm/atomic.h | 88 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 855115ace98c8..7cbd8033dfb5d 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -153,22 +153,54 @@ ATOMIC_OPS(sub, add, +, -i)
 
 #define atomic_add_return_relaxed	atomic_add_return_relaxed
 #define atomic_sub_return_relaxed	atomic_sub_return_relaxed
+#define atomic_add_return_acquire(...)					\
+	__atomic_op_acquire(atomic_add_return, __VA_ARGS__)
+#define atomic_sub_return_acquire(...)					\
+	__atomic_op_acquire(atomic_sub_return, __VA_ARGS__)
+#define atomic_add_return_release(...)					\
+	__atomic_op_release(atomic_add_return, __VA_ARGS__)
+#define atomic_sub_return_release(...)					\
+	__atomic_op_release(atomic_sub_return, __VA_ARGS__)
 #define atomic_add_return		atomic_add_return
 #define atomic_sub_return		atomic_sub_return
 
 #define atomic_fetch_add_relaxed	atomic_fetch_add_relaxed
 #define atomic_fetch_sub_relaxed	atomic_fetch_sub_relaxed
+#define atomic_fetch_add_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_add, __VA_ARGS__)
+#define atomic_fetch_sub_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_sub, __VA_ARGS__)
+#define atomic_fetch_add_release(...)					\
+	__atomic_op_release(atomic_fetch_add, __VA_ARGS__)
+#define atomic_fetch_sub_release(...)					\
+	__atomic_op_release(atomic_fetch_sub, __VA_ARGS__)
 #define atomic_fetch_add		atomic_fetch_add
 #define atomic_fetch_sub		atomic_fetch_sub
 
 #ifndef CONFIG_GENERIC_ATOMIC64
 #define atomic64_add_return_relaxed	atomic64_add_return_relaxed
 #define atomic64_sub_return_relaxed	atomic64_sub_return_relaxed
+#define atomic64_add_return_acquire(...)				\
+	__atomic_op_acquire(atomic64_add_return, __VA_ARGS__)
+#define atomic64_sub_return_acquire(...)				\
+	__atomic_op_acquire(atomic64_sub_return, __VA_ARGS__)
+#define atomic64_add_return_release(...)				\
+	__atomic_op_release(atomic64_add_return, __VA_ARGS__)
+#define atomic64_sub_return_release(...)				\
+	__atomic_op_release(atomic64_sub_return, __VA_ARGS__)
 #define atomic64_add_return		atomic64_add_return
 #define atomic64_sub_return		atomic64_sub_return
 
 #define atomic64_fetch_add_relaxed	atomic64_fetch_add_relaxed
 #define atomic64_fetch_sub_relaxed	atomic64_fetch_sub_relaxed
+#define atomic64_fetch_add_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_add, __VA_ARGS__)
+#define atomic64_fetch_sub_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_sub, __VA_ARGS__)
+#define atomic64_fetch_add_release(...)					\
+	__atomic_op_release(atomic64_fetch_add, __VA_ARGS__)
+#define atomic64_fetch_sub_release(...)					\
+	__atomic_op_release(atomic64_fetch_sub, __VA_ARGS__)
 #define atomic64_fetch_add		atomic64_fetch_add
 #define atomic64_fetch_sub		atomic64_fetch_sub
 #endif
@@ -191,6 +223,18 @@ ATOMIC_OPS(xor, xor, i)
 #define atomic_fetch_and_relaxed	atomic_fetch_and_relaxed
 #define atomic_fetch_or_relaxed		atomic_fetch_or_relaxed
 #define atomic_fetch_xor_relaxed	atomic_fetch_xor_relaxed
+#define atomic_fetch_and_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_and, __VA_ARGS__)
+#define atomic_fetch_or_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_or, __VA_ARGS__)
+#define atomic_fetch_xor_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_xor, __VA_ARGS__)
+#define atomic_fetch_and_release(...)					\
+	__atomic_op_release(atomic_fetch_and, __VA_ARGS__)
+#define atomic_fetch_or_release(...)					\
+	__atomic_op_release(atomic_fetch_or, __VA_ARGS__)
+#define atomic_fetch_xor_release(...)					\
+	__atomic_op_release(atomic_fetch_xor, __VA_ARGS__)
 #define atomic_fetch_and		atomic_fetch_and
 #define atomic_fetch_or			atomic_fetch_or
 #define atomic_fetch_xor		atomic_fetch_xor
@@ -199,6 +243,18 @@ ATOMIC_OPS(xor, xor, i)
 #define atomic64_fetch_and_relaxed	atomic64_fetch_and_relaxed
 #define atomic64_fetch_or_relaxed	atomic64_fetch_or_relaxed
 #define atomic64_fetch_xor_relaxed	atomic64_fetch_xor_relaxed
+#define atomic64_fetch_and_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_and, __VA_ARGS__)
+#define atomic64_fetch_or_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_or, __VA_ARGS__)
+#define atomic64_fetch_xor_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_xor, __VA_ARGS__)
+#define atomic64_fetch_and_release(...)					\
+	__atomic_op_release(atomic64_fetch_and, __VA_ARGS__)
+#define atomic64_fetch_or_release(...)					\
+	__atomic_op_release(atomic64_fetch_or, __VA_ARGS__)
+#define atomic64_fetch_xor_release(...)					\
+	__atomic_op_release(atomic64_fetch_xor, __VA_ARGS__)
 #define atomic64_fetch_and		atomic64_fetch_and
 #define atomic64_fetch_or		atomic64_fetch_or
 #define atomic64_fetch_xor		atomic64_fetch_xor
@@ -290,22 +346,54 @@ ATOMIC_OPS(dec, add, +, -1)
 
 #define atomic_inc_return_relaxed	atomic_inc_return_relaxed
 #define atomic_dec_return_relaxed	atomic_dec_return_relaxed
+#define atomic_inc_return_acquire(...)					\
+	__atomic_op_acquire(atomic_inc_return, __VA_ARGS__)
+#define atomic_dec_return_acquire(...)					\
+	__atomic_op_acquire(atomic_dec_return, __VA_ARGS__)
+#define atomic_inc_return_release(...)					\
+	__atomic_op_release(atomic_inc_return, __VA_ARGS__)
+#define atomic_dec_return_release(...)					\
+	__atomic_op_release(atomic_dec_return, __VA_ARGS__)
 #define atomic_inc_return		atomic_inc_return
 #define atomic_dec_return		atomic_dec_return
 
 #define atomic_fetch_inc_relaxed	atomic_fetch_inc_relaxed
 #define atomic_fetch_dec_relaxed	atomic_fetch_dec_relaxed
+#define atomic_fetch_inc_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_inc, __VA_ARGS__)
+#define atomic_fetch_dec_acquire(...)					\
+	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
+#define atomic_fetch_inc_release(...)					\
+	__atomic_op_release(atomic_fetch_inc, __VA_ARGS__)
+#define atomic_fetch_dec_release(...)					\
+	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
 #define atomic_fetch_inc		atomic_fetch_inc
 #define atomic_fetch_dec		atomic_fetch_dec
 
 #ifndef CONFIG_GENERIC_ATOMIC64
 #define atomic64_inc_return_relaxed	atomic64_inc_return_relaxed
 #define atomic64_dec_return_relaxed	atomic64_dec_return_relaxed
+#define atomic64_inc_return_acquire(...)				\
+	__atomic_op_acquire(atomic64_inc_return, __VA_ARGS__)
+#define atomic64_dec_return_acquire(...)				\
+	__atomic_op_acquire(atomic64_dec_return, __VA_ARGS__)
+#define atomic64_inc_return_release(...)				\
+	__atomic_op_release(atomic64_inc_return, __VA_ARGS__)
+#define atomic64_dec_return_release(...)				\
+	__atomic_op_release(atomic64_dec_return, __VA_ARGS__)
 #define atomic64_inc_return		atomic64_inc_return
 #define atomic64_dec_return		atomic64_dec_return
 
 #define atomic64_fetch_inc_relaxed	atomic64_fetch_inc_relaxed
 #define atomic64_fetch_dec_relaxed	atomic64_fetch_dec_relaxed
+#define atomic64_fetch_inc_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_inc, __VA_ARGS__)
+#define atomic64_fetch_dec_acquire(...)					\
+	__atomic_op_acquire(atomic64_fetch_dec, __VA_ARGS__)
+#define atomic64_fetch_inc_release(...)					\
+	__atomic_op_release(atomic64_fetch_inc, __VA_ARGS__)
+#define atomic64_fetch_dec_release(...)					\
+	__atomic_op_release(atomic64_fetch_dec, __VA_ARGS__)
 #define atomic64_fetch_inc		atomic64_fetch_inc
 #define atomic64_fetch_dec		atomic64_fetch_dec
 #endif
-- 
2.7.4



> 

> ... and I'll integrate it into the proper place to make it all bisectable, etc.

> 

> Thanks,

> 

> 	Ingo
Palmer Dabbelt May 18, 2018, 6:43 p.m. | #26
On Sun, 06 May 2018 07:57:27 PDT (-0700), mingo@kernel.org wrote:
>

> * Andrea Parri <andrea.parri@amarulasolutions.com> wrote:

>

>> Hi Ingo,

>>

>> > From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001

>> > From: Ingo Molnar <mingo@kernel.org>

>> > Date: Sat, 5 May 2018 10:23:23 +0200

>> > Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

>> >

>> > Before:

>> >

>> >  #ifndef atomic_fetch_dec_relaxed

>> >  # ifndef atomic_fetch_dec

>> >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>> >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>> >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>> >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>> >  # else

>> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

>> >  # endif

>> >  #else

>> >  # ifndef atomic_fetch_dec_acquire

>> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>> >  # endif

>> >  # ifndef atomic_fetch_dec_release

>> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>> >  # endif

>> >  # ifndef atomic_fetch_dec

>> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>> >  # endif

>> >  #endif

>> >

>> > After:

>> >

>> >  #ifndef atomic_fetch_dec_relaxed

>> >  # ifndef atomic_fetch_dec

>> >  #  define atomic_fetch_dec(v)			atomic_fetch_sub(1, (v))

>> >  #  define atomic_fetch_dec_relaxed(v)		atomic_fetch_sub_relaxed(1, (v))

>> >  #  define atomic_fetch_dec_acquire(v)		atomic_fetch_sub_acquire(1, (v))

>> >  #  define atomic_fetch_dec_release(v)		atomic_fetch_sub_release(1, (v))

>> >  # else

>> >  #  define atomic_fetch_dec_relaxed		atomic_fetch_dec

>> >  #  define atomic_fetch_dec_acquire		atomic_fetch_dec

>> >  #  define atomic_fetch_dec_release		atomic_fetch_dec

>> >  # endif

>> >  #else

>> >  # ifndef atomic_fetch_dec

>> >  #  define atomic_fetch_dec(...)		__atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)

>> >  #  define atomic_fetch_dec_acquire(...)	__atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)

>> >  #  define atomic_fetch_dec_release(...)	__atomic_op_release(atomic_fetch_dec, __VA_ARGS__)

>> >  # endif

>> >  #endif

>> >

>> > The idea is that because we already group these APIs by certain defines

>> > such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary

>> > branches - we can do the same in the secondary branch as well.

>> >

>> > ( Also remove some unnecessarily duplicate comments, as the API

>> >   group defines are now pretty much self-documenting. )

>> >

>> > No change in functionality.

>> >

>> > Cc: Peter Zijlstra <peterz@infradead.org>

>> > Cc: Linus Torvalds <torvalds@linux-foundation.org>

>> > Cc: Andrew Morton <akpm@linux-foundation.org>

>> > Cc: Thomas Gleixner <tglx@linutronix.de>

>> > Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

>> > Cc: Will Deacon <will.deacon@arm.com>

>> > Cc: linux-kernel@vger.kernel.org

>> > Signed-off-by: Ingo Molnar <mingo@kernel.org>

>>

>> This breaks compilation on RISC-V. (For some of its atomics, the arch

>> currently defines the _relaxed and the full variants and it relies on

>> the generic definitions for the _acquire and the _release variants.)

>

> I don't have cross-compilation for RISC-V, which is a relatively new arch.

> (Is there any RISC-V set of cross-compilation tools on kernel.org somewhere?)


Arnd added RISC-V to the cross compiler list a month or two ago when he updated 
them all.  I use the "make.cross" script from the Intel test robot, which will 
fetch the cross compilers for you.  It looks like I made a Git Hub pull request 
to update the script for RISC-V, it fetches from kernel.org

    https://github.com/palmer-dabbelt/lkp-tests/blob/e14f4236ccd0572f4b87ffd480fecefee412dedc/sbin/make.cross
    http://cdn.kernel.org/pub/tools/crosstool/files/bin/
    http://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/7.3.0/x86_64-gcc-7.3.0-nolibc_riscv64-linux.tar.gz

> Could you please send a patch that defines those variants against Linus's tree,

> like the PowerPC patch that does something similar:

>

>   0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to asm/cmpxchg.h and define the full set of cmpxchg APIs

>

> ?

>

> ... and I'll integrate it into the proper place to make it all bisectable, etc.


Sorry, I got buried in email again.  Did this get merged, or is there a current 
version of the patch set I should look at?

Patch

diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index ec07f23678ea..26f0e3098442 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -40,171 +40,664 @@  static __always_inline void atomic64_set(atomic64_t *v, s64 i)
 	arch_atomic64_set(v, i);
 }
 
-static __always_inline int atomic_xchg(atomic_t *v, int i)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_xchg(v, i);
+#define INSTR_ATOMIC_XCHG(order)					\
+static __always_inline int						\
+atomic_xchg##order(atomic_t *v, int i)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_xchg##order(v, i);				\
 }
 
-static __always_inline s64 atomic64_xchg(atomic64_t *v, s64 i)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_xchg(v, i);
+INSTR_ATOMIC_XCHG()
+
+#ifdef arch_atomic_xchg_relaxed
+INSTR_ATOMIC_XCHG(_relaxed)
+#define atomic_xchg_relaxed atomic_xchg_relaxed
+#endif
+
+#ifdef arch_atomic_xchg_acquire
+INSTR_ATOMIC_XCHG(_acquire)
+#define atomic_xchg_acquire atomic_xchg_acquire
+#endif
+
+#ifdef arch_atomic_xchg_release
+INSTR_ATOMIC_XCHG(_release)
+#define atomic_xchg_release atomic_xchg_release
+#endif
+
+#define INSTR_ATOMIC64_XCHG(order)					\
+static __always_inline s64						\
+atomic64_xchg##order(atomic64_t *v, s64 i)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_xchg##order(v, i);				\
 }
 
-static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_cmpxchg(v, old, new);
+INSTR_ATOMIC64_XCHG()
+
+#ifdef arch_atomic64_xchg_relaxed
+INSTR_ATOMIC64_XCHG(_relaxed)
+#define atomic64_xchg_relaxed atomic64_xchg_relaxed
+#endif
+
+#ifdef arch_atomic64_xchg_acquire
+INSTR_ATOMIC64_XCHG(_acquire)
+#define atomic64_xchg_acquire atomic64_xchg_acquire
+#endif
+
+#ifdef arch_atomic64_xchg_release
+INSTR_ATOMIC64_XCHG(_release)
+#define atomic64_xchg_release atomic64_xchg_release
+#endif
+
+#define INSTR_ATOMIC_CMPXCHG(order)					\
+static __always_inline int						\
+atomic_cmpxchg##order(atomic_t *v, int old, int new)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_cmpxchg##order(v, old, new);			\
 }
 
-static __always_inline s64 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_cmpxchg(v, old, new);
+INSTR_ATOMIC_CMPXCHG()
+
+#ifdef arch_atomic_cmpxchg_relaxed
+INSTR_ATOMIC_CMPXCHG(_relaxed)
+#define atomic_cmpxchg_relaxed atomic_cmpxchg_relaxed
+#endif
+
+#ifdef arch_atomic_cmpxchg_acquire
+INSTR_ATOMIC_CMPXCHG(_acquire)
+#define atomic_cmpxchg_acquire atomic_cmpxchg_acquire
+#endif
+
+#ifdef arch_atomic_cmpxchg_release
+INSTR_ATOMIC_CMPXCHG(_release)
+#define atomic_cmpxchg_release atomic_cmpxchg_release
+#endif
+
+#define INSTR_ATOMIC64_CMPXCHG(order)					\
+static __always_inline s64						\
+atomic64_cmpxchg##order(atomic64_t *v, s64 old, s64 new)		\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_cmpxchg##order(v, old, new);		\
+}
+
+INSTR_ATOMIC64_CMPXCHG()
+
+#ifdef arch_atomic64_cmpxchg_relaxed
+INSTR_ATOMIC64_CMPXCHG(_relaxed)
+#define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed
+#endif
+
+#ifdef arch_atomic64_cmpxchg_acquire
+INSTR_ATOMIC64_CMPXCHG(_acquire)
+#define atomic64_cmpxchg_acquire atomic64_cmpxchg_acquire
+#endif
+
+#ifdef arch_atomic64_cmpxchg_release
+INSTR_ATOMIC64_CMPXCHG(_release)
+#define atomic64_cmpxchg_release atomic64_cmpxchg_release
+#endif
+
+#define INSTR_ATOMIC_TRY_CMPXCHG(order)					\
+static __always_inline bool						\
+atomic_try_cmpxchg##order(atomic_t *v, int *old, int new)		\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	kasan_check_read(old, sizeof(*old));				\
+	return arch_atomic_try_cmpxchg##order(v, old, new);		\
 }
 
 #ifdef arch_atomic_try_cmpxchg
+INSTR_ATOMIC_TRY_CMPXCHG()
 #define atomic_try_cmpxchg atomic_try_cmpxchg
-static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
-{
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_read(old, sizeof(*old));
-	return arch_atomic_try_cmpxchg(v, old, new);
-}
 #endif
 
-#ifdef arch_atomic64_try_cmpxchg
-#define atomic64_try_cmpxchg atomic64_try_cmpxchg
-static __always_inline bool atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
-{
-	kasan_check_write(v, sizeof(*v));
-	kasan_check_read(old, sizeof(*old));
-	return arch_atomic64_try_cmpxchg(v, old, new);
+#ifdef arch_atomic_try_cmpxchg_relaxed
+INSTR_ATOMIC_TRY_CMPXCHG(_relaxed)
+#define atomic_try_cmpxchg_relaxed atomic_try_cmpxchg_relaxed
+#endif
+
+#ifdef arch_atomic_try_cmpxchg_acquire
+INSTR_ATOMIC_TRY_CMPXCHG(_acquire)
+#define atomic_try_cmpxchg_acquire atomic_try_cmpxchg_acquire
+#endif
+
+#ifdef arch_atomic_try_cmpxchg_release
+INSTR_ATOMIC_TRY_CMPXCHG(_release)
+#define atomic_try_cmpxchg_release atomic_try_cmpxchg_release
+#endif
+
+#define INSTR_ATOMIC64_TRY_CMPXCHG(order)				\
+static __always_inline bool						\
+atomic64_try_cmpxchg##order(atomic64_t *v, s64 *old, s64 new)		\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	kasan_check_read(old, sizeof(*old));				\
+	return arch_atomic64_try_cmpxchg##order(v, old, new);		\
 }
+
+#ifdef arch_atomic64_try_cmpxchg
+INSTR_ATOMIC64_TRY_CMPXCHG()
+#define atomic_try_cmpxchg atomic_try_cmpxchg
 #endif
 
-static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
-	kasan_check_write(v, sizeof(*v));
-	return __arch_atomic_add_unless(v, a, u);
+#ifdef arch_atomic64_try_cmpxchg_relaxed
+INSTR_ATOMIC64_TRY_CMPXCHG(_relaxed)
+#define atomic_try_cmpxchg_relaxed atomic_try_cmpxchg_relaxed
+#endif
+
+#ifdef arch_atomic64_try_cmpxchg_acquire
+INSTR_ATOMIC64_TRY_CMPXCHG(_acquire)
+#define atomic_try_cmpxchg_acquire atomic_try_cmpxchg_acquire
+#endif
+
+#ifdef arch_atomic64_try_cmpxchg_release
+INSTR_ATOMIC64_TRY_CMPXCHG(_release)
+#define atomic_try_cmpxchg_release atomic_try_cmpxchg_release
+#endif
+
+#define __INSTR_ATOMIC_ADD_UNLESS(order)				\
+static __always_inline int						\
+__atomic_add_unless##order(atomic_t *v, int a, int u)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return __arch_atomic_add_unless##order(v, a, u);		\
 }
 
+__INSTR_ATOMIC_ADD_UNLESS()
 
-static __always_inline bool atomic64_add_unless(atomic64_t *v, s64 a, s64 u)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_add_unless(v, a, u);
+#ifdef __arch_atomic_add_unless_relaxed
+__INSTR_ATOMIC_ADD_UNLESS(_relaxed)
+#define __atomic_add_unless_relaxed __atomic_add_unless_relaxed
+#endif
+
+#ifdef __arch_atomic_add_unless_acquire
+__INSTR_ATOMIC_ADD_UNLESS(_acquire)
+#define __atomic_add_unless_acquire __atomic_add_unless_acquire
+#endif
+
+#ifdef __arch_atomic_add_unless_release
+__INSTR_ATOMIC_ADD_UNLESS(_release)
+#define __atomic_add_unless_release __atomic_add_unless_release
+#endif
+
+#define INSTR_ATOMIC64_ADD_UNLESS(order)				\
+static __always_inline bool						\
+atomic64_add_unless##order(atomic64_t *v, s64 a, s64 u)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_add_unless##order(v, a, u);		\
 }
 
-static __always_inline void atomic_inc(atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_inc(v);
+INSTR_ATOMIC64_ADD_UNLESS()
+
+#ifdef arch_atomic64_add_unless_relaxed
+INSTR_ATOMIC64_ADD_UNLESS(_relaxed)
+#define atomic64_add_unless_relaxed atomic64_add_unless_relaxed
+#endif
+
+#ifdef arch_atomic64_add_unless_acquire
+INSTR_ATOMIC64_ADD_UNLESS(_acquire)
+#define atomic64_add_unless_acquire atomic64_add_unless_acquire
+#endif
+
+#ifdef arch_atomic64_add_unless_release
+INSTR_ATOMIC64_ADD_UNLESS(_release)
+#define atomic64_add_unless_release atomic64_add_unless_release
+#endif
+
+#define INSTR_ATOMIC_INC(order)						\
+static __always_inline void						\
+atomic_inc##order(atomic_t *v)						\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_inc##order(v);					\
 }
 
-static __always_inline void atomic64_inc(atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_inc(v);
+INSTR_ATOMIC_INC()
+
+#ifdef arch_atomic_inc_relaxed
+INSTR_ATOMIC_INC(_relaxed)
+#define atomic_inc_relaxed atomic_inc_relaxed
+#endif
+
+#ifdef arch_atomic_inc_acquire
+INSTR_ATOMIC_INC(_acquire)
+#define atomic_inc_acquire atomic_inc_acquire
+#endif
+
+#ifdef arch_atomic_inc_release
+INSTR_ATOMIC_INC(_release)
+#define atomic_inc_release atomic_inc_release
+#endif
+
+#define INSTR_ATOMIC64_INC(order)					\
+static __always_inline void						\
+atomic64_inc##order(atomic64_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_inc##order(v);					\
 }
 
-static __always_inline void atomic_dec(atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_dec(v);
+INSTR_ATOMIC64_INC()
+
+#ifdef arch_atomic64_inc_relaxed
+INSTR_ATOMIC64_INC(_relaxed)
+#define atomic64_inc_relaxed atomic64_inc_relaxed
+#endif
+
+#ifdef arch_atomic64_inc_acquire
+INSTR_ATOMIC64_INC(_acquire)
+#define atomic64_inc_acquire atomic64_inc_acquire
+#endif
+
+#ifdef arch_atomic64_inc_release
+INSTR_ATOMIC64_INC(_release)
+#define atomic64_inc_release atomic64_inc_release
+#endif
+
+#define INSTR_ATOMIC_DEC(order)						\
+static __always_inline void						\
+atomic_dec##order(atomic_t *v)						\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_dec##order(v);					\
 }
 
-static __always_inline void atomic64_dec(atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_dec(v);
+INSTR_ATOMIC_DEC()
+
+#ifdef arch_atomic_dec_relaxed
+INSTR_ATOMIC_DEC(_relaxed)
+#define atomic_dec_relaxed atomic_dec_relaxed
+#endif
+
+#ifdef arch_atomic_dec_acquire
+INSTR_ATOMIC_DEC(_acquire)
+#define atomic_dec_acquire atomic_dec_acquire
+#endif
+
+#ifdef arch_atomic_dec_release
+INSTR_ATOMIC_DEC(_release)
+#define atomic_dec_release atomic_dec_release
+#endif
+
+#define INSTR_ATOMIC64_DEC(order)					\
+static __always_inline void						\
+atomic64_dec##order(atomic64_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_dec##order(v);					\
 }
 
-static __always_inline void atomic_add(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_add(i, v);
+INSTR_ATOMIC64_DEC()
+
+#ifdef arch_atomic64_dec_relaxed
+INSTR_ATOMIC64_DEC(_relaxed)
+#define atomic64_dec_relaxed atomic64_dec_relaxed
+#endif
+
+#ifdef arch_atomic64_dec_acquire
+INSTR_ATOMIC64_DEC(_acquire)
+#define atomic64_dec_acquire atomic64_dec_acquire
+#endif
+
+#ifdef arch_atomic64_dec_release
+INSTR_ATOMIC64_DEC(_release)
+#define atomic64_dec_release atomic64_dec_release
+#endif
+
+#define INSTR_ATOMIC_ADD(order)						\
+static __always_inline void						\
+atomic_add##order(int i, atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_add##order(i, v);					\
 }
 
-static __always_inline void atomic64_add(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_add(i, v);
+INSTR_ATOMIC_ADD()
+
+#ifdef arch_atomic_add_relaxed
+INSTR_ATOMIC_ADD(_relaxed)
+#define atomic_add_relaxed atomic_add_relaxed
+#endif
+
+#ifdef arch_atomic_add_acquire
+INSTR_ATOMIC_ADD(_acquire)
+#define atomic_add_acquire atomic_add_acquire
+#endif
+
+#ifdef arch_atomic_add_release
+INSTR_ATOMIC_ADD(_release)
+#define atomic_add_release atomic_add_release
+#endif
+
+#define INSTR_ATOMIC64_ADD(order)					\
+static __always_inline void						\
+atomic64_add##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_add##order(i, v);					\
 }
 
-static __always_inline void atomic_sub(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_sub(i, v);
+INSTR_ATOMIC64_ADD()
+
+#ifdef arch_atomic64_add_relaxed
+INSTR_ATOMIC64_ADD(_relaxed)
+#define atomic64_add_relaxed atomic64_add_relaxed
+#endif
+
+#ifdef arch_atomic64_add_acquire
+INSTR_ATOMIC64_ADD(_acquire)
+#define atomic64_add_acquire atomic64_add_acquire
+#endif
+
+#ifdef arch_atomic64_add_release
+INSTR_ATOMIC64_ADD(_release)
+#define atomic64_add_release atomic64_add_release
+#endif
+
+#define INSTR_ATOMIC_SUB(order)						\
+static __always_inline void						\
+atomic_sub##order(int i, atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_sub##order(i, v);					\
 }
 
-static __always_inline void atomic64_sub(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_sub(i, v);
+INSTR_ATOMIC_SUB()
+
+#ifdef arch_atomic_sub_relaxed
+INSTR_ATOMIC_SUB(_relaxed)
+#define atomic_sub_relaxed atomic_sub_relaxed
+#endif
+
+#ifdef arch_atomic_sub_acquire
+INSTR_ATOMIC_SUB(_acquire)
+#define atomic_sub_acquire atomic_sub_acquire
+#endif
+
+#ifdef arch_atomic_sub_release
+INSTR_ATOMIC_SUB(_release)
+#define atomic_sub_release atomic_sub_release
+#endif
+
+#define INSTR_ATOMIC64_SUB(order)					\
+static __always_inline void						\
+atomic64_sub##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_sub##order(i, v);					\
 }
 
-static __always_inline void atomic_and(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_and(i, v);
+INSTR_ATOMIC64_SUB()
+
+#ifdef arch_atomic64_sub_relaxed
+INSTR_ATOMIC64_SUB(_relaxed)
+#define atomic64_sub_relaxed atomic64_sub_relaxed
+#endif
+
+#ifdef arch_atomic64_sub_acquire
+INSTR_ATOMIC64_SUB(_acquire)
+#define atomic64_sub_acquire atomic64_sub_acquire
+#endif
+
+#ifdef arch_atomic64_sub_release
+INSTR_ATOMIC64_SUB(_release)
+#define atomic64_sub_release atomic64_sub_release
+#endif
+
+#define INSTR_ATOMIC_AND(order)						\
+static __always_inline void						\
+atomic_and##order(int i, atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_and##order(i, v);					\
 }
 
-static __always_inline void atomic64_and(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_and(i, v);
+INSTR_ATOMIC_AND()
+
+#ifdef arch_atomic_and_relaxed
+INSTR_ATOMIC_AND(_relaxed)
+#define atomic_and_relaxed atomic_and_relaxed
+#endif
+
+#ifdef arch_atomic_and_acquire
+INSTR_ATOMIC_AND(_acquire)
+#define atomic_and_acquire atomic_and_acquire
+#endif
+
+#ifdef arch_atomic_and_release
+INSTR_ATOMIC_AND(_release)
+#define atomic_and_release atomic_and_release
+#endif
+
+#define INSTR_ATOMIC64_AND(order)					\
+static __always_inline void						\
+atomic64_and##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_and##order(i, v);					\
 }
 
-static __always_inline void atomic_or(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_or(i, v);
+INSTR_ATOMIC64_AND()
+
+#ifdef arch_atomic64_and_relaxed
+INSTR_ATOMIC64_AND(_relaxed)
+#define atomic64_and_relaxed atomic64_and_relaxed
+#endif
+
+#ifdef arch_atomic64_and_acquire
+INSTR_ATOMIC64_AND(_acquire)
+#define atomic64_and_acquire atomic64_and_acquire
+#endif
+
+#ifdef arch_atomic64_and_release
+INSTR_ATOMIC64_AND(_release)
+#define atomic64_and_release atomic64_and_release
+#endif
+
+#define INSTR_ATOMIC_OR(order)						\
+static __always_inline void						\
+atomic_or##order(int i, atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_or##order(i, v);					\
 }
 
-static __always_inline void atomic64_or(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_or(i, v);
+INSTR_ATOMIC_OR()
+
+#ifdef arch_atomic_or_relaxed
+INSTR_ATOMIC_OR(_relaxed)
+#define atomic_or_relaxed atomic_or_relaxed
+#endif
+
+#ifdef arch_atomic_or_acquire
+INSTR_ATOMIC_OR(_acquire)
+#define atomic_or_acquire atomic_or_acquire
+#endif
+
+#ifdef arch_atomic_or_release
+INSTR_ATOMIC_OR(_release)
+#define atomic_or_release atomic_or_release
+#endif
+
+#define INSTR_ATOMIC64_OR(order)					\
+static __always_inline void						\
+atomic64_or##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_or##order(i, v);					\
 }
 
-static __always_inline void atomic_xor(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic_xor(i, v);
+INSTR_ATOMIC64_OR()
+
+#ifdef arch_atomic64_or_relaxed
+INSTR_ATOMIC64_OR(_relaxed)
+#define atomic64_or_relaxed atomic64_or_relaxed
+#endif
+
+#ifdef arch_atomic64_or_acquire
+INSTR_ATOMIC64_OR(_acquire)
+#define atomic64_or_acquire atomic64_or_acquire
+#endif
+
+#ifdef arch_atomic64_or_release
+INSTR_ATOMIC64_OR(_release)
+#define atomic64_or_release atomic64_or_release
+#endif
+
+#define INSTR_ATOMIC_XOR(order)						\
+static __always_inline void						\
+atomic_xor##order(int i, atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic_xor##order(i, v);					\
 }
 
-static __always_inline void atomic64_xor(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	arch_atomic64_xor(i, v);
+INSTR_ATOMIC_XOR()
+
+#ifdef arch_atomic_xor_relaxed
+INSTR_ATOMIC_XOR(_relaxed)
+#define atomic_xor_relaxed atomic_xor_relaxed
+#endif
+
+#ifdef arch_atomic_xor_acquire
+INSTR_ATOMIC_XOR(_acquire)
+#define atomic_xor_acquire atomic_xor_acquire
+#endif
+
+#ifdef arch_atomic_xor_release
+INSTR_ATOMIC_XOR(_release)
+#define atomic_xor_release atomic_xor_release
+#endif
+
+#define INSTR_ATOMIC64_XOR(order)					\
+static __always_inline void						\
+atomic64_xor##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	arch_atomic64_xor##order(i, v);					\
 }
 
-static __always_inline int atomic_inc_return(atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_inc_return(v);
+INSTR_ATOMIC64_XOR()
+
+#ifdef arch_atomic64_xor_relaxed
+INSTR_ATOMIC64_XOR(_relaxed)
+#define atomic64_xor_relaxed atomic64_xor_relaxed
+#endif
+
+#ifdef arch_atomic64_xor_acquire
+INSTR_ATOMIC64_XOR(_acquire)
+#define atomic64_xor_acquire atomic64_xor_acquire
+#endif
+
+#ifdef arch_atomic64_xor_release
+INSTR_ATOMIC64_XOR(_release)
+#define atomic64_xor_release atomic64_xor_release
+#endif
+
+#define INSTR_ATOMIC_INC_RETURN(order)					\
+static __always_inline int						\
+atomic_inc_return##order(atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_inc_return##order(v);			\
 }
 
-static __always_inline s64 atomic64_inc_return(atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_inc_return(v);
+INSTR_ATOMIC_INC_RETURN()
+
+#ifdef arch_atomic_inc_return_relaxed
+INSTR_ATOMIC_INC_RETURN(_relaxed)
+#define atomic_inc_return_relaxed atomic_inc_return_relaxed
+#endif
+
+#ifdef arch_atomic_inc_return_acquire
+INSTR_ATOMIC_INC_RETURN(_acquire)
+#define atomic_inc_return_acquire atomic_inc_return_acquire
+#endif
+
+#ifdef arch_atomic_inc_return_release
+INSTR_ATOMIC_INC_RETURN(_release)
+#define atomic_inc_return_release atomic_inc_return_release
+#endif
+
+#define INSTR_ATOMIC64_INC_RETURN(order)				\
+static __always_inline s64						\
+atomic64_inc_return##order(atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_inc_return##order(v);			\
 }
 
-static __always_inline int atomic_dec_return(atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_dec_return(v);
+INSTR_ATOMIC64_INC_RETURN()
+
+#ifdef arch_atomic64_inc_return_relaxed
+INSTR_ATOMIC64_INC_RETURN(_relaxed)
+#define atomic64_inc_return_relaxed atomic64_inc_return_relaxed
+#endif
+
+#ifdef arch_atomic64_inc_return_acquire
+INSTR_ATOMIC64_INC_RETURN(_acquire)
+#define atomic64_inc_return_acquire atomic64_inc_return_acquire
+#endif
+
+#ifdef arch_atomic64_inc_return_release
+INSTR_ATOMIC64_INC_RETURN(_release)
+#define atomic64_inc_return_release atomic64_inc_return_release
+#endif
+
+#define INSTR_ATOMIC_DEC_RETURN(order)					\
+static __always_inline int						\
+atomic_dec_return##order(atomic_t *v)					\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_dec_return##order(v);			\
 }
 
-static __always_inline s64 atomic64_dec_return(atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_dec_return(v);
+INSTR_ATOMIC_DEC_RETURN()
+
+#ifdef arch_atomic_dec_return_relaxed
+INSTR_ATOMIC_DEC_RETURN(_relaxed)
+#define atomic_dec_return_relaxed atomic_dec_return_relaxed
+#endif
+
+#ifdef arch_atomic_dec_return_acquire
+INSTR_ATOMIC_DEC_RETURN(_acquire)
+#define atomic_dec_return_acquire atomic_dec_return_acquire
+#endif
+
+#ifdef arch_atomic_dec_return_release
+INSTR_ATOMIC_DEC_RETURN(_release)
+#define atomic_dec_return_release atomic_dec_return_release
+#endif
+
+#define INSTR_ATOMIC64_DEC_RETURN(order)				\
+static __always_inline s64						\
+atomic64_dec_return##order(atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_dec_return##order(v);			\
 }
 
+INSTR_ATOMIC64_DEC_RETURN()
+
+#ifdef arch_atomic64_dec_return_relaxed
+INSTR_ATOMIC64_DEC_RETURN(_relaxed)
+#define atomic64_dec_return_relaxed atomic64_dec_return_relaxed
+#endif
+
+#ifdef arch_atomic64_dec_return_acquire
+INSTR_ATOMIC64_DEC_RETURN(_acquire)
+#define atomic64_dec_return_acquire atomic64_dec_return_acquire
+#endif
+
+#ifdef arch_atomic64_dec_return_release
+INSTR_ATOMIC64_DEC_RETURN(_release)
+#define atomic64_dec_return_release atomic64_dec_return_release
+#endif
+
 static __always_inline s64 atomic64_inc_not_zero(atomic64_t *v)
 {
 	kasan_check_write(v, sizeof(*v));
@@ -241,90 +734,356 @@  static __always_inline bool atomic64_inc_and_test(atomic64_t *v)
 	return arch_atomic64_inc_and_test(v);
 }
 
-static __always_inline int atomic_add_return(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_add_return(i, v);
+#define INSTR_ATOMIC_ADD_RETURN(order)					\
+static __always_inline int						\
+atomic_add_return##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_add_return##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_add_return(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_add_return(i, v);
+INSTR_ATOMIC_ADD_RETURN()
+
+#ifdef arch_atomic_add_return_relaxed
+INSTR_ATOMIC_ADD_RETURN(_relaxed)
+#define atomic_add_return_relaxed atomic_add_return_relaxed
+#endif
+
+#ifdef arch_atomic_add_return_acquire
+INSTR_ATOMIC_ADD_RETURN(_acquire)
+#define atomic_add_return_acquire atomic_add_return_acquire
+#endif
+
+#ifdef arch_atomic_add_return_release
+INSTR_ATOMIC_ADD_RETURN(_release)
+#define atomic_add_return_release atomic_add_return_release
+#endif
+
+#define INSTR_ATOMIC64_ADD_RETURN(order)				\
+static __always_inline s64						\
+atomic64_add_return##order(s64 i, atomic64_t *v)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_add_return##order(i, v);			\
 }
 
-static __always_inline int atomic_sub_return(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_sub_return(i, v);
+INSTR_ATOMIC64_ADD_RETURN()
+
+#ifdef arch_atomic64_add_return_relaxed
+INSTR_ATOMIC64_ADD_RETURN(_relaxed)
+#define atomic64_add_return_relaxed atomic64_add_return_relaxed
+#endif
+
+#ifdef arch_atomic64_add_return_acquire
+INSTR_ATOMIC64_ADD_RETURN(_acquire)
+#define atomic64_add_return_acquire atomic64_add_return_acquire
+#endif
+
+#ifdef arch_atomic64_add_return_release
+INSTR_ATOMIC64_ADD_RETURN(_release)
+#define atomic64_add_return_release atomic64_add_return_release
+#endif
+
+#define INSTR_ATOMIC_SUB_RETURN(order)					\
+static __always_inline int						\
+atomic_sub_return##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_sub_return##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_sub_return(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_sub_return(i, v);
+INSTR_ATOMIC_SUB_RETURN()
+
+#ifdef arch_atomic_sub_return_relaxed
+INSTR_ATOMIC_SUB_RETURN(_relaxed)
+#define atomic_sub_return_relaxed atomic_sub_return_relaxed
+#endif
+
+#ifdef arch_atomic_sub_return_acquire
+INSTR_ATOMIC_SUB_RETURN(_acquire)
+#define atomic_sub_return_acquire atomic_sub_return_acquire
+#endif
+
+#ifdef arch_atomic_sub_return_release
+INSTR_ATOMIC_SUB_RETURN(_release)
+#define atomic_sub_return_release atomic_sub_return_release
+#endif
+
+#define INSTR_ATOMIC64_SUB_RETURN(order)				\
+static __always_inline s64						\
+atomic64_sub_return##order(s64 i, atomic64_t *v)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_sub_return##order(i, v);			\
 }
 
-static __always_inline int atomic_fetch_add(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_fetch_add(i, v);
+INSTR_ATOMIC64_SUB_RETURN()
+
+#ifdef arch_atomic64_sub_return_relaxed
+INSTR_ATOMIC64_SUB_RETURN(_relaxed)
+#define atomic64_sub_return_relaxed atomic64_sub_return_relaxed
+#endif
+
+#ifdef arch_atomic64_sub_return_acquire
+INSTR_ATOMIC64_SUB_RETURN(_acquire)
+#define atomic64_sub_return_acquire atomic64_sub_return_acquire
+#endif
+
+#ifdef arch_atomic64_sub_return_release
+INSTR_ATOMIC64_SUB_RETURN(_release)
+#define atomic64_sub_return_release atomic64_sub_return_release
+#endif
+
+#define INSTR_ATOMIC_FETCH_ADD(order)					\
+static __always_inline int						\
+atomic_fetch_add##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_fetch_add##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_fetch_add(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_fetch_add(i, v);
+INSTR_ATOMIC_FETCH_ADD()
+
+#ifdef arch_atomic_fetch_add_relaxed
+INSTR_ATOMIC_FETCH_ADD(_relaxed)
+#define atomic_fetch_add_relaxed atomic_fetch_add_relaxed
+#endif
+
+#ifdef arch_atomic_fetch_add_acquire
+INSTR_ATOMIC_FETCH_ADD(_acquire)
+#define atomic_fetch_add_acquire atomic_fetch_add_acquire
+#endif
+
+#ifdef arch_atomic_fetch_add_release
+INSTR_ATOMIC_FETCH_ADD(_release)
+#define atomic_fetch_add_release atomic_fetch_add_release
+#endif
+
+#define INSTR_ATOMIC64_FETCH_ADD(order)					\
+static __always_inline s64						\
+atomic64_fetch_add##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_fetch_add##order(i, v);			\
 }
 
-static __always_inline int atomic_fetch_sub(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_fetch_sub(i, v);
+INSTR_ATOMIC64_FETCH_ADD()
+
+#ifdef arch_atomic64_fetch_add_relaxed
+INSTR_ATOMIC64_FETCH_ADD(_relaxed)
+#define atomic64_fetch_add_relaxed atomic64_fetch_add_relaxed
+#endif
+
+#ifdef arch_atomic64_fetch_add_acquire
+INSTR_ATOMIC64_FETCH_ADD(_acquire)
+#define atomic64_fetch_add_acquire atomic64_fetch_add_acquire
+#endif
+
+#ifdef arch_atomic64_fetch_add_release
+INSTR_ATOMIC64_FETCH_ADD(_release)
+#define atomic64_fetch_add_release atomic64_fetch_add_release
+#endif
+
+#define INSTR_ATOMIC_FETCH_SUB(order)					\
+static __always_inline int						\
+atomic_fetch_sub##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_fetch_sub##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_fetch_sub(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_fetch_sub(i, v);
+INSTR_ATOMIC_FETCH_SUB()
+
+#ifdef arch_atomic_fetch_sub_relaxed
+INSTR_ATOMIC_FETCH_SUB(_relaxed)
+#define atomic_fetch_sub_relaxed atomic_fetch_sub_relaxed
+#endif
+
+#ifdef arch_atomic_fetch_sub_acquire
+INSTR_ATOMIC_FETCH_SUB(_acquire)
+#define atomic_fetch_sub_acquire atomic_fetch_sub_acquire
+#endif
+
+#ifdef arch_atomic_fetch_sub_release
+INSTR_ATOMIC_FETCH_SUB(_release)
+#define atomic_fetch_sub_release atomic_fetch_sub_release
+#endif
+
+#define INSTR_ATOMIC64_FETCH_SUB(order)					\
+static __always_inline s64						\
+atomic64_fetch_sub##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_fetch_sub##order(i, v);			\
 }
 
-static __always_inline int atomic_fetch_and(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_fetch_and(i, v);
+INSTR_ATOMIC64_FETCH_SUB()
+
+#ifdef arch_atomic64_fetch_sub_relaxed
+INSTR_ATOMIC64_FETCH_SUB(_relaxed)
+#define atomic64_fetch_sub_relaxed atomic64_fetch_sub_relaxed
+#endif
+
+#ifdef arch_atomic64_fetch_sub_acquire
+INSTR_ATOMIC64_FETCH_SUB(_acquire)
+#define atomic64_fetch_sub_acquire atomic64_fetch_sub_acquire
+#endif
+
+#ifdef arch_atomic64_fetch_sub_release
+INSTR_ATOMIC64_FETCH_SUB(_release)
+#define atomic64_fetch_sub_release atomic64_fetch_sub_release
+#endif
+
+#define INSTR_ATOMIC_FETCH_AND(order)					\
+static __always_inline int						\
+atomic_fetch_and##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_fetch_and##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_fetch_and(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_fetch_and(i, v);
+INSTR_ATOMIC_FETCH_AND()
+
+#ifdef arch_atomic_fetch_and_relaxed
+INSTR_ATOMIC_FETCH_AND(_relaxed)
+#define atomic_fetch_and_relaxed atomic_fetch_and_relaxed
+#endif
+
+#ifdef arch_atomic_fetch_and_acquire
+INSTR_ATOMIC_FETCH_AND(_acquire)
+#define atomic_fetch_and_acquire atomic_fetch_and_acquire
+#endif
+
+#ifdef arch_atomic_fetch_and_release
+INSTR_ATOMIC_FETCH_AND(_release)
+#define atomic_fetch_and_release atomic_fetch_and_release
+#endif
+
+#define INSTR_ATOMIC64_FETCH_AND(order)					\
+static __always_inline s64						\
+atomic64_fetch_and##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_fetch_and##order(i, v);			\
 }
 
-static __always_inline int atomic_fetch_or(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_fetch_or(i, v);
+INSTR_ATOMIC64_FETCH_AND()
+
+#ifdef arch_atomic64_fetch_and_relaxed
+INSTR_ATOMIC64_FETCH_AND(_relaxed)
+#define atomic64_fetch_and_relaxed atomic64_fetch_and_relaxed
+#endif
+
+#ifdef arch_atomic64_fetch_and_acquire
+INSTR_ATOMIC64_FETCH_AND(_acquire)
+#define atomic64_fetch_and_acquire atomic64_fetch_and_acquire
+#endif
+
+#ifdef arch_atomic64_fetch_and_release
+INSTR_ATOMIC64_FETCH_AND(_release)
+#define atomic64_fetch_and_release atomic64_fetch_and_release
+#endif
+
+#define INSTR_ATOMIC_FETCH_OR(order)					\
+static __always_inline int						\
+atomic_fetch_or##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_fetch_or##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_fetch_or(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_fetch_or(i, v);
+INSTR_ATOMIC_FETCH_OR()
+
+#ifdef arch_atomic_fetch_or_relaxed
+INSTR_ATOMIC_FETCH_OR(_relaxed)
+#define atomic_fetch_or_relaxed atomic_fetch_or_relaxed
+#endif
+
+#ifdef arch_atomic_fetch_or_acquire
+INSTR_ATOMIC_FETCH_OR(_acquire)
+#define atomic_fetch_or_acquire atomic_fetch_or_acquire
+#endif
+
+#ifdef arch_atomic_fetch_or_release
+INSTR_ATOMIC_FETCH_OR(_release)
+#define atomic_fetch_or_release atomic_fetch_or_release
+#endif
+
+#define INSTR_ATOMIC64_FETCH_OR(order)					\
+static __always_inline s64						\
+atomic64_fetch_or##order(s64 i, atomic64_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_fetch_or##order(i, v);			\
 }
 
-static __always_inline int atomic_fetch_xor(int i, atomic_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic_fetch_xor(i, v);
+INSTR_ATOMIC64_FETCH_OR()
+
+#ifdef arch_atomic64_fetch_or_relaxed
+INSTR_ATOMIC64_FETCH_OR(_relaxed)
+#define atomic64_fetch_or_relaxed atomic64_fetch_or_relaxed
+#endif
+
+#ifdef arch_atomic64_fetch_or_acquire
+INSTR_ATOMIC64_FETCH_OR(_acquire)
+#define atomic64_fetch_or_acquire atomic64_fetch_or_acquire
+#endif
+
+#ifdef arch_atomic64_fetch_or_release
+INSTR_ATOMIC64_FETCH_OR(_release)
+#define atomic64_fetch_or_release atomic64_fetch_or_release
+#endif
+
+#define INSTR_ATOMIC_FETCH_XOR(order)					\
+static __always_inline int						\
+atomic_fetch_xor##order(int i, atomic_t *v)				\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic_fetch_xor##order(i, v);			\
 }
 
-static __always_inline s64 atomic64_fetch_xor(s64 i, atomic64_t *v)
-{
-	kasan_check_write(v, sizeof(*v));
-	return arch_atomic64_fetch_xor(i, v);
+INSTR_ATOMIC_FETCH_XOR()
+
+#ifdef arch_atomic_fetch_xor_relaxed
+INSTR_ATOMIC_FETCH_XOR(_relaxed)
+#define atomic_fetch_xor_relaxed atomic_fetch_xor_relaxed
+#endif
+
+#ifdef arch_atomic_fetch_xor_acquire
+INSTR_ATOMIC_FETCH_XOR(_acquire)
+#define atomic_fetch_xor_acquire atomic_fetch_xor_acquire
+#endif
+
+#ifdef arch_atomic_fetch_xor_release
+INSTR_ATOMIC_FETCH_XOR(_release)
+#define atomic_fetch_xor_release atomic_fetch_xor_release
+#endif
+
+#define INSTR_ATOMIC64_FETCH_XOR(xorder)				\
+static __always_inline s64						\
+atomic64_fetch_xor##xorder(s64 i, atomic64_t *v)			\
+{									\
+	kasan_check_write(v, sizeof(*v));				\
+	return arch_atomic64_fetch_xor##xorder(i, v);			\
 }
 
+INSTR_ATOMIC64_FETCH_XOR()
+
+#ifdef arch_atomic64_fetch_xor_relaxed
+INSTR_ATOMIC64_FETCH_XOR(_relaxed)
+#define atomic64_fetch_xor_relaxed atomic64_fetch_xor_relaxed
+#endif
+
+#ifdef arch_atomic64_fetch_xor_acquire
+INSTR_ATOMIC64_FETCH_XOR(_acquire)
+#define atomic64_fetch_xor_acquire atomic64_fetch_xor_acquire
+#endif
+
+#ifdef arch_atomic64_fetch_xor_release
+INSTR_ATOMIC64_FETCH_XOR(_release)
+#define atomic64_fetch_xor_release atomic64_fetch_xor_release
+#endif
+
 static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
 {
 	kasan_check_write(v, sizeof(*v));
@@ -349,31 +1108,64 @@  static __always_inline bool atomic64_add_negative(s64 i, atomic64_t *v)
 	return arch_atomic64_add_negative(i, v);
 }
 
-static __always_inline unsigned long
-cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new, int size)
-{
-	kasan_check_write(ptr, size);
-	switch (size) {
-	case 1:
-		return arch_cmpxchg((u8 *)ptr, (u8)old, (u8)new);
-	case 2:
-		return arch_cmpxchg((u16 *)ptr, (u16)old, (u16)new);
-	case 4:
-		return arch_cmpxchg((u32 *)ptr, (u32)old, (u32)new);
-	case 8:
-		BUILD_BUG_ON(sizeof(unsigned long) != 8);
-		return arch_cmpxchg((u64 *)ptr, (u64)old, (u64)new);
-	}
-	BUILD_BUG();
-	return 0;
+#define INSTR_CMPXCHG(order)							\
+static __always_inline unsigned long						\
+cmpxchg##order##_size(volatile void *ptr, unsigned long old,			\
+		       unsigned long new, int size)				\
+{										\
+	kasan_check_write(ptr, size);						\
+	switch (size) {								\
+	case 1:									\
+		return arch_cmpxchg##order((u8 *)ptr, (u8)old, (u8)new);	\
+	case 2:									\
+		return arch_cmpxchg##order((u16 *)ptr, (u16)old, (u16)new);	\
+	case 4:									\
+		return arch_cmpxchg##order((u32 *)ptr, (u32)old, (u32)new);	\
+	case 8:									\
+		BUILD_BUG_ON(sizeof(unsigned long) != 8);			\
+		return arch_cmpxchg##order((u64 *)ptr, (u64)old, (u64)new);	\
+	}									\
+	BUILD_BUG();								\
+	return 0;								\
 }
 
+INSTR_CMPXCHG()
 #define cmpxchg(ptr, old, new)						\
 ({									\
 	((__typeof__(*(ptr)))cmpxchg_size((ptr), (unsigned long)(old),	\
 		(unsigned long)(new), sizeof(*(ptr))));			\
 })
 
+#ifdef arch_cmpxchg_relaxed
+INSTR_CMPXCHG(_relaxed)
+#define cmpxchg_relaxed(ptr, old, new)					\
+({									\
+	((__typeof__(*(ptr)))cmpxchg_relaxed_size((ptr),		\
+		(unsigned long)(old), (unsigned long)(new), 		\
+		sizeof(*(ptr))));					\
+})
+#endif
+
+#ifdef arch_cmpxchg_acquire
+INSTR_CMPXCHG(_acquire)
+#define cmpxchg_acquire(ptr, old, new)					\
+({									\
+	((__typeof__(*(ptr)))cmpxchg_acquire_size((ptr),		\
+		(unsigned long)(old), (unsigned long)(new), 		\
+		sizeof(*(ptr))));					\
+})
+#endif
+
+#ifdef arch_cmpxchg_release
+INSTR_CMPXCHG(_release)
+#define cmpxchg_release(ptr, old, new)					\
+({									\
+	((__typeof__(*(ptr)))cmpxchg_release_size((ptr),		\
+		(unsigned long)(old), (unsigned long)(new), 		\
+		sizeof(*(ptr))));					\
+})
+#endif
+
 static __always_inline unsigned long
 sync_cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new,
 		  int size)
@@ -428,19 +1220,48 @@  cmpxchg_local_size(volatile void *ptr, unsigned long old, unsigned long new,
 		sizeof(*(ptr))));					\
 })
 
-static __always_inline u64
-cmpxchg64_size(volatile u64 *ptr, u64 old, u64 new)
-{
-	kasan_check_write(ptr, sizeof(*ptr));
-	return arch_cmpxchg64(ptr, old, new);
+#define INSTR_CMPXCHG64(order)						\
+static __always_inline u64						\
+cmpxchg64##order##_size(volatile u64 *ptr, u64 old, u64 new)		\
+{									\
+	kasan_check_write(ptr, sizeof(*ptr));				\
+	return arch_cmpxchg64##order(ptr, old, new);			\
 }
 
+INSTR_CMPXCHG64()
 #define cmpxchg64(ptr, old, new)					\
 ({									\
 	((__typeof__(*(ptr)))cmpxchg64_size((ptr), (u64)(old),		\
 		(u64)(new)));						\
 })
 
+#ifdef arch_cmpxchg64_relaxed
+INSTR_CMPXCHG64(_relaxed)
+#define cmpxchg64_relaxed(ptr, old, new)				\
+({									\
+	((__typeof__(*(ptr)))cmpxchg64_relaxed_size((ptr), (u64)(old),	\
+		(u64)(new)));						\
+})
+#endif
+
+#ifdef arch_cmpxchg64_acquire
+INSTR_CMPXCHG64(_acquire)
+#define cmpxchg64_acquire(ptr, old, new)				\
+({									\
+	((__typeof__(*(ptr)))cmpxchg64_acquire_size((ptr), (u64)(old),	\
+		(u64)(new)));						\
+})
+#endif
+
+#ifdef arch_cmpxchg64_release
+INSTR_CMPXCHG64(_release)
+#define cmpxchg64_release(ptr, old, new)				\
+({									\
+	((__typeof__(*(ptr)))cmpxchg64_release_size((ptr), (u64)(old),	\
+		(u64)(new)));						\
+})
+#endif
+
 static __always_inline u64
 cmpxchg64_local_size(volatile u64 *ptr, u64 old, u64 new)
 {