Message ID: 20181228010255.21406-5-adhemerval.zanella@linaro.org
State: New
Series: General fixes and refactor for BZ#12683
On 28/12/18 6:32 AM, Adhemerval Zanella wrote:
> The x86 defines optimized THREAD_ATOMIC_* macros where reference always
> the current thread instead of the one indicated by input 'descr' argument.
> It work as long the input is the self thread pointer, however it generates
> wrong code is the semantic is to set a bit atomicialy from another thread.

"if the semantic is to set a bit atomically"

> This is not an issue for current GLIBC usage, however the new cancellation
> code expects that some synchronization code to atomically set bits from
> different threads.
>
> If some usage indeed proves to be a hotspot we can add an extra macro
> with a more descriptive name (THREAD_ATOMIC_BIT_SET_SELF for instance)
> where i386 might optimize it.
>
> Checked on i686-linux-gnu.
>
> 	* sysdeps/i686/nptl/tls.h (THREAD_ATOMIC_CMPXCHG_VAL,
> 	THREAD_ATOMIC_AND, THREAD_ATOMIC_BIT_SET): Remove macros.

OK with that nit.

Siddhesh
* Adhemerval Zanella:

> The x86 defines optimized THREAD_ATOMIC_* macros where reference always
> the current thread instead of the one indicated by input 'descr' argument.
> It work as long the input is the self thread pointer, however it generates
> wrong code is the semantic is to set a bit atomicialy from another thread.

The entire set of THREAD_* macros is poorly documented, but this is part
of the interface: On some targets, it is expensive to obtain the thread
pointer, so it makes sense to cache it.  On other targets, including
x86-64 or i386, access to the thread pointer is implicit (or can be
cached by the compiler), so an explicit cached descriptor is
unnecessary.

I think we can improve this a lot: We could use a new type for the
cache, to make clear it is a cache, and make the macros type-safe.  Or
we could try to have the compiler cache the value for us, like we do
with __errno_location.

Thanks,
Florian
On 25/01/2019 13:16, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> The x86 defines optimized THREAD_ATOMIC_* macros where reference always
>> the current thread instead of the one indicated by input 'descr' argument.
>> It work as long the input is the self thread pointer, however it generates
>> wrong code is the semantic is to set a bit atomicialy from another thread.
>
> The entire set of THREAD_* macros is poorly documented, but this is part
> of the interface: On some targets, it is expensive to obtain the thread
> pointer, so it makes sense to cache it.  On other targets, including
> x86-64 or i386, access to the thread pointer is implicit (or can be
> cached by the compiler), so an explicit cached descriptor is
> unnecessary.
>
> I think we can improve this a lot: We could use a new type for the
> cache, to make clear it is a cache, and make the macros type-safe.  Or
> we could try to have the compiler cache the value for us, like we do
> with __errno_location.
>
> Thanks,
> Florian

It seems that besides the attribute const annotation for __errno_location,
GCC does some conditional dead call elimination (gcc/tree-call-cdce.c) to
optimize errno handling.  Are you suggesting something similar?

It also seems that some targets support __builtin_thread_pointer; maybe
this would be a better option (it might simplify compiler analysis).  And
I am not sure how the compiler would handle it when the thread pointer is
based on the kernel ABI.

Anyway, I agree that adding some type safety, documenting the THREAD_*
macros, evaluating whether the arch-specific versions yield better code
generation, and checking for inconsistencies would indeed be a gain.
* Adhemerval Zanella:

> On 25/01/2019 13:16, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> The x86 defines optimized THREAD_ATOMIC_* macros where reference always
>>> the current thread instead of the one indicated by input 'descr' argument.
>>> It work as long the input is the self thread pointer, however it generates
>>> wrong code is the semantic is to set a bit atomicialy from another thread.
>>
>> The entire set of THREAD_* macros is poorly documented, but this is part
>> of the interface: On some targets, it is expensive to obtain the thread
>> pointer, so it makes sense to cache it.  On other targets, including
>> x86-64 or i386, access to the thread pointer is implicit (or can be
>> cached by the compiler), so an explicit cached descriptor is
>> unnecessary.
>>
>> I think we can improve this a lot: We could use a new type for the
>> cache, to make clear it is a cache, and make the macros type-safe.  Or
>> we could try to have the compiler cache the value for us, like we do
>> with __errno_location.
>>
>> Thanks,
>> Florian
>
> It seems that besides the attribute const annotation for __errno_location,
> gcc does some conditional dead call elimination (gcc/tree-call-cdce.c) to
> optimize errno handling.  Are you suggesting some similar?

Yes, that was my idea.  But if GCC hard-codes the name of
__errno_location, that's probably not possible to do.

> It also seems that some targets support __builtin_thread_pointer, maybe
> this might be better option (it might simplify compiler analysis).  And
> I am not sure how compiler will handle when it is based on kernel ABI.

It would actually need Richard Henderson's namespace patches because
some architectures use something close to that (x86 for instance).
Getting the thread pointer on these architectures requires a load of a
TLS variable.

Thanks,
Florian
diff --git a/sysdeps/i386/nptl/tls.h b/sysdeps/i386/nptl/tls.h
index 12285d3217..22ebf3d741 100644
--- a/sysdeps/i386/nptl/tls.h
+++ b/sysdeps/i386/nptl/tls.h
@@ -362,43 +362,6 @@ tls_fill_user_desc (union user_desc_init *desc,
      }})
 
-/* Atomic compare and exchange on TLS, returning old value.  */
-#define THREAD_ATOMIC_CMPXCHG_VAL(descr, member, newval, oldval) \
-  ({ __typeof (descr->member) __ret; \
-     __typeof (oldval) __old = (oldval); \
-     if (sizeof (descr->member) == 4) \
-       asm volatile (LOCK_PREFIX "cmpxchgl %2, %%gs:%P3" \
-		     : "=a" (__ret) \
-		     : "0" (__old), "r" (newval), \
-		       "i" (offsetof (struct pthread, member))); \
-     else \
-       /* Not necessary for other sizes in the moment.  */ \
-       abort (); \
-     __ret; })
-
-
-/* Atomic logical and.  */
-#define THREAD_ATOMIC_AND(descr, member, val) \
-  (void) ({ if (sizeof ((descr)->member) == 4) \
-	      asm volatile (LOCK_PREFIX "andl %1, %%gs:%P0" \
-			    :: "i" (offsetof (struct pthread, member)), \
-			       "ir" (val)); \
-	    else \
-	      /* Not necessary for other sizes in the moment.  */ \
-	      abort (); })
-
-
-/* Atomic set bit.  */
-#define THREAD_ATOMIC_BIT_SET(descr, member, bit) \
-  (void) ({ if (sizeof ((descr)->member) == 4) \
-	      asm volatile (LOCK_PREFIX "orl %1, %%gs:%P0" \
-			    :: "i" (offsetof (struct pthread, member)), \
-			       "ir" (1 << (bit))); \
-	    else \
-	      /* Not necessary for other sizes in the moment.  */ \
-	      abort (); })
-
-
 /* Set the stack guard field in TCB head.  */
 #define THREAD_SET_STACK_GUARD(value) \
   THREAD_SETMEM (THREAD_SELF, header.stack_guard, value)