[GIT,PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines

Message ID alpine.LFD.2.20.1511231053450.22569@knanqh.ubzr
State New
Headers show

Commit Message

Nicolas Pitre Nov. 23, 2015, 4:04 p.m.
On Mon, 23 Nov 2015, Nicolas Pitre wrote:

> On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> 

> > On Sunday 22 November 2015 17:28:33 Nicolas Pitre wrote:

> > > On Sun, 22 Nov 2015, Arnd Bergmann wrote:

> > > > On Monday 16 November 2015 20:20:38 you wrote:

> > > > I'm now getting a build regressing with the attached randconfig configuration,

> > > > when compiling drivers/net/wireless/iwlegacy/common.o:

> > > > 

> > > > drivers/built-in.o: In function `il_send_rxon_timing':

> > > > :(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'

> > > > :(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'

> > > > :(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'

> > > > :(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'

> > > > :(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'

> > > > :(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow

> > > > drivers/built-in.o: In function `il_send_rxon_timing':

> > > > :(.text+0xbbb11c): undefined reference to `____ilog2_NaN'

> > > 

> 

> I'm able to reproduce it, and I am in the process of reintroducing the 

> last patch piece by piece to see what breaks.  The presence of 

> ____ilog2_NaN is particularly intriguing as this would mean 0 is passed 

> to ilog2() somewhere.


OK... I'm able to "fix" the build with:


What doesn't make sense to me is the fact that is_power_of_2() is 
defined as:

static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
        return (n != 0 && ((n & (n - 1)) == 0));
}

So the test for zero is already in there.

And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
before the if doesn't trig either.

Confused.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Nicolas Pitre Nov. 23, 2015, 4:38 p.m. | #1
On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:

> > 

> > OK... I'm able to "fix" the build with:

> > 

> > diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h

> > index 163f77999e..d246c4c801 100644

> > --- a/include/asm-generic/div64.h

> > +++ b/include/asm-generic/div64.h

> > @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);

> >         uint32_t __rem;                                 \

> >         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \

> >         if (__builtin_constant_p(__base) &&             \

> > -           is_power_of_2(__base)) {                    \

> > +           is_power_of_2(__base) && __base != 0) {     \

> >                 __rem = (n) & (__base - 1);             \

> >                 (n) >>= ilog2(__base);                  \

> >         } else if (__div64_const32_is_OK &&             \

> > 

> > What doesn't make sense to me is the fact that is_power_of_2() is 

> > defined as:

> > 

> > static inline __attribute__((const))

> > bool is_power_of_2(unsigned long n)

> > {

> >         return (n != 0 && ((n & (n - 1)) == 0));

> > }

> > 

> > So the test for zero is already in there.

> > 

> > And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 

> > before the if doesn't trig either.

> 

> I've seen similarly messed up situations with PROFILE_ALL_BRANCHES

> before, I think it's got something to do with how __builtin_constant_p()

> is used inside of the __trace_if() macro, and how gcc sometimes falls

> back to treating variables as not-really-constant based on context.

> 

> To gcc, __builtin_constant_p is just best-effort, and they don't care

> about returning false sometimes if they catch most cases in practice.


But here it must have returned true, and is_power_of_2() returned true 
as well (which implies that __base is not zero), ans somehow aving an 
additional __base != 0 test changes the outcome.  There is a correctness 
issue beyond __builtin_constant_p it seems.

> Note that llvm will always return false for __builtin_constant_p on

> non-pointer arguments, which breaks a lot of optimizations.


If llvm is able to optimize this case on its own then we won't need all 
this contraption.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Nicolas Pitre Nov. 24, 2015, 5:37 a.m. | #2
On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:

> > 

> > OK... I'm able to "fix" the build with:

> > 

> > diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h

> > index 163f77999e..d246c4c801 100644

> > --- a/include/asm-generic/div64.h

> > +++ b/include/asm-generic/div64.h

> > @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);

> >         uint32_t __rem;                                 \

> >         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \

> >         if (__builtin_constant_p(__base) &&             \

> > -           is_power_of_2(__base)) {                    \

> > +           is_power_of_2(__base) && __base != 0) {     \

> >                 __rem = (n) & (__base - 1);             \

> >                 (n) >>= ilog2(__base);                  \

> >         } else if (__div64_const32_is_OK &&             \

> > 

> > What doesn't make sense to me is the fact that is_power_of_2() is 

> > defined as:

> > 

> > static inline __attribute__((const))

> > bool is_power_of_2(unsigned long n)

> > {

> >         return (n != 0 && ((n & (n - 1)) == 0));

> > }

> > 

> > So the test for zero is already in there.

> > 

> > And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 

> > before the if doesn't trig either.

> 

> I've seen similarly messed up situations with PROFILE_ALL_BRANCHES

> before, I think it's got something to do with how __builtin_constant_p()

> is used inside of the __trace_if() macro, and how gcc sometimes falls

> back to treating variables as not-really-constant based on context.

> 

> To gcc, __builtin_constant_p is just best-effort, and they don't care

> about returning false sometimes if they catch most cases in practice.


OK... I produced a minimal test case. I think gcc is screwed. And it is 
not a question of __builtin_constant_p being best effort. The resulting 
code is plain wrong where a variable is suddenly turned into a constant 
of value 0. Any random modification to various part of the code just 
makes it disappear but I didn't check the disassembly to see if it is 
then correct.

And the good news(tm) is that the same bug is triggered with x86 gcc as 
well.

Please try the attached test case.


Nicolas
Nicolas Pitre Nov. 24, 2015, 4:44 p.m. | #3
On Tue, 24 Nov 2015, Arnd Bergmann wrote:

> On Tuesday 24 November 2015 00:37:17 Nicolas Pitre wrote:

> > OK... I produced a minimal test case. I think gcc is screwed. And it is 

> > not a question of __builtin_constant_p being best effort. The resulting 

> > code is plain wrong where a variable is suddenly turned into a constant 

> > of value 0. Any random modification to various part of the code just 

> > makes it disappear but I didn't check the disassembly to see if it is 

> > then correct.

> > 

> > And the good news(tm) is that the same bug is triggered with x86 gcc as 

> > well.

> > 

> > Please try the attached test case.

> 

> I can confirm the behavior you see with gcc-4.9 and later (I only saw

> it on 5.x or later with the kernel code). 


It is really sensitive to code modifications. You simplify it somehow 
and the bug is gone.  So there might be some kind of complexity threshold 
that gcc-4.9 didn't reach with the kernel code.

> It seems to have been

> introduced with

> 

> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=206941

> 

> 

>         PR tree-optimization/59597

>         * tree-ssa-threadupdate.c (dump_jump_thread_path): Move to earlier

>         in file.  Accept new argument REGISTERING and use it to modify

>         dump output appropriately.

>         (register_jump_thread): Corresponding changes.

>         (mark_threaded_blocks): Reinstate code to cancel unprofitable

>         thread paths involving joiner blocks.  Add code to dump cancelled

>         jump threading paths.

>     

>         PR tree-optimization/59597

>         * gcc.dg/tree-ssa/pr59597.c: New test.

>     

>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@206941 138bc75d-0d04-0410-961f-82ee72b054a4

> 

>  gcc/ChangeLog                           |  11 +++++

>  gcc/testsuite/ChangeLog                 |   5 +++

>  gcc/testsuite/gcc.dg/tree-ssa/pr59597.c |  55 +++++++++++++++++++++++++

>  gcc/tree-ssa-threadupdate.c             | 125 +++++++++++++++++++++++++++++++++++++++------------------

>  4 files changed, 157 insertions(+), 39 deletions(-)

> 

> however, in that version, the -DHIDE_THE_BUG variant also fails:

> 

> compilation that works:

> gcc-cross -Wall -I /home/arnd/git/buildall/arm/gcc/gcc/include  -O2 -c -o /home/arnd/git/buildall/arm/gcc_testcase/src.o /home/arnd/git/buildall/arm/gcc_testcase/src.c -DHIDE_THE_BUG

> /home/arnd/git/buildall/arm/gcc_testcase/src.c: In function 'il_send_rxon_timing':

> /home/arnd/git/buildall/arm/gcc_testcase/src.c:39:30: error: call to '____ilog2_NaN' declared with attribute error: gcc is screwed


IMHO the fact that the -DHIDE_THE_BUG case is enough to make the bug go 
away sometimes is even more worrisome given the nature of the 
"workaround".

Given you have a good handle with the bad commits already, are you going 
to bring it up with the gcc people?


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch

diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
index 163f77999e..d246c4c801 100644
--- a/include/asm-generic/div64.h
+++ b/include/asm-generic/div64.h
@@ -206,7 +206,7 @@  extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
 	uint32_t __rem;					\
 	(void)(((typeof((n)) *)0) == ((uint64_t *)0));	\
 	if (__builtin_constant_p(__base) &&		\
-	    is_power_of_2(__base)) {			\
+	    is_power_of_2(__base) && __base != 0) {	\
 		__rem = (n) & (__base - 1);		\
 		(n) >>= ilog2(__base);			\
 	} else if (__div64_const32_is_OK &&		\