Message ID | 87h8mxea5s.fsf@linaro.org |
---|---|
State | New |
Headers | show |
Series | Prefer open-coding vector integer division | expand |
On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote: > vect_recog_divmod_pattern currently bails out if the target has > native support for integer division, but I think in practice > it's always going to be better to open-code it anyway, just as > we usually open-code scalar divisions by constants. > > I think the only currently affected target is MIPS MSA, where for: Isn't powerpcspe affected too? It has a divv2si3 pattern. > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100 > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100 > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *> > enum tree_code rhs_code; > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); > vec_info *vinfo = stmt_vinfo->vinfo; > - optab optab; > tree q; > int dummy_int, prec; > stmt_vec_info def_stmt_vinfo; > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *> > if (vectype == NULL_TREE) > return NULL; > > - /* If the target can handle vectorized division or modulo natively, > - don't attempt to optimize this. */ > - optab = optab_for_tree_code (rhs_code, vectype, optab_default); > - if (optab != unknown_optab) > - { > - machine_mode vec_mode = TYPE_MODE (vectype); > - int icode = (int) optab_handler (optab, vec_mode); > - if (icode != CODE_FOR_nothing) > - return NULL; > - } > - Shouldn't we instead keep it, but only do it if optimize_bb_for_size_p (gimple_bb (last_stmt)) ? I mean, a hw division is very likely shorter than what we replace it with... Jakub
On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote: > > vect_recog_divmod_pattern currently bails out if the target has > > native support for integer division, but I think in practice > > it's always going to be better to open-code it anyway, just as > > we usually open-code scalar divisions by constants. > > > > I think the only currently affected target is MIPS MSA, where for: > Isn't powerpcspe affected too? It has a divv2si3 pattern. > > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100 > > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100 > > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *> > > enum tree_code rhs_code; > > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); > > vec_info *vinfo = stmt_vinfo->vinfo; > > - optab optab; > > tree q; > > int dummy_int, prec; > > stmt_vec_info def_stmt_vinfo; > > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *> > > if (vectype == NULL_TREE) > > return NULL; > > > > - /* If the target can handle vectorized division or modulo natively, > > - don't attempt to optimize this. */ > > - optab = optab_for_tree_code (rhs_code, vectype, optab_default); > > - if (optab != unknown_optab) > > - { > > - machine_mode vec_mode = TYPE_MODE (vectype); > > - int icode = (int) optab_handler (optab, vec_mode); > > - if (icode != CODE_FOR_nothing) > > - return NULL; > > - } > > - > Shouldn't we instead keep it, but only do it if > optimize_bb_for_size_p (gimple_bb (last_stmt)) ? > I mean, a hw division is very likely shorter than what we replace it with... Good idea. While loops are checked for optimize_loop_nest_for_speed if not forced vect BB vectorization has no such predicate. Richard. > Jakub
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote: >> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote: >> > vect_recog_divmod_pattern currently bails out if the target has >> > native support for integer division, but I think in practice >> > it's always going to be better to open-code it anyway, just as >> > we usually open-code scalar divisions by constants. >> > >> > I think the only currently affected target is MIPS MSA, where for: > >> Isn't powerpcspe affected too? It has a divv2si3 pattern. Ah, yeah. I forgot to update the description after you mentioned that on IRC, sorry. >> > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100 >> > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100 >> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *> >> > enum tree_code rhs_code; >> > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); >> > vec_info *vinfo = stmt_vinfo->vinfo; >> > - optab optab; >> > tree q; >> > int dummy_int, prec; >> > stmt_vec_info def_stmt_vinfo; >> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *> >> > if (vectype == NULL_TREE) >> > return NULL; >> > >> > - /* If the target can handle vectorized division or modulo natively, >> > - don't attempt to optimize this. */ >> > - optab = optab_for_tree_code (rhs_code, vectype, optab_default); >> > - if (optab != unknown_optab) >> > - { >> > - machine_mode vec_mode = TYPE_MODE (vectype); >> > - int icode = (int) optab_handler (optab, vec_mode); >> > - if (icode != CODE_FOR_nothing) >> > - return NULL; >> > - } >> > - > >> Shouldn't we instead keep it, but only do it if >> optimize_bb_for_size_p (gimple_bb (last_stmt)) ? >> I mean, a hw division is very likely shorter than what we replace it > with... > > Good idea. While loops are checked for optimize_loop_nest_for_speed if > not forced vect BB vectorization has no such predicate. Like this, if it passes testing? Thanks, Richard 2018-05-24 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-vect-patterns.c: Include predict.h. (vect_recog_divmod_pattern): Restrict check for division support to when optimizing for size. gcc/testsuite/ * gcc.dg/vect/bb-slp-div-1.c: New XFAILed test. Index: gcc/tree-vect-patterns.c =================================================================== *** gcc/tree-vect-patterns.c 2018-05-24 13:50:21.656138942 +0100 --- gcc/tree-vect-patterns.c 2018-05-24 13:50:21.819134113 +0100 *************** vect_recog_divmod_pattern (vec<gimple *> *** 2674,2688 **** if (vectype == NULL_TREE) return NULL; ! /* If the target can handle vectorized division or modulo natively, ! don't attempt to optimize this. */ ! optab = optab_for_tree_code (rhs_code, vectype, optab_default); ! if (optab != unknown_optab) { ! machine_mode vec_mode = TYPE_MODE (vectype); ! int icode = (int) optab_handler (optab, vec_mode); ! if (icode != CODE_FOR_nothing) ! return NULL; } prec = TYPE_PRECISION (itype); --- 2674,2692 ---- if (vectype == NULL_TREE) return NULL; ! if (optimize_bb_for_size_p (gimple_bb (last_stmt))) { ! /* If the target can handle vectorized division or modulo natively, ! don't attempt to optimize this, since native division is likely ! to give smaller code. */ ! optab = optab_for_tree_code (rhs_code, vectype, optab_default); ! if (optab != unknown_optab) ! { ! machine_mode vec_mode = TYPE_MODE (vectype); ! int icode = (int) optab_handler (optab, vec_mode); ! if (icode != CODE_FOR_nothing) ! return NULL; ! } } prec = TYPE_PRECISION (itype); Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c =================================================================== *** /dev/null 2018-04-20 16:19:46.369131350 +0100 --- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24 13:50:21.818134143 +0100 *************** *** 0 **** --- 1,19 ---- + /* { dg-do compile } */ + /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */ + + int x[8]; + + void + f (void) + { + x[0] /= 2; + x[1] /= 3; + x[2] /= 4; + x[3] /= 5; + x[4] /= 6; + x[5] /= 7; + x[6] /= 8; + x[7] /= 9; + } + + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */
On Thu, May 24, 2018 at 2:52 PM Richard Sandiford < richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: > > On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote: > >> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote: > >> > vect_recog_divmod_pattern currently bails out if the target has > >> > native support for integer division, but I think in practice > >> > it's always going to be better to open-code it anyway, just as > >> > we usually open-code scalar divisions by constants. > >> > > >> > I think the only currently affected target is MIPS MSA, where for: > > > >> Isn't powerpcspe affected too? It has a divv2si3 pattern. > Ah, yeah. I forgot to update the description after you mentioned > that on IRC, sorry. > >> > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100 > >> > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100 > >> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *> > >> > enum tree_code rhs_code; > >> > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); > >> > vec_info *vinfo = stmt_vinfo->vinfo; > >> > - optab optab; > >> > tree q; > >> > int dummy_int, prec; > >> > stmt_vec_info def_stmt_vinfo; > >> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *> > >> > if (vectype == NULL_TREE) > >> > return NULL; > >> > > >> > - /* If the target can handle vectorized division or modulo natively, > >> > - don't attempt to optimize this. */ > >> > - optab = optab_for_tree_code (rhs_code, vectype, optab_default); > >> > - if (optab != unknown_optab) > >> > - { > >> > - machine_mode vec_mode = TYPE_MODE (vectype); > >> > - int icode = (int) optab_handler (optab, vec_mode); > >> > - if (icode != CODE_FOR_nothing) > >> > - return NULL; > >> > - } > >> > - > > > >> Shouldn't we instead keep it, but only do it if > >> optimize_bb_for_size_p (gimple_bb (last_stmt)) ? > >> I mean, a hw division is very likely shorter than what we replace it > > with... > > > > Good idea. While loops are checked for optimize_loop_nest_for_speed if > > not forced vect BB vectorization has no such predicate. > Like this, if it passes testing? Looks good. Thanks, Richard. > Thanks, > Richard > 2018-05-24 Richard Sandiford <richard.sandiford@linaro.org> > gcc/ > * tree-vect-patterns.c: Include predict.h. > (vect_recog_divmod_pattern): Restrict check for division support > to when optimizing for size. > gcc/testsuite/ > * gcc.dg/vect/bb-slp-div-1.c: New XFAILed test. > Index: gcc/tree-vect-patterns.c > =================================================================== > *** gcc/tree-vect-patterns.c 2018-05-24 13:50:21.656138942 +0100 > --- gcc/tree-vect-patterns.c 2018-05-24 13:50:21.819134113 +0100 > *************** vect_recog_divmod_pattern (vec<gimple *> > *** 2674,2688 **** > if (vectype == NULL_TREE) > return NULL; > ! /* If the target can handle vectorized division or modulo natively, > ! don't attempt to optimize this. */ > ! optab = optab_for_tree_code (rhs_code, vectype, optab_default); > ! if (optab != unknown_optab) > { > ! machine_mode vec_mode = TYPE_MODE (vectype); > ! int icode = (int) optab_handler (optab, vec_mode); > ! if (icode != CODE_FOR_nothing) > ! return NULL; > } > prec = TYPE_PRECISION (itype); > --- 2674,2692 ---- > if (vectype == NULL_TREE) > return NULL; > ! if (optimize_bb_for_size_p (gimple_bb (last_stmt))) > { > ! /* If the target can handle vectorized division or modulo natively, > ! don't attempt to optimize this, since native division is likely > ! to give smaller code. */ > ! optab = optab_for_tree_code (rhs_code, vectype, optab_default); > ! if (optab != unknown_optab) > ! { > ! machine_mode vec_mode = TYPE_MODE (vectype); > ! int icode = (int) optab_handler (optab, vec_mode); > ! if (icode != CODE_FOR_nothing) > ! return NULL; > ! } > } > prec = TYPE_PRECISION (itype); > Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c > =================================================================== > *** /dev/null 2018-04-20 16:19:46.369131350 +0100 > --- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24 13:50:21.818134143 +0100 > *************** > *** 0 **** > --- 1,19 ---- > + /* { dg-do compile } */ > + /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */ > + > + int x[8]; > + > + void > + f (void) > + { > + x[0] /= 2; > + x[1] /= 3; > + x[2] /= 4; > + x[3] /= 5; > + x[4] /= 6; > + x[5] /= 7; > + x[6] /= 8; > + x[7] /= 9; > + } > + > + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */
Richard Sandiford <richard.sandiford@linaro.org> writes: > vect_recog_divmod_pattern currently bails out if the target has > native support for integer division, but I think in practice > it's always going to be better to open-code it anyway, just as > we usually open-code scalar divisions by constants. > > I think the only currently affected target is MIPS MSA, where for: > > void > foo (int *x) > { > for (int i = 0; i < 100; ++i) > x[i] /= 2; > } > > we previously preferred to use division for powers of 2: > > .set noreorder > bnz.w $w1,1f > div_s.w $w0,$w0,$w1 > break 7 > .set reorder > 1: > > (or just the div_s.w for -mno-check-zero-division), but after the patch > we open-code them using shifts: > > clt_s.w $w1,$w0,$w2 > subv.w $w0,$w0,$w1 > srai.w $w0,$w0,1 > > I assume that's better. Matthew, is that right? Sorry for extreme tardiness. Yes, the alternate sequence has a max latency of 6. Although I don't have the range of latencies to hand for the FDIV, as far as I remember 6 cycles is better than the fastest FDIV case at least for i6400/i6500. Matthew
Index: gcc/tree-vect-patterns.c =================================================================== --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100 +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100 @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *> enum tree_code rhs_code; stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); vec_info *vinfo = stmt_vinfo->vinfo; - optab optab; tree q; int dummy_int, prec; stmt_vec_info def_stmt_vinfo; @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *> if (vectype == NULL_TREE) return NULL; - /* If the target can handle vectorized division or modulo natively, - don't attempt to optimize this. */ - optab = optab_for_tree_code (rhs_code, vectype, optab_default); - if (optab != unknown_optab) - { - machine_mode vec_mode = TYPE_MODE (vectype); - int icode = (int) optab_handler (optab, vec_mode); - if (icode != CODE_FOR_nothing) - return NULL; - } - prec = TYPE_PRECISION (itype); if (integer_pow2p (oprnd1)) { Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24 09:18:10.444466986 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */ + +int x[8]; + +void +f (void) +{ + x[0] /= 2; + x[1] /= 3; + x[2] /= 4; + x[3] /= 5; + x[4] /= 6; + x[5] /= 7; + x[6] /= 8; + x[7] /= 9; +} + +/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */