Message ID | CABXYE2VQo+55AACbN_H-DJXVZwu5TAbTYP94Op8Voh62cZ5_zw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On 03/01/2017 03:06 PM, Jim Wilson wrote: > This is a proposed patch for the bug 79794 which I just submitted. > This isn't a regression, so this can wait for after the gcc 7 branch > if necessary. > > The problem here is that a reg+offset MEM target is passed to > extract_bit_field with a vector register source. On aarch64, we have > an instruction for this, but it accepts a reg address only, so the > address gets loaded into a reg inside extract_bit_field. We then > return to expand_expr which does > ! rtx_equal_p (temp, target) > which fails because of the address mode change, so we end up copying > target into a reg and then back to itself. > > expand_expr has a solution for this problem. There is an alt_rtl > variable that can be set when temp is logically the same as target. > This variable is currently not passed into extract_bit_field. This > patch does that. > > There is an additional complication that the actual address load into > a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable > to pass alt_reg into that. However, I can grab a bit from the > expand_operand structure to indicate when an operand is the target, > and then clear it if target is replaced with a reg. > > The resulting patch works, but ends up a bit more invasive than I > hoped. The patch has passed a bootstrap and make check test on x86_64 > and aarch64. > > Jim > > > alt-rtl.patch > > > Proposed patch for RTL expand bug affecting aarch64 vector code. > > PR middle-end/79794 > * expmed.c (extract_bit_field_1): Add alt_rtl argument. Before > maybe_expand_insn call, set ops[0].target. If still set after call, > set alt_rtl. Add extra arg to recursive calls. > (extract_bit_field): Add alt_rtl argument. Pass to > extract_bit_field. > * expmed.h (extract_bit_field): Fix prototype. > * expr.c (emit_group_load_1, copy_blkmode_from_reg) > (copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL > to extract_bit_field_calls. > (expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0. > Pass alt_rtl to extract_bit_field calls. > * calls.c (store_unaligned_arguments_into_psuedos) > load_register_parameters): Pass extra NULL to extract_bit_field calls. > * optabs.c (maybe_legitimize_operand): Clear op->target when call > gen_reg_rtx. > * optabs.h (struct expand_operand): Add target bitfield. The only part I found intrusive was the op->target stuff, but it wasn't terrible. The additional argument creates visual clutter in the diffs as the callers get updated, but that's pretty easy to filter out. This seems fine to me. A testcase to add to the gcc.target testsuite would be useful, but I don't think it's strictly necessary. jeff
On Thu, May 4, 2017 at 7:24 PM, Jeff Law <law@redhat.com> wrote: > On 03/01/2017 03:06 PM, Jim Wilson wrote: > This seems fine to me. A testcase to add to the gcc.target testsuite would > be useful, but I don't think it's strictly necessary. Thanks for the review. It was 2 months since I posted it, so I retested it to make sure x86_64 and aarch64 bootstrap and make check still worked, then checked it in. That took me a while as I was busy with other stuff this week. The comment about a testcase is a good point; I will work on one to submit. Jim
On 05/04/2017 08:24 PM, Jeff Law wrote: > On 03/01/2017 03:06 PM, Jim Wilson wrote: >> This is a proposed patch for the bug 79794 which I just submitted. >> This isn't a regression, so this can wait for after the gcc 7 branch >> if necessary. >> >> The problem here is that a reg+offset MEM target is passed to >> extract_bit_field with a vector register source. On aarch64, we have >> an instruction for this, but it accepts a reg address only, so the >> address gets loaded into a reg inside extract_bit_field. We then >> return to expand_expr which does >> ! rtx_equal_p (temp, target) >> which fails because of the address mode change, so we end up copying >> target into a reg and then back to itself. >> >> expand_expr has a solution for this problem. There is an alt_rtl >> variable that can be set when temp is logically the same as target. >> This variable is currently not passed into extract_bit_field. This >> patch does that. >> >> There is an additional complication that the actual address load into >> a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable >> to pass alt_reg into that. However, I can grab a bit from the >> expand_operand structure to indicate when an operand is the target, >> and then clear it if target is replaced with a reg. >> >> The resulting patch works, but ends up a bit more invasive than I >> hoped. The patch has passed a bootstrap and make check test on x86_64 >> and aarch64. >> >> Jim >> >> >> alt-rtl.patch >> >> >> Proposed patch for RTL expand bug affecting aarch64 vector code. >> >> PR middle-end/79794 >> * expmed.c (extract_bit_field_1): Add alt_rtl argument. Before >> maybe_expand_insn call, set ops[0].target. If still set after call, >> set alt_rtl. Add extra arg to recursive calls. >> (extract_bit_field): Add alt_rtl argument. Pass to >> extract_bit_field. >> * expmed.h (extract_bit_field): Fix prototype. >> * expr.c (emit_group_load_1, copy_blkmode_from_reg) >> (copy_blkmode_to_reg, read_complex_part, store_field): Pass extra >> NULL >> to extract_bit_field_calls. >> (expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0. >> Pass alt_rtl to extract_bit_field calls. >> * calls.c (store_unaligned_arguments_into_psuedos) >> load_register_parameters): Pass extra NULL to extract_bit_field >> calls. >> * optabs.c (maybe_legitimize_operand): Clear op->target when call >> gen_reg_rtx. >> * optabs.h (struct expand_operand): Add target bitfield. > The only part I found intrusive was the op->target stuff, but it wasn't > terrible. > > The additional argument creates visual clutter in the diffs as the > callers get updated, but that's pretty easy to filter out. Explicitly passing the additional argument at all the call sites can be mitigated by giving the new alt_rtl argument a default value of NULL in the declarations of the extract_bit_field functions. Martin
On Fri, May 12, 2017 at 7:01 PM, Martin Sebor <msebor@gmail.com> wrote: > Explicitly passing the additional argument at all the call sites > can be mitigated by giving the new alt_rtl argument a default > value of NULL in the declarations of the extract_bit_field functions. I keep forgetting about C++ features, as I'm not used to writing C++. I already checked in the patch, and I don't see the benefit of changing it again. I will try to remember this for next time. Jim
The extra argument to extract_bit_field breaks builds for tilegx-linux-gnu and tilepro-linux-gnu (as shown by my glibc bot); there are calls in those back ends which haven't been updated. -- Joseph S. Myers joseph@codesourcery.com
On 05/15/2017 05:46 AM, Joseph Myers wrote: > The extra argument to extract_bit_field breaks builds for tilegx-linux-gnu > and tilepro-linux-gnu (as shown by my glibc bot); there are calls in those > back ends which haven't been updated. > I've got patches for the tile backends that I'll push today. jeff
I've got a testcase to add for this patch. Sorry about the delay, I took some time off to deal with a medical problem. This was tested with and without the extract_bit_field patch. The testcase fails without the patch and works with the patch. Jim gcc/testsuite/ PR middle-end/79794 * gcc.target/aarch64/pr79794.c: New. Index: gcc/testsuite/gcc.target/aarch64/pr79794.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/pr79794.c (nonexistent) +++ gcc/testsuite/gcc.target/aarch64/pr79794.c (working copy) @@ -0,0 +1,25 @@ +/* PR middle-end/79794 */ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { scan-assembler-not "umov" } } */ + +struct node_struct +{ + float _Complex gap; + unsigned long long state; +}; + +struct reg_struct +{ + int size; + struct node_struct *node; +}; + +void +func(int target, struct reg_struct *reg) +{ + int i; + + for(i=0; i<reg->size; i++) + reg->node[i].state ^= ((unsigned long long) 1 << target); +}
On 06/08/2017 11:07 AM, Jim Wilson wrote: > I've got a testcase to add for this patch. Sorry about the delay, I > took some time off to deal with a medical problem. > > This was tested with and without the extract_bit_field patch. The > testcase fails without the patch and works with the patch. Thanks. Please go ahead and install this. Sorry for the delay on my side as well. stack-clash has had me buried since late May and I'm just starting to dig out a bit. jeff
Proposed patch for RTL expand bug affecting aarch64 vector code. PR middle-end/79794 * expmed.c (extract_bit_field_1): Add alt_rtl argument. Before maybe_expand_insn call, set ops[0].target. If still set after call, set alt_rtl. Add extra arg to recursive calls. (extract_bit_field): Add alt_rtl argument. Pass to extract_bit_field. * expmed.h (extract_bit_field): Fix prototype. * expr.c (emit_group_load_1, copy_blkmode_from_reg) (copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL to extract_bit_field_calls. (expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0. Pass alt_rtl to extract_bit_field calls. * calls.c (store_unaligned_arguments_into_psuedos) load_register_parameters): Pass extra NULL to extract_bit_field calls. * optabs.c (maybe_legitimize_operand): Clear op->target when call gen_reg_rtx. * optabs.h (struct expand_operand): Add target bitfield. Index: gcc/calls.c =================================================================== --- gcc/calls.c (revision 245764) +++ gcc/calls.c (working copy) @@ -1161,7 +1161,7 @@ store_unaligned_arguments_into_pseudos (struct arg args[i].aligned_regs[j] = reg; word = extract_bit_field (word, bitsize, 0, 1, NULL_RTX, - word_mode, word_mode, false); + word_mode, word_mode, false, NULL); /* There is no need to restrict this code to loading items in TYPE_ALIGN sized hunks. The bitfield instructions can @@ -2554,7 +2554,8 @@ load_register_parameters (struct arg_data *args, i unsigned int bitoff = (nregs - 1) * BITS_PER_WORD; unsigned int bitsize = size * BITS_PER_UNIT - bitoff; rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest, - word_mode, word_mode, false); + word_mode, word_mode, false, + NULL); if (BYTES_BIG_ENDIAN) x = expand_shift (LSHIFT_EXPR, word_mode, x, BITS_PER_WORD - bitsize, dest, 1); Index: gcc/expmed.c =================================================================== --- gcc/expmed.c (revision 245764) +++ gcc/expmed.c (working copy) @@ -1528,7 +1528,7 @@ static rtx extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize, unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target, machine_mode mode, machine_mode tmode, - bool reverse, bool fallback_p) + bool reverse, bool fallback_p, rtx *alt_rtl) { rtx op0 = str_rtx; machine_mode int_mode; @@ -1604,10 +1604,13 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode); create_output_operand (&ops[0], target, innermode); + ops[0].target = 1; create_input_operand (&ops[1], op0, outermode); create_integer_operand (&ops[2], pos); if (maybe_expand_insn (icode, 3, ops)) { + if (alt_rtl && ops[0].target) + *alt_rtl = target; target = ops[0].value; if (GET_MODE (target) != mode) return gen_lowpart (tmode, target); @@ -1729,7 +1732,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI = extract_bit_field_1 (op0, MIN (BITS_PER_WORD, bitsize - i * BITS_PER_WORD), bitnum + bit_offset, 1, target_part, - mode, word_mode, reverse, fallback_p); + mode, word_mode, reverse, fallback_p, NULL); gcc_assert (target_part); if (!result_part) @@ -1832,7 +1835,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI xop0 = copy_to_reg (xop0); rtx result = extract_bit_field_1 (xop0, bitsize, bitpos, unsignedp, target, - mode, tmode, reverse, false); + mode, tmode, reverse, false, NULL); if (result) return result; delete_insns_since (last); @@ -1888,7 +1891,8 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI rtx extract_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize, unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target, - machine_mode mode, machine_mode tmode, bool reverse) + machine_mode mode, machine_mode tmode, bool reverse, + rtx *alt_rtl) { machine_mode mode1; @@ -1923,7 +1927,7 @@ extract_bit_field (rtx str_rtx, unsigned HOST_WIDE } return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp, - target, mode, tmode, reverse, true); + target, mode, tmode, reverse, true, alt_rtl); } /* Use shifts and boolean operations to extract a field of BITSIZE bits Index: gcc/expmed.h =================================================================== --- gcc/expmed.h (revision 245764) +++ gcc/expmed.h (working copy) @@ -725,7 +725,7 @@ extern void store_bit_field (rtx, unsigned HOST_WI machine_mode, rtx, bool); extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, int, rtx, - machine_mode, machine_mode, bool); + machine_mode, machine_mode, bool, rtx *); extern rtx extract_low_bits (machine_mode, machine_mode, rtx); extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int); extern rtx expand_mult_highpart_adjust (machine_mode, rtx, rtx, rtx, rtx, int); Index: gcc/expr.c =================================================================== --- gcc/expr.c (revision 245764) +++ gcc/expr.c (working copy) @@ -2192,7 +2192,8 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode))) tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT, subpos * BITS_PER_UNIT, - 1, NULL_RTX, mode, mode, false); + 1, NULL_RTX, mode, mode, false, + NULL); } else { @@ -2202,7 +2203,8 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr mem = assign_stack_temp (GET_MODE (src), slen); emit_move_insn (mem, src); tmps[i] = extract_bit_field (mem, bytelen * BITS_PER_UNIT, - 0, 1, NULL_RTX, mode, mode, false); + 0, 1, NULL_RTX, mode, mode, false, + NULL); } } /* FIXME: A SIMD parallel will eventually lead to a subreg of a @@ -2245,7 +2247,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr else tmps[i] = extract_bit_field (src, bytelen * BITS_PER_UNIT, bytepos * BITS_PER_UNIT, 1, NULL_RTX, - mode, mode, false); + mode, mode, false, NULL); if (shift) tmps[i] = expand_shift (LSHIFT_EXPR, mode, tmps[i], @@ -2697,7 +2699,7 @@ copy_blkmode_from_reg (rtx target, rtx srcreg, tre extract_bit_field (src, bitsize, xbitpos % BITS_PER_WORD, 1, NULL_RTX, copy_mode, copy_mode, - false), + false, NULL), false); } } @@ -2776,7 +2778,7 @@ copy_blkmode_to_reg (machine_mode mode, tree src) extract_bit_field (src_word, bitsize, bitpos % BITS_PER_WORD, 1, NULL_RTX, word_mode, word_mode, - false), + false, NULL), false); } @@ -3225,7 +3227,7 @@ read_complex_part (rtx cplx, bool imag_p) } return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, - true, NULL_RTX, imode, imode, false); + true, NULL_RTX, imode, imode, false, NULL); } /* A subroutine of emit_move_insn_1. Yet another lowpart generator. @@ -6911,7 +6913,7 @@ store_field (rtx target, HOST_WIDE_INT bitsize, HO { machine_mode temp_mode = smallest_mode_for_size (bitsize, MODE_INT); temp = extract_bit_field (temp, bitsize, 0, 1, NULL_RTX, temp_mode, - temp_mode, false); + temp_mode, false, NULL); } /* Store the value in the bitfield. */ @@ -9760,7 +9762,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_ case GIMPLE_SINGLE_RHS: { r = expand_expr_real (gimple_assign_rhs1 (g), target, - tmode, modifier, NULL, inner_reference_p); + tmode, modifier, alt_rtl, + inner_reference_p); break; } default: @@ -10190,7 +10193,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_ 0, TYPE_UNSIGNED (TREE_TYPE (exp)), (modifier == EXPAND_STACK_PARM ? NULL_RTX : target), - mode, mode, false); + mode, mode, false, alt_rtl); } if (reverse && modifier != EXPAND_MEMORY @@ -10687,7 +10690,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_ op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp, (modifier == EXPAND_STACK_PARM ? NULL_RTX : target), - ext_mode, ext_mode, reversep); + ext_mode, ext_mode, reversep, alt_rtl); /* If the result has a record type and the mode of OP0 is an integral mode then, if BITSIZE is narrower than this mode @@ -10908,7 +10911,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_ else if (reduce_bit_field) return extract_bit_field (op0, TYPE_PRECISION (type), 0, TYPE_UNSIGNED (type), NULL_RTX, - mode, mode, false); + mode, mode, false, NULL); /* As a last resort, spill op0 to memory, and reload it in a different mode. */ else if (!MEM_P (op0)) Index: gcc/optabs.c =================================================================== --- gcc/optabs.c (revision 245764) +++ gcc/optabs.c (working copy) @@ -6942,6 +6942,7 @@ maybe_legitimize_operand (enum insn_code icode, un return true; op->value = gen_reg_rtx (mode); + op->target = 0; break; case EXPAND_INPUT: Index: gcc/optabs.h =================================================================== --- gcc/optabs.h (revision 245764) +++ gcc/optabs.h (working copy) @@ -48,8 +48,11 @@ struct expand_operand { rather than signed. Only meaningful for certain types. */ unsigned int unsigned_p : 1; + /* Is the target operand. */ + unsigned int target : 1; + /* Unused; available for future use. */ - unsigned int unused : 7; + unsigned int unused : 6; /* The mode passed to the convert_*_operand function. It has a type-dependent meaning. */