From patchwork Wed Mar 30 08:57:52 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 836 Return-Path: Delivered-To: unknown Received: from imap.gmail.com (74.125.159.109) by localhost6.localdomain6 with IMAP4-SSL; 08 Jun 2011 14:46:18 -0000 Delivered-To: patches@linaro.org Received: by 10.42.161.68 with SMTP id s4cs28665icx; Wed, 30 Mar 2011 01:57:57 -0700 (PDT) Received: by 10.216.145.23 with SMTP id o23mr926637wej.14.1301475476847; Wed, 30 Mar 2011 01:57:56 -0700 (PDT) Received: from mail-wy0-f178.google.com (mail-wy0-f178.google.com [74.125.82.178]) by mx.google.com with ESMTPS id z1si10477285weq.97.2011.03.30.01.57.55 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 30 Mar 2011 01:57:55 -0700 (PDT) Received-SPF: neutral (google.com: 74.125.82.178 is neither permitted nor denied by best guess record for domain of richard.sandiford@linaro.org) client-ip=74.125.82.178; Authentication-Results: mx.google.com; spf=neutral (google.com: 74.125.82.178 is neither permitted nor denied by best guess record for domain of richard.sandiford@linaro.org) smtp.mail=richard.sandiford@linaro.org Received: by wyb33 with SMTP id 33so1063032wyb.37 for ; Wed, 30 Mar 2011 01:57:55 -0700 (PDT) Received: by 10.227.168.138 with SMTP id u10mr915408wby.186.1301475475028; Wed, 30 Mar 2011 01:57:55 -0700 (PDT) Received: from richards-thinkpad (gbibp9ph1--blueice2n1.emea.ibm.com [195.212.29.75]) by mx.google.com with ESMTPS id y29sm2893940wbd.4.2011.03.30.01.57.53 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 30 Mar 2011 01:57:54 -0700 (PDT) From: Richard Sandiford To: patches@linaro.org Mail-Followup-To: patches@linaro.org, richard.sandiford@linaro.org Subject: [Richard Sandiford] Some remodelling of the ARM vld and vst patterns Date: Wed, 30 Mar 2011 09:57:52 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux) MIME-Version: 1.0 Content-Disposition: inline The patterns for the Neon vld and vst intrinsics use the following sort of construct to refer to memory: (mem:FOO (match_operand:SI X "register_operand" "r")) This patch changes them to use: (match_operand:FOO' X "neon_struct_operand" "(=)Um") instead. This has some performance benefits: - It allows the loads to use post-increment addresses as well as bare registers. - If: /* FIXME: vld1 allows register post-modify. */ were fixed, it would allow register post-modify addresses too. - It allows alignment hints to be generated. It also more closely matches the form that future autovectorisation optabs would have. There are a couple of correctness fixes too: - The old v{ld,st}{3,4}q patterns generated two individual instructions, each post-incrementing the address. The problem is the expander passed the original register input operand to both patterns, instead of passing a temporary register. We could therefore end up post-incrementing a live register variable. E.g. for: void __attribute__((noinline)) foo (uint32_t *a) { uint32x4x3_t x; x = vld3q_u32 (a); x.val[0] = vaddq_u32 (x.val[0], x.val[1]); vst3q_u32 (a, x); } the vld3q_u32 moves "a" forward 12 elements, so the vst3q_u32 stores to the wrong address. After the above change, we don't need to encode the post-increment directly. We can just leave the auto-inc-dec pass to figure out a good sequence (which it does seem to do in practice). [tested by neon-vld3-1.c] - At the moment, we use this mode attribute to set the modes of three-element loads and stores: ;; Similar, for three elements. ;; ??? Should we define extra modes so that sizes of all three-element ;; accesses can be accurately represented? (define_mode_attr V_three_elem [(V8QI "SI") (V16QI "SI") (V4HI "V4HI") (V8HI "V4HI") (V2SI "V4SI") (V4SI "V4SI") (V2SF "V4SF") (V4SF "V4SF") (DI "EI") (V2DI "EI")]) The ??? is saying that the V8QI-derived MEM is really a 3-byte access, not a 4-byte (SI) access, and so on. The comment makes the mode sound like a representational niceity, but really, there's no such thing as a "conservatively wrong" memory size here. If a store's mode is too small, dependent loads could be deleted as dead. If it's too big, unrelated live loads could be deleted as dead. The approach taken in the patch means that we can use BLKmode here, and rely on MEM_SIZE to specify the size of the access. One problem with using BLKmode is that it stops pre- and post-modifications being used. Seeing as that wasn't possible before the patch either, I'd like to leave it as future work. [tested by neon-vst3-1.c] At the moment, it isn't safe to use the natural alias set, because arm_neon.h uses the same built-in function for both signed and unsigned operations. If this patch is OK, we could in principle go further and add separate signed and unsigned built-in functions. It all depends on whether uses of the API implemented by arm_neon.h are expected to be alias-safe or not. The patch applies on top of: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01634.html (unreviewed). Tested on arm-linux-gnueabi. OK to install? Richard gcc/ * config/arm/arm.c (arm_print_operand): Use MEM_SIZE to get the size of a '%A' memory reference. (T_DREG, T_QREG): New neon_builtin_type_bits. (arm_init_neon_builtins): Assert that the load and store operands are neon_struct_operands. (locate_neon_builtin_icode): Provide the neon_builtin_type_bits. (NEON_ARG_MEMORY): New builtin_arg. (neon_dereference_pointer): New function. (arm_expand_neon_args): Add a neon_builtin_type_bits argument. Handle NEON_ARG_MEMORY. (arm_expand_neon_builtin): Update after above interface changes. Use NEON_ARG_MEMORY for loads and stores. * config/arm/predicates.md (neon_struct_operand): New predicate. * config/arm/iterators.md (V_two_elem): Tweak formatting. (V_three_elem): Use BLKmode for accesses that have no associated mode. (V_four_elem): Tweak formatting. * config/arm/neon.md (neon_vld1, neon_vld1_dup) (neon_vst1_lane, neon_vst1, neon_vld2) (neon_vld2_lane, neon_vld2_dup, neon_vst2) (neon_vst2_lane, neon_vld3, neon_vld3_lane) (neon_vld3_dup, neon_vst3, neon_vst3_lane) (neon_vld4, neon_vld4_lane, neon_vld4_dup) (neon_vst4): Replace pointer operand with a memory operand. Use %A in the output template. (neon_vld3qa, neon_vld3qb, neon_vst3qa) (neon_vst3qb, neon_vld4qa, neon_vld4qb) (neon_vst4qa, neon_vst4qb): Likewise, but halve the width of the memory access. Remove post-increment. * config/arm/neon-testgen.ml: Allow addresses to have an alignment. gcc/testsuite/ * gcc.target/arm/neon-vld3-1.c: New test. * gcc.target/arm/neon-vst3-1.c: New test. * gcc.target/arm/neon/v*.c: Regenerate. Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2011-03-29 08:52:13.000000000 +0100 +++ gcc/config/arm/arm.c 2011-03-29 09:38:42.000000000 +0100 @@ -16613,7 +16613,7 @@ arm_print_operand (FILE *stream, rtx x, { rtx addr; bool postinc = FALSE; - unsigned align, modesize, align_bits; + unsigned align, memsize, align_bits; gcc_assert (GET_CODE (x) == MEM); addr = XEXP (x, 0); @@ -16628,12 +16628,12 @@ arm_print_operand (FILE *stream, rtx x, instruction (for some alignments) as an aid to the memory subsystem of the target. */ align = MEM_ALIGN (x) >> 3; - modesize = GET_MODE_SIZE (GET_MODE (x)); + memsize = INTVAL (MEM_SIZE (x)); /* Only certain alignment specifiers are supported by the hardware. */ - if (modesize == 16 && (align % 32) == 0) + if (memsize == 16 && (align % 32) == 0) align_bits = 256; - else if ((modesize == 8 || modesize == 16) && (align % 16) == 0) + else if ((memsize == 8 || memsize == 16) && (align % 16) == 0) align_bits = 128; else if ((align % 8) == 0) align_bits = 64; @@ -18293,12 +18293,14 @@ enum neon_builtin_type_bits { T_V2SI = 0x0004, T_V2SF = 0x0008, T_DI = 0x0010, + T_DREG = 0x001F, T_V16QI = 0x0020, T_V8HI = 0x0040, T_V4SI = 0x0080, T_V4SF = 0x0100, T_V2DI = 0x0200, T_TI = 0x0400, + T_QREG = 0x07E0, T_EI = 0x0800, T_OI = 0x1000 }; @@ -18944,10 +18946,9 @@ arm_init_neon_builtins (void) if (is_load && k == 1) { /* Neon load patterns always have the memory operand - (a SImode pointer) in the operand 1 position. We - want a const pointer to the element type in that - position. */ - gcc_assert (insn_data[icode].operand[k].mode == SImode); + in the operand 1 position. */ + gcc_assert (insn_data[icode].operand[k].predicate + == neon_struct_operand); switch (1 << j) { @@ -18982,10 +18983,9 @@ arm_init_neon_builtins (void) else if (is_store && k == 0) { /* Similarly, Neon store patterns use operand 0 as - the memory location to store to (a SImode pointer). - Use a pointer to the element type of the store in - that position. */ - gcc_assert (insn_data[icode].operand[k].mode == SImode); + the memory location to store to. */ + gcc_assert (insn_data[icode].operand[k].predicate + == neon_struct_operand); switch (1 << j) { @@ -19305,12 +19305,13 @@ neon_builtin_compare (const void *a, con } static enum insn_code -locate_neon_builtin_icode (int fcode, neon_itype *itype) +locate_neon_builtin_icode (int fcode, neon_itype *itype, + enum neon_builtin_type_bits *type_bit) { neon_builtin_datum key = { NULL, (neon_itype) 0, 0, { CODE_FOR_nothing }, 0, 0 }; neon_builtin_datum *found; - int idx; + int idx, type, ntypes; key.base_fcode = fcode; found = (neon_builtin_datum *) @@ -19323,20 +19324,84 @@ locate_neon_builtin_icode (int fcode, ne if (itype) *itype = found->itype; + if (type_bit) + { + ntypes = 0; + for (type = 0; type < T_MAX; type++) + if (found->bits & (1 << type)) + { + if (ntypes == idx) + break; + ntypes++; + } + gcc_assert (type < T_MAX); + *type_bit = (enum neon_builtin_type_bits) (1 << type); + } return found->codes[idx]; } typedef enum { NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + NEON_ARG_MEMORY, NEON_ARG_STOP } builtin_arg; #define NEON_MAX_BUILTIN_ARGS 5 +/* EXP is a pointer argument to a Neon load or store intrinsic. Derive + and return an expression for the accessed memory. + + The intrinsic function operates on a block of registers that has + mode REG_MODE. This block contains vectors of type TYPE_BIT. + The function references the memory at EXP in mode MEM_MODE; + this mode may be BLKmode if no more suitable mode is available. */ + +static tree +neon_dereference_pointer (tree exp, enum machine_mode mem_mode, + enum machine_mode reg_mode, + enum neon_builtin_type_bits type_bit) +{ + HOST_WIDE_INT reg_size, vector_size, nvectors, nelems; + tree elem_type, upper_bound, array_type; + + /* Work out the size of the register block in bytes. */ + reg_size = GET_MODE_SIZE (reg_mode); + + /* Work out the size of each vector in bytes. */ + gcc_assert (type_bit & (T_DREG | T_QREG)); + vector_size = (type_bit & T_QREG ? 16 : 8); + + /* Work out how many vectors there are. */ + gcc_assert (reg_size % vector_size == 0); + nvectors = reg_size / vector_size; + + /* Work out how many elements are being loaded or stored. + MEM_MODE == REG_MODE implies a one-to-one mapping between register + and memory elements; anything else implies a lane load or store. */ + if (mem_mode == reg_mode) + nelems = vector_size * nvectors; + else + nelems = nvectors; + + /* Work out the type of each element. */ + gcc_assert (POINTER_TYPE_P (TREE_TYPE (exp))); + elem_type = TREE_TYPE (TREE_TYPE (exp)); + + /* Create a type that describes the full access. */ + upper_bound = build_int_cst (size_type_node, nelems - 1); + array_type = build_array_type (elem_type, build_index_type (upper_bound)); + + /* Dereference EXP using that type. */ + exp = convert (build_pointer_type (array_type), exp); + return fold_build2 (MEM_REF, array_type, exp, + build_int_cst (TREE_TYPE (exp), 0)); +} + /* Expand a Neon builtin. */ static rtx arm_expand_neon_args (rtx target, int icode, int have_retval, + enum neon_builtin_type_bits type_bit, tree exp, ...) { va_list ap; @@ -19345,7 +19410,9 @@ arm_expand_neon_args (rtx target, int ic rtx op[NEON_MAX_BUILTIN_ARGS]; enum machine_mode tmode = insn_data[icode].operand[0].mode; enum machine_mode mode[NEON_MAX_BUILTIN_ARGS]; + enum machine_mode other_mode; int argc = 0; + int opno; if (have_retval && (!target @@ -19363,26 +19430,46 @@ arm_expand_neon_args (rtx target, int ic break; else { + opno = argc + have_retval; + mode[argc] = insn_data[icode].operand[opno].mode; arg[argc] = CALL_EXPR_ARG (exp, argc); + if (thisarg == NEON_ARG_MEMORY) + { + other_mode = insn_data[icode].operand[1 - opno].mode; + arg[argc] = neon_dereference_pointer (arg[argc], mode[argc], + other_mode, type_bit); + } op[argc] = expand_normal (arg[argc]); - mode[argc] = insn_data[icode].operand[argc + have_retval].mode; switch (thisarg) { case NEON_ARG_COPY_TO_REG: /*gcc_assert (GET_MODE (op[argc]) == mode[argc]);*/ - if (!(*insn_data[icode].operand[argc + have_retval].predicate) + if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) op[argc] = copy_to_mode_reg (mode[argc], op[argc]); break; case NEON_ARG_CONSTANT: /* FIXME: This error message is somewhat unhelpful. */ - if (!(*insn_data[icode].operand[argc + have_retval].predicate) + if (!(*insn_data[icode].operand[opno].predicate) (op[argc], mode[argc])) error ("argument must be a constant"); break; + case NEON_ARG_MEMORY: + gcc_assert (MEM_P (op[argc])); + PUT_MODE (op[argc], mode[argc]); + /* ??? arm_neon.h uses the same built-in functions for signed + and unsigned accesses, casting where necessary. This isn't + alias safe. */ + set_mem_alias_set (op[argc], 0); + if (!(*insn_data[icode].operand[opno].predicate) + (op[argc], mode[argc])) + op[argc] = (replace_equiv_address + (op[argc], force_reg (Pmode, XEXP (op[argc], 0)))); + break; + case NEON_ARG_STOP: gcc_unreachable (); } @@ -19461,14 +19548,15 @@ arm_expand_neon_args (rtx target, int ic arm_expand_neon_builtin (int fcode, tree exp, rtx target) { neon_itype itype; - enum insn_code icode = locate_neon_builtin_icode (fcode, &itype); + enum neon_builtin_type_bits type_bit; + enum insn_code icode = locate_neon_builtin_icode (fcode, &itype, &type_bit); switch (itype) { case NEON_UNOP: case NEON_CONVERT: case NEON_DUPLANE: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_BINOP: @@ -19478,90 +19566,90 @@ arm_expand_neon_builtin (int fcode, tree case NEON_SCALARMULH: case NEON_SHIFTINSERT: case NEON_LOGICBINOP: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_TERNOP: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_GETLANE: case NEON_FIXCONV: case NEON_SHIFTIMM: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_CREATE: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_DUP: case NEON_SPLIT: case NEON_REINTERP: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_COMBINE: case NEON_VTBL: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_RESULTPAIR: - return arm_expand_neon_args (target, icode, 0, exp, + return arm_expand_neon_args (target, icode, 0, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_LANEMUL: case NEON_LANEMULL: case NEON_LANEMULH: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_LANEMAC: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_SHIFTACC: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_SCALARMAC: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_SELECT: case NEON_VTBX: - return arm_expand_neon_args (target, icode, 1, exp, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_LOAD1: case NEON_LOADSTRUCT: - return arm_expand_neon_args (target, icode, 1, exp, - NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + return arm_expand_neon_args (target, icode, 1, type_bit, exp, + NEON_ARG_MEMORY, NEON_ARG_STOP); case NEON_LOAD1LANE: case NEON_LOADSTRUCTLANE: - return arm_expand_neon_args (target, icode, 1, exp, - NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + return arm_expand_neon_args (target, icode, 1, type_bit, exp, + NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); case NEON_STORE1: case NEON_STORESTRUCT: - return arm_expand_neon_args (target, icode, 0, exp, - NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); + return arm_expand_neon_args (target, icode, 0, type_bit, exp, + NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_STOP); case NEON_STORE1LANE: case NEON_STORESTRUCTLANE: - return arm_expand_neon_args (target, icode, 0, exp, - NEON_ARG_COPY_TO_REG, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, + return arm_expand_neon_args (target, icode, 0, type_bit, exp, + NEON_ARG_MEMORY, NEON_ARG_COPY_TO_REG, NEON_ARG_CONSTANT, NEON_ARG_STOP); } Index: gcc/config/arm/predicates.md =================================================================== --- gcc/config/arm/predicates.md 2011-03-29 08:52:13.000000000 +0100 +++ gcc/config/arm/predicates.md 2011-03-29 08:52:16.000000000 +0100 @@ -683,3 +683,7 @@ (define_special_predicate "vect_par_cons } return true; }) + +(define_special_predicate "neon_struct_operand" + (and (match_code "mem") + (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2)"))) Index: gcc/config/arm/iterators.md =================================================================== --- gcc/config/arm/iterators.md 2011-03-29 08:52:13.000000000 +0100 +++ gcc/config/arm/iterators.md 2011-03-29 09:40:14.000000000 +0100 @@ -194,24 +194,22 @@ (define_mode_attr V_ext [(V8QI "SI") (V1 ;; Mode of pair of elements for each vector mode, to define transfer ;; size for structure lane/dup loads and stores. -(define_mode_attr V_two_elem [(V8QI "HI") (V16QI "HI") - (V4HI "SI") (V8HI "SI") +(define_mode_attr V_two_elem [(V8QI "HI") (V16QI "HI") + (V4HI "SI") (V8HI "SI") (V2SI "V2SI") (V4SI "V2SI") (V2SF "V2SF") (V4SF "V2SF") (DI "V2DI") (V2DI "V2DI")]) ;; Similar, for three elements. -;; ??? Should we define extra modes so that sizes of all three-element -;; accesses can be accurately represented? -(define_mode_attr V_three_elem [(V8QI "SI") (V16QI "SI") - (V4HI "V4HI") (V8HI "V4HI") - (V2SI "V4SI") (V4SI "V4SI") - (V2SF "V4SF") (V4SF "V4SF") - (DI "EI") (V2DI "EI")]) +(define_mode_attr V_three_elem [(V8QI "BLK") (V16QI "BLK") + (V4HI "BLK") (V8HI "BLK") + (V2SI "BLK") (V4SI "BLK") + (V2SF "BLK") (V4SF "BLK") + (DI "EI") (V2DI "EI")]) ;; Similar, for four elements. (define_mode_attr V_four_elem [(V8QI "SI") (V16QI "SI") - (V4HI "V4HI") (V8HI "V4HI") + (V4HI "V4HI") (V8HI "V4HI") (V2SI "V4SI") (V4SI "V4SI") (V2SF "V4SF") (V4SF "V4SF") (DI "OI") (V2DI "OI")]) Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md 2011-03-29 08:52:13.000000000 +0100 +++ gcc/config/arm/neon.md 2011-03-29 09:46:11.000000000 +0100 @@ -4259,16 +4259,16 @@ (define_expand "neon_vreinterpretv2di" [(set (match_operand:VDQX 0 "s_register_operand" "=w") - (unspec:VDQX [(mem:VDQX (match_operand:SI 1 "s_register_operand" "r"))] + (unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand" "Um")] UNSPEC_VLD1))] "TARGET_NEON" - "vld1.\t%h0, [%1]" + "vld1.\t%h0, %A1" [(set_attr "neon_type" "neon_vld1_1_2_regs")] ) (define_insn "neon_vld1_lane" [(set (match_operand:VDX 0 "s_register_operand" "=w") - (unspec:VDX [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VDX [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:VDX 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i")] UNSPEC_VLD1_LANE))] @@ -4279,9 +4279,9 @@ (define_insn "neon_vld1_lane" if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) - return "vld1.\t%P0, [%1]"; + return "vld1.\t%P0, %A1"; else - return "vld1.\t{%P0[%c3]}, [%1]"; + return "vld1.\t{%P0[%c3]}, %A1"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_int 2)) @@ -4291,7 +4291,7 @@ (define_insn "neon_vld1_lane" (define_insn "neon_vld1_lane" [(set (match_operand:VQX 0 "s_register_operand" "=w") - (unspec:VQX [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:VQX [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:VQX 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i")] UNSPEC_VLD1_LANE))] @@ -4310,9 +4310,9 @@ (define_insn "neon_vld1_lane" } operands[0] = gen_rtx_REG (mode, regno); if (max == 2) - return "vld1.\t%P0, [%1]"; + return "vld1.\t%P0, %A1"; else - return "vld1.\t{%P0[%c3]}, [%1]"; + return "vld1.\t{%P0[%c3]}, %A1"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_int 2)) @@ -4322,14 +4322,14 @@ (define_insn "neon_vld1_lane" (define_insn "neon_vld1_dup" [(set (match_operand:VDX 0 "s_register_operand" "=w") - (unspec:VDX [(mem: (match_operand:SI 1 "s_register_operand" "r"))] + (unspec:VDX [(match_operand: 1 "neon_struct_operand" "Um")] UNSPEC_VLD1_DUP))] "TARGET_NEON" { if (GET_MODE_NUNITS (mode) > 1) - return "vld1.\t{%P0[]}, [%1]"; + return "vld1.\t{%P0[]}, %A1"; else - return "vld1.\t%h0, [%1]"; + return "vld1.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (gt (const_string "") (const_string "1")) @@ -4339,14 +4339,14 @@ (define_insn "neon_vld1_dup" (define_insn "neon_vld1_dup" [(set (match_operand:VQX 0 "s_register_operand" "=w") - (unspec:VQX [(mem: (match_operand:SI 1 "s_register_operand" "r"))] + (unspec:VQX [(match_operand: 1 "neon_struct_operand" "Um")] UNSPEC_VLD1_DUP))] "TARGET_NEON" { if (GET_MODE_NUNITS (mode) > 2) - return "vld1.\t{%e0[], %f0[]}, [%1]"; + return "vld1.\t{%e0[], %f0[]}, %A1"; else - return "vld1.\t%h0, [%1]"; + return "vld1.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (gt (const_string "") (const_string "1")) @@ -4355,15 +4355,15 @@ (define_insn "neon_vld1_dup" ) (define_insn "neon_vst1" - [(set (mem:VDQX (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand:VDQX 0 "neon_struct_operand" "=Um") (unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w")] UNSPEC_VST1))] "TARGET_NEON" - "vst1.\t%h1, [%0]" + "vst1.\t%h1, %A0" [(set_attr "neon_type" "neon_vst1_1_2_regs_vst2_2_regs")]) (define_insn "neon_vst1_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (vec_select: (match_operand:VDX 1 "s_register_operand" "w") (parallel [(match_operand:SI 2 "neon_lane_number" "i")])))] @@ -4374,9 +4374,9 @@ (define_insn "neon_vst1_lane" if (lane < 0 || lane >= max) error ("lane out of range"); if (max == 1) - return "vst1.\t{%P1}, [%0]"; + return "vst1.\t{%P1}, %A0"; else - return "vst1.\t{%P1[%c2]}, [%0]"; + return "vst1.\t{%P1[%c2]}, %A0"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_int 1)) @@ -4384,7 +4384,7 @@ (define_insn "neon_vst1_lane" (const_string "neon_vst1_vst2_lane")))]) (define_insn "neon_vst1_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (vec_select: (match_operand:VQX 1 "s_register_operand" "w") (parallel [(match_operand:SI 2 "neon_lane_number" "i")])))] @@ -4403,24 +4403,24 @@ (define_insn "neon_vst1_lane" } operands[1] = gen_rtx_REG (mode, regno); if (max == 2) - return "vst1.\t{%P1}, [%0]"; + return "vst1.\t{%P1}, %A0"; else - return "vst1.\t{%P1[%c2]}, [%0]"; + return "vst1.\t{%P1[%c2]}, %A0"; } [(set_attr "neon_type" "neon_vst1_vst2_lane")] ) (define_insn "neon_vld2" [(set (match_operand:TI 0 "s_register_operand" "=w") - (unspec:TI [(mem:TI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:TI [(match_operand:TI 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] "TARGET_NEON" { if ( == 64) - return "vld1.64\t%h0, [%1]"; + return "vld1.64\t%h0, %A1"; else - return "vld2.\t%h0, [%1]"; + return "vld2.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -4430,16 +4430,16 @@ (define_insn "neon_vld2" (define_insn "neon_vld2" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(mem:OI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2))] "TARGET_NEON" - "vld2.\t%h0, [%1]" + "vld2.\t%h0, %A1" [(set_attr "neon_type" "neon_vld2_2_regs_vld1_vld2_all_lanes")]) (define_insn "neon_vld2_lane" [(set (match_operand:TI 0 "s_register_operand" "=w") - (unspec:TI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:TI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:TI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4456,7 +4456,7 @@ (define_insn "neon_vld2_lane" ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = operands[1]; ops[3] = operands[3]; - output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, [%2]", ops); + output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); return ""; } [(set_attr "neon_type" "neon_vld1_vld2_lane")] @@ -4464,7 +4464,7 @@ (define_insn "neon_vld2_lane" (define_insn "neon_vld2_lane" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:OI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:OI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4486,7 +4486,7 @@ (define_insn "neon_vld2_lane" ops[1] = gen_rtx_REG (DImode, regno + 4); ops[2] = operands[1]; ops[3] = GEN_INT (lane); - output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, [%2]", ops); + output_asm_insn ("vld2.\t{%P0[%c3], %P1[%c3]}, %A2", ops); return ""; } [(set_attr "neon_type" "neon_vld1_vld2_lane")] @@ -4494,15 +4494,15 @@ (define_insn "neon_vld2_lane" (define_insn "neon_vld2_dup" [(set (match_operand:TI 0 "s_register_operand" "=w") - (unspec:TI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:TI [(match_operand: 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD2_DUP))] "TARGET_NEON" { if (GET_MODE_NUNITS (mode) > 1) - return "vld2.\t{%e0[], %f0[]}, [%1]"; + return "vld2.\t{%e0[], %f0[]}, %A1"; else - return "vld1.\t%h0, [%1]"; + return "vld1.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (gt (const_string "") (const_string "1")) @@ -4511,16 +4511,16 @@ (define_insn "neon_vld2_dup" ) (define_insn "neon_vst2" - [(set (mem:TI (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand:TI 0 "neon_struct_operand" "=Um") (unspec:TI [(match_operand:TI 1 "s_register_operand" "w") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST2))] "TARGET_NEON" { if ( == 64) - return "vst1.64\t%h1, [%0]"; + return "vst1.64\t%h1, %A0"; else - return "vst2.\t%h1, [%0]"; + return "vst2.\t%h1, %A0"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -4529,17 +4529,17 @@ (define_insn "neon_vst2" ) (define_insn "neon_vst2" - [(set (mem:OI (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand:OI 0 "neon_struct_operand" "=Um") (unspec:OI [(match_operand:OI 1 "s_register_operand" "w") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST2))] "TARGET_NEON" - "vst2.\t%h1, [%0]" + "vst2.\t%h1, %A0" [(set_attr "neon_type" "neon_vst1_1_2_regs_vst2_2_regs")] ) (define_insn "neon_vst2_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:TI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -4557,14 +4557,14 @@ (define_insn "neon_vst2_lane" ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 2); ops[3] = operands[2]; - output_asm_insn ("vst2.\t{%P1[%c3], %P2[%c3]}, [%0]", ops); + output_asm_insn ("vst2.\t{%P1[%c3], %P2[%c3]}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst1_vst2_lane")] ) (define_insn "neon_vst2_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:OI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -4587,7 +4587,7 @@ (define_insn "neon_vst2_lane" ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = GEN_INT (lane); - output_asm_insn ("vst2.\t{%P1[%c3], %P2[%c3]}, [%0]", ops); + output_asm_insn ("vst2.\t{%P1[%c3], %P2[%c3]}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst1_vst2_lane")] @@ -4595,15 +4595,15 @@ (define_insn "neon_vst2_lane" (define_insn "neon_vld3" [(set (match_operand:EI 0 "s_register_operand" "=w") - (unspec:EI [(mem:EI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:EI [(match_operand:EI 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3))] "TARGET_NEON" { if ( == 64) - return "vld1.64\t%h0, [%1]"; + return "vld1.64\t%h0, %A1"; else - return "vld3.\t%h0, [%1]"; + return "vld3.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -4612,25 +4612,25 @@ (define_insn "neon_vld3" ) (define_expand "neon_vld3" - [(match_operand:CI 0 "s_register_operand" "=w") - (match_operand:SI 1 "s_register_operand" "+r") + [(match_operand:CI 0 "s_register_operand") + (match_operand:CI 1 "neon_struct_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { - emit_insn (gen_neon_vld3qa (operands[0], operands[1], operands[1])); - emit_insn (gen_neon_vld3qb (operands[0], operands[0], - operands[1], operands[1])); + rtx mem; + + mem = adjust_address (operands[1], EImode, 0); + emit_insn (gen_neon_vld3qa (operands[0], mem)); + mem = adjust_address (mem, EImode, GET_MODE_SIZE (EImode)); + emit_insn (gen_neon_vld3qb (operands[0], mem, operands[0])); DONE; }) (define_insn "neon_vld3qa" [(set (match_operand:CI 0 "s_register_operand" "=w") - (unspec:CI [(mem:CI (match_operand:SI 2 "s_register_operand" "1")) + (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VLD3A)) - (set (match_operand:SI 1 "s_register_operand" "=r") - (plus:SI (match_dup 2) - (const_int 24)))] + UNSPEC_VLD3A))] "TARGET_NEON" { int regno = REGNO (operands[0]); @@ -4639,7 +4639,7 @@ (define_insn "neon_vld3qa" ops[1] = gen_rtx_REG (DImode, regno + 4); ops[2] = gen_rtx_REG (DImode, regno + 8); ops[3] = operands[1]; - output_asm_insn ("vld3.\t{%P0, %P1, %P2}, [%3]!", ops); + output_asm_insn ("vld3.\t{%P0, %P1, %P2}, %A3", ops); return ""; } [(set_attr "neon_type" "neon_vld3_vld4")] @@ -4647,13 +4647,10 @@ (define_insn "neon_vld3qa" (define_insn "neon_vld3qb" [(set (match_operand:CI 0 "s_register_operand" "=w") - (unspec:CI [(mem:CI (match_operand:SI 3 "s_register_operand" "2")) - (match_operand:CI 1 "s_register_operand" "0") + (unspec:CI [(match_operand:EI 1 "neon_struct_operand" "Um") + (match_operand:CI 2 "s_register_operand" "0") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VLD3B)) - (set (match_operand:SI 2 "s_register_operand" "=r") - (plus:SI (match_dup 3) - (const_int 24)))] + UNSPEC_VLD3B))] "TARGET_NEON" { int regno = REGNO (operands[0]); @@ -4661,8 +4658,8 @@ (define_insn "neon_vld3qb" ops[0] = gen_rtx_REG (DImode, regno + 2); ops[1] = gen_rtx_REG (DImode, regno + 6); ops[2] = gen_rtx_REG (DImode, regno + 10); - ops[3] = operands[2]; - output_asm_insn ("vld3.\t{%P0, %P1, %P2}, [%3]!", ops); + ops[3] = operands[1]; + output_asm_insn ("vld3.\t{%P0, %P1, %P2}, %A3", ops); return ""; } [(set_attr "neon_type" "neon_vld3_vld4")] @@ -4670,7 +4667,7 @@ (define_insn "neon_vld3qb" (define_insn "neon_vld3_lane" [(set (match_operand:EI 0 "s_register_operand" "=w") - (unspec:EI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:EI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:EI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4688,7 +4685,7 @@ (define_insn "neon_vld3_lane" ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = operands[1]; ops[4] = operands[3]; - output_asm_insn ("vld3.\t{%P0[%c4], %P1[%c4], %P2[%c4]}, [%3]", + output_asm_insn ("vld3.\t{%P0[%c4], %P1[%c4], %P2[%c4]}, %A3", ops); return ""; } @@ -4697,7 +4694,7 @@ (define_insn "neon_vld3_lane" (define_insn "neon_vld3_lane" [(set (match_operand:CI 0 "s_register_operand" "=w") - (unspec:CI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:CI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:CI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4720,7 +4717,7 @@ (define_insn "neon_vld3_lane" ops[2] = gen_rtx_REG (DImode, regno + 8); ops[3] = operands[1]; ops[4] = GEN_INT (lane); - output_asm_insn ("vld3.\t{%P0[%c4], %P1[%c4], %P2[%c4]}, [%3]", + output_asm_insn ("vld3.\t{%P0[%c4], %P1[%c4], %P2[%c4]}, %A3", ops); return ""; } @@ -4729,7 +4726,7 @@ (define_insn "neon_vld3_lane" (define_insn "neon_vld3_dup" [(set (match_operand:EI 0 "s_register_operand" "=w") - (unspec:EI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:EI [(match_operand: 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD3_DUP))] "TARGET_NEON" @@ -4742,11 +4739,11 @@ (define_insn "neon_vld3_dup" ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = operands[1]; - output_asm_insn ("vld3.\t{%P0[], %P1[], %P2[]}, [%3]", ops); + output_asm_insn ("vld3.\t{%P0[], %P1[], %P2[]}, %A3", ops); return ""; } else - return "vld1.\t%h0, [%1]"; + return "vld1.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (gt (const_string "") (const_string "1")) @@ -4754,16 +4751,16 @@ (define_insn "neon_vld3_dup" (const_string "neon_vld1_1_2_regs")))]) (define_insn "neon_vst3" - [(set (mem:EI (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand:EI 0 "neon_struct_operand" "=Um") (unspec:EI [(match_operand:EI 1 "s_register_operand" "w") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST3))] "TARGET_NEON" { if ( == 64) - return "vst1.64\t%h1, [%0]"; + return "vst1.64\t%h1, %A0"; else - return "vst3.\t%h1, [%0]"; + return "vst3.\t%h1, %A0"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -4771,62 +4768,60 @@ (define_insn "neon_vst3" (const_string "neon_vst2_4_regs_vst3_vst4")))]) (define_expand "neon_vst3" - [(match_operand:SI 0 "s_register_operand" "+r") - (match_operand:CI 1 "s_register_operand" "w") + [(match_operand:CI 0 "neon_struct_operand") + (match_operand:CI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { - emit_insn (gen_neon_vst3qa (operands[0], operands[0], operands[1])); - emit_insn (gen_neon_vst3qb (operands[0], operands[0], operands[1])); + rtx mem; + + mem = adjust_address (operands[0], EImode, 0); + emit_insn (gen_neon_vst3qa (mem, operands[1])); + mem = adjust_address (mem, EImode, GET_MODE_SIZE (EImode)); + emit_insn (gen_neon_vst3qb (mem, operands[1])); DONE; }) (define_insn "neon_vst3qa" - [(set (mem:EI (match_operand:SI 1 "s_register_operand" "0")) - (unspec:EI [(match_operand:CI 2 "s_register_operand" "w") + [(set (match_operand:EI 0 "neon_struct_operand" "=Um") + (unspec:EI [(match_operand:CI 1 "s_register_operand" "w") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VST3A)) - (set (match_operand:SI 0 "s_register_operand" "=r") - (plus:SI (match_dup 1) - (const_int 24)))] + UNSPEC_VST3A))] "TARGET_NEON" { - int regno = REGNO (operands[2]); + int regno = REGNO (operands[1]); rtx ops[4]; ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = gen_rtx_REG (DImode, regno + 8); - output_asm_insn ("vst3.\t{%P1, %P2, %P3}, [%0]!", ops); + output_asm_insn ("vst3.\t{%P1, %P2, %P3}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst2_4_regs_vst3_vst4")] ) (define_insn "neon_vst3qb" - [(set (mem:EI (match_operand:SI 1 "s_register_operand" "0")) - (unspec:EI [(match_operand:CI 2 "s_register_operand" "w") + [(set (match_operand:EI 0 "neon_struct_operand" "=Um") + (unspec:EI [(match_operand:CI 1 "s_register_operand" "w") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VST3B)) - (set (match_operand:SI 0 "s_register_operand" "=r") - (plus:SI (match_dup 1) - (const_int 24)))] + UNSPEC_VST3B))] "TARGET_NEON" { - int regno = REGNO (operands[2]); + int regno = REGNO (operands[1]); rtx ops[4]; ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = gen_rtx_REG (DImode, regno + 6); ops[3] = gen_rtx_REG (DImode, regno + 10); - output_asm_insn ("vst3.\t{%P1, %P2, %P3}, [%0]!", ops); + output_asm_insn ("vst3.\t{%P1, %P2, %P3}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst2_4_regs_vst3_vst4")] ) (define_insn "neon_vst3_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:EI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -4845,7 +4840,7 @@ (define_insn "neon_vst3_lane" ops[2] = gen_rtx_REG (DImode, regno + 2); ops[3] = gen_rtx_REG (DImode, regno + 4); ops[4] = operands[2]; - output_asm_insn ("vst3.\t{%P1[%c4], %P2[%c4], %P3[%c4]}, [%0]", + output_asm_insn ("vst3.\t{%P1[%c4], %P2[%c4], %P3[%c4]}, %A0", ops); return ""; } @@ -4853,7 +4848,7 @@ (define_insn "neon_vst3_lane" ) (define_insn "neon_vst3_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:CI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -4877,7 +4872,7 @@ (define_insn "neon_vst3_lane" ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = gen_rtx_REG (DImode, regno + 8); ops[4] = GEN_INT (lane); - output_asm_insn ("vst3.\t{%P1[%c4], %P2[%c4], %P3[%c4]}, [%0]", + output_asm_insn ("vst3.\t{%P1[%c4], %P2[%c4], %P3[%c4]}, %A0", ops); return ""; } @@ -4885,15 +4880,15 @@ (define_insn "neon_vst3_lane" (define_insn "neon_vld4" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(mem:OI (match_operand:SI 1 "s_register_operand" "r")) + (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4))] "TARGET_NEON" { if ( == 64) - return "vld1.64\t%h0, [%1]"; + return "vld1.64\t%h0, %A1"; else - return "vld4.\t%h0, [%1]"; + return "vld4.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -4902,25 +4897,25 @@ (define_insn "neon_vld4" ) (define_expand "neon_vld4" - [(match_operand:XI 0 "s_register_operand" "=w") - (match_operand:SI 1 "s_register_operand" "+r") + [(match_operand:XI 0 "s_register_operand") + (match_operand:XI 1 "neon_struct_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { - emit_insn (gen_neon_vld4qa (operands[0], operands[1], operands[1])); - emit_insn (gen_neon_vld4qb (operands[0], operands[0], - operands[1], operands[1])); + rtx mem; + + mem = adjust_address (operands[1], OImode, 0); + emit_insn (gen_neon_vld4qa (operands[0], mem)); + mem = adjust_address (mem, OImode, GET_MODE_SIZE (OImode)); + emit_insn (gen_neon_vld4qb (operands[0], mem, operands[0])); DONE; }) (define_insn "neon_vld4qa" [(set (match_operand:XI 0 "s_register_operand" "=w") - (unspec:XI [(mem:XI (match_operand:SI 2 "s_register_operand" "1")) + (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VLD4A)) - (set (match_operand:SI 1 "s_register_operand" "=r") - (plus:SI (match_dup 2) - (const_int 32)))] + UNSPEC_VLD4A))] "TARGET_NEON" { int regno = REGNO (operands[0]); @@ -4930,7 +4925,7 @@ (define_insn "neon_vld4qa" ops[2] = gen_rtx_REG (DImode, regno + 8); ops[3] = gen_rtx_REG (DImode, regno + 12); ops[4] = operands[1]; - output_asm_insn ("vld4.\t{%P0, %P1, %P2, %P3}, [%4]!", ops); + output_asm_insn ("vld4.\t{%P0, %P1, %P2, %P3}, %A4", ops); return ""; } [(set_attr "neon_type" "neon_vld3_vld4")] @@ -4938,13 +4933,10 @@ (define_insn "neon_vld4qa" (define_insn "neon_vld4qb" [(set (match_operand:XI 0 "s_register_operand" "=w") - (unspec:XI [(mem:XI (match_operand:SI 3 "s_register_operand" "2")) - (match_operand:XI 1 "s_register_operand" "0") + (unspec:XI [(match_operand:OI 1 "neon_struct_operand" "Um") + (match_operand:XI 2 "s_register_operand" "0") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VLD4B)) - (set (match_operand:SI 2 "s_register_operand" "=r") - (plus:SI (match_dup 3) - (const_int 32)))] + UNSPEC_VLD4B))] "TARGET_NEON" { int regno = REGNO (operands[0]); @@ -4953,8 +4945,8 @@ (define_insn "neon_vld4qb" ops[1] = gen_rtx_REG (DImode, regno + 6); ops[2] = gen_rtx_REG (DImode, regno + 10); ops[3] = gen_rtx_REG (DImode, regno + 14); - ops[4] = operands[2]; - output_asm_insn ("vld4.\t{%P0, %P1, %P2, %P3}, [%4]!", ops); + ops[4] = operands[1]; + output_asm_insn ("vld4.\t{%P0, %P1, %P2, %P3}, %A4", ops); return ""; } [(set_attr "neon_type" "neon_vld3_vld4")] @@ -4962,7 +4954,7 @@ (define_insn "neon_vld4qb" (define_insn "neon_vld4_lane" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:OI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:OI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -4981,7 +4973,7 @@ (define_insn "neon_vld4_lane" ops[3] = gen_rtx_REG (DImode, regno + 6); ops[4] = operands[1]; ops[5] = operands[3]; - output_asm_insn ("vld4.\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, [%4]", + output_asm_insn ("vld4.\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, %A4", ops); return ""; } @@ -4990,7 +4982,7 @@ (define_insn "neon_vld4_lane" (define_insn "neon_vld4_lane" [(set (match_operand:XI 0 "s_register_operand" "=w") - (unspec:XI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:XI [(match_operand: 1 "neon_struct_operand" "Um") (match_operand:XI 2 "s_register_operand" "0") (match_operand:SI 3 "immediate_operand" "i") (unspec:VMQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] @@ -5014,7 +5006,7 @@ (define_insn "neon_vld4_lane" ops[3] = gen_rtx_REG (DImode, regno + 12); ops[4] = operands[1]; ops[5] = GEN_INT (lane); - output_asm_insn ("vld4.\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, [%4]", + output_asm_insn ("vld4.\t{%P0[%c5], %P1[%c5], %P2[%c5], %P3[%c5]}, %A4", ops); return ""; } @@ -5023,7 +5015,7 @@ (define_insn "neon_vld4_lane" (define_insn "neon_vld4_dup" [(set (match_operand:OI 0 "s_register_operand" "=w") - (unspec:OI [(mem: (match_operand:SI 1 "s_register_operand" "r")) + (unspec:OI [(match_operand: 1 "neon_struct_operand" "Um") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VLD4_DUP))] "TARGET_NEON" @@ -5037,12 +5029,12 @@ (define_insn "neon_vld4_dup" ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = gen_rtx_REG (DImode, regno + 6); ops[4] = operands[1]; - output_asm_insn ("vld4.\t{%P0[], %P1[], %P2[], %P3[]}, [%4]", + output_asm_insn ("vld4.\t{%P0[], %P1[], %P2[], %P3[]}, %A4", ops); return ""; } else - return "vld1.\t%h0, [%1]"; + return "vld1.\t%h0, %A1"; } [(set (attr "neon_type") (if_then_else (gt (const_string "") (const_string "1")) @@ -5051,16 +5043,16 @@ (define_insn "neon_vld4_dup" ) (define_insn "neon_vst4" - [(set (mem:OI (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand:OI 0 "neon_struct_operand" "=Um") (unspec:OI [(match_operand:OI 1 "s_register_operand" "w") (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] UNSPEC_VST4))] "TARGET_NEON" { if ( == 64) - return "vst1.64\t%h1, [%0]"; + return "vst1.64\t%h1, %A0"; else - return "vst4.\t%h1, [%0]"; + return "vst4.\t%h1, %A0"; } [(set (attr "neon_type") (if_then_else (eq (const_string "") (const_string "64")) @@ -5069,64 +5061,62 @@ (define_insn "neon_vst4" ) (define_expand "neon_vst4" - [(match_operand:SI 0 "s_register_operand" "+r") - (match_operand:XI 1 "s_register_operand" "w") + [(match_operand:XI 0 "neon_struct_operand") + (match_operand:XI 1 "s_register_operand") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] "TARGET_NEON" { - emit_insn (gen_neon_vst4qa (operands[0], operands[0], operands[1])); - emit_insn (gen_neon_vst4qb (operands[0], operands[0], operands[1])); + rtx mem; + + mem = adjust_address (operands[0], OImode, 0); + emit_insn (gen_neon_vst4qa (mem, operands[1])); + mem = adjust_address (mem, OImode, GET_MODE_SIZE (OImode)); + emit_insn (gen_neon_vst4qb (mem, operands[1])); DONE; }) (define_insn "neon_vst4qa" - [(set (mem:OI (match_operand:SI 1 "s_register_operand" "0")) - (unspec:OI [(match_operand:XI 2 "s_register_operand" "w") + [(set (match_operand:OI 0 "neon_struct_operand" "=Um") + (unspec:OI [(match_operand:XI 1 "s_register_operand" "w") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VST4A)) - (set (match_operand:SI 0 "s_register_operand" "=r") - (plus:SI (match_dup 1) - (const_int 32)))] + UNSPEC_VST4A))] "TARGET_NEON" { - int regno = REGNO (operands[2]); + int regno = REGNO (operands[1]); rtx ops[5]; ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno); ops[2] = gen_rtx_REG (DImode, regno + 4); ops[3] = gen_rtx_REG (DImode, regno + 8); ops[4] = gen_rtx_REG (DImode, regno + 12); - output_asm_insn ("vst4.\t{%P1, %P2, %P3, %P4}, [%0]!", ops); + output_asm_insn ("vst4.\t{%P1, %P2, %P3, %P4}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst2_4_regs_vst3_vst4")] ) (define_insn "neon_vst4qb" - [(set (mem:OI (match_operand:SI 1 "s_register_operand" "0")) - (unspec:OI [(match_operand:XI 2 "s_register_operand" "w") + [(set (match_operand:OI 0 "neon_struct_operand" "=Um") + (unspec:OI [(match_operand:XI 1 "s_register_operand" "w") (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] - UNSPEC_VST4B)) - (set (match_operand:SI 0 "s_register_operand" "=r") - (plus:SI (match_dup 1) - (const_int 32)))] + UNSPEC_VST4B))] "TARGET_NEON" { - int regno = REGNO (operands[2]); + int regno = REGNO (operands[1]); rtx ops[5]; ops[0] = operands[0]; ops[1] = gen_rtx_REG (DImode, regno + 2); ops[2] = gen_rtx_REG (DImode, regno + 6); ops[3] = gen_rtx_REG (DImode, regno + 10); ops[4] = gen_rtx_REG (DImode, regno + 14); - output_asm_insn ("vst4.\t{%P1, %P2, %P3, %P4}, [%0]!", ops); + output_asm_insn ("vst4.\t{%P1, %P2, %P3, %P4}, %A0", ops); return ""; } [(set_attr "neon_type" "neon_vst2_4_regs_vst3_vst4")] ) (define_insn "neon_vst4_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:OI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -5146,7 +5136,7 @@ (define_insn "neon_vst4_lane" ops[3] = gen_rtx_REG (DImode, regno + 4); ops[4] = gen_rtx_REG (DImode, regno + 6); ops[5] = operands[2]; - output_asm_insn ("vst4.\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, [%0]", + output_asm_insn ("vst4.\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, %A0", ops); return ""; } @@ -5154,7 +5144,7 @@ (define_insn "neon_vst4_lane" ) (define_insn "neon_vst4_lane" - [(set (mem: (match_operand:SI 0 "s_register_operand" "r")) + [(set (match_operand: 0 "neon_struct_operand" "=Um") (unspec: [(match_operand:XI 1 "s_register_operand" "w") (match_operand:SI 2 "immediate_operand" "i") @@ -5179,7 +5169,7 @@ (define_insn "neon_vst4_lane" ops[3] = gen_rtx_REG (DImode, regno + 8); ops[4] = gen_rtx_REG (DImode, regno + 12); ops[5] = GEN_INT (lane); - output_asm_insn ("vst4.\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, [%0]", + output_asm_insn ("vst4.\t{%P1[%c5], %P2[%c5], %P3[%c5], %P4[%c5]}, %A0", ops); return ""; } Index: gcc/config/arm/neon-testgen.ml =================================================================== --- gcc/config/arm/neon-testgen.ml 2011-03-29 08:52:13.000000000 +0100 +++ gcc/config/arm/neon-testgen.ml 2011-03-29 08:52:16.000000000 +0100 @@ -177,7 +177,7 @@ let rec analyze_shape shape = let alt2 = commas (fun x -> x) (n_things n elt_regexp) "" in "\\\\\\{((" ^ alt1 ^ ")|(" ^ alt2 ^ "))\\\\\\}" | (PtrTo elt | CstPtrTo elt) -> - "\\\\\\[" ^ (analyze_shape_elt elt) ^ "\\\\\\]" + "\\\\\\[" ^ (analyze_shape_elt elt) ^ "\\(:\\[0-9\\]+\\)?\\\\\\]" | Element_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]" | Element_of_qreg -> (analyze_shape_elt Qreg) ^ "\\\\\\[\\[0-9\\]+\\\\\\]" | All_elements_of_dreg -> (analyze_shape_elt Dreg) ^ "\\\\\\[\\\\\\]" Index: gcc/testsuite/gcc.target/arm/neon-vld3-1.c =================================================================== --- /dev/null 2011-03-23 08:42:11.268792848 +0000 +++ gcc/testsuite/gcc.target/arm/neon-vld3-1.c 2011-03-29 08:52:16.000000000 +0100 @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include "arm_neon.h" + +uint32_t buffer[12]; + +void __attribute__((noinline)) +foo (uint32_t *a) +{ + uint32x4x3_t x; + + x = vld3q_u32 (a); + x.val[0] = vaddq_u32 (x.val[0], x.val[1]); + vst3q_u32 (a, x); +} + +int +main (void) +{ + buffer[0] = 1; + buffer[1] = 2; + foo (buffer); + return buffer[0] != 3; +} Index: gcc/testsuite/gcc.target/arm/neon-vst3-1.c =================================================================== --- /dev/null 2011-03-23 08:42:11.268792848 +0000 +++ gcc/testsuite/gcc.target/arm/neon-vst3-1.c 2011-03-29 08:52:16.000000000 +0100 @@ -0,0 +1,25 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include "arm_neon.h" + +uint32_t buffer[64]; + +void __attribute__((noinline)) +foo (uint32_t *a) +{ + uint32x4x3_t x; + + x = vld3q_u32 (a); + a[35] = 1; + vst3q_lane_u32 (a + 32, x, 1); +} + +int +main (void) +{ + foo (buffer); + return buffer[35] != 1; +}