From patchwork Wed May 3 08:00:08 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 98471 Delivered-To: patch@linaro.org Received: by 10.140.89.200 with SMTP id v66csp179319qgd; Wed, 3 May 2017 01:01:26 -0700 (PDT) X-Received: by 10.98.194.199 with SMTP id w68mr3316245pfk.192.1493798486382; Wed, 03 May 2017 01:01:26 -0700 (PDT) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id p84si1910108pfj.171.2017.05.03.01.01.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 03 May 2017 01:01:26 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-452660-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-452660-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-452660-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=Lmx02TRMrQRtnbPfiHjlHUwUgZYwUyVLzkzSqTroHEM4fCeR8MD+V Wi7a5ma8BVIu+uaZBqp6UGghGiqRW9K2yCKPQSRgCH7dsRhVwreeXcrjp+7KA7At 4w6Yff10zQ7n6FHGh7Nuw/saXZUP5mhvRVz9z/OC671Q12Cmezvzys= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=/2M7vYIMNcdFPB4j3MDg+pAgzm8=; b=oSmFA4WKJqgCzFCoJiis E+nF8qCm9LiGBL9J5Ahz7UuBD+vzmNc0ols25kNaOpJIvBOtGPn0FhWmkbPcx9xH mNBvS88M1I5OscXMlM5hiaOXk+KurX8ihL7rGEeEZJddPUg1vP6wzdVA2nSL8B32 5uojQoCs9wk/s0YhRAtSxVU= Received: (qmail 68040 invoked by alias); 3 May 2017 08:00:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 47018 invoked by uid 89); 3 May 2017 08:00:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-15.1 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, RCVD_IN_SORBS_SPAM, SPF_PASS autolearn=ham version=3.3.2 spammy=contrast, classic, 1238, ge X-HELO: mail-wm0-f42.google.com Received: from mail-wm0-f42.google.com (HELO mail-wm0-f42.google.com) (74.125.82.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 03 May 2017 08:00:18 +0000 Received: by mail-wm0-f42.google.com with SMTP id w64so137447666wma.0 for ; Wed, 03 May 2017 01:00:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:date:message-id :user-agent:mime-version; bh=KuGauXTNujQ4vzxEBtsjZkYan3qFm/F+8KvYaeEj91M=; b=GL1OptWTqvKVkqRJxQN3oJ8tJwzSVgjtgZNpbS/qFSvVS1USG0AjVeF7i3U+A9OR+K RMC8JNTTMNVmr0pODnj+e1I6Na0YkUjQVyqyvMJ/93S8HORMs/3yOrbIyL4q+5FdxXdY Fwl1lcQcDDzgNfipXSB59waP/3ObV800OW2mvp07ltTvfjpOj+1j/u/VjrL46aY5pPp8 NhmbiO7YeWjlPVhyoshIPGFeCK6x0A8OaqZL8z79m9Mamq1lwrYFPlsFBqgcodQzKC76 49mAKalWlAuIBaGoeoerbwXmMBNa7xTmG0py1pcL84xhUB7L02n8wcFTZbTm/RcxBf9r TSZQ== X-Gm-Message-State: AN3rC/7Dp1s9AScSzOVco9UtCDnts4r8bFjlQsSN02Kg+nlP7wIXA48n UJ1DaWVsccW2Q9j+Y30gUw== X-Received: by 10.28.157.15 with SMTP id g15mr5648597wme.133.1493798410604; Wed, 03 May 2017 01:00:10 -0700 (PDT) Received: from localhost (94.197.120.23.threembb.co.uk. [94.197.120.23]) by smtp.gmail.com with ESMTPSA id x20sm4154039wme.0.2017.05.03.01.00.09 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 03 May 2017 01:00:09 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: Handle data dependence relations with different bases Date: Wed, 03 May 2017 09:00:08 +0100 Message-ID: <87tw52s73r.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 This patch tries to calculate conservatively-correct distance vectors for two references whose base addresses are not the same. It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence isn't guaranteed to occur. The motivating example is: struct s { int x[8]; }; void f (struct s *a, struct s *b) { for (int i = 0; i < 8; ++i) a->x[i] += b->x[i]; } in which the "a" and "b" accesses are either independent or have a dependence distance of 0 (assuming -fstrict-aliasing). Neither case prevents vectorisation, so we can vectorise without an alias check. I'd originally wanted to do the same thing for arrays as well, e.g.: void f (int a[][8], struct b[][8]) { for (int i = 0; i < 8; ++i) a[0][i] += b[0][i]; } I think this is valid because C11 6.7.6.2/6 says: For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value. So if we access an array through an int (*)[8], it must have type X[8] or X[], where X is compatible with int. It doesn't seem possible in either case for "a[0]" and "b[0]" to overlap when "a != b". However, Richard B said that (at least in gimple) we support arbitrary overlap of arrays and allow arrays to be accessed with different dimensionality. There are examples of this in PR50067. I've therefore only handled references that end in a structure field access. There are two ways of handling these dependences in the vectoriser: use them to limit VF, or check at runtime as before. I've gone for the approach of checking at runtime if we can, to avoid limiting VF unnecessarily. We still fall back to a VF cap when runtime checks aren't allowed. The patch tests whether we queued an alias check with a dependence distance of X and then picked a VF <= X, in which case it's safe to drop the alias check. Since vect_prune_runtime_alias_check_list can be called twice with different VF for the same loop, it's no longer safe to clear may_alias_ddrs on exit. Instead we should use comp_alias_ddrs to check whether versioning is necessary. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ 2017-05-03 Richard Sandiford * tree-data-ref.h (subscript): Add access_fn field. (data_dependence_relation): Add could_be_independent_p. (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. (same_access_functions): Move to tree-data-ref.c. * tree-data-ref.c (ref_contains_union_access_p): New function. (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (constant_access_functions): Likewise. (add_other_self_distances): Likewise. (same_access_functions): Likewise. (Moved from tree-data-ref.h.) (initialize_data_dependence_relation): Use XCNEW and remove explicit zeroing of DDR_REVERSED_P. Look for a subsequence of access functions that have the same type. Allow the subsequence to end with different bases in some circumstances. Record the chosen access functions in SUB_ACCESS_FN. (build_classic_dist_vector_1): Replace ddr_a and ddr_b with a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (subscript_dependence_tester_1): Likewise dra and drb. (build_classic_dist_vector): Update calls accordingly. (subscript_dependence_tester): Likewise. * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check DDR_COULD_BE_INDEPENDENT_P. * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded distance vectors if that fails. (dependence_distance_ge_vf): New function. (vect_prune_runtime_alias_test_list): Use it. Don't clear LOOP_VINFO_MAY_ALIAS_DDRS. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-3.c: New test. * gcc.dg/vect/vect-alias-check-4.c: Likewise. * gcc.dg/vect/vect-alias-check-5.c: Likewise. Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 @@ -191,6 +191,9 @@ struct conflict_function struct subscript { + /* The access functions of the two references. */ + tree access_fn[2]; + /* A description of the iterations for which the elements are accessed twice. */ conflict_function *conflicting_iterations_in_a; @@ -209,6 +212,7 @@ struct subscript typedef struct subscript *subscript_p; +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict @@ -264,6 +268,33 @@ struct data_dependence_relation /* Set to true when the dependence relation is on the same data access. */ bool self_reference_p; + + /* True if the dependence described is conservatively correct rather + than exact, and if it is still possible for the accesses to be + conditionally independent. For example, the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a->f[i] += b->f[i]; + + conservatively have a distance vector of (0), for the case in which + a == b, but the accesses are independent if a != b. Similarly, + the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a[0].f[i] += b[i].f[i]; + + conservatively have a distance vector of (0), but they are indepenent + when a != b + i. In contrast, the references in: + + struct s *a; + for (int i = 0; i < n; ++i) + a->f[i] += a->f[i]; + + have the same distance vector of (0), but the accesses can never be + independent. */ + bool could_be_independent_p; }; typedef struct data_dependence_relation *ddr_p; @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ #define DDR_DIST_VECT(DDR, I) \ DDR_DIST_VECTS (DDR)[I] #define DDR_REVERSED_P(DDR) (DDR)->reversed_p +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p bool dr_analyze_innermost (struct data_reference *, struct loop *); @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data return false; return true; -} - -/* Return true when the DDR contains two data references that have the - same access functions. */ - -static inline bool -same_access_functions (const struct data_dependence_relation *ddr) -{ - unsigned i; - - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), - DR_ACCESS_FN (DDR_B (ddr), i))) - return false; - - return true; } /* Returns true when all the dependences are computable. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 @@ -123,8 +123,7 @@ Software Foundation; either version 3, o } dependence_stats; static bool subscript_dependence_tester_1 (struct data_dependence_relation *, - struct data_reference *, - struct data_reference *, + unsigned int, unsigned int, struct loop *); /* Returns true iff A divides B. */ @@ -144,6 +143,21 @@ int_divides_p (int a, int b) return ((b % a) == 0); } +/* Return true if reference REF contains a union access. */ + +static bool +ref_contains_union_access_p (tree ref) +{ + while (handled_component_p (ref)) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) + return true; + } + return false; +} + /* Dump into FILE all the data references from DATAREFS. */ @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out unsigned int i; struct loop *loopi; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + subscript *sub; + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { fprintf (outf, " access_fn_A: "); - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); fprintf (outf, " access_fn_B: "); - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); + dump_subscript (outf, sub); } fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str struct data_dependence_relation *res; unsigned int i; - res = XNEW (struct data_dependence_relation); + res = XCNEW (struct data_dependence_relation); DDR_A (res) = a; DDR_B (res) = b; DDR_LOOP_NEST (res).create (0); - DDR_REVERSED_P (res) = false; DDR_SUBSCRIPTS (res).create (0); DDR_DIR_VECTS (res).create (0); DDR_DIST_VECTS (res).create (0); @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str return res; } - /* The case where the references are exactly the same. */ - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + if (num_dimensions_a == 0 || num_dimensions_b == 0) { - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], - DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) - { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; - } - DDR_AFFINE_P (res) = true; - DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); - DDR_LOOP_NEST (res) = loop_nest; - DDR_INNER_LOOP (res) = 0; - DDR_SELF_REFERENCE (res) = true; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) - { - struct subscript *subscript; + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* For unconstrained bases, the outer (highest-index) subscript + describes a variation in the base of the original DR_REF rather + than a component access. We have no type that accurately describes + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* + applying the outer subscript) so limit the search to the last real + component access. + + E.g. for: - subscript = XNEW (struct subscript); - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; - SUB_DISTANCE (subscript) = chrec_dont_know; - DDR_SUBSCRIPTS (res).safe_push (subscript); + void + f (int a[][8], int b[][8]) + { + for (int i = 0; i < 8; ++i) + a[i * 2][0] = b[i][0]; } - return res; + + the a and b accesses have a single ARRAY_REF component reference [0] + but have two subscripts. */ + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions_a -= 1; + if (DR_UNCONSTRAINED_BASE (b)) + num_dimensions_b -= 1; + + /* Now look for two sequences of component references that have the same + type in both A and B. The first sequence includes an arbitrary mixture + of array and structure references while the second always ends on a + structure reference. + + The former (arbitrary) sequence uses access functions: + + [START_A, START_A + NUM_DIMENSIONS) of A + [START_B, START_B + NUM_DIMENSIONS) of B + + The latter sequence uses access functions: + + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B + + STRUCT_REF_A and STRUCT_REF_B are the outer references for the + latter sequence. */ + unsigned int start_a = 0; + unsigned int start_b = 0; + unsigned int num_dimensions = 0; + unsigned int struct_start_a = 0; + unsigned int struct_start_b = 0; + unsigned int struct_num_dimensions = 0; + unsigned int index_a = 0; + unsigned int index_b = 0; + tree next_ref_a = DR_REF (a); + tree next_ref_b = DR_REF (b); + tree struct_ref_a = NULL_TREE; + tree struct_ref_b = NULL_TREE; + while (index_a < num_dimensions_a && index_b < num_dimensions_b) + { + gcc_checking_assert (handled_component_p (next_ref_a)); + gcc_checking_assert (handled_component_p (next_ref_b)); + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); + tree type_a = TREE_TYPE (outer_ref_a); + tree type_b = TREE_TYPE (outer_ref_b); + if (types_compatible_p (type_a, type_b)) + { + /* This pair of accesses belong to a suitable sequence. */ + if (start_a + num_dimensions != index_a + || start_b + num_dimensions != index_b) + { + /* Start a new sequence here. */ + start_a = index_a; + start_b = index_b; + num_dimensions = 0; + } + num_dimensions += 1; + if (TREE_CODE (type_a) == RECORD_TYPE) + { + struct_start_a = start_a; + struct_start_b = start_b; + struct_num_dimensions = num_dimensions; + struct_ref_a = outer_ref_a; + struct_ref_b = outer_ref_b; + } + next_ref_a = outer_ref_a; + next_ref_b = outer_ref_b; + index_a += 1; + index_b += 1; + continue; + } + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); + if (size_a <= size_b) + { + index_a += 1; + next_ref_a = outer_ref_a; + } + if (size_b <= size_a) + { + index_b += 1; + next_ref_b = outer_ref_b; + } } - /* If the references do not access the same object, we do not know - whether they alias or not. We do not care about TBAA or alignment - info so we can use OEP_ADDRESS_OF to avoid false negatives. - But the accesses have to use compatible types as otherwise the - built indices would not match. */ - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), - TREE_TYPE (DR_BASE_OBJECT (b)))) + /* See whether the sequence ends at the base and whether the two bases + are equal. We do not care about TBAA or alignment info so we can use + OEP_ADDRESS_OF to avoid false negatives. */ + tree base_a = DR_BASE_OBJECT (a); + tree base_b = DR_BASE_OBJECT (b); + bool same_base_p = (start_a + num_dimensions == num_dimensions_a + && start_b + num_dimensions == num_dimensions_b + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) + && types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b)) + && (!loop_nest.exists () + || (object_address_invariant_in_loop_p + (loop_nest[0], base_a)))); + + /* If the bases are the same, we can include the base variation too. + E.g. the b accesses in: + + for (int i = 0; i < n; ++i) + b[i + 4][0] = b[i][0]; + + have a definite dependence distance of 4, while for: + + for (int i = 0; i < n; ++i) + a[i + 4][0] = b[i][0]; + + the dependence distance depends on the gap between a and b. + + If the bases are different then we can only rely on the sequence + rooted at a structure access, since arrays are allowed to overlap + arbitrarily and change shape arbitrarily. E.g. we treat this as + valid code: + + int a[256]; + ... + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; + + where two lvalues with the same int[4][3] type overlap, and where + both lvalues are distinct from the object's declared type. */ + if (same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions += 1; + } + else + { + start_a = struct_start_a; + start_b = struct_start_b; + num_dimensions = struct_num_dimensions; } - /* If the base of the object is not invariant in the loop nest, we cannot - analyze it. TODO -- in fact, it would suffice to record that there may - be arbitrary dependences in the loops where the base object varies. */ - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + /* Punt if we didn't find a suitable sequence. */ + if (num_dimensions == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; return res; } - /* If the number of dimensions of the access to not agree we can have - a pointer access to a component of the array element type and an - array access while the base-objects are still the same. Punt. */ - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) + if (!same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + /* Partial overlap is possible for different bases when strict aliasing + is not in effect. It's also possible if either base involves a union + access; e.g. for: + + struct s1 { int a[2]; }; + struct s2 { struct s1 b; int c; }; + struct s3 { int d; struct s1 e; }; + union u { struct s2 f; struct s3 g; } *p, *q; + + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at + "p->g.e" (base "p->g") and might partially overlap the s1 at + "q->g.e" (base "q->g"). */ + if (!flag_strict_aliasing + || ref_contains_union_access_p (struct_ref_a) + || ref_contains_union_access_p (struct_ref_b)) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + DDR_COULD_BE_INDEPENDENT_P (res) = true; } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); + DDR_SUBSCRIPTS (res).create (num_dimensions); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = 0; i < num_dimensions; ++i) { struct subscript *subscript; subscript = XNEW (struct subscript); + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende } /* Return false when fail to represent the data dependence as a - distance vector. INIT_B is set to true when a component has been + distance vector. A_INDEX is the index of the first reference + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the + second reference. INIT_B is set to true when a component has been added to the distance vector DIST_V. INDEX_CARRY is then set to the index in DIST_V that carries the dependence. */ static bool build_classic_dist_vector_1 (struct data_dependence_relation *ddr, - struct data_reference *ddr_a, - struct data_reference *ddr_b, + unsigned int a_index, unsigned int b_index, lambda_vector dist_v, bool *init_b, int *index_carry) { @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data return false; } - access_fn_a = DR_ACCESS_FN (ddr_a, i); - access_fn_b = DR_ACCESS_FN (ddr_b, i); + access_fn_a = SUB_ACCESS_FN (subscript, a_index); + access_fn_b = SUB_ACCESS_FN (subscript, b_index); if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data constant_access_functions (const struct data_dependence_relation *ddr) { unsigned i; + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) return false; return true; @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de lambda_vector dist_v; unsigned i; int index_carry = DDR_NB_LOOPS (ddr); + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); + tree access_fun = SUB_ACCESS_FN (sub, 0); if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) { @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de return; } - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) add_multivariate_self_dist (ddr, access_fun); @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d } } +/* Return true when the DDR contains two data references that have the + same access functions. */ + +static inline bool +same_access_functions (const struct data_dependence_relation *ddr) +{ + unsigned i; + subscript *sub; + + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), + SUB_ACCESS_FN (sub, 1))) + return false; + + return true; +} + /* Compute the classic per loop distance vector. DDR is the data dependence relation to build a vector from. Return false when fail to represent the data dependence as a distance vector. */ @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d } dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), - dist_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) return false; /* Save the distance vector if we initialized one. */ @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), - loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - save_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, + &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d { lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), - DDR_A (ddr), loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - opposite_v, &init_b, + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, &index_carry)) return false; @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de } } -/* Helper function. Returns true when there is a dependence between - data references DRA and DRB. */ +/* Helper function. Returns true when there is a dependence between the + data references. A_INDEX is the index of the first reference (0 for + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ static bool subscript_dependence_tester_1 (struct data_dependence_relation *ddr, - struct data_reference *dra, - struct data_reference *drb, + unsigned int a_index, unsigned int b_index, struct loop *loop_nest) { unsigned int i; @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da { conflict_function *overlaps_a, *overlaps_b; - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), - DR_ACCESS_FN (drb, i), + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), + SUB_ACCESS_FN (subscript, b_index), &overlaps_a, &overlaps_b, &last_conflicts, loop_nest); @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da subscript_dependence_tester (struct data_dependence_relation *ddr, struct loop *loop_nest) { - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) dependence_stats.num_dependence_dependent++; compute_subscript_distance (ddr); Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * refb = (struct mem_ref *) DDR_B (dep)->aux; if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know + || DDR_COULD_BE_INDEPENDENT_P (dep) || DDR_NUM_DIST_VECTS (dep) == 0) { /* If the dependence cannot be analyzed, assume that there might be Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ ((L)->may_misalign_stmts.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ - ((L)->may_alias_ddrs.length () > 0) + ((L)->comp_alias_ddrs.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) #define LOOP_REQUIRES_VERSIONING(L) \ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct } loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); + + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) + /* For dependence distances of 2 or more, we have the option of + limiting VF or checking for an alias at runtime. Prefer to check + at runtime if we can, to avoid limiting the VF unnecessarily when + the bases are in fact independent. + + Note that the alias checks will be removed if the VF ends up + being small enough. */ + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + int dist = dist_v[loop_depth]; + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) + { + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) + return false; + break; + } + } + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * return false; } +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH + in DDR is >= VF. */ + +static bool +dependence_distance_ge_vf (data_dependence_relation *ddr, + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) +{ + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE + || DDR_NUM_DIST_VECTS (ddr) == 0) + return false; + + /* If the dependence is exact, we should have limited the VF instead. */ + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); + + unsigned int i; + lambda_vector dist_v; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + HOST_WIDE_INT dist = dist_v[loop_depth]; + if (dist != 0 + && !(dist > 0 && DDR_REVERSED_P (ddr)) + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) + return false; + } + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "dependence distance between "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); + dump_printf (MSG_NOTE, " is >= VF\n"); + } + + return true; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop comp_alias_ddrs.create (may_alias_ddrs.length ()); + unsigned int loop_depth + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, + LOOP_VINFO_LOOP_NEST (loop_vinfo)); + /* First, we collect all data ref pairs for aliasing checks. */ FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop tree segment_length_a, segment_length_b; gimple *stmt_a, *stmt_b; + /* Ignore the alias if the VF we chose ended up being no greater + than the dependence distance. */ + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) + continue; + dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop return false; } - /* All alias checks have been resolved at compilation time. */ - if (!comp_alias_ddrs.length ()) - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return true; } Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,104 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N + 1]; }; +struct t { struct s x[N + 1]; }; +struct u { int x[N + 1]; int y; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i]; +} + +void +f2 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i]; +} + +void +f3 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i]; +} + +void +f4 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i]; +} + +void +f5 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i + 1]; +} + +void +f6 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i + 1]; +} + +void +f7 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i + 1]; +} + +void +f8 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i + 1]; +} + +void +f9 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[1].x[i]; +} + +void +f10 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i].x[i]; +} + +void +f11 (struct u *a, struct u *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i] + b[i].y; +} + +void +f12 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f13 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +#define N 16 + +struct s1 { int a[N]; }; +struct s2 { struct s1 b; int c; }; +struct s3 { int d; struct s1 e; }; +union u { struct s2 f; struct s3 g; }; + +/* We allow a and b to overlap arbitrarily. */ + +void +f1 (int a[][N], int b[][N]) +{ + for (int i = 0; i < N; ++i) + a[0][i] += b[0][i]; +} + +void +f2 (union u *a, union u *b) +{ + for (int i = 0; i < N; ++i) + a->f.b.a[i] += b->g.e.a[i]; +} + +void +f3 (struct s1 *a, struct s1 *b) +{ + for (int i = 0; i < N - 1; ++i) + a->a[i + 1] += b->a[i]; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N]; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */