diff mbox series

PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

Message ID CAAgBjMm2O=bSQBen=Mn7QdPKQEcJwRMs3MV8+wMuR7hvBr5n-w@mail.gmail.com
State New
Headers show
Series PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding | expand

Commit Message

Prathamesh Kulkarni Oct. 4, 2023, 10:16 a.m. UTC
Hi,
The attached patch attempts to fix PR111648.
As mentioned in PR, the issue is when a1 is a multiple of vector
length, we end up creating following encoding in result: { base_elem,
arg[0], arg[1], ... } (assuming S = 1),
where arg is chosen input vector, which is incorrect, since the
encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }

For the test-case mentioned in PR, vectorizer pass creates
VEC_PERM_EXPR<arg0, arg, sel> where:
arg0: { -16, -9, -10, -11 }
arg1: { -12, -5, -6, -7 }
sel = { 3, 4, 5, 6 }

arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
Since a1 = 4 and arg_len = 4, it ended up creating the result with
following encoding:
res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
      = { -11, -12, -5 }

So for res[3], it used S = (-5) - (-12) = 7
And hence computed it as -5 + 7 = 2.
instead of selecting arg1[2], ie, -6.

The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
of vector length, so a1 ... ae select elements only from stepped part
of the pattern
from input vector and return false for this case.

Since the vectors are VLS, fold_vec_perm_cst then sets:
res_npatterns = res_nelts
res_nelts_per_pattern  = 1
which seems to fix the issue by encoding all the elements.

The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
which used a1 = 0, and thus selected arg1[0].

I removed Case 4 because it was already covered in test_nunits_min_4,
and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
and added a new Case 9 to test for this issue.

Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
and on x86_64-linux-gnu.
Does the patch look OK ?

Thanks,
Prathamesh
[PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.

gcc/ChangeLog:
	PR tree-optimization/111648
	* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
	is a multiple of vector length.
	(test_nunits_min_2): Remove Case 4 and move Case 5 to ...
	(test_nunits_min_4): ... here and rename case numbers. Also add
	Case 9.

gcc/testsuite/ChangeLog:
	PR tree-optimization/111648
	* gcc.dg/vect/pr111648.c: New test.

Comments

Richard Sandiford Oct. 9, 2023, 11:35 a.m. UTC | #1
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi,
> The attached patch attempts to fix PR111648.
> As mentioned in PR, the issue is when a1 is a multiple of vector
> length, we end up creating following encoding in result: { base_elem,
> arg[0], arg[1], ... } (assuming S = 1),
> where arg is chosen input vector, which is incorrect, since the
> encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
>
> For the test-case mentioned in PR, vectorizer pass creates
> VEC_PERM_EXPR<arg0, arg, sel> where:
> arg0: { -16, -9, -10, -11 }
> arg1: { -12, -5, -6, -7 }
> sel = { 3, 4, 5, 6 }
>
> arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> Since a1 = 4 and arg_len = 4, it ended up creating the result with
> following encoding:
> res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
>       = { -11, -12, -5 }
>
> So for res[3], it used S = (-5) - (-12) = 7
> And hence computed it as -5 + 7 = 2.
> instead of selecting arg1[2], ie, -6.
>
> The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
> of vector length, so a1 ... ae select elements only from stepped part
> of the pattern
> from input vector and return false for this case.
>
> Since the vectors are VLS, fold_vec_perm_cst then sets:
> res_npatterns = res_nelts
> res_nelts_per_pattern  = 1
> which seems to fix the issue by encoding all the elements.
>
> The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> which used a1 = 0, and thus selected arg1[0].
>
> I removed Case 4 because it was already covered in test_nunits_min_4,
> and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> and added a new Case 9 to test for this issue.
>
> Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> and on x86_64-linux-gnu.
> Does the patch look OK ?
>
> Thanks,
> Prathamesh
>
> [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
>
> gcc/ChangeLog:
> 	PR tree-optimization/111648
> 	* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> 	is a multiple of vector length.
> 	(test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> 	(test_nunits_min_4): ... here and rename case numbers. Also add
> 	Case 9.
>
> gcc/testsuite/ChangeLog:
> 	PR tree-optimization/111648
> 	* gcc.dg/vect/pr111648.c: New test.
>
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..c5f421d6b76 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  	  return false;
>  	}
>  
> -      /* Ensure that the stepped sequence always selects from the same
> -	 input pattern.  */
> +      /* Ensure that the stepped sequence always selects from the stepped
> +	 part of same input pattern.  */
>        unsigned arg_npatterns
>  	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>  			  : VECTOR_CST_NPATTERNS (arg1);
> @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  	    *reason = "step is not multiple of npatterns";
>  	  return false;
>  	}
> +
> +      /* If a1 is a multiple of len, it will select base element of input
> +	 vector resulting in following encoding:
> +	 { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> +	 vector. This encoding is not originally present in arg, since it's
> +	 defined as:
> +	 { arg[0], arg[1], arg[2], ... }.  */
> +
> +      if (multiple_p (a1, arg_len))
> +	{
> +	  if (reason)
> +	    *reason = "selecting base element of input vector";
> +	  return false;
> +	}

That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
second argument has 2 stepped patterns.

The equivalent condition that handles multiple patterns would
probably be to reject q1 < arg_npatterns.  But that's only necessary if:

(1) the argument has three elements per pattern (i.e. has a stepped
    sequence) and

(2) element 2 - element 1 != element 1 - element 0

I think we should check those to avoid pessimising VLA cases.

Thanks,
Richard

>      }
>  
>    return true;
> @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
>  	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
>  	validate_res (2, 2, res, expected_res);
>        }
> -
> -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> -	 Test that the stepped sequence of the pattern selects from
> -	 same input pattern. Since input vectors have npatterns = 2,
> -	 and step (a2 - a1) = 1, step is not a multiple of npatterns
> -	 in input vector. So return NULL_TREE.  */
> -      {
> -	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> -	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> -	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> -
> -	vec_perm_builder builder (len, 1, 3);
> -	poly_uint64 mask_elems[] = { 0, 0, 1 };
> -	builder_push_elems (builder, mask_elems);
> -
> -	vec_perm_indices sel (builder, 2, len);
> -	const char *reason;
> -	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> -				      &reason);
> -	ASSERT_TRUE (res == NULL_TREE);
> -	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> -      }
> -
> -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
> -	 Test that stepped sequence of the pattern selects from arg0.
> -	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> -      {
> -	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> -	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> -	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> -
> -	vec_perm_builder builder (len, 1, 3);
> -	poly_uint64 mask_elems[] = { len, 0, 1 };
> -	builder_push_elems (builder, mask_elems);
> -
> -	vec_perm_indices sel (builder, 2, len);
> -	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> -
> -	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> -	validate_res (1, 3, res, expected_res);
> -      }
>      }
>  }
>  
> @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
>  	validate_res (1, 3, res, expected_res);
>        }
>  
> -      /* Case 4:
> +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
> +	 Test that stepped sequence of the pattern selects from arg0.
> +	 res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { len, 1, 2 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
> +	validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 5:
>  	sel = { len, 0, 2, ... } // (1, 3)
>  	This should return NULL because we cross the input vectors.
>  	Because,
> @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
>  	ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
>        }
>  
> -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
>  	 mask = { 0, len, 1, len + 1, ...} // (2, 2)
>  	 res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
>  
> @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
>  	validate_res (2, 2, res, expected_res);
>        }
>  
> -      /* Case 6: Test combination in sel, where one pattern is dup and other
> +      /* Case 7: Test combination in sel, where one pattern is dup and other
>  	 is stepped sequence.
>  	 sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
>  	 res = { arg0[0], arg0[0], arg0[0],
> @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
>  	validate_res (2, 3, res, expected_res);
>        }
>  
> -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
> +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
>  	 when arg0, arg1 and sel have different number of patterns.
>  	 arg0 is of shape (1, 1)
>  	 arg1 is of shape (4, 1)
> @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
>  	ASSERT_TRUE (res == NULL_TREE);
>  	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
>        }
> +
> +      /* Case 9: PR111648 - a1 is multiple of vector length,
> +	 which results in incorrect encoding. Verify that we return
> +	 NULL for this case.
> +	 sel = { base_elem, len, len+1, ... } // (1, 3)
> +	 In this case, the single pattern is: { base_elem len, len+1, ...}
> +	 Let's assume that base_elem is used for indexing into arg0,
> +	 and a1 ... ae chooses elements from arg1.
> +	 So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
> +	 Which creates an incorrect encoding with S = arg1[1] - arg1[0]
> +	 while the original encoding in arg1 is
> +	 arg1: { arg1[0], arg1[1], arg1[2], ... }
> +	 with S = arg1[2] - arg1[1].
> +
> +	 As a concrete example, for above PR:
> +	 arg0: { -16, -9, -10, -11 }
> +	 arg1: { -12, -5, -6, -7 }
> +	 sel = { 3, 4, 5, 6 }
> +
> +	 arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> +	 Since a1 = 4 and arg_len = 4, it ended up creating the result with
> +	 following encoding:
> +	 res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
> +	     = { -11, -12, -5 }
> +
> +	 So for res[3], it used S = (-5) - (-12) = 7
> +	 And hence computed it as -5 + 7 = 2.
> +	 instead of arg1[2], ie, -6, which is the correct value.
> +	 Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	const char *reason;
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +	ASSERT_TRUE (res == NULL_TREE);
> +	ASSERT_TRUE (!strcmp (reason,
> +			      "selecting base element of input vector"));
> +      }
>      }
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> new file mode 100644
> index 00000000000..093e2b02654
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int a;
> +int *b = &a;
> +static int **c = &b;
> +static int d;
> +short e;
> +short f;
> +
> +_Bool foo ()
> +{
> +  f = -21;
> +  for (; f < -5; f++) {
> +    e = f ^ 3;
> +    d = *b;
> +    **c = e;
> +  }
> +
> +  return d == -6;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
Prathamesh Kulkarni Oct. 11, 2023, 11:12 a.m. UTC | #2
On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi,
> > The attached patch attempts to fix PR111648.
> > As mentioned in PR, the issue is when a1 is a multiple of vector
> > length, we end up creating following encoding in result: { base_elem,
> > arg[0], arg[1], ... } (assuming S = 1),
> > where arg is chosen input vector, which is incorrect, since the
> > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> >
> > For the test-case mentioned in PR, vectorizer pass creates
> > VEC_PERM_EXPR<arg0, arg, sel> where:
> > arg0: { -16, -9, -10, -11 }
> > arg1: { -12, -5, -6, -7 }
> > sel = { 3, 4, 5, 6 }
> >
> > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > following encoding:
> > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
> >       = { -11, -12, -5 }
> >
> > So for res[3], it used S = (-5) - (-12) = 7
> > And hence computed it as -5 + 7 = 2.
> > instead of selecting arg1[2], ie, -6.
> >
> > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
> > of vector length, so a1 ... ae select elements only from stepped part
> > of the pattern
> > from input vector and return false for this case.
> >
> > Since the vectors are VLS, fold_vec_perm_cst then sets:
> > res_npatterns = res_nelts
> > res_nelts_per_pattern  = 1
> > which seems to fix the issue by encoding all the elements.
> >
> > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> > which used a1 = 0, and thus selected arg1[0].
> >
> > I removed Case 4 because it was already covered in test_nunits_min_4,
> > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> > and added a new Case 9 to test for this issue.
> >
> > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> > and on x86_64-linux-gnu.
> > Does the patch look OK ?
> >
> > Thanks,
> > Prathamesh
> >
> > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> >
> > gcc/ChangeLog:
> >       PR tree-optimization/111648
> >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> >       is a multiple of vector length.
> >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> >       (test_nunits_min_4): ... here and rename case numbers. Also add
> >       Case 9.
> >
> > gcc/testsuite/ChangeLog:
> >       PR tree-optimization/111648
> >       * gcc.dg/vect/pr111648.c: New test.
> >
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 4f8561509ff..c5f421d6b76 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >         return false;
> >       }
> >
> > -      /* Ensure that the stepped sequence always selects from the same
> > -      input pattern.  */
> > +      /* Ensure that the stepped sequence always selects from the stepped
> > +      part of same input pattern.  */
> >        unsigned arg_npatterns
> >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >                         : VECTOR_CST_NPATTERNS (arg1);
> > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >           *reason = "step is not multiple of npatterns";
> >         return false;
> >       }
> > +
> > +      /* If a1 is a multiple of len, it will select base element of input
> > +      vector resulting in following encoding:
> > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> > +      vector. This encoding is not originally present in arg, since it's
> > +      defined as:
> > +      { arg[0], arg[1], arg[2], ... }.  */
> > +
> > +      if (multiple_p (a1, arg_len))
> > +     {
> > +       if (reason)
> > +         *reason = "selecting base element of input vector";
> > +       return false;
> > +     }
>
> That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> second argument has 2 stepped patterns.
Ah right, thanks for pointing out. In the attached patch I extended the check
so that r1 < arg_npatterns which should check if we are choosing base
elements from any of the patterns in arg (and not just first).
Does that look OK ?
>
> The equivalent condition that handles multiple patterns would
> probably be to reject q1 < arg_npatterns.  But that's only necessary if:
Sorry, I don't understand -- we use q1 only for determining which
vector to choose from,
and r1 will give the index for first element ?
>
> (1) the argument has three elements per pattern (i.e. has a stepped
>     sequence) and
>
> (2) element 2 - element 1 != element 1 - element 0
>
> I think we should check those to avoid pessimising VLA cases.
Thanks for the suggestions. In attached POC patch (stage-1 tested), I
added the above checks,
does it look in the right direction ? Also, should this patch be the
right fix for PR111754 ?

Thanks,
Prathamesh
>
> Thanks,
> Richard
>
> >      }
> >
> >    return true;
> > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
> >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> >       validate_res (2, 2, res, expected_res);
> >        }
> > -
> > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> > -      Test that the stepped sequence of the pattern selects from
> > -      same input pattern. Since input vectors have npatterns = 2,
> > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
> > -      in input vector. So return NULL_TREE.  */
> > -      {
> > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > -
> > -     vec_perm_builder builder (len, 1, 3);
> > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
> > -     builder_push_elems (builder, mask_elems);
> > -
> > -     vec_perm_indices sel (builder, 2, len);
> > -     const char *reason;
> > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> > -                                   &reason);
> > -     ASSERT_TRUE (res == NULL_TREE);
> > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> > -      }
> > -
> > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
> > -      Test that stepped sequence of the pattern selects from arg0.
> > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> > -      {
> > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > -
> > -     vec_perm_builder builder (len, 1, 3);
> > -     poly_uint64 mask_elems[] = { len, 0, 1 };
> > -     builder_push_elems (builder, mask_elems);
> > -
> > -     vec_perm_indices sel (builder, 2, len);
> > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > -
> > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> > -     validate_res (1, 3, res, expected_res);
> > -      }
> >      }
> >  }
> >
> > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
> >       validate_res (1, 3, res, expected_res);
> >        }
> >
> > -      /* Case 4:
> > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
> > +      Test that stepped sequence of the pattern selects from arg0.
> > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { len, 1, 2 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > +
> > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
> > +     validate_res (1, 3, res, expected_res);
> > +      }
> > +
> > +      /* Case 5:
> >       sel = { len, 0, 2, ... } // (1, 3)
> >       This should return NULL because we cross the input vectors.
> >       Because,
> > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
> >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> >        }
> >
> > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
> >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
> >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> >
> > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
> >       validate_res (2, 2, res, expected_res);
> >        }
> >
> > -      /* Case 6: Test combination in sel, where one pattern is dup and other
> > +      /* Case 7: Test combination in sel, where one pattern is dup and other
> >        is stepped sequence.
> >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> >        res = { arg0[0], arg0[0], arg0[0],
> > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
> >       validate_res (2, 3, res, expected_res);
> >        }
> >
> > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
> > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
> >        when arg0, arg1 and sel have different number of patterns.
> >        arg0 is of shape (1, 1)
> >        arg1 is of shape (4, 1)
> > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
> >       ASSERT_TRUE (res == NULL_TREE);
> >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> >        }
> > +
> > +      /* Case 9: PR111648 - a1 is multiple of vector length,
> > +      which results in incorrect encoding. Verify that we return
> > +      NULL for this case.
> > +      sel = { base_elem, len, len+1, ... } // (1, 3)
> > +      In this case, the single pattern is: { base_elem len, len+1, ...}
> > +      Let's assume that base_elem is used for indexing into arg0,
> > +      and a1 ... ae chooses elements from arg1.
> > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
> > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
> > +      while the original encoding in arg1 is
> > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
> > +      with S = arg1[2] - arg1[1].
> > +
> > +      As a concrete example, for above PR:
> > +      arg0: { -16, -9, -10, -11 }
> > +      arg1: { -12, -5, -6, -7 }
> > +      sel = { 3, 4, 5, 6 }
> > +
> > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > +      following encoding:
> > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
> > +          = { -11, -12, -5 }
> > +
> > +      So for res[3], it used S = (-5) - (-12) = 7
> > +      And hence computed it as -5 + 7 = 2.
> > +      instead of arg1[2], ie, -6, which is the correct value.
> > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     const char *reason;
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> > +     ASSERT_TRUE (res == NULL_TREE);
> > +     ASSERT_TRUE (!strcmp (reason,
> > +                           "selecting base element of input vector"));
> > +      }
> >      }
> >  }
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > new file mode 100644
> > index 00000000000..093e2b02654
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +int a;
> > +int *b = &a;
> > +static int **c = &b;
> > +static int d;
> > +short e;
> > +short f;
> > +
> > +_Bool foo ()
> > +{
> > +  f = -21;
> > +  for (; f < -5; f++) {
> > +    e = f ^ 3;
> > +    d = *b;
> > +    **c = e;
> > +  }
> > +
> > +  return d == -6;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..ae914cbd880 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10607,6 +10607,25 @@ vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
   return true;
 }
 
+static poly_int64
+vector_cst_elt_poly_index (tree arg, poly_uint64 index)
+{
+  int q;
+  poly_uint64 r;
+  poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
+
+  if (!can_div_trunc_p (index, len, &q, &r))
+    return NULL_TREE;
+
+  unsigned HOST_WIDE_INT i;
+  if (!r.is_constant (&i))
+    return NULL_TREE;
+
+  tree elem = vector_cst_elt (arg, i);
+  gcc_assert (elem != NULL_TREE);
+  return tree_to_poly_int64 (elem);
+}
+
 /* Helper routine for fold_vec_perm_cst to check if SEL is a suitable
    mask for VLA vec_perm folding.
    REASON if specified, will contain the reason why SEL is not suitable.
@@ -10684,9 +10703,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 
       /* Ensure that the stepped sequence always selects from the same
 	 input pattern.  */
-      unsigned arg_npatterns
-	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
-			  : VECTOR_CST_NPATTERNS (arg1);
+      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
+      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
 
       if (!multiple_p (step, arg_npatterns))
 	{
@@ -10694,6 +10712,21 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 	    *reason = "step is not multiple of npatterns";
 	  return false;
 	}
+
+      /* Ensure that a1 only selects from stepped part of the pattern from arg,
+	 if the pattern is not a natural stepped sequence, ie,
+	 ((a2 - a1) != (a1 - a0)).  */
+
+      poly_int64 arg_elem0 = vector_cst_elt_poly_index (arg, sel[pattern]);
+      poly_int64 arg_elem1 = vector_cst_elt_poly_index (arg, a1);
+      poly_int64 arg_elem2 = vector_cst_elt_poly_index (arg, a2);
+      if (!known_eq (arg_elem2 - arg_elem1, arg_elem1 - arg_elem0)
+	  && known_lt (r1, arg_npatterns))
+	{
+	  if (reason)
+	    *reason = "choosing base element from input vector";
+	  return false;
+	}
     }
 
   return true;
Prathamesh Kulkarni Oct. 11, 2023, 11:27 a.m. UTC | #3
On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > Hi,
> > > The attached patch attempts to fix PR111648.
> > > As mentioned in PR, the issue is when a1 is a multiple of vector
> > > length, we end up creating following encoding in result: { base_elem,
> > > arg[0], arg[1], ... } (assuming S = 1),
> > > where arg is chosen input vector, which is incorrect, since the
> > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> > >
> > > For the test-case mentioned in PR, vectorizer pass creates
> > > VEC_PERM_EXPR<arg0, arg, sel> where:
> > > arg0: { -16, -9, -10, -11 }
> > > arg1: { -12, -5, -6, -7 }
> > > sel = { 3, 4, 5, 6 }
> > >
> > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > > following encoding:
> > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
> > >       = { -11, -12, -5 }
> > >
> > > So for res[3], it used S = (-5) - (-12) = 7
> > > And hence computed it as -5 + 7 = 2.
> > > instead of selecting arg1[2], ie, -6.
> > >
> > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
> > > of vector length, so a1 ... ae select elements only from stepped part
> > > of the pattern
> > > from input vector and return false for this case.
> > >
> > > Since the vectors are VLS, fold_vec_perm_cst then sets:
> > > res_npatterns = res_nelts
> > > res_nelts_per_pattern  = 1
> > > which seems to fix the issue by encoding all the elements.
> > >
> > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> > > which used a1 = 0, and thus selected arg1[0].
> > >
> > > I removed Case 4 because it was already covered in test_nunits_min_4,
> > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> > > and added a new Case 9 to test for this issue.
> > >
> > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> > > and on x86_64-linux-gnu.
> > > Does the patch look OK ?
> > >
> > > Thanks,
> > > Prathamesh
> > >
> > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> > >
> > > gcc/ChangeLog:
> > >       PR tree-optimization/111648
> > >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> > >       is a multiple of vector length.
> > >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> > >       (test_nunits_min_4): ... here and rename case numbers. Also add
> > >       Case 9.
> > >
> > > gcc/testsuite/ChangeLog:
> > >       PR tree-optimization/111648
> > >       * gcc.dg/vect/pr111648.c: New test.
> > >
> > >
> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > index 4f8561509ff..c5f421d6b76 100644
> > > --- a/gcc/fold-const.cc
> > > +++ b/gcc/fold-const.cc
> > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> > >         return false;
> > >       }
> > >
> > > -      /* Ensure that the stepped sequence always selects from the same
> > > -      input pattern.  */
> > > +      /* Ensure that the stepped sequence always selects from the stepped
> > > +      part of same input pattern.  */
> > >        unsigned arg_npatterns
> > >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > >                         : VECTOR_CST_NPATTERNS (arg1);
> > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> > >           *reason = "step is not multiple of npatterns";
> > >         return false;
> > >       }
> > > +
> > > +      /* If a1 is a multiple of len, it will select base element of input
> > > +      vector resulting in following encoding:
> > > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> > > +      vector. This encoding is not originally present in arg, since it's
> > > +      defined as:
> > > +      { arg[0], arg[1], arg[2], ... }.  */
> > > +
> > > +      if (multiple_p (a1, arg_len))
> > > +     {
> > > +       if (reason)
> > > +         *reason = "selecting base element of input vector";
> > > +       return false;
> > > +     }
> >
> > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> > second argument has 2 stepped patterns.
> Ah right, thanks for pointing out. In the attached patch I extended the check
> so that r1 < arg_npatterns which should check if we are choosing base
> elements from any of the patterns in arg (and not just first).
> Does that look OK ?
> >
> > The equivalent condition that handles multiple patterns would
> > probably be to reject q1 < arg_npatterns.  But that's only necessary if:
> Sorry, I don't understand -- we use q1 only for determining which
> vector to choose from,
> and r1 will give the index for first element ?
> >
> > (1) the argument has three elements per pattern (i.e. has a stepped
> >     sequence) and
> >
> > (2) element 2 - element 1 != element 1 - element 0
> >
> > I think we should check those to avoid pessimising VLA cases.
> Thanks for the suggestions. In attached POC patch (stage-1 tested), I
> added the above checks,
> does it look in the right direction ? Also, should this patch be the
> right fix for PR111754 ?
Oops sorry, this patch causes build errors on aarch64. Please ignore it.
Sorry for the noise.

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
> >
> > >      }
> > >
> > >    return true;
> > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
> > >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> > >       validate_res (2, 2, res, expected_res);
> > >        }
> > > -
> > > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> > > -      Test that the stepped sequence of the pattern selects from
> > > -      same input pattern. Since input vectors have npatterns = 2,
> > > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
> > > -      in input vector. So return NULL_TREE.  */
> > > -      {
> > > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> > > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > -
> > > -     vec_perm_builder builder (len, 1, 3);
> > > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
> > > -     builder_push_elems (builder, mask_elems);
> > > -
> > > -     vec_perm_indices sel (builder, 2, len);
> > > -     const char *reason;
> > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> > > -                                   &reason);
> > > -     ASSERT_TRUE (res == NULL_TREE);
> > > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> > > -      }
> > > -
> > > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
> > > -      Test that stepped sequence of the pattern selects from arg0.
> > > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> > > -      {
> > > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > -
> > > -     vec_perm_builder builder (len, 1, 3);
> > > -     poly_uint64 mask_elems[] = { len, 0, 1 };
> > > -     builder_push_elems (builder, mask_elems);
> > > -
> > > -     vec_perm_indices sel (builder, 2, len);
> > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > > -
> > > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> > > -     validate_res (1, 3, res, expected_res);
> > > -      }
> > >      }
> > >  }
> > >
> > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
> > >       validate_res (1, 3, res, expected_res);
> > >        }
> > >
> > > -      /* Case 4:
> > > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
> > > +      Test that stepped sequence of the pattern selects from arg0.
> > > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
> > > +      {
> > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > +
> > > +     vec_perm_builder builder (len, 1, 3);
> > > +     poly_uint64 mask_elems[] = { len, 1, 2 };
> > > +     builder_push_elems (builder, mask_elems);
> > > +
> > > +     vec_perm_indices sel (builder, 2, len);
> > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > > +
> > > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
> > > +     validate_res (1, 3, res, expected_res);
> > > +      }
> > > +
> > > +      /* Case 5:
> > >       sel = { len, 0, 2, ... } // (1, 3)
> > >       This should return NULL because we cross the input vectors.
> > >       Because,
> > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
> > >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> > >        }
> > >
> > > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> > > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
> > >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
> > >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> > >
> > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
> > >       validate_res (2, 2, res, expected_res);
> > >        }
> > >
> > > -      /* Case 6: Test combination in sel, where one pattern is dup and other
> > > +      /* Case 7: Test combination in sel, where one pattern is dup and other
> > >        is stepped sequence.
> > >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> > >        res = { arg0[0], arg0[0], arg0[0],
> > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
> > >       validate_res (2, 3, res, expected_res);
> > >        }
> > >
> > > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
> > > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
> > >        when arg0, arg1 and sel have different number of patterns.
> > >        arg0 is of shape (1, 1)
> > >        arg1 is of shape (4, 1)
> > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
> > >       ASSERT_TRUE (res == NULL_TREE);
> > >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> > >        }
> > > +
> > > +      /* Case 9: PR111648 - a1 is multiple of vector length,
> > > +      which results in incorrect encoding. Verify that we return
> > > +      NULL for this case.
> > > +      sel = { base_elem, len, len+1, ... } // (1, 3)
> > > +      In this case, the single pattern is: { base_elem len, len+1, ...}
> > > +      Let's assume that base_elem is used for indexing into arg0,
> > > +      and a1 ... ae chooses elements from arg1.
> > > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
> > > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
> > > +      while the original encoding in arg1 is
> > > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
> > > +      with S = arg1[2] - arg1[1].
> > > +
> > > +      As a concrete example, for above PR:
> > > +      arg0: { -16, -9, -10, -11 }
> > > +      arg1: { -12, -5, -6, -7 }
> > > +      sel = { 3, 4, 5, 6 }
> > > +
> > > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > > +      following encoding:
> > > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
> > > +          = { -11, -12, -5 }
> > > +
> > > +      So for res[3], it used S = (-5) - (-12) = 7
> > > +      And hence computed it as -5 + 7 = 2.
> > > +      instead of arg1[2], ie, -6, which is the correct value.
> > > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
> > > +      {
> > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
> > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
> > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > +
> > > +     vec_perm_builder builder (len, 1, 3);
> > > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > > +     builder_push_elems (builder, mask_elems);
> > > +
> > > +     vec_perm_indices sel (builder, 2, len);
> > > +     const char *reason;
> > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> > > +     ASSERT_TRUE (res == NULL_TREE);
> > > +     ASSERT_TRUE (!strcmp (reason,
> > > +                           "selecting base element of input vector"));
> > > +      }
> > >      }
> > >  }
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > > new file mode 100644
> > > index 00000000000..093e2b02654
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > > @@ -0,0 +1,23 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +
> > > +int a;
> > > +int *b = &a;
> > > +static int **c = &b;
> > > +static int d;
> > > +short e;
> > > +short f;
> > > +
> > > +_Bool foo ()
> > > +{
> > > +  f = -21;
> > > +  for (; f < -5; f++) {
> > > +    e = f ^ 3;
> > > +    d = *b;
> > > +    **c = e;
> > > +  }
> > > +
> > > +  return d == -6;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
Prathamesh Kulkarni Oct. 12, 2023, 9:54 a.m. UTC | #4
On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> > >
> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > > Hi,
> > > > The attached patch attempts to fix PR111648.
> > > > As mentioned in PR, the issue is when a1 is a multiple of vector
> > > > length, we end up creating following encoding in result: { base_elem,
> > > > arg[0], arg[1], ... } (assuming S = 1),
> > > > where arg is chosen input vector, which is incorrect, since the
> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> > > >
> > > > For the test-case mentioned in PR, vectorizer pass creates
> > > > VEC_PERM_EXPR<arg0, arg, sel> where:
> > > > arg0: { -16, -9, -10, -11 }
> > > > arg1: { -12, -5, -6, -7 }
> > > > sel = { 3, 4, 5, 6 }
> > > >
> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > > > following encoding:
> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
> > > >       = { -11, -12, -5 }
> > > >
> > > > So for res[3], it used S = (-5) - (-12) = 7
> > > > And hence computed it as -5 + 7 = 2.
> > > > instead of selecting arg1[2], ie, -6.
> > > >
> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
> > > > of vector length, so a1 ... ae select elements only from stepped part
> > > > of the pattern
> > > > from input vector and return false for this case.
> > > >
> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
> > > > res_npatterns = res_nelts
> > > > res_nelts_per_pattern  = 1
> > > > which seems to fix the issue by encoding all the elements.
> > > >
> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> > > > which used a1 = 0, and thus selected arg1[0].
> > > >
> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> > > > and added a new Case 9 to test for this issue.
> > > >
> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> > > > and on x86_64-linux-gnu.
> > > > Does the patch look OK ?
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> > > >
> > > > gcc/ChangeLog:
> > > >       PR tree-optimization/111648
> > > >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> > > >       is a multiple of vector length.
> > > >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> > > >       (test_nunits_min_4): ... here and rename case numbers. Also add
> > > >       Case 9.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >       PR tree-optimization/111648
> > > >       * gcc.dg/vect/pr111648.c: New test.
> > > >
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 4f8561509ff..c5f421d6b76 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> > > >         return false;
> > > >       }
> > > >
> > > > -      /* Ensure that the stepped sequence always selects from the same
> > > > -      input pattern.  */
> > > > +      /* Ensure that the stepped sequence always selects from the stepped
> > > > +      part of same input pattern.  */
> > > >        unsigned arg_npatterns
> > > >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > > >                         : VECTOR_CST_NPATTERNS (arg1);
> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> > > >           *reason = "step is not multiple of npatterns";
> > > >         return false;
> > > >       }
> > > > +
> > > > +      /* If a1 is a multiple of len, it will select base element of input
> > > > +      vector resulting in following encoding:
> > > > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> > > > +      vector. This encoding is not originally present in arg, since it's
> > > > +      defined as:
> > > > +      { arg[0], arg[1], arg[2], ... }.  */
> > > > +
> > > > +      if (multiple_p (a1, arg_len))
> > > > +     {
> > > > +       if (reason)
> > > > +         *reason = "selecting base element of input vector";
> > > > +       return false;
> > > > +     }
> > >
> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> > > second argument has 2 stepped patterns.
> > Ah right, thanks for pointing out. In the attached patch I extended the check
> > so that r1 < arg_npatterns which should check if we are choosing base
> > elements from any of the patterns in arg (and not just first).
> > Does that look OK ?
> > >
> > > The equivalent condition that handles multiple patterns would
> > > probably be to reject q1 < arg_npatterns.  But that's only necessary if:
> > Sorry, I don't understand -- we use q1 only for determining which
> > vector to choose from,
> > and r1 will give the index for first element ?
> > >
> > > (1) the argument has three elements per pattern (i.e. has a stepped
> > >     sequence) and
> > >
> > > (2) element 2 - element 1 != element 1 - element 0
> > >
> > > I think we should check those to avoid pessimising VLA cases.
> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I
> > added the above checks,
> > does it look in the right direction ? Also, should this patch be the
> > right fix for PR111754 ?
> Oops sorry, this patch causes build errors on aarch64. Please ignore it.
> Sorry for the noise.
Hi Richard,
The attached patch passes bootstrap+test on aarch64-linux-gnu with and
without SVE,
and on x86_64-linux-gnu.
Does it look OK ?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard
> > >
> > > >      }
> > > >
> > > >    return true;
> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
> > > >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> > > >       validate_res (2, 2, res, expected_res);
> > > >        }
> > > > -
> > > > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> > > > -      Test that the stepped sequence of the pattern selects from
> > > > -      same input pattern. Since input vectors have npatterns = 2,
> > > > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
> > > > -      in input vector. So return NULL_TREE.  */
> > > > -      {
> > > > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> > > > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > > -
> > > > -     vec_perm_builder builder (len, 1, 3);
> > > > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
> > > > -     builder_push_elems (builder, mask_elems);
> > > > -
> > > > -     vec_perm_indices sel (builder, 2, len);
> > > > -     const char *reason;
> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> > > > -                                   &reason);
> > > > -     ASSERT_TRUE (res == NULL_TREE);
> > > > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> > > > -      }
> > > > -
> > > > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
> > > > -      Test that stepped sequence of the pattern selects from arg0.
> > > > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> > > > -      {
> > > > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > > -
> > > > -     vec_perm_builder builder (len, 1, 3);
> > > > -     poly_uint64 mask_elems[] = { len, 0, 1 };
> > > > -     builder_push_elems (builder, mask_elems);
> > > > -
> > > > -     vec_perm_indices sel (builder, 2, len);
> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > > > -
> > > > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> > > > -     validate_res (1, 3, res, expected_res);
> > > > -      }
> > > >      }
> > > >  }
> > > >
> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
> > > >       validate_res (1, 3, res, expected_res);
> > > >        }
> > > >
> > > > -      /* Case 4:
> > > > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
> > > > +      Test that stepped sequence of the pattern selects from arg0.
> > > > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
> > > > +      {
> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > > +
> > > > +     vec_perm_builder builder (len, 1, 3);
> > > > +     poly_uint64 mask_elems[] = { len, 1, 2 };
> > > > +     builder_push_elems (builder, mask_elems);
> > > > +
> > > > +     vec_perm_indices sel (builder, 2, len);
> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > > > +
> > > > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
> > > > +     validate_res (1, 3, res, expected_res);
> > > > +      }
> > > > +
> > > > +      /* Case 5:
> > > >       sel = { len, 0, 2, ... } // (1, 3)
> > > >       This should return NULL because we cross the input vectors.
> > > >       Because,
> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
> > > >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> > > >        }
> > > >
> > > > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> > > > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
> > > >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
> > > >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> > > >
> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
> > > >       validate_res (2, 2, res, expected_res);
> > > >        }
> > > >
> > > > -      /* Case 6: Test combination in sel, where one pattern is dup and other
> > > > +      /* Case 7: Test combination in sel, where one pattern is dup and other
> > > >        is stepped sequence.
> > > >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> > > >        res = { arg0[0], arg0[0], arg0[0],
> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
> > > >       validate_res (2, 3, res, expected_res);
> > > >        }
> > > >
> > > > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
> > > > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
> > > >        when arg0, arg1 and sel have different number of patterns.
> > > >        arg0 is of shape (1, 1)
> > > >        arg1 is of shape (4, 1)
> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
> > > >       ASSERT_TRUE (res == NULL_TREE);
> > > >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> > > >        }
> > > > +
> > > > +      /* Case 9: PR111648 - a1 is multiple of vector length,
> > > > +      which results in incorrect encoding. Verify that we return
> > > > +      NULL for this case.
> > > > +      sel = { base_elem, len, len+1, ... } // (1, 3)
> > > > +      In this case, the single pattern is: { base_elem len, len+1, ...}
> > > > +      Let's assume that base_elem is used for indexing into arg0,
> > > > +      and a1 ... ae chooses elements from arg1.
> > > > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
> > > > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
> > > > +      while the original encoding in arg1 is
> > > > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
> > > > +      with S = arg1[2] - arg1[1].
> > > > +
> > > > +      As a concrete example, for above PR:
> > > > +      arg0: { -16, -9, -10, -11 }
> > > > +      arg1: { -12, -5, -6, -7 }
> > > > +      sel = { 3, 4, 5, 6 }
> > > > +
> > > > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> > > > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
> > > > +      following encoding:
> > > > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
> > > > +          = { -11, -12, -5 }
> > > > +
> > > > +      So for res[3], it used S = (-5) - (-12) = 7
> > > > +      And hence computed it as -5 + 7 = 2.
> > > > +      instead of arg1[2], ie, -6, which is the correct value.
> > > > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
> > > > +      {
> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > > > +
> > > > +     vec_perm_builder builder (len, 1, 3);
> > > > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > > > +     builder_push_elems (builder, mask_elems);
> > > > +
> > > > +     vec_perm_indices sel (builder, 2, len);
> > > > +     const char *reason;
> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> > > > +     ASSERT_TRUE (res == NULL_TREE);
> > > > +     ASSERT_TRUE (!strcmp (reason,
> > > > +                           "selecting base element of input vector"));
> > > > +      }
> > > >      }
> > > >  }
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > > > new file mode 100644
> > > > index 00000000000..093e2b02654
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > > > @@ -0,0 +1,23 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +
> > > > +int a;
> > > > +int *b = &a;
> > > > +static int **c = &b;
> > > > +static int d;
> > > > +short e;
> > > > +short f;
> > > > +
> > > > +_Bool foo ()
> > > > +{
> > > > +  f = -21;
> > > > +  for (; f < -5; f++) {
> > > > +    e = f ^ 3;
> > > > +    d = *b;
> > > > +    **c = e;
> > > > +  }
> > > > +
> > > > +  return d == -6;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..55a6a68c16c 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 
       /* Ensure that the stepped sequence always selects from the same
 	 input pattern.  */
-      unsigned arg_npatterns
-	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
-			  : VECTOR_CST_NPATTERNS (arg1);
+      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
+      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
 
       if (!multiple_p (step, arg_npatterns))
 	{
@@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 	    *reason = "step is not multiple of npatterns";
 	  return false;
 	}
+
+      /* If a1 chooses base element from arg, ensure that it's a natural
+	 stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
+	 to preserve arg's encoding.  */
+
+      unsigned HOST_WIDE_INT index;
+      if (!r1.is_constant (&index))
+	return false;
+      if (index < arg_npatterns)
+	{
+	  tree arg_elem0 = vector_cst_elt (arg, index);
+	  tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
+	  tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
+
+	  if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
+				const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
+				0))
+	    {
+	      if (reason)
+		*reason = "not a natural stepped sequence";
+	      return false;
+	    }
+	}
     }
 
   return true;
@@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
 static tree
 build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
 		    unsigned nelts_per_pattern,
-		    int step = 0, int threshold = 100)
+		    int step = 0, bool natural_stepped = false,
+		    int threshold = 100)
 {
   tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
   tree vectype = build_vector_type_for_mode (inner_type, vmode);
@@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
 
   // Fill a1 for each pattern
   for (unsigned i = 0; i < npatterns; i++)
-    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
-
+    {
+      tree a1;
+      if (natural_stepped)
+	{
+	  tree a0 = builder[i];
+	  wide_int a0_val = wi::to_wide (a0);
+	  wide_int a1_val = a0_val + step;
+	  a1 = wide_int_to_tree (inner_type, a1_val);
+	}
+      else
+	a1 = build_int_cst (inner_type, rand () % threshold);
+      builder.quick_push (a1);
+    }
   if (nelts_per_pattern == 2)
     return builder.build ();
 
   for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
     {
       tree prev_elem = builder[i - npatterns];
-      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
-      int val = prev_elem_val + step;
-      builder.quick_push (build_int_cst (inner_type, val));
+      wide_int prev_elem_val = wi::to_wide (prev_elem);
+      wide_int val = prev_elem_val + step;
+      builder.quick_push (wide_int_to_tree (inner_type, val));
     }
 
   return builder.build ();
@@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
 	 and step (a2 - a1) = 1, step is not a multiple of npatterns
 	 in input vector. So return NULL_TREE.  */
       {
-	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
+	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
 	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
 	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
 
@@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
 	 Test that stepped sequence of the pattern selects from arg0.
 	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
       {
-	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
 	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
 	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
 
@@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
 	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
 	validate_res (1, 3, res, expected_res);
       }
+
+      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
+	 In this case ensure that arg has a natural stepped sequence
+	 to preserve arg's encoding.
+
+	 As a concrete example, consider:
+	 arg0: { -16, -9, -10, ... } // (1, 3)
+	 arg1: { -12, -5, -6, ... }  // (1, 3)
+	 sel = { 0, len, len + 1, ... } // (1, 3)
+
+	 This will create res with following encoding:
+	 res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
+	     = { -16, -12, -5, ... }
+
+	 The step in above encoding would be: (-5) - (-12) = 7
+	 And hence res[3] would be computed as -5 + 7 = 2.
+	 instead of arg1[2], ie, -6.
+	 Ensure that valid_mask_for_fold_vec_perm_cst returns false
+	 for this case.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { 0, len, len+1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	const char *reason;
+	/* FIXME: It may happen that build_vec_cst_rand may build a natural
+	   stepped pattern, even if we didn't explicitly tell it to. So folding
+	   may not always fail, but if it does, ensure that's because arg1 does
+	   not have a natural stepped sequence (and not due to other reason)  */
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
+	if (res == NULL_TREE)
+	  ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
+      }
+
+      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
+	 sequence and thus folding should be valid for this case.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { 0, len, len+1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+	tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
+	validate_res (1, 3, res, expected_res);
+      }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
new file mode 100644
index 00000000000..093e2b02654
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int a;
+int *b = &a;
+static int **c = &b;
+static int d;
+short e;
+short f;
+
+_Bool foo ()
+{
+  f = -21;
+  for (; f < -5; f++) {
+    e = f ^ 3;
+    d = *b;
+    **c = e;
+  }
+
+  return d == -6;
+}
+
+/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
Richard Sandiford Oct. 16, 2023, 9:10 p.m. UTC | #5
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
>>
>> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> >
>> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> > >
>> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > > > Hi,
>> > > > The attached patch attempts to fix PR111648.
>> > > > As mentioned in PR, the issue is when a1 is a multiple of vector
>> > > > length, we end up creating following encoding in result: { base_elem,
>> > > > arg[0], arg[1], ... } (assuming S = 1),
>> > > > where arg is chosen input vector, which is incorrect, since the
>> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
>> > > >
>> > > > For the test-case mentioned in PR, vectorizer pass creates
>> > > > VEC_PERM_EXPR<arg0, arg, sel> where:
>> > > > arg0: { -16, -9, -10, -11 }
>> > > > arg1: { -12, -5, -6, -7 }
>> > > > sel = { 3, 4, 5, 6 }
>> > > >
>> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
>> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
>> > > > following encoding:
>> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
>> > > >       = { -11, -12, -5 }
>> > > >
>> > > > So for res[3], it used S = (-5) - (-12) = 7
>> > > > And hence computed it as -5 + 7 = 2.
>> > > > instead of selecting arg1[2], ie, -6.
>> > > >
>> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
>> > > > of vector length, so a1 ... ae select elements only from stepped part
>> > > > of the pattern
>> > > > from input vector and return false for this case.
>> > > >
>> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
>> > > > res_npatterns = res_nelts
>> > > > res_nelts_per_pattern  = 1
>> > > > which seems to fix the issue by encoding all the elements.
>> > > >
>> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
>> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
>> > > > which used a1 = 0, and thus selected arg1[0].
>> > > >
>> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
>> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
>> > > > and added a new Case 9 to test for this issue.
>> > > >
>> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
>> > > > and on x86_64-linux-gnu.
>> > > > Does the patch look OK ?
>> > > >
>> > > > Thanks,
>> > > > Prathamesh
>> > > >
>> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
>> > > >
>> > > > gcc/ChangeLog:
>> > > >       PR tree-optimization/111648
>> > > >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
>> > > >       is a multiple of vector length.
>> > > >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
>> > > >       (test_nunits_min_4): ... here and rename case numbers. Also add
>> > > >       Case 9.
>> > > >
>> > > > gcc/testsuite/ChangeLog:
>> > > >       PR tree-optimization/111648
>> > > >       * gcc.dg/vect/pr111648.c: New test.
>> > > >
>> > > >
>> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > > > index 4f8561509ff..c5f421d6b76 100644
>> > > > --- a/gcc/fold-const.cc
>> > > > +++ b/gcc/fold-const.cc
>> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > >         return false;
>> > > >       }
>> > > >
>> > > > -      /* Ensure that the stepped sequence always selects from the same
>> > > > -      input pattern.  */
>> > > > +      /* Ensure that the stepped sequence always selects from the stepped
>> > > > +      part of same input pattern.  */
>> > > >        unsigned arg_npatterns
>> > > >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>> > > >                         : VECTOR_CST_NPATTERNS (arg1);
>> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > >           *reason = "step is not multiple of npatterns";
>> > > >         return false;
>> > > >       }
>> > > > +
>> > > > +      /* If a1 is a multiple of len, it will select base element of input
>> > > > +      vector resulting in following encoding:
>> > > > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
>> > > > +      vector. This encoding is not originally present in arg, since it's
>> > > > +      defined as:
>> > > > +      { arg[0], arg[1], arg[2], ... }.  */
>> > > > +
>> > > > +      if (multiple_p (a1, arg_len))
>> > > > +     {
>> > > > +       if (reason)
>> > > > +         *reason = "selecting base element of input vector";
>> > > > +       return false;
>> > > > +     }
>> > >
>> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
>> > > second argument has 2 stepped patterns.
>> > Ah right, thanks for pointing out. In the attached patch I extended the check
>> > so that r1 < arg_npatterns which should check if we are choosing base
>> > elements from any of the patterns in arg (and not just first).
>> > Does that look OK ?
>> > >
>> > > The equivalent condition that handles multiple patterns would
>> > > probably be to reject q1 < arg_npatterns.  But that's only necessary if:
>> > Sorry, I don't understand -- we use q1 only for determining which
>> > vector to choose from,
>> > and r1 will give the index for first element ?
>> > >
>> > > (1) the argument has three elements per pattern (i.e. has a stepped
>> > >     sequence) and
>> > >
>> > > (2) element 2 - element 1 != element 1 - element 0
>> > >
>> > > I think we should check those to avoid pessimising VLA cases.
>> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I
>> > added the above checks,
>> > does it look in the right direction ? Also, should this patch be the
>> > right fix for PR111754 ?
>> Oops sorry, this patch causes build errors on aarch64. Please ignore it.
>> Sorry for the noise.
> Hi Richard,
> The attached patch passes bootstrap+test on aarch64-linux-gnu with and
> without SVE,
> and on x86_64-linux-gnu.
> Does it look OK ?
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Prathamesh
>> > >
>> > > Thanks,
>> > > Richard
>> > >
>> > > >      }
>> > > >
>> > > >    return true;
>> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
>> > > >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
>> > > >       validate_res (2, 2, res, expected_res);
>> > > >        }
>> > > > -
>> > > > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
>> > > > -      Test that the stepped sequence of the pattern selects from
>> > > > -      same input pattern. Since input vectors have npatterns = 2,
>> > > > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
>> > > > -      in input vector. So return NULL_TREE.  */
>> > > > -      {
>> > > > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > > > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > -
>> > > > -     vec_perm_builder builder (len, 1, 3);
>> > > > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
>> > > > -     builder_push_elems (builder, mask_elems);
>> > > > -
>> > > > -     vec_perm_indices sel (builder, 2, len);
>> > > > -     const char *reason;
>> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
>> > > > -                                   &reason);
>> > > > -     ASSERT_TRUE (res == NULL_TREE);
>> > > > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
>> > > > -      }
>> > > > -
>> > > > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
>> > > > -      Test that stepped sequence of the pattern selects from arg0.
>> > > > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>> > > > -      {
>> > > > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > -
>> > > > -     vec_perm_builder builder (len, 1, 3);
>> > > > -     poly_uint64 mask_elems[] = { len, 0, 1 };
>> > > > -     builder_push_elems (builder, mask_elems);
>> > > > -
>> > > > -     vec_perm_indices sel (builder, 2, len);
>> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > > > -
>> > > > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>> > > > -     validate_res (1, 3, res, expected_res);
>> > > > -      }
>> > > >      }
>> > > >  }
>> > > >
>> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (1, 3, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 4:
>> > > > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
>> > > > +      Test that stepped sequence of the pattern selects from arg0.
>> > > > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
>> > > > +      {
>> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > +
>> > > > +     vec_perm_builder builder (len, 1, 3);
>> > > > +     poly_uint64 mask_elems[] = { len, 1, 2 };
>> > > > +     builder_push_elems (builder, mask_elems);
>> > > > +
>> > > > +     vec_perm_indices sel (builder, 2, len);
>> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > > > +
>> > > > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
>> > > > +     validate_res (1, 3, res, expected_res);
>> > > > +      }
>> > > > +
>> > > > +      /* Case 5:
>> > > >       sel = { len, 0, 2, ... } // (1, 3)
>> > > >       This should return NULL because we cross the input vectors.
>> > > >       Because,
>> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
>> > > >        }
>> > > >
>> > > > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
>> > > > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
>> > > >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
>> > > >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
>> > > >
>> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (2, 2, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 6: Test combination in sel, where one pattern is dup and other
>> > > > +      /* Case 7: Test combination in sel, where one pattern is dup and other
>> > > >        is stepped sequence.
>> > > >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
>> > > >        res = { arg0[0], arg0[0], arg0[0],
>> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       validate_res (2, 3, res, expected_res);
>> > > >        }
>> > > >
>> > > > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
>> > > > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
>> > > >        when arg0, arg1 and sel have different number of patterns.
>> > > >        arg0 is of shape (1, 1)
>> > > >        arg1 is of shape (4, 1)
>> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
>> > > >       ASSERT_TRUE (res == NULL_TREE);
>> > > >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
>> > > >        }
>> > > > +
>> > > > +      /* Case 9: PR111648 - a1 is multiple of vector length,
>> > > > +      which results in incorrect encoding. Verify that we return
>> > > > +      NULL for this case.
>> > > > +      sel = { base_elem, len, len+1, ... } // (1, 3)
>> > > > +      In this case, the single pattern is: { base_elem len, len+1, ...}
>> > > > +      Let's assume that base_elem is used for indexing into arg0,
>> > > > +      and a1 ... ae chooses elements from arg1.
>> > > > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
>> > > > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
>> > > > +      while the original encoding in arg1 is
>> > > > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
>> > > > +      with S = arg1[2] - arg1[1].
>> > > > +
>> > > > +      As a concrete example, for above PR:
>> > > > +      arg0: { -16, -9, -10, -11 }
>> > > > +      arg1: { -12, -5, -6, -7 }
>> > > > +      sel = { 3, 4, 5, 6 }
>> > > > +
>> > > > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
>> > > > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
>> > > > +      following encoding:
>> > > > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
>> > > > +          = { -11, -12, -5 }
>> > > > +
>> > > > +      So for res[3], it used S = (-5) - (-12) = 7
>> > > > +      And hence computed it as -5 + 7 = 2.
>> > > > +      instead of arg1[2], ie, -6, which is the correct value.
>> > > > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
>> > > > +      {
>> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
>> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
>> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > > > +
>> > > > +     vec_perm_builder builder (len, 1, 3);
>> > > > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
>> > > > +     builder_push_elems (builder, mask_elems);
>> > > > +
>> > > > +     vec_perm_indices sel (builder, 2, len);
>> > > > +     const char *reason;
>> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
>> > > > +     ASSERT_TRUE (res == NULL_TREE);
>> > > > +     ASSERT_TRUE (!strcmp (reason,
>> > > > +                           "selecting base element of input vector"));
>> > > > +      }
>> > > >      }
>> > > >  }
>> > > >
>> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > > > new file mode 100644
>> > > > index 00000000000..093e2b02654
>> > > > --- /dev/null
>> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > > > @@ -0,0 +1,23 @@
>> > > > +/* { dg-do compile } */
>> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> > > > +
>> > > > +int a;
>> > > > +int *b = &a;
>> > > > +static int **c = &b;
>> > > > +static int d;
>> > > > +short e;
>> > > > +short f;
>> > > > +
>> > > > +_Bool foo ()
>> > > > +{
>> > > > +  f = -21;
>> > > > +  for (; f < -5; f++) {
>> > > > +    e = f ^ 3;
>> > > > +    d = *b;
>> > > > +    **c = e;
>> > > > +  }
>> > > > +
>> > > > +  return d == -6;
>> > > > +}
>> > > > +
>> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 4f8561509ff..55a6a68c16c 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  
>        /* Ensure that the stepped sequence always selects from the same
>  	 input pattern.  */
> -      unsigned arg_npatterns
> -	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> -			  : VECTOR_CST_NPATTERNS (arg1);
> +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
>  
>        if (!multiple_p (step, arg_npatterns))
>  	{
> @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  	    *reason = "step is not multiple of npatterns";
>  	  return false;
>  	}
> +
> +      /* If a1 chooses base element from arg, ensure that it's a natural
> +	 stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> +	 to preserve arg's encoding.  */
> +
> +      unsigned HOST_WIDE_INT index;
> +      if (!r1.is_constant (&index))
> +	return false;
> +      if (index < arg_npatterns)
> +	{

I don't know whether it matters in practice, but I think the two conditions
above are more natural as:

    if (maybe_lt (r1, arg_npatterns))
      {
        unsigned HOST_WIDE_INT index;
        if (!r1.is_constant (&index))
          return false;

        ...[code below]...
      }        

> +	  tree arg_elem0 = vector_cst_elt (arg, index);
> +	  tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> +	  tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> +
> +	  if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
> +				const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
> +				0))

This needs to check whether const_binop returns null.  Maybe:

   tree step1, step2;
   if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
       || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
       || !operand_equal_p (step1, step2, 0))

OK with those changes, thanks.

Richard

> +	    {
> +	      if (reason)
> +		*reason = "not a natural stepped sequence";
> +	      return false;
> +	    }
> +	}
>      }
>  
>    return true;
> @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
>  static tree
>  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  		    unsigned nelts_per_pattern,
> -		    int step = 0, int threshold = 100)
> +		    int step = 0, bool natural_stepped = false,
> +		    int threshold = 100)
>  {
>    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
>    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  
>    // Fill a1 for each pattern
>    for (unsigned i = 0; i < npatterns; i++)
> -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> -
> +    {
> +      tree a1;
> +      if (natural_stepped)
> +	{
> +	  tree a0 = builder[i];
> +	  wide_int a0_val = wi::to_wide (a0);
> +	  wide_int a1_val = a0_val + step;
> +	  a1 = wide_int_to_tree (inner_type, a1_val);
> +	}
> +      else
> +	a1 = build_int_cst (inner_type, rand () % threshold);
> +      builder.quick_push (a1);
> +    }
>    if (nelts_per_pattern == 2)
>      return builder.build ();
>  
>    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
>      {
>        tree prev_elem = builder[i - npatterns];
> -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> -      int val = prev_elem_val + step;
> -      builder.quick_push (build_int_cst (inner_type, val));
> +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> +      wide_int val = prev_elem_val + step;
> +      builder.quick_push (wide_int_to_tree (inner_type, val));
>      }
>  
>    return builder.build ();
> @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 and step (a2 - a1) = 1, step is not a multiple of npatterns
>  	 in input vector. So return NULL_TREE.  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 Test that stepped sequence of the pattern selects from arg0.
>  	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
>  	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>  	validate_res (1, 3, res, expected_res);
>        }
> +
> +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> +	 In this case ensure that arg has a natural stepped sequence
> +	 to preserve arg's encoding.
> +
> +	 As a concrete example, consider:
> +	 arg0: { -16, -9, -10, ... } // (1, 3)
> +	 arg1: { -12, -5, -6, ... }  // (1, 3)
> +	 sel = { 0, len, len + 1, ... } // (1, 3)
> +
> +	 This will create res with following encoding:
> +	 res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> +	     = { -16, -12, -5, ... }
> +
> +	 The step in above encoding would be: (-5) - (-12) = 7
> +	 And hence res[3] would be computed as -5 + 7 = 2.
> +	 instead of arg1[2], ie, -6.
> +	 Ensure that valid_mask_for_fold_vec_perm_cst returns false
> +	 for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	const char *reason;
> +	/* FIXME: It may happen that build_vec_cst_rand may build a natural
> +	   stepped pattern, even if we didn't explicitly tell it to. So folding
> +	   may not always fail, but if it does, ensure that's because arg1 does
> +	   not have a natural stepped sequence (and not due to other reason)  */
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +	if (res == NULL_TREE)
> +	  ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> +      }
> +
> +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> +	 sequence and thus folding should be valid for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> +	validate_res (1, 3, res, expected_res);
> +      }
>      }
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> new file mode 100644
> index 00000000000..093e2b02654
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int a;
> +int *b = &a;
> +static int **c = &b;
> +static int d;
> +short e;
> +short f;
> +
> +_Bool foo ()
> +{
> +  f = -21;
> +  for (; f < -5; f++) {
> +    e = f ^ 3;
> +    d = *b;
> +    **c = e;
> +  }
> +
> +  return d == -6;
> +}
> +
> +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
Prathamesh Kulkarni Oct. 17, 2023, 6:53 p.m. UTC | #6
On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
> > <prathamesh.kulkarni@linaro.org> wrote:
> >>
> >> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> >
> >> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> >> > <richard.sandiford@arm.com> wrote:
> >> > >
> >> > > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > > > Hi,
> >> > > > The attached patch attempts to fix PR111648.
> >> > > > As mentioned in PR, the issue is when a1 is a multiple of vector
> >> > > > length, we end up creating following encoding in result: { base_elem,
> >> > > > arg[0], arg[1], ... } (assuming S = 1),
> >> > > > where arg is chosen input vector, which is incorrect, since the
> >> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> >> > > >
> >> > > > For the test-case mentioned in PR, vectorizer pass creates
> >> > > > VEC_PERM_EXPR<arg0, arg, sel> where:
> >> > > > arg0: { -16, -9, -10, -11 }
> >> > > > arg1: { -12, -5, -6, -7 }
> >> > > > sel = { 3, 4, 5, 6 }
> >> > > >
> >> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> >> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> >> > > > following encoding:
> >> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
> >> > > >       = { -11, -12, -5 }
> >> > > >
> >> > > > So for res[3], it used S = (-5) - (-12) = 7
> >> > > > And hence computed it as -5 + 7 = 2.
> >> > > > instead of selecting arg1[2], ie, -6.
> >> > > >
> >> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a multiple
> >> > > > of vector length, so a1 ... ae select elements only from stepped part
> >> > > > of the pattern
> >> > > > from input vector and return false for this case.
> >> > > >
> >> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
> >> > > > res_npatterns = res_nelts
> >> > > > res_nelts_per_pattern  = 1
> >> > > > which seems to fix the issue by encoding all the elements.
> >> > > >
> >> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> >> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
> >> > > > which used a1 = 0, and thus selected arg1[0].
> >> > > >
> >> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
> >> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> >> > > > and added a new Case 9 to test for this issue.
> >> > > >
> >> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> >> > > > and on x86_64-linux-gnu.
> >> > > > Does the patch look OK ?
> >> > > >
> >> > > > Thanks,
> >> > > > Prathamesh
> >> > > >
> >> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> >> > > >
> >> > > > gcc/ChangeLog:
> >> > > >       PR tree-optimization/111648
> >> > > >       * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> >> > > >       is a multiple of vector length.
> >> > > >       (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> >> > > >       (test_nunits_min_4): ... here and rename case numbers. Also add
> >> > > >       Case 9.
> >> > > >
> >> > > > gcc/testsuite/ChangeLog:
> >> > > >       PR tree-optimization/111648
> >> > > >       * gcc.dg/vect/pr111648.c: New test.
> >> > > >
> >> > > >
> >> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > > > index 4f8561509ff..c5f421d6b76 100644
> >> > > > --- a/gcc/fold-const.cc
> >> > > > +++ b/gcc/fold-const.cc
> >> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> > > >         return false;
> >> > > >       }
> >> > > >
> >> > > > -      /* Ensure that the stepped sequence always selects from the same
> >> > > > -      input pattern.  */
> >> > > > +      /* Ensure that the stepped sequence always selects from the stepped
> >> > > > +      part of same input pattern.  */
> >> > > >        unsigned arg_npatterns
> >> > > >       = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >> > > >                         : VECTOR_CST_NPATTERNS (arg1);
> >> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> > > >           *reason = "step is not multiple of npatterns";
> >> > > >         return false;
> >> > > >       }
> >> > > > +
> >> > > > +      /* If a1 is a multiple of len, it will select base element of input
> >> > > > +      vector resulting in following encoding:
> >> > > > +      { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> >> > > > +      vector. This encoding is not originally present in arg, since it's
> >> > > > +      defined as:
> >> > > > +      { arg[0], arg[1], arg[2], ... }.  */
> >> > > > +
> >> > > > +      if (multiple_p (a1, arg_len))
> >> > > > +     {
> >> > > > +       if (reason)
> >> > > > +         *reason = "selecting base element of input vector";
> >> > > > +       return false;
> >> > > > +     }
> >> > >
> >> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> >> > > second argument has 2 stepped patterns.
> >> > Ah right, thanks for pointing out. In the attached patch I extended the check
> >> > so that r1 < arg_npatterns which should check if we are choosing base
> >> > elements from any of the patterns in arg (and not just first).
> >> > Does that look OK ?
> >> > >
> >> > > The equivalent condition that handles multiple patterns would
> >> > > probably be to reject q1 < arg_npatterns.  But that's only necessary if:
> >> > Sorry, I don't understand -- we use q1 only for determining which
> >> > vector to choose from,
> >> > and r1 will give the index for first element ?
> >> > >
> >> > > (1) the argument has three elements per pattern (i.e. has a stepped
> >> > >     sequence) and
> >> > >
> >> > > (2) element 2 - element 1 != element 1 - element 0
> >> > >
> >> > > I think we should check those to avoid pessimising VLA cases.
> >> > Thanks for the suggestions. In attached POC patch (stage-1 tested), I
> >> > added the above checks,
> >> > does it look in the right direction ? Also, should this patch be the
> >> > right fix for PR111754 ?
> >> Oops sorry, this patch causes build errors on aarch64. Please ignore it.
> >> Sorry for the noise.
> > Hi Richard,
> > The attached patch passes bootstrap+test on aarch64-linux-gnu with and
> > without SVE,
> > and on x86_64-linux-gnu.
> > Does it look OK ?
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> > >
> >> > > Thanks,
> >> > > Richard
> >> > >
> >> > > >      }
> >> > > >
> >> > > >    return true;
> >> > > > @@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
> >> > > >       tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> >> > > >       validate_res (2, 2, res, expected_res);
> >> > > >        }
> >> > > > -
> >> > > > -      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> >> > > > -      Test that the stepped sequence of the pattern selects from
> >> > > > -      same input pattern. Since input vectors have npatterns = 2,
> >> > > > -      and step (a2 - a1) = 1, step is not a multiple of npatterns
> >> > > > -      in input vector. So return NULL_TREE.  */
> >> > > > -      {
> >> > > > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> > > > -     tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > > > -
> >> > > > -     vec_perm_builder builder (len, 1, 3);
> >> > > > -     poly_uint64 mask_elems[] = { 0, 0, 1 };
> >> > > > -     builder_push_elems (builder, mask_elems);
> >> > > > -
> >> > > > -     vec_perm_indices sel (builder, 2, len);
> >> > > > -     const char *reason;
> >> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> >> > > > -                                   &reason);
> >> > > > -     ASSERT_TRUE (res == NULL_TREE);
> >> > > > -     ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> >> > > > -      }
> >> > > > -
> >> > > > -      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
> >> > > > -      Test that stepped sequence of the pattern selects from arg0.
> >> > > > -      res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> >> > > > -      {
> >> > > > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > > > -     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > > > -     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > > > -
> >> > > > -     vec_perm_builder builder (len, 1, 3);
> >> > > > -     poly_uint64 mask_elems[] = { len, 0, 1 };
> >> > > > -     builder_push_elems (builder, mask_elems);
> >> > > > -
> >> > > > -     vec_perm_indices sel (builder, 2, len);
> >> > > > -     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> >> > > > -
> >> > > > -     tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> >> > > > -     validate_res (1, 3, res, expected_res);
> >> > > > -      }
> >> > > >      }
> >> > > >  }
> >> > > >
> >> > > > @@ -17528,7 +17501,26 @@ test_nunits_min_4 (machine_mode vmode)
> >> > > >       validate_res (1, 3, res, expected_res);
> >> > > >        }
> >> > > >
> >> > > > -      /* Case 4:
> >> > > > +      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
> >> > > > +      Test that stepped sequence of the pattern selects from arg0.
> >> > > > +      res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
> >> > > > +      {
> >> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > > > +
> >> > > > +     vec_perm_builder builder (len, 1, 3);
> >> > > > +     poly_uint64 mask_elems[] = { len, 1, 2 };
> >> > > > +     builder_push_elems (builder, mask_elems);
> >> > > > +
> >> > > > +     vec_perm_indices sel (builder, 2, len);
> >> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> >> > > > +
> >> > > > +     tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
> >> > > > +     validate_res (1, 3, res, expected_res);
> >> > > > +      }
> >> > > > +
> >> > > > +      /* Case 5:
> >> > > >       sel = { len, 0, 2, ... } // (1, 3)
> >> > > >       This should return NULL because we cross the input vectors.
> >> > > >       Because,
> >> > > > @@ -17561,7 +17553,7 @@ test_nunits_min_4 (machine_mode vmode)
> >> > > >       ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> >> > > >        }
> >> > > >
> >> > > > -      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> >> > > > +      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
> >> > > >        mask = { 0, len, 1, len + 1, ...} // (2, 2)
> >> > > >        res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> >> > > >
> >> > > > @@ -17583,7 +17575,7 @@ test_nunits_min_4 (machine_mode vmode)
> >> > > >       validate_res (2, 2, res, expected_res);
> >> > > >        }
> >> > > >
> >> > > > -      /* Case 6: Test combination in sel, where one pattern is dup and other
> >> > > > +      /* Case 7: Test combination in sel, where one pattern is dup and other
> >> > > >        is stepped sequence.
> >> > > >        sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> >> > > >        res = { arg0[0], arg0[0], arg0[0],
> >> > > > @@ -17605,7 +17597,7 @@ test_nunits_min_4 (machine_mode vmode)
> >> > > >       validate_res (2, 3, res, expected_res);
> >> > > >        }
> >> > > >
> >> > > > -      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
> >> > > > +      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
> >> > > >        when arg0, arg1 and sel have different number of patterns.
> >> > > >        arg0 is of shape (1, 1)
> >> > > >        arg1 is of shape (4, 1)
> >> > > > @@ -17634,6 +17626,51 @@ test_nunits_min_4 (machine_mode vmode)
> >> > > >       ASSERT_TRUE (res == NULL_TREE);
> >> > > >       ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> >> > > >        }
> >> > > > +
> >> > > > +      /* Case 9: PR111648 - a1 is multiple of vector length,
> >> > > > +      which results in incorrect encoding. Verify that we return
> >> > > > +      NULL for this case.
> >> > > > +      sel = { base_elem, len, len+1, ... } // (1, 3)
> >> > > > +      In this case, the single pattern is: { base_elem len, len+1, ...}
> >> > > > +      Let's assume that base_elem is used for indexing into arg0,
> >> > > > +      and a1 ... ae chooses elements from arg1.
> >> > > > +      So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
> >> > > > +      Which creates an incorrect encoding with S = arg1[1] - arg1[0]
> >> > > > +      while the original encoding in arg1 is
> >> > > > +      arg1: { arg1[0], arg1[1], arg1[2], ... }
> >> > > > +      with S = arg1[2] - arg1[1].
> >> > > > +
> >> > > > +      As a concrete example, for above PR:
> >> > > > +      arg0: { -16, -9, -10, -11 }
> >> > > > +      arg1: { -12, -5, -6, -7 }
> >> > > > +      sel = { 3, 4, 5, 6 }
> >> > > > +
> >> > > > +      arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> >> > > > +      Since a1 = 4 and arg_len = 4, it ended up creating the result with
> >> > > > +      following encoding:
> >> > > > +      res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
> >> > > > +          = { -11, -12, -5 }
> >> > > > +
> >> > > > +      So for res[3], it used S = (-5) - (-12) = 7
> >> > > > +      And hence computed it as -5 + 7 = 2.
> >> > > > +      instead of arg1[2], ie, -6, which is the correct value.
> >> > > > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
> >> > > > +      {
> >> > > > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3);
> >> > > > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3);
> >> > > > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > > > +
> >> > > > +     vec_perm_builder builder (len, 1, 3);
> >> > > > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> >> > > > +     builder_push_elems (builder, mask_elems);
> >> > > > +
> >> > > > +     vec_perm_indices sel (builder, 2, len);
> >> > > > +     const char *reason;
> >> > > > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> >> > > > +     ASSERT_TRUE (res == NULL_TREE);
> >> > > > +     ASSERT_TRUE (!strcmp (reason,
> >> > > > +                           "selecting base element of input vector"));
> >> > > > +      }
> >> > > >      }
> >> > > >  }
> >> > > >
> >> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> >> > > > new file mode 100644
> >> > > > index 00000000000..093e2b02654
> >> > > > --- /dev/null
> >> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> >> > > > @@ -0,0 +1,23 @@
> >> > > > +/* { dg-do compile } */
> >> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >> > > > +
> >> > > > +int a;
> >> > > > +int *b = &a;
> >> > > > +static int **c = &b;
> >> > > > +static int d;
> >> > > > +short e;
> >> > > > +short f;
> >> > > > +
> >> > > > +_Bool foo ()
> >> > > > +{
> >> > > > +  f = -21;
> >> > > > +  for (; f < -5; f++) {
> >> > > > +    e = f ^ 3;
> >> > > > +    d = *b;
> >> > > > +    **c = e;
> >> > > > +  }
> >> > > > +
> >> > > > +  return d == -6;
> >> > > > +}
> >> > > > +
> >> > > > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 4f8561509ff..55a6a68c16c 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >
> >        /* Ensure that the stepped sequence always selects from the same
> >        input pattern.  */
> > -      unsigned arg_npatterns
> > -     = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > -                       : VECTOR_CST_NPATTERNS (arg1);
> > +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> > +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
> >
> >        if (!multiple_p (step, arg_npatterns))
> >       {
> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >           *reason = "step is not multiple of npatterns";
> >         return false;
> >       }
> > +
> > +      /* If a1 chooses base element from arg, ensure that it's a natural
> > +      stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> > +      to preserve arg's encoding.  */
> > +
> > +      unsigned HOST_WIDE_INT index;
> > +      if (!r1.is_constant (&index))
> > +     return false;
> > +      if (index < arg_npatterns)
> > +     {
>
> I don't know whether it matters in practice, but I think the two conditions
> above are more natural as:
>
>     if (maybe_lt (r1, arg_npatterns))
>       {
>         unsigned HOST_WIDE_INT index;
>         if (!r1.is_constant (&index))
>           return false;
>
>         ...[code below]...
>       }
>
> > +       tree arg_elem0 = vector_cst_elt (arg, index);
> > +       tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> > +       tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> > +
> > +       if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
> > +                             const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
> > +                             0))
>
> This needs to check whether const_binop returns null.  Maybe:
>
>    tree step1, step2;
>    if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
>        || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
>        || !operand_equal_p (step1, step2, 0))
>
> OK with those changes, thanks.
Hi Richard,
Thanks for the suggestions, updated the attached patch accordingly.
Bootstrapped+tested with and without SVE on aarch64-linux-gnu and
x86_64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
>
> Richard
>
> > +         {
> > +           if (reason)
> > +             *reason = "not a natural stepped sequence";
> > +           return false;
> > +         }
> > +     }
> >      }
> >
> >    return true;
> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
> >  static tree
> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >                   unsigned nelts_per_pattern,
> > -                 int step = 0, int threshold = 100)
> > +                 int step = 0, bool natural_stepped = false,
> > +                 int threshold = 100)
> >  {
> >    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
> >    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >
> >    // Fill a1 for each pattern
> >    for (unsigned i = 0; i < npatterns; i++)
> > -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> > -
> > +    {
> > +      tree a1;
> > +      if (natural_stepped)
> > +     {
> > +       tree a0 = builder[i];
> > +       wide_int a0_val = wi::to_wide (a0);
> > +       wide_int a1_val = a0_val + step;
> > +       a1 = wide_int_to_tree (inner_type, a1_val);
> > +     }
> > +      else
> > +     a1 = build_int_cst (inner_type, rand () % threshold);
> > +      builder.quick_push (a1);
> > +    }
> >    if (nelts_per_pattern == 2)
> >      return builder.build ();
> >
> >    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
> >      {
> >        tree prev_elem = builder[i - npatterns];
> > -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> > -      int val = prev_elem_val + step;
> > -      builder.quick_push (build_int_cst (inner_type, val));
> > +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> > +      wide_int val = prev_elem_val + step;
> > +      builder.quick_push (wide_int_to_tree (inner_type, val));
> >      }
> >
> >    return builder.build ();
> > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
> >        and step (a2 - a1) = 1, step is not a multiple of npatterns
> >        in input vector. So return NULL_TREE.  */
> >        {
> > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> > +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
> >       tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >
> > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
> >        Test that stepped sequence of the pattern selects from arg0.
> >        res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> >        {
> > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> >       tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >
> > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
> >       tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> >       validate_res (1, 3, res, expected_res);
> >        }
> > +
> > +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> > +      In this case ensure that arg has a natural stepped sequence
> > +      to preserve arg's encoding.
> > +
> > +      As a concrete example, consider:
> > +      arg0: { -16, -9, -10, ... } // (1, 3)
> > +      arg1: { -12, -5, -6, ... }  // (1, 3)
> > +      sel = { 0, len, len + 1, ... } // (1, 3)
> > +
> > +      This will create res with following encoding:
> > +      res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> > +          = { -16, -12, -5, ... }
> > +
> > +      The step in above encoding would be: (-5) - (-12) = 7
> > +      And hence res[3] would be computed as -5 + 7 = 2.
> > +      instead of arg1[2], ie, -6.
> > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false
> > +      for this case.  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     const char *reason;
> > +     /* FIXME: It may happen that build_vec_cst_rand may build a natural
> > +        stepped pattern, even if we didn't explicitly tell it to. So folding
> > +        may not always fail, but if it does, ensure that's because arg1 does
> > +        not have a natural stepped sequence (and not due to other reason)  */
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> > +     if (res == NULL_TREE)
> > +       ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> > +      }
> > +
> > +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> > +      sequence and thus folding should be valid for this case.  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > +
> > +     tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> > +     validate_res (1, 3, res, expected_res);
> > +      }
> >      }
> >  }
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > new file mode 100644
> > index 00000000000..093e2b02654
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +int a;
> > +int *b = &a;
> > +static int **c = &b;
> > +static int d;
> > +short e;
> > +short f;
> > +
> > +_Bool foo ()
> > +{
> > +  f = -21;
> > +  for (; f < -5; f++) {
> > +    e = f ^ 3;
> > +    d = *b;
> > +    **c = e;
> > +  }
> > +
> > +  return d == -6;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 44118e799eb..40767736389 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 
       /* Ensure that the stepped sequence always selects from the same
 	 input pattern.  */
-      unsigned arg_npatterns
-	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
-			  : VECTOR_CST_NPATTERNS (arg1);
+      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
+      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
 
       if (!multiple_p (step, arg_npatterns))
 	{
@@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 	    *reason = "step is not multiple of npatterns";
 	  return false;
 	}
+
+      /* If a1 chooses base element from arg, ensure that it's a natural
+	 stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
+	 to preserve arg's encoding.  */
+
+      if (maybe_lt (r1, arg_npatterns))
+	{
+	  unsigned HOST_WIDE_INT index;
+	  if (!r1.is_constant (&index))
+	    return false;
+
+	  tree arg_elem0 = vector_cst_elt (arg, index);
+	  tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
+	  tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
+
+	  tree step1, step2;
+	  if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
+	      || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
+	      || !operand_equal_p (step1, step2, 0))
+	    {
+	      if (reason)
+		*reason = "not a natural stepped sequence";
+	      return false;
+	    }
+	}
     }
 
   return true;
@@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst {
 static tree
 build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
 		    unsigned nelts_per_pattern,
-		    int step = 0, int threshold = 100)
+		    int step = 0, bool natural_stepped = false,
+		    int threshold = 100)
 {
   tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
   tree vectype = build_vector_type_for_mode (inner_type, vmode);
@@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
 
   // Fill a1 for each pattern
   for (unsigned i = 0; i < npatterns; i++)
-    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
-
+    {
+      tree a1;
+      if (natural_stepped)
+	{
+	  tree a0 = builder[i];
+	  wide_int a0_val = wi::to_wide (a0);
+	  wide_int a1_val = a0_val + step;
+	  a1 = wide_int_to_tree (inner_type, a1_val);
+	}
+      else
+	a1 = build_int_cst (inner_type, rand () % threshold);
+      builder.quick_push (a1);
+    }
   if (nelts_per_pattern == 2)
     return builder.build ();
 
   for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
     {
       tree prev_elem = builder[i - npatterns];
-      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
-      int val = prev_elem_val + step;
-      builder.quick_push (build_int_cst (inner_type, val));
+      wide_int prev_elem_val = wi::to_wide (prev_elem);
+      wide_int val = prev_elem_val + step;
+      builder.quick_push (wide_int_to_tree (inner_type, val));
     }
 
   return builder.build ();
@@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode)
 	 and step (a2 - a1) = 1, step is not a multiple of npatterns
 	 in input vector. So return NULL_TREE.  */
       {
-	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
+	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
 	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
 	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
 
@@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode)
 	 Test that stepped sequence of the pattern selects from arg0.
 	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
       {
-	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
 	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
 	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
 
@@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode)
 	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
 	validate_res (1, 3, res, expected_res);
       }
+
+      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
+	 In this case ensure that arg has a natural stepped sequence
+	 to preserve arg's encoding.
+
+	 As a concrete example, consider:
+	 arg0: { -16, -9, -10, ... } // (1, 3)
+	 arg1: { -12, -5, -6, ... }  // (1, 3)
+	 sel = { 0, len, len + 1, ... } // (1, 3)
+
+	 This will create res with following encoding:
+	 res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
+	     = { -16, -12, -5, ... }
+
+	 The step in above encoding would be: (-5) - (-12) = 7
+	 And hence res[3] would be computed as -5 + 7 = 2.
+	 instead of arg1[2], ie, -6.
+	 Ensure that valid_mask_for_fold_vec_perm_cst returns false
+	 for this case.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { 0, len, len+1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	const char *reason;
+	/* FIXME: It may happen that build_vec_cst_rand may build a natural
+	   stepped pattern, even if we didn't explicitly tell it to. So folding
+	   may not always fail, but if it does, ensure that's because arg1 does
+	   not have a natural stepped sequence (and not due to other reason)  */
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
+	if (res == NULL_TREE)
+	  ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
+      }
+
+      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
+	 sequence and thus folding should be valid for this case.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { 0, len, len+1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+	tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
+	validate_res (1, 3, res, expected_res);
+      }
     }
 }
Richard Sandiford Oct. 18, 2023, 5:52 p.m. UTC | #7
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > index 4f8561509ff..55a6a68c16c 100644
>> > --- a/gcc/fold-const.cc
>> > +++ b/gcc/fold-const.cc
>> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> >
>> >        /* Ensure that the stepped sequence always selects from the same
>> >        input pattern.  */
>> > -      unsigned arg_npatterns
>> > -     = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>> > -                       : VECTOR_CST_NPATTERNS (arg1);
>> > +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
>> > +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
>> >
>> >        if (!multiple_p (step, arg_npatterns))
>> >       {
>> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> >           *reason = "step is not multiple of npatterns";
>> >         return false;
>> >       }
>> > +
>> > +      /* If a1 chooses base element from arg, ensure that it's a natural
>> > +      stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
>> > +      to preserve arg's encoding.  */
>> > +
>> > +      unsigned HOST_WIDE_INT index;
>> > +      if (!r1.is_constant (&index))
>> > +     return false;
>> > +      if (index < arg_npatterns)
>> > +     {
>>
>> I don't know whether it matters in practice, but I think the two conditions
>> above are more natural as:
>>
>>     if (maybe_lt (r1, arg_npatterns))
>>       {
>>         unsigned HOST_WIDE_INT index;
>>         if (!r1.is_constant (&index))
>>           return false;
>>
>>         ...[code below]...
>>       }
>>
>> > +       tree arg_elem0 = vector_cst_elt (arg, index);
>> > +       tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
>> > +       tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
>> > +
>> > +       if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
>> > +                             const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
>> > +                             0))
>>
>> This needs to check whether const_binop returns null.  Maybe:
>>
>>    tree step1, step2;
>>    if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
>>        || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
>>        || !operand_equal_p (step1, step2, 0))
>>
>> OK with those changes, thanks.
> Hi Richard,
> Thanks for the suggestions, updated the attached patch accordingly.
> Bootstrapped+tested with and without SVE on aarch64-linux-gnu and
> x86_64-linux-gnu.
> OK to commit ?

Yes, thanks.

Richard

>
> Thanks,
> Prathamesh
>>
>> Richard
>>
>> > +         {
>> > +           if (reason)
>> > +             *reason = "not a natural stepped sequence";
>> > +           return false;
>> > +         }
>> > +     }
>> >      }
>> >
>> >    return true;
>> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
>> >  static tree
>> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>> >                   unsigned nelts_per_pattern,
>> > -                 int step = 0, int threshold = 100)
>> > +                 int step = 0, bool natural_stepped = false,
>> > +                 int threshold = 100)
>> >  {
>> >    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
>> >    tree vectype = build_vector_type_for_mode (inner_type, vmode);
>> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>> >
>> >    // Fill a1 for each pattern
>> >    for (unsigned i = 0; i < npatterns; i++)
>> > -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
>> > -
>> > +    {
>> > +      tree a1;
>> > +      if (natural_stepped)
>> > +     {
>> > +       tree a0 = builder[i];
>> > +       wide_int a0_val = wi::to_wide (a0);
>> > +       wide_int a1_val = a0_val + step;
>> > +       a1 = wide_int_to_tree (inner_type, a1_val);
>> > +     }
>> > +      else
>> > +     a1 = build_int_cst (inner_type, rand () % threshold);
>> > +      builder.quick_push (a1);
>> > +    }
>> >    if (nelts_per_pattern == 2)
>> >      return builder.build ();
>> >
>> >    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
>> >      {
>> >        tree prev_elem = builder[i - npatterns];
>> > -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
>> > -      int val = prev_elem_val + step;
>> > -      builder.quick_push (build_int_cst (inner_type, val));
>> > +      wide_int prev_elem_val = wi::to_wide (prev_elem);
>> > +      wide_int val = prev_elem_val + step;
>> > +      builder.quick_push (wide_int_to_tree (inner_type, val));
>> >      }
>> >
>> >    return builder.build ();
>> > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
>> >        and step (a2 - a1) = 1, step is not a multiple of npatterns
>> >        in input vector. So return NULL_TREE.  */
>> >        {
>> > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
>> >       tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> >
>> > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
>> >        Test that stepped sequence of the pattern selects from arg0.
>> >        res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>> >        {
>> > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
>> >       tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> >
>> > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
>> >       tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>> >       validate_res (1, 3, res, expected_res);
>> >        }
>> > +
>> > +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
>> > +      In this case ensure that arg has a natural stepped sequence
>> > +      to preserve arg's encoding.
>> > +
>> > +      As a concrete example, consider:
>> > +      arg0: { -16, -9, -10, ... } // (1, 3)
>> > +      arg1: { -12, -5, -6, ... }  // (1, 3)
>> > +      sel = { 0, len, len + 1, ... } // (1, 3)
>> > +
>> > +      This will create res with following encoding:
>> > +      res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
>> > +          = { -16, -12, -5, ... }
>> > +
>> > +      The step in above encoding would be: (-5) - (-12) = 7
>> > +      And hence res[3] would be computed as -5 + 7 = 2.
>> > +      instead of arg1[2], ie, -6.
>> > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false
>> > +      for this case.  */
>> > +      {
>> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +     vec_perm_builder builder (len, 1, 3);
>> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
>> > +     builder_push_elems (builder, mask_elems);
>> > +
>> > +     vec_perm_indices sel (builder, 2, len);
>> > +     const char *reason;
>> > +     /* FIXME: It may happen that build_vec_cst_rand may build a natural
>> > +        stepped pattern, even if we didn't explicitly tell it to. So folding
>> > +        may not always fail, but if it does, ensure that's because arg1 does
>> > +        not have a natural stepped sequence (and not due to other reason)  */
>> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
>> > +     if (res == NULL_TREE)
>> > +       ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
>> > +      }
>> > +
>> > +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
>> > +      sequence and thus folding should be valid for this case.  */
>> > +      {
>> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
>> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
>> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +     vec_perm_builder builder (len, 1, 3);
>> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
>> > +     builder_push_elems (builder, mask_elems);
>> > +
>> > +     vec_perm_indices sel (builder, 2, len);
>> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > +
>> > +     tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
>> > +     validate_res (1, 3, res, expected_res);
>> > +      }
>> >      }
>> >  }
>> >
>> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > new file mode 100644
>> > index 00000000000..093e2b02654
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
>> > @@ -0,0 +1,23 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> > +
>> > +int a;
>> > +int *b = &a;
>> > +static int **c = &b;
>> > +static int d;
>> > +short e;
>> > +short f;
>> > +
>> > +_Bool foo ()
>> > +{
>> > +  f = -21;
>> > +  for (; f < -5; f++) {
>> > +    e = f ^ 3;
>> > +    d = *b;
>> > +    **c = e;
>> > +  }
>> > +
>> > +  return d == -6;
>> > +}
>> > +
>> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 44118e799eb..40767736389 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  
>        /* Ensure that the stepped sequence always selects from the same
>  	 input pattern.  */
> -      unsigned arg_npatterns
> -	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> -			  : VECTOR_CST_NPATTERNS (arg1);
> +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
>  
>        if (!multiple_p (step, arg_npatterns))
>  	{
> @@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>  	    *reason = "step is not multiple of npatterns";
>  	  return false;
>  	}
> +
> +      /* If a1 chooses base element from arg, ensure that it's a natural
> +	 stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> +	 to preserve arg's encoding.  */
> +
> +      if (maybe_lt (r1, arg_npatterns))
> +	{
> +	  unsigned HOST_WIDE_INT index;
> +	  if (!r1.is_constant (&index))
> +	    return false;
> +
> +	  tree arg_elem0 = vector_cst_elt (arg, index);
> +	  tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> +	  tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> +
> +	  tree step1, step2;
> +	  if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
> +	      || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
> +	      || !operand_equal_p (step1, step2, 0))
> +	    {
> +	      if (reason)
> +		*reason = "not a natural stepped sequence";
> +	      return false;
> +	    }
> +	}
>      }
>  
>    return true;
> @@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst {
>  static tree
>  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  		    unsigned nelts_per_pattern,
> -		    int step = 0, int threshold = 100)
> +		    int step = 0, bool natural_stepped = false,
> +		    int threshold = 100)
>  {
>    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
>    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> @@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
>  
>    // Fill a1 for each pattern
>    for (unsigned i = 0; i < npatterns; i++)
> -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> -
> +    {
> +      tree a1;
> +      if (natural_stepped)
> +	{
> +	  tree a0 = builder[i];
> +	  wide_int a0_val = wi::to_wide (a0);
> +	  wide_int a1_val = a0_val + step;
> +	  a1 = wide_int_to_tree (inner_type, a1_val);
> +	}
> +      else
> +	a1 = build_int_cst (inner_type, rand () % threshold);
> +      builder.quick_push (a1);
> +    }
>    if (nelts_per_pattern == 2)
>      return builder.build ();
>  
>    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
>      {
>        tree prev_elem = builder[i - npatterns];
> -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> -      int val = prev_elem_val + step;
> -      builder.quick_push (build_int_cst (inner_type, val));
> +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> +      wide_int val = prev_elem_val + step;
> +      builder.quick_push (wide_int_to_tree (inner_type, val));
>      }
>  
>    return builder.build ();
> @@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 and step (a2 - a1) = 1, step is not a multiple of npatterns
>  	 in input vector. So return NULL_TREE.  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode)
>  	 Test that stepped sequence of the pattern selects from arg0.
>  	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
>        {
> -	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
>  	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
>  	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>  
> @@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode)
>  	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
>  	validate_res (1, 3, res, expected_res);
>        }
> +
> +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> +	 In this case ensure that arg has a natural stepped sequence
> +	 to preserve arg's encoding.
> +
> +	 As a concrete example, consider:
> +	 arg0: { -16, -9, -10, ... } // (1, 3)
> +	 arg1: { -12, -5, -6, ... }  // (1, 3)
> +	 sel = { 0, len, len + 1, ... } // (1, 3)
> +
> +	 This will create res with following encoding:
> +	 res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> +	     = { -16, -12, -5, ... }
> +
> +	 The step in above encoding would be: (-5) - (-12) = 7
> +	 And hence res[3] would be computed as -5 + 7 = 2.
> +	 instead of arg1[2], ie, -6.
> +	 Ensure that valid_mask_for_fold_vec_perm_cst returns false
> +	 for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	const char *reason;
> +	/* FIXME: It may happen that build_vec_cst_rand may build a natural
> +	   stepped pattern, even if we didn't explicitly tell it to. So folding
> +	   may not always fail, but if it does, ensure that's because arg1 does
> +	   not have a natural stepped sequence (and not due to other reason)  */
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +	if (res == NULL_TREE)
> +	  ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> +      }
> +
> +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> +	 sequence and thus folding should be valid for this case.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, len, len+1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> +	validate_res (1, 3, res, expected_res);
> +      }
>      }
>  }
>
Prathamesh Kulkarni Oct. 18, 2023, 7:04 p.m. UTC | #8
On Wed, 18 Oct 2023 at 23:22, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 4f8561509ff..55a6a68c16c 100644
> >> > --- a/gcc/fold-const.cc
> >> > +++ b/gcc/fold-const.cc
> >> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> >
> >> >        /* Ensure that the stepped sequence always selects from the same
> >> >        input pattern.  */
> >> > -      unsigned arg_npatterns
> >> > -     = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >> > -                       : VECTOR_CST_NPATTERNS (arg1);
> >> > +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> >> > +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
> >> >
> >> >        if (!multiple_p (step, arg_npatterns))
> >> >       {
> >> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> >           *reason = "step is not multiple of npatterns";
> >> >         return false;
> >> >       }
> >> > +
> >> > +      /* If a1 chooses base element from arg, ensure that it's a natural
> >> > +      stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> >> > +      to preserve arg's encoding.  */
> >> > +
> >> > +      unsigned HOST_WIDE_INT index;
> >> > +      if (!r1.is_constant (&index))
> >> > +     return false;
> >> > +      if (index < arg_npatterns)
> >> > +     {
> >>
> >> I don't know whether it matters in practice, but I think the two conditions
> >> above are more natural as:
> >>
> >>     if (maybe_lt (r1, arg_npatterns))
> >>       {
> >>         unsigned HOST_WIDE_INT index;
> >>         if (!r1.is_constant (&index))
> >>           return false;
> >>
> >>         ...[code below]...
> >>       }
> >>
> >> > +       tree arg_elem0 = vector_cst_elt (arg, index);
> >> > +       tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> >> > +       tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> >> > +
> >> > +       if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
> >> > +                             const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
> >> > +                             0))
> >>
> >> This needs to check whether const_binop returns null.  Maybe:
> >>
> >>    tree step1, step2;
> >>    if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
> >>        || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
> >>        || !operand_equal_p (step1, step2, 0))
> >>
> >> OK with those changes, thanks.
> > Hi Richard,
> > Thanks for the suggestions, updated the attached patch accordingly.
> > Bootstrapped+tested with and without SVE on aarch64-linux-gnu and
> > x86_64-linux-gnu.
> > OK to commit ?
>
> Yes, thanks.
Thanks, committed to trunk in 3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f.

Thanks,
Prathamesh
>
> Richard
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard
> >>
> >> > +         {
> >> > +           if (reason)
> >> > +             *reason = "not a natural stepped sequence";
> >> > +           return false;
> >> > +         }
> >> > +     }
> >> >      }
> >> >
> >> >    return true;
> >> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
> >> >  static tree
> >> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >> >                   unsigned nelts_per_pattern,
> >> > -                 int step = 0, int threshold = 100)
> >> > +                 int step = 0, bool natural_stepped = false,
> >> > +                 int threshold = 100)
> >> >  {
> >> >    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
> >> >    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> >> > @@ -17176,17 +17199,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >> >
> >> >    // Fill a1 for each pattern
> >> >    for (unsigned i = 0; i < npatterns; i++)
> >> > -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> >> > -
> >> > +    {
> >> > +      tree a1;
> >> > +      if (natural_stepped)
> >> > +     {
> >> > +       tree a0 = builder[i];
> >> > +       wide_int a0_val = wi::to_wide (a0);
> >> > +       wide_int a1_val = a0_val + step;
> >> > +       a1 = wide_int_to_tree (inner_type, a1_val);
> >> > +     }
> >> > +      else
> >> > +     a1 = build_int_cst (inner_type, rand () % threshold);
> >> > +      builder.quick_push (a1);
> >> > +    }
> >> >    if (nelts_per_pattern == 2)
> >> >      return builder.build ();
> >> >
> >> >    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
> >> >      {
> >> >        tree prev_elem = builder[i - npatterns];
> >> > -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> >> > -      int val = prev_elem_val + step;
> >> > -      builder.quick_push (build_int_cst (inner_type, val));
> >> > +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> >> > +      wide_int val = prev_elem_val + step;
> >> > +      builder.quick_push (wide_int_to_tree (inner_type, val));
> >> >      }
> >> >
> >> >    return builder.build ();
> >> > @@ -17432,7 +17466,7 @@ test_nunits_min_2 (machine_mode vmode)
> >> >        and step (a2 - a1) = 1, step is not a multiple of npatterns
> >> >        in input vector. So return NULL_TREE.  */
> >> >        {
> >> > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> > +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
> >> >       tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> >
> >> > @@ -17452,7 +17486,7 @@ test_nunits_min_2 (machine_mode vmode)
> >> >        Test that stepped sequence of the pattern selects from arg0.
> >> >        res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> >> >        {
> >> > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> >> >       tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> >
> >> > @@ -17466,6 +17500,62 @@ test_nunits_min_2 (machine_mode vmode)
> >> >       tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> >> >       validate_res (1, 3, res, expected_res);
> >> >        }
> >> > +
> >> > +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> >> > +      In this case ensure that arg has a natural stepped sequence
> >> > +      to preserve arg's encoding.
> >> > +
> >> > +      As a concrete example, consider:
> >> > +      arg0: { -16, -9, -10, ... } // (1, 3)
> >> > +      arg1: { -12, -5, -6, ... }  // (1, 3)
> >> > +      sel = { 0, len, len + 1, ... } // (1, 3)
> >> > +
> >> > +      This will create res with following encoding:
> >> > +      res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> >> > +          = { -16, -12, -5, ... }
> >> > +
> >> > +      The step in above encoding would be: (-5) - (-12) = 7
> >> > +      And hence res[3] would be computed as -5 + 7 = 2.
> >> > +      instead of arg1[2], ie, -6.
> >> > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false
> >> > +      for this case.  */
> >> > +      {
> >> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > +
> >> > +     vec_perm_builder builder (len, 1, 3);
> >> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> >> > +     builder_push_elems (builder, mask_elems);
> >> > +
> >> > +     vec_perm_indices sel (builder, 2, len);
> >> > +     const char *reason;
> >> > +     /* FIXME: It may happen that build_vec_cst_rand may build a natural
> >> > +        stepped pattern, even if we didn't explicitly tell it to. So folding
> >> > +        may not always fail, but if it does, ensure that's because arg1 does
> >> > +        not have a natural stepped sequence (and not due to other reason)  */
> >> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> >> > +     if (res == NULL_TREE)
> >> > +       ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> >> > +      }
> >> > +
> >> > +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> >> > +      sequence and thus folding should be valid for this case.  */
> >> > +      {
> >> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> >> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> >> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > +
> >> > +     vec_perm_builder builder (len, 1, 3);
> >> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> >> > +     builder_push_elems (builder, mask_elems);
> >> > +
> >> > +     vec_perm_indices sel (builder, 2, len);
> >> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> >> > +
> >> > +     tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> >> > +     validate_res (1, 3, res, expected_res);
> >> > +      }
> >> >      }
> >> >  }
> >> >
> >> > diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
> >> > new file mode 100644
> >> > index 00000000000..093e2b02654
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
> >> > @@ -0,0 +1,23 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >> > +
> >> > +int a;
> >> > +int *b = &a;
> >> > +static int **c = &b;
> >> > +static int d;
> >> > +short e;
> >> > +short f;
> >> > +
> >> > +_Bool foo ()
> >> > +{
> >> > +  f = -21;
> >> > +  for (; f < -5; f++) {
> >> > +    e = f ^ 3;
> >> > +    d = *b;
> >> > +    **c = e;
> >> > +  }
> >> > +
> >> > +  return d == -6;
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-tree-dump "return 1" "optimized" } } */
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 44118e799eb..40767736389 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10692,9 +10692,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >
> >        /* Ensure that the stepped sequence always selects from the same
> >        input pattern.  */
> > -      unsigned arg_npatterns
> > -     = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > -                       : VECTOR_CST_NPATTERNS (arg1);
> > +      tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> > +      unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
> >
> >        if (!multiple_p (step, arg_npatterns))
> >       {
> > @@ -10702,6 +10701,31 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >           *reason = "step is not multiple of npatterns";
> >         return false;
> >       }
> > +
> > +      /* If a1 chooses base element from arg, ensure that it's a natural
> > +      stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> > +      to preserve arg's encoding.  */
> > +
> > +      if (maybe_lt (r1, arg_npatterns))
> > +     {
> > +       unsigned HOST_WIDE_INT index;
> > +       if (!r1.is_constant (&index))
> > +         return false;
> > +
> > +       tree arg_elem0 = vector_cst_elt (arg, index);
> > +       tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> > +       tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> > +
> > +       tree step1, step2;
> > +       if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
> > +           || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
> > +           || !operand_equal_p (step1, step2, 0))
> > +         {
> > +           if (reason)
> > +             *reason = "not a natural stepped sequence";
> > +           return false;
> > +         }
> > +     }
> >      }
> >
> >    return true;
> > @@ -17165,7 +17189,8 @@ namespace test_fold_vec_perm_cst {
> >  static tree
> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >                   unsigned nelts_per_pattern,
> > -                 int step = 0, int threshold = 100)
> > +                 int step = 0, bool natural_stepped = false,
> > +                 int threshold = 100)
> >  {
> >    tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
> >    tree vectype = build_vector_type_for_mode (inner_type, vmode);
> > @@ -17180,17 +17205,28 @@ build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >
> >    // Fill a1 for each pattern
> >    for (unsigned i = 0; i < npatterns; i++)
> > -    builder.quick_push (build_int_cst (inner_type, rand () % threshold));
> > -
> > +    {
> > +      tree a1;
> > +      if (natural_stepped)
> > +     {
> > +       tree a0 = builder[i];
> > +       wide_int a0_val = wi::to_wide (a0);
> > +       wide_int a1_val = a0_val + step;
> > +       a1 = wide_int_to_tree (inner_type, a1_val);
> > +     }
> > +      else
> > +     a1 = build_int_cst (inner_type, rand () % threshold);
> > +      builder.quick_push (a1);
> > +    }
> >    if (nelts_per_pattern == 2)
> >      return builder.build ();
> >
> >    for (unsigned i = npatterns * 2; i < npatterns * nelts_per_pattern; i++)
> >      {
> >        tree prev_elem = builder[i - npatterns];
> > -      int prev_elem_val = TREE_INT_CST_LOW (prev_elem);
> > -      int val = prev_elem_val + step;
> > -      builder.quick_push (build_int_cst (inner_type, val));
> > +      wide_int prev_elem_val = wi::to_wide (prev_elem);
> > +      wide_int val = prev_elem_val + step;
> > +      builder.quick_push (wide_int_to_tree (inner_type, val));
> >      }
> >
> >    return builder.build ();
> > @@ -17436,7 +17472,7 @@ test_nunits_min_2 (machine_mode vmode)
> >        and step (a2 - a1) = 1, step is not a multiple of npatterns
> >        in input vector. So return NULL_TREE.  */
> >        {
> > -     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> > +     tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1, true);
> >       tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >
> > @@ -17456,7 +17492,7 @@ test_nunits_min_2 (machine_mode vmode)
> >        Test that stepped sequence of the pattern selects from arg0.
> >        res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> >        {
> > -     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> >       tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> >       poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >
> > @@ -17470,6 +17506,62 @@ test_nunits_min_2 (machine_mode vmode)
> >       tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> >       validate_res (1, 3, res, expected_res);
> >        }
> > +
> > +      /* Case 6: PR111648 - a1 chooses base element from input vector arg.
> > +      In this case ensure that arg has a natural stepped sequence
> > +      to preserve arg's encoding.
> > +
> > +      As a concrete example, consider:
> > +      arg0: { -16, -9, -10, ... } // (1, 3)
> > +      arg1: { -12, -5, -6, ... }  // (1, 3)
> > +      sel = { 0, len, len + 1, ... } // (1, 3)
> > +
> > +      This will create res with following encoding:
> > +      res = { arg0[0], arg1[0], arg1[1], ... } // (1, 3)
> > +          = { -16, -12, -5, ... }
> > +
> > +      The step in above encoding would be: (-5) - (-12) = 7
> > +      And hence res[3] would be computed as -5 + 7 = 2.
> > +      instead of arg1[2], ie, -6.
> > +      Ensure that valid_mask_for_fold_vec_perm_cst returns false
> > +      for this case.  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     const char *reason;
> > +     /* FIXME: It may happen that build_vec_cst_rand may build a natural
> > +        stepped pattern, even if we didn't explicitly tell it to. So folding
> > +        may not always fail, but if it does, ensure that's because arg1 does
> > +        not have a natural stepped sequence (and not due to other reason)  */
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> > +     if (res == NULL_TREE)
> > +       ASSERT_TRUE (!strcmp (reason, "not a natural stepped sequence"));
> > +      }
> > +
> > +      /* Case 7: Same as Case 6, except that arg1 contains natural stepped
> > +      sequence and thus folding should be valid for this case.  */
> > +      {
> > +     tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> > +     tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1, true);
> > +     poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> > +
> > +     vec_perm_builder builder (len, 1, 3);
> > +     poly_uint64 mask_elems[] = { 0, len, len+1 };
> > +     builder_push_elems (builder, mask_elems);
> > +
> > +     vec_perm_indices sel (builder, 2, len);
> > +     tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> > +
> > +     tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
> > +     validate_res (1, 3, res, expected_res);
> > +      }
> >      }
> >  }
> >
diff mbox series

Patch

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..c5f421d6b76 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10682,8 +10682,8 @@  valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 	  return false;
 	}
 
-      /* Ensure that the stepped sequence always selects from the same
-	 input pattern.  */
+      /* Ensure that the stepped sequence always selects from the stepped
+	 part of same input pattern.  */
       unsigned arg_npatterns
 	= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
 			  : VECTOR_CST_NPATTERNS (arg1);
@@ -10694,6 +10694,20 @@  valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
 	    *reason = "step is not multiple of npatterns";
 	  return false;
 	}
+
+      /* If a1 is a multiple of len, it will select base element of input
+	 vector resulting in following encoding:
+	 { base_elem, arg[0], arg[1], ... } where arg is the chosen input
+	 vector. This encoding is not originally present in arg, since it's
+	 defined as:
+	 { arg[0], arg[1], arg[2], ... }.  */
+
+      if (multiple_p (a1, arg_len))
+	{
+	  if (reason)
+	    *reason = "selecting base element of input vector";
+	  return false;
+	}
     }
 
   return true;
@@ -17425,47 +17439,6 @@  test_nunits_min_2 (machine_mode vmode)
 	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
 	validate_res (2, 2, res, expected_res);
       }
-
-      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
-	 Test that the stepped sequence of the pattern selects from
-	 same input pattern. Since input vectors have npatterns = 2,
-	 and step (a2 - a1) = 1, step is not a multiple of npatterns
-	 in input vector. So return NULL_TREE.  */
-      {
-	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
-	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
-	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
-
-	vec_perm_builder builder (len, 1, 3);
-	poly_uint64 mask_elems[] = { 0, 0, 1 };
-	builder_push_elems (builder, mask_elems);
-
-	vec_perm_indices sel (builder, 2, len);
-	const char *reason;
-	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
-				      &reason);
-	ASSERT_TRUE (res == NULL_TREE);
-	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
-      }
-
-      /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
-	 Test that stepped sequence of the pattern selects from arg0.
-	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
-      {
-	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
-	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
-	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
-
-	vec_perm_builder builder (len, 1, 3);
-	poly_uint64 mask_elems[] = { len, 0, 1 };
-	builder_push_elems (builder, mask_elems);
-
-	vec_perm_indices sel (builder, 2, len);
-	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
-
-	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
-	validate_res (1, 3, res, expected_res);
-      }
     }
 }
 
@@ -17528,7 +17501,26 @@  test_nunits_min_4 (machine_mode vmode)
 	validate_res (1, 3, res, expected_res);
       }
 
-      /* Case 4:
+      /* Case 4: mask = {len, 1, 2, ...} // (1, 3)
+	 Test that stepped sequence of the pattern selects from arg0.
+	 res = { arg1[0], arg0[1], arg0[2], ... } // (1, 3)  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { len, 1, 2 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+	tree expected_res[] = { ARG1(0), ARG0(1), ARG0(2) };
+	validate_res (1, 3, res, expected_res);
+      }
+
+      /* Case 5:
 	sel = { len, 0, 2, ... } // (1, 3)
 	This should return NULL because we cross the input vectors.
 	Because,
@@ -17561,7 +17553,7 @@  test_nunits_min_4 (machine_mode vmode)
 	ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
       }
 
-      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
+      /* Case 6: npatterns(arg0) = 4 > npatterns(sel) = 2
 	 mask = { 0, len, 1, len + 1, ...} // (2, 2)
 	 res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
 
@@ -17583,7 +17575,7 @@  test_nunits_min_4 (machine_mode vmode)
 	validate_res (2, 2, res, expected_res);
       }
 
-      /* Case 6: Test combination in sel, where one pattern is dup and other
+      /* Case 7: Test combination in sel, where one pattern is dup and other
 	 is stepped sequence.
 	 sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
 	 res = { arg0[0], arg0[0], arg0[0],
@@ -17605,7 +17597,7 @@  test_nunits_min_4 (machine_mode vmode)
 	validate_res (2, 3, res, expected_res);
       }
 
-      /* Case 7: PR111048: Check that we set arg_npatterns correctly,
+      /* Case 8: PR111048: Check that we set arg_npatterns correctly,
 	 when arg0, arg1 and sel have different number of patterns.
 	 arg0 is of shape (1, 1)
 	 arg1 is of shape (4, 1)
@@ -17634,6 +17626,51 @@  test_nunits_min_4 (machine_mode vmode)
 	ASSERT_TRUE (res == NULL_TREE);
 	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
       }
+
+      /* Case 9: PR111648 - a1 is multiple of vector length,
+	 which results in incorrect encoding. Verify that we return
+	 NULL for this case.
+	 sel = { base_elem, len, len+1, ... } // (1, 3)
+	 In this case, the single pattern is: { base_elem len, len+1, ...}
+	 Let's assume that base_elem is used for indexing into arg0,
+	 and a1 ... ae chooses elements from arg1.
+	 So res = { arg0[base_elem], arg1[0], arg1[1], ... } // (1, 3)
+	 Which creates an incorrect encoding with S = arg1[1] - arg1[0]
+	 while the original encoding in arg1 is
+	 arg1: { arg1[0], arg1[1], arg1[2], ... }
+	 with S = arg1[2] - arg1[1].
+
+	 As a concrete example, for above PR:
+	 arg0: { -16, -9, -10, -11 }
+	 arg1: { -12, -5, -6, -7 }
+	 sel = { 3, 4, 5, 6 }
+
+	 arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
+	 Since a1 = 4 and arg_len = 4, it ended up creating the result with
+	 following encoding:
+	 res = { arg0[3], arg1[0], arg1[1] } // (1, 3)
+	     = { -11, -12, -5 }
+
+	 So for res[3], it used S = (-5) - (-12) = 7
+	 And hence computed it as -5 + 7 = 2.
+	 instead of arg1[2], ie, -6, which is the correct value.
+	 Ensure that valid_mask_for_fold_vec_perm_cst returns false for this case.  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { 0, len, len+1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	const char *reason;
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
+	ASSERT_TRUE (res == NULL_TREE);
+	ASSERT_TRUE (!strcmp (reason,
+			      "selecting base element of input vector"));
+      }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr111648.c b/gcc/testsuite/gcc.dg/vect/pr111648.c
new file mode 100644
index 00000000000..093e2b02654
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111648.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int a;
+int *b = &a;
+static int **c = &b;
+static int d;
+short e;
+short f;
+
+_Bool foo ()
+{
+  f = -21;
+  for (; f < -5; f++) {
+    e = f ^ 3;
+    d = *b;
+    **c = e;
+  }
+
+  return d == -6;
+}
+
+/* { dg-final { scan-tree-dump "return 1" "optimized" } } */