diff mbox

Convert character arrays to string csts

Message ID 1a50afaa-6b8e-ba98-6cde-aae51c05a2c4@suse.cz
State New
Headers show

Commit Message

Martin Liška Nov. 3, 2016, 12:12 p.m. UTC
Hello.

This is small follow-up of the patches I sent to string folding.
The patch transforms character array defined in an initializer to string
constant:

+const char global[] = {'a', 'b', 'c', 'd', '\0'};

Apart from that, it also enables string folding of local symbols like:
+  const char local[] = "abcd";

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Martin

Comments

Richard Biener Nov. 3, 2016, 12:46 p.m. UTC | #1
On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška <mliska@suse.cz> wrote:
> Hello.

>

> This is small follow-up of the patches I sent to string folding.

> The patch transforms character array defined in an initializer to string

> constant:

>

> +const char global[] = {'a', 'b', 'c', 'd', '\0'};

>

> Apart from that, it also enables string folding of local symbols like:

> +  const char local[] = "abcd";

>

> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.


Hmm, does this handle

const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'};

correctly?

I think you need to check that 'key' is increasing and there are no gaps
(ISTR constructor elements are sorted but gaps can still appear).

+         || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

please instead check the element type of the array constructor.  I'd also
use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node
to avoid changing non-strings like unsigned / signed char.

Finally I'm a bit worried about doing this for all 'char'
constructors.  Maybe we
should restrict this to those with ISPRINT () entries?

Richard.


> Martin
Bernd Schmidt Nov. 3, 2016, 12:49 p.m. UTC | #2
On 11/03/2016 01:12 PM, Martin Liška wrote:
> +  tree init = DECL_INITIAL (decl);

> +  if (init

> +      && TREE_READONLY (decl)

> +      && can_convert_ctor_to_string_cst (init))

> +    DECL_INITIAL (decl) = build_string_cst_from_ctor (init);


I'd merge these two new functions since they're only ever called 
together. We'd then have something like

if (init && TREE_READONLY (decl))
   init = convert_ctor_to_string_cst (init);
if (init)
   DECL_INITIAL (decl) = init;

I'll defer to Jan on whether finalize_decl seems like a good place to do 
this.

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> index 283bd1c..b2d1fd5 100644

> --- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> @@ -4,12 +4,15 @@

>  char *buffer1;

>  char *buffer2;

>

> +const char global[] = {'a', 'b', 'c', 'd', '\0'};

> +

>  #define SIZE 1000

>

>  int

>  main (void)

>  {

>    const char* const foo1 = "hello world";

> +  const char local[] = "abcd";

>

>    buffer1 = __builtin_malloc (SIZE);

>    __builtin_strcpy (buffer1, foo1);

> @@ -45,6 +48,10 @@ main (void)

>      __builtin_abort ();

>    if (__builtin_memchr (foo1, null, 12) != foo1 + 11)

>      __builtin_abort ();

> +  if (__builtin_memchr (global, null, 5) == 0)

> +    __builtin_abort ();

> +  if (__builtin_memchr (local, null, 5) == 0)

> +    __builtin_abort ();


How is that a meaningful test? This seems to work even with an unpatched 
gcc. I'd have expected something that shows a benefit for doing this 
conversion, and maybe also a test that shows it isn't done in cases 
where it's not allowed.

>  tree

> -build_string_literal (int len, const char *str)

> +build_string_literal (int len, const char *str, bool build_addr_expr)


New arguments should be documented in the function comment.

> +/* Return TRUE when CTOR can be converted to a string constant.  */


"if", not "when".

> +  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);

> +  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)

> +    {

> +      if (key == NULL_TREE

> +	  || TREE_CODE (key) != INTEGER_CST

> +	  || !tree_fits_uhwi_p (value)

> +	  || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

> +	return false;


Shouldn't all elements have the same type, or do you really have to call 
useless_type_conversion in a loop?

> +      /* Allow zero character just at the end of a string.  */

> +      if (integer_zerop (value))

> +	return idx == elements - 1;


Don't you also have to explicitly check it's there?


Bernd
Jan Hubicka Nov. 3, 2016, 1 p.m. UTC | #3
> On 11/03/2016 01:12 PM, Martin Liška wrote:

> >+  tree init = DECL_INITIAL (decl);

> >+  if (init

> >+      && TREE_READONLY (decl)

> >+      && can_convert_ctor_to_string_cst (init))

> >+    DECL_INITIAL (decl) = build_string_cst_from_ctor (init);

> 

> I'd merge these two new functions since they're only ever called

> together. We'd then have something like

> 

> if (init && TREE_READONLY (decl))

>   init = convert_ctor_to_string_cst (init);

> if (init)

>   DECL_INITIAL (decl) = init;

> 

> I'll defer to Jan on whether finalize_decl seems like a good place

> to do this.


I think finalize_decl may be bit too early because frontends may expects the
ctors to be in a way they produced them.  We only want to convert those arrays
seen by middle-end so I would move the logic to varpool_node::analyze

Otherwise the patch seems fine to me.

Honza
> 

> >diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> >index 283bd1c..b2d1fd5 100644

> >--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> >+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c

> >@@ -4,12 +4,15 @@

> > char *buffer1;

> > char *buffer2;

> >

> >+const char global[] = {'a', 'b', 'c', 'd', '\0'};

> >+

> > #define SIZE 1000

> >

> > int

> > main (void)

> > {

> >   const char* const foo1 = "hello world";

> >+  const char local[] = "abcd";

> >

> >   buffer1 = __builtin_malloc (SIZE);

> >   __builtin_strcpy (buffer1, foo1);

> >@@ -45,6 +48,10 @@ main (void)

> >     __builtin_abort ();

> >   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)

> >     __builtin_abort ();

> >+  if (__builtin_memchr (global, null, 5) == 0)

> >+    __builtin_abort ();

> >+  if (__builtin_memchr (local, null, 5) == 0)

> >+    __builtin_abort ();

> 

> How is that a meaningful test? This seems to work even with an

> unpatched gcc. I'd have expected something that shows a benefit for

> doing this conversion, and maybe also a test that shows it isn't

> done in cases where it's not allowed.

> 

> > tree

> >-build_string_literal (int len, const char *str)

> >+build_string_literal (int len, const char *str, bool build_addr_expr)

> 

> New arguments should be documented in the function comment.

> 

> >+/* Return TRUE when CTOR can be converted to a string constant.  */

> 

> "if", not "when".

> 

> >+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);

> >+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)

> >+    {

> >+      if (key == NULL_TREE

> >+	  || TREE_CODE (key) != INTEGER_CST

> >+	  || !tree_fits_uhwi_p (value)

> >+	  || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

> >+	return false;

> 

> Shouldn't all elements have the same type, or do you really have to

> call useless_type_conversion in a loop?

> 

> >+      /* Allow zero character just at the end of a string.  */

> >+      if (integer_zerop (value))

> >+	return idx == elements - 1;

> 

> Don't you also have to explicitly check it's there?

> 

> 

> Bernd
Martin Liška Nov. 4, 2016, 1:31 p.m. UTC | #4
On 11/03/2016 01:46 PM, Richard Biener wrote:
> On Thu, Nov 3, 2016 at 1:12 PM, Martin Liška <mliska@suse.cz> wrote:

>> Hello.

>>

>> This is small follow-up of the patches I sent to string folding.

>> The patch transforms character array defined in an initializer to string

>> constant:

>>

>> +const char global[] = {'a', 'b', 'c', 'd', '\0'};

>>

>> Apart from that, it also enables string folding of local symbols like:

>> +  const char local[] = "abcd";

>>

>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

> 

> Hmm, does this handle

> 

> const char global[] = {'a', [4] = 'd', 'b', [3] = '\0'};

> 

> correctly?


It hasn't, fixed in the v2. As ctor indices are sorted in increasing order,
I only check that there are no holes.

> 

> I think you need to check that 'key' is increasing and there are no gaps

> (ISTR constructor elements are sorted but gaps can still appear).

> 

> +         || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))

> 

> please instead check the element type of the array constructor.  I'd also

> use a stricter check like TYPE_MAIN_VARIANT (type) == char_type_node

> to avoid changing non-strings like unsigned / signed char.


Ok, v2 does: TYPE_MAIN_VARIANT (type) == char_type_node

> 

> Finally I'm a bit worried about doing this for all 'char'

> constructors.  Maybe we

> should restrict this to those with ISPRINT () entries?


Also done for all characters except the trailing \0 character.
I'll send patch in the next email.

Martin

> 

> Richard.

> 

> 

>> Martin
diff mbox

Patch

From ecd2b614072f55e71896f7f5bf290072b3a4b04c Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 17 Oct 2016 15:24:46 +0200
Subject: [PATCH] Convert character arrays to string csts

gcc/testsuite/ChangeLog:

2016-10-24  Martin Liska  <mliska@suse.cz>

	* gcc.dg/tree-ssa/builtins-folding-gimple.c (main): Add new
	tests.

gcc/ChangeLog:

2016-10-24  Martin Liska  <mliska@suse.cz>

	* cgraphunit.c (varpool_node::finalize_decl): Convert ctors
	to string constants if possible.
	* gimplify.c (gimplify_decl_expr): Emit INIT_EXPR just if it
	cannot be converted to a string constant.
	(gimplify_init_constructor): Create string constant for local
	variables (if possible).
	* tree.c (build_string_cst_from_ctor): New function.
	(build_string_literal): Add new argument.
	(can_convert_ctor_to_string_cst): New function.
	* tree.h: Declare new functions.
	* varpool.c (ctor_for_folding): Support automatic variables.
---
 gcc/cgraphunit.c                                   |  6 ++
 gcc/gimplify.c                                     | 14 ++++-
 .../gcc.dg/tree-ssa/builtins-folding-gimple.c      |  7 +++
 gcc/tree.c                                         | 68 ++++++++++++++++++++--
 gcc/tree.h                                         |  6 +-
 gcc/varpool.c                                      |  8 ++-
 6 files changed, 100 insertions(+), 9 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index e315a77..bc041d6 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -791,6 +791,12 @@  process_function_and_variable_attributes (cgraph_node *first,
 void
 varpool_node::finalize_decl (tree decl)
 {
+  tree init = DECL_INITIAL (decl);
+  if (init
+      && TREE_READONLY (decl)
+      && can_convert_ctor_to_string_cst (init))
+    DECL_INITIAL (decl) = build_string_cst_from_ctor (init);
+
   varpool_node *node = varpool_node::get_create (decl);
 
   gcc_assert (TREE_STATIC (decl) || DECL_EXTERNAL (decl));
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 1531582..4520843 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -1495,7 +1495,8 @@  gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 	{
 	  if (!TREE_STATIC (decl))
 	    {
-	      DECL_INITIAL (decl) = NULL_TREE;
+	      if (!TREE_READONLY (decl) || TREE_CODE (init) != STRING_CST)
+		DECL_INITIAL (decl) = NULL_TREE;
 	      init = build2 (INIT_EXPR, void_type_node, decl, init);
 	      gimplify_and_add (init, seq_p);
 	      ggc_free (init);
@@ -4438,6 +4439,17 @@  gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	    break;
 	  }
 
+	/* Replace a ctor with a string constant with possible.  */
+	if (TREE_READONLY (object)
+	    && VAR_P (object)
+	    && can_convert_ctor_to_string_cst (ctor))
+	  {
+	    ctor = build_string_cst_from_ctor (ctor);
+	    TREE_OPERAND (*expr_p, 1) = ctor;
+	    DECL_INITIAL (object) = ctor;
+	    break;
+	  }
+
 	/* Fetch information about the constructor to direct later processing.
 	   We might want to make static versions of it in various cases, and
 	   can only do so if it known to be a valid constant initializer.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
index 283bd1c..b2d1fd5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtins-folding-gimple.c
@@ -4,12 +4,15 @@ 
 char *buffer1;
 char *buffer2;
 
+const char global[] = {'a', 'b', 'c', 'd', '\0'};
+
 #define SIZE 1000
 
 int
 main (void)
 {
   const char* const foo1 = "hello world";
+  const char local[] = "abcd";
 
   buffer1 = __builtin_malloc (SIZE);
   __builtin_strcpy (buffer1, foo1);
@@ -45,6 +48,10 @@  main (void)
     __builtin_abort ();
   if (__builtin_memchr (foo1, null, 12) != foo1 + 11)
     __builtin_abort ();
+  if (__builtin_memchr (global, null, 5) == 0)
+    __builtin_abort ();
+  if (__builtin_memchr (local, null, 5) == 0)
+    __builtin_abort ();
 
   __builtin_memchr (foo1, x, 11);
   __builtin_memchr (buffer1, x, zero);
diff --git a/gcc/tree.c b/gcc/tree.c
index 56cc653..dcf767f 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1784,6 +1784,26 @@  build_vector_from_val (tree vectype, tree sc)
     }
 }
 
+/* Build string constant from a constructor CTOR.  */
+
+tree
+build_string_cst_from_ctor (tree ctor)
+{
+  unsigned HOST_WIDE_INT idx;
+  tree value;
+  vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (ctor);
+  char *str = XNEWVEC (char, elts->length ());
+
+  FOR_EACH_CONSTRUCTOR_VALUE (elts, idx, value)
+    str[idx] = (char)tree_to_uhwi (value);
+
+  tree init = build_string_literal (elts->length (),
+				    ggc_alloc_string (str, elts->length ()),
+				    false);
+  free (str);
+  return init;
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
@@ -11344,7 +11364,7 @@  maybe_build_call_expr_loc (location_t loc, combined_fn fn, tree type,
 /* Create a new constant string literal and return a char* pointer to it.
    The STRING_CST value is the LEN characters at STR.  */
 tree
-build_string_literal (int len, const char *str)
+build_string_literal (int len, const char *str, bool build_addr_expr)
 {
   tree t, elem, index, type;
 
@@ -11357,10 +11377,14 @@  build_string_literal (int len, const char *str)
   TREE_READONLY (t) = 1;
   TREE_STATIC (t) = 1;
 
-  type = build_pointer_type (elem);
-  t = build1 (ADDR_EXPR, type,
-	      build4 (ARRAY_REF, elem,
-		      t, integer_zero_node, NULL_TREE, NULL_TREE));
+  if (build_addr_expr)
+    {
+      type = build_pointer_type (elem);
+      t = build1 (ADDR_EXPR, type,
+		  build4 (ARRAY_REF, elem,
+			  t, integer_zero_node, NULL_TREE, NULL_TREE));
+    }
+
   return t;
 }
 
@@ -14217,6 +14241,40 @@  combined_fn_name (combined_fn fn)
     return internal_fn_name (as_internal_fn (fn));
 }
 
+/* Return TRUE when CTOR can be converted to a string constant.  */
+
+bool
+can_convert_ctor_to_string_cst (tree ctor)
+{
+  unsigned HOST_WIDE_INT idx;
+  tree value, key;
+
+  if (TREE_CODE (ctor) != CONSTRUCTOR
+      || TREE_CODE (TREE_TYPE (ctor)) != ARRAY_TYPE)
+    return false;
+
+  tree t = TREE_TYPE (TREE_TYPE (ctor));
+  if (TREE_CODE (TYPE_SIZE (t)) != INTEGER_CST
+      || tree_to_uhwi (TYPE_SIZE (t)) != BITS_PER_UNIT)
+    return false;
+
+  unsigned HOST_WIDE_INT elements = CONSTRUCTOR_NELTS (ctor);
+  FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (ctor), idx, key, value)
+    {
+      if (key == NULL_TREE
+	  || TREE_CODE (key) != INTEGER_CST
+	  || !tree_fits_uhwi_p (value)
+	  || !useless_type_conversion_p (TREE_TYPE (value), char_type_node))
+	return false;
+
+      /* Allow zero character just at the end of a string.  */
+      if (integer_zerop (value))
+	return idx == elements - 1;
+    }
+
+  return false;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/tree.h b/gcc/tree.h
index 531bc5e..ea5d184 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3975,6 +3975,8 @@  extern tree build_vector_stat (tree, tree * MEM_STAT_DECL);
 #define build_vector(t,v) build_vector_stat (t, v MEM_STAT_INFO)
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
+extern tree build_string_cst_from_ctor (tree);
+extern tree build_vector_from_val (tree, tree);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
@@ -4022,7 +4024,8 @@  extern tree build_call_expr_internal_loc_array (location_t, enum internal_fn,
 						tree, int, const tree *);
 extern tree maybe_build_call_expr_loc (location_t, combined_fn, tree,
 				       int, ...);
-extern tree build_string_literal (int, const char *);
+extern tree build_string_literal (int len, const char *str,
+				  bool build_addr_expr = true);
 
 /* Construct various nodes representing data types.  */
 
@@ -4688,6 +4691,7 @@  extern void assign_assembler_name_if_neeeded (tree);
 extern void warn_deprecated_use (tree, tree);
 extern void cache_integer_cst (tree);
 extern const char *combined_fn_name (combined_fn);
+extern bool can_convert_ctor_to_string_cst (tree ctor);
 
 /* Compare and hash for any structure which begins with a canonical
    pointer.  Assumes all pointers are interchangeable, which is sort
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 78969d2..6c9d067 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -412,11 +412,15 @@  ctor_for_folding (tree decl)
     return error_mark_node;
 
   /* Do not care about automatic variables.  Those are never initialized
-     anyway, because gimplifier exapnds the code.  */
+     anyway, because gimplifier expands the code.  */
   if (!TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
     {
       gcc_assert (!TREE_PUBLIC (decl));
-      return error_mark_node;
+      if (!TREE_READONLY (decl) || TREE_THIS_VOLATILE (decl))
+	return error_mark_node;
+
+      tree ctor = DECL_INITIAL (decl);
+      return ctor == NULL_TREE ? error_mark_node : ctor;
     }
 
   gcc_assert (VAR_P (decl));
-- 
2.10.1