
[RFC] propagate malloc attribute in ipa-pure-const pass

Message ID CAAgBjMkaTCAvwO6AVKu5C6_-TqpuQtTcs0EqYHqogDhru3juYg@mail.gmail.com
State New

Commit Message

Prathamesh Kulkarni May 15, 2017, 10:39 a.m. UTC
Hi,
I have attached a bare-bones prototype patch that propagates the malloc
attribute in ipa-pure-const. As far as I understand from the documentation,
a function can be annotated with the malloc attribute if it returns a
pointer to a newly allocated memory block (uninitialized or zeroed) and
the pointer does not alias any other valid pointer?

* Analysis
The analysis used by the patch in malloc_candidate_p is too conservative,
and I would be grateful for suggestions on improving it.
Currently it only checks that:
1) The function returns a value of pointer type.
2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and
    SSA_NAME_DEF_STMT of each phi argument is a function call.
3) The return value has immediate uses only within comparisons (gcond
or gassign) and the return statement (and likewise a phi arg has immediate
uses only within a comparison or the phi stmt).

The intent is that malloc_candidate_p (node) returns true if node
returns the return value of its callee, and the return value is only
used for comparisons within node.
Then I assume it's safe (although conservative) to decide that node
could possibly be a malloc function if the callee is found to be malloc?

For example:
void *f(size_t n)
{
  void *p = __builtin_malloc (n);
  if (p == 0)
    __builtin_abort ();
  return p;
}

malloc_candidate_p would determine f to be a candidate for the malloc
attribute since it returns the return value of its callee
(__builtin_malloc) and the return value is used only within a comparison
and the return statement.

However, due to the imprecise analysis, it misses this case:
char  **g(size_t n)
{
  char **p = malloc (sizeof(char *) * 10);
  for (int i = 0; i < 10; i++)
    p[i] = malloc (sizeof(char) * n);
  return p;
}
I suppose g() could be annotated with malloc here?

The patch uses return_callees_map, which is a hash_map<node, callees> such
that the return value of node is the return value of one of these callees.
For the above example it would be <f, [__builtin_malloc]>.
The analysis phase populates return_callees_map, and the propagation
phase uses it to take the "meet" of the callees.
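
Roughly, the propagation step looks like this (a simplified sketch, not the
actual patch code; current_state / set_current_state are made-up stand-ins
for the funct_state accessors):

  /* node stays a malloc candidate only if every callee whose return
     value it returns is itself malloc; the lattice is ordered
     TOP < MALLOC < BOTTOM, so MAX acts as the meet.  */
  malloc_state_e new_state = current_state (node);
  for (unsigned i = 0; i < callees.length (); ++i)
    new_state = MAX (new_state, current_state (callees[i]));
  if (new_state != current_state (node))
    {
      set_current_state (node, new_state);
      changed = true;  /* iterate until a fixed point is reached */
    }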

* LTO and memory management
This is a general question about LTO and memory management.
IIUC the following sequence takes place during normal LTO:
LGEN: generate_summary, write_summary
WPA: read_summary, execute ipa passes, write_opt_summary

So I assumed it was OK, in LGEN, to allocate return_callees_map in
generate_summary and free it in write_summary, and during WPA to allocate
return_callees_map in read_summary and free it after execute (since
write_opt_summary does not require return_callees_map).

However, with fat LTO, it seems the sequence changes for LGEN: the
execute phase takes place after write_summary. Since return_callees_map
is freed in pure_const_write_summary and propagate_malloc () accesses it
in the execute stage, this results in a segmentation fault.

To work around this, I am using the following hack in pure_const_write_summary:
// FIXME: Do not free if -ffat-lto-objects is enabled.
if (!global_options.x_flag_fat_lto_objects)
  free_return_callees_map ();
Is there a better approach for handling this?

Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly.
I tried to imitate cgraph_node::set_const_flag.

The patch passes bootstrap+test on x86_64 and found a few functions in
the source tree (attached func_names.txt) that could be annotated with
malloc (I took a brief look at some of the functions and they didn't appear
to be false positives, but I will recheck thoroughly).

Does the patch look like it's going in the right direction?
I would be grateful for suggestions on improving it.

Thanks,
Prathamesh
virtual char* libcp1::compiler::find(std::__cxx11::string&) const
gomp_malloc
virtual char* libcc1::compiler::find(std::__cxx11::string&) const
void* GTM::xrealloc(void*, size_t, bool)
concat
char* gen_internal_sym(const char*)
void* ira_allocate(size_t)
char* gen_fake_label()
char* selftest::locate_file(const char*)
const char* find_plugindir_spec_function(int, const char**)
reconcat
xvasprintf
char* rtx_reader::read_until(const char*, bool)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = long unsigned int*]
xstrndup
const char* replace_extension_spec_func(int, const char**)
void* GTM::xcalloc(size_t, bool)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = unsigned int*]
xstrdup
void* GTM::xmalloc(size_t, bool)
void* base_pool_allocator<TBlockAllocator>::allocate() [with TBlockAllocator = memory_block_pool]
char* ix86_offload_options()
basic_block_def* alloc_block()
xmemdup
char* build_message_string(const char*, ...)
make_relative_prefix
gomp_malloc_cleared
make_relative_prefix_ignore_links
choose_temp_base
make_temp_file
xasprintf
char* file_name_as_prefix(diagnostic_context*, const char*)
void* yyalloc(yy_size_t)
void* ggc_internal_cleared_alloc(size_t, void (*)(void*), size_t, size_t)
void* __cilkrts_hyperobject_alloc(void*, std::size_t)
void* alloc_for_identifier_to_locale(size_t)
const char* op_error_string(const char*, int, bool)
host_alloc
void* stringpool_ggc_alloc(size_t)
fibheap_new
deps* deps_init()
C_alloca
char* lto_get_section_name(int, const char*, lto_file_decl_data*)
void* lto_zalloc(void*, unsigned int, unsigned int)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = std::pair<__gnu_cxx::bitmap_allocator<wchar_t>::_Alloc_block*, __gnu_cxx::bitmap_allocator<wchar_t>::_Alloc_block*>]
void* ggc_internal_alloc(size_t, void (*)(void*), size_t, size_t)
void* ggc_splay_alloc(int, void*)
zcalloc
dupargv
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = std::pair<__gnu_cxx::bitmap_allocator<char>::_Alloc_block*, __gnu_cxx::bitmap_allocator<char>::_Alloc_block*>]
lrealpath
void* mempool_obstack_chunk_alloc(size_t)
char* describe_cache(cache_desc, cache_desc)
xcalloc
int* ira_allocate_cost_vector(reg_class_t)
buildargv
char* diagnostic_get_location_text(diagnostic_context*, expanded_location)
splay_tree_xmalloc_allocate
lto_in_decl_state* lto_new_in_decl_state()
char* affix_data_type(const char*)
ggc_pch_data* init_ggc_pch()
xmalloc
char* diagnostic_build_prefix(diagnostic_context*, const diagnostic_info*)

Comments

Jeff Law May 16, 2017, 1:12 a.m. UTC | #1
On 05/15/2017 04:39 AM, Prathamesh Kulkarni wrote:
> Hi,

> I have attached a bare-bones prototype patch that propagates malloc attribute in

> ipa-pure-const. As far as I understand, from the doc a function could

> be annotated

> with malloc attribute if it returns a pointer to a newly allocated

> memory block (uninitialized or zeroed) and the pointer does not alias

> any other valid pointer ?

> 

> * Analysis

> The analysis used by the patch in malloc_candidate_p is too conservative,

> and I would be grateful for suggestions on improving it.

> Currently it only checks if:

> 1) The function returns a value of pointer type.

> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and

>      SSA_NAME_DEF_STMT(each element of phi) is function call.

> 3) The return-value has immediate uses only within comparisons (gcond

> or gassign) and return_stmt (and likewise a phi arg has immediate use only

> within comparison or the phi stmt).

ISTM the return value *must* be either the result of a call to a 
function with the malloc attribute or NULL.  It can't be a mix of 
malloc'd objects or something else (such as a passed-in object).

> 

> malloc_candidate_p would determine f to be a candidate for malloc

> attribute since it returns the return-value of it's callee

> (__builtin_malloc) and the return value is used only within comparison

> and return_stmt.

> 

> However due to the imprecise analysis it misses this case:

> char  **g(size_t n)

> {

>    char **p = malloc (sizeof(char *) * 10);

>    for (int i = 0; i < 10; i++)

>      p[i] = malloc (sizeof(char) * n);

>    return p;

> }

> I suppose g() could be annotated with malloc here ?

I think that's up to us to decide.  So the question becomes what makes 
the most sense from a user and optimization standpoint.


I would start relatively conservatively and look to add further analysis 
to detect more malloc functions over time.  You've already identified 
comparisons as valid uses, but ISTM the pointer could be passed as an 
argument, stored into memory and all kinds of other things.

You'll probably want instrumentation to log cases where something looked 
like a potential candidate, but had to be rejected for some reason.
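
Something along these lines in the dump file would probably be enough
(illustrative only; "reason" here is a hypothetical string describing why
the candidate was rejected):

  if (dump_file && (dump_flags & TDF_DETAILS))
    fprintf (dump_file, "%s: not a malloc candidate (%s)\n",
             IDENTIFIER_POINTER (DECL_NAME (fun->decl)), reason);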

Jeff
Richard Biener May 16, 2017, 11:41 a.m. UTC | #2
On Mon, 15 May 2017, Prathamesh Kulkarni wrote:

> Hi,

> I have attached a bare-bones prototype patch that propagates malloc attribute in

> ipa-pure-const. As far as I understand, from the doc a function could

> be annotated

> with malloc attribute if it returns a pointer to a newly allocated

> memory block (uninitialized or zeroed) and the pointer does not alias

> any other valid pointer ?

> 

> * Analysis

> The analysis used by the patch in malloc_candidate_p is too conservative,

> and I would be grateful for suggestions on improving it.

> Currently it only checks if:

> 1) The function returns a value of pointer type.

> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and

>     SSA_NAME_DEF_STMT(each element of phi) is function call.

> 3) The return-value has immediate uses only within comparisons (gcond

> or gassign) and return_stmt (and likewise a phi arg has immediate use only

> within comparison or the phi stmt).

>

> The intent is that malloc_candidate_p(node) returns true if node

> returns the return value of it's callee, and the return value is only

> used for comparisons within node.

> Then I assume it's safe (although conservative) to decide that node

> could possibly be a malloc function if callee is found to be malloc ?

> 

> for eg:

> void *f(size_t n)

> {

>   void *p = __builtin_malloc (n);

>   if (p == 0)

>     __builtin_abort ();

>   return p;

> }

> 

> malloc_candidate_p would determine f to be a candidate for malloc

> attribute since it returns the return-value of it's callee

> (__builtin_malloc) and the return value is used only within comparison

> and return_stmt.

> 

> However due to the imprecise analysis it misses this case:

> char  **g(size_t n)

> {

>   char **p = malloc (sizeof(char *) * 10);

>   for (int i = 0; i < 10; i++)

>     p[i] = malloc (sizeof(char) * n);

>   return p;

> }

> I suppose g() could be annotated with malloc here ?


It cannot because what p points to is initialized.  If you do then

 char **x = g(10);
 **x = 1;

will be optimized away.  Now what would be interesting is to do
"poor mans IPA PTA", namely identify functions that are a) small,
b) likely return a newly allocated pointer.  At PTA time then
"inline" all such wrappers, but only for the PTA constraints.

That will give more precise points-to results (and can handle more
cases, esp. initialization) than just annotating functions with 'malloc'.

+      /* With non-call exceptions we can't say for sure if other function
+        body was not possibly optimized to still throw.  */
+      if (!non_call || node->binds_to_current_def_p ())
+       {
+         DECL_IS_MALLOC (node->decl) = true;
+         *changed = true;

I don't see how malloc throwing would be an issue.

+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+       enum tree_code code = gimple_cond_code (cond);
+       if (TREE_CODE_CLASS (code) != tcc_comparison)

always false

+         RETURN_FROM_IMM_USE_STMT (use_iter, false);

I think you want to disallow eq/ne_expr against sth not integer_zerop.

+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+       enum tree_code code = gimple_assign_rhs_code (ga);
+       if (TREE_CODE_CLASS (code) != tcc_comparison)
+         RETURN_FROM_IMM_USE_STMT (use_iter, false);

likewise.

+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+          !gsi_end_p (gsi);
+          gsi_next (&gsi))
+       {
+         greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+         if (ret_stmt)

I think you want to do this much cheaper by only walking the last
stmt of predecessor(s) of EXIT_BLOCK_FOR_FN.  (s) because
we have multiple return stmts for

int foo (int i)
{
  if (i)
    return 0;
  else
    return i;
}

but all return stmts will be the last stmt in one of the exit block
predecessors.
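
For illustration, an untested sketch of that cheaper walk:

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (fun)->preds)
    {
      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
      if (gsi_end_p (gsi))
        return false;
      greturn *ret_stmt = dyn_cast <greturn *> (gsi_stmt (gsi));
      if (!ret_stmt)
        return false;
      /* ... analyze gimple_return_retval (ret_stmt) and its uses ...  */
    }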

+             if (!check_retval_uses (retval, ret_stmt))
+               return false;
+
+             gimple *def = SSA_NAME_DEF_STMT (retval);
+             if (gcall *call_stmt = dyn_cast<gcall *> (def))
+               {
+                 tree callee_decl = gimple_call_fndecl (call_stmt);
+                 /* FIXME: In case of indirect call, callee_decl might be 
NULL
+                    Not sure what to do in that case, punting for now.  
*/
+                 if (!callee_decl)
+                   return false;
+                 cgraph_node *callee = cgraph_node::get_create 
(callee_decl);
+                 callees.safe_push (callee);
+               }
+             else if (gphi *phi = dyn_cast<gphi *> (def))
+               for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+                 {
+                   tree arg = gimple_phi_arg_def (phi, i);
+                   if (TREE_CODE (arg) != SSA_NAME)
+                     return false;
+                   if (!check_retval_uses (arg, phi))
+                     return false;
+

I think you should refactor this to a separate function and handle
recursion.  For recursion you miss simple SSA name assigns.

For PHI args you want to allow literal zero as well (for
-fdelete-null-pointer-checks).

> The patch uses return_calles_map which is a hash_map<node, callees> such

> that the return value of node is return value of one of these callees.

> For above eg it would be <f, [__builtin_malloc]>

> The analysis phase populates return_callees_map, and the propagation

> phase uses it to take the "meet" of callees.


I think you are supposed to treat functions optimistically as "malloc"
(use the DECL_IS_MALLOC flag) and during propagation revisit that
change by propagating across callgraph edges.  callgraph edges that are
relevant (calls feeding a pointer return value) should then be
marked somehow (set a flag you stream).

This also means that indirect calls should be only handled during
propagation.

Not sure if we have an example of an "edge encoder".  Honza?

Bonus points for auto-detecting alloc_size and alloc_align attributes
and warning with -Wsuggest-attribute=malloc{_size,_align,}.

> * LTO and memory management

> This is a general question about LTO and memory management.

> IIUC the following sequence takes place during normal LTO:

> LGEN: generate_summary, write_summary

> WPA: read_summary, execute ipa passes, write_opt_summary

> 

> So I assumed it was OK in LGEN to allocate return_callees_map in

> generate_summary and free it in write_summary and during WPA, allocate

> return_callees_map in read_summary and free it after execute (since

> write_opt_summary does not require return_callees_map).

> 

> However with fat LTO, it seems the sequence changes for LGEN with

> execute phase takes place after write_summary. However since

> return_callees_map is freed in pure_const_write_summary and

> propagate_malloc() accesses it in execute stage, it results in

> segmentation fault.

> 

> To work around this, I am using the following hack in pure_const_write_summary:

> // FIXME: Do not free if -ffat-lto-objects is enabled.

> if (!global_options.x_flag_fat_lto_objects)

>   free_return_callees_map ();

> Is there a better approach for handling this ?

> 

> Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly.

> I tried to imitate cgraph_node::set_const_flag.

> 

> The patch passes bootstrap+test on x86_64 and found a few functions in

> the source tree (attached func_names.txt) that could be annotated with

> malloc (I gave a brief look at some of the functions and didn't appear

> to be false positives but I will recheck thoroughly)

> 

> Does the patch look in the right direction ?

> I would be grateful for suggestions on improving it.

> 

> Thanks,

> Prathamesh

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Martin Sebor May 17, 2017, 9:01 p.m. UTC | #3
> The patch passes bootstrap+test on x86_64 and found a few functions in

> the source tree (attached func_names.txt) that could be annotated with

> malloc (I gave a brief look at some of the functions and didn't appear

> to be false positives but I will recheck thoroughly)


virtual char* libcp1::compiler::find(std::__cxx11::string&) const

The virtual on the list of your candidates gave me pause.  Consider
this completely contrived example:

   struct B {
     virtual void* f (unsigned n) {
       return new char [n];
     }
   };

   void* foo (B &b, unsigned n)
   {
     return b.f (n);
   }

Based on these definitions alone both functions are candidates
for attribute malloc.

But suppose foo is called with an object of a type derived from
B that overrides f() to do something wacky (but strictly not
invalid) like:

   struct D: B {
     char buf[32];
     virtual void* f (unsigned n) {
       return n < 32 ? buf : B::f (n);
     }
   };

This breaks foo's attribute malloc constraint.

In other words, I think virtual functions need to be excluded
from the list (unless they're defined in a class marked final,
or unless we know they're not overridden to break the constraint
like above).

Martin
Richard Biener May 18, 2017, 7:06 a.m. UTC | #4
On Wed, 17 May 2017, Martin Sebor wrote:

> > The patch passes bootstrap+test on x86_64 and found a few functions in

> > the source tree (attached func_names.txt) that could be annotated with

> > malloc (I gave a brief look at some of the functions and didn't appear

> > to be false positives but I will recheck thoroughly)

> 

> virtual char* libcp1::compiler::find(std::__cxx11::string&) const

> 

> The virtual on the list of your candidates gave me pause.  Consider

> this completely contrived example:

> 

>   struct B {

>     virtual void* f (unsigned n) {

>       return new char [n];

>     }

>   };

> 

>   void* foo (B &b, unsigned n)

>   {

>     return b.f (n);

>   }

> 

> Based on these definitions alone both functions are candidates

> for attribute malloc.

> 

> But suppose foo is called with an object of a type derived from

> B that overrides f() to do something wacky (but strictly not

> invalid) like:

> 

>   struct D: B {

>     char buf[32];

>     virtual void* f (unsigned n) {

>       if (n < 32)

>       return n <= 32 ? buf : B::f (n);

>     }

> 

> Breaking foo's attribute malloc constraint.

> 

> In other words, I think virtual functions need to be excluded

> from the list (unless they're defined in a class marked final,

> or unless we know they're not overridden to break the constraint

> like above).


But we are annotating the actual decl, not the type in the class
struct.

Richard.
Jan Hubicka May 19, 2017, 1:06 p.m. UTC | #5
> >   struct D: B {

> >     char buf[32];

> >     virtual void* f (unsigned n) {

> >       if (n < 32)

> >       return n <= 32 ? buf : B::f (n);

> >     }

> > 

> > Breaking foo's attribute malloc constraint.

> > 

> > In other words, I think virtual functions need to be excluded

> > from the list (unless they're defined in a class marked final,

> > or unless we know they're not overridden to break the constraint

> > like above).

> 

> But we are annotating the actual decl, not the type in the class

> struct.


Yep and we will be able to use the information in case we devirtualize
the call.  So this seems OK to me.

Honza
> 

> Richard.
Jan Hubicka May 19, 2017, 1:32 p.m. UTC | #6
> 

> * LTO and memory management

> This is a general question about LTO and memory management.

> IIUC the following sequence takes place during normal LTO:

> LGEN: generate_summary, write_summary

> WPA: read_summary, execute ipa passes, write_opt_summary

> 

> So I assumed it was OK in LGEN to allocate return_callees_map in

> generate_summary and free it in write_summary and during WPA, allocate

> return_callees_map in read_summary and free it after execute (since

> write_opt_summary does not require return_callees_map).

> 

> However with fat LTO, it seems the sequence changes for LGEN with

> execute phase takes place after write_summary. However since

> return_callees_map is freed in pure_const_write_summary and

> propagate_malloc() accesses it in execute stage, it results in

> segmentation fault.

> 

> To work around this, I am using the following hack in pure_const_write_summary:

> // FIXME: Do not free if -ffat-lto-objects is enabled.

> if (!global_options.x_flag_fat_lto_objects)

>   free_return_callees_map ();

> Is there a better approach for handling this ?


I think most passes just do not free summaries with -flto.  We probably want
to fix that to make it possible to compile multiple units, i.e. from a plugin,
by adding a release_summaries method...
So I would say it is OK to do the same as others do and leak it with -flto.
> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

> index e457166ea39..724c26e03f6 100644

> --- a/gcc/ipa-pure-const.c

> +++ b/gcc/ipa-pure-const.c

> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>  #include "tree-scalar-evolution.h"

>  #include "intl.h"

>  #include "opts.h"

> +#include "ssa.h"

>  

>  /* Lattice values for const and pure functions.  Everything starts out

>     being const, then may drop to pure and then neither depending on

> @@ -69,6 +70,15 @@ enum pure_const_state_e

>  

>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>  

> +enum malloc_state_e

> +{

> +  PURE_CONST_MALLOC_TOP,

> +  PURE_CONST_MALLOC,

> +  PURE_CONST_MALLOC_BOTTOM

> +};


It took me a while to work out what PURE_CONST means here :)
I would just call it something like STATE_MALLOC_TOP... or so.
ipa_pure_const is an outdated name from the time the pass was doing only
those two.
> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>  

>  static vec<funct_state> funct_state_vec;

>  

> +/* A map from node to subset of callees. The subset contains those callees

> + * whose return-value is returned by the node. */

> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

> +


Hehe, a special case of return jump function.  We ought to support those more generally.
How do you keep it up to date over callgraph changes?
> @@ -921,6 +1055,23 @@ end:

>    if (TREE_NOTHROW (decl))

>      l->can_throw = false;

>  

> +  if (ipa)

> +    {

> +      vec<cgraph_node *> v = vNULL;

> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

> +      if (DECL_IS_MALLOC (decl))

> +	l->malloc_state = PURE_CONST_MALLOC;

> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

> +	{

> +	  l->malloc_state = PURE_CONST_MALLOC_TOP;

> +	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

> +	  for (unsigned i = 0; i < v.length (); ++i)

> +	    callees_p->safe_push (v[i]);

> +	  return_callees_map->put (fn, callees_p);

> +	}

> +      v.release ();

> +    }

> +


I would do a non-ipa variant, too.  I think most attributes can be detected that way
as well.

The patch generally makes sense to me.  It would be nice to make it easier to write such
basic propagators across the callgraph (perhaps adding a template doing the basic
propagation logic). Also I think you need to solve the problem with keeping your
summaries up to date across callgraph node removal and duplications.
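
The rough shape of such a template might be something like this (hypothetical
sketch only; state_of and meet_with_callee are made-up callbacks a client
pass would supply):

  template <typename state_t>
  static void
  propagate_bottom_up (state_t &(*state_of) (cgraph_node *),
                       bool (*meet_with_callee) (state_t &, cgraph_node *))
  {
    cgraph_node **order = XNEWVEC (cgraph_node *, symtab->cgraph_count);
    int order_pos = ipa_reverse_postorder (order);
    bool changed = true;

    while (changed)
      {
        changed = false;
        /* Visit callers after callees so information flows bottom-up.  */
        for (int i = order_pos - 1; i >= 0; --i)
          {
            cgraph_node *node = order[i];
            if (node->alias || !node->definition)
              continue;
            state_t &state = state_of (node);
            for (cgraph_edge *e = node->callees; e; e = e->next_callee)
              if (meet_with_callee (state, e->callee))
                changed = true;
          }
      }

    free (order);
  }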

Honza
Prathamesh Kulkarni May 23, 2017, 1:40 p.m. UTC | #7
On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:
>>

>> * LTO and memory management

>> This is a general question about LTO and memory management.

>> IIUC the following sequence takes place during normal LTO:

>> LGEN: generate_summary, write_summary

>> WPA: read_summary, execute ipa passes, write_opt_summary

>>

>> So I assumed it was OK in LGEN to allocate return_callees_map in

>> generate_summary and free it in write_summary and during WPA, allocate

>> return_callees_map in read_summary and free it after execute (since

>> write_opt_summary does not require return_callees_map).

>>

>> However with fat LTO, it seems the sequence changes for LGEN with

>> execute phase takes place after write_summary. However since

>> return_callees_map is freed in pure_const_write_summary and

>> propagate_malloc() accesses it in execute stage, it results in

>> segmentation fault.

>>

>> To work around this, I am using the following hack in pure_const_write_summary:

>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>> if (!global_options.x_flag_fat_lto_objects)

>>   free_return_callees_map ();

>> Is there a better approach for handling this ?

>

> I think most passes just do not free summaries with -flto.  We probably want

> to fix it to make it possible to compile multiple units i.e. from plugin by

> adding release_summaries method...

> So I would say it is OK to do the same as others do and leak it with -flto.

>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>> index e457166ea39..724c26e03f6 100644

>> --- a/gcc/ipa-pure-const.c

>> +++ b/gcc/ipa-pure-const.c

>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>  #include "tree-scalar-evolution.h"

>>  #include "intl.h"

>>  #include "opts.h"

>> +#include "ssa.h"

>>

>>  /* Lattice values for const and pure functions.  Everything starts out

>>     being const, then may drop to pure and then neither depending on

>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>

>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>

>> +enum malloc_state_e

>> +{

>> +  PURE_CONST_MALLOC_TOP,

>> +  PURE_CONST_MALLOC,

>> +  PURE_CONST_MALLOC_BOTTOM

>> +};

>

> It took me a while to work out what PURE_CONST means here :)

> I would just call it something like STATE_MALLOC_TOP... or so.

> ipa_pure_const is outdated name from the time pass was doing only

> those two.

>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>

>>  static vec<funct_state> funct_state_vec;

>>

>> +/* A map from node to subset of callees. The subset contains those callees

>> + * whose return-value is returned by the node. */

>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>> +

>

> Hehe, a special case of return jump function.  We ought to support those more generally.

> How do you keep it up to date over callgraph changes?

>> @@ -921,6 +1055,23 @@ end:

>>    if (TREE_NOTHROW (decl))

>>      l->can_throw = false;

>>

>> +  if (ipa)

>> +    {

>> +      vec<cgraph_node *> v = vNULL;

>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>> +      if (DECL_IS_MALLOC (decl))

>> +     l->malloc_state = PURE_CONST_MALLOC;

>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>> +     {

>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>> +       for (unsigned i = 0; i < v.length (); ++i)

>> +         callees_p->safe_push (v[i]);

>> +       return_callees_map->put (fn, callees_p);

>> +     }

>> +      v.release ();

>> +    }

>> +

>

> I would do non-ipa variant, too.  I think most attributes can be detected that way

> as well.

>

> The patch generally makes sense to me.  It would be nice to make it easier to write such

> a basic propagators across callgraph (perhaps adding a template doing the basic

> propagation logic). Also I think you need to solve the problem with keeping your

> summaries up to date across callgraph node removal and duplications.

Thanks for the suggestions, I will try to address them in a follow-up patch.
IIUC, I would need to modify the ipa-pure-const cgraph hooks -
add_new_function, remove_node_data, duplicate_node_data -
to keep return_callees_map up-to-date across callgraph node insertions
and removals?
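
For reference, register_hooks in ipa-pure-const currently wires these hooks
up roughly as below (a from-memory sketch, so the holder variable names may
not be exact):

  function_insertion_hook_holder
    = symtab->add_cgraph_insertion_hook (&add_new_function, NULL);
  node_removal_hook_holder
    = symtab->add_cgraph_removal_hook (&remove_node_data, NULL);
  node_duplication_hook_holder
    = symtab->add_cgraph_duplication_hook (&duplicate_node_data, NULL);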

Also, instead of having a separate data structure like return_callees_map,
should we rather have a flag within cgraph_edge which marks that the
caller may return the value of the callee?

Thanks,
Prathamesh
>

> Honza
Prathamesh Kulkarni July 31, 2017, 6:23 p.m. UTC | #8
On 23 May 2017 at 19:10, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>

>>> * LTO and memory management

>>> This is a general question about LTO and memory management.

>>> IIUC the following sequence takes place during normal LTO:

>>> LGEN: generate_summary, write_summary

>>> WPA: read_summary, execute ipa passes, write_opt_summary

>>>

>>> So I assumed it was OK in LGEN to allocate return_callees_map in

>>> generate_summary and free it in write_summary and during WPA, allocate

>>> return_callees_map in read_summary and free it after execute (since

>>> write_opt_summary does not require return_callees_map).

>>>

>>> However with fat LTO, it seems the sequence changes for LGEN with

>>> execute phase takes place after write_summary. However since

>>> return_callees_map is freed in pure_const_write_summary and

>>> propagate_malloc() accesses it in execute stage, it results in

>>> segmentation fault.

>>>

>>> To work around this, I am using the following hack in pure_const_write_summary:

>>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>>> if (!global_options.x_flag_fat_lto_objects)

>>>   free_return_callees_map ();

>>> Is there a better approach for handling this ?

>>

>> I think most passes just do not free summaries with -flto.  We probably want

>> to fix it to make it possible to compile multiple units i.e. from plugin by

>> adding release_summaries method...

>> So I would say it is OK to do the same as others do and leak it with -flto.

>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>>> index e457166ea39..724c26e03f6 100644

>>> --- a/gcc/ipa-pure-const.c

>>> +++ b/gcc/ipa-pure-const.c

>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>>  #include "tree-scalar-evolution.h"

>>>  #include "intl.h"

>>>  #include "opts.h"

>>> +#include "ssa.h"

>>>

>>>  /* Lattice values for const and pure functions.  Everything starts out

>>>     being const, then may drop to pure and then neither depending on

>>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>>

>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>

>>> +enum malloc_state_e

>>> +{

>>> +  PURE_CONST_MALLOC_TOP,

>>> +  PURE_CONST_MALLOC,

>>> +  PURE_CONST_MALLOC_BOTTOM

>>> +};

>>

>> It took me a while to work out what PURE_CONST means here :)

>> I would just call it something like STATE_MALLOC_TOP... or so.

>> ipa_pure_const is outdated name from the time pass was doing only

>> those two.

>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>>

>>>  static vec<funct_state> funct_state_vec;

>>>

>>> +/* A map from node to subset of callees. The subset contains those callees

>>> + * whose return-value is returned by the node. */

>>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>>> +

>>

>> Hehe, a special case of return jump function.  We ought to support those more generally.

>> How do you keep it up to date over callgraph changes?

>>> @@ -921,6 +1055,23 @@ end:

>>>    if (TREE_NOTHROW (decl))

>>>      l->can_throw = false;

>>>

>>> +  if (ipa)

>>> +    {

>>> +      vec<cgraph_node *> v = vNULL;

>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>>> +      if (DECL_IS_MALLOC (decl))

>>> +     l->malloc_state = PURE_CONST_MALLOC;

>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>>> +     {

>>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>>> +       for (unsigned i = 0; i < v.length (); ++i)

>>> +         callees_p->safe_push (v[i]);

>>> +       return_callees_map->put (fn, callees_p);

>>> +     }

>>> +      v.release ();

>>> +    }

>>> +

>>

>> I would do non-ipa variant, too.  I think most attributes can be detected that way

>> as well.

>>

>> The patch generally makes sense to me.  It would be nice to make it easier to write such

>> a basic propagators across callgraph (perhaps adding a template doing the basic

>> propagation logic). Also I think you need to solve the problem with keeping your

>> summaries up to date across callgraph node removal and duplications.

> Thanks for the suggestions, I will try to address them in a follow-up patch.

> IIUC, I would need to modify ipa-pure-const cgraph hooks -

> add_new_function, remove_node_data, duplicate_node_data

> to keep return_callees_map up-to-date across callgraph node insertions

> and removal ?

>

> Also, if instead of having a separate data-structure like return_callees_map,

> should we rather have a flag within cgraph_edge, which marks that the

> caller may return the value of the callee ?

Hi,
Sorry for the very late response. I have attached an updated version
of the prototype patch, which adds a non-ipa variant and keeps
return_callees_map up-to-date across callgraph node insertions and
removals. For the non-ipa variant, malloc_candidate_p () additionally
checks that all the "return callees" have DECL_IS_MALLOC set to true.
Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.
Does it look OK so far?

Um, sorry for this silly question, but I don't really understand how
indirect call propagation works in ipa-pure-const. For example, consider
propagation of the nothrow attribute in the following test-case:

__attribute__((noinline, noclone, nothrow))
int f1(int k) { return k; }

__attribute__((noinline, noclone))
static int foo(int (*p)(int))
{
  return p(10);
}

__attribute__((noinline, noclone))
int bar(void)
{
  return foo(f1);
}

Shouldn't foo and bar also be marked as nothrow, since foo indirectly
calls f1 (which is nothrow) and bar only calls foo?
The local-pure-const2 dump shows "function is locally throwing" for
both "foo" and "bar".

Um, I was wondering how to get "points-to" analysis for function pointers,
to get the list of callees that may be indirectly called through a given
function pointer.
In the patch I just set the node to bottom if it contains indirect calls,
which is far from ideal :(
I would be grateful for suggestions on how to handle indirect calls.
Thanks!

Regards,
Prathamesh
>

> Thanks,

> Prathamesh

>>

>> Honza

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index d7c9ba61795..8ab701a5a10 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2518,6 +2518,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)
+{
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, changed);
+    }
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+    if (e->caller->thunk.thunk_p
+	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+      set_malloc_flag_1 (e->caller, malloc_p, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 57cdaa45681..3bcfdf5b893 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1120,6 +1120,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index dac8f0d5f21..70da28ac11e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +70,15 @@ enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  STATE_MALLOC_TOP,
+  STATE_MALLOC,
+  STATE_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +102,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -109,6 +121,46 @@ typedef struct funct_state_d * funct_state;
 
 static vec<funct_state> funct_state_vec;
 
+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
+
+static void
+return_callees_map_remove (cgraph_node *node)
+{
+  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+  if (callees_pp)
+    {
+      vec<cgraph_node *> *callees = *callees_pp;
+      callees->release ();
+      return_callees_map->remove (node);
+    }
+}
+
+static void
+return_callees_map_clone (cgraph_node *src, cgraph_node *dst)
+{
+  vec<cgraph_node *> **callees_pp = return_callees_map->get (src);
+  if (callees_pp)
+    {
+      vec<cgraph_node *> *src_callees = *callees_pp;
+      vec<cgraph_node *> *dst_callees = new vec<cgraph_node *> (vNULL);
+      for (unsigned i = 0; i < src_callees->length (); ++i)
+	dst_callees->safe_push ((*src_callees)[i]);
+      return_callees_map->put (dst, dst_callees);
+    }
+}
+
+static void
+return_callees_map_free (void)
+{
+  for (hash_map<cgraph_node *, vec<cgraph_node *>*>::iterator it = return_callees_map->begin ();
+       it != return_callees_map->end ();
+       ++it)
+    return_callees_map_remove ((*it).first);
+  delete return_callees_map;
+}
+
 static bool gate_pure_const (void);
 
 namespace {
@@ -812,6 +864,128 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	tree op2 = gimple_cond_rhs (cond);
+	if (!integer_zerop (op2))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+	if (!integer_zerop (gimple_assign_rhs2 (ga)))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (is_gimple_debug (use_stmt))
+      ;
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and element of phi is either NULL or
+ *    SSA_NAME_DEF_STMT(element) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */
+
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>* callees,
+		    bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+
+  if (EDGE_COUNT (exit_block->preds) == 0)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, exit_block->preds)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
+      greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+
+      if (!ret_stmt)
+	return false;
+
+      tree retval = gimple_return_retval (ret_stmt);
+      if (!retval)
+	return false;
+
+      if (TREE_CODE (retval) != SSA_NAME
+	  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+	return false;
+
+      if (!check_retval_uses (retval, ret_stmt))
+	return false;
+
+      gimple *def = SSA_NAME_DEF_STMT (retval);
+      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+	{
+	  tree callee_decl = gimple_call_fndecl (call_stmt);
+	  if (!callee_decl)
+	    return false;
+
+	  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+	  if (!ipa && !DECL_IS_MALLOC (callee->decl))
+	    return false;
+
+	  if (callees)
+	    callees->safe_push (callee);
+	}
+
+      else if (gphi *phi = dyn_cast<gphi *> (def))
+	for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg_def (phi, i);
+	    if (TREE_CODE (arg) != SSA_NAME)
+	      return false;
+	    if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
+	      return false;
+
+	    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+	    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+	    if (!call_stmt)
+	      return false;
+	    tree callee_decl = gimple_call_fndecl (call_stmt);
+	    if (!callee_decl)
+	      return false;
+	    if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	      return false;
+	    cgraph_node *callee = cgraph_node::get_create (callee_decl);
+
+	    if (callees)
+	      callees->safe_push (callee);
+	  }
+
+      else
+	return false;
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+}
+
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1095,25 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  l->malloc_state = STATE_MALLOC_BOTTOM;
+  if (DECL_IS_MALLOC (decl))
+    l->malloc_state = STATE_MALLOC;
+  else if (ipa)
+    {
+      vec<cgraph_node *> v = vNULL;
+      if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), &v, true))
+	{
+	  l->malloc_state = STATE_MALLOC_TOP;
+	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+	  for (unsigned i = 0; i < v.length (); ++i)
+	    callees_p->safe_push (v[i]);
+	  return_callees_map->put (fn, callees_p);
+	}
+      v.release ();
+    }
+  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), NULL, false))
+    l->malloc_state = STATE_MALLOC;
+
   pop_cfun ();
   if (dump_file)
     {
@@ -934,6 +1127,8 @@ end:
         fprintf (dump_file, "Function is locally pure.\n");
       if (l->can_free)
 	fprintf (dump_file, "Function can locally free.\n");
+      if (l->malloc_state == STATE_MALLOC)
+	fprintf (dump_file, "Function is locally malloc.\n");
     }
   return l;
 }
@@ -962,6 +1157,7 @@ duplicate_node_data (struct cgraph_node *src, struct cgraph_node *dst,
       gcc_assert (!has_function_state (dst));
       memcpy (l, get_function_state (src), sizeof (*l));
       set_function_state (dst, l);
+      return_callees_map_clone (src, dst);
     }
 }
 
@@ -971,7 +1167,10 @@ static void
 remove_node_data (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
 {
   if (has_function_state (node))
-    set_function_state (node, NULL);
+    {
+      set_function_state (node, NULL);
+      return_callees_map_remove (node);
+    }
 }
 
 
@@ -1003,6 +1202,7 @@ pure_const_generate_summary (void)
 
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *>*>;
 
   /* Process all of the functions.
 
@@ -1067,7 +1267,22 @@ pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (callees_pp)
+	    {
+	      vec<cgraph_node *> callees = **callees_pp;
+	      streamer_write_uhwi_stream (ob->main_stream, callees.length ());
+	      for (unsigned i = 0; i < callees.length (); ++i)
+		{
+		  int callee_ref = lto_symtab_encoder_encode (encoder, callees[i]);
+		  streamer_write_uhwi_stream (ob->main_stream, callee_ref);
+		}
+	    }
+	  else
+	    streamer_write_uhwi_stream (ob->main_stream, 0);
 	}
     }
 
@@ -1087,6 +1302,8 @@ pure_const_read_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *> *>;
+
   while ((file_data = file_data_vec[j++]))
     {
       const char *data;
@@ -1127,6 +1344,22 @@ pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
+	      unsigned len = streamer_read_uhwi (ib);
+	      if (len)
+		{
+		  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+		  for (unsigned i = 0; i < len; i++)
+		    {
+		      int callee_index = streamer_read_uhwi (ib);
+		      cgraph_node *callee = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, callee_index));
+		      callees_p->safe_push (callee);
+		    }
+		  return_callees_map->put (node, callees_p);
+		}
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1149,6 +1382,8 @@ pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1659,6 +1894,121 @@ propagate_nothrow (void)
   free (order);
 }
 
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = STATE_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == STATE_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (!callees_pp)
+	    continue;
+
+	  vec<cgraph_node *> callees = **callees_pp;
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = STATE_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == STATE_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+  return_callees_map_free ();
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1677,6 +2027,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
@@ -1877,6 +2228,17 @@ pass_local_pure_const::execute (function *fun)
 	fprintf (dump_file, "Function found to be nothrow: %s\n",
 		 current_function_name ());
     }
+
+  if (l->malloc_state == STATE_MALLOC
+      && !DECL_IS_MALLOC (current_function_decl))
+    {
+      node->set_malloc_flag (true);
+      changed = true;
+      if (dump_file)
+	fprintf (dump_file, "Function found to be malloc: %s\n",
+		 node->name ());
+    }
+
   free (l);
   if (changed)
     return execute_fixup_cfg ();
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */

Prathamesh Kulkarni Aug. 8, 2017, 4:20 a.m. UTC | #9
On 31 July 2017 at 23:53, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 23 May 2017 at 19:10, Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

>> On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>>

>>>> * LTO and memory management

>>>> This is a general question about LTO and memory management.

>>>> IIUC the following sequence takes place during normal LTO:

>>>> LGEN: generate_summary, write_summary

>>>> WPA: read_summary, execute ipa passes, write_opt_summary

>>>>

>>>> So I assumed it was OK in LGEN to allocate return_callees_map in

>>>> generate_summary and free it in write_summary and during WPA, allocate

>>>> return_callees_map in read_summary and free it after execute (since

>>>> write_opt_summary does not require return_callees_map).

>>>>

>>>> However with fat LTO, it seems the sequence changes for LGEN with

>>>> execute phase takes place after write_summary. However since

>>>> return_callees_map is freed in pure_const_write_summary and

>>>> propagate_malloc() accesses it in execute stage, it results in

>>>> segmentation fault.

>>>>

>>>> To work around this, I am using the following hack in pure_const_write_summary:

>>>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>>>> if (!global_options.x_flag_fat_lto_objects)

>>>>   free_return_callees_map ();

>>>> Is there a better approach for handling this ?

>>>

>>> I think most passes just do not free summaries with -flto.  We probably want

>>> to fix it to make it possible to compile multiple units i.e. from plugin by

>>> adding release_summaries method...

>>> So I would say it is OK to do the same as others do and leak it with -flto.

>>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>>>> index e457166ea39..724c26e03f6 100644

>>>> --- a/gcc/ipa-pure-const.c

>>>> +++ b/gcc/ipa-pure-const.c

>>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>>>  #include "tree-scalar-evolution.h"

>>>>  #include "intl.h"

>>>>  #include "opts.h"

>>>> +#include "ssa.h"

>>>>

>>>>  /* Lattice values for const and pure functions.  Everything starts out

>>>>     being const, then may drop to pure and then neither depending on

>>>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>>>

>>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>>

>>>> +enum malloc_state_e

>>>> +{

>>>> +  PURE_CONST_MALLOC_TOP,

>>>> +  PURE_CONST_MALLOC,

>>>> +  PURE_CONST_MALLOC_BOTTOM

>>>> +};

>>>

>>> It took me a while to work out what PURE_CONST means here :)

>>> I would just call it something like STATE_MALLOC_TOP... or so.

>>> ipa_pure_const is outdated name from the time pass was doing only

>>> those two.

>>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>>>

>>>>  static vec<funct_state> funct_state_vec;

>>>>

>>>> +/* A map from node to subset of callees. The subset contains those callees

>>>> + * whose return-value is returned by the node. */

>>>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>>>> +

>>>

>>> Hehe, a special case of return jump function.  We ought to support those more generally.

>>> How do you keep it up to date over callgraph changes?

>>>> @@ -921,6 +1055,23 @@ end:

>>>>    if (TREE_NOTHROW (decl))

>>>>      l->can_throw = false;

>>>>

>>>> +  if (ipa)

>>>> +    {

>>>> +      vec<cgraph_node *> v = vNULL;

>>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>>>> +      if (DECL_IS_MALLOC (decl))

>>>> +     l->malloc_state = PURE_CONST_MALLOC;

>>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>>>> +     {

>>>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>>>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>>>> +       for (unsigned i = 0; i < v.length (); ++i)

>>>> +         callees_p->safe_push (v[i]);

>>>> +       return_callees_map->put (fn, callees_p);

>>>> +     }

>>>> +      v.release ();

>>>> +    }

>>>> +

>>>

>>> I would do non-ipa variant, too.  I think most attributes can be detected that way

>>> as well.

>>>

>>> The patch generally makes sense to me.  It would be nice to make it easier to write such

>>> a basic propagators across callgraph (perhaps adding a template doing the basic

>>> propagation logic). Also I think you need to solve the problem with keeping your

>>> summaries up to date across callgraph node removal and duplications.

>> Thanks for the suggestions, I will try to address them in a follow-up patch.

>> IIUC, I would need to modify ipa-pure-const cgraph hooks -

>> add_new_function, remove_node_data, duplicate_node_data

>> to keep return_callees_map up-to-date across callgraph node insertions

>> and removal ?

>>

>> Also, if instead of having a separate data-structure like return_callees_map,

>> should we rather have a flag within cgraph_edge, which marks that the

>> caller may return the value of the callee ?

> Hi,

> Sorry for the very late response. I have attached an updated version

> of the prototype patch,

> which adds a non-ipa variant, and keeps return_callees_map up-to-date

> across callgraph

> node insertions and removal. For the non-ipa variant,

> malloc_candidate_p() additionally checks

> that all the "return callees" have DECL_IS_MALLOC set to true.

> Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.

> Does it look OK so far ?

>

> Um sorry for this silly question, but I don't really understand how

> does indirect call propagation

> work in ipa-pure-const ? For example consider propagation of nothrow

> attribute in following

> test-case:

>

> __attribute__((noinline, noclone, nothrow))

> int f1(int k) { return k; }

>

> __attribute__((noinline, noclone))

> static int foo(int (*p)(int))

> {

>   return p(10);

> }

>

> __attribute__((noinline, noclone))

> int bar(void)

> {

>   return foo(f1);

> }

>

> Shouldn't foo and bar be also marked as nothrow ?

> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

> The local-pure-const2 dump shows function is locally throwing  for

> "foo" and "bar".

>

> Um, I was wondering how to get "points-to" analysis for function-pointers,

> to get list of callees that may be indirectly called from that

> function pointer ?

> In the patch I just set node to bottom if it contains indirect calls

> which is far from ideal :(

> I would be much grateful for suggestions on how to handle indirect calls.

> Thanks!

ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
>

> Regards,

> Prathamesh

>>

>> Thanks,

>> Prathamesh

>>>

>>> Honza
Prathamesh Kulkarni Aug. 17, 2017, 12:32 p.m. UTC | #10
On 8 August 2017 at 09:50, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
Prathamesh Kulkarni Sept. 1, 2017, 2:39 a.m. UTC | #11
On 17 August 2017 at 18:02, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

ping * 3 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
Prathamesh Kulkarni Sept. 15, 2017, 12:19 p.m. UTC | #12
On 1 September 2017 at 08:09, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>>> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

>> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

> ping * 3 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

ping * 4 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
Prathamesh Kulkarni Sept. 25, 2017, 6:13 p.m. UTC | #13
On 15 September 2017 at 17:49, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>>>> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

>>> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

>> ping * 3 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

> ping * 4 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Hi Honza,
Could you please have a look at this patch ?
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

I tested it with SPEC2006 on an AArch64 Cortex-A57 processor and saw some
improvement for 433.milc (+1.79%), 437.leslie3d (+2.84%) and 470.lbm (+4%),
and not much difference for the other benchmarks.
I don't expect the numbers to be precise though, since this was run with only
one iteration of SPEC.
Thanks!

Regards,
Prathamesh
Jan Hubicka Sept. 26, 2017, 12:24 a.m. UTC | #14
> Hi Honza,

> Could you please have a look at this patch ?

> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html


I can, and I should have done so a long time ago. I really apologize for the slow
response and will try to be more timely from now on. The reason was that I had
some patches that I was hoping to push out first, but since they are still not
ready, I guess it is better to go the other way around.

+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

Extra * at the beginning of the line.  It would make more sense to put those
and the other bits into function_summary rather than using the hooks,
but that is something we can do incrementally.

I wonder what happens here when, say, ipa-icf redirects the call to an equivalent
function and removes the callee?  Perhaps we really want to have a set of call
sites rather than nodes stored from analysis to execution. Call sites have
unique stmts and uids, so it will be possible to map them back and forth.

+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{

There is a missing top-level comment on those.

+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and element of phi is either NULL or
+ *    SSA_NAME_DEF_STMT(element) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */

Again, the * at the beginning of lines. Theoretically, by the coding standards the
comment should start with a description of what the function does and what the
parameters are. I believe Richi already commented on this part - which is more of
his domain, but it seems fine to me.

Perhaps with -details it would be nice to dump the reason why the malloc
candidate was rejected.
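For illustration (a made-up example, not from the patch or the testsuite), a
wrapper like the following is rejected only because the returned pointer has a
use other than the return statement or a comparison against 0, and a short
message in the dump would make that easy to see:

void *wrap_zeroed (__SIZE_TYPE__ n)
{
  void *p = __builtin_malloc (n);
  __builtin_memset (p, 0, n);  /* extra use of p, so check_retval_uses fails */
  return p;
}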

+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)

+static void
+propagate_malloc (void)

For coding standards, please add block comments.

With these changes the patch looks good to me!
Honza

Prathamesh Kulkarni Sept. 27, 2017, 1:11 a.m. UTC | #15
On 25 September 2017 at 17:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi Honza,

>> Could you please have a look at this patch ?

>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

>

> I can, and I should have done so a long time ago. I really apologize for the slow

> response and will try to be more timely from now on. The reason was that I had

> some patches that I was hoping to push out first, but since they are still not

> ready, I guess it is better to go the other way around.

No worries, and thanks for the feedback!
>

> +/* A map from node to subset of callees. The subset contains those callees

> + * whose return-value is returned by the node. */

> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>

> Extra * at the beginning of the line.  It would make more sense to put those

> and the other bits into function_summary rather than using the hooks,

> but that is something we can do incrementally.

>

> I wonder what happens here when, say, ipa-icf redirects the call to an equivalent

> function and removes the callee?  Perhaps we really want to have a set of call

> sites rather than nodes stored from analysis to execution. Call sites have

> unique stmts and uids, so it will be possible to map them back and forth.

IIUC, a call site is represented using cgraph_edge ?
So change return_callees_map to be the mapping from a node to the subset of
its call-sites where the node returns the value of one of its callees:
static hash_map< cgraph_node *, vec<cgraph_edge *> *> *return_callees_map; ?

Thanks for the suggestions, I will try to address them in the next
version of the patch.

Regards,
Prathamesh
Jan Hubicka Sept. 29, 2017, 7:28 p.m. UTC | #16
> > I wonder what happens here when, say, ipa-icf redirects the call to an equivalent

> > function and removes the callee?  Perhaps we really want to have a set of call

> > sites rather than nodes stored from analysis to execution. Call sites have

> > unique stmts and uids, so it will be possible to map them back and forth.

> IIUC, a call site is represented using cgraph_edge ?


Yes, there is a call_stmt pointer when gimple is read, and a uid in WPA mode, which
are unique.  There is a call_summary class which lets you associate info with
a call site. While it is not the most memory-efficient way to store one bit of
information, I guess it may be easiest to use it in anticipation of it becoming
more useful in the foreseeable future.  We have some bits in the call edge itself
that may be shuffled to the summary as well.

> So change return_callees_map to be the mapping from a node to the subset of

> its call-sites where the node returns the value of one of its callees:

> static hash_map< cgraph_node *, vec<cgraph_edge *> *> *return_callees_map; ?


This should work too, though storing direct pointers to edges is something we
probably want to avoid, to keep the memory representation of edges in bounds.

I would go with call_summary - everything else seems like a bit of premature
optimization.  If we decide to optimize it later, we can invent a variant of
the summary datatype for that.
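
Concretely, the analysis side would then mark a call site along these lines
(sketch only; "node" is the function being analysed, call_stmt the call whose
result is returned, and is_return_callee a hypothetical bit in the per-edge
summary):

  cgraph_edge *cs = node->get_edge (call_stmt);
  if (cs)
    ipa_call_summaries->get (cs)->is_return_callee = true;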

Thanks,
Honza
Prathamesh Kulkarni Oct. 6, 2017, 2:16 a.m. UTC | #17
On 29 September 2017 at 12:28, Jan Hubicka <hubicka@ucw.cz> wrote:

Hi Honza,
Thanks for the detailed suggestions, I have updated the patch accordingly.
I have following questions on call_summary:
1] I added a field bool is_return_callee in ipa_call_summary to track
whether the caller possibly returns the value returned by the callee, which
gets rid of return_callees_map. I assume ipa_call_summary_t::remove()
and ipa_call_summary_t::duplicate() will already take care of handling
late insertion/removal of cgraph nodes ? I just initialized
is_return_callee to false in ipa_call_summary::reset and that seems to
work. I am not sure though if I have handled it correctly. Could you
please check that ? (A small example of the intended semantics follows
below, after 2].)

2] ipa_inline() called ipa_free_fn_summary, which made
ipa_call_summaries unavailable during the ipa-pure-const pass. I removed
the call to ipa_free_fn_summary from ipa_inline, and moved it to
ipa_pure_const::execute(). Is that OK ?
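
To illustrate what the new is_return_callee bit from 1] is meant to capture
(a hypothetical example, not one of the new testcases): during the IPA analysis
the call edge bar -> foo below gets is_return_callee set, because bar returns
foo's value unchanged, while baz is not even a malloc candidate, since the
value it returns is not the result of the call itself:

extern void *foo (__SIZE_TYPE__ n);

void *bar (__SIZE_TYPE__ n)
{
  return foo (n);   /* bar returns foo's result: edge is marked.  */
}

char *baz (__SIZE_TYPE__ n)
{
  char *p = (char *) foo (n);
  return p + 1;     /* returned value is not the call's result: rejected.  */
}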

Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.
Verified that SPEC2k6 compiles and runs without miscompares with LTO
enabled on aarch64-linux-gnu.
Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test
the patch by building chromium or firefox.
Would it be OK to commit if it passes the above validations ?

Thanks,
Prathamesh
2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* cgraph.h (set_malloc_flag): Declare.
	* cgraph.c (set_malloc_flag_1): New function.
	(set_malloc_flag): Likewise.
	* ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.
	* ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to
	false.
	(read_ipa_call_summary): Add support for reading is_return_callee.
	(write_ipa_call_summary): Stream is_return_callee.
	* ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
	* ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,
	ipa-prop.h, ipa-fnsummary.h.
	(malloc_state_e): Define.
	(malloc_state_names): Define.
	(funct_state_d): Add field malloc_state.
	(varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
	(check_retval_uses): New function.
	(malloc_candidate_p): Likewise.
	(analyze_function): Add support for malloc attribute.
	(pure_const_write_summary): Stream malloc_state.
	(pure_const_read_summary): Add support for reading malloc_state.
	(dump_malloc_lattice): New function.
	(propagate_malloc): New function.
	(ipa_pure_const::execute): Call propagate_malloc and
	ipa_free_fn_summary.
	(pass_local_pure_const::execute): Add support for malloc attribute.
	* ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

testsuite/
	* gcc.dg/ipa/propmalloc-1.c: New test-case.
	* gcc.dg/ipa/propmalloc-2.c: Likewise.
	* gcc.dg/ipa/propmalloc-3.c: Likewise.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 3d0cefbd46b..0aad90d59ea 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)
+{
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, changed);
+    }
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+    if (e->caller->thunk.thunk_p
+	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+      set_malloc_flag_1 (e->caller, malloc_p, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 1758e8b08c1..84824e9f814 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1151,6 +1151,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 076ccd40bd7..055345c2cdf 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -542,6 +542,7 @@ void
 ipa_call_summary::reset ()
 {
   call_stmt_size = call_stmt_time = 0;
+  is_return_callee = false;
   if (predicate)
     edge_predicate_pool.remove (predicate);
   predicate = NULL;
@@ -3204,6 +3205,10 @@ read_ipa_call_summary (struct lto_input_block *ib, struct cgraph_edge *e)
   es->call_stmt_size = streamer_read_uhwi (ib);
   es->call_stmt_time = streamer_read_uhwi (ib);
   es->loop_depth = streamer_read_uhwi (ib);
+
+  bitpack_d bp = streamer_read_bitpack (ib);
+  es->is_return_callee = bp_unpack_value (&bp, 1);
+
   p.stream_in (ib);
   edge_set_predicate (e, &p);
   length = streamer_read_uhwi (ib);
@@ -3360,6 +3365,11 @@ write_ipa_call_summary (struct output_block *ob, struct cgraph_edge *e)
   streamer_write_uhwi (ob, es->call_stmt_size);
   streamer_write_uhwi (ob, es->call_stmt_time);
   streamer_write_uhwi (ob, es->loop_depth);
+
+  bitpack_d bp = bitpack_create (ob->main_stream);
+  bp_pack_value (&bp, es->is_return_callee, 1);
+  streamer_write_bitpack (&bp);
+
   if (es->predicate)
     es->predicate->stream_out (ob);
   else
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index f50d6806e61..613a2b6f625 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -197,7 +197,9 @@ struct ipa_call_summary
   int call_stmt_time;
   /* Depth of loop nest, 0 means no nesting.  */
   unsigned int loop_depth;
-  
+  /* Indicates whether the caller returns the value of it's callee.  */
+  bool is_return_callee;
+
   /* Keep all field empty so summary dumping works during its computation.
      This is useful for debugging.  */
   ipa_call_summary ()
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index dd46cb61362..b8e65e2fa7e 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2526,9 +2526,6 @@ ipa_inline (void)
 
   if (dump_file)
     ipa_dump_fn_summaries (dump_file);
-  /* In WPA we use inline summaries for partitioning process.  */
-  if (!flag_wpa)
-    ipa_free_fn_summary ();
   return remove_functions ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index dac8f0d5f21..aca1f86bf5b 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +74,15 @@ enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  STATE_MALLOC_TOP,
+  STATE_MALLOC,
+  STATE_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +106,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -812,6 +828,149 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+/* Check that RETVAL is used only in STMT and in comparisons against 0.
+   RETVAL is return value of the function and STMT is return stmt.  */
+
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	tree op2 = gimple_cond_rhs (cond);
+	if (!integer_zerop (op2))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+	if (!integer_zerop (gimple_assign_rhs2 (ga)))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (is_gimple_debug (use_stmt))
+      ;
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/* malloc_candidate_p() checks if FUN can possibly be annotated with malloc
+   attribute. Currently this function does a very conservative analysis.
+   FUN is considered to be a candidate if
+   1) It returns a value of pointer type.
+   2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+      a phi, and element of phi is either NULL or
+      SSA_NAME_DEF_STMT(element) is function call.
+   3) The return-value has immediate uses only within comparisons (gcond or gassign)
+      and return_stmt (and likewise a phi arg has immediate use only within comparison
+      or the phi stmt).  */
+
+static bool
+malloc_candidate_p (function *fun, bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+  cgraph_node *node = cgraph_node::get_create (fun->decl);
+
+#define DUMP_AND_RETURN(reason)  \
+{  \
+  if (dump_file && (dump_flags & TDF_DETAILS))  \
+    fprintf (dump_file, "%s", (reason));  \
+  return false;  \
+}
+
+  if (EDGE_COUNT (exit_block->preds) == 0)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, exit_block->preds)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
+      greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+
+      if (!ret_stmt)
+	return false;
+
+      tree retval = gimple_return_retval (ret_stmt);
+      if (!retval)
+	DUMP_AND_RETURN("No return value.")
+
+      if (TREE_CODE (retval) != SSA_NAME
+	  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+	DUMP_AND_RETURN("Return value is not SSA_NAME or not a pointer type.")
+
+      if (!check_retval_uses (retval, ret_stmt))
+	DUMP_AND_RETURN("Return value has uses outside return stmt"
+			" and comparisons against 0.")
+
+      gimple *def = SSA_NAME_DEF_STMT (retval);
+      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+	{
+	  tree callee_decl = gimple_call_fndecl (call_stmt);
+	  if (!callee_decl)
+	    return false;
+
+	  if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	    DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			    " non-ipa mode.")
+
+	  cgraph_edge *cs = node->get_edge (call_stmt);
+	  if (cs)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      gcc_assert (es);
+	      es->is_return_callee = true;
+	    }
+	}
+
+      else if (gphi *phi = dyn_cast<gphi *> (def))
+	for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg_def (phi, i);
+	    if (TREE_CODE (arg) != SSA_NAME)
+	      DUMP_AND_RETURN("phi arg is not SSA_NAME.")
+	    if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
+	      DUMP_AND_RETURN("phi arg has uses outside phi"
+			      " and comparisons against 0.")
+
+	    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+	    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+	    if (!call_stmt)
+	      return false;
+	    tree callee_decl = gimple_call_fndecl (call_stmt);
+	    if (!callee_decl)
+	      return false;
+	    if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	      DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			      " non-ipa mode.")
+
+	    cgraph_edge *cs = node->get_edge (call_stmt);
+	    if (cs)
+	      {
+		ipa_call_summary *es = ipa_call_summaries->get (cs);
+		gcc_assert (es);
+		es->is_return_callee = true;
+	      }
+	  }
+
+      else
+	DUMP_AND_RETURN("def_stmt of return value is not a call or phi-stmt.")
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+
+#undef DUMP_AND_RETURN
+}
+
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1080,14 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  l->malloc_state = STATE_MALLOC_BOTTOM;
+  if (DECL_IS_MALLOC (decl))
+    l->malloc_state = STATE_MALLOC;
+  else if (ipa && malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), true))
+    l->malloc_state = STATE_MALLOC_TOP;
+  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), false))
+    l->malloc_state = STATE_MALLOC;
+
   pop_cfun ();
   if (dump_file)
     {
@@ -934,6 +1101,8 @@ end:
         fprintf (dump_file, "Function is locally pure.\n");
       if (l->can_free)
 	fprintf (dump_file, "Function can locally free.\n");
+      if (l->malloc_state == STATE_MALLOC)
+	fprintf (dump_file, "Function is locally malloc.\n");
     }
   return l;
 }
@@ -1067,6 +1236,7 @@ pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
 	}
     }
@@ -1127,6 +1297,9 @@ pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1149,6 +1322,8 @@ pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1659,6 +1834,127 @@ propagate_nothrow (void)
   free (order);
 }
 
+/* Debugging function to dump state of malloc lattice.  */
+
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+/* Propagate malloc attribute across the callgraph.  */
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = STATE_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == STATE_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> callees = vNULL;
+	  for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      if (es && es->is_return_callee)
+		callees.safe_push (cs->callee);
+	    }
+
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = STATE_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == STATE_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1677,6 +1973,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
@@ -1684,6 +1981,10 @@ execute (function *)
     if (has_function_state (node))
       free (get_function_state (node));
   funct_state_vec.release ();
+
+  /* In WPA we use inline summaries for partitioning process.  */
+  if (!flag_wpa)
+    ipa_free_fn_summary ();
   return remove_p ? TODO_remove_functions : 0;
 }
 
@@ -1877,6 +2178,17 @@ pass_local_pure_const::execute (function *fun)
 	fprintf (dump_file, "Function found to be nothrow: %s\n",
 		 current_function_name ());
     }
+
+  if (l->malloc_state == STATE_MALLOC
+      && !DECL_IS_MALLOC (current_function_decl))
+    {
+      node->set_malloc_flag (true);
+      changed = true;
+      if (dump_file)
+	fprintf (dump_file, "Function found to be malloc: %s\n",
+		 node->name ());
+    }
+
   free (l);
   if (changed)
     return execute_fixup_cfg ();
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
Jan Hubicka Oct. 6, 2017, 1:04 p.m. UTC | #18
> Hi Honza,

> Thanks for the detailed suggestions, I have updated the patch accordingly.

> I have following questions on call_summary:

> 1] I added field bool is_return_callee in ipa_call_summary to track

> whether the caller possibly returns value returned by callee, which

> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

> and ipa_call_summary_t::duplicate() will already take care of handling

> late insertion/removal of cgraph nodes ? I just initialized

> is_return_callee to false in ipa_call_summary::reset and that seems to

> work. I am not sure though if I have handled it correctly. Could you

> please check that ?


I was actually thinking of introducing a separate summary for the
ipa-pure-const pass, but this seems fine to me too (for one bit it is
definitely more efficient).  ipa_call_summary_t::duplicate copies all the
fields, so indeed you should be safe here.

Also it is possible for functions to be inserted late.  Updating of call
summaries is currently handled by ipa_fn_summary_t::insert.
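
To illustrate (a standalone sketch with simplified stand-in types, not the
real GCC summary classes or hook signatures): a plain member-wise copy in the
duplicate hook is all that is needed for the new per-edge bit to survive edge
cloning.

struct call_summary_sketch
{
  int call_stmt_size;
  int call_stmt_time;
  unsigned int loop_depth;
  int is_return_callee;		/* the bit added by the patch */
};

/* Conceptually what the duplicate hook does for a cloned edge: copy
   every field, so the new flag needs no extra handling.  */
static void
duplicate_sketch (const struct call_summary_sketch *src,
		  struct call_summary_sketch *dst)
{
  *dst = *src;
}
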
> 

> 2] ipa_inline() called ipa_free_fn_summary, which made

> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

> call to ipa_free_fn_summary from ipa_inline, and moved it to

> ipa_pure_const::execute(). Is that OK ?


Seems OK to me.
> 

> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

> enabled on aarch64-linux-gnu.

> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

> the patch by building chromium or firefox.

> Would it be OK to commit if it passes above validations ?

> 

> Thanks,

> Prathamesh

> >

> > Thanks,

> > Honza


> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> 

> 	* cgraph.h (set_malloc_flag): Declare.

> 	* cgraph.c (set_malloc_flag_1): New function.

> 	(set_malloc_flag): Likewise.

> 	* ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> 	* ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> 	false.

> 	(read_ipa_call_summary): Add support for reading is_return_callee.

> 	(write_ipa_call_summary): Stream is_return_callee.

> 	* ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> 	* ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> 	ipa-prop.h, ipa-fnsummary.h.

> 	(malloc_state_e): Define.

> 	(malloc_state_names): Define.

> 	(funct_state_d): Add field malloc_state.

> 	(varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> 	(check_retval_uses): New function.

> 	(malloc_candidate_p): Likewise.

> 	(analyze_function): Add support for malloc attribute.

> 	(pure_const_write_summary): Stream malloc_state.

> 	(pure_const_read_summary): Add support for reading malloc_state.

> 	(dump_malloc_lattice): New function.

> 	(propagate_malloc): New function.

> 	(ipa_pure_const::execute): Call propagate_malloc and

> 	ipa_free_fn_summary.

> 	(pass_local_pure_const::execute): Add support for malloc attribute.

> 	* ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> 

> testsuite/

> 	* gcc.dg/ipa/propmalloc-1.c: New test-case.

> 	* gcc.dg/ipa/propmalloc-2.c: Likewise.

> 	* gcc.dg/ipa/propmalloc-3.c: Likewise.

> 

> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

> index 3d0cefbd46b..0aad90d59ea 100644

> --- a/gcc/cgraph.c

> +++ b/gcc/cgraph.c

> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

>    return changed;

>  }

>  

> +/* Worker to set malloc flag.  */

New line here I guess (it is below)
> +static void

> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

> +{

> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

> +    {

> +      DECL_IS_MALLOC (node->decl) = true;

> +      *changed = true;

> +    }

> +

> +  ipa_ref *ref;

> +  FOR_EACH_ALIAS (node, ref)

> +    {

> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

> +	set_malloc_flag_1 (alias, malloc_p, changed);

> +    }

> +

> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

> +    if (e->caller->thunk.thunk_p

> +	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

> +}

> +

> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

> +

> +bool

> +cgraph_node::set_malloc_flag (bool malloc_p)

> +{

> +  bool changed = false;

> +

> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

> +    set_malloc_flag_1 (this, malloc_p, &changed);

> +  else

> +    {

> +      ipa_ref *ref;

> +

> +      FOR_EACH_ALIAS (this, ref)

> +	{

> +	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

> +	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

> +	    set_malloc_flag_1 (alias, malloc_p, &changed);

> +	}

> +    }

> +  return changed;

> +}

> +

> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

> index f50d6806e61..613a2b6f625 100644

> --- a/gcc/ipa-fnsummary.h

> +++ b/gcc/ipa-fnsummary.h

> @@ -197,7 +197,9 @@ struct ipa_call_summary

>    int call_stmt_time;

>    /* Depth of loop nest, 0 means no nesting.  */

>    unsigned int loop_depth;

> -  

> +  /* Indicates whether the caller returns the value of it's callee.  */

Perhaps add a comment when it is initialized.
Don't you also check that the callee is not captured? In that case I would
name it is_return_callee_uncaptured or so, so someone does not try to use it
with a different meaning.
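
For instance, wording along these lines (only a sketch, based on what the
posted patch initializes in reset () and sets during local analysis) would
cover both points:

  /* True if the value returned by this call may be returned by the
     caller and is not otherwise captured by the caller (its only other
     uses are comparisons against 0).  Set by ipa-pure-const local
     analysis; initialized to false in reset ().  */
  bool is_return_callee_uncaptured;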

> @@ -69,6 +74,15 @@ enum pure_const_state_e

>  

>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>  

> +enum malloc_state_e

> +{

> +  STATE_MALLOC_TOP,

> +  STATE_MALLOC,

> +  STATE_MALLOC_BOTTOM

> +};

> +

> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};


Seems static is missing in all those declarations, please fix it.

OK with these changes. Thanks!

Honza
Prathamesh Kulkarni Oct. 6, 2017, 11:16 p.m. UTC | #19
On 6 October 2017 at 06:04, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi Honza,

>> Thanks for the detailed suggestions, I have updated the patch accordingly.

>> I have following questions on call_summary:

>> 1] I added field bool is_return_callee in ipa_call_summary to track

>> whether the caller possibly returns value returned by callee, which

>> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

>> and ipa_call_summary_t::duplicate() will already take care of handling

>> late insertion/removal of cgraph nodes ? I just initialized

>> is_return_callee to false in ipa_call_summary::reset and that seems to

>> work. I am not sure though if I have handled it correctly. Could you

>> please check that ?

>

> I was actually thinking to introduce separate summary for ipa-pure-const pass,

> but this seems fine to me too (for one bit definitly more effecient)

> ipa_call_summary_t::duplicate copies all the fields, so indeed you should be

> safe here.

>

> Also it is possible for functions to be inserted late.  Updating of call summaries

> is currently handled by ipa_fn_summary_t::insert

>>

>> 2] ipa_inline() called ipa_free_fn_summary, which made

>> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

>> call to ipa_free_fn_summary from ipa_inline, and moved it to

>> ipa_pure_const::execute(). Is that OK ?

>

> Seems OK to me.

>>

>> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

>> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

>> enabled on aarch64-linux-gnu.

>> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

>> the patch by building chromium or firefox.

>> Would it be OK to commit if it passes above validations ?

>>

>> Thanks,

>> Prathamesh

>> >

>> > Thanks,

>> > Honza

>

>> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>

>>       * cgraph.h (set_malloc_flag): Declare.

>>       * cgraph.c (set_malloc_flag_1): New function.

>>       (set_malloc_flag): Likewise.

>>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>>       false.

>>       (read_ipa_call_summary): Add support for reading is_return_callee.

>>       (write_ipa_call_summary): Stream is_return_callee.

>>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>>       ipa-prop.h, ipa-fnsummary.h.

>>       (malloc_state_e): Define.

>>       (malloc_state_names): Define.

>>       (funct_state_d): Add field malloc_state.

>>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>>       (check_retval_uses): New function.

>>       (malloc_candidate_p): Likewise.

>>       (analyze_function): Add support for malloc attribute.

>>       (pure_const_write_summary): Stream malloc_state.

>>       (pure_const_read_summary): Add support for reading malloc_state.

>>       (dump_malloc_lattice): New function.

>>       (propagate_malloc): New function.

>>       (ipa_pure_const::execute): Call propagate_malloc and

>>       ipa_free_fn_summary.

>>       (pass_local_pure_const::execute): Add support for malloc attribute.

>>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>>

>> testsuite/

>>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>>

>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

>> index 3d0cefbd46b..0aad90d59ea 100644

>> --- a/gcc/cgraph.c

>> +++ b/gcc/cgraph.c

>> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

>>    return changed;

>>  }

>>

>> +/* Worker to set malloc flag.  */

> New line here I guess (it is below)

>> +static void

>> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

>> +{

>> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

>> +    {

>> +      DECL_IS_MALLOC (node->decl) = true;

>> +      *changed = true;

>> +    }

>> +

>> +  ipa_ref *ref;

>> +  FOR_EACH_ALIAS (node, ref)

>> +    {

>> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>> +     set_malloc_flag_1 (alias, malloc_p, changed);

>> +    }

>> +

>> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

>> +    if (e->caller->thunk.thunk_p

>> +     && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

>> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

>> +}

>> +

>> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

>> +

>> +bool

>> +cgraph_node::set_malloc_flag (bool malloc_p)

>> +{

>> +  bool changed = false;

>> +

>> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

>> +    set_malloc_flag_1 (this, malloc_p, &changed);

>> +  else

>> +    {

>> +      ipa_ref *ref;

>> +

>> +      FOR_EACH_ALIAS (this, ref)

>> +     {

>> +       cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>> +       if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>> +         set_malloc_flag_1 (alias, malloc_p, &changed);

>> +     }

>> +    }

>> +  return changed;

>> +}

>> +

>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

>> index f50d6806e61..613a2b6f625 100644

>> --- a/gcc/ipa-fnsummary.h

>> +++ b/gcc/ipa-fnsummary.h

>> @@ -197,7 +197,9 @@ struct ipa_call_summary

>>    int call_stmt_time;

>>    /* Depth of loop nest, 0 means no nesting.  */

>>    unsigned int loop_depth;

>> -

>> +  /* Indicates whether the caller returns the value of it's callee.  */

> Perhaps add a comment when it is initialized.

> Don't you also check that the calle is not captured? In that case I would

> is_return_callee_uncaptured or so, so someone does not try to use it with

> different meaning.

Sorry, I didn't understand "Don't you also check that callee is not captured ?"
What does capturing mean in this context ?

Thanks,
Prathamesh
>

>> @@ -69,6 +74,15 @@ enum pure_const_state_e

>>

>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>

>> +enum malloc_state_e

>> +{

>> +  STATE_MALLOC_TOP,

>> +  STATE_MALLOC,

>> +  STATE_MALLOC_BOTTOM

>> +};

>> +

>> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};

>

> Seems static is missing in all those declarations, please fix it.

>

> OK with these changes. Thanks!

>

> Honza
Jan Hubicka Oct. 7, 2017, 6:23 p.m. UTC | #20
> On 6 October 2017 at 06:04, Jan Hubicka <hubicka@ucw.cz> wrote:

> >> Hi Honza,

> >> Thanks for the detailed suggestions, I have updated the patch accordingly.

> >> I have following questions on call_summary:

> >> 1] I added field bool is_return_callee in ipa_call_summary to track

> >> whether the caller possibly returns value returned by callee, which

> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

> >> and ipa_call_summary_t::duplicate() will already take care of handling

> >> late insertion/removal of cgraph nodes ? I just initialized

> >> is_return_callee to false in ipa_call_summary::reset and that seems to

> >> work. I am not sure though if I have handled it correctly. Could you

> >> please check that ?

> >

> > I was actually thinking to introduce separate summary for ipa-pure-const pass,

> > but this seems fine to me too (for one bit definitly more effecient)

> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be

> > safe here.

> >

> > Also it is possible for functions to be inserted late.  Updating of call summaries

> > is currently handled by ipa_fn_summary_t::insert

> >>

> >> 2] ipa_inline() called ipa_free_fn_summary, which made

> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

> >> call to ipa_free_fn_summary from ipa_inline, and moved it to

> >> ipa_pure_const::execute(). Is that OK ?

> >

> > Seems OK to me.

> >>

> >> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

> >> enabled on aarch64-linux-gnu.

> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

> >> the patch by building chromium or firefox.

> >> Would it be OK to commit if it passes above validations ?

> >>

> >> Thanks,

> >> Prathamesh

> >> >

> >> > Thanks,

> >> > Honza

> >

> >> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> >>

> >>       * cgraph.h (set_malloc_flag): Declare.

> >>       * cgraph.c (set_malloc_flag_1): New function.

> >>       (set_malloc_flag): Likewise.

> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> >>       false.

> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

> >>       (write_ipa_call_summary): Stream is_return_callee.

> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> >>       ipa-prop.h, ipa-fnsummary.h.

> >>       (malloc_state_e): Define.

> >>       (malloc_state_names): Define.

> >>       (funct_state_d): Add field malloc_state.

> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> >>       (check_retval_uses): New function.

> >>       (malloc_candidate_p): Likewise.

> >>       (analyze_function): Add support for malloc attribute.

> >>       (pure_const_write_summary): Stream malloc_state.

> >>       (pure_const_read_summary): Add support for reading malloc_state.

> >>       (dump_malloc_lattice): New function.

> >>       (propagate_malloc): New function.

> >>       (ipa_pure_const::execute): Call propagate_malloc and

> >>       ipa_free_fn_summary.

> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> >>

> >> testsuite/

> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

> >>

> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

> >> index 3d0cefbd46b..0aad90d59ea 100644

> >> --- a/gcc/cgraph.c

> >> +++ b/gcc/cgraph.c

> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

> >>    return changed;

> >>  }

> >>

> >> +/* Worker to set malloc flag.  */

> > New line here I guess (it is below)

> >> +static void

> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

> >> +{

> >> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

> >> +    {

> >> +      DECL_IS_MALLOC (node->decl) = true;

> >> +      *changed = true;

> >> +    }

> >> +

> >> +  ipa_ref *ref;

> >> +  FOR_EACH_ALIAS (node, ref)

> >> +    {

> >> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

> >> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

> >> +     set_malloc_flag_1 (alias, malloc_p, changed);

> >> +    }

> >> +

> >> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

> >> +    if (e->caller->thunk.thunk_p

> >> +     && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

> >> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

> >> +}

> >> +

> >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

> >> +

> >> +bool

> >> +cgraph_node::set_malloc_flag (bool malloc_p)

> >> +{

> >> +  bool changed = false;

> >> +

> >> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

> >> +    set_malloc_flag_1 (this, malloc_p, &changed);

> >> +  else

> >> +    {

> >> +      ipa_ref *ref;

> >> +

> >> +      FOR_EACH_ALIAS (this, ref)

> >> +     {

> >> +       cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

> >> +       if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

> >> +         set_malloc_flag_1 (alias, malloc_p, &changed);

> >> +     }

> >> +    }

> >> +  return changed;

> >> +}

> >> +

> >> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

> >> index f50d6806e61..613a2b6f625 100644

> >> --- a/gcc/ipa-fnsummary.h

> >> +++ b/gcc/ipa-fnsummary.h

> >> @@ -197,7 +197,9 @@ struct ipa_call_summary

> >>    int call_stmt_time;

> >>    /* Depth of loop nest, 0 means no nesting.  */

> >>    unsigned int loop_depth;

> >> -

> >> +  /* Indicates whether the caller returns the value of it's callee.  */

> > Perhaps add a comment when it is initialized.

> > Don't you also check that the calle is not captured? In that case I would

> > is_return_callee_uncaptured or so, so someone does not try to use it with

> > different meaning.

> Sorry, I didn't understand "Don't you also check that callee is not captured ?"

> What does capturing mean in this context ?


Don't you need to know that the returned pointer is new and does not alias
with something else?
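
E.g. a wrapper like the following (made up for illustration, it is not in the
patch or testsuite) must not become malloc, because the returned pointer
escapes through the global and therefore aliases another valid pointer:

void *last_allocation;		/* hypothetical global */

void *
leaky_wrapper (__SIZE_TYPE__ n)
{
  void *p = __builtin_malloc (n);
  last_allocation = p;		/* the result is captured here */
  return p;			/* so the malloc attribute would be wrong */
}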

Honza
> 

> Thanks,

> Prathamesh

> >

> >> @@ -69,6 +74,15 @@ enum pure_const_state_e

> >>

> >>  const char *pure_const_names[3] = {"const", "pure", "neither"};

> >>

> >> +enum malloc_state_e

> >> +{

> >> +  STATE_MALLOC_TOP,

> >> +  STATE_MALLOC,

> >> +  STATE_MALLOC_BOTTOM

> >> +};

> >> +

> >> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};

> >

> > Seems static is missing in all those declarations, please fix it.

> >

> > OK with these changes. Thanks!

> >

> > Honza
Prathamesh Kulkarni Oct. 7, 2017, 7:35 p.m. UTC | #21
On 7 October 2017 at 11:23, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 6 October 2017 at 06:04, Jan Hubicka <hubicka@ucw.cz> wrote:

>> >> Hi Honza,

>> >> Thanks for the detailed suggestions, I have updated the patch accordingly.

>> >> I have following questions on call_summary:

>> >> 1] I added field bool is_return_callee in ipa_call_summary to track

>> >> whether the caller possibly returns value returned by callee, which

>> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

>> >> and ipa_call_summary_t::duplicate() will already take care of handling

>> >> late insertion/removal of cgraph nodes ? I just initialized

>> >> is_return_callee to false in ipa_call_summary::reset and that seems to

>> >> work. I am not sure though if I have handled it correctly. Could you

>> >> please check that ?

>> >

>> > I was actually thinking to introduce separate summary for ipa-pure-const pass,

>> > but this seems fine to me too (for one bit definitly more effecient)

>> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be

>> > safe here.

>> >

>> > Also it is possible for functions to be inserted late.  Updating of call summaries

>> > is currently handled by ipa_fn_summary_t::insert

>> >>

>> >> 2] ipa_inline() called ipa_free_fn_summary, which made

>> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

>> >> call to ipa_free_fn_summary from ipa_inline, and moved it to

>> >> ipa_pure_const::execute(). Is that OK ?

>> >

>> > Seems OK to me.

>> >>

>> >> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

>> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

>> >> enabled on aarch64-linux-gnu.

>> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

>> >> the patch by building chromium or firefox.

>> >> Would it be OK to commit if it passes above validations ?

>> >>

>> >> Thanks,

>> >> Prathamesh

>> >> >

>> >> > Thanks,

>> >> > Honza

>> >

>> >> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>> >>

>> >>       * cgraph.h (set_malloc_flag): Declare.

>> >>       * cgraph.c (set_malloc_flag_1): New function.

>> >>       (set_malloc_flag): Likewise.

>> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>> >>       false.

>> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

>> >>       (write_ipa_call_summary): Stream is_return_callee.

>> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>> >>       ipa-prop.h, ipa-fnsummary.h.

>> >>       (malloc_state_e): Define.

>> >>       (malloc_state_names): Define.

>> >>       (funct_state_d): Add field malloc_state.

>> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>> >>       (check_retval_uses): New function.

>> >>       (malloc_candidate_p): Likewise.

>> >>       (analyze_function): Add support for malloc attribute.

>> >>       (pure_const_write_summary): Stream malloc_state.

>> >>       (pure_const_read_summary): Add support for reading malloc_state.

>> >>       (dump_malloc_lattice): New function.

>> >>       (propagate_malloc): New function.

>> >>       (ipa_pure_const::execute): Call propagate_malloc and

>> >>       ipa_free_fn_summary.

>> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

>> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>> >>

>> >> testsuite/

>> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>> >>

>> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

>> >> index 3d0cefbd46b..0aad90d59ea 100644

>> >> --- a/gcc/cgraph.c

>> >> +++ b/gcc/cgraph.c

>> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

>> >>    return changed;

>> >>  }

>> >>

>> >> +/* Worker to set malloc flag.  */

>> > New line here I guess (it is below)

>> >> +static void

>> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

>> >> +{

>> >> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

>> >> +    {

>> >> +      DECL_IS_MALLOC (node->decl) = true;

>> >> +      *changed = true;

>> >> +    }

>> >> +

>> >> +  ipa_ref *ref;

>> >> +  FOR_EACH_ALIAS (node, ref)

>> >> +    {

>> >> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>> >> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>> >> +     set_malloc_flag_1 (alias, malloc_p, changed);

>> >> +    }

>> >> +

>> >> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

>> >> +    if (e->caller->thunk.thunk_p

>> >> +     && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

>> >> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

>> >> +}

>> >> +

>> >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

>> >> +

>> >> +bool

>> >> +cgraph_node::set_malloc_flag (bool malloc_p)

>> >> +{

>> >> +  bool changed = false;

>> >> +

>> >> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

>> >> +    set_malloc_flag_1 (this, malloc_p, &changed);

>> >> +  else

>> >> +    {

>> >> +      ipa_ref *ref;

>> >> +

>> >> +      FOR_EACH_ALIAS (this, ref)

>> >> +     {

>> >> +       cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>> >> +       if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>> >> +         set_malloc_flag_1 (alias, malloc_p, &changed);

>> >> +     }

>> >> +    }

>> >> +  return changed;

>> >> +}

>> >> +

>> >> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

>> >> index f50d6806e61..613a2b6f625 100644

>> >> --- a/gcc/ipa-fnsummary.h

>> >> +++ b/gcc/ipa-fnsummary.h

>> >> @@ -197,7 +197,9 @@ struct ipa_call_summary

>> >>    int call_stmt_time;

>> >>    /* Depth of loop nest, 0 means no nesting.  */

>> >>    unsigned int loop_depth;

>> >> -

>> >> +  /* Indicates whether the caller returns the value of it's callee.  */

>> > Perhaps add a comment when it is initialized.

>> > Don't you also check that the calle is not captured? In that case I would

>> > is_return_callee_uncaptured or so, so someone does not try to use it with

>> > different meaning.

>> Sorry, I didn't understand "Don't you also check that callee is not captured ?"

>> What does capturing mean in this context ?

>

> Don't you need to know that the returned pointer is new and does not alias

> with something else?

Yes, malloc_candidate_p enforces that locally by checking that the returned
pointer is the return value of the callee and that any other immediate uses
of it are only comparisons against 0. But whether the returned pointer is
new depends on whether the callee itself can be annotated with malloc, which
is decided in propagate_malloc.
IIUC, a pointer returned by a malloc-annotated function does not alias
anything else ?
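
Concretely, the two shapes the local check distinguishes look like this (a
sketch following those rules; bar_alloc is a made-up callee, not something
from the testsuite):

void *bar_alloc (__SIZE_TYPE__ n);	/* hypothetical callee */

/* Accepted: the only uses of p are the comparison against 0 and the
   return statement, so check_retval_uses succeeds and the verdict then
   depends on whether bar_alloc itself ends up being malloc.  */
void *
ok_wrapper (__SIZE_TYPE__ n)
{
  void *p = bar_alloc (n);
  if (p == 0)
    __builtin_abort ();
  return p;
}

/* Given up on: p has an extra use as a call argument, so the local
   analysis conservatively rejects the candidate even though annotating
   the wrapper would still be safe.  */
void *
zeroing_wrapper (__SIZE_TYPE__ n)
{
  void *p = bar_alloc (n);
  __builtin_memset (p, 0, n);
  return p;
}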

Thanks,
Prathamesh
>

> Honza

>>

>> Thanks,

>> Prathamesh

>> >

>> >> @@ -69,6 +74,15 @@ enum pure_const_state_e

>> >>

>> >>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>> >>

>> >> +enum malloc_state_e

>> >> +{

>> >> +  STATE_MALLOC_TOP,

>> >> +  STATE_MALLOC,

>> >> +  STATE_MALLOC_BOTTOM

>> >> +};

>> >> +

>> >> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};

>> >

>> > Seems static is missing in all those declarations, please fix it.

>> >

>> > OK with these changes. Thanks!

>> >

>> > Honza
Prathamesh Kulkarni Oct. 13, 2017, 9:50 p.m. UTC | #22
On 7 October 2017 at 12:35, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 7 October 2017 at 11:23, Jan Hubicka <hubicka@ucw.cz> wrote:

>>> On 6 October 2017 at 06:04, Jan Hubicka <hubicka@ucw.cz> wrote:

>>> >> Hi Honza,

>>> >> Thanks for the detailed suggestions, I have updated the patch accordingly.

>>> >> I have following questions on call_summary:

>>> >> 1] I added field bool is_return_callee in ipa_call_summary to track

>>> >> whether the caller possibly returns value returned by callee, which

>>> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

>>> >> and ipa_call_summary_t::duplicate() will already take care of handling

>>> >> late insertion/removal of cgraph nodes ? I just initialized

>>> >> is_return_callee to false in ipa_call_summary::reset and that seems to

>>> >> work. I am not sure though if I have handled it correctly. Could you

>>> >> please check that ?

>>> >

>>> > I was actually thinking to introduce separate summary for ipa-pure-const pass,

>>> > but this seems fine to me too (for one bit definitly more effecient)

>>> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be

>>> > safe here.

>>> >

>>> > Also it is possible for functions to be inserted late.  Updating of call summaries

>>> > is currently handled by ipa_fn_summary_t::insert

>>> >>

>>> >> 2] ipa_inline() called ipa_free_fn_summary, which made

>>> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

>>> >> call to ipa_free_fn_summary from ipa_inline, and moved it to

>>> >> ipa_pure_const::execute(). Is that OK ?

>>> >

>>> > Seems OK to me.

>>> >>

>>> >> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

>>> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

>>> >> enabled on aarch64-linux-gnu.

>>> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

>>> >> the patch by building chromium or firefox.

>>> >> Would it be OK to commit if it passes above validations ?

>>> >>

>>> >> Thanks,

>>> >> Prathamesh

>>> >> >

>>> >> > Thanks,

>>> >> > Honza

>>> >

>>> >> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>> >>

>>> >>       * cgraph.h (set_malloc_flag): Declare.

>>> >>       * cgraph.c (set_malloc_flag_1): New function.

>>> >>       (set_malloc_flag): Likewise.

>>> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>>> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>>> >>       false.

>>> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

>>> >>       (write_ipa_call_summary): Stream is_return_callee.

>>> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>>> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>>> >>       ipa-prop.h, ipa-fnsummary.h.

>>> >>       (malloc_state_e): Define.

>>> >>       (malloc_state_names): Define.

>>> >>       (funct_state_d): Add field malloc_state.

>>> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>>> >>       (check_retval_uses): New function.

>>> >>       (malloc_candidate_p): Likewise.

>>> >>       (analyze_function): Add support for malloc attribute.

>>> >>       (pure_const_write_summary): Stream malloc_state.

>>> >>       (pure_const_read_summary): Add support for reading malloc_state.

>>> >>       (dump_malloc_lattice): New function.

>>> >>       (propagate_malloc): New function.

>>> >>       (ipa_pure_const::execute): Call propagate_malloc and

>>> >>       ipa_free_fn_summary.

>>> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

>>> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>>> >>

>>> >> testsuite/

>>> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>>> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>>> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>>> >>

>>> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

>>> >> index 3d0cefbd46b..0aad90d59ea 100644

>>> >> --- a/gcc/cgraph.c

>>> >> +++ b/gcc/cgraph.c

>>> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

>>> >>    return changed;

>>> >>  }

>>> >>

>>> >> +/* Worker to set malloc flag.  */

>>> > New line here I guess (it is below)

>>> >> +static void

>>> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

>>> >> +{

>>> >> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

>>> >> +    {

>>> >> +      DECL_IS_MALLOC (node->decl) = true;

>>> >> +      *changed = true;

>>> >> +    }

>>> >> +

>>> >> +  ipa_ref *ref;

>>> >> +  FOR_EACH_ALIAS (node, ref)

>>> >> +    {

>>> >> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>>> >> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>>> >> +     set_malloc_flag_1 (alias, malloc_p, changed);

>>> >> +    }

>>> >> +

>>> >> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

>>> >> +    if (e->caller->thunk.thunk_p

>>> >> +     && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

>>> >> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

>>> >> +}

>>> >> +

>>> >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

>>> >> +

>>> >> +bool

>>> >> +cgraph_node::set_malloc_flag (bool malloc_p)

>>> >> +{

>>> >> +  bool changed = false;

>>> >> +

>>> >> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

>>> >> +    set_malloc_flag_1 (this, malloc_p, &changed);

>>> >> +  else

>>> >> +    {

>>> >> +      ipa_ref *ref;

>>> >> +

>>> >> +      FOR_EACH_ALIAS (this, ref)

>>> >> +     {

>>> >> +       cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>>> >> +       if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>>> >> +         set_malloc_flag_1 (alias, malloc_p, &changed);

>>> >> +     }

>>> >> +    }

>>> >> +  return changed;

>>> >> +}

>>> >> +

>>> >> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

>>> >> index f50d6806e61..613a2b6f625 100644

>>> >> --- a/gcc/ipa-fnsummary.h

>>> >> +++ b/gcc/ipa-fnsummary.h

>>> >> @@ -197,7 +197,9 @@ struct ipa_call_summary

>>> >>    int call_stmt_time;

>>> >>    /* Depth of loop nest, 0 means no nesting.  */

>>> >>    unsigned int loop_depth;

>>> >> -

>>> >> +  /* Indicates whether the caller returns the value of it's callee.  */

>>> > Perhaps add a comment when it is initialized.

>>> > Don't you also check that the calle is not captured? In that case I would

>>> > is_return_callee_uncaptured or so, so someone does not try to use it with

>>> > different meaning.

>>> Sorry, I didn't understand "Don't you also check that callee is not captured ?"

>>> What does capturing mean in this context ?

>>

>> Don't you need to know that the returned pointer is new and does not alias

>> with something else?

> Yes, malloc_candidate_p enforces that locally by checking the returned

> pointer is return value of callee and optionally has immediate uses

> only within comparisons against 0. But whether the returned pointer is

> new depends on whether callee itself can be annotated with malloc,

> which is done in propagate_malloc.

> IIUC pointer returned by a malloc annotated function, does not alias

> anything else ?

>

> Thanks,

> Prathamesh

>>

>> Honza

>>>

>>> Thanks,

>>> Prathamesh

>>> >

>>> >> @@ -69,6 +74,15 @@ enum pure_const_state_e

>>> >>

>>> >>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>> >>

>>> >> +enum malloc_state_e

>>> >> +{

>>> >> +  STATE_MALLOC_TOP,

>>> >> +  STATE_MALLOC,

>>> >> +  STATE_MALLOC_BOTTOM

>>> >> +};

>>> >> +

>>> >> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};

>>> >

>>> > Seems static is missing in all those declarations, please fix it.

Hi Honza,
I have done the suggested changes in this version.
LTO bootstrapped and tested on x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
Cross-tested on arm*-*-* and aarch64*-*-*.
Verified no functional regressions with SPEC2006.
Would it be OK to commit this version of the patch ?

Thanks,
Prathamesh
>>> >

>>> > OK with these changes. Thanks!

>>> >

>>> > Honza
2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* cgraph.h (set_malloc_flag): Declare.
	* cgraph.c (set_malloc_flag_1): New function.
	(set_malloc_flag): Likewise.
	* ipa-fnsummary.h (ipa_call_summary): Add new field
	is_return_callee_uncaptured.
	* ipa-fnsummary.c (ipa_call_summary::reset): Set
	is_return_callee_uncaptured to false.
	(read_ipa_call_summary): Add support for reading
	is_return_callee_uncaptured.
	(write_ipa_call_summary): Stream is_return_callee_uncaptured.
	* ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
	* ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,
	ipa-prop.h, ipa-fnsummary.h.
	(pure_const_names): Change to static.
	(malloc_state_e): Define.
	(malloc_state_names): Define.
	(funct_state_d): Add field malloc_state.
	(varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
	(check_retval_uses): New function.
	(malloc_candidate_p): Likewise.
	(analyze_function): Add support for malloc attribute.
	(pure_const_write_summary): Stream malloc_state.
	(pure_const_read_summary): Add support for reading malloc_state.
	(dump_malloc_lattice): New function.
	(propagate_malloc): New function.
	(ipa_pure_const::execute): Call propagate_malloc and
	ipa_free_fn_summary.
	(pass_local_pure_const::execute): Add support for malloc attribute.
	* ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

testsuite/
	* gcc.dg/ipa/propmalloc-1.c: New test-case.
	* gcc.dg/ipa/propmalloc-2.c: Likewise.
	* gcc.dg/ipa/propmalloc-3.c: Likewise.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 3d0cefbd46b..0aad90d59ea 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)
+{
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, changed);
+    }
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+    if (e->caller->thunk.thunk_p
+	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+      set_malloc_flag_1 (e->caller, malloc_p, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 1758e8b08c1..84824e9f814 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1151,6 +1151,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* Set DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 076ccd40bd7..f71338e424e 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -542,6 +542,7 @@ void
 ipa_call_summary::reset ()
 {
   call_stmt_size = call_stmt_time = 0;
+  is_return_callee_uncaptured = false;
   if (predicate)
     edge_predicate_pool.remove (predicate);
   predicate = NULL;
@@ -3204,6 +3205,10 @@ read_ipa_call_summary (struct lto_input_block *ib, struct cgraph_edge *e)
   es->call_stmt_size = streamer_read_uhwi (ib);
   es->call_stmt_time = streamer_read_uhwi (ib);
   es->loop_depth = streamer_read_uhwi (ib);
+
+  bitpack_d bp = streamer_read_bitpack (ib);
+  es->is_return_callee_uncaptured = bp_unpack_value (&bp, 1);
+
   p.stream_in (ib);
   edge_set_predicate (e, &p);
   length = streamer_read_uhwi (ib);
@@ -3360,6 +3365,11 @@ write_ipa_call_summary (struct output_block *ob, struct cgraph_edge *e)
   streamer_write_uhwi (ob, es->call_stmt_size);
   streamer_write_uhwi (ob, es->call_stmt_time);
   streamer_write_uhwi (ob, es->loop_depth);
+
+  bitpack_d bp = bitpack_create (ob->main_stream);
+  bp_pack_value (&bp, es->is_return_callee_uncaptured, 1);
+  streamer_write_bitpack (&bp);
+
   if (es->predicate)
     es->predicate->stream_out (ob);
   else
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index f50d6806e61..a794bd09318 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -197,7 +197,9 @@ struct ipa_call_summary
   int call_stmt_time;
   /* Depth of loop nest, 0 means no nesting.  */
   unsigned int loop_depth;
-  
+  /* Indicates whether the caller returns the value of its callee.  */
+  bool is_return_callee_uncaptured;
+
   /* Keep all field empty so summary dumping works during its computation.
      This is useful for debugging.  */
   ipa_call_summary ()
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index dd46cb61362..b8e65e2fa7e 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2526,9 +2526,6 @@ ipa_inline (void)
 
   if (dump_file)
     ipa_dump_fn_summaries (dump_file);
-  /* In WPA we use inline summaries for partitioning process.  */
-  if (!flag_wpa)
-    ipa_free_fn_summary ();
   return remove_functions ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index dac8f0d5f21..a6bca660036 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -67,7 +72,16 @@ enum pure_const_state_e
   IPA_NEITHER
 };
 
-const char *pure_const_names[3] = {"const", "pure", "neither"};
+static const char *pure_const_names[3] = {"const", "pure", "neither"};
+
+enum malloc_state_e
+{
+  STATE_MALLOC_TOP,
+  STATE_MALLOC,
+  STATE_MALLOC_BOTTOM
+};
+
+static const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
 
 /* Holder for the const_state.  There is one of these per function
    decl.  */
@@ -92,11 +106,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -812,6 +828,149 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+/* Check that RETVAL is used only in STMT and in comparisons against 0.
+   RETVAL is return value of the function and STMT is return stmt.  */
+
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	tree op2 = gimple_cond_rhs (cond);
+	if (!integer_zerop (op2))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+	if (!integer_zerop (gimple_assign_rhs2 (ga)))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (is_gimple_debug (use_stmt))
+      ;
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/* malloc_candidate_p() checks if FUN can possibly be annotated with malloc
+   attribute. Currently this function does a very conservative analysis.
+   FUN is considered to be a candidate if
+   1) It returns a value of pointer type.
+   2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+      a phi, and element of phi is either NULL or
+      SSA_NAME_DEF_STMT(element) is function call.
+   3) The return-value has immediate uses only within comparisons (gcond or gassign)
+      and return_stmt (and likewise a phi arg has immediate use only within comparison
+      or the phi stmt).  */
+
+static bool
+malloc_candidate_p (function *fun, bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+  cgraph_node *node = cgraph_node::get_create (fun->decl);
+
+#define DUMP_AND_RETURN(reason)  \
+{  \
+  if (dump_file && (dump_flags & TDF_DETAILS))  \
+    fprintf (dump_file, "%s", (reason));  \
+  return false;  \
+}
+
+  if (EDGE_COUNT (exit_block->preds) == 0)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, exit_block->preds)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
+      greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+
+      if (!ret_stmt)
+	return false;
+
+      tree retval = gimple_return_retval (ret_stmt);
+      if (!retval)
+	DUMP_AND_RETURN("No return value.")
+
+      if (TREE_CODE (retval) != SSA_NAME
+	  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+	DUMP_AND_RETURN("Return value is not SSA_NAME or not a pointer type.")
+
+      if (!check_retval_uses (retval, ret_stmt))
+	DUMP_AND_RETURN("Return value has uses outside return stmt"
+			" and comparisons against 0.")
+
+      gimple *def = SSA_NAME_DEF_STMT (retval);
+      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+	{
+	  tree callee_decl = gimple_call_fndecl (call_stmt);
+	  if (!callee_decl)
+	    return false;
+
+	  if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	    DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			    " non-ipa mode.")
+
+	  cgraph_edge *cs = node->get_edge (call_stmt);
+	  if (cs)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      gcc_assert (es);
+	      es->is_return_callee_uncaptured = true;
+	    }
+	}
+
+      else if (gphi *phi = dyn_cast<gphi *> (def))
+	for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg_def (phi, i);
+	    if (TREE_CODE (arg) != SSA_NAME)
+	      DUMP_AND_RETURN("phi arg is not SSA_NAME.")
+	    if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
+	      DUMP_AND_RETURN("phi arg has uses outside phi"
+			      " and comparisons against 0.")
+
+	    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+	    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+	    if (!call_stmt)
+	      return false;
+	    tree callee_decl = gimple_call_fndecl (call_stmt);
+	    if (!callee_decl)
+	      return false;
+	    if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	      DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			      " non-ipa mode.")
+
+	    cgraph_edge *cs = node->get_edge (call_stmt);
+	    if (cs)
+	      {
+		ipa_call_summary *es = ipa_call_summaries->get (cs);
+		gcc_assert (es);
+		es->is_return_callee_uncaptured = true;
+	      }
+	  }
+
+      else
+	DUMP_AND_RETURN("def_stmt of return value is not a call or phi-stmt.")
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+
+#undef DUMP_AND_RETURN
+}
+
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1080,14 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  l->malloc_state = STATE_MALLOC_BOTTOM;
+  if (DECL_IS_MALLOC (decl))
+    l->malloc_state = STATE_MALLOC;
+  else if (ipa && malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), true))
+    l->malloc_state = STATE_MALLOC_TOP;
+  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), false))
+    l->malloc_state = STATE_MALLOC;
+
   pop_cfun ();
   if (dump_file)
     {
@@ -934,6 +1101,8 @@ end:
         fprintf (dump_file, "Function is locally pure.\n");
       if (l->can_free)
 	fprintf (dump_file, "Function can locally free.\n");
+      if (l->malloc_state == STATE_MALLOC)
+	fprintf (dump_file, "Function is locally malloc.\n");
     }
   return l;
 }
@@ -1067,6 +1236,7 @@ pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
 	}
     }
@@ -1127,6 +1297,9 @@ pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1149,6 +1322,8 @@ pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1659,6 +1834,127 @@ propagate_nothrow (void)
   free (order);
 }
 
+/* Debugging function to dump state of malloc lattice.  */
+
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+/* Propagate malloc attribute across the callgraph.  */
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = STATE_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == STATE_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> callees = vNULL;
+	  for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      if (es && es->is_return_callee_uncaptured)
+		callees.safe_push (cs->callee);
+	    }
+
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = STATE_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == STATE_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1677,6 +1973,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
@@ -1684,6 +1981,10 @@ execute (function *)
     if (has_function_state (node))
       free (get_function_state (node));
   funct_state_vec.release ();
+
+  /* In WPA we use inline summaries for partitioning process.  */
+  if (!flag_wpa)
+    ipa_free_fn_summary ();
   return remove_p ? TODO_remove_functions : 0;
 }
 
@@ -1877,6 +2178,17 @@ pass_local_pure_const::execute (function *fun)
 	fprintf (dump_file, "Function found to be nothrow: %s\n",
 		 current_function_name ());
     }
+
+  if (l->malloc_state == STATE_MALLOC
+      && !DECL_IS_MALLOC (current_function_decl))
+    {
+      node->set_malloc_flag (true);
+      changed = true;
+      if (dump_file)
+	fprintf (dump_file, "Function found to be malloc: %s\n",
+		 node->name ());
+    }
+
   free (l);
   if (changed)
     return execute_fixup_cfg ();
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
Prathamesh Kulkarni Oct. 23, 2017, 9:36 a.m. UTC | #23
On 14 October 2017 at 03:20, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 7 October 2017 at 12:35, Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

>> On 7 October 2017 at 11:23, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>> On 6 October 2017 at 06:04, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>> >> Hi Honza,

>>>> >> Thanks for the detailed suggestions, I have updated the patch accordingly.

>>>> >> I have following questions on call_summary:

>>>> >> 1] I added field bool is_return_callee in ipa_call_summary to track

>>>> >> whether the caller possibly returns value returned by callee, which

>>>> >> gets rid of return_callees_map. I assume ipa_call_summary_t::remove()

>>>> >> and ipa_call_summary_t::duplicate() will already take care of handling

>>>> >> late insertion/removal of cgraph nodes ? I just initialized

>>>> >> is_return_callee to false in ipa_call_summary::reset and that seems to

>>>> >> work. I am not sure though if I have handled it correctly. Could you

>>>> >> please check that ?

>>>> >

>>>> > I was actually thinking to introduce separate summary for ipa-pure-const pass,

>>>> > but this seems fine to me too (for one bit definitly more effecient)

>>>> > ipa_call_summary_t::duplicate copies all the fields, so indeed you should be

>>>> > safe here.

>>>> >

>>>> > Also it is possible for functions to be inserted late.  Updating of call summaries

>>>> > is currently handled by ipa_fn_summary_t::insert

>>>> >>

>>>> >> 2] ipa_inline() called ipa_free_fn_summary, which made

>>>> >> ipa_call_summaries unavailable during ipa-pure-const pass. I removed

>>>> >> call to ipa_free_fn_summary from ipa_inline, and moved it to

>>>> >> ipa_pure_const::execute(). Is that OK ?

>>>> >

>>>> > Seems OK to me.

>>>> >>

>>>> >> Patch passes bootstrap+test and lto bootstrap+test on x86_64-unknown-linux-gnu.

>>>> >> Verfiied SPEC2k6 compiles and runs without miscompares with LTO

>>>> >> enabled on aarch64-linux-gnu.

>>>> >> Cross-tested on arm*-*-* and aarch64*-*-*. I will additionally test

>>>> >> the patch by building chromium or firefox.

>>>> >> Would it be OK to commit if it passes above validations ?

>>>> >>

>>>> >> Thanks,

>>>> >> Prathamesh

>>>> >> >

>>>> >> > Thanks,

>>>> >> > Honza

>>>> >

>>>> >> 2017-10-05  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>>> >>

>>>> >>       * cgraph.h (set_malloc_flag): Declare.

>>>> >>       * cgraph.c (set_malloc_flag_1): New function.

>>>> >>       (set_malloc_flag): Likewise.

>>>> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>>>> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>>>> >>       false.

>>>> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

>>>> >>       (write_ipa_call_summary): Stream is_return_callee.

>>>> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>>>> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>>>> >>       ipa-prop.h, ipa-fnsummary.h.

>>>> >>       (malloc_state_e): Define.

>>>> >>       (malloc_state_names): Define.

>>>> >>       (funct_state_d): Add field malloc_state.

>>>> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>>>> >>       (check_retval_uses): New function.

>>>> >>       (malloc_candidate_p): Likewise.

>>>> >>       (analyze_function): Add support for malloc attribute.

>>>> >>       (pure_const_write_summary): Stream malloc_state.

>>>> >>       (pure_const_read_summary): Add support for reading malloc_state.

>>>> >>       (dump_malloc_lattice): New function.

>>>> >>       (propagate_malloc): New function.

>>>> >>       (ipa_pure_const::execute): Call propagate_malloc and

>>>> >>       ipa_free_fn_summary.

>>>> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

>>>> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>>>> >>

>>>> >> testsuite/

>>>> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>>>> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>>>> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>>>> >>

>>>> >> diff --git a/gcc/cgraph.c b/gcc/cgraph.c

>>>> >> index 3d0cefbd46b..0aad90d59ea 100644

>>>> >> --- a/gcc/cgraph.c

>>>> >> +++ b/gcc/cgraph.c

>>>> >> @@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)

>>>> >>    return changed;

>>>> >>  }

>>>> >>

>>>> >> +/* Worker to set malloc flag.  */

>>>> > New line here I guess (it is below)

>>>> >> +static void

>>>> >> +set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)

>>>> >> +{

>>>> >> +  if (malloc_p && !DECL_IS_MALLOC (node->decl))

>>>> >> +    {

>>>> >> +      DECL_IS_MALLOC (node->decl) = true;

>>>> >> +      *changed = true;

>>>> >> +    }

>>>> >> +

>>>> >> +  ipa_ref *ref;

>>>> >> +  FOR_EACH_ALIAS (node, ref)

>>>> >> +    {

>>>> >> +      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>>>> >> +      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>>>> >> +     set_malloc_flag_1 (alias, malloc_p, changed);

>>>> >> +    }

>>>> >> +

>>>> >> +  for (cgraph_edge *e = node->callers; e; e = e->next_caller)

>>>> >> +    if (e->caller->thunk.thunk_p

>>>> >> +     && (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))

>>>> >> +      set_malloc_flag_1 (e->caller, malloc_p, changed);

>>>> >> +}

>>>> >> +

>>>> >> +/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */

>>>> >> +

>>>> >> +bool

>>>> >> +cgraph_node::set_malloc_flag (bool malloc_p)

>>>> >> +{

>>>> >> +  bool changed = false;

>>>> >> +

>>>> >> +  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)

>>>> >> +    set_malloc_flag_1 (this, malloc_p, &changed);

>>>> >> +  else

>>>> >> +    {

>>>> >> +      ipa_ref *ref;

>>>> >> +

>>>> >> +      FOR_EACH_ALIAS (this, ref)

>>>> >> +     {

>>>> >> +       cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);

>>>> >> +       if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)

>>>> >> +         set_malloc_flag_1 (alias, malloc_p, &changed);

>>>> >> +     }

>>>> >> +    }

>>>> >> +  return changed;

>>>> >> +}

>>>> >> +

>>>> >> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h

>>>> >> index f50d6806e61..613a2b6f625 100644

>>>> >> --- a/gcc/ipa-fnsummary.h

>>>> >> +++ b/gcc/ipa-fnsummary.h

>>>> >> @@ -197,7 +197,9 @@ struct ipa_call_summary

>>>> >>    int call_stmt_time;

>>>> >>    /* Depth of loop nest, 0 means no nesting.  */

>>>> >>    unsigned int loop_depth;

>>>> >> -

>>>> >> +  /* Indicates whether the caller returns the value of it's callee.  */

>>>> > Perhaps add a comment when it is initialized.

>>>> > Don't you also check that the calle is not captured? In that case I would

>>>> > is_return_callee_uncaptured or so, so someone does not try to use it with

>>>> > different meaning.

>>>> Sorry, I didn't understand "Don't you also check that callee is not captured ?"

>>>> What does capturing mean in this context ?

>>>

>>> Don't you need to know that the returned pointer is new and does not alias

>>> with something else?

>> Yes, malloc_candidate_p enforces that locally by checking the returned

>> pointer is return value of callee and optionally has immediate uses

>> only within comparisons against 0. But whether the returned pointer is

>> new depends on whether callee itself can be annotated with malloc,

>> which is done in propagate_malloc.

>> IIUC pointer returned by a malloc annotated function, does not alias

>> anything else ?

>>

>> Thanks,

>> Prathamesh

>>>

>>> Honza

>>>>

>>>> Thanks,

>>>> Prathamesh

>>>> >

>>>> >> @@ -69,6 +74,15 @@ enum pure_const_state_e

>>>> >>

>>>> >>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>> >>

>>>> >> +enum malloc_state_e

>>>> >> +{

>>>> >> +  STATE_MALLOC_TOP,

>>>> >> +  STATE_MALLOC,

>>>> >> +  STATE_MALLOC_BOTTOM

>>>> >> +};

>>>> >> +

>>>> >> +const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};

>>>> >

>>>> > Seems static is missing in all those declarations, please fix it.

> Hi Honza,

> I have done the suggested changes in this version.

> LTO Bootstrapped+tested on x86_64-unknown-linux-gnu, ppc64le-linux-gnu.

> Cross-tested on arm*-*-*, aarch64*-*-*

> Verified no functional regressions with SPEC2006.

> Would it be OK to commit this version of the patch ?

ping https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00924.html

Thanks,
Prathamesh
>

> Thanks,

> Prathamesh

>>>> >

>>>> > OK with these changes. Thanks!

>>>> >

>>>> > Honza
Jan Hubicka Oct. 24, 2017, 10:56 a.m. UTC | #24
> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> 

> 	* cgraph.h (set_malloc_flag): Declare.

> 	* cgraph.c (set_malloc_flag_1): New function.

> 	(set_malloc_flag): Likewise.

> 	* ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> 	* ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> 	false.

> 	(read_ipa_call_summary): Add support for reading is_return_callee.

> 	(write_ipa_call_summary): Stream is_return_callee.

> 	* ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> 	* ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> 	ipa-prop.h, ipa-fnsummary.h.

> 	(pure_const_names): Change to static.

> 	(malloc_state_e): Define.

> 	(malloc_state_names): Define.

> 	(funct_state_d): Add field malloc_state.

> 	(varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> 	(check_retval_uses): New function.

> 	(malloc_candidate_p): Likewise.

> 	(analyze_function): Add support for malloc attribute.

> 	(pure_const_write_summary): Stream malloc_state.

> 	(pure_const_read_summary): Add support for reading malloc_state.

> 	(dump_malloc_lattice): New function.

> 	(propagate_malloc): New function.

> 	(ipa_pure_const::execute): Call propagate_malloc and

> 	ipa_free_fn_summary.

> 	(pass_local_pure_const::execute): Add support for malloc attribute.

> 	* ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> 

> testsuite/

> 	* gcc.dg/ipa/propmalloc-1.c: New test-case.

> 	* gcc.dg/ipa/propmalloc-2.c: Likewise.

> 	* gcc.dg/ipa/propmalloc-3.c: Likewise.


OK.
Perhaps we could also add -Wsuggest-attribute=malloc and mention it in changes.html?

Thanks,
Honza
Prathamesh Kulkarni Oct. 25, 2017, 10:57 a.m. UTC | #25
On 24 October 2017 at 16:26, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>

>>       * cgraph.h (set_malloc_flag): Declare.

>>       * cgraph.c (set_malloc_flag_1): New function.

>>       (set_malloc_flag): Likewise.

>>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>>       false.

>>       (read_ipa_call_summary): Add support for reading is_return_callee.

>>       (write_ipa_call_summary): Stream is_return_callee.

>>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>>       ipa-prop.h, ipa-fnsummary.h.

>>       (pure_const_names): Change to static.

>>       (malloc_state_e): Define.

>>       (malloc_state_names): Define.

>>       (funct_state_d): Add field malloc_state.

>>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>>       (check_retval_uses): New function.

>>       (malloc_candidate_p): Likewise.

>>       (analyze_function): Add support for malloc attribute.

>>       (pure_const_write_summary): Stream malloc_state.

>>       (pure_const_read_summary): Add support for reading malloc_state.

>>       (dump_malloc_lattice): New function.

>>       (propagate_malloc): New function.

>>       (ipa_pure_const::execute): Call propagate_malloc and

>>       ipa_free_fn_summary.

>>       (pass_local_pure_const::execute): Add support for malloc attribute.

>>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>>

>> testsuite/

>>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>

> OK.

> Perhaps we could also add -Wsuggest-attribute=malloc and mention it in changes.html?

Done in this version.
In warn_function_malloc(), I passed false for known_finite param to
suggest_attribute().
Does that look OK ?
Validation in progress. OK to commit if passes ?
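
For reference, a minimal example (illustrative only, not taken from the patch
or its testsuite) of the kind of wrapper the new warning is meant to flag when
built with -O2 -Wsuggest-attribute=malloc:

/* Should now be reported as a candidate for __attribute__((malloc)),
   since it only returns the result of __builtin_malloc and only
   compares that result against 0.  */
void *
my_xmalloc (__SIZE_TYPE__ n)
{
  void *p = __builtin_malloc (n);
  if (p == 0)
    __builtin_abort ();
  return p;
}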

Thanks,
Prathamesh
>

> Thanks,

> Honza
2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* cgraph.h (set_malloc_flag): Declare.
	* cgraph.c (set_malloc_flag_1): New function.
	(set_malloc_flag): Likewise.
	* ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.
	* ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to
	false.
	(read_ipa_call_summary): Add support for reading is_return_callee.
	(write_ipa_call_summary): Stream is_return_callee.
	* ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
	* ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,
	ipa-prop.h, ipa-fnsummary.h.
	(pure_const_names): Change to static.
	(malloc_state_e): Define.
	(malloc_state_names): Define.
	(funct_state_d): Add field malloc_state.
	(varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
	(check_retval_uses): New function.
	(malloc_candidate_p): Likewise.
	(analyze_function): Add support for malloc attribute.
	(pure_const_write_summary): Stream malloc_state.
	(pure_const_read_summary): Add support for reading malloc_state.
	(dump_malloc_lattice): New function.
	(propagate_malloc): New function.
	(warn_function_malloc): New function.
	(ipa_pure_const::execute): Call propagate_malloc and
	ipa_free_fn_summary.
	(pass_local_pure_const::execute): Add support for malloc attribute.
	* ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.
	* doc/invoke.texi: Document Wsuggest-attribute=malloc.

testsuite/
	* gcc.dg/ipa/propmalloc-1.c: New test-case.
	* gcc.dg/ipa/propmalloc-2.c: Likewise.
	* gcc.dg/ipa/propmalloc-3.c: Likewise.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 3d0cefbd46b..0aad90d59ea 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2530,6 +2530,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)
+{
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, changed);
+    }
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+    if (e->caller->thunk.thunk_p
+	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+      set_malloc_flag_1 (e->caller, malloc_p, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 1758e8b08c1..84824e9f814 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1151,6 +1151,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/common.opt b/gcc/common.opt
index dfde6adba91..00dfc97c4cb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -733,6 +733,10 @@ Wsuggest-attribute=noreturn
 Common Var(warn_suggest_attribute_noreturn) Warning
 Warn about functions which might be candidates for __attribute__((noreturn)).
 
+Wsuggest-attribute=malloc
+Common Var(warn_suggest_attribute_malloc) Warning
+Warn about functions which might be candidates for __attribute__((malloc)).
+
 Wsuggest-final-types
 Common Var(warn_suggest_final_types) Warning
 Warn about C++ polymorphic types where adding final keyword would improve code quality.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5e39c0efeb9..97b891d6d69 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -315,7 +315,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wstack-protector  -Wstack-usage=@var{len}  -Wstrict-aliasing @gol
 -Wstrict-aliasing=n  -Wstrict-overflow  -Wstrict-overflow=@var{n} @gol
 -Wstringop-overflow=@var{n} @gol
--Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
+-Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{|}malloc@r{]} @gol
 -Wsuggest-final-types @gol  -Wsuggest-final-methods  -Wsuggest-override @gol
 -Wmissing-format-attribute  -Wsubobject-linkage @gol
 -Wswitch  -Wswitch-bool  -Wswitch-default  -Wswitch-enum @gol
@@ -5190,7 +5190,7 @@ whether to issue a warning.  Similarly to @option{-Wstringop-overflow=3} this
 setting of the option may result in warnings for benign code.
 @end table
 
-@item -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]}
+@item -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{|}malloc@r{]}
 @opindex Wsuggest-attribute=
 @opindex Wno-suggest-attribute=
 Warn for cases where adding an attribute may be beneficial. The
@@ -5200,21 +5200,25 @@ attributes currently supported are listed below.
 @item -Wsuggest-attribute=pure
 @itemx -Wsuggest-attribute=const
 @itemx -Wsuggest-attribute=noreturn
+@itemx -Wsuggest-attribute=malloc
 @opindex Wsuggest-attribute=pure
 @opindex Wno-suggest-attribute=pure
 @opindex Wsuggest-attribute=const
 @opindex Wno-suggest-attribute=const
 @opindex Wsuggest-attribute=noreturn
 @opindex Wno-suggest-attribute=noreturn
+@opindex Wsuggest-attribute=malloc
+@opindex Wno-suggest-attribute=malloc
 
 Warn about functions that might be candidates for attributes
-@code{pure}, @code{const} or @code{noreturn}.  The compiler only warns for
-functions visible in other compilation units or (in the case of @code{pure} and
-@code{const}) if it cannot prove that the function returns normally. A function
-returns normally if it doesn't contain an infinite loop or return abnormally
-by throwing, calling @code{abort} or trapping.  This analysis requires option
-@option{-fipa-pure-const}, which is enabled by default at @option{-O} and
-higher.  Higher optimization levels improve the accuracy of the analysis.
+@code{pure}, @code{const}, @code{noreturn} or @code{malloc}.  The compiler only
+warns for functions visible in other compilation units or (in the case of
+@code{pure} and @code{const}) if it cannot prove that the function returns
+normally. A function returns normally if it doesn't contain an infinite loop
+or return abnormally by throwing, calling @code{abort} or trapping.  This
+analysis requires option @option{-fipa-pure-const}, which is enabled by default
+at @option{-O} and higher.  Higher optimization levels improve the accuracy of
+the analysis.
 
 @item -Wsuggest-attribute=format
 @itemx -Wmissing-format-attribute
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 076ccd40bd7..f71338e424e 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -542,6 +542,7 @@ void
 ipa_call_summary::reset ()
 {
   call_stmt_size = call_stmt_time = 0;
+  is_return_callee_uncaptured = false;
   if (predicate)
     edge_predicate_pool.remove (predicate);
   predicate = NULL;
@@ -3204,6 +3205,10 @@ read_ipa_call_summary (struct lto_input_block *ib, struct cgraph_edge *e)
   es->call_stmt_size = streamer_read_uhwi (ib);
   es->call_stmt_time = streamer_read_uhwi (ib);
   es->loop_depth = streamer_read_uhwi (ib);
+
+  bitpack_d bp = streamer_read_bitpack (ib);
+  es->is_return_callee_uncaptured = bp_unpack_value (&bp, 1);
+
   p.stream_in (ib);
   edge_set_predicate (e, &p);
   length = streamer_read_uhwi (ib);
@@ -3360,6 +3365,11 @@ write_ipa_call_summary (struct output_block *ob, struct cgraph_edge *e)
   streamer_write_uhwi (ob, es->call_stmt_size);
   streamer_write_uhwi (ob, es->call_stmt_time);
   streamer_write_uhwi (ob, es->loop_depth);
+
+  bitpack_d bp = bitpack_create (ob->main_stream);
+  bp_pack_value (&bp, es->is_return_callee_uncaptured, 1);
+  streamer_write_bitpack (&bp);
+
   if (es->predicate)
     es->predicate->stream_out (ob);
   else
diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
index f50d6806e61..a794bd09318 100644
--- a/gcc/ipa-fnsummary.h
+++ b/gcc/ipa-fnsummary.h
@@ -197,7 +197,9 @@ struct ipa_call_summary
   int call_stmt_time;
   /* Depth of loop nest, 0 means no nesting.  */
   unsigned int loop_depth;
-  
+  /* Indicates whether the caller returns the value of it's callee.  */
+  bool is_return_callee_uncaptured;
+
   /* Keep all field empty so summary dumping works during its computation.
      This is useful for debugging.  */
   ipa_call_summary ()
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index dd46cb61362..b8e65e2fa7e 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2526,9 +2526,6 @@ ipa_inline (void)
 
   if (dump_file)
     ipa_dump_fn_summaries (dump_file);
-  /* In WPA we use inline summaries for partitioning process.  */
-  if (!flag_wpa)
-    ipa_free_fn_summary ();
   return remove_functions ? TODO_remove_functions : 0;
 }
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index dac8f0d5f21..965fdd41975 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -67,7 +72,16 @@ enum pure_const_state_e
   IPA_NEITHER
 };
 
-const char *pure_const_names[3] = {"const", "pure", "neither"};
+static const char *pure_const_names[3] = {"const", "pure", "neither"};
+
+enum malloc_state_e
+{
+  STATE_MALLOC_TOP,
+  STATE_MALLOC,
+  STATE_MALLOC_BOTTOM
+};
+
+static const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
 
 /* Holder for the const_state.  There is one of these per function
    decl.  */
@@ -92,11 +106,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -215,6 +231,19 @@ warn_function_const (tree decl, bool known_finite)
 			 known_finite, warned_about, "const");
 }
 
+/* Emit suggestion about __attribute__((malloc)) for DECL.  */
+
+static void
+warn_function_malloc (tree decl)
+{
+  static hash_set<tree> *warned_about;
+  warned_about
+    = suggest_attribute (OPT_Wsuggest_attribute_malloc, decl,
+			 false, warned_about, "malloc");
+}
+
+/* Emit suggestion about __attribute__((noreturn)) for DECL.  */
+
 static void
 warn_function_noreturn (tree decl)
 {
@@ -812,6 +841,149 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+/* Check that RETVAL is used only in STMT and in comparisons against 0.
+   RETVAL is return value of the function and STMT is return stmt.  */
+
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	tree op2 = gimple_cond_rhs (cond);
+	if (!integer_zerop (op2))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+	if (!integer_zerop (gimple_assign_rhs2 (ga)))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (is_gimple_debug (use_stmt))
+      ;
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/* malloc_candidate_p() checks if FUN can possibly be annotated with malloc
+   attribute. Currently this function does a very conservative analysis.
+   FUN is considered to be a candidate if
+   1) It returns a value of pointer type.
+   2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+      a phi, and element of phi is either NULL or
+      SSA_NAME_DEF_STMT(element) is function call.
+   3) The return-value has immediate uses only within comparisons (gcond or gassign)
+      and return_stmt (and likewise a phi arg has immediate use only within comparison
+      or the phi stmt).  */
+
+static bool
+malloc_candidate_p (function *fun, bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+  cgraph_node *node = cgraph_node::get_create (fun->decl);
+
+#define DUMP_AND_RETURN(reason)  \
+{  \
+  if (dump_file && (dump_flags & TDF_DETAILS))  \
+    fprintf (dump_file, "%s", (reason));  \
+  return false;  \
+}
+
+  if (EDGE_COUNT (exit_block->preds) == 0)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, exit_block->preds)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
+      greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+
+      if (!ret_stmt)
+	return false;
+
+      tree retval = gimple_return_retval (ret_stmt);
+      if (!retval)
+	DUMP_AND_RETURN("No return value.")
+
+      if (TREE_CODE (retval) != SSA_NAME
+	  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+	DUMP_AND_RETURN("Return value is not SSA_NAME or not a pointer type.")
+
+      if (!check_retval_uses (retval, ret_stmt))
+	DUMP_AND_RETURN("Return value has uses outside return stmt"
+			" and comparisons against 0.")
+
+      gimple *def = SSA_NAME_DEF_STMT (retval);
+      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+	{
+	  tree callee_decl = gimple_call_fndecl (call_stmt);
+	  if (!callee_decl)
+	    return false;
+
+	  if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	    DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			    " non-ipa mode.")
+
+	  cgraph_edge *cs = node->get_edge (call_stmt);
+	  if (cs)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      gcc_assert (es);
+	      es->is_return_callee_uncaptured = true;
+	    }
+	}
+
+      else if (gphi *phi = dyn_cast<gphi *> (def))
+	for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg_def (phi, i);
+	    if (TREE_CODE (arg) != SSA_NAME)
+	      DUMP_AND_RETURN("phi arg is not SSA_NAME.")
+	    if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
+	      DUMP_AND_RETURN("phi arg has uses outside phi"
+			      " and comparisons against 0.")
+
+	    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+	    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+	    if (!call_stmt)
+	      return false;
+	    tree callee_decl = gimple_call_fndecl (call_stmt);
+	    if (!callee_decl)
+	      return false;
+	    if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	      DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+			      " non-ipa mode.")
+
+	    cgraph_edge *cs = node->get_edge (call_stmt);
+	    if (cs)
+	      {
+		ipa_call_summary *es = ipa_call_summaries->get (cs);
+		gcc_assert (es);
+		es->is_return_callee_uncaptured = true;
+	      }
+	  }
+
+      else
+	DUMP_AND_RETURN("def_stmt of return value is not a call or phi-stmt.")
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+
+#undef DUMP_AND_RETURN
+}
+
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1093,14 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  l->malloc_state = STATE_MALLOC_BOTTOM;
+  if (DECL_IS_MALLOC (decl))
+    l->malloc_state = STATE_MALLOC;
+  else if (ipa && malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), true))
+    l->malloc_state = STATE_MALLOC_TOP;
+  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), false))
+    l->malloc_state = STATE_MALLOC;
+
   pop_cfun ();
   if (dump_file)
     {
@@ -934,6 +1114,8 @@ end:
         fprintf (dump_file, "Function is locally pure.\n");
       if (l->can_free)
 	fprintf (dump_file, "Function can locally free.\n");
+      if (l->malloc_state == STATE_MALLOC)
+	fprintf (dump_file, "Function is locally malloc.\n");
     }
   return l;
 }
@@ -1067,6 +1249,7 @@ pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
 	}
     }
@@ -1127,6 +1310,9 @@ pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1149,6 +1335,8 @@ pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1659,6 +1847,131 @@ propagate_nothrow (void)
   free (order);
 }
 
+/* Debugging function to dump state of malloc lattice.  */
+
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+/* Propagate malloc attribute across the callgraph.  */
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = STATE_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == STATE_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> callees = vNULL;
+	  for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
+	    {
+	      ipa_call_summary *es = ipa_call_summaries->get (cs);
+	      if (es && es->is_return_callee_uncaptured)
+		callees.safe_push (cs->callee);
+	    }
+
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = STATE_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == STATE_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+
+	    bool malloc_decl_p = DECL_IS_MALLOC (node->decl);
+	    node->set_malloc_flag (true);
+	    if (!malloc_decl_p && warn_suggest_attribute_malloc)
+		warn_function_malloc (node->decl);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1677,6 +1990,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
@@ -1684,6 +1998,10 @@ execute (function *)
     if (has_function_state (node))
       free (get_function_state (node));
   funct_state_vec.release ();
+
+  /* In WPA we use inline summaries for partitioning process.  */
+  if (!flag_wpa)
+    ipa_free_fn_summary ();
   return remove_p ? TODO_remove_functions : 0;
 }
 
@@ -1877,6 +2195,19 @@ pass_local_pure_const::execute (function *fun)
 	fprintf (dump_file, "Function found to be nothrow: %s\n",
 		 current_function_name ());
     }
+
+  if (l->malloc_state == STATE_MALLOC
+      && !DECL_IS_MALLOC (current_function_decl))
+    {
+      node->set_malloc_flag (true);
+      if (warn_suggest_attribute_malloc)
+	warn_function_malloc (node->decl);
+      changed = true;
+      if (dump_file)
+	fprintf (dump_file, "Function found to be malloc: %s\n",
+		 node->name ());
+    }
+
   free (l);
   if (changed)
     return execute_fixup_cfg ();
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.15
diff -r1.15 changes.html
37c37,39
<   <li></li>
---
>   <li>The ipa-pure-const pass is extended to propagate malloc attribute, and the
>   corresponding warning option <code>Wsuggest-attribute=malloc</code> emits a
>   diagnostic for a function, which can be annotated with malloc attribute.</li>
Jan Hubicka Oct. 25, 2017, 3:14 p.m. UTC | #26
> On 24 October 2017 at 16:26, Jan Hubicka <hubicka@ucw.cz> wrote:

> >> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> >>

> >>       * cgraph.h (set_malloc_flag): Declare.

> >>       * cgraph.c (set_malloc_flag_1): New function.

> >>       (set_malloc_flag): Likewise.

> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> >>       false.

> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

> >>       (write_ipa_call_summary): Stream is_return_callee.

> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> >>       ipa-prop.h, ipa-fnsummary.h.

> >>       (pure_const_names): Change to static.

> >>       (malloc_state_e): Define.

> >>       (malloc_state_names): Define.

> >>       (funct_state_d): Add field malloc_state.

> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> >>       (check_retval_uses): New function.

> >>       (malloc_candidate_p): Likewise.

> >>       (analyze_function): Add support for malloc attribute.

> >>       (pure_const_write_summary): Stream malloc_state.

> >>       (pure_const_read_summary): Add support for reading malloc_state.

> >>       (dump_malloc_lattice): New function.

> >>       (propagate_malloc): New function.

> >>       (ipa_pure_const::execute): Call propagate_malloc and

> >>       ipa_free_fn_summary.

> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> >>

> >> testsuite/

> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

> >

> > OK.

> > Perhaps we could also add -Wsuggest-attribute=malloc and mention it in changes.html?

> Done in this version.

> In warn_function_malloc(), I passed false for known_finite param to

> suggest_attribute().

> Does that look OK ?

> Validation in progress. OK to commit if passes ?


OK, thanks!
Honza
Prathamesh Kulkarni Oct. 27, 2017, 10:51 a.m. UTC | #27
On 25 October 2017 at 20:44, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 24 October 2017 at 16:26, Jan Hubicka <hubicka@ucw.cz> wrote:

>> >> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>> >>

>> >>       * cgraph.h (set_malloc_flag): Declare.

>> >>       * cgraph.c (set_malloc_flag_1): New function.

>> >>       (set_malloc_flag): Likewise.

>> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

>> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

>> >>       false.

>> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

>> >>       (write_ipa_call_summary): Stream is_return_callee.

>> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

>> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

>> >>       ipa-prop.h, ipa-fnsummary.h.

>> >>       (pure_const_names): Change to static.

>> >>       (malloc_state_e): Define.

>> >>       (malloc_state_names): Define.

>> >>       (funct_state_d): Add field malloc_state.

>> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

>> >>       (check_retval_uses): New function.

>> >>       (malloc_candidate_p): Likewise.

>> >>       (analyze_function): Add support for malloc attribute.

>> >>       (pure_const_write_summary): Stream malloc_state.

>> >>       (pure_const_read_summary): Add support for reading malloc_state.

>> >>       (dump_malloc_lattice): New function.

>> >>       (propagate_malloc): New function.

>> >>       (ipa_pure_const::execute): Call propagate_malloc and

>> >>       ipa_free_fn_summary.

>> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

>> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

>> >>

>> >> testsuite/

>> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

>> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

>> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

>> >

>> > OK.

>> > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in changes.html?

>> Done in this version.

>> In warn_function_malloc(), I passed false for known_finite param to

>> suggest_attribute().

>> Does that look OK ?

>> Validation in progress. OK to commit if passes ?

>

> OK, thanks!

Thanks, committed as r254140 after the following validation:
1] Bootstrap+test with --enable-languages=all,ada,go on
x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu
3] Cross tested on arm*-*-* and aarch64*-*-*.

Would it be a good idea to extend ipa-pure-const to propagate
alloc_size/alloc_align and returns_nonnull attributes ?
Which other attributes would be useful to propagate in ipa-pure-const ?

Also, I would be grateful for suggestions on how to propagate malloc
attribute through indirect calls.
Currently, I have left it as FIXME, and simply drop the lattice to
MALLOC_BOTTOM if there's an indirect call :(

I am not able to understand how attribute propagation across indirect
calls works.
For example consider propagation of nothrow attribute in following test-case:

__attribute__((noinline, noclone, nothrow))
int f1(int k) { return k; }

__attribute__((noinline, noclone))
static int foo(int (*p)(int))
{
  return p(10);
}

__attribute__((noinline, noclone))
int bar(void)
{
  return foo(f1);
}

Shouldn't foo and bar be also marked as nothrow ?
Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

The local-pure-const2 dump shows "function is locally throwing" for
"foo" and "bar", and the ipa-pure-const dump doesn't appear to show foo and bar
marked as nothrow.

Thanks,
Prathamesh
> Honza
Jan Hubicka Oct. 27, 2017, 12:18 p.m. UTC | #28
Prathamesh
> > OK, thanks!

> Thanks, committed as r254140 after the following validation:

> 1] Bootstrap+test with --enable-languages=all,ada,go on

> x86_64-unknown-linux-gnu and ppc64le-linux-gnu.

> 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu

> 3] Cross tested on arm*-*-* and aarch64*-*-*.

> 

> Would it be a good idea to extend ipa-pure-const to propagate

> alloc_size/alloc_align and returns_nonnull attributes ?

> Which other attributes would be useful to propagate in ipa-pure-const ?


Good that I did not discourage you with the slow review rate (I will try to do
something about it).
I guess alloc_size/alloc_align are obvious candidates.
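
For example (only an illustration of the idea, not something the current patch
does), a trivial forwarding wrapper could inherit alloc_size from its callee:

extern void *my_alloc (__SIZE_TYPE__ n)
  __attribute__ ((malloc, alloc_size (1)));

void *
my_alloc_wrapper (__SIZE_TYPE__ n)
{
  /* Could be annotated with malloc and alloc_size (1) as well.  */
  return my_alloc (n);
}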

returns_nonnull is a special case of VRP over return values, so I would rather
like to see ipa-prop/ipa-cp extended to handle return values and have
returns_nonnull fall out as a consequence.

One interesting case I think we could try to track is what types of exceptions
are thrown.  Currently, if a function throws a specific exception which is
handled by the caller, we still think the caller may throw, because we do not
take exception types into consideration at all.  I think that may let us mark
a significant portion of functions nothrow, as this seems like a common coding
practice.
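
A made-up example of the pattern I have in mind (not from any testsuite):

struct my_error {};

int foo (int i)
{
  if (i < 0)
    throw my_error ();
  return i;
}

int bar (int i)
{
  /* bar can only throw via foo, and it catches the only exception type
     foo throws, so it could in principle be marked nothrow; today we
     give up because the exception type is ignored.  */
  try
    {
      return foo (i);
    }
  catch (const my_error &)
    {
      return -1;
    }
}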

It would also be nice to clean up ipa-pure-const.  I think one can implement a
propagation template where one just feeds in the basic parameters (what data
to store, what edges to consider) rather than copying the rather outdated
code again and again.
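
Roughly along these lines (completely untested sketch; lattice::get,
lattice::meet, lattice::update and lattice::edge_relevant_p are made-up names
for the hooks a pass would provide, and the loop structure just mirrors
propagate_malloc from the patch):

template <typename lattice>
static void
propagate_lattice (void)
{
  struct cgraph_node **order
    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
  int order_pos = ipa_reverse_postorder (order);
  bool changed = true;

  while (changed)
    {
      changed = false;
      /* Walk in postorder.  */
      for (int i = order_pos - 1; i >= 0; --i)
	{
	  cgraph_node *node = order[i];
	  if (node->alias || !node->definition)
	    continue;

	  lattice cur = lattice::get (node);
	  for (cgraph_edge *cs = node->callees; cs; cs = cs->next_callee)
	    if (lattice::edge_relevant_p (cs))
	      cur.meet (lattice::get (cs->callee));

	  /* update stores CUR and returns true if the state changed.  */
	  if (lattice::update (node, cur))
	    changed = true;
	}
    }

  ipa_free_postorder_info ();
  free (order);
}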
> 

> Also, I would be grateful for suggestions on how to propagate malloc

> attribute through indirect calls.

> Currently, I have left it as FIXME, and simply drop the lattice to

> MALLOC_BOTTOM if there's an indirect call :(

> 

> I am not able to understand how attribute propagation across indirect

> calls works.

> For example consider propagation of nothrow attribute in following test-case:

> 

> __attribute__((noinline, noclone, nothrow))

> int f1(int k) { return k; }

> 

> __attribute__((noinline, noclone))

> static int foo(int (*p)(int))

> {

>   return p(10);

> }

> 

> __attribute__((noinline, noclone))

> int bar(void)

> {

>   return foo(f1);

> }

> 

> Shouldn't foo and bar be also marked as nothrow ?

> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?


Well, foo may have other uses where it ends up calling something other
than f1, so one needs to track "nothrow in the context where f1 is called".

In general I think this reduces to "may" edges in the callgraph (for a
given indirect call we want to track whether we know the complete list of
possible targets).  We do that for polymorphic calls via the
ipa-polymorphic-call analysis, but I did not get around to implementing
anything for general indirect calls.

To get the list of targets, you call possible_polymorphic_call_targets,
which also tells you whether the list is complete (the final flag) or
whether there may be some callees invisible to the compiler (such as
derived types from another compilation unit).

The lists may be large, and for that reason there is a cache token which
tells you whether you are seeing the same list again.

Extending ipa-pure-const to work across final lists of polymorphic
targets may be a first step towards handling indirect calls in general.
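
For malloc propagation in particular, a first rough sketch of that step
could look like the following.  This is illustrative only: the overload of
possible_polymorphic_call_targets taking an edge plus completeness and
cache-token pointers should be double-checked against ipa-utils.h, and the
lattice helpers are the ones from the patch.

/* Illustrative sketch: compute the malloc lattice value contributed by a
   polymorphic call edge E instead of unconditionally dropping to bottom.  */

static malloc_state_e
malloc_state_for_polymorphic_edge (cgraph_edge *e)
{
  bool final;
  void *cache_token;
  vec<cgraph_node *> targets
    = possible_polymorphic_call_targets (e, &final, &cache_token);

  /* If the list of possible targets may be incomplete, give up.
     (CACHE_TOKEN could be used to reuse results for identical lists.)  */
  if (!final)
    return PURE_CONST_MALLOC_BOTTOM;

  malloc_state_e state = PURE_CONST_MALLOC_TOP;
  for (unsigned i = 0; i < targets.length (); i++)
    {
      cgraph_node *target = targets[i];
      if (!has_function_state (target))
	return PURE_CONST_MALLOC_BOTTOM;
      malloc_state_e target_state = get_function_state (target)->malloc_state;
      if (state < target_state)
	state = target_state;
    }
  return state;
}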

Honza
> 

> The local-pure-const2 dump shows function is locally throwing  for

> "foo" and "bar", and ipa-pure-const dump doesn't appear to show foo and bar

> marked as nothrow.

> 

> Thanks,

> Prathamesh

> > Honza
Jan Hubicka Oct. 27, 2017, 12:27 p.m. UTC | #29
> On 25 October 2017 at 20:44, Jan Hubicka <hubicka@ucw.cz> wrote:

> >> On 24 October 2017 at 16:26, Jan Hubicka <hubicka@ucw.cz> wrote:

> >> >> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> >> >>

> >> >>       * cgraph.h (set_malloc_flag): Declare.

> >> >>       * cgraph.c (set_malloc_flag_1): New function.

> >> >>       (set_malloc_flag): Likewise.

> >> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> >> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> >> >>       false.

> >> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

> >> >>       (write_ipa_call_summary): Stream is_return_callee.

> >> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> >> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> >> >>       ipa-prop.h, ipa-fnsummary.h.

> >> >>       (pure_const_names): Change to static.

> >> >>       (malloc_state_e): Define.

> >> >>       (malloc_state_names): Define.

> >> >>       (funct_state_d): Add field malloc_state.

> >> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> >> >>       (check_retval_uses): New function.

> >> >>       (malloc_candidate_p): Likewise.

> >> >>       (analyze_function): Add support for malloc attribute.

> >> >>       (pure_const_write_summary): Stream malloc_state.

> >> >>       (pure_const_read_summary): Add support for reading malloc_state.

> >> >>       (dump_malloc_lattice): New function.

> >> >>       (propagate_malloc): New function.

> >> >>       (ipa_pure_const::execute): Call propagate_malloc and

> >> >>       ipa_free_fn_summary.

> >> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

> >> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> >> >>

> >> >> testsuite/

> >> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

> >> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

> >> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

> >> >

> >> > OK.

> >> > Perhaps we could also add -Wsuggest-attribute=malloc and mention it in changes.html?

> >> Done in this version.

> >> In warn_function_malloc(), I passed false for known_finite param to

> >> suggest_attribute().

> >> Does that look OK ?

> >> Validation in progress. OK to commit if passes ?

> >

> > OK, thanks!

> Thanks, committed as r254140 after following validation:

> 1] Bootstrap+test with --enable-languages=all,ada,go on

> x86_64-unknown-linux-gnu and ppc64le-linux-gnu.

> 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu

> 3] Cross tested on arm*-*-* and aarch64*-*-*.

> 

> Would it be a good idea to extend ipa-pure-const to propagate

> alloc_size/alloc_align and returns_nonnull attributes ?

> Which other attributes would be useful to propagate in ipa-pure-const ?


Also, one extension I was considering is a TBAA mod-ref pass, i.e.
propagate which types are read/stored by the call, rather than having
just pure/const (no stores, no reads at all).

This will be a bit of fun to implement in IPA, but it may be useful.  If
you would be interested in looking into this, we can discuss it (I wanted
to implement it this stage1, but I think I have way too many other plans).
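
Purely as an illustration of the shape such a summary might take (nothing
like this exists in the tree; the struct and field names are invented):

/* Hypothetical per-function TBAA mod-ref summary.  */
struct modref_summary_d
{
  /* True when the set of stored/read types collapsed to "anything".  */
  bool stores_everything;
  bool reads_everything;
  /* Alias sets the function (and its callees) may store to / read from.  */
  vec<alias_set_type> stored_alias_sets;
  vec<alias_set_type> read_alias_sets;
};

Propagation would then union these sets over callees, much like the
pure/const and malloc lattices are propagated today.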

LLVM also has nocapture, which seems useful for PTA.  Richi may have
useful opinions on that ;)

Honza
Richard Biener Oct. 27, 2017, 12:44 p.m. UTC | #30
On Fri, 27 Oct 2017, Jan Hubicka wrote:

> > On 25 October 2017 at 20:44, Jan Hubicka <hubicka@ucw.cz> wrote:

> > >> On 24 October 2017 at 16:26, Jan Hubicka <hubicka@ucw.cz> wrote:

> > >> >> 2017-10-13  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> > >> >>

> > >> >>       * cgraph.h (set_malloc_flag): Declare.

> > >> >>       * cgraph.c (set_malloc_flag_1): New function.

> > >> >>       (set_malloc_flag): Likewise.

> > >> >>       * ipa-fnsummary.h (ipa_call_summary): Add new field is_return_callee.

> > >> >>       * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to

> > >> >>       false.

> > >> >>       (read_ipa_call_summary): Add support for reading is_return_callee.

> > >> >>       (write_ipa_call_summary): Stream is_return_callee.

> > >> >>       * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.

> > >> >>       * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, symbol-summary.h,

> > >> >>       ipa-prop.h, ipa-fnsummary.h.

> > >> >>       (pure_const_names): Change to static.

> > >> >>       (malloc_state_e): Define.

> > >> >>       (malloc_state_names): Define.

> > >> >>       (funct_state_d): Add field malloc_state.

> > >> >>       (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.

> > >> >>       (check_retval_uses): New function.

> > >> >>       (malloc_candidate_p): Likewise.

> > >> >>       (analyze_function): Add support for malloc attribute.

> > >> >>       (pure_const_write_summary): Stream malloc_state.

> > >> >>       (pure_const_read_summary): Add support for reading malloc_state.

> > >> >>       (dump_malloc_lattice): New function.

> > >> >>       (propagate_malloc): New function.

> > >> >>       (ipa_pure_const::execute): Call propagate_malloc and

> > >> >>       ipa_free_fn_summary.

> > >> >>       (pass_local_pure_const::execute): Add support for malloc attribute.

> > >> >>       * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.

> > >> >>

> > >> >> testsuite/

> > >> >>       * gcc.dg/ipa/propmalloc-1.c: New test-case.

> > >> >>       * gcc.dg/ipa/propmalloc-2.c: Likewise.

> > >> >>       * gcc.dg/ipa/propmalloc-3.c: Likewise.

> > >> >

> > >> > OK.

> > >> > Perhaps we could also add -Wsuggest-attribute=malloc and mention it in changes.html?

> > >> Done in this version.

> > >> In warn_function_malloc(), I passed false for known_finite param to

> > >> suggest_attribute().

> > >> Does that look OK ?

> > >> Validation in progress. OK to commit if passes ?

> > >

> > > OK, thanks!

> > Thanks, committed as r254140 after following validation:

> > 1] Bootstrap+test with --enable-languages=all,ada,go on

> > x86_64-unknown-linux-gnu and ppc64le-linux-gnu.

> > 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu

> > 3] Cross tested on arm*-*-* and aarch64*-*-*.

> > 

> > Would it be a good idea to extend ipa-pure-const to propagate

> > alloc_size/alloc_align and returns_nonnull attributes ?

> > Which other attributes would be useful to propagate in ipa-pure-const ?

> 

> Also one extension I was considering was TBAA mod-ref pass. I.e. propagate what

> types are read/stored by the call, rather than having just pure/const (no stores,

> no reads at all).

> 

> This will be bit fun to implement in IPA, but it may be useful.  If you would

> be interested in looking into this, we can discuss it (I wanted to implement it

> this stage1 but I think I have way too many other plans).

> 

> LLVM also has nocapture that seems useful for PTA. Richi may have useful

> opinions on that ;)


I once tried to prototype "fn spec" attribute autodetection and IPA
propagation (also into IPA pure const).  I didn't get very far, though.
That attribute tracks per-argument properties, like whether memory is
written to through a pointer argument or whether a pointer argument
possibly escapes from the function (including through its return value).

Richard.
diff mbox

Patch

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index e505b10e211..eab8216100e 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2492,6 +2492,61 @@  cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool non_call,
+		    bool *changed)
+{
+  cgraph_edge *e;
+
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      /* With non-call exceptions we can't say for sure if other function
+ 	 body was not possibly optimized to still throw.  */
+      if (!non_call || node->binds_to_current_def_p ())
+	{
+	  DECL_IS_MALLOC (node->decl) = true;
+	  *changed = true;
+	}
+    }
+  else if (!malloc_p && DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = false;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, non_call, changed);
+    }
+}
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+  bool non_call = opt_for_fn (decl, flag_non_call_exceptions);
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, non_call, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, non_call, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index be4eaee71e2..6739bb3094d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1116,6 +1116,10 @@  public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index e457166ea39..724c26e03f6 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +70,15 @@  enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  PURE_CONST_MALLOC_TOP,
+  PURE_CONST_MALLOC,
+  PURE_CONST_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +102,13 @@  struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, PURE_CONST_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -109,6 +121,10 @@  typedef struct funct_state_d * funct_state;
 
 static vec<funct_state> funct_state_vec;
 
+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
+
 static bool gate_pure_const (void);
 
 namespace {
@@ -812,6 +828,124 @@  check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	enum tree_code code = gimple_cond_code (cond);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and SSA_NAME_DEF_STMT(each element of phi) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */
+
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+	  if (ret_stmt)
+	    {
+	      tree retval = gimple_return_retval (ret_stmt);
+	      if (!retval)
+		return false;
+
+	      if (TREE_CODE (retval) != SSA_NAME
+		  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+		return false;
+
+	      if (!check_retval_uses (retval, ret_stmt))
+		return false;
+
+	      gimple *def = SSA_NAME_DEF_STMT (retval);
+	      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+		{
+		  tree callee_decl = gimple_call_fndecl (call_stmt);
+		  /* FIXME: In case of indirect call, callee_decl might be NULL
+ 		     Not sure what to do in that case, punting for now.  */
+		  if (!callee_decl)
+		    return false;
+		  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+		  callees.safe_push (callee);
+		}
+	      else if (gphi *phi = dyn_cast<gphi *> (def))
+		for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		  {
+		    tree arg = gimple_phi_arg_def (phi, i);
+		    if (TREE_CODE (arg) != SSA_NAME)
+		      return false;
+		    if (!check_retval_uses (arg, phi))
+		      return false;
+
+		    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+		    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+		    if (!call_stmt)
+		      return false;
+		    tree callee_decl = gimple_call_fndecl (call_stmt);
+		    if (!callee_decl)
+		      return false;
+		    cgraph_node *callee = cgraph_node::get_create (callee_decl);
+		    callees.safe_push (callee);
+		  }
+	      else
+		return false;
+	    }
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+}
+
+
+static void
+free_return_callees_map (void)
+{
+  for (hash_map<cgraph_node *, vec<cgraph_node *>*>::iterator it = return_callees_map->begin ();
+       it != return_callees_map->end ();
+       ++it)
+    {
+      std::pair<cgraph_node *, vec<cgraph_node *>*> p = *it;
+      p.second->release ();
+      delete p.second;
+    }
+  delete return_callees_map;
+}
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1055,23 @@  end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  if (ipa)
+    {
+      vec<cgraph_node *> v = vNULL;
+      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+      if (DECL_IS_MALLOC (decl))
+	l->malloc_state = PURE_CONST_MALLOC;
+      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
+	{
+	  l->malloc_state = PURE_CONST_MALLOC_TOP;
+	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+	  for (unsigned i = 0; i < v.length (); ++i)
+	    callees_p->safe_push (v[i]);
+	  return_callees_map->put (fn, callees_p);
+	}
+      v.release ();
+    }
+
   pop_cfun ();
   if (dump_file)
     {
@@ -1003,6 +1154,7 @@  pure_const_generate_summary (void)
 
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *>*>;
 
   /* Process all of the functions.
 
@@ -1067,11 +1219,30 @@  pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (callees_pp)
+	    {
+	      vec<cgraph_node *> callees = **callees_pp;
+	      streamer_write_uhwi_stream (ob->main_stream, callees.length ());
+	      for (unsigned i = 0; i < callees.length (); ++i)
+		{
+		  int callee_ref = lto_symtab_encoder_encode (encoder, callees[i]);
+		  streamer_write_uhwi_stream (ob->main_stream, callee_ref);
+		}
+	    }
+	  else
+	    streamer_write_uhwi_stream (ob->main_stream, 0);
 	}
     }
 
   lto_destroy_simple_output_block (ob);
+
+  // FIXME: Do not free if -ffat-lto-objects is enabled.
+  if (!global_options.x_flag_fat_lto_objects)
+    free_return_callees_map ();
 }
 
 
@@ -1087,6 +1258,8 @@  pure_const_read_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *> *>;
+
   while ((file_data = file_data_vec[j++]))
     {
       const char *data;
@@ -1127,6 +1300,22 @@  pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
+	      unsigned len = streamer_read_uhwi (ib);
+	      if (len)
+		{
+		  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+		  for (unsigned i = 0; i < len; i++)
+		    {
+		      int callee_index = streamer_read_uhwi (ib);
+		      cgraph_node *callee = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, callee_index));
+		      callees_p->safe_push (callee);
+		    }
+		  return_callees_map->put (node, callees_p);
+		}
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1151,6 +1340,8 @@  pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1664,6 +1855,121 @@  propagate_nothrow (void)
   free (order);
 }
 
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = PURE_CONST_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == PURE_CONST_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (!callees_pp)
+	    continue;
+
+	  vec<cgraph_node *> callees = **callees_pp;
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = PURE_CONST_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == PURE_CONST_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+  free_return_callees_map ();
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1682,6 +1988,7 @@  execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@  struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */