[RFC] propagate malloc attribute in ipa-pure-const pass

Message ID CAAgBjMkaTCAvwO6AVKu5C6_-TqpuQtTcs0EqYHqogDhru3juYg@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni May 15, 2017, 10:39 a.m.
Hi,
I have attached a bare-bones prototype patch that propagates malloc attribute in
ipa-pure-const. As far as I understand, from the doc a function could
be annotated
with malloc attribute if it returns a pointer to a newly allocated
memory block (uninitialized or zeroed) and the pointer does not alias
any other valid pointer ?

* Analysis
The analysis used by the patch in malloc_candidate_p is too conservative,
and I would be grateful for suggestions on improving it.
Currently it only checks if:
1) The function returns a value of pointer type.
2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and
    SSA_NAME_DEF_STMT(each element of phi) is function call.
3) The return-value has immediate uses only within comparisons (gcond
or gassign) and return_stmt (and likewise a phi arg has immediate use only
within comparison or the phi stmt).

The intent is that malloc_candidate_p(node) returns true if node
returns the return value of it's callee, and the return value is only
used for comparisons within node.
Then I assume it's safe (although conservative) to decide that node
could possibly be a malloc function if callee is found to be malloc ?

for eg:
void *f(size_t n)
{
  void *p = __builtin_malloc (n);
  if (p == 0)
    __builtin_abort ();
  return p;
}

malloc_candidate_p would determine f to be a candidate for malloc
attribute since it returns the return-value of it's callee
(__builtin_malloc) and the return value is used only within comparison
and return_stmt.

However due to the imprecise analysis it misses this case:
char  **g(size_t n)
{
  char **p = malloc (sizeof(char *) * 10);
  for (int i = 0; i < 10; i++)
    p[i] = malloc (sizeof(char) * n);
  return p;
}
I suppose g() could be annotated with malloc here ?

The patch uses return_calles_map which is a hash_map<node, callees> such
that the return value of node is return value of one of these callees.
For above eg it would be <f, [__builtin_malloc]>
The analysis phase populates return_callees_map, and the propagation
phase uses it to take the "meet" of callees.

* LTO and memory management
This is a general question about LTO and memory management.
IIUC the following sequence takes place during normal LTO:
LGEN: generate_summary, write_summary
WPA: read_summary, execute ipa passes, write_opt_summary

So I assumed it was OK in LGEN to allocate return_callees_map in
generate_summary and free it in write_summary and during WPA, allocate
return_callees_map in read_summary and free it after execute (since
write_opt_summary does not require return_callees_map).

However with fat LTO, it seems the sequence changes for LGEN with
execute phase takes place after write_summary. However since
return_callees_map is freed in pure_const_write_summary and
propagate_malloc() accesses it in execute stage, it results in
segmentation fault.

To work around this, I am using the following hack in pure_const_write_summary:
// FIXME: Do not free if -ffat-lto-objects is enabled.
if (!global_options.x_flag_fat_lto_objects)
  free_return_callees_map ();
Is there a better approach for handling this ?

Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly.
I tried to imitate cgraph_node::set_const_flag.

The patch passes bootstrap+test on x86_64 and found a few functions in
the source tree (attached func_names.txt) that could be annotated with
malloc (I gave a brief look at some of the functions and didn't appear
to be false positives but I will recheck thoroughly)

Does the patch look in the right direction ?
I would be grateful for suggestions on improving it.

Thanks,
Prathamesh
virtual char* libcp1::compiler::find(std::__cxx11::string&) const
gomp_malloc
virtual char* libcc1::compiler::find(std::__cxx11::string&) const
void* GTM::xrealloc(void*, size_t, bool)
concat
char* gen_internal_sym(const char*)
void* ira_allocate(size_t)
char* gen_fake_label()
char* selftest::locate_file(const char*)
const char* find_plugindir_spec_function(int, const char**)
reconcat
xvasprintf
char* rtx_reader::read_until(const char*, bool)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = long unsigned int*]
xstrndup
const char* replace_extension_spec_func(int, const char**)
void* GTM::xcalloc(size_t, bool)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = unsigned int*]
xstrdup
void* GTM::xmalloc(size_t, bool)
void* base_pool_allocator<TBlockAllocator>::allocate() [with TBlockAllocator = memory_block_pool]
char* ix86_offload_options()
basic_block_def* alloc_block()
xmemdup
char* build_message_string(const char*, ...)
make_relative_prefix
gomp_malloc_cleared
make_relative_prefix_ignore_links
choose_temp_base
make_temp_file
xasprintf
char* file_name_as_prefix(diagnostic_context*, const char*)
void* yyalloc(yy_size_t)
void* ggc_internal_cleared_alloc(size_t, void (*)(void*), size_t, size_t)
void* __cilkrts_hyperobject_alloc(void*, std::size_t)
void* alloc_for_identifier_to_locale(size_t)
const char* op_error_string(const char*, int, bool)
host_alloc
void* stringpool_ggc_alloc(size_t)
fibheap_new
deps* deps_init()
C_alloca
char* lto_get_section_name(int, const char*, lto_file_decl_data*)
void* lto_zalloc(void*, unsigned int, unsigned int)
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = std::pair<__gnu_cxx::bitmap_allocator<wchar_t>::_Alloc_block*, __gnu_cxx::bitmap_allocator<wchar_t>::_Alloc_block*>]
void* ggc_internal_alloc(size_t, void (*)(void*), size_t, size_t)
void* ggc_splay_alloc(int, void*)
zcalloc
dupargv
_Tp* __gnu_cxx::__detail::__mini_vector<_Tp>::allocate(__gnu_cxx::__detail::__mini_vector<_Tp>::size_type) [with _Tp = std::pair<__gnu_cxx::bitmap_allocator<char>::_Alloc_block*, __gnu_cxx::bitmap_allocator<char>::_Alloc_block*>]
lrealpath
void* mempool_obstack_chunk_alloc(size_t)
char* describe_cache(cache_desc, cache_desc)
xcalloc
int* ira_allocate_cost_vector(reg_class_t)
buildargv
char* diagnostic_get_location_text(diagnostic_context*, expanded_location)
splay_tree_xmalloc_allocate
lto_in_decl_state* lto_new_in_decl_state()
char* affix_data_type(const char*)
ggc_pch_data* init_ggc_pch()
xmalloc
char* diagnostic_build_prefix(diagnostic_context*, const diagnostic_info*)

Comments

Jeff Law May 16, 2017, 1:12 a.m. | #1
On 05/15/2017 04:39 AM, Prathamesh Kulkarni wrote:
> Hi,

> I have attached a bare-bones prototype patch that propagates malloc attribute in

> ipa-pure-const. As far as I understand, from the doc a function could

> be annotated

> with malloc attribute if it returns a pointer to a newly allocated

> memory block (uninitialized or zeroed) and the pointer does not alias

> any other valid pointer ?

> 

> * Analysis

> The analysis used by the patch in malloc_candidate_p is too conservative,

> and I would be grateful for suggestions on improving it.

> Currently it only checks if:

> 1) The function returns a value of pointer type.

> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and

>      SSA_NAME_DEF_STMT(each element of phi) is function call.

> 3) The return-value has immediate uses only within comparisons (gcond

> or gassign) and return_stmt (and likewise a phi arg has immediate use only

> within comparison or the phi stmt).

ISTM the return value *must* be either the result of a call to a 
function with the malloc attribute or NULL.  It can't be a mix of 
malloc'd objects or something else (such as a passed-in object).

> 

> malloc_candidate_p would determine f to be a candidate for malloc

> attribute since it returns the return-value of it's callee

> (__builtin_malloc) and the return value is used only within comparison

> and return_stmt.

> 

> However due to the imprecise analysis it misses this case:

> char  **g(size_t n)

> {

>    char **p = malloc (sizeof(char *) * 10);

>    for (int i = 0; i < 10; i++)

>      p[i] = malloc (sizeof(char) * n);

>    return p;

> }

> I suppose g() could be annotated with malloc here ?

I think that's up to us to decide.  So the question becomes what makes 
the most sense from a user and optimization standpoint.


I would start relatively conservatively and look to add further analysis 
to detect more malloc functions over time.  You've already identified 
comparisons as valid uses, but ISTM the pointer could be passed as an 
argument, stored into memory and all kinds of other things.

You'll probably want instrumentation to log cases where something looked 
like a potential candidate, but had to be rejected for some reason.

Jeff
Richard Biener May 16, 2017, 11:41 a.m. | #2
On Mon, 15 May 2017, Prathamesh Kulkarni wrote:

> Hi,

> I have attached a bare-bones prototype patch that propagates malloc attribute in

> ipa-pure-const. As far as I understand, from the doc a function could

> be annotated

> with malloc attribute if it returns a pointer to a newly allocated

> memory block (uninitialized or zeroed) and the pointer does not alias

> any other valid pointer ?

> 

> * Analysis

> The analysis used by the patch in malloc_candidate_p is too conservative,

> and I would be grateful for suggestions on improving it.

> Currently it only checks if:

> 1) The function returns a value of pointer type.

> 2) SSA_NAME_DEF_STMT (return_value) is either a function call or a phi, and

>     SSA_NAME_DEF_STMT(each element of phi) is function call.

> 3) The return-value has immediate uses only within comparisons (gcond

> or gassign) and return_stmt (and likewise a phi arg has immediate use only

> within comparison or the phi stmt).

>

> The intent is that malloc_candidate_p(node) returns true if node

> returns the return value of it's callee, and the return value is only

> used for comparisons within node.

> Then I assume it's safe (although conservative) to decide that node

> could possibly be a malloc function if callee is found to be malloc ?

> 

> for eg:

> void *f(size_t n)

> {

>   void *p = __builtin_malloc (n);

>   if (p == 0)

>     __builtin_abort ();

>   return p;

> }

> 

> malloc_candidate_p would determine f to be a candidate for malloc

> attribute since it returns the return-value of it's callee

> (__builtin_malloc) and the return value is used only within comparison

> and return_stmt.

> 

> However due to the imprecise analysis it misses this case:

> char  **g(size_t n)

> {

>   char **p = malloc (sizeof(char *) * 10);

>   for (int i = 0; i < 10; i++)

>     p[i] = malloc (sizeof(char) * n);

>   return p;

> }

> I suppose g() could be annotated with malloc here ?


It cannot because what p points to is initialized.  If you do then

 char **x = g(10);
 **x = 1;

will be optimized away.  Now what would be interesting is to do
"poor mans IPA PTA", namely identify functions that are a) small,
b) likely return a newly allocated pointer.  At PTA time then
"inline" all such wrappers, but only for the PTA constraints.

That will give more precise points-to results (and can handle more
cases, esp. initialization) than just annotating functions with 'malloc'.

+      /* With non-call exceptions we can't say for sure if other function
+        body was not possibly optimized to still throw.  */
+      if (!non_call || node->binds_to_current_def_p ())
+       {
+         DECL_IS_MALLOC (node->decl) = true;
+         *changed = true;

I don't see how malloc throwing would be an issue.

+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+       enum tree_code code = gimple_cond_code (cond);
+       if (TREE_CODE_CLASS (code) != tcc_comparison)

always false

+         RETURN_FROM_IMM_USE_STMT (use_iter, false);

I think you want to disallow eq/ne_expr against sth not integer_zerop.

+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+       enum tree_code code = gimple_assign_rhs_code (ga);
+       if (TREE_CODE_CLASS (code) != tcc_comparison)
+         RETURN_FROM_IMM_USE_STMT (use_iter, false);

likewise.

+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+          !gsi_end_p (gsi);
+          gsi_next (&gsi))
+       {
+         greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+         if (ret_stmt)

I think you want do do this much cheaper by only walking the last
stmt of predecessor(s) of EXIT_BLOCK_FOR_FN.  (s) because
we have multiple return stmts for

int foo (int i)
{
  if (i)
    return;
  else
    return i;
}

but all return stmts will be the last stmt in one of the exit block
predecessors.

+             if (!check_retval_uses (retval, ret_stmt))
+               return false;
+
+             gimple *def = SSA_NAME_DEF_STMT (retval);
+             if (gcall *call_stmt = dyn_cast<gcall *> (def))
+               {
+                 tree callee_decl = gimple_call_fndecl (call_stmt);
+                 /* FIXME: In case of indirect call, callee_decl might be 
NULL
+                    Not sure what to do in that case, punting for now.  
*/
+                 if (!callee_decl)
+                   return false;
+                 cgraph_node *callee = cgraph_node::get_create 
(callee_decl);
+                 callees.safe_push (callee);
+               }
+             else if (gphi *phi = dyn_cast<gphi *> (def))
+               for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+                 {
+                   tree arg = gimple_phi_arg_def (phi, i);
+                   if (TREE_CODE (arg) != SSA_NAME)
+                     return false;
+                   if (!check_retval_uses (arg, phi))
+                     return false;
+

I think you should refactor this to a separate function and handle
recursion.  For recursion you miss simple SSA name assigns.

For PHI args you want to allow literal zero as well (for
-fdelete-null-pointer-checks).

> The patch uses return_calles_map which is a hash_map<node, callees> such

> that the return value of node is return value of one of these callees.

> For above eg it would be <f, [__builtin_malloc]>

> The analysis phase populates return_callees_map, and the propagation

> phase uses it to take the "meet" of callees.


I think you are supposed to treat functions optimistically as "malloc"
(use the DECL_IS_MALLOC flag) and during propagation revisit that
change by propagating across callgraph edges.  callgraph edges that are
relevant (calls feeding a pointer return value) should then be
marked somehow (set a flag you stream).

This also means that indirect calls should be only handled during
propagation.

Not sure if we have an example of a "edge encoder".  Honza?

Bonus points for auto-detecting alloc_size and alloc_align attributes
and warning with -Wsuggest-attribute=malloc{_size,_align,}.

> * LTO and memory management

> This is a general question about LTO and memory management.

> IIUC the following sequence takes place during normal LTO:

> LGEN: generate_summary, write_summary

> WPA: read_summary, execute ipa passes, write_opt_summary

> 

> So I assumed it was OK in LGEN to allocate return_callees_map in

> generate_summary and free it in write_summary and during WPA, allocate

> return_callees_map in read_summary and free it after execute (since

> write_opt_summary does not require return_callees_map).

> 

> However with fat LTO, it seems the sequence changes for LGEN with

> execute phase takes place after write_summary. However since

> return_callees_map is freed in pure_const_write_summary and

> propagate_malloc() accesses it in execute stage, it results in

> segmentation fault.

> 

> To work around this, I am using the following hack in pure_const_write_summary:

> // FIXME: Do not free if -ffat-lto-objects is enabled.

> if (!global_options.x_flag_fat_lto_objects)

>   free_return_callees_map ();

> Is there a better approach for handling this ?

> 

> Also I am not sure if I have written cgraph_node::set_malloc_flag[_1] correctly.

> I tried to imitate cgraph_node::set_const_flag.

> 

> The patch passes bootstrap+test on x86_64 and found a few functions in

> the source tree (attached func_names.txt) that could be annotated with

> malloc (I gave a brief look at some of the functions and didn't appear

> to be false positives but I will recheck thoroughly)

> 

> Does the patch look in the right direction ?

> I would be grateful for suggestions on improving it.

> 

> Thanks,

> Prathamesh

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Martin Sebor May 17, 2017, 9:01 p.m. | #3
> The patch passes bootstrap+test on x86_64 and found a few functions in

> the source tree (attached func_names.txt) that could be annotated with

> malloc (I gave a brief look at some of the functions and didn't appear

> to be false positives but I will recheck thoroughly)


virtual char* libcp1::compiler::find(std::__cxx11::string&) const

The virtual on the list of your candidates gave me pause.  Consider
this completely contrived example:

   struct B {
     virtual void* f (unsigned n) {
       return new char [n];
     }
   };

   void* foo (B &b, unsigned n)
   {
     return b.f (n);
   }

Based on these definitions alone both functions are candidates
for attribute malloc.

But suppose foo is called with an object of a type derived from
B that overrides f() to do something wacky (but strictly not
invalid) like:

   struct D: B {
     char buf[32];
     virtual void* f (unsigned n) {
       if (n < 32)
       return n <= 32 ? buf : B::f (n);
     }

Breaking foo's attribute malloc constraint.

In other words, I think virtual functions need to be excluded
from the list (unless they're defined in a class marked final,
or unless we know they're not overridden to break the constraint
like above).

Martin
Richard Biener May 18, 2017, 7:06 a.m. | #4
On Wed, 17 May 2017, Martin Sebor wrote:

> > The patch passes bootstrap+test on x86_64 and found a few functions in

> > the source tree (attached func_names.txt) that could be annotated with

> > malloc (I gave a brief look at some of the functions and didn't appear

> > to be false positives but I will recheck thoroughly)

> 

> virtual char* libcp1::compiler::find(std::__cxx11::string&) const

> 

> The virtual on the list of your candidates gave me pause.  Consider

> this completely contrived example:

> 

>   struct B {

>     virtual void* f (unsigned n) {

>       return new char [n];

>     }

>   };

> 

>   void* foo (B &b, unsigned n)

>   {

>     return b.f (n);

>   }

> 

> Based on these definitions alone both functions are candidates

> for attribute malloc.

> 

> But suppose foo is called with an object of a type derived from

> B that overrides f() to do something wacky (but strictly not

> invalid) like:

> 

>   struct D: B {

>     char buf[32];

>     virtual void* f (unsigned n) {

>       if (n < 32)

>       return n <= 32 ? buf : B::f (n);

>     }

> 

> Breaking foo's attribute malloc constraint.

> 

> In other words, I think virtual functions need to be excluded

> from the list (unless they're defined in a class marked final,

> or unless we know they're not overridden to break the constraint

> like above).


But we are annotating the actual decl, not the type in the class
struct.

Richard.
Jan Hubicka May 19, 2017, 1:06 p.m. | #5
> >   struct D: B {

> >     char buf[32];

> >     virtual void* f (unsigned n) {

> >       if (n < 32)

> >       return n <= 32 ? buf : B::f (n);

> >     }

> > 

> > Breaking foo's attribute malloc constraint.

> > 

> > In other words, I think virtual functions need to be excluded

> > from the list (unless they're defined in a class marked final,

> > or unless we know they're not overridden to break the constraint

> > like above).

> 

> But we are annotating the actual decl, not the type in the class

> struct.


Yep and we will be able to use the information in case we devirtualize
the call.  So this seems OK to me.

Honza
> 

> Richard.
Jan Hubicka May 19, 2017, 1:32 p.m. | #6
> 

> * LTO and memory management

> This is a general question about LTO and memory management.

> IIUC the following sequence takes place during normal LTO:

> LGEN: generate_summary, write_summary

> WPA: read_summary, execute ipa passes, write_opt_summary

> 

> So I assumed it was OK in LGEN to allocate return_callees_map in

> generate_summary and free it in write_summary and during WPA, allocate

> return_callees_map in read_summary and free it after execute (since

> write_opt_summary does not require return_callees_map).

> 

> However with fat LTO, it seems the sequence changes for LGEN with

> execute phase takes place after write_summary. However since

> return_callees_map is freed in pure_const_write_summary and

> propagate_malloc() accesses it in execute stage, it results in

> segmentation fault.

> 

> To work around this, I am using the following hack in pure_const_write_summary:

> // FIXME: Do not free if -ffat-lto-objects is enabled.

> if (!global_options.x_flag_fat_lto_objects)

>   free_return_callees_map ();

> Is there a better approach for handling this ?


I think most passes just do not free summaries with -flto.  We probably want
to fix it to make it possible to compile multiple units i.e. from plugin by
adding release_summaries method...
So I would say it is OK to do the same as others do and leak it with -flto.
> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

> index e457166ea39..724c26e03f6 100644

> --- a/gcc/ipa-pure-const.c

> +++ b/gcc/ipa-pure-const.c

> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>  #include "tree-scalar-evolution.h"

>  #include "intl.h"

>  #include "opts.h"

> +#include "ssa.h"

>  

>  /* Lattice values for const and pure functions.  Everything starts out

>     being const, then may drop to pure and then neither depending on

> @@ -69,6 +70,15 @@ enum pure_const_state_e

>  

>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>  

> +enum malloc_state_e

> +{

> +  PURE_CONST_MALLOC_TOP,

> +  PURE_CONST_MALLOC,

> +  PURE_CONST_MALLOC_BOTTOM

> +};


It took me a while to work out what PURE_CONST means here :)
I would just call it something like STATE_MALLOC_TOP... or so.
ipa_pure_const is outdated name from the time pass was doing only
those two.
> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>  

>  static vec<funct_state> funct_state_vec;

>  

> +/* A map from node to subset of callees. The subset contains those callees

> + * whose return-value is returned by the node. */

> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

> +


Hehe, a special case of return jump function.  We ought to support those more generally.
How do you keep it up to date over callgraph changes?
> @@ -921,6 +1055,23 @@ end:

>    if (TREE_NOTHROW (decl))

>      l->can_throw = false;

>  

> +  if (ipa)

> +    {

> +      vec<cgraph_node *> v = vNULL;

> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

> +      if (DECL_IS_MALLOC (decl))

> +	l->malloc_state = PURE_CONST_MALLOC;

> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

> +	{

> +	  l->malloc_state = PURE_CONST_MALLOC_TOP;

> +	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

> +	  for (unsigned i = 0; i < v.length (); ++i)

> +	    callees_p->safe_push (v[i]);

> +	  return_callees_map->put (fn, callees_p);

> +	}

> +      v.release ();

> +    }

> +


I would do non-ipa variant, too.  I think most attributes can be detected that way
as well.

The patch generally makes sense to me.  It would be nice to make it easier to write such
a basic propagators across callgraph (perhaps adding a template doing the basic
propagation logic). Also I think you need to solve the problem with keeping your
summaries up to date across callgraph node removal and duplications.

Honza
Prathamesh Kulkarni May 23, 2017, 1:40 p.m. | #7
On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:
>>

>> * LTO and memory management

>> This is a general question about LTO and memory management.

>> IIUC the following sequence takes place during normal LTO:

>> LGEN: generate_summary, write_summary

>> WPA: read_summary, execute ipa passes, write_opt_summary

>>

>> So I assumed it was OK in LGEN to allocate return_callees_map in

>> generate_summary and free it in write_summary and during WPA, allocate

>> return_callees_map in read_summary and free it after execute (since

>> write_opt_summary does not require return_callees_map).

>>

>> However with fat LTO, it seems the sequence changes for LGEN with

>> execute phase takes place after write_summary. However since

>> return_callees_map is freed in pure_const_write_summary and

>> propagate_malloc() accesses it in execute stage, it results in

>> segmentation fault.

>>

>> To work around this, I am using the following hack in pure_const_write_summary:

>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>> if (!global_options.x_flag_fat_lto_objects)

>>   free_return_callees_map ();

>> Is there a better approach for handling this ?

>

> I think most passes just do not free summaries with -flto.  We probably want

> to fix it to make it possible to compile multiple units i.e. from plugin by

> adding release_summaries method...

> So I would say it is OK to do the same as others do and leak it with -flto.

>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>> index e457166ea39..724c26e03f6 100644

>> --- a/gcc/ipa-pure-const.c

>> +++ b/gcc/ipa-pure-const.c

>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>  #include "tree-scalar-evolution.h"

>>  #include "intl.h"

>>  #include "opts.h"

>> +#include "ssa.h"

>>

>>  /* Lattice values for const and pure functions.  Everything starts out

>>     being const, then may drop to pure and then neither depending on

>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>

>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>

>> +enum malloc_state_e

>> +{

>> +  PURE_CONST_MALLOC_TOP,

>> +  PURE_CONST_MALLOC,

>> +  PURE_CONST_MALLOC_BOTTOM

>> +};

>

> It took me a while to work out what PURE_CONST means here :)

> I would just call it something like STATE_MALLOC_TOP... or so.

> ipa_pure_const is outdated name from the time pass was doing only

> those two.

>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>

>>  static vec<funct_state> funct_state_vec;

>>

>> +/* A map from node to subset of callees. The subset contains those callees

>> + * whose return-value is returned by the node. */

>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>> +

>

> Hehe, a special case of return jump function.  We ought to support those more generally.

> How do you keep it up to date over callgraph changes?

>> @@ -921,6 +1055,23 @@ end:

>>    if (TREE_NOTHROW (decl))

>>      l->can_throw = false;

>>

>> +  if (ipa)

>> +    {

>> +      vec<cgraph_node *> v = vNULL;

>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>> +      if (DECL_IS_MALLOC (decl))

>> +     l->malloc_state = PURE_CONST_MALLOC;

>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>> +     {

>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>> +       for (unsigned i = 0; i < v.length (); ++i)

>> +         callees_p->safe_push (v[i]);

>> +       return_callees_map->put (fn, callees_p);

>> +     }

>> +      v.release ();

>> +    }

>> +

>

> I would do non-ipa variant, too.  I think most attributes can be detected that way

> as well.

>

> The patch generally makes sense to me.  It would be nice to make it easier to write such

> a basic propagators across callgraph (perhaps adding a template doing the basic

> propagation logic). Also I think you need to solve the problem with keeping your

> summaries up to date across callgraph node removal and duplications.

Thanks for the suggestions, I will try to address them in a follow-up patch.
IIUC, I would need to modify ipa-pure-const cgraph hooks -
add_new_function, remove_node_data, duplicate_node_data
to keep return_callees_map up-to-date across callgraph node insertions
and removal ?

Also, if instead of having a separate data-structure like return_callees_map,
should we rather have a flag within cgraph_edge, which marks that the
caller may return the value of the callee ?

Thanks,
Prathamesh
>

> Honza
Prathamesh Kulkarni July 31, 2017, 6:23 p.m. | #8
On 23 May 2017 at 19:10, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>

>>> * LTO and memory management

>>> This is a general question about LTO and memory management.

>>> IIUC the following sequence takes place during normal LTO:

>>> LGEN: generate_summary, write_summary

>>> WPA: read_summary, execute ipa passes, write_opt_summary

>>>

>>> So I assumed it was OK in LGEN to allocate return_callees_map in

>>> generate_summary and free it in write_summary and during WPA, allocate

>>> return_callees_map in read_summary and free it after execute (since

>>> write_opt_summary does not require return_callees_map).

>>>

>>> However with fat LTO, it seems the sequence changes for LGEN with

>>> execute phase takes place after write_summary. However since

>>> return_callees_map is freed in pure_const_write_summary and

>>> propagate_malloc() accesses it in execute stage, it results in

>>> segmentation fault.

>>>

>>> To work around this, I am using the following hack in pure_const_write_summary:

>>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>>> if (!global_options.x_flag_fat_lto_objects)

>>>   free_return_callees_map ();

>>> Is there a better approach for handling this ?

>>

>> I think most passes just do not free summaries with -flto.  We probably want

>> to fix it to make it possible to compile multiple units i.e. from plugin by

>> adding release_summaries method...

>> So I would say it is OK to do the same as others do and leak it with -flto.

>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>>> index e457166ea39..724c26e03f6 100644

>>> --- a/gcc/ipa-pure-const.c

>>> +++ b/gcc/ipa-pure-const.c

>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>>  #include "tree-scalar-evolution.h"

>>>  #include "intl.h"

>>>  #include "opts.h"

>>> +#include "ssa.h"

>>>

>>>  /* Lattice values for const and pure functions.  Everything starts out

>>>     being const, then may drop to pure and then neither depending on

>>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>>

>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>

>>> +enum malloc_state_e

>>> +{

>>> +  PURE_CONST_MALLOC_TOP,

>>> +  PURE_CONST_MALLOC,

>>> +  PURE_CONST_MALLOC_BOTTOM

>>> +};

>>

>> It took me a while to work out what PURE_CONST means here :)

>> I would just call it something like STATE_MALLOC_TOP... or so.

>> ipa_pure_const is outdated name from the time pass was doing only

>> those two.

>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>>

>>>  static vec<funct_state> funct_state_vec;

>>>

>>> +/* A map from node to subset of callees. The subset contains those callees

>>> + * whose return-value is returned by the node. */

>>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>>> +

>>

>> Hehe, a special case of return jump function.  We ought to support those more generally.

>> How do you keep it up to date over callgraph changes?

>>> @@ -921,6 +1055,23 @@ end:

>>>    if (TREE_NOTHROW (decl))

>>>      l->can_throw = false;

>>>

>>> +  if (ipa)

>>> +    {

>>> +      vec<cgraph_node *> v = vNULL;

>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>>> +      if (DECL_IS_MALLOC (decl))

>>> +     l->malloc_state = PURE_CONST_MALLOC;

>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>>> +     {

>>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>>> +       for (unsigned i = 0; i < v.length (); ++i)

>>> +         callees_p->safe_push (v[i]);

>>> +       return_callees_map->put (fn, callees_p);

>>> +     }

>>> +      v.release ();

>>> +    }

>>> +

>>

>> I would do non-ipa variant, too.  I think most attributes can be detected that way

>> as well.

>>

>> The patch generally makes sense to me.  It would be nice to make it easier to write such

>> a basic propagators across callgraph (perhaps adding a template doing the basic

>> propagation logic). Also I think you need to solve the problem with keeping your

>> summaries up to date across callgraph node removal and duplications.

> Thanks for the suggestions, I will try to address them in a follow-up patch.

> IIUC, I would need to modify ipa-pure-const cgraph hooks -

> add_new_function, remove_node_data, duplicate_node_data

> to keep return_callees_map up-to-date across callgraph node insertions

> and removal ?

>

> Also, if instead of having a separate data-structure like return_callees_map,

> should we rather have a flag within cgraph_edge, which marks that the

> caller may return the value of the callee ?

Hi,
Sorry for the very late response. I have attached an updated version
of the prototype patch,
which adds a non-ipa variant, and keeps return_callees_map up-to-date
across callgraph
node insertions and removal. For the non-ipa variant,
malloc_candidate_p() additionally checks
that all the "return callees" have DECL_IS_MALLOC set to true.
Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.
Does it look OK so far ?

Um sorry for this silly question, but I don't really understand how
does indirect call propagation
work in ipa-pure-const ? For example consider propagation of nothrow
attribute in following
test-case:

__attribute__((noinline, noclone, nothrow))
int f1(int k) { return k; }

__attribute__((noinline, noclone))
static int foo(int (*p)(int))
{
  return p(10);
}

__attribute__((noinline, noclone))
int bar(void)
{
  return foo(f1);
}

Shouldn't foo and bar be also marked as nothrow ?
Since foo indirectly calls f1 which is nothrow and bar only calls foo ?
The local-pure-const2 dump shows function is locally throwing  for
"foo" and "bar".

Um, I was wondering how to get "points-to" analysis for function-pointers,
to get list of callees that may be indirectly called from that
function pointer ?
In the patch I just set node to bottom if it contains indirect calls
which is far from ideal :(
I would be much grateful for suggestions on how to handle indirect calls.
Thanks!

Regards,
Prathamesh
>

> Thanks,

> Prathamesh

>>

>> Honzadiff --git a/gcc/cgraph.c b/gcc/cgraph.c
index d7c9ba61795..8ab701a5a10 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2518,6 +2518,53 @@ cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool *changed)
+{
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = true;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, changed);
+    }
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+    if (e->caller->thunk.thunk_p
+	&& (!malloc_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+      set_malloc_flag_1 (e->caller, malloc_p, changed);
+}
+
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 57cdaa45681..3bcfdf5b893 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1120,6 +1120,10 @@ public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index dac8f0d5f21..70da28ac11e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +70,15 @@ enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  STATE_MALLOC_TOP,
+  STATE_MALLOC,
+  STATE_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +102,13 @@ struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, STATE_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -109,6 +121,46 @@ typedef struct funct_state_d * funct_state;
 
 static vec<funct_state> funct_state_vec;
 
+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
+
+static void
+return_callees_map_remove (cgraph_node *node)
+{
+  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+  if (callees_pp)
+    {
+      vec<cgraph_node *> *callees = *callees_pp;
+      callees->release ();
+      return_callees_map->remove (node);
+    }
+}
+
+static void
+return_callees_map_clone (cgraph_node *src, cgraph_node *dst)
+{
+  vec<cgraph_node *> **callees_pp = return_callees_map->get (src);
+  if (callees_pp)
+    {
+      vec<cgraph_node *> *src_callees = *callees_pp;
+      vec<cgraph_node *> *dst_callees = new vec<cgraph_node *> (vNULL);
+      for (unsigned i = 0; i < src_callees->length (); ++i)
+	dst_callees->safe_push ((*src_callees)[i]);
+      return_callees_map->put (dst, dst_callees);
+    }
+}
+
+static void
+return_callees_map_free (void)
+{
+  for (hash_map<cgraph_node *, vec<cgraph_node *>*>::iterator it = return_callees_map->begin ();
+       it != return_callees_map->end ();
+       ++it)
+    return_callees_map_remove ((*it).first);
+  delete return_callees_map;
+}
+
 static bool gate_pure_const (void);
 
 namespace {
@@ -812,6 +864,128 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	tree op2 = gimple_cond_rhs (cond);
+	if (!integer_zerop (op2))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+	if (!integer_zerop (gimple_assign_rhs2 (ga)))
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (is_gimple_debug (use_stmt))
+      ;
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and element of phi is either NULL or
+ *    SSA_NAME_DEF_STMT(element) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */
+
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>* callees,
+		    bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+
+  if (EDGE_COUNT (exit_block->preds) == 0)
+    return false;
+
+  FOR_EACH_EDGE (e, ei, exit_block->preds)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (e->src);
+      greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+
+      if (!ret_stmt)
+	return false;
+
+      tree retval = gimple_return_retval (ret_stmt);
+      if (!retval)
+	return false;
+
+      if (TREE_CODE (retval) != SSA_NAME
+	  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+	return false;
+
+      if (!check_retval_uses (retval, ret_stmt))
+	return false;
+
+      gimple *def = SSA_NAME_DEF_STMT (retval);
+      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+	{
+	  tree callee_decl = gimple_call_fndecl (call_stmt);
+	  if (!callee_decl)
+	    return false;
+
+	  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+	  if (!ipa && !DECL_IS_MALLOC (callee->decl))
+	    return false;
+
+	  if (callees)
+	    callees->safe_push (callee);
+	}
+
+      else if (gphi *phi = dyn_cast<gphi *> (def))
+	for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	  {
+	    tree arg = gimple_phi_arg_def (phi, i);
+	    if (TREE_CODE (arg) != SSA_NAME)
+	      return false;
+	    if (!(arg == null_pointer_node || check_retval_uses (arg, phi)))
+	      return false;
+
+	    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+	    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+	    if (!call_stmt)
+	      return false;
+	    tree callee_decl = gimple_call_fndecl (call_stmt);
+	    if (!callee_decl)
+	      return false;
+	    if (!ipa && !DECL_IS_MALLOC (callee_decl))
+	      return false;
+	    cgraph_node *callee = cgraph_node::get_create (callee_decl);
+
+	    if (callees)
+	      callees->safe_push (callee);
+	  }
+
+      else
+	return false;
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+}
+
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1095,25 @@ end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  l->malloc_state = STATE_MALLOC_BOTTOM;
+  if (DECL_IS_MALLOC (decl))
+    l->malloc_state = STATE_MALLOC;
+  else if (ipa)
+    {
+      vec<cgraph_node *> v = vNULL;
+      if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), &v, true))
+	{
+	  l->malloc_state = STATE_MALLOC_TOP;
+	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+	  for (unsigned i = 0; i < v.length (); ++i)
+	    callees_p->safe_push (v[i]);
+	  return_callees_map->put (fn, callees_p);
+	}
+      v.release ();
+    }
+  else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), NULL, false))
+    l->malloc_state = STATE_MALLOC;
+
   pop_cfun ();
   if (dump_file)
     {
@@ -934,6 +1127,8 @@ end:
         fprintf (dump_file, "Function is locally pure.\n");
       if (l->can_free)
 	fprintf (dump_file, "Function can locally free.\n");
+      if (l->malloc_state == STATE_MALLOC)
+	fprintf (dump_file, "Function is locally malloc.\n");
     }
   return l;
 }
@@ -962,6 +1157,7 @@ duplicate_node_data (struct cgraph_node *src, struct cgraph_node *dst,
       gcc_assert (!has_function_state (dst));
       memcpy (l, get_function_state (src), sizeof (*l));
       set_function_state (dst, l);
+      return_callees_map_clone (src, dst);
     }
 }
 
@@ -971,7 +1167,10 @@ static void
 remove_node_data (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
 {
   if (has_function_state (node))
-    set_function_state (node, NULL);
+    {
+      set_function_state (node, NULL);
+      return_callees_map_remove (node);
+    }
 }
 
 
@@ -1003,6 +1202,7 @@ pure_const_generate_summary (void)
 
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *>*>;
 
   /* Process all of the functions.
 
@@ -1067,7 +1267,22 @@ pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (callees_pp)
+	    {
+	      vec<cgraph_node *> callees = **callees_pp;
+	      streamer_write_uhwi_stream (ob->main_stream, callees.length ());
+	      for (unsigned i = 0; i < callees.length (); ++i)
+		{
+		  int callee_ref = lto_symtab_encoder_encode (encoder, callees[i]);
+		  streamer_write_uhwi_stream (ob->main_stream, callee_ref);
+		}
+	    }
+	  else
+	    streamer_write_uhwi_stream (ob->main_stream, 0);
 	}
     }
 
@@ -1087,6 +1302,8 @@ pure_const_read_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *> *>;
+
   while ((file_data = file_data_vec[j++]))
     {
       const char *data;
@@ -1127,6 +1344,22 @@ pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
+	      unsigned len = streamer_read_uhwi (ib);
+	      if (len)
+		{
+		  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+		  for (unsigned i = 0; i < len; i++)
+		    {
+		      int callee_index = streamer_read_uhwi (ib);
+		      cgraph_node *callee = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, callee_index));
+		      callees_p->safe_push (callee);
+		    }
+		  return_callees_map->put (node, callees_p);
+		}
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1149,6 +1382,8 @@ pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1659,6 +1894,121 @@ propagate_nothrow (void)
   free (order);
 }
 
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = STATE_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = STATE_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == STATE_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (!callees_pp)
+	    continue;
+
+	  vec<cgraph_node *> callees = **callees_pp;
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = STATE_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == STATE_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+  return_callees_map_free ();
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1677,6 +2027,7 @@ execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
@@ -1877,6 +2228,17 @@ pass_local_pure_const::execute (function *fun)
 	fprintf (dump_file, "Function found to be nothrow: %s\n",
 		 current_function_name ());
     }
+
+  if (l->malloc_state == STATE_MALLOC
+      && !DECL_IS_MALLOC (current_function_decl))
+    {
+      node->set_malloc_flag (true);
+      changed = true;
+      if (dump_file)
+	fprintf (dump_file, "Function found to be malloc: %s\n",
+		 node->name ());
+    }
+
   free (l);
   if (changed)
     return execute_fixup_cfg ();
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@ struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */

Prathamesh Kulkarni Aug. 8, 2017, 4:20 a.m. | #9
On 31 July 2017 at 23:53, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 23 May 2017 at 19:10, Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

>> On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>>

>>>> * LTO and memory management

>>>> This is a general question about LTO and memory management.

>>>> IIUC the following sequence takes place during normal LTO:

>>>> LGEN: generate_summary, write_summary

>>>> WPA: read_summary, execute ipa passes, write_opt_summary

>>>>

>>>> So I assumed it was OK in LGEN to allocate return_callees_map in

>>>> generate_summary and free it in write_summary and during WPA, allocate

>>>> return_callees_map in read_summary and free it after execute (since

>>>> write_opt_summary does not require return_callees_map).

>>>>

>>>> However with fat LTO, it seems the sequence changes for LGEN with

>>>> execute phase takes place after write_summary. However since

>>>> return_callees_map is freed in pure_const_write_summary and

>>>> propagate_malloc() accesses it in execute stage, it results in

>>>> segmentation fault.

>>>>

>>>> To work around this, I am using the following hack in pure_const_write_summary:

>>>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>>>> if (!global_options.x_flag_fat_lto_objects)

>>>>   free_return_callees_map ();

>>>> Is there a better approach for handling this ?

>>>

>>> I think most passes just do not free summaries with -flto.  We probably want

>>> to fix it to make it possible to compile multiple units i.e. from plugin by

>>> adding release_summaries method...

>>> So I would say it is OK to do the same as others do and leak it with -flto.

>>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>>>> index e457166ea39..724c26e03f6 100644

>>>> --- a/gcc/ipa-pure-const.c

>>>> +++ b/gcc/ipa-pure-const.c

>>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>>>  #include "tree-scalar-evolution.h"

>>>>  #include "intl.h"

>>>>  #include "opts.h"

>>>> +#include "ssa.h"

>>>>

>>>>  /* Lattice values for const and pure functions.  Everything starts out

>>>>     being const, then may drop to pure and then neither depending on

>>>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>>>

>>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>>

>>>> +enum malloc_state_e

>>>> +{

>>>> +  PURE_CONST_MALLOC_TOP,

>>>> +  PURE_CONST_MALLOC,

>>>> +  PURE_CONST_MALLOC_BOTTOM

>>>> +};

>>>

>>> It took me a while to work out what PURE_CONST means here :)

>>> I would just call it something like STATE_MALLOC_TOP... or so.

>>> ipa_pure_const is outdated name from the time pass was doing only

>>> those two.

>>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>>>

>>>>  static vec<funct_state> funct_state_vec;

>>>>

>>>> +/* A map from node to subset of callees. The subset contains those callees

>>>> + * whose return-value is returned by the node. */

>>>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>>>> +

>>>

>>> Hehe, a special case of return jump function.  We ought to support those more generally.

>>> How do you keep it up to date over callgraph changes?

>>>> @@ -921,6 +1055,23 @@ end:

>>>>    if (TREE_NOTHROW (decl))

>>>>      l->can_throw = false;

>>>>

>>>> +  if (ipa)

>>>> +    {

>>>> +      vec<cgraph_node *> v = vNULL;

>>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>>>> +      if (DECL_IS_MALLOC (decl))

>>>> +     l->malloc_state = PURE_CONST_MALLOC;

>>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>>>> +     {

>>>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>>>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>>>> +       for (unsigned i = 0; i < v.length (); ++i)

>>>> +         callees_p->safe_push (v[i]);

>>>> +       return_callees_map->put (fn, callees_p);

>>>> +     }

>>>> +      v.release ();

>>>> +    }

>>>> +

>>>

>>> I would do non-ipa variant, too.  I think most attributes can be detected that way

>>> as well.

>>>

>>> The patch generally makes sense to me.  It would be nice to make it easier to write such

>>> a basic propagators across callgraph (perhaps adding a template doing the basic

>>> propagation logic). Also I think you need to solve the problem with keeping your

>>> summaries up to date across callgraph node removal and duplications.

>> Thanks for the suggestions, I will try to address them in a follow-up patch.

>> IIUC, I would need to modify ipa-pure-const cgraph hooks -

>> add_new_function, remove_node_data, duplicate_node_data

>> to keep return_callees_map up-to-date across callgraph node insertions

>> and removal ?

>>

>> Also, if instead of having a separate data-structure like return_callees_map,

>> should we rather have a flag within cgraph_edge, which marks that the

>> caller may return the value of the callee ?

> Hi,

> Sorry for the very late response. I have attached an updated version

> of the prototype patch,

> which adds a non-ipa variant, and keeps return_callees_map up-to-date

> across callgraph

> node insertions and removal. For the non-ipa variant,

> malloc_candidate_p() additionally checks

> that all the "return callees" have DECL_IS_MALLOC set to true.

> Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.

> Does it look OK so far ?

>

> Um sorry for this silly question, but I don't really understand how

> does indirect call propagation

> work in ipa-pure-const ? For example consider propagation of nothrow

> attribute in following

> test-case:

>

> __attribute__((noinline, noclone, nothrow))

> int f1(int k) { return k; }

>

> __attribute__((noinline, noclone))

> static int foo(int (*p)(int))

> {

>   return p(10);

> }

>

> __attribute__((noinline, noclone))

> int bar(void)

> {

>   return foo(f1);

> }

>

> Shouldn't foo and bar be also marked as nothrow ?

> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

> The local-pure-const2 dump shows function is locally throwing  for

> "foo" and "bar".

>

> Um, I was wondering how to get "points-to" analysis for function-pointers,

> to get list of callees that may be indirectly called from that

> function pointer ?

> In the patch I just set node to bottom if it contains indirect calls

> which is far from ideal :(

> I would be much grateful for suggestions on how to handle indirect calls.

> Thanks!

ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
>

> Regards,

> Prathamesh

>>

>> Thanks,

>> Prathamesh

>>>

>>> Honza
Prathamesh Kulkarni Aug. 17, 2017, 12:32 p.m. | #10
On 8 August 2017 at 09:50, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 31 July 2017 at 23:53, Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

>> On 23 May 2017 at 19:10, Prathamesh Kulkarni

>> <prathamesh.kulkarni@linaro.org> wrote:

>>> On 19 May 2017 at 19:02, Jan Hubicka <hubicka@ucw.cz> wrote:

>>>>>

>>>>> * LTO and memory management

>>>>> This is a general question about LTO and memory management.

>>>>> IIUC the following sequence takes place during normal LTO:

>>>>> LGEN: generate_summary, write_summary

>>>>> WPA: read_summary, execute ipa passes, write_opt_summary

>>>>>

>>>>> So I assumed it was OK in LGEN to allocate return_callees_map in

>>>>> generate_summary and free it in write_summary and during WPA, allocate

>>>>> return_callees_map in read_summary and free it after execute (since

>>>>> write_opt_summary does not require return_callees_map).

>>>>>

>>>>> However with fat LTO, it seems the sequence changes for LGEN with

>>>>> execute phase takes place after write_summary. However since

>>>>> return_callees_map is freed in pure_const_write_summary and

>>>>> propagate_malloc() accesses it in execute stage, it results in

>>>>> segmentation fault.

>>>>>

>>>>> To work around this, I am using the following hack in pure_const_write_summary:

>>>>> // FIXME: Do not free if -ffat-lto-objects is enabled.

>>>>> if (!global_options.x_flag_fat_lto_objects)

>>>>>   free_return_callees_map ();

>>>>> Is there a better approach for handling this ?

>>>>

>>>> I think most passes just do not free summaries with -flto.  We probably want

>>>> to fix it to make it possible to compile multiple units i.e. from plugin by

>>>> adding release_summaries method...

>>>> So I would say it is OK to do the same as others do and leak it with -flto.

>>>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c

>>>>> index e457166ea39..724c26e03f6 100644

>>>>> --- a/gcc/ipa-pure-const.c

>>>>> +++ b/gcc/ipa-pure-const.c

>>>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see

>>>>>  #include "tree-scalar-evolution.h"

>>>>>  #include "intl.h"

>>>>>  #include "opts.h"

>>>>> +#include "ssa.h"

>>>>>

>>>>>  /* Lattice values for const and pure functions.  Everything starts out

>>>>>     being const, then may drop to pure and then neither depending on

>>>>> @@ -69,6 +70,15 @@ enum pure_const_state_e

>>>>>

>>>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};

>>>>>

>>>>> +enum malloc_state_e

>>>>> +{

>>>>> +  PURE_CONST_MALLOC_TOP,

>>>>> +  PURE_CONST_MALLOC,

>>>>> +  PURE_CONST_MALLOC_BOTTOM

>>>>> +};

>>>>

>>>> It took me a while to work out what PURE_CONST means here :)

>>>> I would just call it something like STATE_MALLOC_TOP... or so.

>>>> ipa_pure_const is outdated name from the time pass was doing only

>>>> those two.

>>>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;

>>>>>

>>>>>  static vec<funct_state> funct_state_vec;

>>>>>

>>>>> +/* A map from node to subset of callees. The subset contains those callees

>>>>> + * whose return-value is returned by the node. */

>>>>> +static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;

>>>>> +

>>>>

>>>> Hehe, a special case of return jump function.  We ought to support those more generally.

>>>> How do you keep it up to date over callgraph changes?

>>>>> @@ -921,6 +1055,23 @@ end:

>>>>>    if (TREE_NOTHROW (decl))

>>>>>      l->can_throw = false;

>>>>>

>>>>> +  if (ipa)

>>>>> +    {

>>>>> +      vec<cgraph_node *> v = vNULL;

>>>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;

>>>>> +      if (DECL_IS_MALLOC (decl))

>>>>> +     l->malloc_state = PURE_CONST_MALLOC;

>>>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))

>>>>> +     {

>>>>> +       l->malloc_state = PURE_CONST_MALLOC_TOP;

>>>>> +       vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);

>>>>> +       for (unsigned i = 0; i < v.length (); ++i)

>>>>> +         callees_p->safe_push (v[i]);

>>>>> +       return_callees_map->put (fn, callees_p);

>>>>> +     }

>>>>> +      v.release ();

>>>>> +    }

>>>>> +

>>>>

>>>> I would do non-ipa variant, too.  I think most attributes can be detected that way

>>>> as well.

>>>>

>>>> The patch generally makes sense to me.  It would be nice to make it easier to write such

>>>> a basic propagators across callgraph (perhaps adding a template doing the basic

>>>> propagation logic). Also I think you need to solve the problem with keeping your

>>>> summaries up to date across callgraph node removal and duplications.

>>> Thanks for the suggestions, I will try to address them in a follow-up patch.

>>> IIUC, I would need to modify ipa-pure-const cgraph hooks -

>>> add_new_function, remove_node_data, duplicate_node_data

>>> to keep return_callees_map up-to-date across callgraph node insertions

>>> and removal ?

>>>

>>> Also, if instead of having a separate data-structure like return_callees_map,

>>> should we rather have a flag within cgraph_edge, which marks that the

>>> caller may return the value of the callee ?

>> Hi,

>> Sorry for the very late response. I have attached an updated version

>> of the prototype patch,

>> which adds a non-ipa variant, and keeps return_callees_map up-to-date

>> across callgraph

>> node insertions and removal. For the non-ipa variant,

>> malloc_candidate_p() additionally checks

>> that all the "return callees" have DECL_IS_MALLOC set to true.

>> Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.

>> Does it look OK so far ?

>>

>> Um sorry for this silly question, but I don't really understand how

>> does indirect call propagation

>> work in ipa-pure-const ? For example consider propagation of nothrow

>> attribute in following

>> test-case:

>>

>> __attribute__((noinline, noclone, nothrow))

>> int f1(int k) { return k; }

>>

>> __attribute__((noinline, noclone))

>> static int foo(int (*p)(int))

>> {

>>   return p(10);

>> }

>>

>> __attribute__((noinline, noclone))

>> int bar(void)

>> {

>>   return foo(f1);

>> }

>>

>> Shouldn't foo and bar be also marked as nothrow ?

>> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

>> The local-pure-const2 dump shows function is locally throwing  for

>> "foo" and "bar".

>>

>> Um, I was wondering how to get "points-to" analysis for function-pointers,

>> to get list of callees that may be indirectly called from that

>> function pointer ?

>> In the patch I just set node to bottom if it contains indirect calls

>> which is far from ideal :(

>> I would be much grateful for suggestions on how to handle indirect calls.

>> Thanks!

> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Thanks,
Prathamesh
>

> Thanks,

> Prathamesh

>>

>> Regards,

>> Prathamesh

>>>

>>> Thanks,

>>> Prathamesh

>>>>

>>>> Honza

Patch hide | download patch | download mbox

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index e505b10e211..eab8216100e 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2492,6 +2492,61 @@  cgraph_node::set_nothrow_flag (bool nothrow)
   return changed;
 }
 
+/* Worker to set malloc flag.  */
+static void
+set_malloc_flag_1 (cgraph_node *node, bool malloc_p, bool non_call,
+		    bool *changed)
+{
+  cgraph_edge *e;
+
+  if (malloc_p && !DECL_IS_MALLOC (node->decl))
+    {
+      /* With non-call exceptions we can't say for sure if other function
+ 	 body was not possibly optimized to still throw.  */
+      if (!non_call || node->binds_to_current_def_p ())
+	{
+	  DECL_IS_MALLOC (node->decl) = true;
+	  *changed = true;
+	}
+    }
+  else if (!malloc_p && DECL_IS_MALLOC (node->decl))
+    {
+      DECL_IS_MALLOC (node->decl) = false;
+      *changed = true;
+    }
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+    {
+      cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+      if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	set_malloc_flag_1 (alias, malloc_p, non_call, changed);
+    }
+}
+/* Set DECL_IS_MALLOC on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_malloc_flag (bool malloc_p)
+{
+  bool changed = false;
+  bool non_call = opt_for_fn (decl, flag_non_call_exceptions);
+
+  if (!malloc_p || get_availability () > AVAIL_INTERPOSABLE)
+    set_malloc_flag_1 (this, malloc_p, non_call, &changed);
+  else
+    {
+      ipa_ref *ref;
+
+      FOR_EACH_ALIAS (this, ref)
+	{
+	  cgraph_node *alias = dyn_cast<cgraph_node *> (ref->referring);
+	  if (!malloc_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+	    set_malloc_flag_1 (alias, malloc_p, non_call, &changed);
+	}
+    }
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index be4eaee71e2..6739bb3094d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1116,6 +1116,10 @@  public:
      if any to NOTHROW.  */
   bool set_nothrow_flag (bool nothrow);
 
+  /* SET DECL_IS_MALLOC on cgraph_node's decl and on aliases of the node
+     if any.  */
+  bool set_malloc_flag (bool malloc_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
     If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index e457166ea39..724c26e03f6 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -56,6 +56,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "intl.h"
 #include "opts.h"
+#include "ssa.h"
 
 /* Lattice values for const and pure functions.  Everything starts out
    being const, then may drop to pure and then neither depending on
@@ -69,6 +70,15 @@  enum pure_const_state_e
 
 const char *pure_const_names[3] = {"const", "pure", "neither"};
 
+enum malloc_state_e
+{
+  PURE_CONST_MALLOC_TOP,
+  PURE_CONST_MALLOC,
+  PURE_CONST_MALLOC_BOTTOM
+};
+
+const char *malloc_state_names[] = {"malloc_top", "malloc", "malloc_bottom"};
+
 /* Holder for the const_state.  There is one of these per function
    decl.  */
 struct funct_state_d
@@ -92,11 +102,13 @@  struct funct_state_d
   /* If function can call free, munmap or otherwise make previously
      non-trapping memory accesses trapping.  */
   bool can_free;
+
+  enum malloc_state_e malloc_state;
 };
 
 /* State used when we know nothing about function.  */
 static struct funct_state_d varying_state
-   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true };
+   = { IPA_NEITHER, IPA_NEITHER, true, true, true, true, PURE_CONST_MALLOC_BOTTOM };
 
 
 typedef struct funct_state_d * funct_state;
@@ -109,6 +121,10 @@  typedef struct funct_state_d * funct_state;
 
 static vec<funct_state> funct_state_vec;
 
+/* A map from node to subset of callees. The subset contains those callees
+ * whose return-value is returned by the node. */
+static hash_map< cgraph_node *, vec<cgraph_node *>* > *return_callees_map;
+
 static bool gate_pure_const (void);
 
 namespace {
@@ -812,6 +828,124 @@  check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
     }
 }
 
+static bool
+check_retval_uses (tree retval, gimple *stmt)
+{
+  imm_use_iterator use_iter;
+  gimple *use_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, retval)
+    if (gcond *cond = dyn_cast<gcond *> (use_stmt))
+      {
+	enum tree_code code = gimple_cond_code (cond);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (gassign *ga = dyn_cast<gassign *> (use_stmt))
+      {
+	enum tree_code code = gimple_assign_rhs_code (ga);
+	if (TREE_CODE_CLASS (code) != tcc_comparison)
+	  RETURN_FROM_IMM_USE_STMT (use_iter, false);
+      }
+    else if (use_stmt != stmt)
+      RETURN_FROM_IMM_USE_STMT (use_iter, false);
+
+  return true;
+}
+
+/*
+ * Currently this function does a very conservative analysis to check if
+ * function could be a malloc candidate.
+ *
+ * The function is considered to be a candidate if
+ * 1) The function returns a value of pointer type.
+ * 2) SSA_NAME_DEF_STMT (return_value) is either a function call or
+ *    a phi, and SSA_NAME_DEF_STMT(each element of phi) is function call.
+ * 3) The return-value has immediate uses only within comparisons (gcond or gassign)
+ *    and return_stmt (and likewise a phi arg has immediate use only within comparison
+ *    or the phi stmt).
+ */
+
+static bool
+malloc_candidate_p (function *fun, vec<cgraph_node *>& callees)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  greturn *ret_stmt = dyn_cast<greturn *> (gsi_stmt (gsi));
+	  if (ret_stmt)
+	    {
+	      tree retval = gimple_return_retval (ret_stmt);
+	      if (!retval)
+		return false;
+
+	      if (TREE_CODE (retval) != SSA_NAME
+		  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
+		return false;
+
+	      if (!check_retval_uses (retval, ret_stmt))
+		return false;
+
+	      gimple *def = SSA_NAME_DEF_STMT (retval);
+	      if (gcall *call_stmt = dyn_cast<gcall *> (def))
+		{
+		  tree callee_decl = gimple_call_fndecl (call_stmt);
+		  /* FIXME: In case of indirect call, callee_decl might be NULL
+ 		     Not sure what to do in that case, punting for now.  */
+		  if (!callee_decl)
+		    return false;
+		  cgraph_node *callee = cgraph_node::get_create (callee_decl);
+		  callees.safe_push (callee);
+		}
+	      else if (gphi *phi = dyn_cast<gphi *> (def))
+		for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+		  {
+		    tree arg = gimple_phi_arg_def (phi, i);
+		    if (TREE_CODE (arg) != SSA_NAME)
+		      return false;
+		    if (!check_retval_uses (arg, phi))
+		      return false;
+
+		    gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+		    gcall *call_stmt = dyn_cast<gcall *> (arg_def);
+		    if (!call_stmt)
+		      return false;
+		    tree callee_decl = gimple_call_fndecl (call_stmt);
+		    if (!callee_decl)
+		      return false;
+		    cgraph_node *callee = cgraph_node::get_create (callee_decl);
+		    callees.safe_push (callee);
+		  }
+	      else
+		return false;
+	    }
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "\nFound %s to be candidate for malloc attribute\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
+  return true;
+}
+
+
+static void
+free_return_callees_map (void)
+{
+  for (hash_map<cgraph_node *, vec<cgraph_node *>*>::iterator it = return_callees_map->begin ();
+       it != return_callees_map->end ();
+       ++it)
+    {
+      std::pair<cgraph_node *, vec<cgraph_node *>*> p = *it;
+      p.second->release ();
+      delete p.second;
+    }
+  delete return_callees_map;
+}
 
 /* This is the main routine for finding the reference patterns for
    global variables within a function FN.  */
@@ -921,6 +1055,23 @@  end:
   if (TREE_NOTHROW (decl))
     l->can_throw = false;
 
+  if (ipa)
+    {
+      vec<cgraph_node *> v = vNULL;
+      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+      if (DECL_IS_MALLOC (decl))
+	l->malloc_state = PURE_CONST_MALLOC;
+      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
+	{
+	  l->malloc_state = PURE_CONST_MALLOC_TOP;
+	  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+	  for (unsigned i = 0; i < v.length (); ++i)
+	    callees_p->safe_push (v[i]);
+	  return_callees_map->put (fn, callees_p);
+	}
+      v.release ();
+    }
+
   pop_cfun ();
   if (dump_file)
     {
@@ -1003,6 +1154,7 @@  pure_const_generate_summary (void)
 
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *>*>;
 
   /* Process all of the functions.
 
@@ -1067,11 +1219,30 @@  pure_const_write_summary (void)
 	  bp_pack_value (&bp, fs->looping, 1);
 	  bp_pack_value (&bp, fs->can_throw, 1);
 	  bp_pack_value (&bp, fs->can_free, 1);
+	  bp_pack_value (&bp, fs->malloc_state, 2);
 	  streamer_write_bitpack (&bp);
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (callees_pp)
+	    {
+	      vec<cgraph_node *> callees = **callees_pp;
+	      streamer_write_uhwi_stream (ob->main_stream, callees.length ());
+	      for (unsigned i = 0; i < callees.length (); ++i)
+		{
+		  int callee_ref = lto_symtab_encoder_encode (encoder, callees[i]);
+		  streamer_write_uhwi_stream (ob->main_stream, callee_ref);
+		}
+	    }
+	  else
+	    streamer_write_uhwi_stream (ob->main_stream, 0);
 	}
     }
 
   lto_destroy_simple_output_block (ob);
+
+  // FIXME: Do not free if -ffat-lto-objects is enabled.
+  if (!global_options.x_flag_fat_lto_objects)
+    free_return_callees_map ();
 }
 
 
@@ -1087,6 +1258,8 @@  pure_const_read_summary (void)
   pass_ipa_pure_const *pass = static_cast <pass_ipa_pure_const *> (current_pass);
   pass->register_hooks ();
 
+  return_callees_map = new hash_map<cgraph_node *, vec<cgraph_node *> *>;
+
   while ((file_data = file_data_vec[j++]))
     {
       const char *data;
@@ -1127,6 +1300,22 @@  pure_const_read_summary (void)
 	      fs->looping = bp_unpack_value (&bp, 1);
 	      fs->can_throw = bp_unpack_value (&bp, 1);
 	      fs->can_free = bp_unpack_value (&bp, 1);
+	      fs->malloc_state
+			= (enum malloc_state_e) bp_unpack_value (&bp, 2);
+
+	      unsigned len = streamer_read_uhwi (ib);
+	      if (len)
+		{
+		  vec<cgraph_node *> *callees_p = new vec<cgraph_node *> (vNULL);
+		  for (unsigned i = 0; i < len; i++)
+		    {
+		      int callee_index = streamer_read_uhwi (ib);
+		      cgraph_node *callee = dyn_cast<cgraph_node *> (lto_symtab_encoder_deref (encoder, callee_index));
+		      callees_p->safe_push (callee);
+		    }
+		  return_callees_map->put (node, callees_p);
+		}
+
 	      if (dump_file)
 		{
 		  int flags = flags_from_decl_or_type (node->decl);
@@ -1151,6 +1340,8 @@  pure_const_read_summary (void)
 		    fprintf (dump_file,"  function is locally throwing\n");
 		  if (fs->can_free)
 		    fprintf (dump_file,"  function can locally free\n");
+		  fprintf (dump_file, "\n malloc state: %s\n",
+			   malloc_state_names[fs->malloc_state]);
 		}
 	    }
 
@@ -1664,6 +1855,121 @@  propagate_nothrow (void)
   free (order);
 }
 
+DEBUG_FUNCTION
+static void
+dump_malloc_lattice (FILE *dump_file, const char *s)
+{
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "\n\nMALLOC LATTICE %s:\n", s);
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      funct_state fs = get_function_state (node);
+      malloc_state_e state = fs->malloc_state;
+      fprintf (dump_file, "%s: %s\n", node->name (), malloc_state_names[state]);
+    }
+}
+
+static void
+propagate_malloc (void)
+{
+  cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    {
+      if (DECL_IS_MALLOC (node->decl))
+	if (!has_function_state (node))
+	  {
+	    funct_state l = XCNEW (struct funct_state_d);
+	    *l = varying_state;
+	    l->malloc_state = PURE_CONST_MALLOC;
+	    set_function_state (node, l);
+	  }
+    }
+
+  dump_malloc_lattice (dump_file, "Initial");
+  struct cgraph_node **order
+    = XNEWVEC (struct cgraph_node *, symtab->cgraph_count);
+  int order_pos = ipa_reverse_postorder (order);
+  bool changed = true;
+
+  while (changed)
+    {
+      changed = false;
+      /* Walk in postorder.  */
+      for (int i = order_pos - 1; i >= 0; --i)
+	{
+	  cgraph_node *node = order[i];
+	  if (node->alias
+	      || !node->definition
+	      || !has_function_state (node))
+	    continue;
+
+	  funct_state l = get_function_state (node);
+
+	  /* FIXME: add support for indirect-calls.  */
+	  if (node->indirect_calls)
+	    {
+	      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (node->get_availability () <= AVAIL_INTERPOSABLE)
+	    {
+	      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
+	      continue;
+	    }
+
+	  if (l->malloc_state == PURE_CONST_MALLOC_BOTTOM)
+	    continue;
+
+	  vec<cgraph_node *> **callees_pp = return_callees_map->get (node);
+	  if (!callees_pp)
+	    continue;
+
+	  vec<cgraph_node *> callees = **callees_pp;
+	  malloc_state_e new_state = l->malloc_state;
+	  for (unsigned j = 0; j < callees.length (); j++)
+	    {
+	      cgraph_node *callee = callees[j];
+	      if (!has_function_state (callee))
+		{
+		  new_state = PURE_CONST_MALLOC_BOTTOM;
+		  break;
+		}
+	      malloc_state_e callee_state = get_function_state (callee)->malloc_state;
+	      if (new_state < callee_state)
+		new_state = callee_state;
+	    }
+	  if (new_state != l->malloc_state)
+	    {
+	      changed = true;
+	      l->malloc_state = new_state;
+	    }
+	}
+    }
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (has_function_state (node))
+      {
+	funct_state l = get_function_state (node);
+	if (!node->alias
+	    && l->malloc_state == PURE_CONST_MALLOC
+	    && !node->global.inlined_to)
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      fprintf (dump_file, "Function %s found to be malloc\n",
+		       node->name ());
+	    node->set_malloc_flag (true);
+	  }
+      }
+
+  dump_malloc_lattice (dump_file, "after propagation");
+  ipa_free_postorder_info ();
+  free (order);
+  free_return_callees_map ();
+}
 
 /* Produce the global information by preforming a transitive closure
    on the local information that was produced by generate_summary.  */
@@ -1682,6 +1988,7 @@  execute (function *)
   /* Nothrow makes more function to not lead to return and improve
      later analysis.  */
   propagate_nothrow ();
+  propagate_malloc ();
   remove_p = propagate_pure_const ();
 
   /* Cleanup. */
diff --git a/gcc/ssa-iterators.h b/gcc/ssa-iterators.h
index c8aa77bd4f3..740cbf13cb2 100644
--- a/gcc/ssa-iterators.h
+++ b/gcc/ssa-iterators.h
@@ -93,6 +93,12 @@  struct imm_use_iterator
      break;							\
    }
 
+/* Similarly for return.  */
+#define RETURN_FROM_IMM_USE_STMT(ITER, VAL)			\
+  {								\
+    end_imm_use_stmt_traverse (&(ITER));			\
+    return (VAL);						\
+  }
 
 /* Use this iterator in combination with FOR_EACH_IMM_USE_STMT to
    get access to each occurrence of ssavar on the stmt returned by
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
new file mode 100644
index 00000000000..9a95f817079
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-1.c
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, no_icf, used))
+static void *f(__SIZE_TYPE__ n)
+{
+  void *p = __builtin_malloc (n);
+  if (p == 0)
+    __builtin_abort ();
+  return p;
+}
+
+__attribute__((noinline, no_icf, used))
+static void *bar(__SIZE_TYPE__ n)
+{
+  void *p = f (n);
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function f found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
new file mode 100644
index 00000000000..95b2fd74a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-2.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+__attribute__((noinline, used, no_icf))
+static void *foo (__SIZE_TYPE__ n)
+{
+  return __builtin_malloc (n * 10);
+}
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int cond)
+{
+  void *p;
+  if (cond)
+    p = foo (n);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */
diff --git a/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
new file mode 100644
index 00000000000..13558ddd07d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/propmalloc-3.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-pure-const-details" } */
+
+static void *foo(__SIZE_TYPE__, int) __attribute__((noinline, no_icf, used));
+
+__attribute__((noinline, used, no_icf))
+static void *bar(__SIZE_TYPE__ n, int m)
+{
+  return foo (n, m);
+}
+
+static void *foo(__SIZE_TYPE__ n, int m)
+{
+  void *p;
+  if (m > 0)
+    p = bar (n, --m);
+  else
+    p = __builtin_malloc (n);
+
+  return p;
+}
+
+/* { dg-final { scan-ipa-dump "Function foo found to be malloc" "pure-const" } } */
+/* { dg-final { scan-ipa-dump "Function bar found to be malloc" "pure-const" } } */