diff mbox series

[RFC] stdlib: Make atexit to not act as __cxa_atexit

Message ID 20190701205904.22287-1-adhemerval.zanella@linaro.org
State New
Headers show
Series [RFC] stdlib: Make atexit to not act as __cxa_atexit | expand

Commit Message

Adhemerval Zanella July 1, 2019, 8:59 p.m. UTC
This is RFC patch to address the issue brought in recent discussion
regarding atexit and shared libraries [1] [2].  As indicated in the
libc-alpha discussion, the issue is since atexit uses __cxa_atexit
its interaction __cxa_finalize could lead to atexit handlers being
executed in a different order than the expected one.  The github
project gives a small example that triggers it [3].

The changes I could come with changes slight the atexit semantic
as described in my last email [4].  So basically the changes are:

  1. Add the __atexit symbol which is linked as __cxa_finalize in
     static mode (so __dso_handle is correctly set).  The __atexit
     symbol adds an ef_at exit_function entry on __exit_funcs,
     different than an ef_cxa one from __cxa_atexit.

     Old binaries would still call __cxa_atexit, so we do not actually
     need to add a compat symbol.

  2. Make __cxa_finalize set ef_at handlers to ef_at_finalize.  This
     is to for dlclose to mark the atexit handlers as free, so a
     subsequent __cxa_finalize won't run potentially unmmap code.

  3. exit is the only way where atexit handlers are actually called.
     Unloading a library through dlclose does not issue the atexit
     anymore (this is the main semantic change).

I have not opened a bug report about it because I am not sure from
the previous discussion this is indeed a bug or an expected side-effect
implementing atexit based on __cxa_atexit.  This is also only happens
when atexit is created from a shared library (which has its own
__dso_handle value) on some specific scenarios where the unloading
order is changed.

The RFC also has a *lot* of missing parts (tests, abifiles, comments,
CL).

[1] https://sourceware.org/ml/libc-alpha/2019-06/msg00229.html
[2] https://sourceware.org/ml/libc-help/2019-06/msg00025.html
[3] https://github.com/mulle-nat/ld-so-breakage
[4] https://sourceware.org/ml/libc-alpha/2019-06/msg00231.html
---
 dlfcn/dlclose.c                               |   4 +-
 dlfcn/modatexit.c                             |  11 +-
 dlfcn/tstatexit.c                             |  30 +---
 include/stdlib.h                              |   3 +
 stdlib/Versions                               |   4 +
 stdlib/atexit.c                               |   2 +-
 stdlib/cxa_atexit.c                           |  28 +++-
 stdlib/cxa_finalize.c                         | 133 +++++++++++-------
 stdlib/exit.c                                 |   3 +-
 stdlib/exit.h                                 |   7 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 11 files changed, 141 insertions(+), 85 deletions(-)

-- 
2.17.1

Comments

Florian Weimer July 2, 2019, 7:44 a.m. UTC | #1
* Adhemerval Zanella:

>   3. exit is the only way where atexit handlers are actually called.

>      Unloading a library through dlclose does not issue the atexit

>      anymore (this is the main semantic change).


Sorry, I don't think this is the right direction.  We got the current
atexit behavior from Solaris, and I think it is useful.

  <https://docs.oracle.com/cd/E88353_01/html/E37843/atexit-3c.html>

I think the bug is that our implementation of that behavior interferes
with the execution order when there is no dlclose at all.  There should
be a different way for fixing that.

Maybe we can turn __cxa_finalize into a NOP and have the dynamic loader
handle atexit handlers directly on dlclose (assuming that it's always
called before the other ELF destructors)?

Or otherwise use flags to mark things, but in such a way that the
handlers are processed in reverse atexit order, unless a dlclose
happens.

Thanks,
Florian
Adhemerval Zanella July 2, 2019, 1:12 p.m. UTC | #2
On 02/07/2019 04:44, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>>   3. exit is the only way where atexit handlers are actually called.

>>      Unloading a library through dlclose does not issue the atexit

>>      anymore (this is the main semantic change).

> 

> Sorry, I don't think this is the right direction.  We got the current

> atexit behavior from Solaris, and I think it is useful.

> 

>   <https://docs.oracle.com/cd/E88353_01/html/E37843/atexit-3c.html>


I am not sure this is also a good design, specially that we have elf
destructors that provide a similar semantic.

> 

> I think the bug is that our implementation of that behavior interferes

> with the execution order when there is no dlclose at all.  There should

> be a different way for fixing that.

> 

> Maybe we can turn __cxa_finalize into a NOP and have the dynamic loader

> handle atexit handlers directly on dlclose (assuming that it's always

> called before the other ELF destructors)?


We need __cxa_finalize because it is called by __do_global_dtors_aux
from initfini callbacks. Also making the loader itself handler atexit
is not the best option IMHO: we will either to call a private symbol on 
libc (which might require an additional lookup) or export the internal 
data so loader can implement the callback issuing itself. 

> 

> Or otherwise use flags to mark things, but in such a way that the

> handlers are processed in reverse atexit order, unless a dlclose

> happens.


I think one possible option to still allow atexit handler being called
by atexit is to still register them as ef_at (using the new __atexit
symbol) and on __run_exit_handler first process et_at and them
ef_on and ef_cxa. This is still a semantic change of atexit regarding
the order of on_exit and __cxa_atexit, but we keep the Solaris like
one to call them on dlclose.

What do you think?
Florian Weimer July 2, 2019, 1:24 p.m. UTC | #3
* Adhemerval Zanella:

> On 02/07/2019 04:44, Florian Weimer wrote:

>> * Adhemerval Zanella:

>> 

>>>   3. exit is the only way where atexit handlers are actually called.

>>>      Unloading a library through dlclose does not issue the atexit

>>>      anymore (this is the main semantic change).

>> 

>> Sorry, I don't think this is the right direction.  We got the current

>> atexit behavior from Solaris, and I think it is useful.

>> 

>>   <https://docs.oracle.com/cd/E88353_01/html/E37843/atexit-3c.html>

>

> I am not sure this is also a good design, specially that we have elf

> destructors that provide a similar semantic.


It's what's in various quasi-standard documents, so we need to support
it.

>> I think the bug is that our implementation of that behavior interferes

>> with the execution order when there is no dlclose at all.  There should

>> be a different way for fixing that.

>> 

>> Maybe we can turn __cxa_finalize into a NOP and have the dynamic loader

>> handle atexit handlers directly on dlclose (assuming that it's always

>> called before the other ELF destructors)?

>

> We need __cxa_finalize because it is called by __do_global_dtors_aux

> from initfini callbacks.


(Note: The comment is about crtbeginS.o built from libgcc sources.)

If all versions of the ELF constructor in crtbeginS.o call
__cxa_finalize as their first action, then we can turn __cxa_finalize
into a NOP and have the dynamic linker perform the original actions of
__cxa_finalize directly, before running the ELF destructors for this
object.

> Also making the loader itself handler atexit

> is not the best option IMHO: we will either to call a private symbol on 

> libc (which might require an additional lookup) or export the internal 

> data so loader can implement the callback issuing itself.


Anything would be better than this pointless indirection through libgcc.
We have accumlated lots of cruft in the crt* objects which serve no
purpose at all for current binaries, and eventually, we will have to
clean this up.

I think crtbegin*.o could completely go away if we turn __cxa_finalize
into a NOP on the glibc side.

>> Or otherwise use flags to mark things, but in such a way that the

>> handlers are processed in reverse atexit order, unless a dlclose

>> happens.

>

> I think one possible option to still allow atexit handler being called

> by atexit is to still register them as ef_at (using the new __atexit

> symbol) and on __run_exit_handler first process et_at and them

> ef_on and ef_cxa. This is still a semantic change of atexit regarding

> the order of on_exit and __cxa_atexit, but we keep the Solaris like

> one to call them on dlclose.


Sorry, can you rephrase?  There seem to be a few typos in this
paragraph.

Thanks,
Florian
Adhemerval Zanella July 2, 2019, 4:56 p.m. UTC | #4
On 02/07/2019 10:24, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>> On 02/07/2019 04:44, Florian Weimer wrote:

>>> * Adhemerval Zanella:

>>>

>>>>   3. exit is the only way where atexit handlers are actually called.

>>>>      Unloading a library through dlclose does not issue the atexit

>>>>      anymore (this is the main semantic change).

>>>

>>> Sorry, I don't think this is the right direction.  We got the current

>>> atexit behavior from Solaris, and I think it is useful.

>>>

>>>   <https://docs.oracle.com/cd/E88353_01/html/E37843/atexit-3c.html>

>>

>> I am not sure this is also a good design, specially that we have elf

>> destructors that provide a similar semantic.

> 

> It's what's in various quasi-standard documents, so we need to support

> it.

> 

>>> I think the bug is that our implementation of that behavior interferes

>>> with the execution order when there is no dlclose at all.  There should

>>> be a different way for fixing that.

>>>

>>> Maybe we can turn __cxa_finalize into a NOP and have the dynamic loader

>>> handle atexit handlers directly on dlclose (assuming that it's always

>>> called before the other ELF destructors)?

>>

>> We need __cxa_finalize because it is called by __do_global_dtors_aux

>> from initfini callbacks.

> 

> (Note: The comment is about crtbeginS.o built from libgcc sources.)

> 

> If all versions of the ELF constructor in crtbeginS.o call

> __cxa_finalize as their first action, then we can turn __cxa_finalize

> into a NOP and have the dynamic linker perform the original actions of

> __cxa_finalize directly, before running the ELF destructors for this

> object.


In fact I though about it and one issue I found is __cxa_finalize uses
__dso_handle and it is a hidden symbol for each DSO.  So, at _dl_fini
we will need to resolve __cxa_finalize from libc.so (or potentially
other function that handles the exit handlers), then resolve the
__dso_handle from the referenced object (with potentially a hack to
lookup for local objects), and finally call __cxa_finalize with
__dso_handler as argument. Not sure it is really a better design.


> 

>> Also making the loader itself handler atexit

>> is not the best option IMHO: we will either to call a private symbol on 

>> libc (which might require an additional lookup) or export the internal 

>> data so loader can implement the callback issuing itself.

> 

> Anything would be better than this pointless indirection through libgcc.

> We have accumlated lots of cruft in the crt* objects which serve no

> purpose at all for current binaries, and eventually, we will have to

> clean this up.

> 

> I think crtbegin*.o could completely go away if we turn __cxa_finalize

> into a NOP on the glibc side.


I think if __dso_handle was exported as a global symbol it could
simplify the indirection by a bit.

> 

>>> Or otherwise use flags to mark things, but in such a way that the

>>> handlers are processed in reverse atexit order, unless a dlclose

>>> happens.

>>

>> I think one possible option to still allow atexit handler being called

>> by atexit is to still register them as ef_at (using the new __atexit

>> symbol) and on __run_exit_handler first process et_at and them

>> ef_on and ef_cxa. This is still a semantic change of atexit regarding

>> the order of on_exit and __cxa_atexit, but we keep the Solaris like

>> one to call them on dlclose.

> 

> Sorry, can you rephrase?  There seem to be a few typos in this

> paragraph.

> 


One idea is to change the order atexit handlers are handled slightly:

  - atexit handlers are not registered through __cxa_atexit, a new
    symbol __atexit adds a new entry as ef_at (similar to my proposal).

  - __cxa_finalize is changed to handle ef_at similar to ef_cxa by
    calling the function pointer and setting it as ef_free. dlclose
    will issue atexit as expected.

  - __run_exit_handlers, used on both exit and quick_exit, will issue
    atexit handler (ef_at) first and then handle on_exit (ef_on), and
    __cxa_atexit (ef_cxa).  This keeps the expected order or atexit
    handlers, however the order regarding ef_cxa will change.

  - It will required some changes internally to handle atexit and
    __cxa_atexit differently regarding lock acquisition since before
    ef_cxa handler we can't get newly added ef_at ones.
Florian Weimer July 3, 2019, 10:05 a.m. UTC | #5
* Adhemerval Zanella:

>>> We need __cxa_finalize because it is called by __do_global_dtors_aux

>>> from initfini callbacks.

>> 

>> (Note: The comment is about crtbeginS.o built from libgcc sources.)

>> 

>> If all versions of the ELF constructor in crtbeginS.o call

>> __cxa_finalize as their first action, then we can turn __cxa_finalize

>> into a NOP and have the dynamic linker perform the original actions of

>> __cxa_finalize directly, before running the ELF destructors for this

>> object.

>

> In fact I though about it and one issue I found is __cxa_finalize uses

> __dso_handle and it is a hidden symbol for each DSO.  So, at _dl_fini

> we will need to resolve __cxa_finalize from libc.so (or potentially

> other function that handles the exit handlers), then resolve the

> __dso_handle from the referenced object (with potentially a hack to

> lookup for local objects), and finally call __cxa_finalize with

> __dso_handler as argument. Not sure it is really a better design.


So … what I was implying but didn't say explicitly is that at one point,
we will have to map (the address of) __dso_handle to a link map using
_dl_find_dso_for_object.

I need to submit my patch to speed up _dl_find_dso_for_object
substantially.  Then it shouldn't be a problem at all to call it in
response to each call to atexit, and put new entries on both the global
linked list and the per-DSO list.

Not sure to what extend we'd need to avoid quadratic-time algorithms
here.

>>> Also making the loader itself handler atexit

>>> is not the best option IMHO: we will either to call a private symbol on 

>>> libc (which might require an additional lookup) or export the internal 

>>> data so loader can implement the callback issuing itself.

>> 

>> Anything would be better than this pointless indirection through libgcc.

>> We have accumlated lots of cruft in the crt* objects which serve no

>> purpose at all for current binaries, and eventually, we will have to

>> clean this up.

>> 

>> I think crtbegin*.o could completely go away if we turn __cxa_finalize

>> into a NOP on the glibc side.

>

> I think if __dso_handle was exported as a global symbol it could

> simplify the indirection by a bit.


I don't see how this is possible.  It would trigger interposition, and
we need a unique address for each DSO, to identify it.

Now the __dso_handle interface is fairly broken and probably predates
the tight integration of libc and the dynamic linker.  We could easily
cache something that allows to quickly locate the actual linkmap in a
variable link __dso_handle, but we do not.

> One idea is to change the order atexit handlers are handled slightly:

>

>   - atexit handlers are not registered through __cxa_atexit, a new

>     symbol __atexit adds a new entry as ef_at (similar to my proposal).

>

>   - __cxa_finalize is changed to handle ef_at similar to ef_cxa by

>     calling the function pointer and setting it as ef_free. dlclose

>     will issue atexit as expected.

>

>   - __run_exit_handlers, used on both exit and quick_exit, will issue

>     atexit handler (ef_at) first and then handle on_exit (ef_on), and

>     __cxa_atexit (ef_cxa).  This keeps the expected order or atexit

>     handlers, however the order regarding ef_cxa will change.

>

>   - It will required some changes internally to handle atexit and

>     __cxa_atexit differently regarding lock acquisition since before

>     ef_cxa handler we can't get newly added ef_at ones.


I think you also need to do something on dlclose to deallocate atexit
handlers that have run, to avoid a memory leak.  Setting them to ef_free
is not sufficient.

Thanks,
Florian
Adhemerval Zanella July 3, 2019, 7:22 p.m. UTC | #6
On 03/07/2019 07:05, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>>>> We need __cxa_finalize because it is called by __do_global_dtors_aux

>>>> from initfini callbacks.

>>>

>>> (Note: The comment is about crtbeginS.o built from libgcc sources.)

>>>

>>> If all versions of the ELF constructor in crtbeginS.o call

>>> __cxa_finalize as their first action, then we can turn __cxa_finalize

>>> into a NOP and have the dynamic linker perform the original actions of

>>> __cxa_finalize directly, before running the ELF destructors for this

>>> object.

>>

>> In fact I though about it and one issue I found is __cxa_finalize uses

>> __dso_handle and it is a hidden symbol for each DSO.  So, at _dl_fini

>> we will need to resolve __cxa_finalize from libc.so (or potentially

>> other function that handles the exit handlers), then resolve the

>> __dso_handle from the referenced object (with potentially a hack to

>> lookup for local objects), and finally call __cxa_finalize with

>> __dso_handler as argument. Not sure it is really a better design.

> 

> So … what I was implying but didn't say explicitly is that at one point,

> we will have to map (the address of) __dso_handle to a link map using

> _dl_find_dso_for_object.

> 

> I need to submit my patch to speed up _dl_find_dso_for_object

> substantially.  Then it shouldn't be a problem at all to call it in

> response to each call to atexit, and put new entries on both the global

> linked list and the per-DSO list.

> 

> Not sure to what extend we'd need to avoid quadratic-time algorithms

> here.


For _dl_fini, we already have the link_map for object that calls 
__cxa_finalize. What we need is to filter out the exit handlers registered
by the object itself, so its calls only the functions registered by the 
shared library referenced by the link_map.

Not sure how easily we can accomplish it on exit handlers registration 
functions (the __dso_handler trick is to make this easier). 

> 

>>>> Also making the loader itself handler atexit

>>>> is not the best option IMHO: we will either to call a private symbol on 

>>>> libc (which might require an additional lookup) or export the internal 

>>>> data so loader can implement the callback issuing itself.

>>>

>>> Anything would be better than this pointless indirection through libgcc.

>>> We have accumlated lots of cruft in the crt* objects which serve no

>>> purpose at all for current binaries, and eventually, we will have to

>>> clean this up.

>>>

>>> I think crtbegin*.o could completely go away if we turn __cxa_finalize

>>> into a NOP on the glibc side.

>>

>> I think if __dso_handle was exported as a global symbol it could

>> simplify the indirection by a bit.

> 

> I don't see how this is possible.  It would trigger interposition, and

> we need a unique address for each DSO, to identify it.


My idea would be to resolve the symbol without the interposition and make
the loader itself setup the value.  But I did not think this through, most
likely this does not really help.

> 

> Now the __dso_handle interface is fairly broken and probably predates

> the tight integration of libc and the dynamic linker.  We could easily

> cache something that allows to quickly locate the actual linkmap in a

> variable link __dso_handle, but we do not.

> 

>> One idea is to change the order atexit handlers are handled slightly:

>>

>>   - atexit handlers are not registered through __cxa_atexit, a new

>>     symbol __atexit adds a new entry as ef_at (similar to my proposal).

>>

>>   - __cxa_finalize is changed to handle ef_at similar to ef_cxa by

>>     calling the function pointer and setting it as ef_free. dlclose

>>     will issue atexit as expected.

>>

>>   - __run_exit_handlers, used on both exit and quick_exit, will issue

>>     atexit handler (ef_at) first and then handle on_exit (ef_on), and

>>     __cxa_atexit (ef_cxa).  This keeps the expected order or atexit

>>     handlers, however the order regarding ef_cxa will change.

>>

>>   - It will required some changes internally to handle atexit and

>>     __cxa_atexit differently regarding lock acquisition since before

>>     ef_cxa handler we can't get newly added ef_at ones.

> 

> I think you also need to do something on dlclose to deallocate atexit

> handlers that have run, to avoid a memory leak.  Setting them to ef_free

> is not sufficient.


The stdlib/exit.c already handles that on the main loop. The main issue 
is we explicit support that all atexit handler (on_exit, __cxa_atexit, 
__cxa_at_quick_exit) can call any interface to add new handlers. 

The idea I am exploring is to change on how new handlers are included in
the exit handler lists, so on a default execution (without handler adding
new entries) atexit/on_exit are executed before __cxa_atexit. 

The change is to make exit_function_list have an associated type which
hold an only a specific set of handlers.  Currently we required just
two types, one for ef_at/ef_on handlers and one for ef_cxa. New inclusion 
are added in a ordered manner, where ef_at/ef_on are added on 
exit_function_list always before ef_cxa.  

Although on __cxa_finalize we will also need to explicit handle ef_at as
well (for dlclose), the idea is also to mimic the ef_cxa behaviour and set
the ef_at/ef_on free after its execution to avoid multiple executions 
(as a side note I think by not setting ef_on currently we might run on_exit 
handlers twice in some scenarios).

The main issue, which imho there is no solution but it also expected,
is if the any of exit handlers add new handlers itself.  The exit/
__cxa_finalize will just rewind itself.
Florian Weimer July 8, 2019, 11:16 a.m. UTC | #7
* Adhemerval Zanella:

> For _dl_fini, we already have the link_map for object that calls 

> __cxa_finalize. What we need is to filter out the exit handlers registered

> by the object itself, so its calls only the functions registered by the 

> shared library referenced by the link_map.

>

> Not sure how easily we can accomplish it on exit handlers registration 

> functions (the __dso_handler trick is to make this easier).


We can find the object that contains the address of __dso_handle, either
at registration time (perhaps better, to keep dlclose cost lower), or
during _dl_fini.


> The idea I am exploring is to change on how new handlers are included in

> the exit handler lists, so on a default execution (without handler adding

> new entries) atexit/on_exit are executed before __cxa_atexit.


Sorry, what do you mean?  That *handlers* registered using
atexit/on_exit are executed before those registered via __cxa_atexit?

> The change is to make exit_function_list have an associated type which

> hold an only a specific set of handlers.  Currently we required just

> two types, one for ef_at/ef_on handlers and one for ef_cxa. New inclusion 

> are added in a ordered manner, where ef_at/ef_on are added on 

> exit_function_list always before ef_cxa.  

>

> Although on __cxa_finalize we will also need to explicit handle ef_at as

> well (for dlclose), the idea is also to mimic the ef_cxa behaviour and set

> the ef_at/ef_on free after its execution to avoid multiple executions 

> (as a side note I think by not setting ef_on currently we might run on_exit 

> handlers twice in some scenarios).


For dlclose, we also need to perform real deallocation at a certain
point, to avoid memory leaks.

Have you considered a design which puts the same handler entry on two
different lists?

Thanks,
Florian
Adhemerval Zanella July 8, 2019, 1:39 p.m. UTC | #8
On 08/07/2019 08:16, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>> For _dl_fini, we already have the link_map for object that calls 

>> __cxa_finalize. What we need is to filter out the exit handlers registered

>> by the object itself, so its calls only the functions registered by the 

>> shared library referenced by the link_map.

>>

>> Not sure how easily we can accomplish it on exit handlers registration 

>> functions (the __dso_handler trick is to make this easier).

> 

> We can find the object that contains the address of __dso_handle, either

> at registration time (perhaps better, to keep dlclose cost lower), or

> during _dl_fini.


We currently have the following:

  (libdl) dlclose:
          \_ (ld.so) _dl_close_worker
		     - run dt_fini_array
                       \_ (libfoo.so) __do_global_dtors_aux
                                      \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)

What you are suggesting, if I understood correctly is:

  (libdl) dlclose:
          \_ (ld.so) _dl_close_worker 
                     \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)

So what we need to get on _dl_close_worked is the (libfoo.so) __dso_handle
value. I do think it is feasible, we can  add a new field on the link_map to 
hold its value or find it on _dl_close_worker. We will need to handle some 
caveats, as to add symbol resolution for local symbols and handle libraries 
that does not have __do_global_dtors_aux (and thus no __dso_handle).

However I do think we can fix it by tune how to organize the atexit internal
lists.

> 

> 

>> The idea I am exploring is to change on how new handlers are included in

>> the exit handler lists, so on a default execution (without handler adding

>> new entries) atexit/on_exit are executed before __cxa_atexit.

> 

> Sorry, what do you mean?  That *handlers* registered using

> atexit/on_exit are executed before those registered via __cxa_atexit?


My idea is to organize the internal list to keep the atexit/on_exit handler
before __cxa_atexit, so on both __cxa_finalize and exit/quick_exit they will
be executed first.

> 

>> The change is to make exit_function_list have an associated type which

>> hold an only a specific set of handlers.  Currently we required just

>> two types, one for ef_at/ef_on handlers and one for ef_cxa. New inclusion 

>> are added in a ordered manner, where ef_at/ef_on are added on 

>> exit_function_list always before ef_cxa.  

>>

>> Although on __cxa_finalize we will also need to explicit handle ef_at as

>> well (for dlclose), the idea is also to mimic the ef_cxa behaviour and set

>> the ef_at/ef_on free after its execution to avoid multiple executions 

>> (as a side note I think by not setting ef_on currently we might run on_exit 

>> handlers twice in some scenarios).

> 

> For dlclose, we also need to perform real deallocation at a certain

> point, to avoid memory leaks.

> 

> Have you considered a design which puts the same handler entry on two

> different lists?


Yes, the problem is we need to handle multiple constraints:

  - Any function that register a new exit handle may be called from another
    one. For instance atexit my call atexit or __cxa_atexit may call atexit
    (or any combination).

  - The internal lists follow the GNU principle of adding no limit, so we
    support the minimal POSIX states using a static allocated initial list
    (so atexit can't fail due malloc failure), and we add a linked list
    to make it grow as requested.

  - We need to handle BZ#14333, so we need to stop new registered function
    once we are over iterating over the list. I don't think it would be
    possible with two lists, specially for the case where one callback
    can modify the another list.
Florian Weimer July 8, 2019, 1:57 p.m. UTC | #9
* Adhemerval Zanella:

> On 08/07/2019 08:16, Florian Weimer wrote:

>> * Adhemerval Zanella:

>> 

>>> For _dl_fini, we already have the link_map for object that calls 

>>> __cxa_finalize. What we need is to filter out the exit handlers registered

>>> by the object itself, so its calls only the functions registered by the 

>>> shared library referenced by the link_map.

>>>

>>> Not sure how easily we can accomplish it on exit handlers registration 

>>> functions (the __dso_handler trick is to make this easier).

>> 

>> We can find the object that contains the address of __dso_handle, either

>> at registration time (perhaps better, to keep dlclose cost lower), or

>> during _dl_fini.

>

> We currently have the following:

>

>   (libdl) dlclose:

>           \_ (ld.so) _dl_close_worker

> 		     - run dt_fini_array

>                        \_ (libfoo.so) __do_global_dtors_aux

>                                       \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)

>

> What you are suggesting, if I understood correctly is:

>

>   (libdl) dlclose:

>           \_ (ld.so) _dl_close_worker 

>                      \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)


Yes.

> So what we need to get on _dl_close_worked is the (libfoo.so)

> __dso_handle value.


Or we call the internal equivalent of dladdr on the supplied
__dso_handle pointer when the handler is registered.  I expect that will
result in a slightly smoother execution.

> We will need to handle some caveats, as to add symbol resolution for

> local symbols and handle libraries that does not have

> __do_global_dtors_aux (and thus no __dso_handle).


Then they rely on ELF constructors exclusively and there is nothing to
do?  (And there could be __dso_handle present for other reasons, but we
would not necessarily know about it, given that it's not a dynamic
symbol, and cannot be.)

> I do think it is feasible, we can  add a new field on the link_map to 

> hold its value or find it on _dl_close_worker.


I don't think __dso_handle is guaranteed to be unique per object. 8-(
So we need to be a bit careful there.

>> Have you considered a design which puts the same handler entry on two

>> different lists?

>

> Yes, the problem is we need to handle multiple constraints:

>

>   - Any function that register a new exit handle may be called from another

>     one. For instance atexit my call atexit or __cxa_atexit may call atexit

>     (or any combination).

>

>   - The internal lists follow the GNU principle of adding no limit, so we

>     support the minimal POSIX states using a static allocated initial list

>     (so atexit can't fail due malloc failure), and we add a linked list

>     to make it grow as requested.

>

>   - We need to handle BZ#14333, so we need to stop new registered function

>     once we are over iterating over the list. I don't think it would be

>     possible with two lists, specially for the case where one callback

>     can modify the another list.


It should be possible to have the entry in multiple lists if we add
reference counters.
Adhemerval Zanella July 11, 2019, 7:09 p.m. UTC | #10
On 08/07/2019 10:57, Florian Weimer wrote:
> * Adhemerval Zanella:

> 

>> On 08/07/2019 08:16, Florian Weimer wrote:

>>> * Adhemerval Zanella:

>>>

>>>> For _dl_fini, we already have the link_map for object that calls 

>>>> __cxa_finalize. What we need is to filter out the exit handlers registered

>>>> by the object itself, so its calls only the functions registered by the 

>>>> shared library referenced by the link_map.

>>>>

>>>> Not sure how easily we can accomplish it on exit handlers registration 

>>>> functions (the __dso_handler trick is to make this easier).

>>>

>>> We can find the object that contains the address of __dso_handle, either

>>> at registration time (perhaps better, to keep dlclose cost lower), or

>>> during _dl_fini.

>>

>> We currently have the following:

>>

>>   (libdl) dlclose:

>>           \_ (ld.so) _dl_close_worker

>> 		     - run dt_fini_array

>>                        \_ (libfoo.so) __do_global_dtors_aux

>>                                       \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)

>>

>> What you are suggesting, if I understood correctly is:

>>

>>   (libdl) dlclose:

>>           \_ (ld.so) _dl_close_worker 

>>                      \_ (libc.so) __cxa_finalize ((libfoo.so) __dso_handle)

> 

> Yes.

> 

>> So what we need to get on _dl_close_worked is the (libfoo.so)

>> __dso_handle value.

> 

> Or we call the internal equivalent of dladdr on the supplied

> __dso_handle pointer when the handler is registered.  I expect that will

> result in a slightly smoother execution.

> 

>> We will need to handle some caveats, as to add symbol resolution for

>> local symbols and handle libraries that does not have

>> __do_global_dtors_aux (and thus no __dso_handle).

> 

> Then they rely on ELF constructors exclusively and there is nothing to

> do?  (And there could be __dso_handle present for other reasons, but we

> would not necessarily know about it, given that it's not a dynamic

> symbol, and cannot be.)


I will need to check in which cases __do_global_dtors_aux and thus 
__cxa_finalize is pulled on libgcc.a.

> 

>> I do think it is feasible, we can  add a new field on the link_map to 

>> hold its value or find it on _dl_close_worker.

> 

> I don't think __dso_handle is guaranteed to be unique per object. 8-(

> So we need to be a bit careful there.


My understanding is having multiple values might be problematic, the linker
order might resulting multiple values used in atexit or at_quick_exit.

> 

>>> Have you considered a design which puts the same handler entry on two

>>> different lists?

>>

>> Yes, the problem is we need to handle multiple constraints:

>>

>>   - Any function that register a new exit handle may be called from another

>>     one. For instance atexit my call atexit or __cxa_atexit may call atexit

>>     (or any combination).

>>

>>   - The internal lists follow the GNU principle of adding no limit, so we

>>     support the minimal POSIX states using a static allocated initial list

>>     (so atexit can't fail due malloc failure), and we add a linked list

>>     to make it grow as requested.

>>

>>   - We need to handle BZ#14333, so we need to stop new registered function

>>     once we are over iterating over the list. I don't think it would be

>>     possible with two lists, specially for the case where one callback

>>     can modify the another list.

> 

> It should be possible to have the entry in multiple lists if we add

> reference counters.


I don't think reference counters would help here: atexit can theoretically
call __cxa_atexit and vice-versa, so we will need to iterate over two lists
using the same logic.

I still think it would be simple to change how the internal exit handle list
is organized. I have implemented it on a personal branch [1], along with a
tests that shows a wrong atexit order.

The changes on how atexit is handled are:
    
  1. Add the __atexit symbol which is linked as __cxa_finalize in
     static mode (so __dso_handle is correctly set).  The __atexit
     symbol adds an ef_at exit_function entry on __exit_funcs,
     different than an ef_cxa one from __cxa_atexit.
    
     Old binaries would still call __cxa_atexit, so we do not actually
     need to add a compat symbol.
    
  2. Make __cxa_finalize to handle ef_at as well, similar to ef_cxa.
  
  3. Change how the internal exit handler are organized, so ef_at
     and ef_on handlers (registered by atexit and on_exit) are executed
     before ef_cxa (registered by __cxa_atexit).
    
     Each entry set (struct exit_function_list) has on type associated
     (el_at or el_cxa) to represent the internal handle it contains.
     New insertions (done by the __atexit, __cxa_atexit, etc.) keep the
     node orders, with following constraints:
    
     3.1. el_at nodes should be prior el_cxa.
     3.2. el_at should contain only ef_at, ef_on, or ef_free elements.
     3.3. el_cxa should contain only ef_cxa or ef_free elements.
     3.4. new insertions on each node type should be be kept in lifo order.
     3.5. the original first element should be last one (since it is static
          allocated and 'exit' will deallocated the nodes in order.
    
     So the execution on both __cxa_finalize, exit, or quick_exit will
     iterate over the list by executing first atexit/on_exit handlers and
     then __cxa_atexit ones.  New handlers added by registered functions
     are handled as before, by using the ef_free entry and reseting the
     list iteration


Nat!, if you are still interested you can check on this branch. It should
fix the issues you brought on libc-help.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/atexit-order
diff mbox series

Patch

diff --git a/dlfcn/dlclose.c b/dlfcn/dlclose.c
index 98e1375c43..c9a1e643f3 100644
--- a/dlfcn/dlclose.c
+++ b/dlfcn/dlclose.c
@@ -43,7 +43,9 @@  __dlclose (void *handle)
     return _dlfcn_hook->dlclose (handle);
 # endif
 
-  return _dlerror_run (dlclose_doit, handle) ? -1 : 0;
+  int r = _dlerror_run (dlclose_doit, handle) ? -1 : 0;
+  __atexit_cleanup ();
+  return r;
 }
 # ifdef SHARED
 strong_alias (__dlclose, dlclose)
diff --git a/dlfcn/modatexit.c b/dlfcn/modatexit.c
index 118ccf5459..8b16e0ba8f 100644
--- a/dlfcn/modatexit.c
+++ b/dlfcn/modatexit.c
@@ -21,11 +21,14 @@ 
 int global;
 int *ip;
 
-extern void dummy (void);
-extern void foo (void *p);
+/* glibc function for registering DSO-specific unload functions.  */
+extern int __cxa_atexit (void (*func) (void *), void *arg, void *d);
+
+/* Hidden compiler handle to this shared object.  */
+extern void *__dso_handle __attribute__ ((__weak__));
 
 void
-dummy (void)
+dummy (void *arg)
 {
   printf ("This is %s\n", __FUNCTION__);
   *ip = global = 1;
@@ -36,6 +39,6 @@  void
 foo (void *p)
 {
   printf ("This is %s\n", __FUNCTION__);
-  atexit (dummy);
+  __cxa_atexit (dummy, NULL, __dso_handle);
   ip = p;
 }
diff --git a/dlfcn/tstatexit.c b/dlfcn/tstatexit.c
index c8cb272ce0..1f6037cb47 100644
--- a/dlfcn/tstatexit.c
+++ b/dlfcn/tstatexit.c
@@ -19,6 +19,8 @@ 
 #include <stdio.h>
 #include <stdlib.h>
 
+#include <support/xdlfcn.h>
+#include <support/check.h>
 
 int
 main (void)
@@ -28,35 +30,15 @@  main (void)
   void (*fp) (void *);
   int v = 0;
 
-  h = dlopen (fname, RTLD_NOW);
-  if (h == NULL)
-    {
-      printf ("cannot open \"%s\": %s\n", fname, dlerror ());
-      exit (1);
-    }
+  h = xdlopen (fname, RTLD_NOW);
 
-  fp = dlsym (h, "foo");
-  if (fp == NULL)
-    {
-      printf ("cannot find \"foo\": %s\n", dlerror ());
-      exit (1);
-    }
+  fp = xdlsym (h, "foo");
 
   fp (&v);
 
-  if (dlclose (h) != 0)
-    {
-      printf ("cannot close \"%s\": %s\n", fname, dlerror ());
-      exit (1);
-    }
+  xdlclose (h);
 
-  if (v != 1)
-    {
-      puts ("module unload didn't change `v'");
-      exit (1);
-    }
-
-  puts ("finishing now");
+  TEST_COMPARE (v, 1);
 
   return 0;
 }
diff --git a/include/stdlib.h b/include/stdlib.h
index 114e12d255..3bf47ea49d 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -103,6 +103,9 @@  extern int __on_exit (void (*__func) (int __status, void *__arg), void *__arg);
 extern int __cxa_atexit (void (*func) (void *), void *arg, void *d);
 libc_hidden_proto (__cxa_atexit);
 
+extern int __atexit (void (*func) (void), void *);
+extern void __atexit_cleanup (void);
+
 extern int __cxa_thread_atexit_impl (void (*func) (void *), void *arg,
 				     void *d);
 extern void __call_tls_dtors (void)
diff --git a/stdlib/Versions b/stdlib/Versions
index 9e665d4c26..9a9d911621 100644
--- a/stdlib/Versions
+++ b/stdlib/Versions
@@ -136,6 +136,9 @@  libc {
     strtof32; strtof64; strtof32x;
     strtof32_l; strtof64_l; strtof32x_l;
   }
+  GLIBC_2.29 {
+    __atexit;
+  }
   GLIBC_PRIVATE {
     # functions which have an additional interface since they are
     # are cancelable.
@@ -146,5 +149,6 @@  libc {
     __libc_secure_getenv;
     __call_tls_dtors;
     __strtof_nan; __strtod_nan; __strtold_nan;
+    __atexit_cleanup;
   }
 }
diff --git a/stdlib/atexit.c b/stdlib/atexit.c
index 5671a43926..de8570e3bf 100644
--- a/stdlib/atexit.c
+++ b/stdlib/atexit.c
@@ -43,5 +43,5 @@  attribute_hidden
 #endif
 atexit (void (*func) (void))
 {
-  return __cxa_atexit ((void (*) (void *)) func, NULL, __dso_handle);
+  return __atexit (func, __dso_handle);
 }
diff --git a/stdlib/cxa_atexit.c b/stdlib/cxa_atexit.c
index 5a39778959..88ee811ac3 100644
--- a/stdlib/cxa_atexit.c
+++ b/stdlib/cxa_atexit.c
@@ -23,7 +23,6 @@ 
 #include "exit.h"
 #include <sysdep.h>
 
-#undef __cxa_atexit
 
 /* See concurrency notes in stdlib/exit.h where this lock is declared.  */
 __libc_lock_define_initialized (, __exit_funcs_lock)
@@ -71,6 +70,33 @@  __cxa_atexit (void (*func) (void *), void *arg, void *d)
 }
 libc_hidden_def (__cxa_atexit)
 
+int
+__atexit (void (*func) (void), void *dso_handle)
+{
+  struct exit_function *new;
+
+  /* As a QoI issue we detect NULL early with an assertion instead
+     of a SIGSEGV at program exit when the handler is run (bug 20544).  */
+  assert (func != NULL);
+
+  __libc_lock_lock (__exit_funcs_lock);
+  new = __new_exitfn (&__exit_funcs);
+
+  if (new == NULL)
+    {
+      __libc_lock_unlock (__exit_funcs_lock);
+      return -1;
+    }
+
+#ifdef PTR_MANGLE
+  PTR_MANGLE (func);
+#endif
+  new->func.at.fn = func;
+  new->func.at.dso_handle = dso_handle;
+  new->flavor = ef_at;
+  __libc_lock_unlock (__exit_funcs_lock);
+  return 0;
+}
 
 static struct exit_function_list initial;
 struct exit_function_list *__exit_funcs = &initial;
diff --git a/stdlib/cxa_finalize.c b/stdlib/cxa_finalize.c
index e31b23467c..aeb364eb9b 100644
--- a/stdlib/cxa_finalize.c
+++ b/stdlib/cxa_finalize.c
@@ -22,6 +22,60 @@ 
 #include <sysdep.h>
 #include <stdint.h>
 
+static bool
+handle_cxa (struct exit_function *f)
+{
+  const uint64_t check = __new_exitfn_called;
+
+  void (*cxafn) (void *arg, int status) = f->func.cxa.fn;
+  void *cxaarg = f->func.cxa.arg;
+
+  /* We don't want to run this cleanup more than once.  The Itanium
+     C++ ABI requires that multiple calls to __cxa_finalize not
+     result in calling termination functions more than once.  One
+     potential scenario where that could happen is with a concurrent
+     dlclose and exit, where the running dlclose must at some point
+     release the list lock, an exiting thread may acquire it, and
+     without setting flavor to ef_free, might re-run this destructor
+     which could result in undefined behaviour.  Therefore we must
+     set flavor to ef_free to avoid calling this destructor again.
+     Note that the concurrent exit must also take the dynamic loader
+     lock (for library finalizer processing) and therefore will
+     block while dlclose completes the processing of any in-progress
+     exit functions. Lastly, once we release the list lock for the
+     entry marked ef_free, we must not read from that entry again
+     since it may have been reused by the time we take the list lock
+     again.  Lastly the detection of new registered exit functions is
+     based on a monotonically incrementing counter, and there is an
+     ABA if between the unlock to run the exit function and the
+     re-lock after completion the user registers 2^64 exit functions,
+     the implementation will not detect this and continue without
+     executing any more functions.
+
+     One minor issue remains: A registered exit function that is in
+     progress by a call to dlclose() may not completely finish before
+     the next registered exit function is run. This may, according to
+     some readings of POSIX violate the requirement that functions
+     run in effective LIFO order.  This should probably be fixed in a
+     future implementation to ensure the functions do not run in
+     parallel.  */
+  f->flavor = ef_free;
+
+#ifdef PTR_DEMANGLE
+  PTR_DEMANGLE (cxafn);
+#endif
+  /* Unlock the list while we call a foreign function.  */
+  __libc_lock_unlock (__exit_funcs_lock);
+  cxafn (cxaarg, 0);
+  __libc_lock_lock (__exit_funcs_lock);
+
+  /* It is possible that that last exit function registered
+     more exit functions.  Start the loop over.  */
+  if (__glibc_unlikely (check != __new_exitfn_called))
+    return true;
+  return false;
+}
+
 /* If D is non-NULL, call all functions registered with `__cxa_atexit'
    with the same dso handle.  Otherwise, if D is NULL, call all of the
    registered handlers.  */
@@ -35,59 +89,19 @@  __cxa_finalize (void *d)
  restart:
   for (funcs = __exit_funcs; funcs; funcs = funcs->next)
     {
-      struct exit_function *f;
-
-      for (f = &funcs->fns[funcs->idx - 1]; f >= &funcs->fns[0]; --f)
-	if ((d == NULL || d == f->func.cxa.dso_handle) && f->flavor == ef_cxa)
-	  {
-	    const uint64_t check = __new_exitfn_called;
-	    void (*cxafn) (void *arg, int status) = f->func.cxa.fn;
-	    void *cxaarg = f->func.cxa.arg;
-
-	    /* We don't want to run this cleanup more than once.  The Itanium
-	       C++ ABI requires that multiple calls to __cxa_finalize not
-	       result in calling termination functions more than once.  One
-	       potential scenario where that could happen is with a concurrent
-	       dlclose and exit, where the running dlclose must at some point
-	       release the list lock, an exiting thread may acquire it, and
-	       without setting flavor to ef_free, might re-run this destructor
-	       which could result in undefined behaviour.  Therefore we must
-	       set flavor to ef_free to avoid calling this destructor again.
-	       Note that the concurrent exit must also take the dynamic loader
-	       lock (for library finalizer processing) and therefore will
-	       block while dlclose completes the processing of any in-progress
-	       exit functions. Lastly, once we release the list lock for the
-	       entry marked ef_free, we must not read from that entry again
-	       since it may have been reused by the time we take the list lock
-	       again.  Lastly the detection of new registered exit functions is
-	       based on a monotonically incrementing counter, and there is an
-	       ABA if between the unlock to run the exit function and the
-	       re-lock after completion the user registers 2^64 exit functions,
-	       the implementation will not detect this and continue without
-	       executing any more functions.
-
-	       One minor issue remains: A registered exit function that is in
-	       progress by a call to dlclose() may not completely finish before
-	       the next registered exit function is run. This may, according to
-	       some readings of POSIX violate the requirement that functions
-	       run in effective LIFO order.  This should probably be fixed in a
-	       future implementation to ensure the functions do not run in
-	       parallel.  */
-	    f->flavor = ef_free;
-
-#ifdef PTR_DEMANGLE
-	    PTR_DEMANGLE (cxafn);
-#endif
-	    /* Unlock the list while we call a foreign function.  */
-	    __libc_lock_unlock (__exit_funcs_lock);
-	    cxafn (cxaarg, 0);
-	    __libc_lock_lock (__exit_funcs_lock);
-
-	    /* It is possible that that last exit function registered
-	       more exit functions.  Start the loop over.  */
-	    if (__glibc_unlikely (check != __new_exitfn_called))
+      for (struct exit_function *f = &funcs->fns[funcs->idx - 1];
+	   f >= &funcs->fns[0]; --f)
+	{
+	  if ((d == NULL || d == f->func.cxa.dso_handle)
+	      && f->flavor == ef_cxa)
+	    if (handle_cxa (f))
 	      goto restart;
-	  }
+
+	  /* Mark ef_at_exit handlers are handled by __cxa_finalize.  */
+	  if ((d == NULL || d == f->func.at.dso_handle)
+	      && f->flavor == ef_at)
+	    f->flavor = ef_at_finalize;
+	}
     }
 
   /* Also remove the quick_exit handlers, but do not call them.  */
@@ -108,3 +122,18 @@  __cxa_finalize (void *d)
 #endif
   __libc_lock_unlock (__exit_funcs_lock);
 }
+
+void
+__atexit_cleanup (void)
+{
+  __libc_lock_lock (__exit_funcs_lock);
+  for (struct exit_function_list * funcs = __exit_funcs; funcs != NULL;
+       funcs = funcs->next)
+    {
+      for (struct exit_function *f = &funcs->fns[funcs->idx - 1];
+	   f >= &funcs->fns[0]; --f)
+	if (f->flavor == ef_at_finalize)
+	  f->flavor = ef_free;
+    }
+  __libc_lock_unlock (__exit_funcs_lock);
+}
diff --git a/stdlib/exit.c b/stdlib/exit.c
index 49d12d9952..8352760fae 100644
--- a/stdlib/exit.c
+++ b/stdlib/exit.c
@@ -91,7 +91,8 @@  __run_exit_handlers (int status, struct exit_function_list **listp,
 	      onfct (status, f->func.on.arg);
 	      break;
 	    case ef_at:
-	      atfct = f->func.at;
+	    case ef_at_finalize:
+	      atfct = f->func.at.fn;
 #ifdef PTR_DEMANGLE
 	      PTR_DEMANGLE (atfct);
 #endif
diff --git a/stdlib/exit.h b/stdlib/exit.h
index 6fab49c94f..a1d4860206 100644
--- a/stdlib/exit.h
+++ b/stdlib/exit.h
@@ -28,6 +28,7 @@  enum
   ef_us,
   ef_on,
   ef_at,
+  ef_at_finalize,
   ef_cxa
 };
 
@@ -38,7 +39,11 @@  struct exit_function
     long int flavor;
     union
       {
-	void (*at) (void);
+	struct
+	  {
+	    void (*fn) (void);
+	    void *dso_handle;
+	  } at;
 	struct
 	  {
 	    void (*fn) (int status, void *arg);
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 59f85d9373..699fd105de 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -1895,6 +1895,7 @@  GLIBC_2.28 thrd_current F
 GLIBC_2.28 thrd_equal F
 GLIBC_2.28 thrd_sleep F
 GLIBC_2.28 thrd_yield F
+GLIBC_2.29 __atexit F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F