Breaking the crt1.o -> atexit() -> malloc() dependency

Fri Mar 7 22:08:26 PST 2008

On Thu, 6 Mar 2008, Kostik Belousov wrote:

> On Wed, Mar 05, 2008 at 05:12:32PM -0800, Tim Kientzle wrote:
>> There was some recent discussion on the commit mailing
>> list about how to disentangle crt1.o from malloc().
>>
>> Here's a design that I think addresses all of the
>> issues people raised, including the POSIX requirement
>> that atexit() always be able to support 32 registrations.
>> It does it without using sbrk() or mmap(), either.
>>
>> The basic idea is to lift the malloc() call up into
>> atexit() and have atexit_register() use statically-allocated
>> storage if atexit() didn't provide dynamically-allocated
>> storage.
>> ...
>> /* 32 required by POSIX plus a few for crt1.o */
>> static struct atexit pool[40];

Could it use a few for crt1 only, with dynamic allocation for everything
except crt1 and maybe stdio?  This might simplify the frees.

I don't agree with the argument that static allocation is needed or useful
for satisfying the requirement for 32 atexits to succeed.  malloc() can't
fail :-), and if it does then you have worse problems than atexit failures
to handle.
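
To make the split concrete, here is a rough sketch of an allocating front end on top of a registration routine that never allocates, with only a few static slots reserved for startup code.  Every name in it (pool_register, pool_atexit, POOL_SLOTS and so on) is invented for illustration and the slot count is arbitrary; it is not the code from the proposal or from libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pool_entry {
	struct pool_entry *next;
	void (*fn)(void);
};

#define POOL_SLOTS 4			/* arbitrary: "a few for crt1" */
static struct pool_entry pool_static[POOL_SLOTS];
static size_t pool_used;
static struct pool_entry *pool_head;	/* newest registration first */

/* Low level: uses the caller's storage, or a static slot if given NULL.
 * Never calls malloc(). */
int
pool_register(void (*fn)(void), struct pool_entry *storage)
{
	if (storage == NULL) {
		if (pool_used == POOL_SLOTS)
			return (-1);
		storage = &pool_static[pool_used++];
	}
	storage->fn = fn;
	storage->next = pool_head;
	pool_head = storage;
	return (0);
}

/* High level: the only function here that depends on malloc(). */
int
pool_atexit(void (*fn)(void))
{
	struct pool_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return (-1);
	return (pool_register(fn, e));
}

/* Exit time: call handlers newest first.  Nothing is freed. */
void
pool_run(void)
{
	while (pool_head != NULL) {
		struct pool_entry *e = pool_head;

		pool_head = e->next;
		e->fn();
	}
}

/* Demonstration only. */
static int pool_trace[4];
static int pool_ntrace;
static void pool_h1(void) { pool_trace[pool_ntrace++] = 1; }
static void pool_h2(void) { pool_trace[pool_ntrace++] = 2; }

int
pool_demo(void)
{
	if (pool_register(pool_h1, NULL) != 0)	/* as startup code would */
		return (-1);
	if (pool_atexit(pool_h2) != 0)		/* as an ordinary caller would */
		return (-1);
	pool_run();
	/* h2 was registered last, so it runs first. */
	assert(pool_trace[0] == 2 && pool_trace[1] == 1);
	return (pool_ntrace);
}
```

In this sketch only pool_atexit() pulls in malloc(), so a caller that uses pool_register(f, NULL) alone does not.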

>> Avoiding free() from the low-level code is a little trickier
>> but I think it can be done by having the low-level code
>> put (dynamically-allocated) blocks back onto a free list
>> and having the higher-level atexit() release that list
>> on the next registration.  This should handle the case
>> of a dynamic library being repeatedly loaded and unloaded.
>> Of course, it's unnecessary to release the atexit storage
>> on program exit.
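
For reference, the free-list scheme described in the quoted text might be sketched as follows.  This is only one reading of the description: the names (fl_finalize, fl_atexit, the owner tag) are hypothetical, and the counter exists only so the demonstration can show when free() is actually called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fl_entry {
	struct fl_entry *next;
	void (*fn)(void);
	void *owner;		/* hypothetical tag for the registering library */
};

static struct fl_entry *fl_active;	/* registered handlers, newest first */
static struct fl_entry *fl_retired;	/* used entries waiting to be freed */
static int fl_freed;			/* demonstration only */

/* Low level: run and unlink one owner's handlers.  Never calls free();
 * finished entries are parked on the retired list instead. */
void
fl_finalize(void *owner)
{
	struct fl_entry **pp = &fl_active;

	while (*pp != NULL) {
		struct fl_entry *e = *pp;

		if (e->owner == owner) {
			*pp = e->next;
			e->fn();
			e->next = fl_retired;
			fl_retired = e;
		} else
			pp = &e->next;
	}
}

/* High level: the only code that calls malloc() or free().  It releases
 * whatever the low-level code retired since the last registration. */
int
fl_atexit(void (*fn)(void), void *owner)
{
	struct fl_entry *e;

	while ((e = fl_retired) != NULL) {
		fl_retired = e->next;
		free(e);
		fl_freed++;
	}
	e = malloc(sizeof(*e));
	if (e == NULL)
		return (-1);
	e->fn = fn;
	e->owner = owner;
	e->next = fl_active;
	fl_active = e;
	return (0);
}

/* Demonstration: "load" and "unload" the same library three times. */
static int fl_calls;
static void fl_handler(void) { fl_calls++; }

int
fl_demo(void)
{
	int lib;	/* its address stands in for a library handle */
	int i;

	for (i = 0; i < 3; i++) {
		if (fl_atexit(fl_handler, &lib) != 0)
			return (-1);
		fl_finalize(&lib);
	}
	assert(fl_calls == 3);
	return (fl_freed);
}
```

After three load/unload cycles two entries have been freed and one is still parked on the retired list, which matches the remark that nothing needs to be released at program exit.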

With separate storage for crt1, everything for crt1 except the
calls to the registered functions could be independent of atexit()
- just call the entries in the separate storage last at exit time.
stdio's rotting __cleanup hook works like this.  __cleanup's
reason for existence is to provide an atexit-like hook for stdio
without the full bloat of atexit, but this is defeated by always
calling atexit() from crt1.  This hook costs 1 pointer and one
statement in exit() when it is not used.  exit() still calls
__cleanup last (iff __cleanup is not null).  Thus __cleanup
effectively extends the static atexit table by 1 entry (the
first one).
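
A minimal sketch of how such a hook sits beside the atexit table, assuming a simple fixed-size table.  The names (hook_cleanup, hook_table_add, hook_exit_sequence) are stand-ins invented here, not the identifiers in exit.c, and the real exit() of course ends in _exit().

```c
#include <assert.h>
#include <stddef.h>

#define HOOK_TABLE_MAX 32

/* Stand-in for the __cleanup pointer: one pointer of static storage. */
static void (*hook_cleanup)(void);

static void (*hook_table[HOOK_TABLE_MAX])(void);
static int hook_table_len;

int
hook_table_add(void (*fn)(void))
{
	if (hook_table_len == HOOK_TABLE_MAX)
		return (-1);
	hook_table[hook_table_len++] = fn;
	return (0);
}

/* What exit() does before _exit(): the registered handlers in reverse
 * order, then the hook.  The hook costs one test when it is unused. */
void
hook_exit_sequence(void)
{
	while (hook_table_len > 0)
		hook_table[--hook_table_len]();
	if (hook_cleanup != NULL)
		hook_cleanup();
}

/* Demonstration only. */
static char hook_log[8];
static int hook_log_len;
static void hook_a(void) { hook_log[hook_log_len++] = 'a'; }
static void hook_b(void) { hook_log[hook_log_len++] = 'b'; }
static void hook_flush(void) { hook_log[hook_log_len++] = 'F'; }

int
hook_demo(void)
{
	hook_cleanup = hook_flush;	/* a plain assignment, no atexit() */
	hook_table_add(hook_a);
	hook_table_add(hook_b);
	hook_exit_sequence();
	/* b and a run in reverse registration order, the hook runs last. */
	assert(hook_log[0] == 'b' && hook_log[1] == 'a' && hook_log[2] == 'F');
	return (hook_log_len);
}
```

Setting the hook is just a pointer store, so code that uses it needs nothing from atexit() or malloc().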

>> In particular, crt1.o can then call atexit_register(f, NULL)
>> to register its exit functions without creating a dependency on
>> malloc.

Or it could do __cleanupN = functionN for a few small values of N
like stdio does for __cleanup.  Then it wouldn't have any dependency
on atexit either, but the ugliness in exit.c for __cleanup would need
to be duplicated for each __cleanupN.  At most 3 values of N need to
be supported (same for all arches I think):

 	for function cleanup = get_rtld_cleanup();	/* dynamic only */
 	for function _mcleanup				/* profiling only */
 	for function _fini				/* always */

Better, make all these atexit calls implicit.  The conditions for them
don't depend on the startup code, so __cxa_finalize() can call them
directly (except it needs a pointer for get_rtld_cleanup()).  __cxa_finalize
can also handle __cleanup (move the call through __cleanup from exit.c
to atexit.c).
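
Roughly, the finalizer would then look something like the sketch below.  All of the names are hypothetical function pointers standing in for the real symbols, and the order of the implicit calls is an assumption: the reverse of the order listed above, as if they had been registered with atexit() in that order, with the stdio hook last.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real cleanup functions. */
static void (*fin_rtld_cleanup)(void);	/* non-null for dynamic only */
static void (*fin_mcleanup)(void);	/* non-null for profiling only */
static void (*fin_fini)(void);		/* always set */
static void (*fin_stdio_cleanup)(void);	/* set once stdio is used */

static void (*fin_table[32])(void);
static int fin_count;

int
fin_atexit(void (*fn)(void))
{
	if (fin_count == 32)
		return (-1);
	fin_table[fin_count++] = fn;
	return (0);
}

/* Explicitly registered handlers first, then the implicit ones.  None of
 * the implicit calls needs a table entry or a registration at startup. */
void
fin_finalize(void)
{
	while (fin_count > 0)
		fin_table[--fin_count]();
	if (fin_fini != NULL)
		fin_fini();
	if (fin_mcleanup != NULL)
		fin_mcleanup();
	if (fin_rtld_cleanup != NULL)
		fin_rtld_cleanup();
	if (fin_stdio_cleanup != NULL)
		fin_stdio_cleanup();
}

/* Demonstration only: a dynamic, non-profiled program that uses stdio. */
static char fin_log[8];
static int fin_log_len;
static void fin_user(void) { fin_log[fin_log_len++] = 'u'; }
static void fin_f(void) { fin_log[fin_log_len++] = 'f'; }
static void fin_r(void) { fin_log[fin_log_len++] = 'r'; }
static void fin_s(void) { fin_log[fin_log_len++] = 's'; }

int
fin_demo(void)
{
	fin_fini = fin_f;
	fin_rtld_cleanup = fin_r;
	fin_stdio_cleanup = fin_s;	/* fin_mcleanup stays null */
	fin_atexit(fin_user);
	fin_finalize();
	assert(memcmp(fin_log, "ufrs", 4) == 0);
	return (fin_log_len);
}
```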

I think this works so simply and machine-independently mainly because
most of the details are in _fini.  _fini calls __do_global_dtors_aux
on at least i386.  Any number of magically ordered cleanups can be hidden
there.

>> This does require that atexit() and atexit_register() be in
>> separate source files, but I think it addresses all of the other
>> concerns people have raised.
>
> I mostly agree with the proposal, but there is also __cxa_atexit().

More bloat to remove :-).  It seems to be only for C++, but all
executables have it.  Before it existed, exit() looped over the atexit
table where it now calls __cxa_finalize(), and the order of the atexit
finalizations relative to the __cleanup was clearer.

> And, besides the issue of the size of statically linked executables,
> there is the more exposed problem of atexit() memory leaks. See
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-February/040644.html

This problem seems to only affect C++.  But how does the C dlclose() work
without calling __cxa_atexit()?

Bruce