Fixing dlopen("")

Daniel Eischen deischen at
Fri Dec 26 18:03:53 UTC 2014

On Fri, 26 Dec 2014, Konstantin Belousov wrote:

> [Long]
> It is somewhat well-known that our cannot be loaded
> dynamically into the process.  Or rather, it can be, but the
> consequences are catastrophic.  We recommend to link any program which
> may load modules, explicitely with -lpthread; the known workaround is
> to do for binaries which were not.  I
> implemented support for ld -z nodlopen some time ago, but attempt to
> mark as non-loadable caused extreme roar.
> A common opinion is that the proper way to fix the problem is
> to merge the actual code from libthr into libc, leaving libthr as the
> filter to preserve the current ABI.  Unfortunately, there are some
> non-trivial and undesirable consequences of doing this.
> First, all pthread mutexes (and other kind of locks) would become
> fully initialized and used even for single-threaded programs, at least
> I do not see a way to work around this.  Right now, libc shims for
> pthread_mutex_init() and pthread_mutex_lock(3) are nop.  After the
> merge, init needs to allocate memory and lock/unlock operations,
> although uncontested, will start costing one atomic each.  In
> particular, malloc(3) and stdio(3) are affected.
> Another very delicate issue is introducing unwanted cancellation
> points into libc functions after libthr wrappers become mandatory.
> This is fixable, but requires lot of mundane work and probably a long
> time to find missed places (i.e. bugs).
> There are probably more problems, and this brings an obvious
> alternative: fix the issues which make dlopen("") so
> destructive.
> One known show-stopper is the broken errno after the load.  The libthr
> provides the interposer for the errno and all cancellable functions
> from libc.  If any interposed symbols have been resolved before the
> was loaded, or non-lazy binding mode is requested, the
> bindings cannot be undonde.  In particular, references to __error(),
> which implements errno, are bound to return locate of the main thread
> errno variable.  Similarly, code referencing cancellable functions
> still gets the uncancellable libc implementations of them.
> Another issue is the recursion between malloc(3) and mutex_init().
> The statically initialized pthread_mutex_t needs some further
> initialization before first use.  Jemalloc calls pthread_mutex_init(3)
> for internally-used mutexes, which is nop stub from libc until libthr
> is loaded.  After the load, first use of any mutex by malloc(3) leads
> to the thr_mutex.c initialization code, which needs calloc(3).  This
> immediately leads to hang due to recursion on some internal libthr
> umtx.  Making the lock recursive does not solve the problem, which is
> the infinite mutual recursion between malloc and pthread_mutex_lock()
> for uninitialized malloc mutex.
> Yet another issue is the signal handlers.  The libthr routes signal
> delivery through its internal signal handler, to avoid interrupting
> critical sections.  Any signal handler installed prior to libthr is
> loaded misses the wrapper, potentially breaking cancellation and
> critical sections.
> Proposed patch does the following:
> - Remove libthr interposers of the libc functions, including
>  __error(). Instead, functions calls are indirected through the
>  interposing table, similar to how pthread stubs in libc are already
>  done.  Libc by default points either to syscall trampolines or to
>  existing libc implementations.  On libthr load, it rewrites the
>  pointers to the cancellable implementations already in libthr.
> - Postpone the malloc(3) internal mutexes initialization until libthr
>  is loaded.
> - Reinstall signal handlers with wrapper on libthr load.
> The signal handler reinstallation on libthr initialization is only
> needed when is dlopened.  Performing 128*2 sigaction(2)
> calls on the startup of the binary which is linked to libthr, and thus
> libthr is guaranteed to install proper sighandler wrappers, is huge
> waste.  So, I perform the hand-over of signal handlers only for the
> dlopen-ed libthr, which now needs to detect loading at startup
> vs. dlopen.  I was unable to distinguish the cases using existing
> facilities, so new private rtld interface is implemented,
> _rtld_is_dlopened(), to query the way library was brought into the
> process address space.
> Without some special measures, static binaries would pull in the whole
> set of the interposed syscalls due to references from the
> interposition table.  To fix it, the references are made weak.  Also,
> to not pull in the pthread stubs, the interposition table is separate
> from pthreads stubs indirection table.
> The patch is available at
> .
> Among other things, I tested it with the program illustrating the
> issues .
> Note that you must use matching versions of rtld, libc and libthr.
> Using old or old with new will
> break the system.
> Work was sponsored by The FreeBSD Foundation.

I took a once-over look at the patch and it looks good.

I never liked the intimacy between malloc and libpthread, but
there's nothing we can currently do about that until mutexes
become real objects instead of pointers.


More information about the freebsd-threads mailing list