Fixing dlopen("")

Konstantin Belousov kostikbel at
Fri Dec 26 16:53:46 UTC 2014


It is fairly well known that our libthr cannot be loaded
dynamically into the process.  Or rather, it can be, but the
consequences are catastrophic.  We recommend that any program which
may load modules be linked explicitly with -lpthread; the known workaround is
to do for binaries which were not.  I
implemented support for ld -z nodlopen some time ago, but the attempt to
mark libthr as non-loadable caused an extreme uproar.

A common opinion is that the proper way to fix the problem is
to merge the actual code from libthr into libc, leaving libthr as the
filter to preserve the current ABI.  Unfortunately, there are some
non-trivial and undesirable consequences of doing this.

First, all pthread mutexes (and other kinds of locks) would become
fully initialized and used even in single-threaded programs; at least,
I do not see a way to work around this.  Right now, the libc shims for
pthread_mutex_init(3) and pthread_mutex_lock(3) are no-ops.  After the
merge, init needs to allocate memory, and lock/unlock operations,
although uncontested, will start costing one atomic operation each.  In
particular, malloc(3) and stdio(3) are affected.

Another very delicate issue is introducing unwanted cancellation
points into libc functions after libthr wrappers become mandatory.
This is fixable, but requires a lot of mundane work and probably a long
time to find missed places (i.e. bugs).

There are probably more problems, and this brings an obvious
alternative: fix the issues which make dlopen("") so

One known show-stopper is the broken errno after the load.  libthr
provides interposers for errno and for all cancellable functions
from libc.  If any interposed symbols have been resolved before libthr
was loaded, or non-lazy binding mode is requested, the bindings cannot
be undone.  In particular, references to __error(), which implements
errno, stay bound to the version that returns the location of the main
thread's errno variable.  Similarly, code referencing cancellable
functions still gets the uncancellable libc implementations of them.

Another issue is the recursion between malloc(3) and mutex_init().
A statically initialized pthread_mutex_t needs some further
initialization before first use.  Jemalloc calls pthread_mutex_init(3)
for its internally used mutexes, which is a no-op stub from libc until
libthr is loaded.  After the load, the first use of any mutex by
malloc(3) leads to the thr_mutex.c initialization code, which needs
calloc(3).  This immediately leads to a hang due to recursion on some
internal libthr umtx.  Making the lock recursive does not solve the
problem, which is the infinite mutual recursion between malloc and
pthread_mutex_lock() for an uninitialized malloc mutex.
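
The mutual recursion can be illustrated with a small toy model.  This
is only a sketch: none of the toy_* names exist in libc or libthr, and
a depth cut-off (TOY_DEPTH_LIMIT, also an assumption) stands in for the
real hang so that the example terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in libc or libthr. */
struct toy_mutex {
    bool initialized;   /* false models a statically initialized mutex */
};

#define TOY_DEPTH_LIMIT 8   /* cut-off; the real code would hang instead */

static struct toy_mutex toy_malloc_mutex;   /* the allocator's own lock */
static int toy_depth;                       /* current nesting of toy_malloc() */
static int toy_max_depth;                   /* deepest nesting observed */

static void *toy_malloc(size_t size);

/* First use of an uninitialized mutex needs memory for its state. */
static void
toy_mutex_lock(struct toy_mutex *m)
{
    if (!m->initialized) {
        if (toy_malloc(64) == NULL)
            return;     /* gave up at the cut-off */
        m->initialized = true;
    }
}

/* The allocator takes its own mutex before doing any work. */
static void *
toy_malloc(size_t size)
{
    static char arena[64];
    void *p;

    (void)size;
    if (++toy_depth > toy_max_depth)
        toy_max_depth = toy_depth;
    if (toy_depth >= TOY_DEPTH_LIMIT) {
        toy_depth--;
        return (NULL);
    }
    toy_mutex_lock(&toy_malloc_mutex);
    p = toy_malloc_mutex.initialized ? arena : NULL;
    toy_depth--;
    return (p);
}

/* Returns the deepest nesting reached by a single allocation attempt. */
static int
toy_recursion_depth(bool mutex_ready)
{
    toy_depth = toy_max_depth = 0;
    toy_malloc_mutex.initialized = mutex_ready;
    (void)toy_malloc(16);
    return (toy_max_depth);
}
```

In this model, toy_recursion_depth(false) runs into the cut-off, while
toy_recursion_depth(true) never nests.  A recursive lock would not
change the first case, since the recursion is in the call graph, not
in the lock ownership.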

Yet another issue is signal handlers.  libthr routes signal
delivery through its internal signal handler to avoid interrupting
critical sections.  Any signal handler installed before libthr is
loaded misses the wrapper, potentially breaking cancellation and
critical sections.

The proposed patch does the following:

- Remove libthr interposers of the libc functions, including
  __error().  Instead, function calls are indirected through an
  interposing table, similar to how pthread stubs in libc are already
  done.  By default, the table entries point either to syscall
  trampolines or to existing libc implementations.  On load, libthr
  rewrites the pointers to the cancellable implementations it already
  contains.

- Postpone initialization of the malloc(3) internal mutexes until
  libthr is loaded.

- Reinstall signal handlers with the wrapper on libthr load.
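
The indirection described in the first item can be sketched roughly as
follows.  This is a minimal illustration, not the code from the patch:
the slot names, the table layout and the toy return values are all
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot numbers; the real table layout is not shown here. */
enum toy_slot {
    TOY_SLOT_ERROR,     /* stands in for __error() */
    TOY_SLOT_READ,      /* stands in for one cancellable call */
    TOY_SLOT_MAX
};

typedef int (*toy_func_t)(void);

/* Default ("libc") implementations; return values are just markers. */
static int toy_libc_error(void) { return (1); }
static int toy_libc_read(void) { return (2); }

/* Replacement ("libthr") implementations. */
static int toy_thr_error(void) { return (101); }
static int toy_thr_read(void) { return (102); }

/* The table starts out pointing at the defaults. */
static toy_func_t toy_interpos[TOY_SLOT_MAX] = {
    [TOY_SLOT_ERROR] = toy_libc_error,
    [TOY_SLOT_READ] = toy_libc_read,
};

/* Callers always go through the table, so nothing is bound early. */
static int
toy_call(enum toy_slot slot)
{
    assert(slot < TOY_SLOT_MAX);
    return (toy_interpos[slot]());
}

/* What the threading library would do once, when it is loaded. */
static void
toy_thr_install(void)
{
    toy_interpos[TOY_SLOT_ERROR] = toy_thr_error;
    toy_interpos[TOY_SLOT_READ] = toy_thr_read;
}
```

Because every call reads the pointer at call time, code that already
called toy_call(TOY_SLOT_ERROR) before toy_thr_install() still picks up
the replacement afterwards, which is exactly what symbol interposition
cannot guarantee once a binding has been resolved.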

The signal handler reinstallation on libthr initialization is only
needed when libthr is dlopened.  Performing 128*2 sigaction(2)
calls on startup of a binary which is linked to libthr, in which case
libthr is guaranteed to install proper sighandler wrappers, is a huge
waste.  So, I perform the hand-over of signal handlers only for the
dlopen-ed libthr, which now needs to distinguish loading at startup
from dlopen.  I was unable to distinguish the cases using existing
facilities, so a new private rtld interface is implemented,
_rtld_is_dlopened(), to query the way a library was brought into the
process address space.

Without some special measures, static binaries would pull in the whole
set of the interposed syscalls due to references from the
interposition table.  To fix it, the references are made weak.  Also,
to avoid pulling in the pthread stubs, the interposition table is kept
separate from the pthread stubs indirection table.
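
The effect of a weak reference can be sketched as below, assuming a
GCC- or Clang-compatible toolchain on an ELF system.  The symbol
toy_cancellable_read is hypothetical and deliberately defined nowhere,
so the table slot ends up NULL instead of forcing a definition to be
linked in.

```c
#include <stddef.h>

/*
 * Hypothetical symbol, defined nowhere in this sketch.  Because the
 * reference is weak, the link still succeeds and the address is NULL.
 */
extern int toy_cancellable_read(void) __attribute__((weak));

typedef int (*toy_read_t)(void);

/* Fallback that is always present. */
static int
toy_plain_read(void)
{
    return (7);
}

/* Table entry that does not pull the optional implementation in. */
static toy_read_t toy_read_table[] = { toy_cancellable_read };

/* Use the optional implementation if it was linked, else the fallback. */
static int
toy_call_read(void)
{
    toy_read_t f = toy_read_table[0];

    if (f == NULL)
        f = toy_plain_read;
    return (f());
}
```

With a strong reference in the table, a static link would have to
satisfy toy_cancellable_read from some archive member, dragging that
object and its dependencies into the binary.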

The patch is available at .
Among other things, I tested it with the program illustrating the
issues .
Note that you must use matching versions of rtld, libc and libthr.
Using old or old with new will
break the system.

Work was sponsored by The FreeBSD Foundation.

More information about the freebsd-threads mailing list