"initial-exec" TLS model and dlopen(3)

Jean-Sébastien Pédron dumbbell at FreeBSD.org
Fri Feb 19 23:06:58 UTC 2016


== Context ==

After the Mesa 11.2.0 branch point (RC1 is scheduled today), Emil Velikov,
Mesa release engineer, told us he plans to keep only the TLS-based GL
dispatcher and remove the other code path.

However, even though all Linux distributions have used the TLS-based one
for a long time, it is still turned off by default in configure, and we
never field-tested it on FreeBSD.

We enabled the --enable-glx-tls flag in Mesa in our development Ports
tree. Unfortunately, some applications segfault after that. Firefox is
one of them.

Below is what I understand about this issue so far.

== The issue ==

Most applications are linked directly to libGL.so. In this case, there
is no problem. For example, glxgears is linked directly to libGL.so.

Some of them use dlopen(3) to directly or indirectly load libGL.so. For
example, Firefox dlopens libxul.so which is linked to libGL.so.

Mesa uses the "initial-exec" model for at least one TLS variable:

I'm new to TLS implementation details, but if I understand correctly,
this model is a static one, meaning that a variable's offset in the
thread's TLS block is fixed at load time and the variable is accessed
directly, almost like a normal variable, as opposed to dynamic TLS models
where a variable's address is first queried with __tls_get_addr(). This
is all transparent to the program because the compiler is responsible
for generating the appropriate code, depending on the model.
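
To make the difference between the models concrete, here is a minimal
sketch using the GCC/Clang tls_model attribute. The variable and
function names are hypothetical and are not taken from Mesa; which
model the compiler picks for the first variable also depends on the
build flags (e.g. -fPIC), so treat the comments as the typical case
for a shared object rather than a guarantee.

```c
#include <stdio.h>

/* Hypothetical variables, for illustration only. */

/* No model requested: when this file is built with -fPIC into a shared
 * object, the compiler typically chooses a dynamic model, and accesses
 * may go through __tls_get_addr(). */
__thread int counter_dynamic = 1;

/* Static model requested explicitly: the compiler emits an access at a
 * fixed offset from the thread pointer, with no call into rtld. This
 * only works if rtld has placed the variable in the static TLS block. */
__thread int counter_static
    __attribute__((tls_model("initial-exec"))) = 2;

/* Both functions look identical at the source level; only the
 * generated code differs. */
int bump_dynamic(void) { return ++counter_dynamic; }
int bump_static(void)  { return ++counter_static; }
```

Compiling this with "cc -fPIC -S" and comparing the two functions in
the assembly output should show the difference in the access sequence.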

In the case of a direct link like glxgears, our rtld (ld-elf.so)
allocates space during startup to copy static TLS variables from the
program and all linked libraries. libGL.so finds its variables where it
expects them to be, glxgears is happy.

In the case of a dlopen(3) like Firefox, our rtld maps the dlopen'd
object and all its linked libraries but it doesn't look for any static
TLS variables. libGL.so accesses the allocated TLS storage (there is a
small extra chunk of zero'd memory allocated) but its variables were not
copied. So it gets a NULL pointer, dereferences it, End of the World.

Here is a small test program to demonstrate the crash:
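
A minimal sketch of the loader side of such a test (not the original
program) could look like the following. It assumes a hypothetical
shared library, called libtlsdemo.so here, that keeps a pointer in an
"initial-exec" TLS variable and exports a hypothetical function
read_tls_pointer() that dereferences it, which mirrors what libGL.so
does. The helper name call_from_dlopened() is also made up for this
example.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load `path` with dlopen(3) and call the argument-less function named
 * `symbol`, storing its int result in `*result`. Returns 0 on success
 * and -1 if loading or symbol lookup fails. */
int call_from_dlopened(const char *path, const char *symbol, int *result)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen: %s\n", err ? err : "unknown error");
        return -1;
    }

    /* POSIX allows converting the void * returned by dlsym(3) to a
     * function pointer. */
    int (*fn)(void) = (int (*)(void))dlsym(handle, symbol);
    if (fn == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym: %s\n", err ? err : "unknown error");
        dlclose(handle);
        return -1;
    }

    /* If the library's static TLS was not set up by rtld, this call is
     * where the crash would be expected. */
    *result = fn();
    dlclose(handle);
    return 0;
}
```

A main() that calls call_from_dlopened("./libtlsdemo.so",
"read_tls_pointer", &value) would then exercise the same path as
Firefox: the library is never linked directly, only dlopen'd.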

== Solutions ==

A first workaround is to LD_PRELOAD libGL.so or link the program
directly to libGL.so.

Another solution is the following: glibc (quite popular these days)
allocates extra static TLS space on top of what is needed for the TLS
variables available at startup (i.e. TLS variables from the program and
linked libraries). Then, when a library with static TLS variables is
dlopen'd, they are copied to this extra space. This space is not
dynamically extended, so first loaded, first served. If there is no
space left, I think dlopen(3) fails.
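
The "first loaded, first served" behaviour can be illustrated with a
toy model of a fixed-size reserve. This is not glibc's code: the
SURPLUS_SIZE value and the surplus_reserve() name are made up for the
example, and a real implementation also has to copy the TLS
initialization image into every existing thread.

```c
#include <stddef.h>

/* Toy model of a fixed static TLS reserve; the size is arbitrary. */
#define SURPLUS_SIZE 64

static size_t surplus_used = 0;

/* Reserve `size` bytes aligned to `align` (a power of two) from the
 * reserve. Returns the offset of the block, or -1 if the request does
 * not fit. Space is never reclaimed and the reserve never grows. */
long surplus_reserve(size_t size, size_t align)
{
    /* Round the current high-water mark up to the requested alignment. */
    size_t start = (surplus_used + align - 1) & ~(align - 1);

    if (start > SURPLUS_SIZE || size > SURPLUS_SIZE - start)
        return -1;  /* a loader would make dlopen() fail here */

    surplus_used = start + size;
    return (long)start;
}
```

With a 64-byte reserve, a first library needing 40 bytes and a second
needing 16 bytes (16-byte aligned) both fit; a third request of 8 bytes
is refused, which corresponds to the dlopen(3) failure mentioned above.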

In FreeBSD's rtld, we already allocate extra space. See for instance the
use of the RTLD_STATIC_TLS_EXTRA constant here:

The comment even says that this extra space is allocated specifically
for dynamic modules. However, I don't see where we use this space.
dlopen_object() doesn't mess with TLS at all (or I'm missing something).

FWIW, Mesa's libGL.so is not the only one to do this. NVIDIA's libGL.so
uses the same technique and apparently AMD Catalyst's does too on Linux.
I wonder if the following bug is caused by this exact issue:

I would like to modify dlopen_object() to install static TLS variables
in this extra space. Or do you suggest a better alternative?

If possible, I would like to get this into FreeBSD 10.3-RELEASE to
avoid future maintenance headaches with Mesa.

Some references about this issue:

Jean-Sébastien Pédron
