TLS (and by extension all threading) completely broken in Valgrind on i386/amd64

Sun Feb 7 19:35:38 UTC 2010

I've been trying out valgrind on some threaded FreeBSD applications
but they've been deadlocking at startup.  I've identified that the
root cause is that FreeBSD's thread local storage is not being
emulated properly by valgrind.  The problem on amd64 is obvious:
valgrind gives an invalid opcode error when the program tries to
execute any instruction that accesses the gs register.  On i386 the
problem is much more subtle.

I've attached two test applications that demonstrate the problem.  In
pthread_self.c, I create one thread which periodically prints
pthread_self(), and then 10 seconds later I create a second thread.
After the second thread is created, the first thread believes that it
is the second thread.  Here's an example invocation:

==883== Memcheck, a memory error detector
==883== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==883== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==883== Command: ./pthread_self
==883==
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
0x18c180
1st: 0x18c180
2nd: 0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
0x18d390
        0x18d390
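
For anyone reading without the attachment, a test along the lines of
pthread_self.c can be sketched as follows.  This is not the attached
file: the names (struct reporter, run_pthread_self_demo), the
configurable delay, the tab prefix for the second thread and the
mismatch counter are all assumptions made for illustration.  Printing
a pthread_t with %p also assumes it is a pointer or integer type, as
it is on FreeBSD and Linux.

```c
#define _POSIX_C_SOURCE 200809L  /* nanosleep() under strict -std= modes */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-thread settings; not taken from the attached file. */
struct reporter {
    const char *prefix;  /* "" for the first thread, "\t" for the second */
    int ticks;           /* number of lines to print */
    long interval_ms;    /* delay between lines */
    int mismatches;      /* lines where pthread_self() differed from the first call */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Print pthread_self() periodically and count any change of identity. */
static void *report_self(void *arg)
{
    struct reporter *r = arg;
    pthread_t first = pthread_self();

    for (int i = 0; i < r->ticks; i++) {
        pthread_t now = pthread_self();
        if (!pthread_equal(first, now))
            r->mismatches++;
        /* Assumption: pthread_t converts to an integer/pointer. */
        printf("%s%p\n", r->prefix, (void *)(uintptr_t)now);
        sleep_ms(r->interval_ms);
    }
    return NULL;
}

/*
 * Start one reporting thread, wait `ticks` intervals, then start a second.
 * Returns how often the first thread saw its own identity change
 * (0 when thread-local storage works), or -1 if a thread could not be created.
 */
int run_pthread_self_demo(int ticks, long interval_ms)
{
    struct reporter a = { "", 2 * ticks, interval_ms, 0 };
    struct reporter b = { "\t", ticks, interval_ms, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, report_self, &a) != 0)
        return -1;
    sleep_ms(ticks * interval_ms);
    if (pthread_create(&t2, NULL, report_self, &b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    printf("1st: %p\n2nd: %p\n",
           (void *)(uintptr_t)t1, (void *)(uintptr_t)t2);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a.mismatches;
}
```

Calling run_pthread_self_demo(10, 1000) from main() approximates the
10 second delay described above.  A non-zero return value means the
first thread stopped seeing its own pthread_t.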

Note that the first thread correctly prints that its pthread_t is 0x18c180
before the second thread is created, but after the second thread is
created both threads report that they are 0x18d390!  As far as I can
tell, all threads use the thread local storage of the last thread
created.  This completely breaks libthr's mutexes, as mutex.c
demonstrates.  In that test app, the main thread acquires a mutex,
creates a new thread, and then tries to unlock the mutex.  The
unlock fails with EPERM, which pthread_mutex_unlock returns when a
thread tries to unlock a mutex that it does not own.  This
behaviour is likely the cause of all of the "false positives" from
helgrind.  Helgrind is correctly noting that the libthr internals are
using the same memory in different threads, which happens because each
thread thinks that it is touching its own thread-local memory.
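
The lock/create/unlock sequence that mutex.c exercises can be sketched
roughly like this.  Again, this is not the attached file: the names
unlock_after_create and idle_thread are hypothetical, and the mutex
type is set to PTHREAD_MUTEX_ERRORCHECK explicitly (an assumption) so
that the ownership check does not depend on the platform's default
mutex type.

```c
#define _POSIX_C_SOURCE 200809L  /* PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#include <errno.h>    /* EPERM, for callers comparing the result */
#include <pthread.h>
#include <stddef.h>

/* The new thread does nothing; only its creation matters here. */
static void *idle_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Lock a mutex, create a second thread, then unlock the mutex from the
 * same thread that locked it.  Returns the result of
 * pthread_mutex_unlock(), or -1 if the thread could not be created.
 */
int unlock_after_create(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    pthread_t tid;
    int rc;

    /* Assumption: request ownership checking rather than rely on the default. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);

    if (pthread_create(&tid, NULL, idle_thread, NULL) != 0) {
        pthread_mutex_unlock(&lock);
        pthread_mutex_destroy(&lock);
        return -1;
    }

    /* The calling thread still owns the mutex, so this should succeed. */
    rc = pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    if (rc == 0)
        pthread_mutex_destroy(&lock);  /* only destroy once it is unlocked */
    return rc;
}
```

With working thread-local storage the function should return 0.  If
the calling thread is mistaken for the newly created one, as described
above, the unlock would be expected to return EPERM instead.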

I've found the point in the thr_new syscall wrapper where valgrind
notes the TLS area, but I can't figure out how it uses the
information, so I'm stuck on working out why valgrind is getting this
wrong.  Does anyone have any ideas?  I'm not subscribed to this list, so
please CC me on any replies.