Re: LDT/GDT and TCB use by threads

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sat, 14 May 2022 19:21:14 UTC
On Sat, May 14, 2022 at 06:21:39PM +0200, Paul Floyd wrote:
> Hi
> 
> I'm trying to understand and fix a problem that I have with Valgrind (which
> I maintain on FreeBSD).
> 
> The full gory details are here
> https://github.com/paulfloyd/freebsd_valgrind/issues/85
> 
> In short, on i386 Valgrind allocates a block of 8k slots for the LDT/GDT
> when the first thread gets created. For each subsequent thread it uses up
> one slot, until when the 8192 slots have all been used and libthr fails
> because of a dirty thread exit.
> 
> 
> On Linux x86, from what I've seen, main() allocates a thread area (Valgrind
> intercepts a syscall sys_set_thread_area). Then syscall sys_clone (which is
> used for both pthread_create and fork, with different flags) allocates a new
> GDT every time. Thread exits deletes the thread area. And after a fork, some
> Valgrind child code deletes the GDTs other than for the running tid.
> 
> My understanding is that FreeBSD has a lighter weight pthread
> implementation. I think that I should be doing something similar in the case
> of fork - delete all GDTs other than for the running tid.
> 
> I'm not sure what I can do in the case of thread creation and exit. If I
> delete the GDT like on Linux I get a crash since it seems on FreeBSD the GDT
> still gets used after thr_exit() (in particular, I see calls to
> get_currthread() in ld-elf32.so which access a lock in the TCB/GDT).
> 
> I have a few questions (and may have more as I continue to debug).
> 
> What exactly causes an application to switch to using threaded versions of
> libc functions (I'm thinking specifically of fork())? Is it just by linking
> with libthr? Or does the application need to create a thread first before
> switching?
For fork, just linking or dynamically loading libthr makes it use
libthr's intercepted version of the function.  Start looking at
lib/libc/sys/interposing_table.c.

For pthread stubs in libc, like pthread_mutex_lock(), there is a similar,
but different, stub overriding mechanism; see lib/libc/gen/_pthread_stubs.c
for the initial pointers.

Also, some places, like stdio, check the __isthreaded value directly.
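
To make the two mechanisms above concrete, here is a minimal sketch of a
function-pointer table that a thread library overwrites when it is loaded,
plus a flag checked on a fast path.  It is not the FreeBSD code: every name
in it (interpose_table, demo_fork, is_threaded_flag and so on) is
hypothetical, the "fork" functions only return a tag, and the point at which
the flag is set (first extra thread created) is an assumption of the sketch.

```c
/* Hypothetical slot numbering; a real table would have many more entries. */
enum interpose_slot { SLOT_FORK, SLOT_MAX };

typedef int (*interpose_fn)(void);

/* Stand-ins that return a tag instead of forking, so the sketch is safe to run. */
int plain_fork(void)    { return 1; }  /* plays the role of libc's own wrapper */
int threaded_fork(void) { return 2; }  /* plays the role of the thread library's wrapper */

/* Table of entry points; starts out with the single-threaded versions. */
static interpose_fn interpose_table[SLOT_MAX] = {
    [SLOT_FORK] = plain_fork,
};

/* Flag in the spirit of __isthreaded (hypothetical name). */
static int is_threaded_flag = 0;

/* What a thread library could do when it is linked in or dynamically loaded. */
void thread_library_loaded(void) { interpose_table[SLOT_FORK] = threaded_fork; }

/* Assumption of this sketch: the flag flips when the first extra thread appears. */
void first_thread_created(void) { is_threaded_flag = 1; }

/* Public entry point: always dispatches through the table. */
int demo_fork(void) { return interpose_table[SLOT_FORK](); }

/* stdio-style fast path: only bother with locking when the flag is set. */
int demo_needs_lock(void) { return is_threaded_flag != 0; }
```

In this model, loading the thread library is enough to change what
demo_fork() calls, while the lock check changes only once the flag is set.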

> 
> Maybe a bit harder to say, but what is the usage of the GDT/ TCB in terms of
> thread lifetime? And the same question for an ordinary process and for a
> process created in a threaded application.

The GDT is static and per-CPU.  I do not remember the i386 code; on amd64
we create a fixed-size per-process LDT on the first LDT-related syscall,
such as sysarch(I386_SET_LDT).  On i386 we might still grow the LDT
dynamically if needed.

In long mode, the %fs base points to the TCB.  In 32-bit mode, the %gs base
does.  On context switch, if %fs (or %gs) contains the magic segment value,
the amd64 kernel reloads the base with the incoming thread's value.  If one
of the %fs/%gs segment registers does not point to the magic descriptor,
it is left untouched (of course it is reloaded so that the CPU has the
correct descriptor cache populated, but its base is not set).
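
The context-switch rule above can be modelled in a few lines.  This is only
a sketch of the decision, not kernel code: the selector value MAGIC_SEL, the
struct saved_seg layout and the function name are all hypothetical, and the
"descriptor base" field simply stands for whatever base the CPU loads from
the descriptor when the selector itself is reloaded.

```c
#include <stdint.h>

/* Hypothetical value for the "magic" user selector; chosen for illustration only. */
#define MAGIC_SEL 0x13u

/* Hypothetical per-thread saved state for one segment register (%fs or %gs). */
struct saved_seg {
    uint16_t sel;        /* selector the thread had loaded */
    uint64_t tcb_base;   /* per-thread base the kernel keeps for the TCB */
    uint64_t desc_base;  /* base stored in the descriptor that 'sel' names */
};

/*
 * Base in effect after the incoming thread is switched in.
 * Magic selector: the kernel sets the base from the thread's saved value.
 * Any other selector: the selector is reloaded, so the base is whatever the
 * descriptor supplies, and the kernel does not override it.
 */
uint64_t base_after_switch(const struct saved_seg *incoming)
{
    if (incoming->sel == MAGIC_SEL)
        return incoming->tcb_base;
    return incoming->desc_base;
}
```

So a thread that loads its own LDT selector into %gs gets the base from that
LDT entry, while one using the magic selector gets its saved TCB base.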