LDT/GDT and TCB use by threads

From: Paul Floyd <paulf2718_at_gmail.com>
Date: Sat, 14 May 2022 16:21:39 UTC
Hi

I'm trying to understand and fix a problem that I have with Valgrind 
(which I maintain on FreeBSD).

The full gory details are here 
https://github.com/paulfloyd/freebsd_valgrind/issues/85

In short, on i386 Valgrind allocates a block of 8k slots for the LDT/GDT 
when the first thread gets created. Each subsequent thread uses up one 
slot, until all 8192 slots have been used and libthr fails because of a 
dirty thread exit.
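
To make the failure mode concrete, here is a minimal sketch in C of a 
fixed table of 8192 slots where each new thread claims one. This is not 
Valgrind's code: slot_in_use, slot_alloc, slot_free, slots_reset and 
run_threads are hypothetical names, and the table only models the 
bookkeeping, not real descriptors.

```c
#include <assert.h>
#include <stdbool.h>

#define N_SLOTS 8192            /* slot count from the description above */

/* Hypothetical bookkeeping: one flag per descriptor slot. */
static bool slot_in_use[N_SLOTS];

/* Mark every slot as free. */
static void slots_reset(void)
{
    for (int i = 0; i < N_SLOTS; i++)
        slot_in_use[i] = false;
}

/* Claim the first free slot; return its index, or -1 if the table is full. */
static int slot_alloc(void)
{
    for (int i = 0; i < N_SLOTS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Give a slot back. Never calling this on thread exit is the leak. */
static void slot_free(int idx)
{
    assert(idx >= 0 && idx < N_SLOTS);
    slot_in_use[idx] = false;
}

/* Simulate n_threads create/exit cycles, one after another.
 * Returns how many thread creations found a slot. */
static int run_threads(int n_threads, bool free_on_exit)
{
    int created = 0;

    for (int t = 0; t < n_threads; t++) {
        int idx = slot_alloc();
        if (idx < 0)
            break;              /* table exhausted: no slot for this thread */
        created++;
        if (free_on_exit)
            slot_free(idx);     /* slot becomes reusable by the next thread */
    }
    return created;
}
```

With free_on_exit set to false the simulation stops finding slots once 
the table is full, however short-lived the threads are. With it set to 
true the same slot is reused every time.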


On Linux x86, from what I've seen, main() allocates a thread area 
(Valgrind intercepts the sys_set_thread_area syscall). Then the 
sys_clone syscall (which is used for both pthread_create and fork, with 
different flags) allocates a new GDT every time. Thread exit deletes 
the thread area. And after a fork, some Valgrind child code deletes the 
GDTs other than the one for the running tid.

My understanding is that FreeBSD has a lighter weight pthread 
implementation. I think that I should be doing something similar in the 
case of fork - delete all GDTs other than for the running tid.
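
The fork cleanup I have in mind could look roughly like the C sketch 
below. It relies on the fact that only the thread that called fork() 
exists in the child. The table layout and the names owner_tid, 
table_init, slot_assign and cleanup_after_fork are hypothetical, not 
taken from Valgrind or libthr.

```c
#include <assert.h>

#define N_SLOTS 8192
#define NO_TID  (-1)            /* marks a slot that nobody owns */

/* Hypothetical table: owner_tid[i] is the thread owning slot i, or NO_TID. */
static int owner_tid[N_SLOTS];

/* Mark every slot as unowned. */
static void table_init(void)
{
    for (int i = 0; i < N_SLOTS; i++)
        owner_tid[i] = NO_TID;
}

/* Give the first unowned slot to tid; return its index, or -1 if full. */
static int slot_assign(int tid)
{
    assert(tid != NO_TID);
    for (int i = 0; i < N_SLOTS; i++) {
        if (owner_tid[i] == NO_TID) {
            owner_tid[i] = tid;
            return i;
        }
    }
    return -1;
}

/* To be run in the child after fork: release every slot that is not
 * owned by the running thread. Returns the number of slots released. */
static int cleanup_after_fork(int running_tid)
{
    int released = 0;

    for (int i = 0; i < N_SLOTS; i++) {
        if (owner_tid[i] != NO_TID && owner_tid[i] != running_tid) {
            owner_tid[i] = NO_TID;
            released++;
        }
    }
    return released;
}
```

For example, with slots assigned to tids 100, 101 and 102, calling 
cleanup_after_fork(101) releases two slots and leaves only the one 
owned by 101.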

I'm not sure what I can do in the case of thread creation and exit. If I 
delete the GDT as on Linux, I get a crash, since it seems that on FreeBSD 
the GDT is still used after thr_exit() (in particular, I see calls to 
get_currthread() in ld-elf32.so which access a lock in the TCB/GDT).

I have a few questions (and may have more as I continue to debug).

What exactly causes an application to switch to using threaded versions 
of libc functions (I'm thinking specifically of fork())? Is it just by 
linking with libthr? Or does the application need to create a thread 
first before switching?

Maybe a bit harder to say, but how is the GDT/TCB used over the 
lifetime of a thread? And the same question for an ordinary process 
and for a process created in a threaded application.


A+

Paul