Thread Local Storage

Mon Mar 29 14:27:58 PST 2004

On Mon, 29 Mar 2004, Doug Rabson wrote:

> On Monday 29 March 2004 22:50, Julian Elischer wrote:
> > On Mon, 29 Mar 2004, Doug Rabson wrote:
> > > How do you do that without an expensive syscall?
> >
> > Each cpu has a GDT entry pointing to a slightly different location
> > and the kernel and the UTS co-operate as to making the contents of
> > that location keep track of the thread currently running. The kernel
> > always makes sure that %gs points to the correct place when it goes
> > back to userland and the contents of that location is saved whenever
> > a thread sleeps and is restored whenever it wakes up again.
> >
> > It can even be in kernel memory space as long as the segment
> > is defined as r/w to the user and the segment size is limited to
> > just that region it is allowed to read from.
> > This actually gives us a 'per thread' pointer rather than a per-kse
> > pointer which might allow us to go to the GNU version of TLS but we
> > haven't worked out the details..
> > David and Peter have both looked at this but we haven't actually
> > tried to do it yet..
> 
> Yes, I understand about the per-cpu GDT thing. I've been reading Linux 
> kernel code and glibc all afternoon and my brain is melting :-). It 
> would work great for libthr.
> 
> For the GNU TLS ABI, the %gs segment can't be in kernel space because of 
> the negative offset thing which is generated by the local exec TLS 
> model. That idiom requires the %gs segment to have a 4G limit so that 
> the negative addresses can wrap round properly. If it wasn't for that 
> single 'optimisation', we could just go with the GNU model right now.
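
The wrap-around described above can be sketched as plain 32-bit arithmetic.
This is a simplified model of an i386 segment, not the real hardware rules: it
ignores the granularity bit, expand-down segments and the width of the access.
The helper names offset_within_limit and linear_address are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified limit check: the offset is treated as an unsigned 32-bit
 * value, so a "negative" offset such as -8 becomes 0xFFFFFFF8 and only
 * passes when the segment limit covers the full 4G.
 */
static bool offset_within_limit(uint32_t offset, uint32_t limit)
{
    return offset <= limit;
}

/*
 * Linear address = segment base + offset, truncated to 32 bits.
 * The truncation is what makes base + 0xFFFFFFF8 land 8 bytes
 * below the base.
 */
static uint32_t linear_address(uint32_t base, uint32_t offset)
{
    return (uint32_t)(base + offset);
}
```

With a base of 0x08050000 and an offset of -8, linear_address gives
0x0804FFF8, but offset_within_limit only accepts that offset when the limit
is 0xFFFFFFFF. A segment restricted to a small region rejects it.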

And how do they stop the user from accessing kernel space, then?

> 
> My question is, given that we are switching threads in userland, how can 
> we get the right contents for the GDT without a syscall?
> 

As you note, the thread library and the dynamic linker know ahead of time
the size of the static area needed for each thread, i.e. the size of
the area addressed at negative offsets.
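
A rough sketch of what "static area addressed negatively" means in practice,
assuming the usual i386 layout where the static TLS block sits directly below
the thread pointer. The struct tcb and the tls_* functions are hypothetical
names, not the libpthread or rtld interfaces, and static_size is assumed to be
a multiple of the pointer size.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread control block; a real library keeps more here. */
struct tcb {
    struct tcb *self;   /* the thread pointer points at itself */
};

/*
 * Allocate the static TLS area and the control block in one chunk.
 * static_size is known ahead of time (assumed pointer-aligned here).
 * Returns the thread pointer, which sits just above the static area.
 */
static struct tcb *tls_block_create(size_t static_size)
{
    unsigned char *block = calloc(1, static_size + sizeof(struct tcb));
    if (block == NULL)
        return NULL;
    struct tcb *tp = (struct tcb *)(block + static_size);
    tp->self = tp;
    return tp;
}

/* Address of a variable living neg_offset bytes below the thread pointer. */
static void *tls_var(struct tcb *tp, size_t neg_offset)
{
    return (unsigned char *)tp - neg_offset;
}

/* Free the whole chunk; static_size must match the value used at creation. */
static void tls_block_destroy(struct tcb *tp, size_t static_size)
{
    free((unsigned char *)tp - static_size);
}
```

Because the size is fixed before the thread starts, each thread's block can be
set up once at creation time; the only per-switch work left is pointing %gs at
the right block.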

Unfortunately you can only make use of this for N:N threads.
(I dislike the term 1:1, which in my mind is a single-threaded program.)
If you compile libpthread in 1:1 mode (define FORCE_SCOPE_SYSTEM when
you compile it) then this would work for us, but for M:N threads it would
require a syscall for every context switch, which removes the whole
point of M:N threads: the ability to hop between threads in userspace
without doing a kernel entry.

I tend to believe that going with the Solaris model is good enough
for now and that we can try to optimise later if we think it's worthwhile.

We can use a 'simple per-CPU GDT %gs' and the Solaris model and
get enough for our purposes.

Let's just go with that.
amd64 and ia64 have their own challenges.