NVIDIA and TLS

Daniel Eischen eischen at pcnet.com
Mon Jun 16 16:13:43 PDT 2003


On Mon, 16 Jun 2003, Julian Elischer wrote:
> 
> On Mon, 16 Jun 2003, Gareth Hughes wrote:
> > I'm not sure what you mean by this.  Do you mean if you provide us with a
> > magic function to get our thread-local data, we'd be non-portable?  Or that
> > because we use the ELF TLS stuff we're non-portable already?
> 
> Both :-)
> ELF TLS is a proposal and not yet (that I know of) part of the standard.
> 
> It is however probably a good thing to support ELF TLS if it comes to be
> a standard so I'm treating this as a "wakeup" but errno (for example) is
> not something that is very time critical and most other uses of TLS are
> not as time critical as your app appears to be. Usually this sort of
> thing is done with an argument in the function calls pointing to the
> local state, rather than using __thread (which is new).

Right.  It just seems wrong to me to be able to insert __thread
after every global variable in a library and instantly make
it thread-safe.
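
A minimal sketch of the two styles being contrasted here, assuming a
compiler that accepts the `__thread` extension (GCC and Clang do) and a
POSIX threads library. The names `count_tls`, `count_explicit`,
`struct lib_state` and `tls_demo` are made up for illustration and do not
come from any real library.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4
#define NCALLS   1000

/* Style 1: compiler-supported TLS. Each thread gets its own copy. */
static __thread int tls_calls;

static void count_tls(void)
{
    tls_calls++;
}

/* Style 2: the caller passes a pointer to its own local state. */
struct lib_state {
    int calls;
};

static void count_explicit(struct lib_state *st)
{
    st->calls++;
}

struct result {
    int tls;
    int explicit_state;
};

static void *worker(void *arg)
{
    struct result *r = arg;
    struct lib_state st = { 0 };

    for (int i = 0; i < NCALLS; i++) {
        count_tls();
        count_explicit(&st);
    }
    r->tls = tls_calls;
    r->explicit_state = st.calls;
    return NULL;
}

/* Returns 1 if every thread counted exactly NCALLS in both styles. */
int tls_demo(void)
{
    pthread_t tid[NTHREADS];
    struct result res[NTHREADS];
    int started = 0;
    int ok = 1;

    for (; started < NTHREADS; started++) {
        if (pthread_create(&tid[started], NULL, worker, &res[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (res[i].tls != NCALLS || res[i].explicit_state != NCALLS)
            ok = 0;
    }
    return started == NTHREADS && ok;
}
```

Both counters stay per-thread, but only the second style makes that
visible in the function signature; the first hides it behind a storage
qualifier, which is the convenience being questioned above.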

> Anyone have recent numbers on times needed to change a segment
> register? :-/
> It is sometimes slow to load segment registers on some
> chips, so we try to not do it when we switch in userspace to
> another thread (we have a pointer to the current CPU's structure and in
> THAT we switch the "current thread" pointer). We COULD change things
> around so that %gs points to a thread-specific data pointer and go
> backwards to the "per CPU" structure, but what do we do when we 

I don't think we (libkse/libpthread) can do that.  The %gs
register has to point to the KSE mailbox so that we can
clear km_curthread in one instruction.  A thread can switch
between KSEs so you can't atomically dereference and set 
curthread->kmbx->km_curthread.
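
A rough model of that race in portable C, under loud assumptions: the
structures below are simplified stand-ins (only the field names
km_curthread and kmbx come from the text above), `kse_base` merely
imitates what a %gs-relative access would give, and the `preempt` hook is
a hypothetical way to force the migration at the worst moment.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the real structures differ. */
struct kse_mailbox {
    void *km_curthread;
};

struct thread {
    struct kse_mailbox *kmbx;   /* mailbox of the KSE this thread runs on */
};

/* Models %gs: always names the mailbox of the KSE we are executing on. */
static struct kse_mailbox *kse_base;

/* Models a single store relative to the per-KSE base. */
static void clear_via_kse_base(void)
{
    kse_base->km_curthread = NULL;
}

/* Two steps through the thread: load the mailbox pointer, then store. */
static void clear_via_thread(struct thread *td,
                             void (*preempt)(struct thread *))
{
    struct kse_mailbox *mb = td->kmbx;  /* step 1: load */
    if (preempt != NULL)
        preempt(td);                    /* thread migrates in between */
    mb->km_curthread = NULL;            /* step 2: store to the OLD mailbox */
}

static struct kse_mailbox *other_mailbox;

/* Simulated migration: the thread resumes on another KSE. */
static void migrate(struct thread *td)
{
    td->kmbx = other_mailbox;
    other_mailbox->km_curthread = td;
}

/* Returns 1 if the two-step clear missed the mailbox now in use. */
int stale_clear_demo(void)
{
    struct kse_mailbox a = { NULL }, b = { NULL };
    struct thread td = { &a };

    a.km_curthread = &td;
    other_mailbox = &b;
    clear_via_thread(&td, migrate);

    return td.kmbx == &b && b.km_curthread == &td && a.km_curthread == NULL;
}

/* Returns 1 if the base-relative clear hit the current mailbox. */
int base_clear_demo(void)
{
    struct kse_mailbox m = { NULL };
    int marker = 0;

    m.km_curthread = &marker;
    kse_base = &m;
    clear_via_kse_base();
    return m.km_curthread == NULL;
}
```

In the model, the two-step version clears the mailbox of the KSE the
thread has already left, while the base-relative version cannot name the
wrong mailbox because the base follows the KSE, not the thread.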

> have no thread running? (just the Userland thread scheduler is running
> but has not selected a thread to run). I guess we could have a DUMMY
> thread structure, but that suggests 2 loads of the segment register
> for each context switch in userland
> (as opposed to 0): one to the dummy, and one to the next thread.
> That could really slow down all the other apps.

So far we have avoided a dummy thread structure.  We don't
want to necessitate one.

> It's an interesting problem. Luckily we are only just starting out
> in the "kernel supported threads" game so we can still change things
> relatively easily. We COULD make the C compiler generate
> 
> 
>  leal %gs:(TD_OFFSET),%eax
>  movl TLS(%eax),%eax
>  (or whatever the syntax is), I guess, but it would generate code that
> would refuse to work on libc_r and with the N:N threads.
> 
> Also the compiled code seems to be adding its own LDT entries, which would
> be a bad idea if the threading system is doing that too.

-- 
Dan Eischen



More information about the freebsd-threads mailing list