NVIDIA and TLS

Mon Jun 16 17:30:15 PDT 2003

On Mon, 16 Jun 2003, Gareth Hughes wrote:

> On Mon, 16 Jun 2003, Julian Elischer wrote:
> > 
> > I think that the problem is that the access method for TLS is dependent
> > on which library is used.
> > 
> > [snip]
> > 
> > The trouble is that each of these would require a different mechanism to
> > reach TLS and the compiler cannot know ahead of time which one to use.
> >
> > [snip]
> > 
> > I may be wrong but I don't think it is a standard yet..
> > especially for the reason that we see here..
> > It requires that the compiler know what threading library is in use.
> > 
> > We could certainly implement efficient TLS code generation for each
> > library, but which one would be compiled in when you compile a .o file
> > that may be used with any library?
> 
> Please read Ulrich Drepper's document, if you haven't done so already.
> You'll see that the general case of __thread variable access involves
> a function call to look up the variable's address.  There are
> optimizations to this access model that allow one of the other three
> models to be used (ranging from a function call the first time a
> __thread variable is accessed, down to a single instruction per
> access).  FreeBSD could trivially implement __thread variables with
> the General Dynamic model (involving a function call per access).
> Our driver uses the Local Exec model (single instruction per access)
> because GNU libc has an optimization on x86 that allows shared
> libraries to use this model, which is normally reserved for
> applications.  The key thing is that they're still __thread variables,
> the access model depends on the compile time options used and what's
> available at runtime.  Please, I urge you to read Drepper's document
> carefully.
> 

I have read most of it already.

What I'm saying is that we can, and probably should, implement TLS using
the General Dynamic model to provide TLS at "sane" speed (e.g. 5
instructions), but I don't think we can implement the "1 instruction"
version that you are asking for without breaking the binary
compatibility that we currently have between our 3 pthread libraries.
We can currently switch between 3 very different threads libraries
without recompiling the app or any other libraries involved. In fact, we
have a config file for the loader that specifies which one to use at run
time, **per application**. Without using an entry point (or maybe
self-modifying code) (*EEK!*) I don't see how we can do it and keep that
*very useful* functionality.
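
To make the distinction concrete, here is a minimal sketch of a __thread
variable that asks for the General Dynamic model explicitly. It assumes
GCC (or a compatible compiler) with the tls_model attribute and a
pthreads library underneath; the names tls_counter, bump and tls_demo
are made up for illustration and come from neither the driver nor any
of our libraries. The toolchain may still relax the access sequence when
it links a plain executable, so treat the attribute as a request.

```c
#include <pthread.h>

/* One copy of this variable exists per thread.  The attribute requests
 * the General Dynamic access model, which is the one that stays valid
 * in any shared object regardless of how it is loaded. */
static __thread int tls_counter __attribute__((tls_model("global-dynamic")));

/* Thread body: read the iteration count from *arg, bump this thread's
 * own copy of tls_counter that many times, then report the result back
 * through the same slot. */
static void *bump(void *arg)
{
    int *slot = arg;
    int n = *slot;

    for (int i = 0; i < n; i++)
        tls_counter++;
    *slot = tls_counter;
    return NULL;
}

/* Hypothetical driver: starts two threads, each incrementing its own
 * copy.  On return counts[i] holds the value thread i ended up with.
 * Returns 0 if everything ran and the caller's copy was left untouched,
 * -1 otherwise. */
int tls_demo(int counts[2])
{
    pthread_t t[2];
    int started = 0;
    int rc = 0;

    tls_counter = 100;              /* the calling thread's private copy */

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, bump, &counts[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    if (tls_counter != 100)         /* workers must not have touched it */
        rc = -1;
    return rc;
}
```

Calling tls_demo() with counts = {3, 5} should leave 3 and 5 in the
array, since each worker starts from its own zero-initialised copy. The
source is the same whichever access model ends up being used; what
changes is the code sequence the compiler emits for each access, and
that is the part that has to agree with the threads library chosen at
run time.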