Nvidia, TLS and __thread keyword -- an observation

Terry Lambert tlambert2 at mindspring.com
Wed Jun 18 02:07:55 PDT 2003


Marcel Moolenaar wrote:
> On Wed, Jun 18, 2003 at 07:48:09AM +0800, David Xu wrote:
> > I believe this will add overhead to thread creation and destruction.
> > How fast can an RTLD be in this case?
> 
> In the dynamic TLS model you would like to delay the creation of
> the TLS space. Normally __tls_get_addr() gets used for this. In
> the static TLS model you allocate the TLS when you allocate the
> thread control structure.

Lazy binding in this context doesn't make a lot of sense.  In
the case of a dynamically linked binary, every thread will need
the context sooner or later, unless you are running a heterogeneous
workload; even then, the locking required at access time would be
prohibitive, compared to binding it once at object load time.
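
A rough userland analogy of the two costs, as a sketch only.  It
assumes POSIX threads and the GCC/Clang __thread keyword; the names
eager_counter, lazy_counter() and worker() are made up for
illustration, and this is not how any particular RTLD does it.  The
eager variable is a plain load/store, while the lazy one pays for a
synchronized once-check, a lookup and a NULL test on every access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Eager: storage exists as soon as the thread does; access is direct. */
static __thread int eager_counter;

/* Lazy: storage is created on first use, so every access is checked. */
static pthread_key_t lazy_key;
static pthread_once_t lazy_once = PTHREAD_ONCE_INIT;

static void lazy_key_init(void)
{
    int rc = pthread_key_create(&lazy_key, free); /* free() runs at thread exit */
    assert(rc == 0);
}

static int *lazy_counter(void)
{
    int *p;

    pthread_once(&lazy_once, lazy_key_init);      /* synchronization on each call */
    p = pthread_getspecific(lazy_key);
    if (p == NULL) {                              /* first access in this thread */
        p = calloc(1, sizeof(*p));
        if (p == NULL)
            abort();
        pthread_setspecific(lazy_key, p);
    }
    return p;
}

/* Each thread bumps its own copies and returns their sum. */
static void *worker(void *arg)
{
    (void)arg;
    eager_counter += 1;
    *lazy_counter() += 1;
    return (void *)(intptr_t)(eager_counter + *lazy_counter());
}
```

A thread running worker() sees only its own two counters; the
creating thread's copies are untouched.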

For modules (e.g. Apache CGIs that are dlopen'ed), the same is
true: you will run them eventually, when your thread ends up with
a request, unless you do special dancing around creating
"uncontaminated" vs. "contaminated" thread pools, and reluctantly
migrate from the former to the latter -- i.e.: make a special
effort to keep threads from being contaminated by having had a
particular dynamic object's TLS attached to them.  Garbage
collection would be ugly; the locking overhead of trying to keep
the TLS sections dynamically bound would be just as great; and you
would have much more overhead on thread teardown.  Forget
thread_join()!  The code for the poor application to keep track of
this would far exceed the overhead of the application just passing
around a context to the reentrant functions.
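
For comparison, the "pass a context around" alternative is cheap to
write.  A minimal sketch, in the spirit of strtok_r(); struct
parse_ctx and next_word() are hypothetical names, not an existing
API.  All state lives in the caller's context, so no TLS and no
locking are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical caller-owned context: the only state the function uses. */
struct parse_ctx {
    const char *next;   /* where the next scan starts */
};

/*
 * Return a pointer to the next space-separated word and store its
 * length in *len, or return NULL when the input is exhausted.
 */
static const char *next_word(struct parse_ctx *ctx, size_t *len)
{
    const char *s, *start;

    assert(ctx != NULL && len != NULL);
    s = ctx->next;
    while (*s == ' ')                   /* skip leading spaces */
        s++;
    if (*s == '\0') {
        ctx->next = s;
        return NULL;
    }
    start = s;
    while (*s != '\0' && *s != ' ')     /* scan to the end of the word */
        s++;
    *len = (size_t)(s - start);
    ctx->next = s;
    return start;
}
```

Each thread (or each request) keeps its own struct parse_ctx on the
stack, so any number of them can call next_word() concurrently.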


> Thus, there's virtually no cost. However TLS accesses for the
> dynamic TLS model are expensive. I have some ideas about that.
> With some kernel support you can even create dynamic TLS with
> static TLS code sequences...

I think you can only do this if you do *not* lazy-bind, and the
references are at fixed offsets.  This means orphaning sections
of TLS data any time you unload a dynamic module that formerly
referenced it.

Much better to make thread attach/detach as explicit as process
attach/detach already is: alongside the .init/.fini sections that
are there for the process, create corresponding ones for the threads.
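
What explicit attach/detach could look like can be approximated in
userland today.  This is a sketch under stated assumptions, not a
proposal for the actual section format: struct thread_hooks,
create_with_hooks() and the module_* names are all hypothetical, and
it assumes POSIX threads plus the __thread keyword.  A trampoline
runs the module's attach hook before the thread body and its detach
hook afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-module hooks, analogous to .init/.fini but per thread. */
struct thread_hooks {
    void (*attach)(void);   /* runs in the new thread before its body */
    void (*detach)(void);   /* runs in the thread after its body returns */
};

struct start_args {
    const struct thread_hooks *hooks;
    void *(*body)(void *);
    void *arg;
};

static void *trampoline(void *p)
{
    struct start_args a = *(struct start_args *)p;
    void *ret;

    free(p);
    if (a.hooks->attach != NULL)
        a.hooks->attach();
    ret = a.body(a.arg);
    /* Not reached on pthread_exit() or cancellation; a real design
     * would have to cover those paths as well. */
    if (a.hooks->detach != NULL)
        a.hooks->detach();
    return ret;
}

static int create_with_hooks(pthread_t *t, const struct thread_hooks *hooks,
                             void *(*body)(void *), void *arg)
{
    struct start_args *a;
    int rc;

    assert(hooks != NULL && body != NULL);
    a = malloc(sizeof(*a));
    if (a == NULL)
        return ENOMEM;
    a->hooks = hooks;
    a->body = body;
    a->arg = arg;
    rc = pthread_create(t, NULL, trampoline, a);
    if (rc != 0)
        free(a);
    return rc;
}

/* Example module: its per-thread data is set up and torn down explicitly. */
static __thread int *module_tls;
static int attach_calls, detach_calls;  /* unsynchronized: one thread at a time here */

static void module_attach(void)
{
    module_tls = calloc(1, sizeof(*module_tls));
    attach_calls++;
}

static void module_detach(void)
{
    free(module_tls);
    module_tls = NULL;
    detach_calls++;
}

static const struct thread_hooks module_hooks = { module_attach, module_detach };

static void *module_body(void *arg)
{
    (void)arg;
    if (module_tls == NULL)             /* attach failed to allocate */
        return NULL;
    *module_tls = 42;                   /* no lazy check needed on access */
    return (void *)(intptr_t)*module_tls;
}
```

The thread body never has to ask whether its data exists yet, and
teardown happens at a known point rather than through garbage
collection.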

-- Terry


More information about the freebsd-threads mailing list