NVIDIA and TLS

Mon Jun 16 23:35:43 PDT 2003

Alexander Kabaev wrote:
> Gareth Hughes <gareth at nvidia.com> wrote:
> > If FreeBSD support ELF TLS and __thread variables in ANY form, our
> > driver will use this support.  If the best you can do is a function
> > call per access, so be it.  It doesn't sound like there are any other
> > options, given the fact that you ship with three different thread
> > libraries.
> 
> Three different libraries can attempt to coordinate and reserve exactly
> the same TLS segment size.

Gareth's point is that a function call is needed to abstract
that, unless all three libraries use the same mechanism.
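
For illustration only, here is a minimal sketch of what "a function
call to abstract that" could look like.  Nothing here is taken from
the NVIDIA driver: the names gl_thread_state, gl_get_state,
gl_set_state and the USE_ELF_TLS switch are all hypothetical.  The
fallback path uses POSIX thread-specific keys, which cost a library
call per access but do not depend on which thread library is linked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread driver state; the field is made up. */
struct gl_thread_state {
    int current_context;
};

#ifdef USE_ELF_TLS   /* hypothetical build switch */
/* Direct access: compiler and linker resolve the per-thread address. */
static __thread struct gl_thread_state *tls_state;

struct gl_thread_state *gl_get_state(void) { return tls_state; }
void gl_set_state(struct gl_thread_state *s) { tls_state = s; }
#else
/* Fallback: one library call per access, via POSIX keys. */
static pthread_key_t  state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&state_key, NULL);
    assert(rc == 0);
    (void)rc;
}

struct gl_thread_state *gl_get_state(void)
{
    pthread_once(&state_once, make_key);
    return pthread_getspecific(state_key);   /* NULL until set */
}

void gl_set_state(struct gl_thread_state *s)
{
    pthread_once(&state_once, make_key);
    pthread_setspecific(state_key, s);
}
#endif
```

Callers only ever see gl_get_state() and gl_set_state(), so the
mechanism behind them can differ per thread library.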

Right now, he's said that he can't use %fs, because he has a
version of WINE that uses OpenGL to implement its video, and
the threads people have said that he can't use %gs because it
points to a per-KSE value, not a per thread value, and there
are potentially many threads associated with each KSE, so it's
no good for TLS, only for KSELS.

I've suggested that he use the compiler flags to reserve another
*different* register (not %fs and not %gs) for his purpose; the
compiler has explicit support for this.  If he is willing to burn
a register in order to get this functionality, it should be one
that's normally allocated for user programs to burn anyway.

Julian has pointed out that the TLS data should probably be cached
following a return from a context switch as a result of the user
space threads scheduler, if there is one, scheduling the thread to
run.  So far, it seems that Gareth has misunderstood what Julian
meant by this: that the reload should happen high up in the OpenGL
code, via an explicit call that does the register switch, so that
the value is available lower down.  This would work with the %fs
approach, without breaking WINE.
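
A portable analogue of that "reload high up, use lower down" idea
can be sketched as follows.  This is an assumption-laden sketch, not
Julian's or NVIDIA's code: a plain pointer argument stands in for the
register, pthread_getspecific() stands in for the reload, and the
names gl_bind_state, gl_draw_triangles and emit_vertex are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread state; the field is made up. */
struct gl_thread_state {
    int vertices_seen;
};

static pthread_key_t  state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&state_key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Attach a state block to the calling thread. */
void gl_bind_state(struct gl_thread_state *s)
{
    pthread_once(&state_once, make_key);
    pthread_setspecific(state_key, s);
}

/* Inner helper: never looks up TLS itself. */
static void emit_vertex(struct gl_thread_state *ts)
{
    ts->vertices_seen++;
}

/* Public entry point: one lookup, done high up, reused below. */
int gl_draw_triangles(int count)
{
    struct gl_thread_state *ts;
    int i;

    pthread_once(&state_once, make_key);
    ts = pthread_getspecific(state_key);
    if (ts == NULL)
        return -1;               /* no state bound on this thread */
    for (i = 0; i < 3 * count; i++)
        emit_vertex(ts);
    return ts->vertices_seen;
}
```

The lookup cost is paid once per API call rather than once per
access, which is the point of doing the reload at the top.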

In addition, it seems that Gareth's Linux approach is assuming that
a thread which is involuntarily context switched will be run again,
when the next quantum becomes available.  This is an invalid
assumption for libkse (though it's currently hacked, locally, by
most people to keep non-strict POSIX applications happy), and
could be an invalid assumption in Linux and libthr in FreeBSD,
should the scheduler code change in the future to recalculate
priority prior to giving the quantum to a particular thread.

Add to this that the model that Gareth is suggesting as a de facto
standard is one advocated by a company that's suing IBM over the
Linux OS containing uniform code in 60 lines of its scheduler which
might very well be related to this publication, for all we know,
and the assumptions about implementation implicit in the standard,
and there's a significant issue that needs to be resolved here, and
I'm not just talking about the legal implications of implementing
to a Caldera/SCO specification that they may feel contains their
intellectual property.

At a minimum, we are likely talking about a "first call"
optimization, in order to set a global offset as valid relative to a
register common to all three threads libraries in FreeBSD, and,
potentially, two of them, if we *must* go indirect through %gs to
get the TLS from the KSELS.
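
A rough sketch of such a "first call" optimization follows.  It is
only a model: a __thread array stands in for the per-thread area that
a common register would point at, and thread_base,
resolve_driver_offset, driver_slot and the slot position are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_SLOTS 32

/* Stand-in for the per-thread area a thread library would set up. */
static __thread intptr_t tls_area[TLS_SLOTS];

/* Stand-in for "read the register common to all thread libraries". */
static unsigned char *thread_base(void)
{
    return (unsigned char *)tls_area;
}

/* Process-wide byte offset of the driver's slot; -1 means "not yet". */
static long driver_offset = -1;

/* Slow path: assumed negotiation with the thread library. */
static long resolve_driver_offset(void)
{
    return 8 * (long)sizeof(intptr_t);   /* assumption: slot 8 */
}

intptr_t *driver_slot(void)
{
    if (driver_offset < 0) {             /* taken on the first call only */
        long off = resolve_driver_offset();
        assert(off >= 0 &&
               (size_t)off + sizeof(intptr_t) <= sizeof(tls_area));
        /* A real version would publish this with pthread_once() or
           an atomic store to avoid a data race between threads. */
        driver_offset = off;
    }
    return (intptr_t *)(thread_base() + driver_offset);
}
```

After the first call, every access is a base-plus-offset computation
with no further negotiation.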

Does that jibe with what everyone else understands of this mailing
list thread so far?

-- Terry