Thread Local Storage

Peter Wemm peter at wemm.org
Mon Mar 29 15:14:30 PST 2004


On Monday 29 March 2004 02:36 pm, Julian Elischer wrote:
> On Mon, 29 Mar 2004, Doug Rabson wrote:
> > On Monday 29 March 2004 22:50, Julian Elischer wrote:
> > > On Mon, 29 Mar 2004, Doug Rabson wrote:
> > > > How do you do that without an expensive syscall?
> > >
> > > Each cpu has a GDT entry pointing to a slightly different
> > > location and the kernel and the UTS co-operate as to making the
> > > contents of that location keep track of the thread currently
> > > running. The kernel always makes sure that %gs points to the
> > > correct place when it goes back to userland, and the contents of
> > > that location are saved whenever a thread sleeps and are restored
> > > whenever it wakes up again.
> > >
> > > It can even be in kernel memory space, as long as the segment
> > > is defined as r/w to the user and the segment size is limited to
> > > just the region it is allowed to read from.
> > > This actually gives us a 'per thread' pointer rather than a
> > > per-KSE pointer, which might allow us to go to the GNU version of
> > > TLS, but we haven't worked out the details.
> > > David and Peter have both looked at this, but we haven't actually
> > > tried to do it yet.
> >
> > Yes, I understand about the per-cpu GDT thing. I've been reading
> > Linux kernel code and glibc all afternoon, and my brain is melting
> > :-). It would work great for libthr.
> >
> > For the GNU TLS ABI, the %gs segment can't be in kernel space
> > because of the negative offsets generated by the
> > local exec TLS model. That idiom requires the %gs segment to have a
> > 4G limit so that the negative addresses can wrap round properly. If
> > it weren't for that single 'optimisation', we could just go with the
> > GNU model right now.
>
> And how do they stop the user from accessing kernel space, then?

They use page-level protection and set *all* the segment limits to 4GB.
This is actually an important optimization, because it lets P4 CPUs
spend one less clock cycle generating addresses, since the CPU stops
verifying the segment limits.
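To make the two halves of this argument concrete (negative local-exec offsets need a 4G limit to wrap, and once limits are 4G the page tables must do the protecting), here is a simplified model of IA-32 address generation. It is a sketch, not the hardware's full algorithm: it ignores descriptor types, privilege checks and expand-down segments, and the thread-pointer value and the KERNBASE split used below are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF
KERNBASE = 0xC0000000  # assumption: kernel-only mappings start here (3G/1G split)


def effective_address(displacement):
    """Signed displacement -> 32-bit effective address (computed mod 2**32)."""
    return displacement & MASK32


def within_limit(limit, displacement, size=4):
    """Simplified limit check for an expand-up data segment:
    every byte of the access must lie at an offset <= limit."""
    return effective_address(displacement) + size - 1 <= limit


def linear_address(base, displacement):
    """Segment base plus offset, wrapping at 4G."""
    return (base + effective_address(displacement)) & MASK32


def user_page_ok(linear):
    """Stand-in for the U/S bit in the page-table entry mapping `linear`."""
    return linear < KERNBASE


if __name__ == "__main__":
    tp = 0x2804C000  # hypothetical %gs base (thread pointer)

    # A local-exec style access such as %gs:-8 becomes offset 0xFFFFFFF8.
    print(within_limit(MASK32, -8), hex(linear_address(tp, -8)))
    # With a 4G limit it passes and wraps to tp - 8; with a one-page
    # limit the same offset is out of range and would fault (#GP).
    print(within_limit(0xFFF, -8))

    # With flat 4G limits the segment check no longer stops a user-mode
    # access to a kernel address; only the page's U/S bit does.
    kaddr = linear_address(0, 0xC0100000)
    print(within_limit(MASK32, 0xC0100000), user_page_ok(kaddr))
```

The wrap-around in `linear_address` is what makes "base plus a negative offset" equal "base minus a small number"; a small limit defeats it because the check is applied to the unsigned offset before the base is added.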

Somebody tried this with the i386 kernel a few months ago. I don't recall
the results. I doubt that it made a big difference, though. It's
probably something that only shows up in microbenchmarks.
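For comparison, the per-CPU GDT scheme Julian describes in the quoted text can be modeled in a few lines. This is a hypothetical sketch: the class and function names are invented for illustration, and plain Python lists stand in for the per-CPU %gs descriptors and the memory they point at.

```python
class PerCpuSlots:
    """One user-visible word per CPU.

    On real hardware each CPU's GDT would hold a %gs descriptor whose base
    points at that CPU's slot, so a load from %gs:0 reads the slot of
    whichever CPU the code happens to be running on.
    """

    def __init__(self, ncpu):
        self.slot = [None] * ncpu

    def gs0(self, cpu):
        # What a userland "movl %gs:0, %eax" would return on this CPU.
        return self.slot[cpu]


class KThread:
    """Hypothetical kernel-side context that carries a slot value across a sleep."""

    def __init__(self):
        self.saved = None


def uts_switch(slots, cpu, uthread):
    # The userland thread scheduler (UTS) publishes the thread it is about to run.
    slots.slot[cpu] = uthread


def thread_sleep(slots, cpu, kt):
    # The kernel saves the slot contents when the thread blocks.
    kt.saved = slots.slot[cpu]


def thread_wakeup(slots, cpu, kt):
    # The thread may resume on a different CPU; restore its value there.
    slots.slot[cpu] = kt.saved


if __name__ == "__main__":
    slots = PerCpuSlots(2)
    kt = KThread()
    uts_switch(slots, 0, "thread-A")
    thread_sleep(slots, 0, kt)        # thread-A blocks in the kernel on CPU 0
    uts_switch(slots, 0, "thread-B")  # CPU 0 runs another thread meanwhile
    thread_wakeup(slots, 1, kt)       # thread-A resumes on CPU 1
    print(slots.gs0(0), slots.gs0(1))
```

The slot itself is per CPU, but because it is saved and restored with the sleeping thread, what userland sees through %gs:0 follows the thread, which is the 'per thread' pointer property mentioned in the quote.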
-- 
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5


More information about the freebsd-threads mailing list