Implementing TLS: step 1

Terry Lambert tlambert2 at mindspring.com
Thu Jun 19 23:32:15 PDT 2003


[ ... hacked up Marcel's text in the post a bit ... ]

Marcel Moolenaar wrote:
> > Call me stupid but can you draw a picture of what you mean?
> > (it's worth a thousand words you know :-)
> 
> Not easily. Let me try with words again. Let me know if it's
> more clear or not. If not, I'll see if there's a graphical
> representation on the net.
[ ... ]
> The linker combines the .tdata and .tbss sections in the same way
> it combines the .data and .bss sections. The end result is an
> executable (or library) that contains both global data and TLS.

----------------------------------------
	Program image file on disk
----------------------------------------
Ordinary		TLS enabled

,-------.		,-------.
|  code |		| code  |
`-------'		`-------'
,-------.		,-------.
|  data |		| data  |
`-------'		`-------'
,-------.		,-------.
|  bss  |		|  bss  |
`-------'		`-------'
			,-------.
			| tdata |
			`-------'
			,-------.
			| tbss  |
			`-------'
----------------------------------------

> When code contains thread local variables (by way of defining them
> with the __thread modifier), the compiler will reserve the space
> for them in the .tdata section (for initialized data) or the .tbss
> section (for uninitialized data or data initialized to zero). This
> is exactly like how the compiler reserves space for global data
> (using .data and .bss sections), except of course that the intent
> of the TLS is that each thread has its own instance.

------------------------------------------------------------
	How the "__thread" C language extension works
------------------------------------------------------------
declaration			where it ends up

int foo1;			---> bss
int foo2 = 37;			---> data
__thread int foo3;		---> tbss
__thread int foo4 = 37;		---> tdata
------------------------------------------------------------
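
A minimal sketch of the table above, assuming a GCC- or Clang-style
compiler that accepts "__thread" and a POSIX threads library.  The
helper names (struct seen, worker, run_worker) are made up for
illustration.  It shows a new thread seeing the template values of
foo3 and foo4 even after the calling thread has changed its own
instances, while foo2 stays shared:

```c
#include <assert.h>
#include <pthread.h>

int foo1;                 /* bss:   one instance per process */
int foo2 = 37;            /* data:  one instance per process */
__thread int foo3;        /* tbss:  one zeroed instance per thread */
__thread int foo4 = 37;   /* tdata: one instance per thread, from the template */

/* Hypothetical helper: what the new thread observed on entry. */
struct seen { int foo3, foo4; };

static void *worker(void *arg)
{
    struct seen *s = arg;

    s->foo3 = foo3;   /* this thread's own instance, still zero */
    s->foo4 = foo4;   /* this thread's own instance, still 37 */
    foo2++;           /* shared: visible to every thread */
    foo4++;           /* private: visible to this thread only */
    return NULL;
}

/* Changes the caller's TLS instances, then reports what a new thread saw. */
struct seen run_worker(void)
{
    struct seen s = { -1, -1 };
    pthread_t t;
    int rc;

    foo3 = 5;
    foo4 = 99;
    rc = pthread_create(&t, NULL, worker, &s);
    assert(rc == 0);
    pthread_join(t, NULL);
    return s;
}
```

Calling run_worker() once from the initial thread should report 0
and 37, leave the caller's foo3 and foo4 at 5 and 99, and leave the
shared foo2 at 38.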

> The global data is normally loaded by the kernel at program load
> because there's one instance per process. For each thread, the
> thread library has to create the TLS instance by copying the TLS
> image present in the executable (or constructed by the rtld).
> Hence the use of template.

------------------------------------------------------------
	How things look when loaded into memory
------------------------------------------------------------
Thread #1		Disk Image		Thread #N

			,-------.
  code----reference---->| code  |<----reference---code
			`-------'
			,-------.
  data----reference---->| data  |<----reference---data
			`-------'
			,-------.
   bss----reference---->|  bss  |<----reference----bss
			`-------'
,-------.		,-------.		,-------.
| tdata |<-----copy-----| tdata |-----copy----->| tdata |	(templated)
`-------'		`-------'		`-------'
,-------.		,-------.		,-------.
| tbss  |<-----copy-----| tbss  |-----copy----->| tbss  |	(templated)
`-------'		`-------'		`-------'
------------------------------------------------------------
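
The "copy" arrows can be sketched in a few lines of C.  This is only
a model under stated assumptions, not how any particular thread
library does it: struct tls_template and tls_instantiate() are
hypothetical names, and the block is taken from malloc() rather than
from wherever a real implementation would put it.  The initialized
part is copied from the template and the rest is zero filled:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of the TLS template found in the image. */
struct tls_template {
    const void *init_image;  /* start of the initialized (tdata) part */
    size_t      init_size;   /* bytes to copy from the template */
    size_t      total_size;  /* tdata plus tbss, in bytes */
};

/* Build one thread's private TLS block from the shared template. */
void *tls_instantiate(const struct tls_template *t)
{
    unsigned char *block;

    assert(t->init_size <= t->total_size);
    block = malloc(t->total_size);
    if (block == NULL)
        return NULL;
    memcpy(block, t->init_image, t->init_size);            /* tdata: copy */
    memset(block + t->init_size, 0,
           t->total_size - t->init_size);                  /* tbss: zero */
    return block;
}
```

Each call yields an independent block, so a store through one
thread's block never shows up in another thread's block or in the
template itself.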



> The compiler generates access sequences according to the runtime
> specification which in general means that all offsets to the TLS
> are based on some TLS base address. On ia64 the thread pointer
> points to the TLS and serves as the TLS base address. On other
> architectures there may be an indirection. This means that on ia64
> the lack of TLS still requires us to allocate something for the
> thread pointer to point to. On other architectures this may not be
> the case.

Implementation-defined access mechanisms are outside the scope
of this discussion, since they have not yet been selected.  But
the above means that the compiler "magically" knows to generate
code that references "__thread" attributed data through the locally
defined access mechanism, whatever that may be.
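
As a rough model of "all offsets are based on some TLS base
address", here is what the generated access amounts to when written
out by hand.  Everything in it is an assumption for illustration:
struct thread stands in for whatever register or indirection holds
the thread pointer, and the offsets are ones a linker might assign,
not ones any real toolchain is known to pick:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical offsets of two variables within the TLS block. */
enum {
    FOO4_OFFSET = 0,
    FOO3_OFFSET = sizeof(int),
    TLS_SIZE    = 2 * sizeof(int)
};

/* Stand-in for the thread pointer; real hardware would use a register. */
struct thread {
    unsigned char *tls_base;
};

/* Read an int at (TLS base + offset) for the given thread. */
int tls_load_int(const struct thread *td, size_t offset)
{
    int v;

    assert(offset + sizeof v <= (size_t)TLS_SIZE);
    memcpy(&v, td->tls_base + offset, sizeof v);
    return v;
}

/* Write an int at (TLS base + offset) for the given thread. */
void tls_store_int(struct thread *td, size_t offset, int v)
{
    assert(offset + sizeof v <= (size_t)TLS_SIZE);
    memcpy(td->tls_base + offset, &v, sizeof v);
}
```

The same offset used with two different bases reaches two different
instances, which is the whole trick; the compiler only has to know
the offset and how to find the base.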

Note(1): I have no idea how this applies to things like function
pointers with this attribute that point to functions without it;
I assume it will "do the right thing", and make separate data
elements for the pointers, as directed, *AND* generate code to
make the calls relative to the TLS for the active thread, which
could make the implementation very complicated.
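
For what it is worth, here is a small sketch of the function pointer
case, assuming a GCC- or Clang-style "__thread" and POSIX threads.
The names (handler, twice, negate, and so on) are invented for the
example.  Under those assumptions the pointer is just another per
thread data element, and the call is an ordinary indirect call
through whatever value the active thread's copy holds:

```c
#include <assert.h>
#include <pthread.h>

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* The pointer is per thread; the functions it points at are shared code. */
static __thread int (*handler)(int) = twice;

static void *worker(void *arg)
{
    int *result = arg;

    handler = negate;        /* changes this thread's pointer only */
    *result = handler(21);   /* indirect call through this thread's copy */
    return NULL;
}

/* Run worker in a fresh thread and return what its handler produced. */
int run_in_new_thread(void)
{
    pthread_t t;
    int result = 0;
    int rc = pthread_create(&t, NULL, worker, &result);

    assert(rc == 0);
    pthread_join(t, NULL);
    return result;
}

/* Call through the calling thread's copy of the pointer. */
int call_handler(int x)
{
    return handler(x);
}
```

The calling thread keeps using twice() before and after the other
thread has switched its own copy to negate().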

Note(2): For external global references, one would assume that
there are scoping issues, i.e. that the external declaration with
the "__thread" qualifier language extension *MUST* be in scope at
the point of reference.  Otherwise, at best, the symbol decorations
will not match.  At worst, anyone who references such a variable
with no declaration in scope, or with a declaration in scope that
lacks the "__thread" qualifier, would get the first thread's
instance... or even worse, the template instance.
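
A sketch of the pattern that avoids the problem, assuming a GCC- or
Clang-style "__thread": the external declaration carries the
qualifier too, and every referencing file sees it.  The variable and
function names are hypothetical, and the two halves would normally
live in a header and in exactly one .c file.  As far as I know, the
GNU tools reject a mismatched pair of declarations in one file at
compile time, but I would not rely on every toolchain catching the
cross-file case:

```c
#include <assert.h>
#include <limits.h>

/* What a shared header would carry, so every reference sees "__thread". */
extern __thread int request_count;

/* What exactly one source file would carry: the definition itself. */
__thread int request_count = 0;

/* Increment the calling thread's own counter and return the new value. */
int bump_request_count(void)
{
    assert(request_count < INT_MAX);   /* guard against signed overflow */
    return ++request_count;
}
```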

> > I need to go out to the car and get my copy of the TLS proposal....
> > this supports exec-time linking but does it support run-time (i.e after
> > exec has begun) linking?
> 
> Yes. The rtld will dynamically construct the TLS template from the
> images in the ELF files in the startup set and pass this in
> AT_TLS_* by overriding the values (at least that was the idea).

This is where I personally have a problem with lazy initialization
of per thread TLS.  Specifically, when a thread exits, you have to
know what you have and have not instanced, on a per dynamic object,
per thread basis, as a minimum granularity, in order to be able to
clean it up, without trying to clean up things you have not yet
instanced in that particular thread.  This strikes me as being
incompatible with the %gs "single instruction" shortcuts, which means
that code generation for a dynamically linked object module would
need to know, _a priori_, what kind of references it needed to be
generating, OR *all* references would have to be via function and
pointer indirection... meaning that the "single instruction"
optimization is an illusion that can never happen in reality.
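
To make the bookkeeping concrete, here is a sketch of the "function
and pointer indirection" case.  It is a model only: the structures,
the MAX_MODULES limit, and the function names are all hypothetical,
and malloc-style allocation stands in for whatever a real thread
library would use.  Each thread keeps one slot per dynamic object,
a slot is filled on first access, and thread exit frees only the
slots that were actually instanced:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODULES 8   /* hypothetical limit on objects carrying TLS */

struct tls_module {             /* one per dynamic object */
    const void *init_image;     /* initialized part of its template */
    size_t      init_size;      /* bytes to copy from the template */
    size_t      total_size;     /* initialized plus zeroed bytes, > 0 */
};

struct thread_tls {             /* one per thread */
    void *block[MAX_MODULES];   /* NULL means "not instanced here yet" */
};

/* Every access goes through this call; first use instances the block. */
void *tls_get_addr(struct thread_tls *t, const struct tls_module *mods,
                   size_t module, size_t offset)
{
    assert(module < MAX_MODULES);
    if (t->block[module] == NULL) {
        const struct tls_module *m = &mods[module];
        unsigned char *b;

        assert(m->total_size > 0 && m->init_size <= m->total_size);
        b = calloc(1, m->total_size);            /* zeroed part */
        if (b == NULL)
            return NULL;
        memcpy(b, m->init_image, m->init_size);  /* initialized part */
        t->block[module] = b;
    }
    return (unsigned char *)t->block[module] + offset;
}

/* At thread exit, free only what this thread instanced; return the count. */
size_t tls_thread_exit(struct thread_tls *t)
{
    size_t freed = 0;

    for (size_t i = 0; i < MAX_MODULES; i++) {
        if (t->block[i] != NULL) {
            free(t->block[i]);
            t->block[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Note that every reference pays for a call, a table lookup, and a
test for NULL, which is exactly the cost the "single instruction"
form is supposed to avoid.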

-- Terry


More information about the freebsd-threads mailing list