libkse -> libpthreads

Mon Apr 21 21:28:07 PDT 2003

Narvi wrote:
> > Jeff makes a good point about the potential for bugs because of
> > the more sophisticated and larger total number of lines of code,
> > however.
> 
> Unless something went inherently wrong in case of the Solaris threading
> libraries (or the published information is wrong), there is the potential
> for the additional code and complexity to add sufficient overhead to be
> measurable as performance degradation. It could be that this happens only
> with large values of M and N (and thus does not surface in the next K years
> in a serious way in FreeBSD), or it could show up also on relatively small
> values of M and N, or not at all.

Actually, we're only talking about FreeBSD threading, and we're
talking about the degenerate case in N:M where N == M == 1.

With one exception, which is just an optimization that hasn't
happened yet, the UTS totally drops out, and the subset of the
code that is called drops down to the same as libthr.  This
is pretty much on purpose.

So the complexity argument isn't really valid (after the additional
optimization to let the kernel know that all threads are created as
PTHREAD_SCOPE_SYSTEM).  There's not really any additional overhead.
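
For concreteness, here is a minimal sketch of what "all threads are
created as PTHREAD_SCOPE_SYSTEM" looks like from the application side.
It uses only standard POSIX calls; worker() and
run_system_scope_thread() are made-up names for illustration, and
nothing in it is libkse-specific.  Whether the library actually takes
the 1:1 path for such a thread is up to the implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: sets a flag so the caller can see it ran. */
static void *worker(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/*
 * Create one thread with system contention scope and join it.
 * Returns 0 on success, or the pthread error number on failure.
 */
int run_system_scope_thread(int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int scope = -1;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a thread that competes system-wide for the CPU. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0) {
        /* Sanity check: the attribute should read back as set. */
        rc = pthread_attr_getscope(&attr, &scope);
        assert(rc != 0 || scope == PTHREAD_SCOPE_SYSTEM);
    }
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, ran);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

A caller would declare "int ran = 0;", call
run_system_scope_thread(&ran), and link against the threads library.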

Right now, there *is* measurable overhead, but only because there's
an upcall generated on a block, and the upcall is immediately
terminated with a kse_release() call.  That's the optimization that
hasn't been done.

> There are some unknowns in the picture, and say assigning a non-trivial
> likelihood to at some point all processors being 8-way (or even 4-way)
> EV8 style SMT would change some aspects of it.

Not really.  The system call path would be essentially identical,
and so would the overhead.

If you really want to talk about 8-way or even NUMA, well, the N:M
model "rules" in that case, so long as you ensure that your KSEGRP
allocation follows the NUMA cluster in terms of grouping mutually
accessible shared memory.  At this point, though, FreeBSD is pretty
shared-memory-multiprocessor centric in its design, and it's not
very amenable to clustering (even NUMA clustering) and its current
schedulers aren't really up to node-migration of tasks.

On a PPC architecture, too, the coherency model is probably really
completely broken; probably so for any architecture that doesn't
comply with the Intel MPSpec 1.4.

> > Daniel also makes a good point about not finding them unless the
> > code is put in production.
> >
> > To my mind, it comes down to bug avoidance vs. bug detection, if
> > you weigh these arguments against each other.
> 
> I prefer bug avoidance and spending the saved time on performance - but
> this is for the moment a preference of somebody without time to spend on
> either thread library.

If you can prove something is correct, then you've eliminated the
bugs.  If you can only prove that you've avoided some bugs, you
can't prove correctness from that.  It's really a false savings;
it's the DJB "qmail" argument all over again.  I like to think
engineers are capable of dealing with complexity, and that they
don't need to be boxed into little sandboxes, like children, to
keep them from hurting themselves.  8-) 8-).

-- Terry