Strawman proposal: making libthr default thread implementation?
Peter Wemm
peter at wemm.org
Wed Jul 5 01:19:08 UTC 2006
On Tuesday 04 July 2006 12:41 pm, Julian Elischer wrote:
> David Xu wrote:
> >On Tuesday 04 July 2006 21:08, Daniel Eischen wrote:
> >>The question was what does libthr lack. The answer is priority
> >>inheritance and priority-protection mutexes, and also SCHED_FIFO, SCHED_RR, and
> >>(in the future) SCHED_SPORADIC scheduling. That is what I stated
> >>earlier in this thread.
> >
> >As other people said, we need performance. These features, as you
> >said, are for the future, but I don't think they are more important
> >than the performance problem. You have to answer to people who
> >bought two CPUs but whose machines act as if they only have one. As
> >the major author of libpthread, in the past you decided to keep
> >silent, ignoring that requirement. Also, the signal queue may not
> >work reliably with libpthread; this nightmare appears again.
>
> As much as it pains me to say it, we could do with looking at using
> the simpler 1:1 mode as the default. M:N does work, but many of the
> promised advantages turn out to be phantoms due to the complexities
> of actually implementing it.
At BSDCan, I tinkered with a checkout of the cvs tree, to see what the
kernel side of things would look like if M:N support came out. The
result is an amazing code clarity improvement and it enables a bunch of
other optimizations to be done with greater ease. For example, the C
code executed between an interrupt and ithread dispatch can be reduced
by about 75%. This simplification enabled Kip to
do a bunch of scalability work as well (per-cpu scheduling locks,
per-cpu process lists, etc).
However, my objectives there were quite different from what Robert has
raised. My objectives were a 'what if?'. People have complained in
the past that the complexity that KSE adds to the kernel context
switching code gets in the way of other optimizations that they'd like
to try, so I figured that this would be a good way to call them on that
and see if it really does help or not. I was hoping to be able to
present a list of things that we'd gain as a result, but unfortunately
the cat is out of the bag a bit earlier than I'd have liked. I never
really intended to bring it up until there was something to show for
it. I know Kip has done some amazing work already but I was hoping for
other things as well before going public.
FWIW, My skunkworks project is in perforce:
//depot/projects/bike_sched/...
and there is a live diff:
http://people.freebsd.org/~peter/bike_sched.diff
(Yes, the name was picked long before this thread started)
It does NOT have any of Kip's optimization work in it. It was just
meant as a baseline for other people to experiment with. I've tested
it with 4bsd as the scheduler. ULE might work, but I have not tried
it. SCHED_CORE will not compile in that tree because I haven't gone
over the diffs from David Xu yet. I run this code on my laptop with
libmap.conf redirecting libpthread to libthr. It works very well for
me, even threaded apps like firefox etc.
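For reference, the libmap.conf redirection mentioned above looks
roughly like this (library version numbers are from memory of the 6.x
era and may differ on other releases):

```
# /etc/libmap.conf -- make binaries linked against libpthread use libthr
libpthread.so.2    libthr.so.2
libpthread.so      libthr.so
```

The runtime linker consults this map at load time, so no relinking of
the applications is needed.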
Anyway, back to the subject at hand. The basic problem with the KSE/SA
model as I see it (besides the kernel code complexity) is that it
doesn't really suit the kind of threaded applications that people
want to run on unix boxes.
In a traditional 1:1 threading system, eg: linuxthreads/nptl, libthr,
etc, mutex blocking is expensive, but system calls and blocking in
kernel mode is the same cost as a regular process making system calls
or blocking in kernel mode.
Because Linux was the most widely and massively deployed threading
system out there, people tended to write (or modify) their applications
to work best with those assumptions. ie: keep pthread mutex blocking
to an absolute minimum, and not care about kernel blocking.
However, with the SA/KSE model, our tradeoffs are different. We make
pthread mutex blocking cheaper (except for UTS bugs that can make it
far slower), but we make blocking in kernel context significantly more
expensive than in the 1:1 case, probably as much as double the cost.
For applications that block in the kernel a lot instead of on mutexes,
this is a big source of pain.
Since most of the applications we're asked to run are written with the
linux behavior in mind, we usually come off worst when our performance
is compared against linux.
I'm sure that there are threaded applications that benefit from cheap
mutex operations, but I'm not personally aware of them. I do know that
the ones that we get regularly compared to linux with are the likes of
mysql, squid and threaded http servers. All of those depend on kernel
blocking being as fast as possible. Faster mutexes don't seem to
compensate for the extra costs of kernel blocking. I don't know where
java fits into this picture.
We've proven that we can make KSE work, but it was far harder than we
imagined, and unfortunately, the real-world apps that matter the most
just don't seem to take advantage of it. Not to mention the complexity
that we have to work around for scalability work.
Speaking of scalability, 16 and 32 way systems are here already and will
be common within 7.0's lifetime. If we don't scale, we're sunk. My
gut tells me that we HAVE to address the complexity that the KSE kernel
code adds in order to improve this. We can barely work well on 4-cpu
systems, let alone 32 cpu systems.
PS: I think it would be interesting to see a hybrid user level M:N
system. Even if it was as simple as multiplexing user threads onto a
group of kernel threads (without M:N kernel support) and doing libc_r
style syscall wrappers for intercepting long-term blockable operations
like socket/pipe IO etc. For short term blocking (disk IO), just wear
the cost of letting one thread block for a moment. I suspect that
large parts of libpthread could be reused and some bits brought back
from libc_r. I think this would do a fairly decent job for things like
computational threaded apps because mutexes would be really fast.
PPS: My opinions are not meant as a criticism of the massive amount of
work that has gone into making KSE work. It is more an attempt to step
back and take an objective look at the ever-changing big picture.
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5