amd64 cpu_switch in C.
Jeff Roberson
jroberson at chesapeake.net
Tue Mar 11 02:25:27 UTC 2008
http://people.freebsd.org/~jeff/amd64.diff
At the above address there is an implementation of cpu_switch() and
cpu_throw() for amd64 almost entirely in C. I'm posting this for
discussion and eventual commit. There are numerous reasons to do this; I
will outline some of them.
Implementing the bulk of the code in C allows us to add or modify
higher-level features more easily. For example, we can change the pmap
active bits to use a cpuset_t so we can support more than 64 CPUs. It makes
the code faster because we can do more complicated checks to save time,
such as avoiding writing the fs/gsbase MSRs if they have not changed. It
also makes the code faster because infrequently used options can be moved
out of the normal code paths.
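To make the fs/gsbase point concrete, here is a userland sketch of the kind of check that is awkward in assembly but trivial in C. Everything in it is hypothetical: struct pcb_sketch, load_segbases() and wrmsr_stub() are invented for illustration (the stub only counts writes, because wrmsr is privileged), the NULL old pcb standing in for cpu_throw() is an assumption, and the MSR numbers are the architectural amd64 values rather than anything taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural amd64 MSR numbers (assumed usage, not from the patch). */
#define MSR_FSBASE  0xc0000100u  /* user %fs base */
#define MSR_KGSBASE 0xc0000102u  /* user %gs base while in the kernel */

/* Hypothetical, pared-down process control block. */
struct pcb_sketch {
	uint64_t fsbase;
	uint64_t gsbase;
};

/* Stand-in for the privileged wrmsr instruction: this userland sketch
 * only counts writes so that the savings can be observed. */
static unsigned msr_writes;

static void
wrmsr_stub(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
	msr_writes++;
}

/* Load the incoming thread's segment bases, skipping any MSR that
 * already holds the right value. A NULL oldpcb models a throw with no
 * outgoing thread, where nothing can be assumed about the MSRs. */
static void
load_segbases(const struct pcb_sketch *oldpcb,
    const struct pcb_sketch *newpcb)
{
	assert(newpcb != NULL);
	if (oldpcb == NULL || oldpcb->fsbase != newpcb->fsbase)
		wrmsr_stub(MSR_FSBASE, newpcb->fsbase);
	if (oldpcb == NULL || oldpcb->gsbase != newpcb->gsbase)
		wrmsr_stub(MSR_KGSBASE, newpcb->gsbase);
}
```

Switching between two threads of the same process that share an fsbase then costs one MSR write instead of two, and switching back to the same thread costs none.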
In fact, the C version is ~10% faster than the assembly version on a
two-thread sched_yield() test on a single-CPU Opteron:
x asm.yield
+ csw.yield
[histogram: csw.yield (+) samples cluster to the left of asm.yield (x) samples]
    N           Min           Max        Median           Avg        Stddev
x  10          5.17          5.88           5.5         5.479    0.19272606
+  15          4.58          5.16          4.71     4.8126667    0.20738049
Difference at 95.0% confidence
        -0.666333 +/- 0.170431
        -12.1616% +/- 3.11062%
        (Student's t, pooled s = 0.201773)
This test measures the total time to call sched_yield() 10,000,000 times
between two threads. Two threads are needed to be sure that the scheduler
doesn't pick the same thread twice and skip cpu_switch(). The speedup is
notable because cpu_switch() was consuming less than 40% of the CPU
before the change: if total run time drops by roughly 12% and the routine
accounted for under 40% of it, the routine itself must have become at
least 12/40, or about 30%, faster. So cpu_switch() is almost 1/3rd faster.
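A benchmark of the kind described above might look roughly like the following. This is a sketch, not the program used for the numbers in this post: yield_loop() and yield_bench() are invented names, and whether the 10,000,000 calls are a total or per-thread figure is not stated, so the caller picks the per-thread count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Thread body: give up the CPU 'n' times. */
static void *
yield_loop(void *arg)
{
	long n = *(const long *)arg;

	for (long i = 0; i < n; i++)
		sched_yield();
	return NULL;
}

/* Start two threads that each yield 'per_thread' times and return the
 * elapsed wall-clock time in seconds. Two threads ensure that each
 * sched_yield() has another runnable thread to switch to. */
static double
yield_bench(long per_thread)
{
	pthread_t t[2];
	struct timespec start, end;
	int rc;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < 2; i++) {
		rc = pthread_create(&t[i], NULL, yield_loop, &per_thread);
		assert(rc == 0);
	}
	for (int i = 0; i < 2; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	clock_gettime(CLOCK_MONOTONIC, &end);
	return (double)(end.tv_sec - start.tv_sec) +
	    (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling yield_bench(5000000) from main() yields 10,000,000 times in total. Build with -pthread. On an SMP machine the two threads may land on different CPUs and yield without switching, so to mimic the single-CPU setup you would pin the process, e.g. with `cpuset -l 0` on FreeBSD.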
Peter also suggested that we could defer portions of the switch until the
return to user mode. For workloads that generate heavy kernel activity,
with multiple switches per syscall, this would be a big saving.
We could also use this as a framework to implement custom switch routines
if we want to switch directly to ithreads or taskqueue threads in the
future.
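One way Peter's suggestion could look, sketched under loud assumptions: each CPU remembers whose user-mode state (for example the fs/gs bases) is currently loaded, kernel-to-kernel switches leave that state alone, and it is reloaded on the way back to user mode only if a different thread's state is resident. struct uthread_sketch, struct cpu_sketch and userret_sketch() are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state that only matters in user mode. */
struct uthread_sketch {
	uint64_t user_fsbase;
};

/* Hypothetical per-CPU record of whose user state is loaded. */
struct cpu_sketch {
	const struct uthread_sketch *user_owner;
};

static unsigned user_state_loads;	/* counts the expensive reloads */

/* Return-to-user hook: reload only if another thread's user state is
 * resident, however many kernel-only switches happened in between.
 * Kernel-to-kernel switches never call this and never touch
 * user_owner. */
static void
userret_sketch(struct cpu_sketch *cpu, const struct uthread_sketch *td)
{
	assert(td != NULL);
	if (cpu->user_owner != td) {
		user_state_loads++;	/* e.g. the fsbase/kgsbase MSR writes */
		cpu->user_owner = td;
	}
}
```

If thread A blocks in a syscall, several kernel threads run, and A is then switched back in, A returns to user mode without any reload. A real implementation would also have to invalidate user_owner when the owner exits or changes its user state, which this sketch ignores.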
The C routine is supplemented by two assembly routines which are
responsible for saving the core architecture state and manipulating the
stack. These total approximately 50 assembly instructions and are similar
to savecontext/swapcontext.
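As a rough userland analogy for the save-context/switch-stack split (not the patch's code), swapcontext(3) saves the current registers and stack pointer into one context and resumes another. The names worker(), run_switch_demo() and trace are made up for illustration; ucontext was dropped from POSIX.1-2008 but is still provided by FreeBSD's libc and glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];	/* the worker's private stack */
static int trace[4];			/* records the order of execution */
static int ntrace;

static void
worker(void)
{
	trace[ntrace++] = 2;
	/* Save our registers/stack pointer and resume main_ctx. */
	swapcontext(&worker_ctx, &main_ctx);
	trace[ntrace++] = 4;
	/* Returning resumes uc_link, i.e. main_ctx. */
}

static void
run_switch_demo(void)
{
	int rc;

	ntrace = 0;
	rc = getcontext(&worker_ctx);
	assert(rc == 0);
	(void)rc;
	worker_ctx.uc_stack.ss_sp = worker_stack;
	worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
	worker_ctx.uc_link = &main_ctx;
	makecontext(&worker_ctx, worker, 0);

	trace[ntrace++] = 1;
	swapcontext(&main_ctx, &worker_ctx);	/* run worker until it switches back */
	trace[ntrace++] = 3;
	swapcontext(&main_ctx, &worker_ctx);	/* resume worker where it left off */
	assert(ntrace == 4);
}
```

After run_switch_demo() the trace reads 1, 2, 3, 4: each side resumes exactly where it saved its context, which is the property the two small assembly helpers provide for kernel threads.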
The C code saves the old thread's context but keeps running on the old
thread's stack as it continues the switch. This is safe because the old
thread stays locked until we call "cpu_switchin()", which is similar to
swapcontext.
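A simplified model of why running on the old stack is safe: the outgoing thread carries a flag meaning "still in use by the switch code" that is cleared only once we are on the new thread's stack, and any CPU that wants to resume that thread waits for the flag to clear. This is a hypothetical sketch using C11 atomics; struct kthread_sketch, switch_begin(), switch_finish() and wait_until_resumable() are invented names, and FreeBSD's real thread locking is more involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical thread record: switching_out stays true while the
 * switch code may still be running on this thread's stack. */
struct kthread_sketch {
	atomic_bool switching_out;
};

/* Outgoing thread, before its context is saved. The caller is assumed
 * to hold the thread's lock, so relaxed ordering suffices here. */
static void
switch_begin(struct kthread_sketch *old)
{
	atomic_store_explicit(&old->switching_out, true,
	    memory_order_relaxed);
}

/* Last step of switching in: we are on the new stack now, so the old
 * thread may run elsewhere. The release store publishes the saved
 * context before the flag is seen as clear. */
static void
switch_finish(struct kthread_sketch *old)
{
	atomic_store_explicit(&old->switching_out, false,
	    memory_order_release);
}

/* A CPU that wants to run td waits until td's previous switch-out has
 * fully left its stack; acquire pairs with the release above. */
static void
wait_until_resumable(struct kthread_sketch *td)
{
	while (atomic_load_explicit(&td->switching_out,
	    memory_order_acquire))
		;	/* spin; a kernel would typically add a pause hint */
}
```

The acquire/release pair is what guarantees that a CPU resuming the thread sees its fully saved registers rather than a half-written context.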
The only appreciable downside is that it lowers the barrier to entry for
modifying a very sensitive piece of code. Still, I think the flexibility
it gives us outweighs those concerns.
Comments?
Thanks,
Jeff
More information about the freebsd-arch mailing list