cvs commit: src/sys/amd64/amd64 cpu_switch.S machdep.c

Andrew Gallatin gallatin at
Tue Oct 18 07:25:21 PDT 2005

Scott Long writes:
 > Andrew Gallatin wrote:
 > > David Xu [davidxu at] wrote:
 > > 
 > >>davidxu     2005-10-17 23:10:31 UTC
 > >>
 > >>  FreeBSD src repository
 > >>
 > >>  Modified files:
 > >>    sys/amd64/amd64      cpu_switch.S machdep.c 
 > >>  Log:
 > >>  Micro optimization for context switch. Eliminate code for saving gs.base
 > >>  and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase
 > >>  when user wants to set them, in context switch routine, we only need to
 > >>  write them into registers, we never have to read them out from registers
 > >>  when thread is switched away. Since rdmsr is a serialization instruction,
 > >>  micro benchmark shows it is worthy to do.
 > > 
 > > 
 > > Nice.  This reduces lmbench context switch latency by about 0.4us (7.2
 > > -> 6.8us), and reduces TCP loopback latency by about 0.9us (36.1 ->
 > > 35.2) on my dual core 3800+
 > > 
 > > It is a shame we can't find a way to use the TSC as a timecounter on
 > > SMP systems.  It seems that about 40% of the context switch time is
 > > spent just waiting for the PIO read of the ACPI-fast or i8254 to
 > > return.
 > > 
 > > 
 > > Drew
 > > 
 > > 
 > > 
 > The TSC represents the clock rate of the CPU, and thus can vary wildly
 > when thermal and power management controls kick in, and there is no way
 > to know when it changes.  Because of this, I think that it's
 > practically useless on Pentium-Mobile and Pentium-M chips, among many
 > others.  There is also the issue of multiple CPUs having to keep their
 > TSCs somewhat in sync in order to get consistent counting in the
 > system.  The best that you can do is to periodically read a stable
 > counter and try to recalibrate, but then you'll likely start getting
 > wild operational variances.  

As I pointed out in another thread, both Linux and Solaris do it.
Solaris seems to have a nice algorithm for keeping things in sync, and
accounting for the TSC getting cleared after suspend/resume etc.  At
my level of understanding, this argument is nothing more than "but
Mom, all the other kids are doing it".  I was just hoping that
somebody with real understanding could pick up on it.
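
For illustration only, the "periodically read a stable counter and
recalibrate" idea quoted above can be sketched in C.  This is a minimal
userland sketch under stated assumptions, not the Solaris or Linux
algorithm and not FreeBSD code: struct tsc_sample, tsc_estimate_hz() and
tsc_to_ns() are hypothetical names, and the reference clock is assumed
to be already converted to nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * One calibration sample (hypothetical layout): a raw TSC reading
 * paired with a reading of a stable reference clock in nanoseconds.
 */
struct tsc_sample {
	uint64_t tsc;     /* raw cycle count */
	uint64_t ref_ns;  /* reference time, nanoseconds */
};

/*
 * Estimate the TSC frequency in Hz from two samples.  The interval
 * must be short enough that cycles * 1e9 fits in 64 bits.
 */
static uint64_t
tsc_estimate_hz(struct tsc_sample a, struct tsc_sample b)
{
	assert(b.ref_ns > a.ref_ns && b.tsc >= a.tsc);
	uint64_t cycles = b.tsc - a.tsc;
	assert(cycles <= UINT64_MAX / NS_PER_SEC);
	return cycles * NS_PER_SEC / (b.ref_ns - a.ref_ns);
}

/*
 * Convert a later TSC reading to nanoseconds, extrapolating from the
 * last calibration sample at the assumed frequency hz.
 */
static uint64_t
tsc_to_ns(struct tsc_sample base, uint64_t hz, uint64_t tsc_now)
{
	assert(hz > 0 && tsc_now >= base.tsc);
	uint64_t delta = tsc_now - base.tsc;
	assert(delta <= UINT64_MAX / NS_PER_SEC);
	return base.ref_ns + delta * NS_PER_SEC / hz;
}
```

If the clock rate halves between two calibrations, tsc_to_ns() keeps
extrapolating at the old rate until the next sample arrives, which is
the kind of variance described in the quoted text.  How often to sample
and how to smooth the estimate are left open here.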

 > It's a shame that a PIO read is still so
 > expensive.  I'd hate to see just how bad your benchmark becomes when
 > ACPI-slow is used instead of ACPI-fast.

It seems like reading ACPI-fast is "only" 3us or so, but when the ctx
switch is otherwise 4us, it adds up. i8254 is much worse on this
system (6.5us).
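
The arithmetic behind those figures can be written out as a small C
helper.  This is only a sketch: timecounter_share_pct() is a
hypothetical name, and it assumes the context switch cost splits
cleanly into one timecounter read plus everything else, which is a
simplification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Percentage (rounded down) of a context switch spent in the
 * timecounter read, given the read cost and the cost of the rest
 * of the switch, both in nanoseconds.
 */
static unsigned
timecounter_share_pct(uint32_t read_ns, uint32_t other_ns)
{
	uint64_t total = (uint64_t)read_ns + other_ns;

	assert(total > 0);
	return (unsigned)((uint64_t)read_ns * 100 / total);
}
```

With the numbers above, timecounter_share_pct(3000, 4000) gives 42 for
ACPI-fast, in line with the "about 40%" estimate.  If the rest of the
switch stayed at 4us, the 6.5us i8254 read would come to 61.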

 > I wonder if moving to HZ=1000 on amd64 and i386 was really all that good
 > of an idea.  Having preemption in the kernel means that ithreads can run
 > right away instead of having to wait for a tick, and various fixes to
 > 4BSD in the past year have eliminated bugs that would make the CPU wait
 > for up to a tick to schedule a thread.  So all we're getting now is a
 > 10x increase in scheduler overhead, including reading the timecounters.

Yeah.  I moved mine back to hz=1000 when I noticed 4000 interrupts/sec
on an idle system.

