cvs commit: src/sys/amd64/amd64 cpu_switch.S machdep.c

Scott Long scottl at samsco.org
Tue Oct 18 07:35:08 PDT 2005


Andrew Gallatin wrote:
> Scott Long writes:
>  > Andrew Gallatin wrote:
>  > > David Xu [davidxu at FreeBSD.org] wrote:
>  > > 
>  > >>davidxu     2005-10-17 23:10:31 UTC
>  > >>
>  > >>  FreeBSD src repository
>  > >>
>  > >>  Modified files:
>  > >>    sys/amd64/amd64      cpu_switch.S machdep.c 
>  > >>  Log:
>  > >>  Micro optimization for context switch. Eliminate code for saving gs.base
>  > >>  and fs.base. We always update pcb.pcb_gsbase and pcb.pcb_fsbase
>  > >>  when the user wants to set them, so in the context switch routine we
>  > >>  only need to write them into the registers; we never have to read them
>  > >>  back from the registers when a thread is switched away. Since rdmsr is
>  > >>  a serializing instruction, eliminating it is worthwhile, as a
>  > >>  micro-benchmark shows.
>  > > 
>  > > 
>  > > Nice.  This reduces lmbench context switch latency by about 0.4us (7.2
>  > > -> 6.8us), and reduces TCP loopback latency by about 0.9us (36.1 ->
>  > > 35.2us) on my dual-core 3800+.
>  > > 
>  > > It is a shame we can't find a way to use the TSC as a timecounter on
>  > > SMP systems.  It seems that about 40% of the context switch time is
>  > > spent just waiting for the PIO read of the ACPI-fast or i8254
>  > > timecounter to return.
>  > > 
>  > > 
>  > > Drew
>  > 
>  > The TSC represents the clock rate of the CPU, and thus can vary wildly
>  > when thermal and power management controls kick in, and there is no way
>  > to know when it changes.  Because of this, I think that it's
>  > practically useless on Pentium-Mobile and Pentium-M chips, among many
>  > others.  There is also the issue of multiple CPUs having to keep their
>  > TSCs somewhat in sync in order to get consistent counting in the
>  > system.  The best that you can do is to periodically read a stable
>  > counter and try to recalibrate, but then you'll likely start getting
>  > wild operational variances.  
> 
> As I pointed out in another thread, both linux and solaris do it.
> Solaris seems to have a nice algorithm for keeping things in sync, and
> accounting for the TSC getting cleared after suspend/resume etc.  At
> my level of understanding, this argument is nothing more than "but
> Mom, all the other kids are doing it".  I was just hoping that
> somebody with real understanding could pick up on it.
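To illustrate the reasoning in David Xu's commit log, here is a small Python model. It is a sketch only: the real change lives in the amd64 assembly of cpu_switch.S, and `Pcb`, `FakeCpu`, `set_fsbase` and `switch_to` are hypothetical names. It assumes, as the log says, that the PCB copy of each base is updated whenever the user sets it, so the switch path only writes the incoming thread's values and never issues a serializing rdmsr for the outgoing thread. MSR numbers and swapgs handling are deliberately left out.

```python
# Hypothetical user-space model of the write-through scheme in the commit log.
# The PCB copy of fs/gs base is authoritative, so a context switch only loads
# (writes) the incoming thread's values and never reads the MSRs back.
from dataclasses import dataclass


@dataclass
class Pcb:
    pcb_fsbase: int = 0
    pcb_gsbase: int = 0


class FakeCpu:
    """Stand-in for the MSRs; counts accesses instead of running rdmsr/wrmsr."""

    def __init__(self) -> None:
        self.fsbase = 0
        self.gsbase = 0
        self.rdmsr_count = 0  # stays 0: the switch path never reads back
        self.wrmsr_count = 0

    def wrmsr_fsbase(self, value: int) -> None:
        self.fsbase = value
        self.wrmsr_count += 1

    def wrmsr_gsbase(self, value: int) -> None:
        self.gsbase = value
        self.wrmsr_count += 1


def set_fsbase(cpu: FakeCpu, pcb: Pcb, base: int, is_curthread: bool) -> None:
    # Update the PCB first; only touch the register if the thread is running.
    pcb.pcb_fsbase = base
    if is_curthread:
        cpu.wrmsr_fsbase(base)


def switch_to(cpu: FakeCpu, new: Pcb) -> None:
    # No rdmsr for the outgoing thread: its PCB is already up to date.
    cpu.wrmsr_fsbase(new.pcb_fsbase)
    cpu.wrmsr_gsbase(new.pcb_gsbase)


cpu = FakeCpu()
a, b = Pcb(), Pcb(pcb_fsbase=0x2000)
set_fsbase(cpu, a, 0x1000, is_curthread=True)  # thread A sets its TLS base
switch_to(cpu, b)                              # A -> B
switch_to(cpu, a)                              # B -> A
print(hex(cpu.fsbase), cpu.rdmsr_count, cpu.wrmsr_count)  # 0x1000 0 5
```

Thread A's base survives the round trip through thread B with five register writes and no reads, which is the saving the commit targets.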

Steering multiple TSCs together isn't that hard and there are plenty of
examples, as you point out.  Accounting for the changes due to thermal
and power management (note that this isn't the same problem as suspend
and resume) is what worries me.
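The two TSC problems discussed here can be sketched in Python, purely for illustration. `calibrate_tsc_hz` estimates the TSC rate against a stable reference counter (the ACPI power-management timer ticks at 3,579,545 Hz), and `steer_offsets` computes per-CPU corrections so every CPU's TSC agrees with CPU 0. Both are hypothetical helpers under idealized assumptions (simultaneous samples, a single calibration), not the Linux or Solaris algorithms. The TSC sample values are made up to show the thermal/power-management failure mode: if the CPU silently drops from 2 GHz to 1 GHz after calibration, one real second is reported as half a second.

```python
# Sketch of TSC calibration against a stable reference and per-CPU offset
# steering. Helper names and sample values are hypothetical.

ACPI_PM_HZ = 3_579_545  # ACPI power-management timer frequency


def calibrate_tsc_hz(tsc0: int, tsc1: int, ref0: int, ref1: int, ref_hz: int) -> float:
    """Estimate the TSC rate from deltas measured against a reference counter."""
    return (tsc1 - tsc0) * ref_hz / (ref1 - ref0)


def tsc_to_ns(tsc_delta: int, tsc_hz: float) -> float:
    """Convert a TSC delta to nanoseconds using the last calibrated rate."""
    return tsc_delta * 1_000_000_000 / tsc_hz


def steer_offsets(tsc_samples: list[int]) -> list[int]:
    """Offsets that make each CPU's TSC agree with CPU 0.

    Idealized: assumes every sample was taken at the same instant.
    """
    base = tsc_samples[0]
    return [base - s for s in tsc_samples]


# Calibrate over one reference second while the CPU runs at 2 GHz.
tsc_hz = calibrate_tsc_hz(0, 2_000_000_000, 0, ACPI_PM_HZ, ACPI_PM_HZ)

# Power management silently drops the clock to 1 GHz: one real second now
# advances the TSC by only 1e9 ticks, which converts to half a second.
elapsed_ns = tsc_to_ns(1_000_000_000, tsc_hz)
print(tsc_hz, elapsed_ns)                # 2000000000.0 500000000.0
print(steer_offsets([1000, 1250, 900]))  # [0, -250, 100]
```

Recalibrating periodically against the reference would eventually correct the rate, but every reading taken between a frequency change and the next calibration is off, which is the kind of variance described above.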

> 
>  > It's a shame that a PIO read is still so
>  > expensive.  I'd hate to see just how bad your benchmark becomes when
>  > ACPI-slow is used instead of ACPI-fast.
> 
> It seems like reading ACPI-fast is "only" 3us or so, but when the ctx
> switch is otherwise 4us, it adds up. i8254 is much worse on this
> system (6.5us).
> 
>  > I wonder if moving to HZ=1000 on amd64 and i386 was really all that good
>  > of an idea.  Having preemption in the kernel means that ithreads can run
>  > right away instead of having to wait for a tick, and various fixes to
>  > 4BSD in the past year have eliminated bugs that would make the CPU wait
>  > for up to a tick to schedule a thread.  So all we're getting now is a
>  > 10x increase in scheduler overhead, including reading the timecounters.
> 
> Yeah.  I moved mine back to hz=1000 when I noticed 4000 interrupts/sec
> on an idle system.
> 
> Drew

Do you mean 1000 or 100 here?  Anyway, the high clock interrupt rate is
so that we can use the local APIC clock to get the various system ticks
that we have instead of continuing to fight motherboards that no longer
hook up the 8259 in a sane way.  This is why 5.x doesn't work well on a
number of new motherboards (nvidia ones especially) but 6.x works just
fine.

Scott
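A quick back-of-the-envelope check of the latency figures Drew quotes (about 3us for an ACPI-fast read, 6.5us for i8254, roughly 4us for the rest of the switch) agrees with his "about 40%" estimate for ACPI-fast. The second helper assumes, hypothetically, a fixed number of timecounter reads per clock tick; it is only meant to show why that per-tick cost scales linearly with hz, i.e. the 10x increase going from 100 to 1000.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the thread.

def timecounter_share(read_us: float, rest_us: float) -> float:
    """Fraction of one context switch spent waiting on the timecounter read."""
    return read_us / (read_us + rest_us)


def per_second_read_cost_us(hz: int, read_us: float, reads_per_tick: int = 1) -> float:
    # Hypothetical model: a fixed number of timecounter reads per clock tick.
    return hz * reads_per_tick * read_us


print(round(timecounter_share(3.0, 4.0), 2))  # ACPI-fast: 0.43
print(round(timecounter_share(6.5, 4.0), 2))  # i8254:     0.62
print(per_second_read_cost_us(100, 3.0), per_second_read_cost_us(1000, 3.0))  # 300.0 3000.0
```

Under that model, hz=1000 spends about 3000us per second (roughly 0.3% of one CPU) on ACPI-fast reads alone, versus about 0.03% at hz=100.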

