svn commit: r185162 - in head: . sys/amd64/include sys/arm/include sys/conf sys/dev/bce sys/dev/cxgb sys/dev/cxgb/sys sys/dev/cxgb/ulp/iw_cxgb sys/dev/mxge sys/dev/nxge sys/i386/include sys/i386/in...

Mon Dec 1 11:32:47 PST 2008

On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:
> On Sun, Nov 23, 2008 at 12:51:58AM +0000, Kip Macy wrote:
> > On Sat, Nov 22, 2008 at 11:08 PM, Scott Long <scottl at samsco.org> wrote:
> > > Kostik Belousov wrote:
> > >>
> > >> On Sat, Nov 22, 2008 at 03:05:22PM -0700, Scott Long wrote:
> > >>>
> > >>> A neat hack would be for the kernel linker to scan the text and do a
> > >>> drop-in replacement of the opcode that is appropriate for the 
platform.
> > >>> I can't see how a CPU_XXX definition would work because it's just a
> > >>> compile time construct, one that can be included with any kernel
> > >>> compile.
> > >>
> > >> Yes, it is possible to do that. Less drastic change is to directly
> > >> check features. I moved slow code to separate section to eliminate
> > >> unconditional jump in fast path.
> > >> Only compile-tested.
> > >>
> > >
> > > As long as it works, I think it's a step in the right direction; I'm
> > > assuming that cpu_feature is a symbol filled in at runtime and not a
> > > macro for the cpuid instruction, right?
> > >
> > > Scott
> > >
> > 
> > i386/include/md_var.h:
> > <..>
> > extern  u_int   cpu_exthigh;
> > extern  u_int   cpu_feature;
> > extern  u_int   cpu_feature2;
> > extern  u_int   amd_feature;
> > extern  u_int   amd_feature2;
> > <...>
> > 
> > I'm not thrilled with it, but we can revisit the issue if it makes a
> > measurable difference on someone's workload.
> 
> Below is the updated patch. It includes changes made after private comments
> by bde@ and uses symbolic definitions for the bits in the features words.
> I thought about accessing a per-CPU word for serialized instruction in the
> slow path, but decided that it does not beneficial.\

Is the branch really better than just doing what the atomic operations for 
mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases 
on i386 and only bother with using the fancier instructions on amd64?  Even 
amd64 doesn't use *fence yet for the atomic ops actually.  I have had a patch 
to use it for years, but during testing there was no discernable difference 
between the existing 'lock addl' approach vs '*fence'.  I'd much rather just 
use 486 code for all i386 machines than add a branch, esp. if 
the "optimization" the branch is doing isn't an actual optimization.

-- 
John Baldwin