svn commit: r185162 - in head: . sys/amd64/include sys/arm/include sys/conf sys/dev/bce sys/dev/cxgb sys/dev/cxgb/sys sys/dev/cxgb/ulp/iw_cxgb sys/dev/mxge sys/dev/nxge sys/i386/include sys/i386/in...

Mon Dec 1 14:22:23 PST 2008

2008/12/1, John Baldwin <jhb at freebsd.org>:
> On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:
>  > On Sun, Nov 23, 2008 at 12:51:58AM +0000, Kip Macy wrote:
>  > > On Sat, Nov 22, 2008 at 11:08 PM, Scott Long <scottl at samsco.org> wrote:
>  > > > Kostik Belousov wrote:
>  > > >>
>  > > >> On Sat, Nov 22, 2008 at 03:05:22PM -0700, Scott Long wrote:
>  > > >>>
>  > > >>> A neat hack would be for the kernel linker to scan the text and do a
>  > > >>> drop-in replacement of the opcode that is appropriate for the platform.
>  > > >>> I can't see how a CPU_XXX definition would work because it's just a
>  > > >>> compile time construct, one that can be included with any kernel
>  > > >>> compile.
>  > > >>
>  > > >> Yes, it is possible to do that. A less drastic change is to check the
>  > > >> features directly. I moved the slow code to a separate section to
>  > > >> eliminate the unconditional jump in the fast path.
>  > > >> Only compile-tested.
>  > > >>
>  > > >
>  > > > As long as it works, I think it's a step in the right direction; I'm
>  > > > assuming that cpu_feature is a symbol filled in at runtime and not a
>  > > > macro for the cpuid instruction, right?
>  > > >
>  > > > Scott
>  > > >
>  > >
>  > > i386/include/md_var.h:
>  > > <..>
>  > > extern  u_int   cpu_exthigh;
>  > > extern  u_int   cpu_feature;
>  > > extern  u_int   cpu_feature2;
>  > > extern  u_int   amd_feature;
>  > > extern  u_int   amd_feature2;
>  > > <...>
>  > >
>  > > I'm not thrilled with it, but we can revisit the issue if it makes a
>  > > measurable difference on someone's workload.
>  >
>  > Below is the updated patch. It includes changes made after private comments
>  > by bde@ and uses symbolic definitions for the bits in the features words.
>  > I thought about accessing a per-CPU word for the serialized instruction in
>  > the slow path, but decided that it is not beneficial.
>
>  Is the branch really better than just doing what the atomic operations for
>  mutexes, etc. do and using 'lock addl $0,%esp' as a barrier in all cases
>  on i386, and only bothering with the fancier instructions on amd64?  Even
>  amd64 doesn't actually use *fence for the atomic ops yet.  I have had a patch
>  to use it for years, but during testing there was no discernible difference
>  between the existing 'lock addl' approach and '*fence'.  I'd much rather just
>  use 486 code for all i386 machines than add a branch, esp. if
>  the "optimization" the branch is doing isn't an actual optimization.

This is exactly what I suggested in private, and I support it.
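
For readers following along, here is a minimal userland sketch of the two
alternatives compared above: an unconditional locked add on the stack versus
a runtime check of a feature word that selects a fence instruction. It is not
the patch under discussion. The names demo_cpu_feature, DEMO_CPUID_SSE2,
barrier_locked_add, barrier_fence, barrier_checked and demo_publish are all
hypothetical, the feature word is a plain variable rather than the kernel's
cpu_feature, and non-x86 builds fall back to a compiler builtin. Since the
LOCK prefix needs a memory destination, the sketch adds zero to the word at
the top of the stack.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's cpu_feature word and its SSE2 bit
 * (bit 26 of the CPUID.1:EDX feature flags). */
#define DEMO_CPUID_SSE2 0x04000000u
static unsigned int demo_cpu_feature;

/* "486 code": a locked no-op add on the stack acts as a full barrier. */
static inline void barrier_locked_add(void)
{
#if defined(__i386__)
    __asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc");
#elif defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#else
    __sync_synchronize();   /* assumption: portable fallback off x86 */
#endif
}

/* The "fancier" instruction; only emitted where it is always available. */
static inline void barrier_fence(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" : : : "memory");
#else
    __sync_synchronize();   /* assumption: let the compiler pick */
#endif
}

/* Branching variant: test the feature word on every call. */
static inline void barrier_checked(void)
{
    if (demo_cpu_feature & DEMO_CPUID_SSE2)
        barrier_fence();
    else
        barrier_locked_add();
}

/* Store a payload, issue the given barrier, then set and read a flag. */
static int demo_publish(void (*barrier)(void))
{
    static volatile int payload, ready;

    payload = 42;
    barrier();
    ready = 1;
    return ready ? payload : -1;
}
```

Calling demo_publish(barrier_locked_add) and demo_publish(barrier_checked)
exercises both paths. The sketch only shows the shape of the code; it says
nothing about which variant is faster, which is the open question in this
thread.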

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein