svn commit: r185162 - in head: . sys/amd64/include
sys/arm/include sys/conf sys/dev/bce sys/dev/cxgb
sys/dev/cxgb/sys sys/dev/cxgb/ulp/iw_cxgb sys/dev/mxge
sys/dev/nxge sys/i386/include sys/i386/in...
Attilio Rao
attilio at freebsd.org
Mon Dec 1 14:22:23 PST 2008
2008/12/1, John Baldwin <jhb at freebsd.org>:
> On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:
> > On Sun, Nov 23, 2008 at 12:51:58AM +0000, Kip Macy wrote:
> > > On Sat, Nov 22, 2008 at 11:08 PM, Scott Long <scottl at samsco.org> wrote:
> > > > Kostik Belousov wrote:
> > > >>
> > > >> On Sat, Nov 22, 2008 at 03:05:22PM -0700, Scott Long wrote:
> > > >>>
> > > >>> A neat hack would be for the kernel linker to scan the text and do a
> > > > >>> drop-in replacement of the opcode that is appropriate for the platform.
> > > >>> I can't see how a CPU_XXX definition would work because it's just a
> > > >>> compile time construct, one that can be included with any kernel
> > > >>> compile.
> > > >>
> > > > >> Yes, it is possible to do that. A less drastic change is to check
> > > > >> the features directly. I moved the slow code to a separate section
> > > > >> to eliminate the unconditional jump in the fast path.
> > > >> Only compile-tested.
> > > >>
> > > >
> > > > As long as it works, I think it's a step in the right direction; I'm
> > > > assuming that cpu_feature is a symbol filled in at runtime and not a
> > > > macro for the cpuid instruction, right?
> > > >
> > > > Scott
> > > >
> > >
> > > i386/include/md_var.h:
> > > <..>
> > > extern u_int cpu_exthigh;
> > > extern u_int cpu_feature;
> > > extern u_int cpu_feature2;
> > > extern u_int amd_feature;
> > > extern u_int amd_feature2;
> > > <...>
> > >
> > > I'm not thrilled with it, but we can revisit the issue if it makes a
> > > measurable difference on someone's workload.
> >
> > Below is the updated patch. It includes changes made after private comments
> > by bde@ and uses symbolic definitions for the bits in the features words.
> > I thought about accessing a per-CPU word for serialized instruction in the
> > slow path, but decided that it is not beneficial.
>
> Is the branch really better than just doing what the atomic operations for
> mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases
> on i386 and only bother with using the fancier instructions on amd64? Even
> amd64 doesn't use *fence yet for the atomic ops actually. I have had a patch
> to use it for years, but during testing there was no discernable difference
> between the existing 'lock addl' approach vs '*fence'. I'd much rather just
> use 486 code for all i386 machines than add a branch, esp. if
> the "optimization" the branch is doing isn't an actual optimization.
This is exactly what I suggested in private, and I'm supportive of this.
Attilio
--
Peace can only be achieved by understanding - A. Einstein
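
The two barrier styles discussed in the thread can be sketched roughly as
below. This is only an illustration under stated assumptions, not the patch
from the thread: sketch_cpu_feature, SKETCH_CPUID_SSE2, barrier_lock_add(),
barrier_branch() and publish_then_read() are hypothetical names made up for
this sketch, and the feature word is set by hand instead of being filled in
from CPUID at boot. On non-x86 targets both helpers fall back to the compiler
builtin __sync_synchronize().

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's cpu_feature word (assumption:
 * set by hand here; the kernel fills the real one in at boot). */
#define SKETCH_CPUID_SSE2 0x04000000u   /* CPUID.1:EDX bit 26 (SSE2) */
static uint32_t sketch_cpu_feature;

/* Unconditional barrier: a locked no-op add on the top of the stack.
 * Works on every x86 CPU since the 486, so no feature check is needed. */
static inline void barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc");
#else
    __sync_synchronize();               /* portable fallback */
#endif
}

/* Branching barrier: test the feature word at run time and use mfence
 * only when SSE2 is reported, otherwise take the locked-add path. */
static inline void barrier_branch(void)
{
#if defined(__i386__) || defined(__x86_64__)
    if (sketch_cpu_feature & SKETCH_CPUID_SSE2)
        __asm__ __volatile__("mfence" : : : "memory");
    else
        barrier_lock_add();
#else
    __sync_synchronize();               /* portable fallback */
#endif
}

/* Store a flag, issue the chosen barrier, then load another word.
 * The barrier keeps the load from being ordered before the store. */
static int publish_then_read(volatile int *flag, volatile int *data,
    void (*mb)(void))
{
    *flag = 1;
    mb();
    return *data;
}
```

A caller would pass either helper as the last argument, for example
publish_then_read(&flag, &data, barrier_lock_add). The second helper pays for
a load of the feature word and a conditional branch on every call, which is
the cost being weighed in the thread against any gain from mfence.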