ports/47061: Conflicting system headers by build of graphics/cqcam
Bruce Evans
bde at zeta.org.au
Wed Dec 24 02:20:23 PST 2003
The following reply was made to PR kern/47061; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: Mark Linimon <linimon at lonesome.com>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: ports/47061: Conflicting system headers by build of graphics/cqcam
Date: Wed, 24 Dec 2003 21:14:55 +1100 (EST)
On Tue, 23 Dec 2003, Mark Linimon wrote:
> This is really a kernel problem. I am going to go ahead and commit a
> workaround for this and the one or two other ports with this problem --
> but the workaround is basically unacceptable.
Er, this is really a port[s] problem. <machine/cpufunc.h> is not intended
to be included by applications. There was never any conflict with <string.h>
in the kernel because the kernel never included <string.h>, and the kernel
now avoids bogus conflicts, if any, with gcc's builtin ffs() using
-fno-builtin.
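For an application such as cqcam, the userland source of ffs() is the libc ffs(3). POSIX declares it in <strings.h>, and the conflict reported here was with <string.h>. It is not the kernel inline in <machine/cpufunc.h>. The sketch below assumes a POSIX libc. `ref_ffs` and `check_ffs` are hypothetical names. The loop only spells out the semantics: bits are numbered from 1 at the least significant end, and 0 means no bit is set.

```c
#include <assert.h>
#include <strings.h>	/* POSIX declares ffs() here */

/* Hypothetical reference version, only to spell out ffs semantics. */
int
ref_ffs(int mask)
{
	unsigned int m = (unsigned int)mask;
	int pos = 1;

	if (m == 0)
		return (0);
	while ((m & 1u) == 0) {
		m >>= 1;
		pos++;
	}
	return (pos);
}

/* Compare libc ffs(3) with the reference on single-bit and mixed masks. */
void
check_ffs(void)
{
	int bit;

	assert(ffs(0) == 0 && ref_ffs(0) == 0);
	for (bit = 0; bit < 31; bit++)
		assert(ffs(1 << bit) == bit + 1 && ref_ffs(1 << bit) == bit + 1);
	assert(ffs(12) == ref_ffs(12));	/* 0b1100: lowest set bit is 3 */
	assert(ffs(-1) == ref_ffs(-1));	/* all bits set: 1 */
}
```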
> The underlying problem is that machine/cpufunc.h for i386 has had
> a definition for a machine function 'ffs' for, oh, say, about 9 years
> now. However, man ffs will show you that there is an ffs(3) function
> as well. Even after reading the source it's not clear to me if these
> are supposed to have the same purpose -- someone with a more intimate
> knowledge of i386 arch is going to have to rule for certain.
They are the same. Last time I checked (less than a year ago), the gcc
builtin was still slower than the kernel inline except possibly when the
builtin can use non-base-arch instructions like cmov. amd64's always have
cmov and always use the builtin.
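The kernel inline tests for zero with an ordinary conditional branch and runs the bit-scan only when the argument is nonzero. The gcc builtin instead has to produce 0 for a zero argument without that branch. The sketch below shows the kernel's shape and assumes GCC or Clang. `__builtin_ctz()` stands in for the i386 bsfl instruction used in <machine/cpufunc.h>. `branchy_ffs` and `builtin_ffs` are hypothetical names.

```c
#include <assert.h>

/*
 * Kernel-style ffs (sketch): branch over the bit scan when mask is 0.
 * __builtin_ctz() stands in for bsfl; like bsfl, its result is
 * undefined for 0, hence the explicit test.
 */
static inline int
branchy_ffs(int mask)
{
	return (mask == 0 ? 0 : __builtin_ctz((unsigned int)mask) + 1);
}

/* The builtin, expanded inline by the compiler. */
static inline int
builtin_ffs(int mask)
{
	return (__builtin_ffs(mask));
}
```

The compiler may still turn the conditional into cmov, depending on its version and the -march flags. Check the generated assembly (cc -S) before attributing timings to a branch or to cmov.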
... I checked again. With the following slightly too simple test:
%%%
#include <sys/types.h>
#include <machine/cpufunc.h>
#include <stdlib.h>

int z[4096];

int
main(void)
{
	volatile int v;
	int i, j;

	for (i = 0; i < 4096; i++)
		z[i] = 1 << rand();	/* Yes, this is sloppy. */
	for (j = 0; j < 100000; j++)
		for (i = 0; i < 4096; i++)
#ifdef NOBUILTIN
			v = ffs(z[i]);
#else
			v = __builtin_ffs(z[i]);
#endif
	return (0);
}
%%%
Times on an Athlon XP1600 overclocked by 146/133:
cc -O -mcpu=pentiumpro -o foo foo.c (default from bsd.cpu.mk)
3.49 real 3.47 user 0.00 sys
cc -O -mcpu=pentiumpro -DNOBUILTIN -o foo foo.c (default + kernel ffs())
3.21 real 3.21 user 0.00 sys
cc -O -march=pentiumpro -o foo foo.c (gives cmov and works on Athlon XP too):
3.21 real 3.21 user 0.00 sys
Here using cmov[e] gives the same amount of optimization as the kernel ffs()
gets by using a simple conditional branch instead of a slow instruction
sequence starting with "set"[e]. Mispredicted branches are expensive on
some arches, but apparently they aren't on Athlons. The rand() in the
test was intended to cause mispredicted branches as well as lengthy
searches, but it doesn't actually. The branch is never taken since
z[i] is never 0. On changing the initialization of z[i] so that the
branch is taken every second time:
	if (i & 1)
		z[i] = 1 << rand();
the kernel version becomes much faster:
2.01 real 2.00 user 0.00 sys
and the other times don't change significantly. This is presumably
because the Athlon predicts taking the branch every second time
perfectly. The bit-search instruction is very expensive (and always
takes the same time??) and by branching over it every second time the
cost per iteration is almost halved.
A better benchmark might randomize the branches, but this might be
even further from real applications since an arg of 0 may be very
unlikely (or very likely).
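One way to randomize the branches is to make each element 0 with a chosen probability. The sketch below assumes a hypothetical initializer `fill_random`. It uses a small xorshift32 generator instead of rand() so runs are reproducible across libcs. It also keeps the shift count in 0..30, which avoids the out-of-range shift in the test above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marsaglia xorshift32; the state must be nonzero. */
static uint32_t
xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return (x);
}

/*
 * Fill z[0..n-1] so that each element is 0 with probability about
 * zero_pct / 100, and otherwise has a single bit set in 0..30.
 * Returns the number of zero elements written.
 */
size_t
fill_random(int *z, size_t n, unsigned int zero_pct, uint32_t seed)
{
	uint32_t state = seed ? seed : 1;	/* xorshift must not start at 0 */
	size_t i, zeros = 0;

	for (i = 0; i < n; i++) {
		if (xorshift32(&state) % 100 < zero_pct) {
			z[i] = 0;
			zeros++;
		} else
			z[i] = 1 << (xorshift32(&state) % 31);
	}
	return (zeros);
}
```

Sweeping zero_pct from 0 to 100 would show where the branchy version stops winning on a given CPU. It still says nothing about how often real callers pass 0.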
Times on a Celeron 366:
gcc builtin without cmov (very slow!):
15.78 real 15.68 user 0.00
gcc builtin with cmov:
5.64 real 5.61 user 0.00
kernel ffs():
5.85 real 5.81 user 0.00
kernel ffs() with alternating 0's (again, others not affected by alternating):
5.62 real 5.58 user 0.00
Times on an amd64 (sledge = Opteron 244 1804 MHz)
gcc builtin with cmov:
2.73 real 2.72 user 0.00 sys
old kernel ffs():
3.42 real 3.39 user 0.01 sys
kernel ffs() with alternating 0's (again, builtin not affected by alternating):
1.82 real 1.82 user 0.00 sys
So using cmov is actually significantly better than a simple branch on
amd64's, but only if the arg isn't often 0.
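The cmov version always pays for the bit scan and never mispredicts. That is why it wins when 0 is rare and loses when 0 is common. The same selection can be written in plain C without a conditional. The sketch below assumes GCC or Clang for `__builtin_ctzll()`. `branchless_ffs` is a hypothetical name. Whether the compiler emits cmov, setcc, or plain arithmetic for it depends on the target and flags.

```c
#include <assert.h>

/*
 * Branch-free ffs (sketch): OR in a sentinel bit above bit 31 so the
 * trailing-zero count is always defined, then mask the result to 0
 * when the input was 0.
 */
static inline int
branchless_ffs(int mask)
{
	unsigned long long m = (unsigned long long)(unsigned int)mask;
	int pos = __builtin_ctzll(m | (1ULL << 32)) + 1;	/* 33 when mask == 0 */

	return (pos & -(int)(mask != 0));	/* keep pos, or force 0 */
}
```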
> In the meantime, I'm going to hold my nose and commit an include
> file to the port that is merely the inb/outb functions. This is
> clearly a hack that should go away once a "correct" solution is found.
This is approximately correct, not a hack. The system could provide
a header that implements inb() and outb() functions for userland (*),
but <machine/cpufunc.h> is not this header. It's just a bit much for
multiple applications to have to duplicate these interfaces.
(*) They shouldn't exist in the kernel. Bus-space should be used.
Bruce