Assembly string functions in i386 libc

Bruce Evans brde at optusnet.com.au
Fri Jul 13 04:38:16 UTC 2007


On Thu, 12 Jul 2007, Sean C. Farley wrote:

> On Thu, 12 Jul 2007, Bruce Evans wrote:

>> Now I've looked at it.  I think it is not testing strlen() at all,
>> except for the libc case, because __pure prevents more than 1 call to
>> strlen().  (The existence of __pure is also a bug.  __pure was the
>> FreeBSD spelling of the __const__ attribute in gcc-1.  It was removed
>> when special support for gcc-1 was dropped, and should not have been
>> recycled.)  __pure is a syntax error in the old version of FreeBSD
>> that I tested on.  I first tried __pure2, which is the FreeBSD
>> spelling of the __const__ attribute in gcc-2.  I think it is weaker
>> than the __pure__ attribute in gcc-3.
>
> From what I could find, strlen() should not have the __const__ (__pure2)
> attribute since it is being passed a pointer, but __pure__ (__pure)
> should work.  Are you saying that __pure used to mean __const__ in gcc-1
> but now it means __pure__ for gcc-2.96 and above?  The redefinition of
> __pure is what you are saying is a bug.  Yes?

Yes to most of this.  __pure2 is actually stronger than __pure[>2.96].
__pure2 has the very large effect of removing all calls to strlen()
from the loop.  This affected everything except libc strlen() since
everything else was named xstrlen() and declared as __pure*, while
libc strlen() was declared in <string.h> without __pure*.  OTOH,
__pure[>2.96] has no effect on this benchmark, at least with gcc-3.3.3.
I don't understand why it has no effect.  It has no effect even when I
change the arg to a literal.  The context is very simple, with no
aliasing problems in sight, at least with the literal arg (with the
arg possibly being argv[2], maybe gcc has to worry about the arg being
modified by a signal handler).  If __pure[>2.96] doesn't work in this
simple context, then it isn't clear when it can work.
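
For reference, here is a minimal sketch of the kind of loop under
discussion.  It uses the raw gcc attribute rather than the FreeBSD
__pure* macros.  The names xstrlen_pure(), xstrlen_plain() and the
loop_*() helpers are hypothetical and are not taken from the benchmark.
Whether the call is hoisted out of the loop depends on the compiler
version and flags (it was not with gcc-3.3.3, as noted above).  With
both definitions in one file, a newer compiler may also infer purity
for the unattributed version on its own, so compare the generated
assembly (cc -O2 -S) rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the benchmark's xstrlen() variants.
 * __pure__ promises no side effects and a result that depends only on
 * the arguments and on memory that the function reads.  __const__ is
 * stricter (it may not read memory through a pointer argument), which
 * is why it is not appropriate for a strlen()-like function.
 */
__attribute__((__pure__, __noinline__))
size_t
xstrlen_pure(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/* Same body, no purity attribute. */
__attribute__((__noinline__))
size_t
xstrlen_plain(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/* The compiler is allowed (not required) to make one call here. */
size_t
loop_pure(const char *s, unsigned long n)
{
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < n; i++)
		total += xstrlen_pure(s);
	return (total);
}

/* Without the attribute the call is nominally made n times. */
size_t
loop_plain(const char *s, unsigned long n)
{
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < n; i++)
		total += xstrlen_plain(s);
	return (total);
}
```

Both loops return the same value either way; only the number of calls
actually executed can differ, which is what matters for the timing.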

BTW, starting somewhere near gcc-3.4 for -O2 and gcc-4.2 for -O, simple
loops like this don't always work in benchmarks, because the compiler
removes the whole loop if it can see that it doesn't do anything.  The
compiler can see this if it can see inside any function calls in the
loop (this currently requires the functions to be in the same source
file or #included there), or if the functions are declared as sufficiently
__pure.  When I used __pure2 with gcc-3.3.3 -O, gcc removed the function
calls but not the loop.  gcc-4.2 would also remove the loop.
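
One common way to keep such a loop alive is sketched below.  It is an
illustration and not what the benchmark in this thread does: the
pointer is read through a volatile object and each result is stored to
a volatile one, so the compiler cannot prove that the calls are
redundant or that the loop does nothing.  The names sink and
bench_strlen() are hypothetical, and the timing code is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Every store to a volatile object counts as a side effect. */
static volatile size_t sink;

size_t
bench_strlen(const char *string, unsigned long iterations)
{
	/* Volatile pointer: its value must be reloaded on each pass. */
	const char *volatile vp = string;
	size_t total = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		sink = strlen(vp);	/* call cannot be hoisted or dropped */
		total += sink;
	}
	return (total);
}
```

The extra load and store per iteration add a small constant overhead,
which is the same for every strlen() variant being compared.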

> I removed __pure from main.c and added -static -g.
>
> Athlon XP 2100 (1.72 GHz):
> libcstrlen:     time spent executing strlen(string) = 64:       0.994755
> asmstrlen:      time spent executing strlen(string) = 64:       0.989012
> basestrlen:     time spent executing strlen(string) = 64:       0.879722
> strlen:         time spent executing strlen(string) = 64:       0.626727
> strlen2:        time spent executing strlen(string) = 64:       0.587162

That looks just like my results on A64 in 32-bit mode.  (A64 is remarkably
similar to AXP in most CPU resources including pipelines, so its performance
is remarkably similar even when its mode differs.)

> ...[asm version more than twice as slow on P3-P4]

> The Athlon XP did much better with the assembly version than either
> Intel CPU for me.  For all three CPUs using various string lengths from
> 1 to 256, the C versions always beat the assembly version although it
> came somewhat close for the 9 to 32 byte lengths to basestrlen.

Intel CPUs are remarkably different from AXP :-).  I'm surprised at the
sign of the difference here -- I would have expected them to be better
for the string instructions.

Bruce

