Assembly string functions in i386 libc

Sean C. Farley scf at FreeBSD.org
Fri Jul 13 14:19:12 UTC 2007


On Fri, 13 Jul 2007, Bruce Evans wrote:

> On Thu, 12 Jul 2007, Sean C. Farley wrote:
>
>> On Thu, 12 Jul 2007, Bruce Evans wrote:
>
>>> Now I've looked at it.  I think it is not testing strlen() at all,
>>> except for the libc case, because __pure prevents more than 1 call
>>> to strlen().  (The existence of __pure is also a bug.  __pure was
>>> the FreeBSD spelling of the __const__ attribute in gcc-1.  It was
>>> removed when special support for gcc-1 was dropped, and should not
>>> have been recycled.)  __pure is a syntax error in the old version of
>>> FreeBSD that I tested on.  I first tried __pure2, which is the
>>> FreeBSD spelling of the __const__ attribute in gcc-2.  I think it is
>>> weaker than the __pure__ attribute in gcc-3.
>> 
>> From what I could find, strlen() should not have the __const__
>> (__pure2) attribute since it is being passed a pointer, but __pure__
>> (__pure) should work.  Are you saying that __pure used to mean
>> __const__ in gcc-1 but now it means __pure__ for gcc-2.96 and above?
>> The redefinition of __pure is what you are saying is a bug.  Yes?
>
> Yes to most of this.  __pure2 is actually stronger than __pure[>2.96].
> __pure2 has the very large effect of removing all calls to strlen()
> from the loop.  This affected everything except libc strlen() since
> everything else was named xstrlen() and declared as __pure*, while
> libc strlen() was declared in <string.h> without __pure*.

Actually, the reason I had __pure in main.c was because it exists in
string.h.

> OTOH, __pure[>2.96] has no effect on this benchmark, at least with
> gcc-3.3.3.  I don't understand why it has no effect.  It has no effect
> even when I change the arg to a literal.  The context is very simple,
> with no aliasing problems in sight, at least with the literal arg
> (with the arg possibly being argv[2], maybe gcc has to worry about the
> arg being modified by a signal handler).  If __pure[>2.96] doesn't
> work in this simple context, then it isn't clear when it can work.

Using or not using __pure with gcc-3.4.6 has no effect for me even with
the literal argument regardless of optimization (-O0, -O1, or -O2).

> BTW, starting somewhere near gcc-3.4 for -O2 and gcc-4.2 for -O,
> simple loops like this don't always work in benchmarks, because the
> compiler removes the whole loop if it can see that it doesn't do
> anything.  The compiler can see this if it can see inside any function
> calls in the loop (this currently requires the functions to be in the
> same source file or #included there), or if the functions are declared
> as sufficiently __pure.  When I used __pure2 with gcc-3.3.3 -O, gcc
> removed the function calls but not the loop.  gcc-4.2 would also
> remove the loop.

Interesting.  I need to remember this.
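
A minimal sketch of a loop that should survive the optimization described
above, assuming a C99 compiler.  The name bench_strlen and the use of
volatile are illustrative assumptions, not the code from the benchmark
under discussion, and the timing calls are left out:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: calls strlen() `iters` times and returns the sum
 * of the results.  Timing code is intentionally omitted.
 */
static size_t
bench_strlen(const char *s, unsigned long iters)
{
	/*
	 * Reading the pointer through a volatile object forces a reload on
	 * every iteration, so the compiler cannot prove that the argument
	 * is unchanged and hoist the call out of the loop.
	 */
	const char *volatile vs = s;
	/* Every result is stored to a volatile object, so none is dead. */
	volatile size_t sink = 0;
	unsigned long i;

	for (i = 0; i < iters; i++)
		sink += strlen(vs);
	return (sink);
}
```

Calling bench_strlen("hello", 1000) returns 5000.  The volatile accesses
add a small constant cost per iteration, which has to be kept in mind
when comparing implementations on short strings.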

Just to note, according to the GCC docs, __pure2 (__const__) is not
valid for strlen() because strlen() reads the data that its pointer
argument points to.
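
To illustrate the distinction, here is a minimal sketch assuming GCC's
__attribute__ syntax (also accepted by Clang).  xstrlen here is a
stand-in byte loop, not the benchmark's implementation, and
pure_sees_store is a hypothetical name:

```c
#include <stddef.h>

/*
 * "pure": no side effects, and the result depends only on the arguments
 * and on memory that can be read through them.  "noinline" keeps the
 * call visible so that the attribute is what the optimizer reasons about.
 */
static size_t xstrlen(const char *s) __attribute__((pure, noinline));

static size_t
xstrlen(const char *s)
{
	const char *p = s;

	while (*p != '\0')
		p++;
	return ((size_t)(p - s));
}

/*
 * Measures the same buffer before and after a store to it.  With "pure"
 * the second call must observe the store.  With "const" the compiler
 * would be allowed to reuse the first result, because a const function
 * promises not to read memory through its pointer arguments.
 */
static int
pure_sees_store(void)
{
	char buf[8] = "abc";
	size_t before, after;

	before = xstrlen(buf);
	buf[3] = 'd';
	buf[4] = '\0';
	after = xstrlen(buf);
	return (before == 3 && after == 4);
}
```

pure_sees_store() returns 1 here.  Swapping pure for const in the
declaration would make the second call eligible for elimination, which
is why const is the wrong attribute for strlen().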

> ...[A64 in 32-bit mode similar to AXP]

BTW, does AXP refer to Athlon XP or Alpha AXP?  When I first saw you
write AXP, I thought it was an Alpha.  :)

>> ...[asm version more than twice as slow on P3-P4]
>
>> The Athlon XP did much better with the assembly version than either
>> Intel CPU for me.  For all three CPUs, using various string lengths
>> from 1 to 256, the C versions always beat the assembly version,
>> although the assembly version came somewhat close to basestrlen for
>> the 9 to 32 byte lengths.
>
> Intel CPUs are remarkably different from AXP :-).  I'm surprised at
> the sign of the difference here -- I would have expected them to be
> better for the string instructions.

That is what has been confusing me.  Possibly, Intel has not touched the
basics of these string instructions for a longer time than AMD.

Sean
-- 
scf at FreeBSD.org

