Assembly string functions in i386 libc
Sean C. Farley
scf at FreeBSD.org
Fri Jul 13 14:19:12 UTC 2007
On Fri, 13 Jul 2007, Bruce Evans wrote:
> On Thu, 12 Jul 2007, Sean C. Farley wrote:
>
>> On Thu, 12 Jul 2007, Bruce Evans wrote:
>
>>> Now I've looked at it. I think it is not testing strlen() at all,
>>> except for the libc case, because __pure prevents more than 1 call
>>> to strlen(). (The existence of __pure is also a bug. __pure was
>>> the FreeBSD spelling of the __const__ attribute in gcc-1. It was
>>> removed when special support for gcc-1 was dropped, and should not
>>> have been recycled.) __pure is a syntax error in the old version of
>>> FreeBSD that I tested on. I first tried __pure2, which is the
>>> FreeBSD spelling of the __const__ attribute in gcc-2. I think it is
>>> weaker than the __pure__ attribute in gcc-3.
>>
>> From what I could find, strlen() should not have the __const__
>> (__pure2) attribute since it is being passed a pointer, but __pure__
>> (__pure) should work. Are you saying that __pure used to mean
>> __const__ in gcc-1 but now it means __pure__ for gcc-2.96 and above?
>> The redefinition of __pure is what you are saying is a bug. Yes?
>
> Yes to most of this. __pure2 is actually weaker than __pure[>2.96].
> __pure2 has the very large effect of removing all calls to strlen()
> from the loop. This affected everything except libc strlen() since
> everything else was named xstrlen() and declared as __pure*, while
> libc strlen() was declared in <string.h> without __pure*.
Actually, the reason I had __pure in main.c is that it exists in
string.h.
> OTOH, __pure[>2.96] has no effect on this benchmark, at least with
> gcc-3.3.3. I don't understand why it has no effect. It has no effect
> even when I change the arg to a literal. The context is very simple,
> with no aliasing problems in sight, at least with the literal arg
> (with the arg possibly being argv[2], maybe gcc has to worry about the
> arg being modified by a signal handler). If __pure[>2.96] doesn't
> work in this simple context, then it isn't clear when it can work.
With gcc-3.4.6, adding or removing __pure has no effect for me either,
even with the literal argument, at any optimization level (-O0, -O1, or
-O2).
> BTW, starting somewhere near gcc-3.4 for -O2 and gcc-4.2 for -O,
> simple loops like this don't always work in benchmarks, because the
> compiler removes the whole loop if it can see that it doesn't do
> anything. The compiler can see this if it can see inside any function
> calls in the loop (this currently requires the functions to be in the
> same source file or #included there), or if the functions are declared
> as sufficiently __pure. When I used __pure2 with gcc-3.3.3 -O, gcc
> removed the function calls but not the loop. gcc-4.2 would also
> remove the loop.
Interesting. I need to remember this.
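A minimal sketch of one way to keep such a loop from being optimized
away, assuming gcc or a similar compiler. The name bench_strlen() is
hypothetical and not taken from main.c. The call goes through a volatile
function pointer, so the compiler cannot see into strlen() or treat it
as pure, and the summed result is returned so that it has a visible use:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from main.c): call strlen() `iterations`
 * times and return the sum of the results.
 */
size_t
bench_strlen(const char *s, unsigned long iterations)
{
    /*
     * The pointer itself is volatile, so it has to be reloaded on every
     * iteration and the compiler cannot assume it still points at
     * strlen(). That blocks inlining, hoisting, and removal of the call.
     */
    size_t (*volatile fn)(const char *) = strlen;
    size_t total = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        total += fn(s);

    /* Returning the sum gives the results a use the caller can see. */
    return total;
}
```

The indirect call adds a small fixed cost per iteration, so this is only
suitable for comparing implementations measured the same way.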
Note that, according to the GCC docs, __pure2 (__const__) is not valid
for strlen(), since strlen() examines data passed via a pointer.
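To illustrate the distinction, here is a rough sketch of where each
attribute fits. The SKETCH_* macros and both function names are made up
for this example. They stand in for __pure and __pure2 so that the code
builds without FreeBSD's cdefs.h:

```c
#include <stddef.h>

/* Local stand-ins for the cdefs.h macros, so the sketch builds anywhere. */
#if defined(__GNUC__)
#define SKETCH_PURE  __attribute__((__pure__))
#define SKETCH_CONST __attribute__((__const__))
#else
#define SKETCH_PURE
#define SKETCH_CONST
#endif

/* Reads the bytes s points to, so __pure__ is the strongest fit. */
SKETCH_PURE size_t sketch_strlen(const char *s);

/* Looks only at the value of n, never at memory, so __const__ fits. */
SKETCH_CONST size_t sketch_round_up4(size_t n);

size_t
sketch_strlen(const char *s)
{
    const char *p = s;

    /* Walk to the terminating NUL and return the distance covered. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

size_t
sketch_round_up4(size_t n)
{
    /* Round n up to the next multiple of 4. */
    return (n + 3) & ~(size_t)3;
}
```

Declaring sketch_strlen() with the __const__ form instead would let the
compiler assume the result cannot change even after the string has been
modified, which is why the stronger attribute is wrong for it.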
> ...[A64 in 32-bit mode similar to AXP]
BTW, does AXP refer to Athlon XP or Alpha AXP? When I first saw you
write AXP, I thought it was an Alpha. :)
>> ...[asm version more than twice as slow on P3-P4]
>
>> The Athlon XP did much better with the assembly version than either
>> Intel CPU for me. For all three CPUs, using various string lengths
>> from 1 to 256, the C versions always beat the assembly version,
>> although the assembly version came somewhat close to basestrlen for
>> the 9 to 32 byte lengths.
>
> Intel CPUs are remarkably different from AXP :-). I'm surprised at
> the sign of the difference here -- I would have expected them to be
> better for the string instructions.
That is what has been confusing me. Possibly Intel has left the basics
of these string instructions untouched for longer than AMD has.
Sean
--
scf at FreeBSD.org