svn commit: r346588 - head/lib/libc/powerpc64/string
Mark Millard
marklmi at yahoo.com
Thu May 2 07:44:09 UTC 2019
[Correction: I did not translate the register usage correctly; see inline below.]
> On 2019-Apr-27, at 01:50, Mark Millard <marklmi at yahoo.com> wrote:
>
> Justin Hibbits jhibbits at FreeBSD.org wrote on
> Fri Apr 26 16:21:47 UTC 2019 :
>
>> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
>> I'll need to pull it out for now, and re-enable it once we have
>> ifuncs. As it stands, this commit broke the G5 and POWER4/POWER5.
>
> As I understand code like the following:
>
> xor %r8,%r8,%r8 /* %r8 <- Zero. */
> xor %r0,%r5,%r6 /* Check if double words are different. */
> cmpb %r7,%r5,%r8 /* Check if double words contain zero. */
>
> /*
> * If double words are different or contain zero,
> * find what byte is different or contains zero,
> * else load next double words.
> */
> or. %r9,%r7,%r0
> bne .Lstrcmp_check_zeros_differences
>
> (and similarly in the loop), the result in %r7 must satisfy:
>
> A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
> B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.
>
> (As I understand it, cmpb uses 0xff for such a non-zero byte, but even
> one non-zero bit per byte is sufficient for the overall code structure.)
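For concreteness, a small C++ model of the quoted sequence (a sketch, not the libc code). cmpb_model and takes_branch are hypothetical names, and the model assumes the PowerISA 2.05 definition of cmpb: a result byte is 0xff where the two source bytes are equal and 0x00 where they differ.

```cpp
#include <cstdint>

// Software model of cmpb: each result byte is 0xff where the corresponding
// bytes of a and b are equal, and 0x00 where they differ.
constexpr std::uint64_t cmpb_model(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t result = 0;
    for (int shift = 0; shift < 64; shift += 8) {
        std::uint64_t const byte_mask = std::uint64_t{0xff} << shift;
        if ((a & byte_mask) == (b & byte_mask))
            result |= byte_mask;
    }
    return result;
}

// Model of the quoted check: true where the code would take the bne,
// i.e. the double words differ or %r5 contains a zero byte.
constexpr bool takes_branch(std::uint64_t r5, std::uint64_t r6)
{
    std::uint64_t const r8 = 0;                   // xor  %r8,%r8,%r8
    std::uint64_t const r0 = r5 ^ r6;             // xor  %r0,%r5,%r6
    std::uint64_t const r7 = cmpb_model(r5, r8);  // cmpb %r7,%r5,%r8
    std::uint64_t const r9 = r7 | r0;             // or.  %r9,%r7,%r0
    return r9 != 0;                               // bne  .Lstrcmp_check_zeros_differences
}
```

With %r8 zero, cmpb_model(r5, 0) has 0xff exactly in the zero bytes of %r5 and 0x00 elsewhere, which is what A and B ask for.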
>
> If I've got that much correct, then the following might be an
> alternative to cmpb for now. I'll explain the code via commented
> C/C++-ish code and then show the assembler notation:
>
> unsigned long ul_has_zero_byte(unsigned long b)
> {
> unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};
>
> // Illustrating byte transformations:
> unsigned long const x= b & low_7bits_of_bytes; // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
> unsigned long const y= x + low_7bits_of_bytes; // ->0x7f, ->0x7f, ->ms-bit-in-byte==1
> unsigned long const z= b | y | low_7bits_of_bytes; // ->0x7f, ->0xff, ->0xff
> return ~z; // ->0x80, ->0x00, ->0x00
> }
>
> (This is for a powerpc64 context, so unsigned long is 64 bits.)
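A sketch that checks the byte transformations above mechanically, at compile time. It assumes std::uint64_t in place of unsigned long so the 64-bit width is explicit; has_zero_byte_mask and check_all_single_bytes are hypothetical names.

```cpp
#include <cstdint>

// Same computation as ul_has_zero_byte, with an explicitly 64-bit type.
constexpr std::uint64_t has_zero_byte_mask(std::uint64_t b)
{
    std::uint64_t const low_7bits_of_bytes = 0x7f7f7f7f7f7f7f7fULL;
    std::uint64_t const x = b & low_7bits_of_bytes;      // clear each byte's top bit
    std::uint64_t const y = x + low_7bits_of_bytes;      // top bit set iff low 7 bits were non-zero
    std::uint64_t const z = b | y | low_7bits_of_bytes;  // 0x7f only for a zero byte, else 0xff
    return ~z;                                           // 0x80 for a zero byte, else 0x00
}

// Tries every byte value in every byte position against a direct test.
constexpr bool check_all_single_bytes()
{
    for (int shift = 0; shift < 64; shift += 8) {
        for (std::uint64_t v = 0; v < 256; ++v) {
            // Other bytes are 0x01, so only the byte under test can be zero.
            std::uint64_t const filler =
                0x0101010101010101ULL & ~(std::uint64_t{0xff} << shift);
            std::uint64_t const word = filler | (v << shift);
            std::uint64_t const expected = (v == 0) ? (std::uint64_t{0x80} << shift) : 0;
            if (has_zero_byte_mask(word) != expected)
                return false;
        }
    }
    return true;
}
```

Because each byte of x is at most 0x7f, each byte of x + 0x7f7f7f7f7f7f7f7f is at most 0xfe, so no carry crosses into a neighboring byte and only genuinely zero bytes are flagged (with 0x80 rather than cmpb's 0xff).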
>
> So, with %r8 holding a different value instead of zero,
> each cmpb can be replaced by:
>
> # Only once to set up the value in %r8 (Note: 32639=0x7f7f):
> lis r8,32639
> ori r8,r8,32639
> rldimi r8,r8,32,0
>
> # each "cmpb %r7,%r5,%r8" replaced by:
> and r7,r5,r8
> add r7,r7,r8
> nor r5,r7,r5
> andc r5,r5,r8
The above 4 lines do not match the context's register usage: of
the three registers r5, r7, and r8, only r7 should be modified,
but the sequence overwrites r5 and leaves the result there instead
of in r7. It looks like another temporary register (for the
intermediate value) is required to make a match:
and %r9,%r5,%r8
add %r9,%r9,%r8
nor %r7,%r9,%r5
andc %r7,%r7,%r8
(%r9 is free to use as the temporary, since it is overwritten afterward by: or. %r9,%r7,%r0)
> (The code is from compiler output, but with registers adjusted
> to match the context.)
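A register-level C++ model of the one-time setup and of the corrected four-instruction replacement, as a way to check that only %r7 and the scratch %r9 change. Regs, rotl64, setup_r8 and replace_cmpb are hypothetical names; the rldimi model assumes the PowerISA definition, under which SH=32, MB=0 rotates left by 32 and inserts into the most significant 32 bits.

```cpp
#include <cstdint>

struct Regs { std::uint64_t r5, r7, r8, r9; };

constexpr std::uint64_t rotl64(std::uint64_t v, unsigned n)
{
    return (v << n) | (v >> (64 - n));  // valid for 0 < n < 64
}

// Model of: lis r8,32639 ; ori r8,r8,32639 ; rldimi r8,r8,32,0
constexpr std::uint64_t setup_r8()
{
    std::uint64_t r8 = std::uint64_t{32639} << 16;  // lis: 0x7f7f0000 (positive, no sign bits)
    r8 |= 32639;                                     // ori: 0x7f7f7f7f
    std::uint64_t const mask = 0xffffffff00000000ULL;
    r8 = (rotl64(r8, 32) & mask) | (r8 & ~mask);     // rldimi: copy low word into high word
    return r8;
}

// Model of the corrected replacement for "cmpb %r7,%r5,%r8".
constexpr Regs replace_cmpb(Regs r)
{
    r.r9 = r.r5 & r.r8;     // and  %r9,%r5,%r8
    r.r9 = r.r9 + r.r8;     // add  %r9,%r9,%r8
    r.r7 = ~(r.r9 | r.r5);  // nor  %r7,%r9,%r5
    r.r7 = r.r7 & ~r.r8;    // andc %r7,%r7,%r8
    return r;
}
```

For %r5 = 0x4142004445464700 the model leaves %r5 and %r8 unchanged and puts 0x0000800000000080 in %r7, where cmpb would have produced 0x0000ff00000000ff.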
>
>
> The C/C++-ish code came from thinking about material from Hacker's
> Delight, Second Edition, and the specific criteria needed here: it
> uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
> adjusted for width and for returning something sufficient here.
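Since the later step has to find which byte is different or contains zero, a sketch of why a 0x80 flag serves as well as cmpb's 0xff there. It assumes big-endian byte order, as on FreeBSD's powerpc64, so the first byte in memory is the most significant one and counting leading zero bits of the combined flags (such as %r7 | %r0), then dividing by 8, gives its index either way. nlz64 and first_flagged_byte are hypothetical helpers, not the actual .Lstrcmp_check_zeros_differences logic.

```cpp
#include <cstdint>

// Portable count of leading zero bits (64 for an input of 0).
constexpr int nlz64(std::uint64_t v)
{
    int n = 0;
    for (std::uint64_t bit = std::uint64_t{1} << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

// Index (0..7, memory order on a big-endian machine) of the first byte
// whose flag byte has any bit set; 8 if no byte is flagged.
constexpr int first_flagged_byte(std::uint64_t flags)
{
    return nlz64(flags) >> 3;
}
```

A 0x80 flag and a 0xff flag in the same byte give the same index, since only the position of the first set bit matters.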
>
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)