svn commit: r346588 - head/lib/libc/powerpc64/string

Mark Millard marklmi at yahoo.com
Thu May 2 07:44:09 UTC 2019


[I did not translate the register usage correctly.]

> On 2019-Apr-27, at 01:50, Mark Millard <marklmi at yahoo.com> wrote:
> 
> Justin Hibbits jhibbits at FreeBSD.org wrote on
> Fri Apr 26 16:21:47 UTC 2019 :
> 
>> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
>> I'll need to pull it out for now, and re-enable it once we have
>> ifuncs.  As it stands, this commit broke the G5 and POWER4/POWER5.
> 
> As I understand code like the following:
> 
> 	xor	%r8,%r8,%r8	/* %r8 <- Zero. */
> 	xor	%r0,%r5,%r6	/* Check if double words are different. */
> 	cmpb	%r7,%r5,%r8	/* Check if double words contain zero. */
> 
> 	/*
> 	 * If double words are different or contain zero,
> 	 * find what byte is different or contains zero,
> 	 * else load next double words.
> 	 */
> 	or.	%r9,%r7,%r0
> 	bne	.Lstrcmp_check_zeros_differences
> 
> (and similarly for the loop. . .):
> 
> A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
> B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.
> 
> (As I understand it, cmpb uses 0xff as the non-zero byte value, but even
> one non-zero bit per byte is sufficient for the overall code structure.)
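A minimal C++ model of the quoted check and of criteria A and B. It assumes 64-bit double words and models `cmpb` as setting each result byte to 0xff where the corresponding operand bytes are equal and to 0x00 otherwise. The function names are hypothetical and only mirror the registers used above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "cmpb rA,rS,rB": each result byte is 0xff where the
// corresponding bytes of rs and rb are equal, and 0x00 where they differ.
std::uint64_t model_cmpb(std::uint64_t rs, std::uint64_t rb) {
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        const std::uint64_t byte_mask = std::uint64_t{0xff} << (8 * i);
        if ((rs & byte_mask) == (rb & byte_mask))
            result |= byte_mask;
    }
    return result;
}

// Criteria A and B: each byte of r7 must be non-zero exactly when the
// corresponding byte of r5 is zero. Any r7 meeting this works for the branch.
bool meets_criteria(std::uint64_t r5, std::uint64_t r7) {
    for (int i = 0; i < 8; ++i) {
        const unsigned b5 = (r5 >> (8 * i)) & 0xff;
        const unsigned b7 = (r7 >> (8 * i)) & 0xff;
        if ((b5 == 0) != (b7 != 0))
            return false;
    }
    return true;
}

// Mirrors the xor/xor/cmpb/or. sequence; returns true when the bne to
// .Lstrcmp_check_zeros_differences would be taken.
bool takes_branch(std::uint64_t r5, std::uint64_t r6) {
    const std::uint64_t r8 = 0;                  // xor  %r8,%r8,%r8
    const std::uint64_t r0 = r5 ^ r6;            // xor  %r0,%r5,%r6
    const std::uint64_t r7 = model_cmpb(r5, r8); // cmpb %r7,%r5,%r8
    const std::uint64_t r9 = r7 | r0;            // or.  %r9,%r7,%r0
    return r9 != 0;                              // bne is taken when r9 != 0
}
```

Only %r5 needs the zero check: if %r6 has a zero byte where %r5 does not, the double words differ and %r0 is already non-zero. Both a cmpb-style 0xff mask and a mask with only 0x80 in each zero byte satisfy `meets_criteria`.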
> 
> If I've got that much correct, then the following might be an
> alternative to cmpb for now. I'll explain the code with commented
> C/C++-ish code and then show the assembly:
> 
> unsigned long ul_has_zero_byte(unsigned long b)
> {
>    unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};
> 
>                                                       // Illustrating byte transformations:
>    unsigned long const x= b & low_7bits_of_bytes;     // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
>    unsigned long const y= x + low_7bits_of_bytes;     //     ->0x7f,     ->0x7f,      ->ms-bit-in-byte==1
>    unsigned long const z= b | y | low_7bits_of_bytes; //     ->0x7f,     ->0xff,      ->0xff
>    return ~z;                                         //     ->0x80,     ->0x00,      ->0x00
> }
> 
> (This is for a powerpc64 context, where unsigned long is 64 bits.)
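To experiment off-target, here is the same function written with `std::uint64_t`, so that it does not depend on `unsigned long` being 64 bits. That substitution is an assumption of this sketch, not part of the original. The function returns 0x80 in each byte position where `b` has a zero byte and 0x00 elsewhere. No carry crosses a byte boundary, because each byte of `x` is at most 0x7f and 0x7f + 0x7f = 0xfe.

```cpp
#include <cassert>
#include <cstdint>

// Portable restatement of ul_has_zero_byte: 0x80 in each zero byte of b, else 0x00.
std::uint64_t has_zero_byte(std::uint64_t b) {
    const std::uint64_t low_7bits_of_bytes = 0x7f7f7f7f7f7f7f7fULL;
    const std::uint64_t x = b & low_7bits_of_bytes;     // clear each byte's ms bit
    const std::uint64_t y = x + low_7bits_of_bytes;     // ms bit set iff low 7 bits were non-zero
    const std::uint64_t z = b | y | low_7bits_of_bytes; // ms bit clear only for 0x00 bytes
    return ~z;                                          // 0x80 for 0x00 bytes, else 0x00
}
```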
> 
> So, not using %r8 as zero but for a different value,
> each cmpb can be replaced by:
> 
> # Only once to set up the value in %r8 (Note: 32639=0x7f7f):
> lis     r8,32639
> ori     r8,r8,32639
> rldimi  r8,r8,32,0
> 
> # each "cmpb %r7,%r5,%r8" replaced by:
> and     r7,r5,r8
> add     r7,r7,r8
> nor     r5,r7,r5
> andc    r5,r5,r8

The four lines above do not match the context's register usage: of the
three registers r5, r7, and r8, only r7 should be changed, but the quoted
sequence also overwrites r5 and leaves the result there. Another temporary
register (for the intermediate stage) appears to be required to make a match:

and      %r9,%r5,%r8
add      %r9,%r9,%r8
nor      %r7,%r9,%r5
andc     %r7,%r7,%r8

(%r9 can serve as the temporary because it is later overwritten by: or. %r9,%r7,%r0)
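A minimal C++ simulation of the corrected sequence can check that only %r7 (plus the scratch %r9) changes. It includes the one-time constant setup that replaces `xor %r8,%r8,%r8`. The `Regs` struct and helper names are hypothetical. `rldimi` is modeled only for the operands used here (shift 32, mask begin 0), which inserts the rotated low word into the high word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register file holding only the GPRs used here.
struct Regs {
    std::uint64_t r0, r5, r6, r7, r8, r9;
};

// Rotate left, as used by rldimi.
std::uint64_t rotl64(std::uint64_t v, unsigned n) {
    n &= 63;
    return n == 0 ? v : (v << n) | (v >> (64 - n));
}

// MASK(mb, me) in IBM bit numbering (bit 0 is the most significant bit).
// Assumes mb <= me, which holds for the operands used below.
std::uint64_t mask_ibm(unsigned mb, unsigned me) {
    return (~std::uint64_t{0} >> mb) & (~std::uint64_t{0} << (63 - me));
}

// One-time setup replacing "xor %r8,%r8,%r8".
void setup_r8(Regs& r) {
    r.r8 = std::uint64_t{32639} << 16;           // lis    %r8,32639 (bit 31 clear: no sign extension)
    r.r8 |= 32639;                               // ori    %r8,%r8,32639
    const std::uint64_t m = mask_ibm(0, 63 - 32);
    r.r8 = (rotl64(r.r8, 32) & m) | (r.r8 & ~m); // rldimi %r8,%r8,32,0
}

// Corrected replacement for "cmpb %r7,%r5,%r8", followed by the existing
// "or." and bne; returns true when the branch would be taken.
bool corrected_step(Regs& r) {
    r.r0 = r.r5 ^ r.r6;    // xor  %r0,%r5,%r6
    r.r9 = r.r5 & r.r8;    // and  %r9,%r5,%r8
    r.r9 = r.r9 + r.r8;    // add  %r9,%r9,%r8
    r.r7 = ~(r.r9 | r.r5); // nor  %r7,%r9,%r5
    r.r7 = r.r7 & ~r.r8;   // andc %r7,%r7,%r8
    r.r9 = r.r7 | r.r0;    // or.  %r9,%r7,%r0
    return r.r9 != 0;      // bne  .Lstrcmp_check_zeros_differences
}
```

Because `and` and `add` work in %r9 and only `nor` and `andc` write %r7, %r5 and %r8 come out unchanged. The quoted four-line version lacked that property.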

> (The code is from compiler output, but with registers adjusted
> to match the context.)
> 
> 
> The c/c++-ish code came from thinking about material from Hacker's
> Delight Second Edition and the specific criteria needed here: it
> uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
> adjusted for width and for returning something sufficient here.
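Where the byte position is wanted rather than just a flag, the same mask can be turned into an index by counting leading zeros, in the spirit of the branch-free code referenced above. This sketch is widened to 64 bits and assumes big-endian byte order, where byte 0 is the most significant, as when a big-endian powerpc64 `ld` loads eight string bytes. `leftmost_zero_byte` and `nlz64` are hypothetical names. The portable loop in `nlz64` stands in for an instruction such as `cntlzd`.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading zero bits; returns 64 for v == 0 (like cntlzd).
unsigned nlz64(std::uint64_t v) {
    unsigned n = 0;
    for (std::uint64_t bit = std::uint64_t{1} << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

// Index (0 = most significant byte) of the leftmost zero byte of x, or 8 if none.
unsigned leftmost_zero_byte(std::uint64_t x) {
    const std::uint64_t low_7bits_of_bytes = 0x7f7f7f7f7f7f7f7fULL;
    std::uint64_t y = (x & low_7bits_of_bytes) + low_7bits_of_bytes;
    y = ~(y | x | low_7bits_of_bytes); // 0x80 in each zero byte, 0x00 elsewhere
    return nlz64(y) >> 3;              // a 0x80 in byte i has 8*i leading zero bits
}
```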
> 



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the svn-src-head mailing list