svn commit: r346588 - head/lib/libc/powerpc64/string

Mark Millard marklmi at yahoo.com
Sat Apr 27 08:51:02 UTC 2019


Justin Hibbits jhibbits at FreeBSD.org wrote on
Fri Apr 26 16:21:47 UTC 2019 :

> This actually uses 'cmpb' which is only available on PowerISA 2.05+, so
> I'll need to pull it out for now, and re-enable it once we have
> ifuncs.  As it stands, this commit broke the G5 and POWER4/POWER5.

As I understand code like:

	xor	%r8,%r8,%r8	/* %r8 <- Zero. */
	xor	%r0,%r5,%r6	/* Check if double words are different. */
	cmpb	%r7,%r5,%r8	/* Check if double words contain zero. */

	/*
	 * If double words are different or contain zero,
	 * find what byte is different or contains zero,
	 * else load next double words.
	 */
	or.	%r9,%r7,%r0
	bne	.Lstrcmp_check_zeros_differences

(and similarly for the loop), the requirements on cmpb's result are:

A) Each byte of %r5 that is non-zero needs that byte of %r7 to be zero.
B) Each byte of %r5 that is zero needs that byte of %r7 to be non-zero.

(As I understand it, cmpb sets a result byte to 0xff where the compared
bytes match, which here means where the %r5 byte is zero, but even one
non-zero bit is sufficient for the overall code structure.)
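
As a rough illustration of conditions A and B, the check can be modeled
in C++. This is only a sketch: cmpb_model, must_inspect_bytes and
spot_check are hypothetical names introduced here, and cmpb_model assumes
the byte-wise "0xff where equal, 0x00 where different" behavior described
above.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cmpb behavior: each result byte is 0xff where the corresponding
// bytes of a and b are equal, and 0x00 where they differ.
constexpr std::uint64_t cmpb_model(std::uint64_t a, std::uint64_t b)
{
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        const std::uint64_t byte_mask = std::uint64_t{0xff} << (8 * i);
        if ((a & byte_mask) == (b & byte_mask))
            result |= byte_mask;
    }
    return result;
}

// Mirrors: xor %r0,%r5,%r6 ; cmpb %r7,%r5,%r8 (with %r8 == 0) ; or. %r9,%r7,%r0
// Returns true when the bne to the byte-by-byte check would be taken.
constexpr bool must_inspect_bytes(std::uint64_t r5, std::uint64_t r6)
{
    const std::uint64_t r0 = r5 ^ r6;           // non-zero if the words differ
    const std::uint64_t r7 = cmpb_model(r5, 0); // non-zero if r5 has a zero byte
    return (r7 | r0) != 0;
}

// A few hand-picked cases (illustrative values only).
void spot_check()
{
    // Equal words, no zero byte: keep looping.
    assert(!must_inspect_bytes(0x6162636465666768ull, 0x6162636465666768ull));
    // Words differ in the last byte: inspect bytes.
    assert(must_inspect_bytes(0x6162636465666768ull, 0x6162636465666769ull));
    // Equal words containing a zero byte: inspect bytes.
    assert(must_inspect_bytes(0x6162630065666768ull, 0x6162630065666768ull));
}
```

Only the zero/non-zero status of each %r7 byte feeds into the decision,
which is why a substitute for cmpb does not need to produce 0xff exactly.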

If I've got that much correct, then the following might be an
alternative to cmpb for now. I'll explain the code via commented
c/c++-ish code and then show the assembler notation:

unsigned long ul_has_zero_byte(unsigned long b)
{
    unsigned long constexpr low_7bits_of_bytes{0x7f7f7f7f'7f7f7f7ful};

                                                       // Illustrating byte transformations:
    unsigned long const x= b & low_7bits_of_bytes;     // 0x00->0x00, 0x80->0x00, other->ms-bit-in-byte==0
    unsigned long const y= x + low_7bits_of_bytes;     //     ->0x7f,     ->0x7f,      ->ms-bit-in-byte==1
    unsigned long const z= b | y | low_7bits_of_bytes; //     ->0x7f,     ->0xff,      ->0xff
    return ~z;                                         //     ->0x80,     ->0x00,      ->0x00
}

(This is used in a powerpc64 context, so unsigned long is 64 bits.)
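
For anyone wanting to convince themselves of the byte transformations,
here is a minimal sketch that compares the same steps against a naive
per-byte loop. It uses std::uint64_t rather than unsigned long so that it
does not depend on the platform; zero_byte_mask,
zero_byte_mask_reference and check_every_byte_value are hypothetical
names, not anything from libc.

```cpp
#include <cassert>
#include <cstdint>

// Same steps as ul_has_zero_byte, written with a fixed-width type.
constexpr std::uint64_t zero_byte_mask(std::uint64_t b)
{
    constexpr std::uint64_t low7 = 0x7f7f7f7f7f7f7f7full;
    const std::uint64_t x = b & low7;     // clear the top bit of every byte
    const std::uint64_t y = x + low7;     // top bit set iff the low 7 bits were non-zero
    const std::uint64_t z = b | y | low7; // 0x7f only where the byte was 0x00
    return ~z;                            // 0x80 where the byte was 0x00, else 0x00
}

// Naive reference: test the bytes one at a time.
constexpr std::uint64_t zero_byte_mask_reference(std::uint64_t b)
{
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        if (((b >> (8 * i)) & 0xff) == 0)
            result |= std::uint64_t{0x80} << (8 * i);
    }
    return result;
}

static_assert(zero_byte_mask(0x0102030405060708ull) == 0, "no zero byte");
static_assert(zero_byte_mask(0x0102030400060708ull) == 0x0000000080000000ull,
              "one zero byte");
static_assert(zero_byte_mask(0x8080808080808080ull) == 0, "0x80 is not zero");

// Place every byte value at every byte position, with 0xff in the other bytes.
void check_every_byte_value()
{
    for (int pos = 0; pos < 8; ++pos) {
        const std::uint64_t hole = ~(std::uint64_t{0xff} << (8 * pos));
        for (std::uint64_t v = 0; v < 256; ++v) {
            const std::uint64_t word = hole | (v << (8 * pos));
            assert(zero_byte_mask(word) == zero_byte_mask_reference(word));
        }
    }
}
```

The add cannot carry from one byte into the next because each byte of x
is at most 0x7f, so each byte of the sum is at most 0xfe.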

So, with %r8 holding a different value instead of zero,
each cmpb can be replaced by:

# Only once to set up the value in %r8 (Note: 32639=0x7f7f):
lis     %r8,32639
ori     %r8,%r8,32639
rldimi  %r8,%r8,32,0

# each "cmpb %r7,%r5,%r8" replaced by (result in %r7, %r5 preserved):
and     %r7,%r5,%r8
add     %r7,%r7,%r8
nor     %r7,%r7,%r5
andc    %r7,%r7,%r8

(The code is from compiler output, but with registers adjusted
to match the context.)
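
The instruction sequence above can also be modeled step by step in C++ as
a sanity check. This is a sketch under my reading of lis, ori, rldimi,
nor and andc. The names build_constant, replacement_for_cmpb and
check_replacement are hypothetical, and the model tracks only the computed
value, not which register ends up holding it.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// Models: lis r8,0x7f7f ; ori r8,r8,0x7f7f ; rldimi r8,r8,32,0
constexpr u64 build_constant()
{
    u64 r8 = u64{0x7f7f} << 16;             // lis: immediate shifted left by 16
    r8 |= 0x7f7f;                           // ori: fill in the low halfword
    r8 = (r8 << 32) | (r8 & 0xffffffffull); // rldimi 32,0: copy low word into high word
    return r8;
}

// Models the four-instruction replacement for "cmpb with zero".
constexpr u64 replacement_for_cmpb(u64 r5, u64 r8)
{
    u64 t = r5 & r8; // and
    t = t + r8;      // add
    t = ~(t | r5);   // nor
    t = t & ~r8;     // andc
    return t;
}

static_assert(build_constant() == 0x7f7f7f7f7f7f7f7full, "constant setup");

// Every byte value at every position, surrounded by non-zero filler bytes:
// the result must be 0x80 in that byte exactly when the byte is zero.
void check_replacement()
{
    const u64 r8 = build_constant();
    const u64 filler = 0x4141414141414141ull;
    for (int pos = 0; pos < 8; ++pos) {
        const u64 byte_mask = u64{0xff} << (8 * pos);
        for (u64 v = 0; v < 256; ++v) {
            const u64 word = (filler & ~byte_mask) | (v << (8 * pos));
            const u64 expected = (v == 0) ? (u64{0x80} << (8 * pos)) : 0;
            assert(replacement_for_cmpb(word, r8) == expected);
        }
    }
}
```

The result is 0x80 rather than 0xff in each zero byte, which still meets
requirements A and B because only zero versus non-zero matters there.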


The c/c++-ish code came from thinking about material from Hacker's
Delight Second Edition and the specific criteria needed here: it
uses part of Figure 6-2 "Find First 0-Byte, branch-free code",
adjusted for width and for returning something sufficient here.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


