svn commit: r351700 - head/lib/libc/string

Bruce Evans brde at optusnet.com.au
Fri Sep 20 12:14:44 UTC 2019


On Mon, 2 Sep 2019, Ed Maste wrote:

> Author: emaste
> Date: Mon Sep  2 13:56:44 2019
> New Revision: 351700
> URL: https://svnweb.freebsd.org/changeset/base/351700
>
> Log:
>  libc: Use musl's optimized memchr
>
>  Parentheses added to HASZERO macro to avoid a GCC warning.
>
>  Reviewed by:	kib, mjg
>  Obtained from:	musl (snapshot at commit 4d0a82170a)
>  Sponsored by:	The FreeBSD Foundation
>  Differential Revision:	https://reviews.freebsd.org/D17631
>
> Modified:
>  head/lib/libc/string/memchr.c
>
> Modified: head/lib/libc/string/memchr.c
> ==============================================================================
> --- head/lib/libc/string/memchr.c	Mon Sep  2 13:55:31 2019	(r351699)
> +++ head/lib/libc/string/memchr.c	Mon Sep  2 13:56:44 2019	(r351700)
> @@ -1,55 +1,54 @@
> /*-
> - * SPDX-License-Identifier: BSD-3-Clause
> + * SPDX-License-Identifier: MIT
>  *
> - * Copyright (c) 1990, 1993
> - *	The Regents of the University of California.  All rights reserved.
> + * Copyright (c) 2005-2014 Rich Felker, et al.
>  *
> - * This code is derived from software contributed to Berkeley by
> - * Chris Torek.

I prefer Torek's version.  Optimizing this function is especially unimportant,
and this version has mounds of style bugs.

> -void *
> -memchr(const void *s, int c, size_t n)
> -{
> -	if (n != 0) {
> -		const unsigned char *p = s;
> +#define SS (sizeof(size_t))
> +#define ALIGN (sizeof(size_t)-1)
> +#define ONES ((size_t)-1/UCHAR_MAX)
> +#define HIGHS (ONES * (UCHAR_MAX/2+1))
> +#define HASZERO(x) (((x)-ONES) & ~(x) & HIGHS)
>
> -		do {
> -			if (*p++ == (unsigned char)c)
> -				return ((void *)(p - 1));
> -		} while (--n != 0);

Old KNF code.  In the !__GNUC__ #else clausew , this is replaced by
equivalent code with a style that is very far from KNF.

If compilers understood this function, then they could reasonably optimize
the old code in very AMD ways, e.g., use AVX512.  This is not so reasonable
for the new code.  clang does such optimizations to a fault, but knows that
it doesn't understand this code or even strlen(), so it does a naive
translation.

> +void *memchr(const void *src, int c, size_t n)
> +{
> +	const unsigned char *s = src;
> +	c = (unsigned char)c;
> +#ifdef __GNUC__
> +	for (; ((uintptr_t)s & ALIGN) && n && *s != c; s++, n--);
> +	if (n && *s != c) {
> +		typedef size_t __attribute__((__may_alias__)) word;

This seems to have no dependencies on __GNUC__ except the use of anti-aliasing,
and that dependendency is broken (very old versions of gcc don't even have
__attribute__(()), and old versions don't have __may_alias).

> +		const word *w;
> +		size_t k = ONES * c;
> +		for (w = (const void *)s; n>=SS && !HASZERO(*w^k); w++, n-=SS);
> +		s = (const void *)w;
> 	}

This is cryptic as well as non-KNF.  libc/string/strlen.c uses the same
dubious optimization method, but it is at least clearly written and it is
even in KNF.

> -	return (NULL);
> +#endif
> +	for (; n && *s != c; s++, n--);
> +	return n ? (void *)s : 0;
> }

I checked that this optimization actually works.  For long strings, it is
almost sizeof(size_t) times faster, by accessing memory with a size
sizeof(size_t).  size_t is a not very good way of hard-coding the word size.

Related silly optimizations:
- i386 has memchr() in asm.  This uses scasb, which was last optimal on the
   8088, or perhaps the original i386.  On freefall, it is several times slower
   than the naive translation of the naive C code.
- i386 pessimizes several old string functions using scasb.  amd64 mostly
   doesn't copy this mistake.  Similarly for wide char string functions.
   Optimizing these is even more unimportant and just writing them in asm is
   quite complicated.  They are perhaps not fully pessimized in asm on i386.
   amd64 doesn't have any of them in asm.
- i386 also does dubious alignment optimizations
- amd64 pessimizes strcpy() by implementing it as a C wrapper for stpcpy().
   The wrapper code is MI, but is only used for amd64.
- amd64 stpcpy(), strcat() and strcmp() uses the fancy optimization using
   the 0x0101... and 0x8080...  tricks (as in this version of memchr()
   except not so cryptically since the constants are literal instead
   of constructed using macros from the word type).  Optimizing strcat()
   some of these functions is unimportant.  Worth doing at most once,
   in the C version.
- the generic strlen() (but no other generic string function) is optimized
   using the 0x0101... and 0x8080... tricks.  This optimization is dubious
   since it is a pessimization for short strings.  Compilers often inline
   string functions, especially strlen() anyway.  I think x86 compilers
   used to routinely inline strlen(), but now they know that they don't
   understand it, so they just call the library strlen() for most strings
   that are not known at compile time.  This results in the fancy optimization
   actually being used.  Sometimes this optimization improves the speed of
   a large program by as much as 0.001%.

Bruce


More information about the svn-src-all mailing list