git: af366d353b84 - main - amd64: implement strlen in assembly
Jessica Clarke
jrtc27 at freebsd.org
Mon Feb 8 19:36:46 UTC 2021
On 8 Feb 2021, at 19:15, Mateusz Guzik <mjg at FreeBSD.org> wrote:
>
> The branch main has been updated by mjg:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=af366d353b84bdc4e730f0fc563853abc338271c
>
> commit af366d353b84bdc4e730f0fc563853abc338271c
> Author: Mateusz Guzik <mjg at FreeBSD.org>
> AuthorDate: 2021-02-08 17:01:48 +0000
> Commit: Mateusz Guzik <mjg at FreeBSD.org>
> CommitDate: 2021-02-08 19:15:21 +0000
>
> amd64: implement strlen in assembly
>
> The C variant in libkern performs excessive branching to find the
> non-zero byte instead of using the bsfq instruction. The same code
> patched to use it is still slower than the routine implemented here
> as the compiler keeps neglecting to perform certain optimizations
> (like using leaq).
>
> On top of that the routine is a starting point for copyinstr
> which operates on words instead of bytes.
>
> Tested with glibc test suite.
>
> Sample results (calls/s):
>
> Haswell:
> $(perl -e "print 'A' x 3"):
> stock: 211198039
> patched: 338626619
> asm: 465609618
>
> $(perl -e "print 'A' x 100"):
> stock: 83151997
> patched: 98285919
> asm: 120719888
>
> AMD EPYC 7R32:
> $(perl -e "print 'A' x 3"):
> stock: 282523617
> asm: 491498172
>
> $(perl -e "print 'A' x 100"):
> stock: 114857172
> asm: 112082057
No Reviewed by? More than one pair of eyes on non-trivial assembly is
almost always a good idea.
Jess
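
For context on the technique the commit message describes (scanning a word at a time and locating the terminating zero byte with a bit-scan rather than a chain of branches), here is a rough C sketch. It is not the committed assembly and not the libkern code. The name strlen_wordwise is hypothetical, the sketch assumes a little-endian 64-bit target and a GCC/Clang-style __builtin_ctzll (which compilers typically lower to bsfq or tzcntq on amd64), and it reads whole aligned words, so it may touch bytes past the terminator. That never crosses a page boundary, but it is not strictly conforming C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only: word-at-a-time strlen.
 * Assumes little-endian, 64-bit words, and __builtin_ctzll.
 */
size_t
strlen_wordwise(const char *s)
{
	const uint64_t ones = 0x0101010101010101ULL;
	const uint64_t highs = 0x8080808080808080ULL;
	const char *p = s;
	uint64_t w, mask;

	/* Go byte by byte until p is 8-byte aligned. */
	while (((uintptr_t)p & 7) != 0) {
		if (*p == '\0')
			return ((size_t)(p - s));
		p++;
	}

	for (;;) {
		/* Aligned 8-byte load; may read past the terminator. */
		memcpy(&w, p, sizeof(w));

		/*
		 * The lowest set bit of mask is the high bit of the first
		 * zero byte in w; mask is 0 if w has no zero byte.
		 */
		mask = (w - ones) & ~w & highs;
		if (mask != 0) {
			/* Bit index / 8 = byte index within the word. */
			return ((size_t)(p - s) +
			    (size_t)(__builtin_ctzll(mask) >> 3));
		}
		p += 8;
	}
}
```

The single count-trailing-zeros step replaces the per-byte comparisons that the commit message calls excessive branching. Whether a compiler turns this into code as tight as hand-written assembly is exactly the point the commit message disputes.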