svn commit: r363773 - projects/clang1100-import/contrib/llvm-project/compiler-rt/lib/builtins

Jessica Clarke jrtc27 at freebsd.org
Mon Aug 3 17:05:52 UTC 2020


On 3 Aug 2020, at 17:50, Dimitry Andric <dim at FreeBSD.org> wrote:
> 
> On 2 Aug 2020, at 22:50, Jessica Clarke <jrtc27 at FreeBSD.org> wrote:
>> 
>> On 2 Aug 2020, at 19:07, Dimitry Andric <dim at FreeBSD.org> wrote:
>>> 
>>> Author: dim
>>> Date: Sun Aug  2 18:07:16 2020
>>> New Revision: 363773
>>> URL: https://svnweb.freebsd.org/changeset/base/363773
>>> 
>>> Log:
>>> Reapply r230021, r276851 and a few other commits to compiler-rt
>>> 
>>> Reapply r230021 (by ed):
>>> 
>>> Add a workaround to prevent endless recursion in compiler-rt.
>>> 
>>> SPARC and MIPS CPUs don't have special instructions to count
>>> leading/trailing zeroes. The compiler-rt library provides fallback
>>> rountines for these. The 64-bit routines, __clzdi2 and __ctzdi2, are
>>> implemented as simple wrappers around the compiler built-in
>>> __builtin_clz(), assuming these will expand to either 32-bit
>>> CPU instructions or calls to __clzsi2 and __ctzsi2.
> ...
>> Are you sure this is still necessary? https://reviews.llvm.org/D42902
>> (which landed in in 2018 for 7.0.0, way after the original r230021 in
>> 2012) followed by a follow-up commit for the correct SPARC macro, fixed
>> this in an OS-independent way upstream, but inside c?zdi2.c themselves
>> so you won't notice a merge conflict.
> 
> Yes, you are probably right, though the preceding comment in int_lib.h
> specifically mentions:
> 
> * Instead of placing this workaround in c?zdi2.c, put it in this
> * global header to prevent other C files from making the detour
> * through __c?zdi2() as well.
> 
> There are indeed quite a lot of calls to __builtin_c[lt]z throughout
> compiler-rt, so removing this workaround from the int_lib.h header
> will possibly pessimize all of those. Is that going to work alright for
> all affected architectures, which appear to be mips64, riscv and
> sparc64?

I don't think it matters much. __clzdi2 is this (the preprocessed
version of which you provided):

  dwords x;
  x.all = a;
  const si_int f = -(x.s.high == 0);
  return __builtin_clz((x.s.high & ~f) | (x.s.low & f)) +
         (f & ((si_int)(sizeof(si_int) * CHAR_BIT)));

The implementation of __clzsi2 does a bunch more work than that (not
crazy, but still several times more), so I don't think you'll notice
all that much. Besides, Clang is our primary compiler and won't suffer
from this workaround, so I personally have no qualms about a small
performance hit when using GCC and what should be a relatively cold[1]
function. I'm of the view we should be as close to upstream as
possible, ideally with zero diffs, so personally wouldn't carry patches
like this unless it had a noticeable affect on system performance, but
even then would encourage others to patch it upstream first and
foremost.

Jess

[1] Even if compiler-rt uses it a lot, I would expect the work done by
its callers, and the callers of the callers etc, to far outweigh the
time taken within __clzdi2 (and similarly for other related variants).

> Note also that https://bugs.llvm.org/show_bug.cgi?id=11663 is *still*
> open, possibly because of that reason. But it looks like interest in
> the bug simply waned, and it was never closed.
> 
> In any case, I managed to get the new-new-new-mk2-style version of
> mips64-xtoolchain working, and got these warnings, so indeed the fix
> is now applied in two places:
> 
>  /usr/src/contrib/llvm-project/compiler-rt/lib/builtins/clzdi2.c:23: warning: "__builtin_clz" redefined
>     23 | #define __builtin_clz(a) __clzsi2(a)
>        |
>  In file included from /usr/src/contrib/llvm-project/compiler-rt/lib/builtins/clzdi2.c:13:
>  /usr/src/contrib/llvm-project/compiler-rt/lib/builtins/int_lib.h:112: note: this is the location of the previous definition
>    112 | #define __builtin_clz __clzsi2
>        |
> 
> The preprocessed C code looks OK, in the sense that it delegates to
> __clzsi2 now:
> 
>  extern si_int __clzsi2(si_int);
> 
>  si_int __clzdi2(di_int a) {
>    dwords x;
>    x.all = a;
>    const si_int f = -(x.s.high == 0);
>    return __clzsi2((x.s.high & ~f) | (x.s.low & f)) +
>           (f & ((si_int)(sizeof(si_int) * 8)));
>  }
> 
> And the resulting assembly shows no further external calls:
> 
>  0000000000000000 <__clzdi2>:
>     0: 67bdffe0        daddiu  sp,sp,-32
>     4: ffbf0018        sd      ra,24(sp)
>     8: ffbc0010        sd      gp,16(sp)
>     c: ffb00008        sd      s0,8(sp)
>    10: 3c1c0000        lui     gp,0x0
>    14: 0399e02d        daddu   gp,gp,t9
>    18: 679c0000        daddiu  gp,gp,0
>    1c: 0004103f        dsra32  v0,a0,0x0
>    20: 2c430001        sltiu   v1,v0,1
>    24: 0003802f        dnegu   s0,v1
>    28: 2463ffff        addiu   v1,v1,-1
>    2c: 00621824        and     v1,v1,v0
>    30: 00042000        sll     a0,a0,0x0
>    34: 00902024        and     a0,a0,s0
>    38: df990000        ld      t9,0(gp)
>    3c: 0320f809        jalr    t9
>    40: 00642025        or      a0,v1,a0
>    44: 32100020        andi    s0,s0,0x20
>    48: 02021021        addu    v0,s0,v0
>    4c: dfbf0018        ld      ra,24(sp)
>    50: dfbc0010        ld      gp,16(sp)
>    54: dfb00008        ld      s0,8(sp)
>    58: 03e00008        jr      ra
>    5c: 67bd0020        daddiu  sp,sp,32
> 
> In summary, I'm fine with dropping this part of r230021, as long as
> everybody is convinced this does not pessimize the rest too much.
> 
> -Dimitry



More information about the svn-src-projects mailing list