allow ffs & co. a binary search
    Garrett Cooper 
    yaneurabeya at gmail.com
       
    Sun Jun  7 17:23:24 UTC 2015
    
    
  
On Jun 7, 2015, at 9:44, Davide Italiano <davide at freebsd.org> wrote:
> On Sun, Jun 7, 2015 at 9:37 AM, Davide Italiano <davide at freebsd.org> wrote:
>> On Sun, Jun 7, 2015 at 6:54 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>>> On Sun, Jun 07, 2015 at 07:52:45PM +0800, Erich Dollansky wrote:
>>>> What I saw is that all CPUs except ARM uses the software version [of ffs].
>>> 
>>> Without quantifiers, this statement is not true. i386 libc function ffs(3)
>>> uses bsfl instruction to do the job.  Compilers know about ffs(3) and friends
>>> as well, so e.g. gcc 5.1.0 generates the following code for the given
>>> fragment:
>>>        return (ffs(x) + 1);
>>> is translated to
>>>   0:   0f bc c7                bsf    %edi,%eax
>>>   3:   ba ff ff ff ff          mov    $0xffffffff,%edx
>>>   8:   0f 44 c2                cmove  %edx,%eax
>>>   b:   83 c0 02                add    $0x2,%eax
>>> (arg in %edi, result in %eax).
>>> 
>>> I wrote a patch for amd64 libc long time ago to convert ffs/fls etc to use
>>> of the bitstring instruction, but Bruce Evans argued that this would be
>>> excessive.  Your patch is excessive for the similar reasons.
>>> 
>>> My guess is that significantly clever compiler would recognize a pattern
>>> used by native ffs implementation and automatically use bitstring
>>> instructions. E.g., this already happens with popcnt and recent
>>> gcc/clang, I am just lazy to verify ffs.
>> 
>> Clang trunk to the best of my knowledgde hasn't a way to recognize
>> ffs() pattern.
>> http://llvm.org/docs/doxygen/html/LoopIdiomRecognize_8cpp_source.html
>> I can't comment about gcc as long as I'm not familiar with the implementation.
> 
> Also, FWIW, for the fragment provided by Kostik, clang seems to
> generate more instructions than gcc does,
> I'll bring this upstream.
> 
>   0:   0f bc c7                bsf    %edi,%eax
>   3:   b9 20 00 00 00          mov    $0x20,%ecx
>   8:   0f 45 c8                cmovne %eax,%ecx
>   b:   83 c1 02                add    $0x2,%ecx
>   e:   85 ff                   test   %edi,%edi
>  10:   b8 01 00 00 00          mov    $0x1,%eax
>  15:   0f 45 c1                cmovne %ecx,%eax
Hi Davide,
	Could you please post a) the clang/gcc compiler versions and b) the compilation options used when producing the snippet? clang -O0 for instance produces a lot of code, whereas -O2 produces considerably less (to the point that there have been a lot of complaints at $work about being able to debug clang-compiled programs because things have all been optimized out).
Thanks!
-NGie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20150607/e8023302/attachment.sig>
    
    
More information about the freebsd-hackers
mailing list