[RFC] Generic population count function
Jung-uk Kim
jkim at FreeBSD.org
Thu Nov 15 22:54:37 UTC 2012
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2012-11-15 03:32:50 -0500, Bruce Evans wrote:
> The implementation of the ffs() family has similar non-problems.
> No one cares because they are rarely used, but... cperciva@ wrote
> the lookup table part of the following set of implementations of
> ffs() and fls(). A lookup table is the best general method, though
> it can be beaten by unportable methodw in some cases. I added more
> methods and a test framework to this (test framework not always
> included). The methods are: - gcc builtin - cpufunc inline
Talking about cpufunc ffsl(), I found something strange about clang:
% cat ffs.c
#include <sys/types.h>
#include <machine/cpufunc.h>
int ffs_cpufunc(long x) /* copied from cpufunc.h as is */
{ return (x == 0 ? x : (int)bsfq((u_long)x) + 1); }
int ffs_builtin(long x)
{ return (__builtin_ffsl(x)); }
% clang -O2 -c ffs.c
% objdump -d ffs.o
ffs.o: file format elf64-x86-64-freebsd
Disassembly of section .text:
0000000000000000 <ffs_cpufunc>:
0: 48 85 ff test %rdi,%rdi
3: 74 21 je 26 <ffs_cpufunc+0x26>
5: 48 89 7c 24 f8 mov %rdi,-0x8(%rsp)
a: 48 0f bc 4c 24 f8 bsf -0x8(%rsp),%rcx
10: 48 c1 e1 20 shl $0x20,%rcx
14: 48 b8 00 00 00 00 01 mov $0x100000000,%rax
1b: 00 00 00
1e: 48 01 c8 add %rcx,%rax
21: 48 c1 e8 20 shr $0x20,%rax
25: c3 retq
26: 31 c0 xor %eax,%eax
28: c3 retq
29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000000030 <ffs_builtin>:
30: 48 0f bc cf bsf %rdi,%rcx
34: ff c1 inc %ecx
36: 31 c0 xor %eax,%eax
38: 48 85 ff test %rdi,%rdi
3b: 0f 45 c1 cmovne %ecx,%eax
3e: c3 retq
Note clang generates very inefficient code for ffsl() from cpufunc.h.
It is using memory, setting an unused bit, etc. :-(
Both GCC 4.2 and 4.8 generate "good code", of course:
% gcc -O2 -c ffs.c
% objdump -d ffs.o
ffs.o: file format elf64-x86-64-freebsd
Disassembly of section .text:
0000000000000000 <ffs_cpufunc>:
0: 31 c0 xor %eax,%eax
2: 48 85 ff test %rdi,%rdi
5: 74 07 je e <ffs_cpufunc+0xe>
7: 48 0f bc c7 bsf %rdi,%rax
b: 83 c0 01 add $0x1,%eax
e: f3 c3 repz retq
0000000000000010 <ffs_builtin>:
10: 48 0f bc ff bsf %rdi,%rdi
14: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
1b: 48 0f 44 f8 cmove %rax,%rdi
1f: 48 83 c7 01 add $0x1,%rdi
23: 89 f8 mov %edi,%eax
25: c3 retq
% gcc48 -O2 -c ffs.c
% objdump -d ffs.o
ffs.o: file format elf64-x86-64-freebsd
Disassembly of section .text:
0000000000000000 <ffs_cpufunc>:
0: 31 c0 xor %eax,%eax
2: 48 85 ff test %rdi,%rdi
5: 75 09 jne 10 <ffs_cpufunc+0x10>
7: f3 c3 repz retq
9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10: 48 0f bc c7 bsf %rdi,%rax
14: 83 c0 01 add $0x1,%eax
17: c3 retq
18: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
1f: 00
0000000000000020 <ffs_builtin>:
20: 48 0f bc c7 bsf %rdi,%rax
24: 48 c7 c2 ff ff ff ff mov $0xffffffffffffffff,%rdx
2b: 48 0f 44 c2 cmove %rdx,%rax
2f: 48 83 c0 01 add $0x1,%rax
33: c3 retq
FYI,
Jung-uk Kim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/
iEYEARECAAYFAlClcnkACgkQmlay1b9qnVPQYACgjEYpPDgLfkskpm3+2vRNjCuJ
n5AAn30JxrFS9kLoejh+4BZFD/MnfAvT
=4yS7
-----END PGP SIGNATURE-----
More information about the freebsd-arch
mailing list