allow ffs & co. a binary search

Fri Jun 12 13:32:06 UTC 2015

On 06/12/2015 09:03 AM, Richard Yao wrote:
> On 06/11/2015 09:59 PM, Hans Petter Selasky wrote:
>> On 06/11/15 16:59, Richard Yao wrote:
>>> On 06/11/2015 10:52 AM, Ian Lepore wrote:
>>>> On Thu, 2015-06-11 at 10:49 -0400, Richard Yao wrote:
>>>>> On 06/06/2015 08:13 PM, Erich Dollansky wrote:
>>>> [...]
>>>>>
>>>>> The fastest portable way of calculating highest bit set is to use a
>>>>> debrujin sequence:
>>>>>
>>>>> https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
>>>>>
>>>>> The binary approach is inferior, especially on Pentium 4 processors.
>>>>
>>>> And of course it's crucial that we optimize for Pentium 4 processors in
>>>> 2015.
>>>
>>> The debrujin approach is fast regardless of the architecture. Anyway,
>>> this is a solved problem. There is no need to reinvent the wheel when
>>> the portable solutions are listed on that page at Stanford.
>>>
>>
>> Hi Richard,
>>
>> ffs() can be implemented like this simply:
>>
>> int ffs_A(uint32_t v)
>> {
>> static const int MultiplyDeBruijnBitPosition[32] =
>> {
>>   0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
>>   8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
>> };
>> v = ~(v - 1) & v;
>> return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27]);
>> }
> 
> There is a way to do that on the stanford page too:
> 
> https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup

My apologies. I inadvertently linked to the wrong variant of that in my
initial email. It has been a while since I visited that page and I was
juggling a few things when I wrote that email, so it was easy to confuse
the two.

Anyway, it would be possible to do something like:

#if defined(__GNUC__) || (defined(__clang__) &&
__has_builtin(__builtin_ffs))
# define ffs(n) __builtin_ffs(n)
#else
/*
 * From Sean Eron Anderson's Bit Twiddling Hacks
 *
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup
 */
static inline int
ffs(uint32_t v)
{
static const int MultiplyDeBruijnBitPosition[32] =
{
  0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
  8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v = ~(v - 1) & v;
return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27]);
}
#endif

If there were an autotools-like check for `__has_builtin()`, then an
explicit list of compilers known to support `__builtin_ffs()` would not
be necessary, although a list of compilers that lack support for
`__has_builtin()` (such as GCC) would still be necessary. Those building
with compilers that do not support `__builtin_ffs()` could fallback to
the DeBrujin approach.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20150612/58440dd1/attachment.sig>