> Inlining the big function do_hard_work() helps for gcc on
> amd64 (about 5% faster), but makes no significant difference for clang.
> The previous testing was mostly with gcc.

How are you inlining?  With the C99 inline keyword, which changes the linkage but gives the compiler only an advisory hint about inlining (one that modern compilers largely ignore), or with the always_inline attribute, which forces the compiler to inline the function?
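
To be concrete, here is a rough sketch of the two variants I mean. I haven't seen the real signature of do_hard_work(), so the function bodies, the _hint/_forced names, and the FORCE_INLINE macro below are made up purely for illustration:

```c
/* Hypothetical stand-ins: the real do_hard_work() is not shown in this
 * thread, so the signature and body here are invented for illustration. */

/* Variant 1: plain C99 inline. This is only a hint; the compiler's own
 * heuristics decide whether the call is actually inlined. */
static inline long do_hard_work_hint(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i * i;
    return sum;
}

/* Variant 2: GCC/Clang always_inline attribute, which asks the compiler
 * to inline regardless of its heuristics. Other compilers fall back to
 * the plain hint. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

FORCE_INLINE long do_hard_work_forced(long n)
{
    long sum = 0;
    for (long i = 0; i < n; i++)
        sum += i * i;
    return sum;
}
```

If it is the first variant, it would be worth checking the disassembly for a remaining call to the function, or building with gcc's -Winline, to see whether the function was really inlined before attributing the 5% to inlining.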


