[Bug 218203] Implement AVX2 accelerated Fletcher algorithms

Thu Mar 30 17:07:42 UTC 2017

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218203

--- Comment #1 from kungfujesus06 at gmail.com ---
If desired, I can post my benchmark code.  It uses more instructions than
the zfsonlinux variant (I used SIMD intrinsics instead of inline assembly).
The extra instructions are mostly just shuffling values between registers.
After the intermediate sum loop completes, I alias into the __m256i's
instead of doing a vmovdqu into memory for the constant multiplications.  I
suspect the compiler was able to shuffle registers around enough to avoid some
trips to memory.  As an aside, the Intel whitepaper undersells its own result:
I think it compares its SIMD variant against the best possible non-SIMD
performance (the loop unrolled 4 times) rather than against the original loop.
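
To make the "intermediate sum loop" and "constant multiplications" above
concrete, here is a minimal scalar sketch.  It assumes the Fletcher-4 form
with four 64-bit running sums over 32-bit words, and it emulates four SIMD
lanes with plain arrays instead of __m256i.  The names fletcher4_ref and
fletcher4_lanes4 are made up for illustration, and the recombination
constants come from expanding the per-word weights by hand; this is not the
benchmark code and has not been checked against the zfsonlinux source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference loop: four running sums, one 32-bit word per iteration. */
void fletcher4_ref(const uint32_t *w, size_t n, uint64_t out[4])
{
    uint64_t a = 0, b = 0, c = 0, d = 0;

    for (size_t i = 0; i < n; i++) {
        a += w[i];
        b += a;
        c += b;
        d += c;
    }
    out[0] = a; out[1] = b; out[2] = c; out[3] = d;
}

/*
 * Scalar emulation of a 4-lane SIMD variant (hypothetical helper).
 * Lane k sees words k, k+4, k+8, ... and keeps its own a/b/c/d sums.
 * n must be a multiple of 4; a real implementation needs a tail path.
 */
void fletcher4_lanes4(const uint32_t *w, size_t n, uint64_t out[4])
{
    /* Per-lane constants for the final recombination (index = lane). */
    static const uint64_t b_a[4] = { 0, 1, 2, 3 };
    static const uint64_t c_b[4] = { 6, 10, 14, 18 };
    static const uint64_t c_a[4] = { 0, 0, 1, 3 };
    static const uint64_t d_c[4] = { 48, 64, 80, 96 };
    static const uint64_t d_b[4] = { 4, 10, 20, 34 };
    static const uint64_t d_a[4] = { 0, 0, 0, 1 };
    uint64_t a[4] = { 0 }, b[4] = { 0 }, c[4] = { 0 }, d[4] = { 0 };
    uint64_t A = 0, B = 0, C = 0, D = 0;

    assert(n % 4 == 0);

    /* Intermediate sum loop: each lane runs the reference recurrence. */
    for (size_t i = 0; i < n; i += 4) {
        for (int k = 0; k < 4; k++) {
            a[k] += w[i + k];
            b[k] += a[k];
            c[k] += b[k];
            d[k] += c[k];
        }
    }

    /* Constant multiplications: fold the lanes back into one checksum.
     * Unsigned arithmetic wraps mod 2^64, so the subtractions are safe. */
    for (int k = 0; k < 4; k++) {
        A += a[k];
        B += 4 * b[k] - b_a[k] * a[k];
        C += 16 * c[k] - c_b[k] * b[k] + c_a[k] * a[k];
        D += 64 * d[k] - d_c[k] * c[k] + d_b[k] * b[k] - d_a[k] * a[k];
    }
    out[0] = A; out[1] = B; out[2] = C; out[3] = D;
}
```

Calling both functions on the same buffer (length a multiple of 4) should
give identical four-word results.  In an intrinsics version the inner k loop
would become a handful of 64-bit vector adds, and the last loop is the step
where the lane values have to be read out of the vector registers.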

-- 
You are receiving this mail because:
You are the assignee for the bug.