using SSE2 in kernel C code (improving AES-NI module)
Peter Wemm
peter at wemm.org
Sat Oct 20 18:10:39 UTC 2012
On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney <jmg at funkthat.com> wrote:
> Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
>> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
>> > So, the AES-NI module already uses SSE2 instructions, but it does so
>> > only in assembly. I have improved the performance of the AES-NI
>> > module's implementation, but this involves using additional SSE2
>> > instructions.
>> >
>> > In order to keep my sanity, I wrote part of the new code in C using
>> > gcc native types and xmmintrin.h, but we do not support this header in
>> > the kernel. This means we cannot simply add the new code to the
>> > kernel...
>> >
>> > Any good ideas on how to integrate this code into the kernel build?
>
> [...]
>
>>
>> The current structure of the aes-ni driver is partly enforced by the
>> issue you noted. We cannot use SSE intrinsics in the kernel, and
>> huge inline assembler fragments are hard to write.
>>
>> I prefer to have separate .S files with the optimized code,
>> hand-written. If needed, I can offer you help with the transition. I
>> would need a full patch to rewrite the code.
>
> Are you sure you want to do this? It'll involve writing around 500
> lines of assembly besides the constants... And it isn't as simple as
> aesni_enc, where we have a single loop for the rounds... I've
> posted a tar.gz to overlay onto sys/crypto/aesni at:
> https://www.funkthat.com/~jmg/aesni.repfile.tar.gz

Rather than going straight to assembler, why not use the __builtins? For example:
static inline __m128i
xts_crank_lfsr(__m128i inp)
{
	const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
	__m128i xtweak, ret;

	/* set up xor mask */
	xtweak = _mm_shuffle_epi32(inp, 0x93);
	xtweak = _mm_srai_epi32(xtweak, 31);
	xtweak &= alphamask;

	/* next term */
	ret = _mm_slli_epi32(inp, 1);
	ret ^= xtweak;

	return ret;
}

-->

static inline __m128i
xts_crank_lfsr(__m128i inp)
{
	/* vector literal: lane 0 first, the reverse of _mm_set_epi32() order */
	const __m128i alphamask = (magic casts){ AES_XTS_ALPHA, 1, 1, 1 };
	__m128i xtweak, ret;

	/* set up xor mask */
	xtweak = __builtin_ia32_pshufd(inp, 0x93);
	xtweak = __builtin_ia32_psradi128(xtweak, 31);
	xtweak &= alphamask;

	/* next term */
	ret = __builtin_ia32_pslldi128(inp, 1);
	ret ^= xtweak;

	return ret;
}

I know I skipped details like the data types, but the meat of most of
those intrinsics is just a thin wrapper around a single __builtin.
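To fill in the skipped data types, here is a minimal userland sketch, assuming GCC or Clang targeting x86-64 with SSE2 enabled (the amd64 default); whether the kernel's compiler flags allow these builtins is not addressed here. The names v4si, xts_crank_lfsr_builtin, xts_double_ref and xts_matches_reference are hypothetical, chosen for this example, and AES_XTS_ALPHA is assumed to be 0x87, the usual XTS reduction constant. The builtin version is checked against a plain 64-bit scalar doubling in GF(2^128).

```c
/*
 * Sketch only: XTS tweak update with GCC/Clang vector extensions and
 * __builtin_ia32_* calls instead of <emmintrin.h>. Assumes x86-64 with
 * SSE2. v4si and all helper names below are hypothetical.
 */
#include <assert.h>	/* for checking xts_matches_reference() */
#include <stdint.h>
#include <string.h>

#define AES_XTS_ALPHA	0x87	/* assumed: x^7 + x^2 + x + 1 */

/* Four 32-bit lanes in one 128-bit register, as these builtins expect. */
typedef int v4si __attribute__((vector_size(16)));

static inline v4si
xts_crank_lfsr_builtin(v4si inp)
{
	/* Lane 0 first, matching _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA). */
	const v4si alphamask = { AES_XTS_ALPHA, 1, 1, 1 };
	v4si xtweak, ret;

	/* Lane i receives lane i-1 (lane 0 receives lane 3). */
	xtweak = __builtin_ia32_pshufd(inp, 0x93);
	/* Smear each lane's top bit: all ones if that bit carries out. */
	xtweak = __builtin_ia32_psradi128(xtweak, 31);
	xtweak &= alphamask;

	/* Shift each 32-bit lane left by one, then add carries/reduction. */
	ret = __builtin_ia32_pslldi128(inp, 1);
	ret ^= xtweak;
	return ret;
}

/* Scalar reference: multiply a little-endian 128-bit value by x. */
static void
xts_double_ref(uint64_t *lo, uint64_t *hi)
{
	uint64_t carry = *hi >> 63;

	*hi = (*hi << 1) | (*lo >> 63);
	*lo = (*lo << 1) ^ (carry ? AES_XTS_ALPHA : 0);
}

/* Returns 1 if both versions agree for `rounds` successive tweaks. */
static int
xts_matches_reference(uint64_t lo, uint64_t hi, int rounds)
{
	uint64_t words[2] = { lo, hi };
	v4si v;

	memcpy(&v, words, sizeof(v));	/* little-endian: lanes 0-1 = lo */
	for (int i = 0; i < rounds; i++) {
		v = xts_crank_lfsr_builtin(v);
		xts_double_ref(&lo, &hi);
		memcpy(words, &v, sizeof(words));
		if (words[0] != lo || words[1] != hi)
			return 0;
	}
	return 1;
}
```

A check such as assert(xts_matches_reference(0x8000000000000001ULL, 0x8000000000000000ULL, 1000)) exercises both the inter-lane carry and the 0x87 reduction on its first step. Writing the mask as a lane-0-first vector literal avoids depending on _mm_set_epi32()'s reversed argument order.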

Another option: do something like genassym or one of the many other
kernel build tools. aicasm builds and runs a userland tool that
generates output which is then built into the kernel. With sufficient
cross-contamination safeguards, I wonder if something similar could
be done here.
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
More information about the freebsd-arch mailing list