using SSE2 in kernel C code (improving AES-NI module)
Jilles Tjoelker
jilles at stack.nl
Sat Oct 20 21:07:33 UTC 2012
On Sat, Oct 20, 2012 at 09:18:26PM +0300, Konstantin Belousov wrote:
> On Sat, Oct 20, 2012 at 11:10:37AM -0700, Peter Wemm wrote:
> > On Sat, Oct 20, 2012 at 10:11 AM, John-Mark Gurney <jmg at funkthat.com> wrote:
> > > Konstantin Belousov wrote this message on Sat, Oct 20, 2012 at 08:48 +0300:
> > >> On Fri, Oct 19, 2012 at 04:38:33PM -0700, John-Mark Gurney wrote:
> > >> > So, the AES-NI module already uses SSE2 instructions, but it does so
> > >> > only in assembly. I have improved the performance of the AES-NI
> > >> > module's implementation, but this involves using additional SSE2
> > >> > instructions.
> > >> > In order to keep my sanity, I did part of the new code in C using
> > >> > gcc native types and xmmintrin.h, but we do not support this header in
> > >> > the kernel. This means we cannot simply add the new code to the
> > >> > kernel.
> > >> > Any good ideas on how to integrate this code into the kernel build?
> > > [...]
> > >> The current structure of the aes-ni driver is partly enforced by the
> > >> issue you noted. We cannot use SSE intrinsics in the kernel, and
> > >> huge inline assembler fragments are hard to write.
> > >> I prefer to have the separate .S files with the optimized code,
> > >> hand-written. If needed, I can offer help with the transition. I would
> > >> need a full patch to rewrite the code.
> > > Are you sure you want to do this? It'll involve writing around 500
> > > lines of assembly besides the constants... And it isn't simple like
> > > the aesni_enc where we have a single loop for the rounds... I've
> > > posted a tar.gz to overlay onto sys/crypto/aesni at:
> > > https://www.funkthat.com/~jmg/aesni.repfile.tar.gz
> > Rather than go straight to assembler, why not use the __builtins?
> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> >     const __m128i alphamask = _mm_set_epi32(1, 1, 1, AES_XTS_ALPHA);
> >     __m128i xtweak, ret;
> >
> >     /* set up xor mask */
> >     xtweak = _mm_shuffle_epi32(inp, 0x93);
> >     xtweak = _mm_srai_epi32(xtweak, 31);
> >     xtweak &= alphamask;
> >
> >     /* next term */
> >     ret = _mm_slli_epi32(inp, 1);
> >     ret ^= xtweak;
> >
> >     return ret;
> > }
> > -->
> > static inline __m128i
> > xts_crank_lfsr(__m128i inp)
> > {
> >     const __m128i alphamask = (magic casts){ 1, 1, 1, AES_XTS_ALPHA };
> >     __m128i xtweak, ret;
> >
> >     /* set up xor mask */
> >     xtweak = __builtin_ia32_pshufd(inp, 0x93);
> >     xtweak = __builtin_ia32_psradi128(xtweak, 31);
> >     xtweak &= alphamask;
> >
> >     /* next term */
> >     ret = __builtin_ia32_pslldi128(inp, 1);
> >     ret ^= xtweak;
> >
> >     return ret;
> > }
> > I know I skipped the details like data types, but most of the meat of
> > those functions collapses to a simple wrapper around a __builtin.
As far as I understand, the __builtins are mostly a compiler
implementation detail. They are not as standardized as the intrinsics
from *mmintrin.h.
> Are builtins available for -mno-sse compilation?
They are not.
I did notice that Clang will compile __builtin_ia32_movnti down to a
regular MOV if SSE2 is not enabled, but this seems rarely useful.
> I think we can try to reimplement the builtins needed with inline
> assembly.
This should be possible but slightly ugly.
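
To give an idea of what that could look like, here is a rough sketch of
xts_crank_lfsr() with the three builtins replaced by inline assembly. It
is an illustration only, not the code from the tarball. Assumptions: GCC
or Clang on x86-64 with SSE2 enabled (as in a default userland build), the
vector_size extension instead of __m128i, and AES_XTS_ALPHA taken to be
0x87. The helper names (v4su, asm_pshufd_93 and so on) and the scalar
reference function are made up for this example. The immediates are
hard-coded in each wrapper because an "i" constraint fed from a function
argument is not reliably a constant without optimization, which is part
of the ugliness.

```c
#include <stdint.h>

#define AES_XTS_ALPHA 0x87  /* assumed value of the XTS reduction constant */

/* 128-bit vector of four 32-bit lanes, via the GCC/Clang vector extension. */
typedef uint32_t v4su __attribute__((vector_size(16)));

/* PSHUFD with immediate 0x93: rotates the four lanes up by one position. */
static inline v4su
asm_pshufd_93(v4su src)
{
    v4su dst;

    __asm__("pshufd $0x93, %1, %0" : "=x" (dst) : "x" (src));
    return (dst);
}

/* PSRAD by 31: each lane becomes all ones if its top bit was set, else 0. */
static inline v4su
asm_psrad_31(v4su v)
{
    __asm__("psrad $31, %0" : "+x" (v));
    return (v);
}

/* PSLLD by 1: shift each 32-bit lane left by one bit. */
static inline v4su
asm_pslld_1(v4su v)
{
    __asm__("pslld $1, %0" : "+x" (v));
    return (v);
}

static inline v4su
xts_crank_lfsr(v4su inp)
{
    /* Lane 0 is the lowest lane, matching _mm_set_epi32(1, 1, 1, ALPHA). */
    const v4su alphamask = { AES_XTS_ALPHA, 1, 1, 1 };
    v4su xtweak, ret;

    /* set up xor mask: carries between lanes, ALPHA for the wrap-around */
    xtweak = asm_pshufd_93(inp);
    xtweak = asm_psrad_31(xtweak);
    xtweak &= alphamask;

    /* next term */
    ret = asm_pslld_1(inp);
    ret ^= xtweak;

    return (ret);
}

/* Scalar reference: 128-bit left shift by one with conditional reduction. */
static void
xts_crank_lfsr_ref(uint32_t w[4])
{
    uint32_t carry = w[3] >> 31;

    w[3] = (w[3] << 1) | (w[2] >> 31);
    w[2] = (w[2] << 1) | (w[1] >> 31);
    w[1] = (w[1] << 1) | (w[0] >> 31);
    w[0] = (w[0] << 1) ^ (carry ? AES_XTS_ALPHA : 0);
}
```

Note that this only removes the dependency on the builtins and headers.
With -mno-sse the compiler may still refuse the "x" register constraint
or the 16-byte vector type, so the kernel version would probably need to
work on memory operands or live in a file built with SSE enabled.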
> > Or, another option.. do something like genassym or the many other
> > kernel build tools. aicasm builds and runs a userland tool to
> > generate something to build into the kernel. With sufficient
> > cross-contamination safeguards I wonder if something similar might be
> > able to be done here.
Is the C compiler with additional flags -mmmx -msse2 also a possible
build tool? If *mmintrin.h are made available, that should work, right?
One detail is that GCC and Clang have their own versions of these header
files. GCC also needs a dummy mm_malloc.h; Clang's xmmintrin.h refrains
from including this in a free-standing environment.
Of course, all code compiled in such a way must only be run with a valid
FPU context, since the compiler may use SSE instructions anywhere.
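
A small userland model of that rule, as a sketch only: every entry into
SSE-compiled code goes through a wrapper that is itself built without SSE
flags and that brackets the call with an enter/leave pair. In the kernel
the existing aesni driver uses fpu_kern_enter() and fpu_kern_leave() for
this; the functions below (sketch_fpu_enter, sketch_fpu_leave,
sse_compiled_xor, xor_blocks) are hypothetical stand-ins invented for the
example and only track a counter so the invariant can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "this thread owns a valid FPU context". */
static int fpu_ctx_depth;

/* Stand-ins for the kernel's enter/leave hooks; they only count nesting. */
static void
sketch_fpu_enter(void)
{
    fpu_ctx_depth++;
}

static void
sketch_fpu_leave(void)
{
    assert(fpu_ctx_depth > 0);
    fpu_ctx_depth--;
}

/*
 * Models a routine from a file compiled with -msse2.  The compiler may
 * emit SSE instructions anywhere in it, so it refuses to run (returns -1)
 * when no FPU context is held.
 */
static int
sse_compiled_xor(uint8_t *dst, const uint8_t *src, unsigned len)
{
    unsigned i;

    if (fpu_ctx_depth == 0)
        return (-1);
    for (i = 0; i < len; i++)
        dst[i] ^= src[i];
    return (0);
}

/*
 * Models the wrapper built without SSE flags.  It is the only entry point
 * other code calls, so the SSE-compiled routine never runs unprotected.
 */
static int
xor_blocks(uint8_t *dst, const uint8_t *src, unsigned len)
{
    int error;

    sketch_fpu_enter();
    error = sse_compiled_xor(dst, src, len);
    sketch_fpu_leave();
    return (error);
}
```

The point of the split is that the boundary between the two kinds of
object files is also the boundary where the FPU state is saved and
restored, so no SSE-compiled function should be inlined across it.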
--
Jilles Tjoelker