using SSE2 in kernel C code (improving AES-NI module)
John-Mark Gurney
jmg at funkthat.com
Tue Oct 23 07:04:25 UTC 2012
Konstantin Belousov wrote this message on Sun, Oct 21, 2012 at 09:10 +0300:
> On Sat, Oct 20, 2012 at 07:47:26PM -0700, John-Mark Gurney wrote:
> > Peter Wemm wrote this message on Sat, Oct 20, 2012 at 11:10 -0700:
> > > Or, another option.. do something like genassym or the many other
> > > kernel build tools. aicasm builds and runs a userland tool to
> > > generate something to build into the kernel. With sufficient
> > > cross-contamination safeguards I wonder if something similar might be
> > > able to be done here.
> >
> > Well, looks like I may have this working... Turns out I can't name the file
> > .s otherwise config puts it in SFILES which causes all sorts of problems..
> > So, I went w/ .nos, does anyone else have any suggestions?
> >
> > how does this look to people:
> > aesni_wrap2.nos optional aesni \
> >     dependency "$S/crypto/aesni/aesni_wrap2.c" \
> >     compile-with "${CC} -O3 -fPIC -S -o aesni_wrap2.nos $S/crypto/aesni/aesni_wrap2.c" \
> >     no-obj no-implicit-rule before-depend \
> >     clean "aesni_wrap2.nos"
> > aesni_wrap2.o optional aesni \
> >     dependency "aesni_wrap2.nos" \
> >     compile-with "${NORMAL_S} aesni_wrap2.nos" \
> >     no-implicit-rule \
> >     clean "aesni_wrap2.o"
> >
> > We'll have to do something similar in the module Makefile, but that is
> > easier...
> >
> > Also, I thought we had a better way to note that some devices depend
> > upon others than just throwing a depend error... If you include aesni
> > w/o crypto, you get error about missing cryptodev_if.h...
> >
> Hm, if such thing is possible, why do you need to compile through the
> .S at all ? All you need is to specify the special compiling flags,
> including -msse and -msse2.
Thanks, I managed to get it down to one...
> Note, you shall not need -fPIC, at least for amd64. I would suggest to use
> -O2, as well as to try to honour the -g settings.
If I don't do -fpic I get:
aesni_wrap2.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_32 against `.text'
when linking the kernel... If you can explain to me how to get rid of
this error, I'll do it..
> Most likely, you can put the ${CFLAGS} on the command line, followed
> by -msse -msse2.
I can't use CFLAGS because it removes access to the xmmintrin.h header
file... It looks like an option is to use:
-fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG}
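As a rough illustration of why the intrinsics headers matter, here is the sort of code a file built this way might contain. This is only a sketch and not the contents of aesni_wrap2.c: xor_block16 is a hypothetical helper, it uses emmintrin.h (the SSE2 header) instead of the AES-NI intrinsics, and it falls back to scalar code when SSE2 is not enabled so that it builds as an ordinary userland program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics; i386 needs -msse2 to enable them */
#endif

/*
 * Hypothetical helper: dst = a XOR b for one 16-byte block.
 * Unaligned loads and stores are used, so callers need not align buffers.
 */
static void
xor_block16(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
	assert(dst != NULL && a != NULL && b != NULL);
#if defined(__SSE2__)
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);

	_mm_storeu_si128((__m128i *)dst, _mm_xor_si128(va, vb));
#else
	/* Scalar fallback for builds without SSE2. */
	for (int i = 0; i < 16; i++)
		dst[i] = a[i] ^ b[i];
#endif
}
```

In the kernel, code like this would additionally have to run with the FPU/SSE state saved and restored around it; that part is not shown here.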
In my testing, -O2 is significantly slower, hence the bump to -O3:
x O2.txt
+ O3.txt
    N           Min           Max        Median           Avg        Stddev
x  20     1741.3491      1754.987     1752.9267     1751.5602     3.5616947
+  20      2223.217     2244.4501     2242.7028     2240.3183     5.7020691
Difference at 95.0% confidence
        488.758 +/- 3.04271
        27.9042% +/- 0.173715%
        (Student's t, pooled s = 4.75391)
Those are MB/sec...
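The summary figures above can be re-derived from the per-run averages and standard deviations. The sketch below is one way to do that, not the code of the tool that produced the output. The struct and function names are hypothetical. The critical value 2.024 is an assumption: it is the usual table value of Student's t for 38 degrees of freedom at 95% confidence. A small Newton iteration stands in for sqrt() so the example needs no math library.

```c
#include <assert.h>

/* Hypothetical container for one row of the summary table. */
struct sample {
	int	n;
	double	avg;
	double	stddev;
};

/* Newton's method square root; sqrt() from <math.h> would do the same. */
static double
newton_sqrt(double v)
{
	double x = v;

	assert(v >= 0.0);
	if (v == 0.0)
		return (0.0);
	for (int i = 0; i < 60; i++)
		x = 0.5 * (x + v / x);
	return (x);
}

/* Pooled standard deviation of two samples. */
static double
pooled_stddev(struct sample a, struct sample b)
{
	double va = (a.n - 1) * a.stddev * a.stddev;
	double vb = (b.n - 1) * b.stddev * b.stddev;

	return (newton_sqrt((va + vb) / (a.n + b.n - 2)));
}

/* Half-width of the confidence interval for the difference of means. */
static double
ci_halfwidth(struct sample a, struct sample b, double tcrit)
{
	return (tcrit * pooled_stddev(a, b) *
	    newton_sqrt(1.0 / a.n + 1.0 / b.n));
}
```

Fed with the -O2 and -O3 rows (n = 20 each), this gives a pooled s of about 4.7539 and a half-width of about 3.043 MB/sec around the 488.758 MB/sec difference, in line with the reported figures.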
Index: files.amd64
===================================================================
--- files.amd64 (revision 241041)
+++ files.amd64 (working copy)
@@ -137,6 +137,11 @@
crypto/aesni/aeskeys_amd64.S optional aesni
crypto/aesni/aesni.c optional aesni
crypto/aesni/aesni_wrap.c optional aesni
+aesni_wrap2.o optional aesni \
+ dependency "$S/crypto/aesni/aesni_wrap2.c" \
+ compile-with "${CC} -c -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
+ no-implicit-rule \
+ clean "aesni_wrap2.o"
crypto/blowfish/bf_enc.c optional crypto | ipsec
crypto/des/des_enc.c optional crypto | ipsec | netsmb
crypto/via/padlock.c optional padlock
I still need to fix up i386, and will let people review a full patch
to address both arches before committing...
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."