using SSE2 in kernel C code (improving AES-NI module)
Konstantin Belousov
kostikbel at gmail.com
Tue Oct 23 08:47:54 UTC 2012
On Tue, Oct 23, 2012 at 12:04:17AM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sun, Oct 21, 2012 at 09:10 +0300:
> > On Sat, Oct 20, 2012 at 07:47:26PM -0700, John-Mark Gurney wrote:
> > > Peter Wemm wrote this message on Sat, Oct 20, 2012 at 11:10 -0700:
> > > > Or, another option.. do something like genassym or the many other
> > > > kernel build tools. aicasm builds and runs a userland tool to
> > > > generate something to build into the kernel. With sufficient
> > > > cross-contamination safeguards I wonder if something similar might be
> > > > able to be done here.
> > >
> > > Well, looks like I may have this working... Turns out I can't name the file
> > > .s, otherwise config puts it in SFILES, which causes all sorts of problems.
> > > So, I went w/ .nos. Does anyone else have any suggestions?
> > >
> > > how does this look to people:
> > > aesni_wrap2.nos optional aesni \
> > >     dependency "$S/crypto/aesni/aesni_wrap2.c" \
> > >     compile-with "${CC} -O3 -fPIC -S -o aesni_wrap2.nos $S/crypto/aesni/aesni_wrap2.c" \
> > >     no-obj no-implicit-rule before-depend \
> > >     clean "aesni_wrap2.nos"
> > > aesni_wrap2.o optional aesni \
> > >     dependency "aesni_wrap2.nos" \
> > >     compile-with "${NORMAL_S} aesni_wrap2.nos" \
> > >     no-implicit-rule \
> > >     clean "aesni_wrap2.o"
> > >
> > > We'll have to do something similar in the module Makefile, but that is
> > > easier...
> > >
> > > Also, I thought we had a better way to note that some devices depend
> > > upon others than just throwing a depend error... If you include aesni
> > > w/o crypto, you get error about missing cryptodev_if.h...
> > >
> > Hm, if such a thing is possible, why do you need to compile through the
> > .S at all? All you need is to specify the special compiler flags,
> > including -msse and -msse2.
>
> Thanks, I managed to get it down to one...
>
> > Note, you should not need -fPIC, at least for amd64. I would suggest using
> > -O2, as well as trying to honour the -g settings.
>
> If I don't do -fpic I get:
> aesni_wrap2.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_32 against `.text'
>
> when linking the kernel... If you can explain to me how to get rid of
> this error, I'll do it..
Yes, because you need -mcmodel=kernel on amd64, but -fPIC on i386.
This is why I suggested using CFLAGS, which takes care of it in a single
place.
It would be a huge PITA to duplicate the per-arch kernel compilation flags
in some obscure place. The best would be to edit CFLAGS in place,
if possible (I do not know make well enough to judge). A second possible
way is to add some variable like CFLAGS_SSE in a centralized place.
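
As an untested sketch of the second way (with the in-place edit folded
into it): the fragment below assumes that -nostdinc is the flag in the
kernel CFLAGS that hides xmmintrin.h, that the kernel CFLAGS carry
-mno-sse, and that make's :N modifier can be used to drop a flag.
CFLAGS_SSE and its placement in sys/conf/kern.mk are hypothetical; the
files.amd64 entry is modelled on the patch quoted below.

```make
# Hypothetical addition to a central makefile such as sys/conf/kern.mk
# (the exact file is an assumption).
# Start from the normal kernel CFLAGS so that per-arch flags such as
# -mcmodel=kernel stay defined in one place, drop -nostdinc so that the
# compiler's own xmmintrin.h is visible again, and enable SSE last so
# that it overrides the earlier -mno-sse.
CFLAGS_SSE=	${CFLAGS:N-nostdinc} -msse -msse2

# Matching files.amd64 entry (config(8) syntax); the :C modifier that
# bumps -O2 to -O3 is taken from the quoted patch.
aesni_wrap2.o	optional aesni \
	dependency	"$S/crypto/aesni/aesni_wrap2.c" \
	compile-with	"${CC} -c ${CFLAGS_SSE:C/^-O2$/-O3/} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
	no-implicit-rule \
	clean		"aesni_wrap2.o"
```

If this works, the i386 entry could use the same variable without
repeating -fPIC or -mcmodel=kernel by hand, and the -g settings should
come along as far as they are already folded into CFLAGS.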
>
> > Most likely, you can put the ${CFLAGS} on the command line, followed
> > by -msse -msse2.
>
> I can't use CFLAGS because it removes access to the xmmintrin.h header
> file... It looks like an option is to use:
> -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG}
>
> In my testing, -O2 is significantly slower, hence the bump to -O3:
> x O2.txt
> + O3.txt
>     N           Min           Max        Median           Avg        Stddev
> x  20     1741.3491      1754.987     1752.9267     1751.5602     3.5616947
> +  20      2223.217     2244.4501     2242.7028     2240.3183     5.7020691
> Difference at 95.0% confidence
>         488.758 +/- 3.04271
>         27.9042% +/- 0.173715%
>         (Student's t, pooled s = 4.75391)
>
> Those are MB/sec...
I think that -O3 compiler output has to be validated manually, due to the
high-risk optimizations. Anyway, if it works there, great.
>
> Index: files.amd64
> ===================================================================
> --- files.amd64 (revision 241041)
> +++ files.amd64 (working copy)
> @@ -137,6 +137,11 @@
> crypto/aesni/aeskeys_amd64.S optional aesni
> crypto/aesni/aesni.c optional aesni
> crypto/aesni/aesni_wrap.c optional aesni
> +aesni_wrap2.o optional aesni \
> + dependency "$S/crypto/aesni/aesni_wrap2.c" \
> + compile-with "${CC} -c -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
> + no-implicit-rule \
> + clean "aesni_wrap2.o"
> crypto/blowfish/bf_enc.c optional crypto | ipsec
> crypto/des/des_enc.c optional crypto | ipsec | netsmb
> crypto/via/padlock.c optional padlock
>
>
> I still need to fix up i386, and will let people review a full patch
> to address both arches before committing...
>
> --
> John-Mark Gurney Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."