using SSE2 in kernel C code (improving AES-NI module)

Konstantin Belousov kostikbel at gmail.com
Tue Oct 23 08:47:54 UTC 2012


On Tue, Oct 23, 2012 at 12:04:17AM -0700, John-Mark Gurney wrote:
> Konstantin Belousov wrote this message on Sun, Oct 21, 2012 at 09:10 +0300:
> > On Sat, Oct 20, 2012 at 07:47:26PM -0700, John-Mark Gurney wrote:
> > > Peter Wemm wrote this message on Sat, Oct 20, 2012 at 11:10 -0700:
> > > > Or, another option.. do something like genassym or the many other
> > > > kernel build tools.  aicasm builds and runs a userland tool to
> > > > generate something to build into the kernel.  With sufficient
> > > > cross-contamination safeguards I wonder if something similar might be
> > > > able to be done here.
> > > 
> > > Well, looks like I may have this working...  Turns out I can't name the file
> > > .s otherwise config puts it in SFILES which causes all sorts of problems..
> > > So, I went w/ .nos, does any one else have any suggestions?
> > > 
> > > how does this look to people:
> > > aesni_wrap2.nos                 optional aesni                             \
> > >         dependency      "$S/crypto/aesni/aesni_wrap2.c"                    \
> > >         compile-with    "${CC} -O3 -fPIC -S -o aesni_wrap2.nos $S/crypto/aesni/aesni_wrap2.c" \
> > >         no-obj no-implicit-rule before-depend                              \
> > >         clean           "aesni_wrap2.nos"
> > > aesni_wrap2.o                   optional aesni                             \
> > >         dependency      "aesni_wrap2.nos"                                  \
> > >         compile-with    "${NORMAL_S} aesni_wrap2.nos"                      \
> > >         no-implicit-rule                                                   \
> > >         clean           "aesni_wrap2.o"
> > > 
> > > We'll have to do something similar in the module Makefile, but that is
> > > easier...
> > > 
> > > Also, I thought we had a better way to note that some devices depend
> > > upon others than just throwing a depend error...  If you include aesni
> > > w/o crypto, you get error about missing cryptodev_if.h...
> > > 
> > Hm, if such thing is possible, why do you need to compile through the
> > .S at all ? All you need is to specify the special compiling flags,
> > including -msse and -msse2.
> 
> Thanks, I managed to get it down to one...
> 
> > Note, you shall not need -fPIC, at least for amd64. I would suggest to use
> > -O2, as well as to try to honour the -g settings.
> 
> If I don't do -fpic I get:
> aesni_wrap2.o:(.eh_frame+0x20): relocation truncated to fit: R_X86_64_32 against `.text'
> 
> when linking the kernel...  If you can explain to me how to get rid of
> this error, I'll do it..
Yes, because you need -mcmodel=kernel on amd64, but -fPIC on i386.
This is why I suggested using CFLAGS, which takes care of it in a single
place.

It would be a huge PITA to duplicate the per-arch kernel compilation flags
in some obscure place. The best would be to edit the CFLAGS in place,
if possible (I do not know make well enough to judge). The second possible
way is to add some variable like CFLAGS_SSE in a centralized place.
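
A minimal sketch of both options in BSD make syntax, to make the idea
concrete. AESNI_CFLAGS is a hypothetical name, and CFLAGS_SSE is only the
name proposed above; neither exists in the tree. The sketch assumes that
the kernel CFLAGS carry -nostdinc and -mno-sse* style options, which has
to be checked against the real flags for each arch.

```make
# Sketch only, under the assumptions stated above.

# Option 1: edit the kernel CFLAGS in place.
# :N drops words matching a glob pattern, :C is a regex substitution.
# The arch-specific flags (code model etc.) are inherited from CFLAGS.
AESNI_CFLAGS=	${CFLAGS:N-nostdinc:N-mno-sse*:C/^-O2$/-O3/} -msse -msse2

# Option 2: one variable defined in a single centralized makefile and
# appended wherever SSE code is compiled.
CFLAGS_SSE=	-msse -msse2
```

With option 1, the compile-with line would reference ${AESNI_CFLAGS}
instead of spelling out -fpic or -mcmodel=kernel by hand.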

> 
> > Most likely, you can put the ${CFLAGS} on the command line, followed
> > by -msse -msse2.
> 
> I can't use CFLAGS because it removes access to the xmmintrin.h header
> file...  It looks like an option is to use:
> -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG}
> 
> In my testing, -O2 is significantly slower, hence the bump to -O3:
> x O2.txt
> + O3.txt
>     N           Min           Max        Median           Avg        Stddev
> x  20     1741.3491      1754.987     1752.9267     1751.5602     3.5616947
> +  20      2223.217     2244.4501     2242.7028     2240.3183     5.7020691
> Difference at 95.0% confidence
>         488.758 +/- 3.04271
>         27.9042% +/- 0.173715%
>         (Student's t, pooled s = 4.75391)
> 
> Those are MB/sec...
I think that -O3 compile output has to be validated manually, due to
high-risk optimizations. Anyway, if it works there, great.

> 
> Index: files.amd64
> ===================================================================
> --- files.amd64	(revision 241041)
> +++ files.amd64	(working copy)
> @@ -137,6 +137,11 @@
>  crypto/aesni/aeskeys_amd64.S	optional aesni
>  crypto/aesni/aesni.c		optional aesni
>  crypto/aesni/aesni_wrap.c	optional aesni
> +aesni_wrap2.o			optional aesni				   \
> +	dependency	"$S/crypto/aesni/aesni_wrap2.c"			   \
> +	compile-with    "${CC} -c -fpic ${COPTFLAGS:C/^-O2$/-O3/} ${DEBUG} -o aesni_wrap2.o $S/crypto/aesni/aesni_wrap2.c" \
> +	no-implicit-rule						   \
> +	clean           "aesni_wrap2.o"
>  crypto/blowfish/bf_enc.c	optional	crypto | ipsec 
>  crypto/des/des_enc.c		optional	crypto | ipsec | netsmb
>  crypto/via/padlock.c		optional	padlock
> 
> 
> I still need to fix up i386, and will let people review a full patch
> to address both arches before committing...
> 
> -- 
>   John-Mark Gurney				Voice: +1 415 225 5579
> 
>      "All that I will do, has been done, All that I have, has not."