ChaCha8/12/20 and GEOM ELI tests

Alexey Ivanov savetherbtz at gmail.com
Tue Jan 13 03:40:23 UTC 2015


Just curious: why does a stream cipher use a block-cipher mode of operation (e.g. XTS)?

> On Jan 12, 2015, at 3:34 PM, John-Mark Gurney <jmg at funkthat.com> wrote:
> 
> rozhuk.im at gmail.com wrote this message on Mon, Jan 12, 2015 at 23:40 +0300:
>>>> ChaCha patch:
>>>> 
>>> http://netlab.linkpc.net/download/software/FreeBSD/patches/chacha.patch
>>> 
>>> What's the difference between CHACHA and XCHACHA?
>> 
>> Same as between SALSA and XSALSA.
>> 
>> XChaCha20 uses a 256-bit key as well as the first 128 bits of the nonce in
>> order to compute a subkey. This subkey, as well as the remaining 64 bits of
>> the nonce, are the parameters of the ChaCha20 function used to actually
>> generate the stream.
>> 
>> But with XChaCha20's longer nonce, it is safe to generate nonces using
>> randombytes_buf() for every message encrypted with the same key without
>> having to worry about a collision.
>> 
>> More details: http://cr.yp.to/snuffle/xsalsa-20081128.pdf
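The subkey step described above (HChaCha20 in the XSalsa20/XChaCha terminology) can be sketched as below. This is an illustrative sketch, not code from the patch; all names (hchacha20, QR, le32) are made up for the example.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d)                      \
    a += b; d ^= a; d = ROTL32(d, 16);      \
    c += d; b ^= c; b = ROTL32(b, 12);      \
    a += b; d ^= a; d = ROTL32(d, 8);       \
    c += d; b ^= c; b = ROTL32(b, 7);

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * HChaCha20 sketch: run the ChaCha20 permutation over the key and the
 * first 128 bits of the 192-bit XChaCha nonce, then take words 0-3 and
 * 12-15 (no feed-forward) as the 256-bit subkey.  The subkey plus the
 * remaining 64 nonce bits are then fed to plain ChaCha20.
 */
static void hchacha20(const uint8_t key[32], const uint8_t nonce16[16],
                      uint32_t subkey_out[8])
{
    uint32_t x[16];
    int i;

    x[0] = 0x61707865; x[1] = 0x3320646e;   /* "expand 32-byte k" */
    x[2] = 0x79622d32; x[3] = 0x6b206574;
    for (i = 0; i < 8; i++)
        x[4 + i] = le32(key + 4 * i);
    for (i = 0; i < 4; i++)
        x[12 + i] = le32(nonce16 + 4 * i);

    for (i = 0; i < 10; i++) {              /* 10 double rounds = 20 rounds */
        QR(x[0], x[4], x[8],  x[12]);       /* columns */
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);       /* diagonals */
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 4; i++) {
        subkey_out[i]     = x[i];
        subkey_out[4 + i] = x[12 + i];
    }
}
```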
> 
> Ahh, thanks..
> 
>>> Also, where are the man page diffs?  They might have explained the
>>> difference between the two, and explained why two versions of chacha
>>> are needed...
>> 
>> No man page diffs.
> 
> You need to document the new defines in crypto(9), and document the
> various parameters in crypto(7)...  Yes, not all modes are documented
> in crypto(7), but going forward, at a minimum we need to document new
> additions...
> 
> I'll admit I didn't document the other algorithms as I'm not as familiar
> w/ those as the ones that I worked on...
> 
>> Man pages do not explain the difference between AES-CBC and AES-XTS...
> 
> True, but CBC and XTS (which includes a reference to the standard) are
> a lot more searchable/common knowledge than xchacha..  Google thinks you
> mean chacha, and xchacha just turns up a bunch of people on various
> networks... Not until you search on xchacha crypto do you get a relevant
> page...  Also, wikipedia doesn't have an entry for xchacha, nor does
> the chacha (cipher) page list it...  So, when documenting xchacha in
> crypto(7), include a link to the description/standard...
> 
>>> Is there a reason you decided to write your own ChaCha implementation
>>> instead of using one of the standard ones?  Did you run performance
>>> tests between your implementation and others?
>> 
>> Reference ChaCha and reference (FreeBSD) XTS (4k sector):
>> ChaCha8-XTS-256   = 199518722 bytes/sec
>> ChaCha12-XTS-256  = 179029849 bytes/sec
>> ChaCha20-XTS-256  = 149447317 bytes/sec
>> XChaCha8-XTS-256  = 195675728 bytes/sec
>> XChaCha12-XTS-256 = 175790196 bytes/sec
>> XChaCha20-XTS-256 = 147939263 bytes/sec
> 
> So, you're seeing a 33%-50% improvement, good to hear...
> 
> Also, do you publish this implementation somewhere?  If so, it'd be
> helpful to include a url to where up to date versions can be obtained...
> If you don't plan on publishing/maintaining it outside of FreeBSD, then
> we need to unifdef out the Windows parts of it for our tree...
> 
>> This is the reference version adapted for use in /dev/crypto.
>> chacha_block_unaligned() - processes a data block as in the reference
>> version; macros are used for readability.
>> chacha_block_aligned() - the same, but operates on aligned data.
> 
> Please use the macro __NO_STRICT_ALIGNMENT to decide if special work
> is necessary to handle the alignment...
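The suggested pattern could look roughly like the sketch below (not code from the patch; the function name and the memcpy fallback are illustrative). On arches that define __NO_STRICT_ALIGNMENT (e.g. x86), unaligned word access is safe; elsewhere, going through memcpy avoids alignment traps:

```c
#include <stdint.h>
#include <string.h>

/* XOR a 16-byte block with the tweak, picking the word path only where
 * the arch tolerates unaligned access. */
static void xor_block16(uint8_t *data, const uint64_t tweak[2])
{
#ifdef __NO_STRICT_ALIGNMENT
    /* Unaligned 64-bit loads/stores are fine on this arch. */
    uint64_t *d = (uint64_t *)(void *)data;
    d[0] ^= tweak[0];
    d[1] ^= tweak[1];
#else
    /* Strict-alignment arch: memcpy is lowered to safe loads/stores. */
    uint64_t d0, d1;
    memcpy(&d0, data, 8);
    memcpy(&d1, data + 8, 8);
    d0 ^= tweak[0];
    d1 ^= tweak[1];
    memcpy(data, &d0, 8);
    memcpy(data + 8, &d1, 8);
#endif
}
```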
> 
> What is the CHACHA_X64 macro for?  If that is to detect LP64 platforms,
> please use the macro __LP64__ to decide this...  Have you done
> performance evaluations on 32bit arches to make sure double rounds aren't
> a benefit there too?
> 
> Use the byteorder(9) macros to encode/decode integers instead of rolling
> your own (U8TO32_LITTLE and U32TO8_LITTLE)...  Turns out compilers aren't
> good at optimizing this type of code, and platforms may have assembly
> optimized versions for these...
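For reference, the semantics of the byteorder(9) helpers that would replace U8TO32_LITTLE/U32TO8_LITTLE are spelled out below as portable stand-ins (the `my_` prefix marks them as illustrative; in the kernel you would call le32dec()/le32enc() directly):

```c
#include <stdint.h>

/* Equivalent of le32dec(9): load a 32-bit little-endian value. */
static uint32_t my_le32dec(const void *buf)
{
    const uint8_t *p = buf;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Equivalent of le32enc(9): store a 32-bit value little-endian. */
static void my_le32enc(void *buf, uint32_t v)
{
    uint8_t *p = buf;
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}
```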
> 
>> To increase speed, data is processed 4/8 bytes at a time instead of one
>> byte at a time.
>> The data in the context is 8-byte aligned.
>> To increase security, all data, including temporaries, is kept in the
>> context, which is zeroed when the work completes.
> 
> Please use the function explicit_bzero() that is available for all of
> these instead of creating your own..
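The point of explicit_bzero() is that a plain memset() of a dying object can be elided as a dead store, leaking key material. A sketch of the teardown pattern (struct layout and names are illustrative; the volatile loop is a portable stand-in with the same dead-store-proof property so the example compiles anywhere, where on FreeBSD you would simply call explicit_bzero()):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative context: state and temporaries both live here, so one
 * wipe clears everything. */
struct chacha_ctx_sketch {
    uint32_t state[16];
    uint32_t tmp[16];
};

/* Stand-in for explicit_bzero(p, n): the volatile-qualified stores
 * cannot be optimized away. */
static void wipe_ctx(void *p, size_t n)
{
    volatile uint8_t *vp = (volatile uint8_t *)p;
    while (n--)
        *vp++ = 0;
}
```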
> 
>>>> HW: Core Duo E8500, 8Gb DDR2-800.
>>>> dd if=/dev/zero of=/dev/md0 bs=1m
>>>> 2148489421 bytes/sec
>>>> 
>>>> 
>>>> # sector = 512b
>>>> 3DES-CBC-192      =  20773120 bytes/sec
>>>> AES-CBC-128       =  85276853 bytes/sec
>>>> AES-CBC-256       =  68893016 bytes/sec
>>>> AES-XTS-128       =  68194868 bytes/sec
>>>> AES-XTS-256       =  56611573 bytes/sec
>>>> Blowfish-CBC-128  =  11169657 bytes/sec
>>>> Blowfish-CBC-256  =  11185891 bytes/sec
>>>> Camellia-CBC-128  =  78077243 bytes/sec
>>>> Camellia-CBC-256  =  65732219 bytes/sec
>>>> ChaCha8-XTS-256   = 258042765 bytes/sec
>>>> ChaCha12-XTS-256  = 223616967 bytes/sec
>>>> ChaCha20-XTS-256  = 176005366 bytes/sec
>>>> XChaCha8-XTS-256  = 228292624 bytes/sec
>>>> XChaCha12-XTS-256 = 195577624 bytes/sec
>>>> XChaCha20-XTS-256 = 152247267 bytes/sec
>>>> XChaCha20-XTS-128 = 152717737 bytes/sec ! a 128-bit key has the same
>>>> speed as 256-bit
>>>> 
>>>> 
>>>> # sector = 4kb
>>>> 3DES-CBC-192      =  22018189 bytes/sec
>>>> AES-CBC-128       = 104097143 bytes/sec
>>>> AES-CBC-256       =  81983833 bytes/sec
>>>> AES-XTS-128       =  78559346 bytes/sec
>>>> AES-XTS-256       =  66047200 bytes/sec
>>>> Blowfish-CBC-128  =  38635464 bytes/sec
>>>> Blowfish-CBC-256  =  38810555 bytes/sec
>>>> Camellia-CBC-128  =  92814510 bytes/sec
>>>> Camellia-CBC-256  =  75949489 bytes/sec
>>>> ChaCha8-XTS-256   = 337336982 bytes/sec
>>>> ChaCha12-XTS-256  = 284740187 bytes/sec
>>>> ChaCha20-XTS-256  = 217326865 bytes/sec
>>>> XChaCha8-XTS-256  = 328424551 bytes/sec
>>>> XChaCha12-XTS-256 = 278579692 bytes/sec
>>>> XChaCha20-XTS-256 = 211660225 bytes/sec
>>>> 
>>>> Optimized AES-XTS - speed like AES-CBC:
>>>> AES-XTS-128       = 102841051 bytes/sec
>>>> AES-XTS-256       =  80813644 bytes/sec
>>> 
>>> Is this from a different patch or what?  Can you talk more about this?
>> 
>> No patch at this moment.
>> After optimizing ChaCha-XTS I applied the same optimizations to AES-XTS
>> and got this result.
>> All the changes were in aes_xts_reinit() and aes_xts_crypt(), plus a
>> slight change to the aes_xts_ctx structure.
>> 
>> aes_xts_ctx:
>> u_int8_t tweak[] -> u_int64_t tweak[]
>> 
>> aes_xts_reinit -> same as chacha_xts_reinit()
>> 
>> aes_xts_crypt -> same as chacha_xts_crypt():
>> block[] temporary buffer removed;
>> XOR 1 byte at a time -> XOR 8 bytes at once;
>> tweak << 1 (1-bit shift): done on 8-byte words instead of byte by byte;
>> loops unrolled;
> 
> Ahh, I thought I had done some similar optimizations, but I only did
> them to the aesni version of the routines...  You should use the macro
> above to decide if things are aligned or not...
> 
>> 
>> Final:
>> 
>> struct aes_xts_ctx {
>> 	rijndael_ctx key1;
>> 	rijndael_ctx key2;
>> 	uint64_t tweak[(AES_XTS_BLOCKSIZE / sizeof(uint64_t))];
>> };
>> 
>> void
>> aes_xts_reinit(caddr_t key, u_int8_t *iv)
>> {
>> 	struct aes_xts_ctx *ctx = (struct aes_xts_ctx *)key;
>> 
>> 	/*
>> 	 * Prepare tweak as E_k2(IV). IV is specified as LE representation
>> 	 * of a 64-bit block number which we allow to be passed in directly.
>> 	 */
>> 	if (ALIGNED_POINTER(iv, uint64_t)) {
>> 		ctx->tweak[0] = (*((uint64_t*)(void*)iv));
>> 	} else {
>> 		bcopy(iv, ctx->tweak, sizeof(uint64_t));
>> 	}
>> 	/* Convert to LE. */
>> 	ctx->tweak[0] = htole64(ctx->tweak[0]);
> 
> Hmm... this line bothers me.. I'll need to spend more time reading up
> to decide if it is buggy or not...  Is ctx->tweak in host order? or LE
> order?  I believe it's supposed to be LE order, as it gets passed
> directly to _encrypt..  I'm also not sure if the original code is BE
> clean, which is part of my problem...
> 
>> 	/* Last 64 bits of IV are always zero */
>> 	ctx->tweak[1] = 0;
>> 
>> 	rijndael_encrypt(&ctx->key2, (uint8_t*)ctx->tweak,
>> 	    (uint8_t*)ctx->tweak);
>> }
>> 
>> static void
>> aes_xts_crypt(struct aes_xts_ctx *ctx, u_int8_t *data, u_int do_encrypt)
>> {
>> 	size_t i;
>> 	uint64_t crr, tm;
>> 
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>> 
>> 	if (do_encrypt)
>> 		rijndael_encrypt(&ctx->key1, data, data);
>> 	else
>> 		rijndael_decrypt(&ctx->key1, data, data);
>> 
>> 	if (ALIGNED_POINTER(data, uint64_t)) {
>> 		((uint64_t*)(void*)data)[0] ^= ctx->tweak[0];
>> 		((uint64_t*)(void*)data)[1] ^= ctx->tweak[1];
>> 	} else {
>> 		for (i = 0; i < AES_XTS_BLOCKSIZE; i ++)
>> 			data[i] ^= ((uint8_t*)ctx->tweak)[i];
>> 	}
>> 
>> 	/* Exponentiate tweak */
>> 	crr = (ctx->tweak[0] >> ((sizeof(uint64_t) * 8) - 1));
>> 	ctx->tweak[0] = (ctx->tweak[0] << 1);
>> 
>> 	tm = ctx->tweak[1];
>> 	ctx->tweak[1] = ((tm << 1) | crr);
>> 	crr = (tm >> ((sizeof(uint64_t) * 8) - 1));
>> 
>> 	if (crr)
>> 		ctx->tweak[0] ^= 0x87; /* GF(2^128) generator polynomial. */
> 
> Please use the AES_XTS_ALPHA define instead of hardcoding the value..
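The tweak-exponentiation step with the named constant could be factored as below (a sketch; the function name is illustrative, and the `#ifndef` guard stands in for pulling the define from the xform header). 0x87 is the low byte of the XTS reduction polynomial x^128 + x^7 + x^2 + x + 1:

```c
#include <stdint.h>

#ifndef AES_XTS_ALPHA
#define AES_XTS_ALPHA 0x87      /* GF(2^128) generator polynomial */
#endif

/* Multiply the tweak by alpha in GF(2^128).  tweak[] holds the 128-bit
 * value as two little-endian 64-bit limbs. */
static void xts_tweak_double(uint64_t tweak[2])
{
    uint64_t carry0, carry1;

    carry0 = tweak[0] >> 63;            /* bit shifted out of limb 0 */
    carry1 = tweak[1] >> 63;            /* bit shifted out of the top */
    tweak[0] <<= 1;
    tweak[1] = (tweak[1] << 1) | carry0;
    if (carry1)
        tweak[0] ^= AES_XTS_ALPHA;      /* reduce mod the polynomial */
}
```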
> 
> Thanks.
> 
> --
>  John-Mark Gurney				Voice: +1 415 225 5579
> 
>     "All that I will do, has been done, All that I have, has not."
