Increasing GELI performance

Dominic Bishop dom at bishnet.net
Tue Jul 31 17:27:37 UTC 2007


 

> -----Original Message-----
> From: Fluffles [mailto:etc at fluffles.net] 
> Sent: 31 July 2007 14:26
> To: Pawel Jakub Dawidek
> Cc: Dominic Bishop; geom at FreeBSD.org
> Subject: Re: Increasing GELI performance
> 
> Pawel Jakub Dawidek wrote:
> > On Mon, Jul 30, 2007 at 10:35:32PM +0200, Fluffles wrote:
> >   
> >> Pawel Jakub Dawidek wrote:
> >>     
> >>> No matter how many cores/CPUs you have, if you run a single-threaded
> >>> application what you do exactly is:
> >>> 1. Send a read of 128kB.
> >>> 2. One of the geli threads picks it up, decrypts it and sends it back.
> >>> 3. Send the next read of 128kB.
> >>> 4. One of the geli threads picks it up, decrypts it and sends it back.
> >>> ...
> >>>
> >>> All threads will only be used when there are more threads accessing
> >>> the provider.
> >>>
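
(For what it's worth, a quick way to see this on a scratch provider is to
start several dd readers at different offsets, so geli has more than one
request in flight at a time. The device name, counts and offsets below are
only illustrative:)

for i in 0 1 2 3; do
    dd if=/dev/da0.eli of=/dev/null bs=1m count=2000 skip=$((i * 2000)) &
done
wait
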
> >> But isn't it true that the UFS filesystem utilizes read-ahead, and
> >> with that a multiple I/O queue depth (somewhere between 7 and 9
> >> queued I/Os) - even when using something like dd to sequentially read
> >> a file on a mounted filesystem? Then this read-ahead will cause
> >> multiple I/O requests coming in, and geom_eli can use multiple
> >> threads to maximize I/O throughput. Maybe Dominic can try playing
> >> with the "vfs.read_max" sysctl variable.
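
(For reference, that sysctl can be checked and changed on the fly; the 32
below is just an arbitrary example value, not a recommendation:)

sysctl vfs.read_max             (show the current cluster read-ahead limit)
sysctl vfs.read_max=32          (try a larger value; easy to revert)
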
> >
> > You are right in general, but if you reread the e-mail I was answering,
> > you will see that the author was reading from/writing to the GEOM
> > provider, not a file system.
> >
> 
> Ah yes, you're right. Though he might also have tested on a mounted
> filesystem; his email does not explicitly specify this. So he should
> re-run his experiment:
> 
> geli onetime /dev/da0
> newfs /dev/da0.eli
> mkdir /test
> mount /dev/da0.eli /test
> dd if=/dev/zero of=/test/zerofile.000 bs=1m count=2000   (write score)
> dd if=/test/zerofile.000 of=/dev/null bs=1m              (read score)
> 
> That *should* give him higher performance. Also, Dominic might try
> increasing the block size, using "newfs -b 32768 /dev/da0.eli". Without
> it, it seems to hit a performance ceiling at about 130MB/s. I once wrote
> about this on the mailing list, where Bruce Evans questioned the
> usefulness of a block size higher than 16KB. I still have to investigate
> this further; it's on my to-do list. Deeplink:
> http://lists.freebsd.org/pipermail/freebsd-fs/2006-October/002298.html
> 
> I tried to recreate a test scenario myself, using 4 disks in 
> a striping configuration (RAID0), first reading and writing 
> on the raw .eli device, then on a mounted filesystem:
> 
> ** raw device
> # dd if=/dev/stripe/data.eli of=/dev/null bs=1m count=2000
> 2097152000 bytes transferred in 57.949793 secs (36189120 bytes/sec)
> # dd if=/dev/zero of=/dev/stripe/data.eli bs=1m count=2000
> 1239416832 bytes transferred in 35.168374 secs (35242370 bytes/sec)
> 
> ** mounted default newfs
> # dd if=/dev/zero of=/test/zerofile.000 bs=1m count=2000
> 2097152000 bytes transferred in 47.843614 secs (43833478 bytes/sec)
> # dd if=/test/zerofile.000 of=/dev/null bs=1m count=2000
> 2097152000 bytes transferred in 50.328749 secs (41669067 bytes/sec)
> 
> This was on a simple single-core Sempron K8 CPU, but already there is a
> difference thanks to the deeper I/O queue that VFS/UFS provides.
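
(Incidentally, you can watch that queue depth directly while dd is running:
gstat in a second terminal shows the per-provider queue length in the L(q)
column, which should make the raw-device vs. filesystem difference visible.)

gstat        (watch the L(q) column for the .eli provider)
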
> 
> Good luck, Dominic, and be sure to post again when you have new scores!
> I'm interested to see how far you can push GELI with a quad-core. :)
> 
> - Veronica
> 

Unfortunately I can no longer do any testing, as I hit a deadline and had to
put the boxes into production, so I can only run non-destructive tests now.

It is interesting that you mention non-default block sizes, though, as I did
try this and ran into some very nasty problems.

I initially tried the following settings:

geli init -s 8192 /dev/da0p2
newfs -b 65536 -f 8192 -n -U -m1 /dev/da0p2.eli
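
(A "geli attach /dev/da0p2" step goes in between, of course, to create the
.eli device. For anyone reading along: -s sets the geli sector size, -b and
-f the UFS block and fragment sizes, -n skips creating the .snap directory,
-U enables soft updates and -m1 reserves 1% free space.)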

The reasoning for this is that newfs uses the geli sector size as the
fragment size by default anyway (quite logically), and the newfs man page
suggests an 8:1 block:fragment ratio as ideal; newfs also wouldn't allow a
block size larger than 65536, so overall this seemed like the best setting
for performance. My data load would be all files in the multi-MB range, so
any space lost to larger fragments was pretty irrelevant.
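
As a sanity check, both layers can be inspected non-destructively to confirm
what you actually end up with (device name as in my example above):

diskinfo -v /dev/da0p2.eli | grep sectorsize   (should report 8192 here)
dumpfs /dev/da0p2.eli | grep -E 'bsize|fsize'  (should show bsize 65536, fsize 8192)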

This all *appeared* to be working fine; however, after transferring a couple
of hundred GB onto the filesystem, unmounting it and running fsck, I got a
ton of filesystem errors of all kinds. A few examples:

** Phase 1 - Check Blocks and Sizes
PARTIALLY ALLOCATED INODE I=3523664

UNKNOWN FILE TYPE I=3523696

** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=3523603  OWNER=dom MODE=40755
SIZE=32256 MTIME=Jul 29 15:10 2007
DIR=/DATA/TEST

** Phase 3 - Check Connectivity
UNREF DIR  I=18855983  OWNER=dom MODE=40755
SIZE=1024 MTIME=Jul 29 14:39 2007 

** Phase 4 - Check Reference Counts
LINK COUNT FILE I=3523760  OWNER=4017938749 MODE=24015 SIZE=0
MTIME=Jan  1 00:00 1970  COUNT -9594 SHOULD BE 1


I was running out of time by this point, so I went back to the settings I
have used in the past on other boxes, taken from the geli man page: a sector
size of 4096 for geli and no options for newfs, leaving it to use a block
size of 16384 and a fragment size of 4096. After putting a few hundred GB on
the filesystem, a fsck passed perfectly.
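
For reference, that working setup is essentially just the geli(8) man page
example; roughly (key/passphrase options omitted, device name as before):

geli init -s 4096 /dev/da0p2
geli attach /dev/da0p2
newfs /dev/da0p2.eli      (defaults here: 16384-byte blocks, 4096-byte fragments)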

Unfortunately I didn't have time to work out which of the geli/newfs settings
specifically was causing the problem, or whether it was just the combination
of the two.
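
If anyone with a scratch array wants to narrow it down, the obvious approach
is to vary one knob at a time, along these lines (each pass: fill with a few
hundred GB, unmount, fsck -f, then geli detach before the next pass):

geli onetime -s 8192 /dev/da0p2 && newfs /dev/da0p2.eli
    (large geli sector, default newfs)
geli onetime -s 4096 /dev/da0p2 && newfs -b 65536 -f 8192 /dev/da0p2.eli
    (4096 geli sector, large newfs block)
geli onetime -s 8192 /dev/da0p2 && newfs -b 65536 -f 8192 /dev/da0p2.eli
    (both together, the combination that broke here)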

I am reasonably confident it wasn't a hardware issue, as this happened the
same way on 4 machines: 3 of them were identical dual 3.0GHz Netburst Xeons
running RELENG_6 i386, and the other was a quad-core 1.6GHz Conroe-based
Xeon running RELENG_6 amd64; the src was RELENG_6 as of the 27th on all
machines. At no point were any errors logged to messages or console.log.

The one common factor amongst all the machines was a 3ware 9550SXU-12 as the
underlying raw device, but as it has worked flawlessly with the different
geli/newfs parameters, I highly doubt this was the problem.

Regards

Dominic Bishop



