ZFS-backed NFS export with vSphere

Rick Macklem rmacklem at uoguelph.ca
Thu Jun 27 21:58:20 UTC 2013


Zoltan Nagy wrote:
> Hi list,
> 
> I'd love to have a ZFS-backed NFS export as my VM datastore, but as much
> as I'd like to tune it, the performance doesn't even get close to
> Solaris 11's.
> 
> I currently have the system set up as this:
> 
>   pool: tank
>  state: ONLINE
>   scan: none requested
> config:
> 
>     NAME        STATE     READ WRITE CKSUM
>     tank        ONLINE       0     0     0
>       mirror-0  ONLINE       0     0     0
>         da0     ONLINE       0     0     0
>         da1     ONLINE       0     0     0
>       mirror-1  ONLINE       0     0     0
>         da2     ONLINE       0     0     0
>         da3     ONLINE       0     0     0
>     logs
>       ada0p4    ONLINE       0     0     0
>     spares
>       da4       AVAIL
> 
> ada0 is a Samsung 840 Pro SSD, which I'm using for system+ZIL.
> daX are 1TB, 7200rpm Seagate disks.
> (From this test's perspective it doesn't matter whether I use a separate
> ZIL device or just a partition - I get roughly the same numbers.)
> 
> The first thing I noticed is that the FSINFO reply from FreeBSD is
> advertising untunable values (I did not find them documented in the
> manpage or exposed as a sysctl):
> 
> rtmax, rtpref, wtmax, wtpref: 64k (fbsd), 1M (solaris)
> dtpref: 64k (fbsd), 8k (solaris)
> 
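For reference, these are the transfer-size limits and hints carried in the
NFSv3 FSINFO reply (RFC 1813). Roughly, in C terms, with the attribute,
time_delta and properties fields left out and an illustrative struct name:

/* Transfer-size fields of the NFSv3 FSINFO reply (RFC 1813), simplified. */
#include <stdint.h>

struct fsinfo_xfer_sizes {      /* illustrative name, not from the tree */
    uint32_t rtmax;             /* maximum READ size the server supports */
    uint32_t rtpref;            /* preferred READ size */
    uint32_t wtmax;             /* maximum WRITE size the server supports */
    uint32_t wtpref;            /* preferred WRITE size */
    uint32_t dtpref;            /* preferred READDIR request size */
};
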
> After manually patching the nfs code (changing NFS_MAXBSIZE to 1M
> instead of MAXBSIZE) to advertise the same read/write values (I didn't
> touch dtpref), my performance went up from 17MB/s to 76MB/s.
> 
> Is there a reason NFS_MAXBSIZE is not tunable and/or set so low?
> 
For exporting other file system types (UFS, ...), the buffer cache is
used, and MAXBSIZE is the largest block size you can use for the buffer
cache. Some increase of MAXBSIZE would be nice. (I've tried 128KB without
observing difficulties, and from what I've been told 128KB is the ZFS
block size.)
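
To make the connection concrete, here is roughly how the pieces fit
together (a sketch from memory -- the exact file names may differ between
versions, so check your source tree):

/* sys/sys/param.h -- largest block the buffer cache can handle */
#define MAXBSIZE        65536

/* sys/fs/nfs/nfsproto.h -- what the server advertises as rtmax/wtmax
 * in the FSINFO reply; currently tied to the buffer cache limit */
#define NFS_MAXBSIZE    MAXBSIZE

/* The test patch above effectively amounts to:
 *   #define NFS_MAXBSIZE (1024 * 1024)
 * which is fine for a ZFS export, but can't simply become the default
 * for all exports, since file systems that go through the buffer cache
 * (UFS, ...) can't use blocks larger than MAXBSIZE. */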

> Here's my iozone output (run on an ext4 partition created in a Linux VM
> whose disk is backed by the NFS export from the FreeBSD box):
> 
>     Record Size 4096 KB
>     File size set to 2097152 KB
>     Command line used: iozone -b results.xls -r 4m -s 2g -t 6 -i 0 -i 1 -i 2
>     Output is in Kbytes/sec
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>     Throughput test with 6 processes
>     Each process writes a 2097152 Kbyte file in 4096 Kbyte records
> 
>     Children see throughput for 6 initial writers  =   76820.31 KB/sec
>     Parent sees throughput for 6 initial writers   =   74899.44 KB/sec
>     Min throughput per process                     =   12298.62 KB/sec
>     Max throughput per process                     =   12972.72 KB/sec
>     Avg throughput per process                     =   12803.38 KB/sec
>     Min xfer                                       = 1990656.00 KB
> 
>     Children see throughput for 6 rewriters        =   76030.99 KB/sec
>     Parent sees throughput for 6 rewriters         =   75062.91 KB/sec
>     Min throughput per process                     =   12620.45 KB/sec
>     Max throughput per process                     =   12762.80 KB/sec
>     Avg throughput per process                     =   12671.83 KB/sec
>     Min xfer                                       = 2076672.00 KB
> 
>     Children see throughput for 6 readers          =  114221.39 KB/sec
>     Parent sees throughput for 6 readers           =  113942.71 KB/sec
>     Min throughput per process                     =   18920.14 KB/sec
>     Max throughput per process                     =   19183.80 KB/sec
>     Avg throughput per process                     =   19036.90 KB/sec
>     Min xfer                                       = 2068480.00 KB
> 
>     Children see throughput for 6 re-readers       =  117018.50 KB/sec
>     Parent sees throughput for 6 re-readers        =  116917.01 KB/sec
>     Min throughput per process                     =   19436.28 KB/sec
>     Max throughput per process                     =   19590.40 KB/sec
>     Avg throughput per process                     =   19503.08 KB/sec
>     Min xfer                                       = 2080768.00 KB
> 
>     Children see throughput for 6 random readers   =  110072.68 KB/sec
>     Parent sees throughput for 6 random readers    =  109698.99 KB/sec
>     Min throughput per process                     =   18260.33 KB/sec
>     Max throughput per process                     =   18442.55 KB/sec
>     Avg throughput per process                     =   18345.45 KB/sec
>     Min xfer                                       = 2076672.00 KB
> 
>     Children see throughput for 6 random writers   =   76389.71 KB/sec
>     Parent sees throughput for 6 random writers    =   74816.45 KB/sec
>     Min throughput per process                     =   12592.09 KB/sec
>     Max throughput per process                     =   12843.75 KB/sec
>     Avg throughput per process                     =   12731.62 KB/sec
>     Min xfer                                       = 2056192.00 KB
> 
> The other interesting thing is that the system doesn't cache the data
> file in RAM (the box has 32G), so even for re-reads I get miserable
> numbers. With Solaris, the re-reads happen at nearly wire speed.
> 
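As an aside, one way to confirm whether the server is actually holding the
file data in memory is to watch the ZFS ARC size on the FreeBSD box while
the re-read pass runs (from the shell: sysctl kstat.zfs.misc.arcstats.size).
Purely as an illustrative sketch, the same value read from C:

/*
 * Print the current ZFS ARC size.  If this doesn't grow to roughly the
 * working-set size during the re-read pass, the data isn't being kept
 * in the ARC.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint64_t arc_size;
    size_t len = sizeof(arc_size);

    if (sysctlbyname("kstat.zfs.misc.arcstats.size", &arc_size, &len,
        NULL, 0) == -1) {
        perror("sysctlbyname");
        return (1);
    }
    printf("ARC size: %ju bytes\n", (uintmax_t)arc_size);
    return (0);
}
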
> Any ideas what else I could tune? While 76MB/s is much better than the
> original 17MB/s I was seeing, it's still far from Solaris's ~220MB/s...
> 
> Thanks a lot,
> Zoltan

