ZFS-backed NFS export with vSphere

Zoltan Arnold NAGY zoltan.arnold.nagy at gmail.com
Thu Jun 27 22:16:49 UTC 2013


Right. As I said, increasing it to 1M raised my throughput from 17MB/s
to 76MB/s.

However, the SSD can sustain far more random write traffic than that; any
idea why I never see the ZIL go above this value? (vSphere always issues
sync writes.)
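
For what it's worth, a couple of stock FreeBSD tools make it easy to watch
whether the log device is the bottleneck while the test runs (pool and
device names as in the setup quoted below; just a sketch, not a fix):

  # per-vdev I/O every second - the "logs" row shows ZIL traffic
  zpool iostat -v tank 1

  # raw GEOM statistics for the SSD that holds the ZIL partition
  gstat -f 'ada0'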

Thanks,
Zoltan


On Thu, Jun 27, 2013 at 11:58 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Zoltan Nagy wrote:
> > Hi list,
> >
> > I'd love to have a ZFS-backed NFS export as my VM datastore, but
> > however much I tune it, the performance doesn't even get close to
> > Solaris 11's.
> >
> > I currently have the system set up as this:
> >
> >   pool: tank
> >  state: ONLINE
> >   scan: none requested
> > config:
> >
> >     NAME        STATE     READ WRITE CKSUM
> >     tank        ONLINE       0     0     0
> >       mirror-0  ONLINE       0     0     0
> >         da0     ONLINE       0     0     0
> >         da1     ONLINE       0     0     0
> >       mirror-1  ONLINE       0     0     0
> >         da2     ONLINE       0     0     0
> >         da3     ONLINE       0     0     0
> >     logs
> >       ada0p4    ONLINE       0     0     0
> >     spares
> >       da4       AVAIL
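> >
> > (For reference, a pool with the layout above would be created with
> > something along these lines - device names as shown, purely
> > illustrative:)
> >
> >   # mirrored pairs plus a separate log device and a hot spare
> >   zpool create tank mirror da0 da1 mirror da2 da3 \
> >       log ada0p4 spare da4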
> >
> > ada0 is a Samsung 840 Pro SSD, which I'm using for system+ZIL.
> > The daX devices are 1TB, 7200rpm Seagate disks.
> > (From this test's perspective it doesn't matter whether I use a
> > separate ZIL device or just a partition - I get roughly the same
> > numbers.)
> >
> > The first thing I noticed is that the FSINFO reply from FreeBSD
> > advertises values that cannot be tuned (I did not find them documented
> > in the manpage or exposed as a sysctl):
> >
> > rtmax, rtpref, wtmax, wtpref: 64k (fbsd), 1M (solaris)
> > dtpref: 64k (fbsd), 8k (solaris)
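> >
> > (If anyone wants to double-check what actually goes over the wire, the
> > FSINFO reply can be captured with something like the following and the
> > rtmax/rtpref/wtmax/wtpref/dtpref fields inspected in wireshark - the
> > interface name is just an example:)
> >
> >   # capture NFS traffic for offline inspection of the FSINFO reply
> >   tcpdump -i em0 -s 0 -w /tmp/nfs.pcap port 2049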
> >
> > After manually patching the NFS code (changing NFS_MAXBSIZE to 1M
> > instead of MAXBSIZE) so that it advertises the same read/write values
> > (I didn't touch dtpref), my performance went up from 17MB/s to 76MB/s.
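> >
> > (For anyone wanting to try the same change: the definition can be
> > located as below and the kernel rebuilt with the larger value; the
> > exact file may differ between versions, so treat the path as
> > approximate:)
> >
> >   # find where the advertised maximum block size is defined
> >   grep -rn 'NFS_MAXBSIZE' /usr/src/sys/fs/
> >   # the stock definition uses MAXBSIZE (64k); I rebuilt with
> >   #   #define NFS_MAXBSIZE   (1024 * 1024)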
> >
> > Is there a reason NFS_MAXBSIZE is not tunable, and/or why the default
> > is so slow?
> >
> For exporting other file system types (UFS, ...), the buffer cache is
> used, and MAXBSIZE is the largest block size you can use for the buffer
> cache. Some increase of MAXBSIZE would be nice. (I've tried 128KB without
> observing difficulties, and from what I've been told 128KB is the ZFS
> block size.)
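>
> (The block size a given dataset actually uses can be checked with
> something like the following; 128KB is the default recordsize unless it
> has been tuned per dataset:)
>
>   # current record size of the exported dataset
>   zfs get recordsize tank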
>
> > Here's my iozone output (run on an ext4 partition in a Linux VM whose
> > virtual disk is backed by the NFS export from the FreeBSD box):
> >
> >     Record Size 4096 KB
> >     File size set to 2097152 KB
> >     Command line used: iozone -b results.xls -r 4m -s 2g -t 6 -i 0 -i
> >     1 -i 2
> >     Output is in Kbytes/sec
> >     Time Resolution = 0.000001 seconds.
> >     Processor cache size set to 1024 Kbytes.
> >     Processor cache line size set to 32 bytes.
> >     File stride size set to 17 * record size.
> >     Throughput test with 6 processes
> >     Each process writes a 2097152 Kbyte file in 4096 Kbyte records
> >
> >     Children see throughput for  6 initial writers   =   76820.31 KB/sec
> >     Parent sees throughput for  6 initial writers    =   74899.44 KB/sec
> >     Min throughput per process                       =   12298.62 KB/sec
> >     Max throughput per process                       =   12972.72 KB/sec
> >     Avg throughput per process                       =   12803.38 KB/sec
> >     Min xfer                                         = 1990656.00 KB
> >
> >     Children see throughput for  6 rewriters         =   76030.99 KB/sec
> >     Parent sees throughput for  6 rewriters          =   75062.91 KB/sec
> >     Min throughput per process                       =   12620.45 KB/sec
> >     Max throughput per process                       =   12762.80 KB/sec
> >     Avg throughput per process                       =   12671.83 KB/sec
> >     Min xfer                                         = 2076672.00 KB
> >
> >     Children see throughput for  6 readers           =  114221.39 KB/sec
> >     Parent sees throughput for  6 readers            =  113942.71 KB/sec
> >     Min throughput per process                       =   18920.14 KB/sec
> >     Max throughput per process                       =   19183.80 KB/sec
> >     Avg throughput per process                       =   19036.90 KB/sec
> >     Min xfer                                         = 2068480.00 KB
> >
> >     Children see throughput for  6 re-readers        =  117018.50 KB/sec
> >     Parent sees throughput for  6 re-readers         =  116917.01 KB/sec
> >     Min throughput per process                       =   19436.28 KB/sec
> >     Max throughput per process                       =   19590.40 KB/sec
> >     Avg throughput per process                       =   19503.08 KB/sec
> >     Min xfer                                         = 2080768.00 KB
> >
> >     Children see throughput for  6 random readers    =  110072.68 KB/sec
> >     Parent sees throughput for  6 random readers     =  109698.99 KB/sec
> >     Min throughput per process                       =   18260.33 KB/sec
> >     Max throughput per process                       =   18442.55 KB/sec
> >     Avg throughput per process                       =   18345.45 KB/sec
> >     Min xfer                                         = 2076672.00 KB
> >
> >     Children see throughput for  6 random writers    =   76389.71 KB/sec
> >     Parent sees throughput for  6 random writers     =   74816.45 KB/sec
> >     Min throughput per process                       =   12592.09 KB/sec
> >     Max throughput per process                       =   12843.75 KB/sec
> >     Avg throughput per process                       =   12731.62 KB/sec
> >     Min xfer                                         = 2056192.00 KB
> >
> > The other interesting thing is that the system doesn't cache the data
> > file in RAM (the box has 32G), so even for re-reads I get miserable
> > numbers. With Solaris, the re-reads happen at nearly wire speed.
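> >
> > (For anyone reproducing this, the ARC size and its configured ceiling
> > can be checked with the sysctls below - names as on a stock FreeBSD
> > ZFS system:)
> >
> >   # current ARC size and its configured maximum
> >   sysctl kstat.zfs.misc.arcstats.size
> >   sysctl vfs.zfs.arc_max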
> >
> > Any ideas what else I could tune? While 76MB/s is much better than the
> > original 17MB/s I was seeing, it's still far from Solaris's ~220MB/s...
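> >
> > (For completeness, the dataset properties that most directly affect
> > sync NFS writes can be inspected with something like:)
> >
> >   # sync behaviour, compression and record size on the exported dataset
> >   zfs get sync,compression,recordsize,atime tank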
> >
> > Thanks a lot,
> > Zoltan
>

