NFS on ZFS pure SSD pool

Rick Macklem rmacklem at uoguelph.ca
Tue Aug 27 23:08:12 UTC 2013


I wrote:
> Outback Dingo wrote:
> >
> > On Tue, Aug 27, 2013 at 3:29 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> >
> > Eric Browning wrote:
> > > Hello, first time posting to this list. I have a new server that is
> > > not living up to the promise of SSD speeds, and NFS is maxing out the
> > > CPU. I'm new to FreeBSD but I've been reading up on it as much as I
> > > can. I have obscured my IP addresses and hostname with x's, so just
> > > ignore that. The server has about 200 users on it, each drawing under
> > > 50Mb/s peak and around 1-2Mb/s sustained.
> > > 
> > > I've followed some network tuning guides for our I350t4 NIC and that
> > > has helped with network performance somewhat, but the server is still
> > > under heavy load, pegging the CPU at 1250% on average with only
> > > 50Mb/s of traffic in/out of the machine. All of the network tuning
> > > came from https://calomel.org/freebsd_network_tuning.html since it
> > > was relevant to the same NIC that I have.
> > > 
> > > Server Specs:
> > > FreeBSD 9.1
> > > 16 cores AMDx64
> > > 64GB of RAM
> > > ZFS v28 with four Intel DC S3700 drives (800GB) as a ZFS stripe
> > > Intel DC S3500 for ZIL; enabling/disabling it has made no difference
> > > Used a spare DC S3700 for the ZIL and that made no difference either.
> > > NFS v3 & v4 for Mac home folders whose Cache folder is redirected.
> > > 
> > > I've tried:
> > > Compression on/off <-- no appreciable difference
> > > Deduplication on/off <-- no appreciable difference
> > > sync=disabled and sync=standard <-- no appreciable difference
> > > Setting ARC cache to 56GB and also to 32GB <-- no difference in
> > > performance in terms of kern.
> > > 
> > > I've tried to follow the FreeBSD tuning guide,
> > > https://wiki.freebsd.org/ZFSTuningGuide, to no avail either. I've
> > > read everything I can find on NFS on ZFS and nothing has helped.
> > > Where am I going wrong?
> > > 
> > You could try this patch:
> > http://people.freebsd.org/~rmacklem/drc4-stable9.patch
> > - After applying the patch and booting a kernel built from the patched
> >   sources, you need to increase the value of vfs.nfsd.tcphighwater.
> >   (Try something like 5000 for it as a starting point.)
> >
> > Can we get a brief on what this is supposed to improve upon?
> > 
> It was developed for and tested by wollman@ to reduce mutex lock
> contention and CPU overheads for the duplicate request cache, mainly
> for NFS over TCP. (For the CPU overheads case, it allows the cache
> to grow larger, reducing the frequency and, therefore, overhead of
> trimming out stale entries.)
Oh, and I should also mention that ivoras@ developed a similar patch which
had better code structure than mine. I did use some of his code in the patch
that went into head, but not as much as I would have liked, because I wanted
to get it into head before code slush for 10.0. (I had already missed the 9.2
release.) I think I did convince ivoras@ that global LRU was appropriate for
UDP and that using a single list/mutex was the simplest coding of this.

rick

> Here is the commit message, which I think covers it:
> 
> Fix several performance related issues in the new NFS server's
> DRC for NFS over TCP.
> - Increase the size of the hash tables.
> - Create a separate mutex for each hash list of the TCP hash table.
> - Single thread the code that deletes stale cache entries.
> - Add a tunable called vfs.nfsd.tcphighwater, which can be increased
>   to allow the cache to grow larger, avoiding the overhead of frequent
>   scans to delete stale cache entries.
>   (The default value will result in frequent scans to delete stale
>   cache entries, analogous to what the pre-patched code does.)
> - Add a tunable called vfs.nfsd.cachetcp that can be used to disable
>   DRC caching for NFS over TCP, since the old NFS server didn't DRC
>   cache TCP.
> It also adjusts the size of nfsrc_floodlevel dynamically, so that it is
> always greater than vfs.nfsd.tcphighwater.
> 
> For UDP the algorithm remains the same as the pre-patched code, but the
> tunable vfs.nfsd.udphighwater can be used to allow the cache to grow
> larger and reduce the overhead caused by frequent scans for stale
> entries.
> UDP also uses a larger hash table size than the pre-patched code.
> 
> Reported by:	wollman
> Tested by:	wollman (earlier version of patch)
> Submitted by:	ivoras (earlier patch)
> Reviewed by:	jhb (earlier version of patch)
> 
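
To make the locking and trimming ideas in the commit message concrete,
here is a toy user-space model in Python. It is only a sketch under
stated assumptions, not the kernel code: the class ReplyCache, its
methods, the xid-only key and the helper scans_for are all hypothetical
names made up for illustration, and the real DRC is C code that keys on
more than the xid. It shows one lock per hash bucket, a single-threaded
trim, and a highwater mark that controls how often the trim scan runs.

```python
import threading
import time


class ReplyCache:
    """Toy duplicate request cache: one lock per hash bucket (hypothetical)."""

    def __init__(self, nbuckets=64, highwater=8, max_age=60.0):
        self.buckets = [dict() for _ in range(nbuckets)]
        self.locks = [threading.Lock() for _ in range(nbuckets)]
        self.highwater = highwater          # stands in for vfs.nfsd.tcphighwater
        self.max_age = max_age              # seconds before an entry is stale
        self.count = 0
        self.count_lock = threading.Lock()
        self.trim_lock = threading.Lock()   # only one thread trims at a time
        self.trims = 0                      # number of trim scans performed

    def lookup_or_insert(self, xid, reply, now=None):
        """Return the cached reply for a retransmitted xid, else cache reply."""
        now = time.monotonic() if now is None else now
        i = hash(xid) % len(self.buckets)
        with self.locks[i]:                 # contention is per bucket only
            hit = self.buckets[i].get(xid)
            if hit is not None:
                return hit[0]
            self.buckets[i][xid] = (reply, now)
        with self.count_lock:
            self.count += 1
            over = self.count > self.highwater
        if over:
            self.trim(now)
        return None

    def trim(self, now):
        """Drop stale entries; return at once if another thread is trimming."""
        if not self.trim_lock.acquire(blocking=False):
            return
        try:
            self.trims += 1
            removed = 0
            for i, bucket in enumerate(self.buckets):
                with self.locks[i]:
                    stale = [k for k, (_, t) in bucket.items()
                             if now - t > self.max_age]
                    for k in stale:
                        del bucket[k]
                    removed += len(stale)
            with self.count_lock:
                self.count -= removed
        finally:
            self.trim_lock.release()


def scans_for(highwater, requests=1000):
    """Count trim scans when no entry is old enough to be removed."""
    cache = ReplyCache(highwater=highwater, max_age=60.0)
    for xid in range(requests):
        cache.lookup_or_insert(xid, b"reply", now=0.0)
    return cache.trims


if __name__ == "__main__":
    print("scans with highwater=10:  ", scans_for(10))
    print("scans with highwater=5000:", scans_for(5000))
```

In this model a low highwater value means that, once the cache is over
the mark and nothing is stale yet, every new request triggers a full
scan, while a value above the working set size triggers none. That is
the effect the tunable is meant to let you control.
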
> > 
> > Although this patch is somewhat different code, it should be
> > semantically the same as r254337 in head, which is scheduled to be
> > MFC'd to stable/9 in a couple of weeks.
> > 
> > rick
> >
> > > Here's /boot/loader.conf:
> > > [quote]
> > > # ZFS tuning tweaks
> > > aio_load="YES" # Async IO system calls
> > > autoboot_delay="10" # reduce boot menu delay time from 10 to 3 seconds
> > > vfs.zfs.arc_max="56868864000" # Reserves 10GB of RAM for system, leaves 56GB for ZFS
> > > vfs.zfs.cache_flush_disable="1"
> > > #vfs.zfs.prefetch_disble="1"
> > > vfs.zfs.write_limit_override="429496728"
> > > 
> > > kern.ipc.nmbclusters="264144" # increase the number of network mbufs
> > > kern.maxfiles="65535"
> > > net.inet.tcp.syncache.hashsize="1024" # Size of the syncache hash table
> > > net.inet.tcp.syncache.bucketlimit="100" # Limit the number of entries permitted in each bucket of the hash table.
> > > net.inet.tcp.tcbhashsize="32768"
> > > 
> > > # Link Aggregation loader tweaks. see: https://calomel.org/freebsd_network_tuning.html
> > > hw.igb.enable_msix="1"
> > > hw.igb.num_queues="0"
> > > hw.igb.enable_aim="1"
> > > hw.igb.max_interrupt_rate="32000"
> > > hw.igb.rxd="2048"
> > > hw.igb.txd="2048"
> > > hw.igb.rx_process_limit="4096"
> > > if_lagg_load="YES"
> > > [/quote]
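
The byte values in the loader settings above are easier to compare with
their comments after converting units. The following Python lines are a
small arithmetic sketch only. They assume that "64GB of ram" means
64 GiB, and the helper name to_gib is made up for illustration.

```python
GIB = 2**30
MIB = 2**20


def to_gib(nbytes):
    """Convert a byte count to GiB."""
    return nbytes / GIB


arc_max = 56868864000       # vfs.zfs.arc_max as posted
write_limit = 429496728     # vfs.zfs.write_limit_override as posted
ram = 64 * GIB              # assumption: 64GB of RAM means 64 GiB

print(f"arc_max           = {to_gib(arc_max):.2f} GiB")
print(f"left for system   = {to_gib(ram - arc_max):.2f} GiB")
print(f"write_limit       = {write_limit / MIB:.1f} MiB")
```

Under that assumption the ARC limit is a little under 53 GiB (about
56.9 GB in decimal units), which leaves roughly 11 GiB for everything
else, and the write limit is about 410 MiB.
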
> > > 
> > > Here's /etc/sysctl.conf:
> > > [quote]
> > > # $FreeBSD: release/9.1.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
> > > #
> > > # This file is read when going to multi-user and its contents piped thru
> > > # ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
> > > #
> > > 
> > > # Uncomment this to prevent users from seeing information about processes that
> > > # are being run under another UID.
> > > #security.bsd.see_other_uids=0
> > > kern.ipc.somaxconn=1024
> > > kern.maxusers=272
> > > #kern.maxvnodes=1096848 #increase this if necessary
> > > kern.ipc.maxsockbuf=8388608
> > > net.inet.tcp.mssdflt=1460
> > > net.inet.ip.forwarding=1
> > > net.inet.ip.fastforwarding=1
> > > dev.igb.2.fc=0
> > > dev.igb.3.fc=0
> > > dev.igb.4.fc=0
> > > dev.igb.5.fc=0
> > > dev.igb.2.rx_procesing_limit=10000
> > > dev.igb.3.rx_procesing_limit=10000
> > > dev.igb.4.rx_procesing_limit=10000
> > > dev.igb.5.rx_procesing_limit=10000
> > > net.inet.ip.redirect=0
> > > net.inet.icmp.bmcastecho=0 # do not respond to ICMP packets sent to IP .255
> > > net.inet.icmp.maskfake=0 # do not fake reply to ICMP Address Mask Request packets
> > > net.inet.icmp.maskrepl=0 # replies are not sent for ICMP address mask
> > > net.inet.icmp.log_redirect=0 # do not log redirected ICMP packet attempts
> > > net.inet.icmp.drop_redirect=1 # no redirected ICMP packets
> > > net.inet.tcp.drop_synfin=1 # SYN/FIN packets get dropped on initial connection
> > > net.inet.tcp.ecn.enable=1 # explicit congestion notification (ecn) warning: some ISP routers abuse it
> > > net.inet.tcp.icmp_may_rst=0 # icmp may not send RST to avoid spoofed icmp/udp floods
> > > net.inet.tcp.maxtcptw=15000 # max number of tcp time_wait states for closing connections
> > > net.inet.tcp.msl=5000 # 5 second maximum segment life waiting for an ACK in reply to a SYN-ACK or FIN-ACK
> > > net.inet.tcp.path_mtu_discovery=0 # disable MTU discovery since most ICMP packets are dropped by others
> > > net.inet.tcp.rfc3042=0 # disable the limited transmit mechanism which can slow burst transmissions
> > > net.inet.ip.rtexpire=60 # 3600 secs
> > > net.inet.ip.rtminexpire=2 # 10 secs
> > > net.inet.ip.rtmaxcache=1024 # 128 entries
> > > [/quote]
> > > 
> > > Here's /etc/rc.conf
> > > [quote]
> > > #ifconfig_igb2=" inet xxx.xx.x.xx netmask 255.255.248.0"
> > > hostname="xxxxxxxxxxxxxxxxxxx"
> > > #
> > > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> > > dumpdev="NO"
> > > #
> > > ### LACP config
> > > ifconfig_igb2="up"
> > > ifconfig_igb3="up"
> > > ifconfig_igb4="up"
> > > ifconfig_igb5="up"
> > > cloned_interfaces="lagg0"
> > > ifconfig_lagg0="laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 xxx.xx.x.xx netmask 255.255.248.0"
> > > ipvr_addrs_lagg0="xxx.xx.x.xx"
> > > defaultrouter="xxx.xx.x.xx"
> > > #
> > > ### Defaults for SSH, NTP, ZFS
> > > sshd_enable="YES"
> > > ntpd_enable="YES"
> > > zfs_enable="YES"
> > > #
> > > ## NFS Server
> > > rpcbind_enable="YES"
> > > nfs_server_enable="YES"
> > > mountd_flags="-r -l"
> > > nfsd_enable="YES"
> > > mountd_enable="YES"
> > > rpc_lockd_enable="NO"
> > > rpc_statd_enable="NO"
> > > nfs_server_flags="-u -t -n 128"
> > > nfsv4_server_enable="YES"
> > > nfsuserd_enable="YES"
> > > [/quote]
> > > 
> > > Thanks in advance,
> > > --
> > > Eric Browning
> > > _______________________________________________
> > > freebsd-fs at freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
> > > 

