NFS on ZFS pure SSD pool

Rick Macklem rmacklem at uoguelph.ca
Tue Aug 27 23:02:21 UTC 2013


Outback Dingo wrote:
>
> On Tue, Aug 27, 2013 at 3:29 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
> Eric Browning wrote:
> > Hello, first time posting to this list. I have a new server that is
> > not living up to the promise of SSD speeds, and NFS is maxing out
> > the CPU. I'm new to FreeBSD but I've been reading up on it as much
> > as I can. I have obscured my IP addresses and hostname with x's, so
> > just ignore that. The server has about 200 users on it, each drawing
> > under 50Mb/s peak and around 1-2Mb/s sustained.
> > 
> > I've followed some network tuning guides for our I350-T4 NIC and
> > that has helped with network performance somewhat, but the server is
> > still experiencing heavy load, pegging the CPU at 1250% on average
> > with only 50Mb/s of traffic in/out of the machine. All of the
> > network tuning came from https://calomel.org/freebsd_network_tuning.html
> > since it was relevant to the same NIC that I have.
> > 
> > Server Specs:
> > FreeBSD 9.1
> > 16-core AMD x64
> > 64GB of RAM
> > ZFS v28 with four Intel DC S3700 drives (800GB) as a ZFS stripe
> > Intel DC S3500 for the ZIL; enabling/disabling it has made no difference
> > Used a spare DC S3700 for the ZIL and that made no difference either.
> > NFS v3 & v4 for Mac home folders whose Cache folder is redirected.
> > 
> > I've tried:
> > Compression on/off <-- no appreciable difference
> > Deduplication on/off <-- no appreciable difference
> > sync=disabled and sync=standard <-- no appreciable difference
> > Setting the ARC cache to 56GB and also to 32GB <-- no difference in
> > performance in terms of kern.
> > 
> > I've tried to follow the FreeBSD tuning guide at
> > https://wiki.freebsd.org/ZFSTuningGuide to no avail either. I've
> > read everything I can find on NFS on ZFS and nothing has helped.
> > Where am I going wrong?
> > 
> You could try this patch:
> http://people.freebsd.org/~rmacklem/drc4-stable9.patch
> - After applying the patch and booting a kernel built from the
>   patched sources, you need to increase the value of
>   vfs.nfsd.tcphighwater. (Try something like 5000 for it as a
>   starting point.)
> 
>
> Can we get a brief on what this is supposed to improve upon?
> 
It was developed for and tested by wollman@ to reduce mutex lock
contention and CPU overhead in the duplicate request cache (DRC),
mainly for NFS over TCP. (For the CPU overhead case, it allows the
cache to grow larger, reducing the frequency, and therefore the cost,
of trimming out stale entries.)
Here is the commit message, which I think covers it:

Fix several performance related issues in the new NFS server's
DRC for NFS over TCP.
- Increase the size of the hash tables.
- Create a separate mutex for each hash list of the TCP hash table.
- Single thread the code that deletes stale cache entries.
- Add a tunable called vfs.nfsd.tcphighwater, which can be increased
  to allow the cache to grow larger, avoiding the overhead of frequent
  scans to delete stale cache entries.
  (The default value will result in frequent scans to delete stale cache
   entries, analogous to what the pre-patched code does.)
- Add a tunable called vfs.nfsd.cachetcp that can be used to disable
  DRC caching for NFS over TCP, since the old NFS server didn't use
  the DRC for TCP.
It also adjusts the size of nfsrc_floodlevel dynamically, so that it is
always greater than vfs.nfsd.tcphighwater.

For UDP the algorithm remains the same as the pre-patched code, but the
tunable vfs.nfsd.udphighwater can be used to allow the cache to grow
larger and reduce the overhead caused by frequent scans for stale entries.
UDP also uses a larger hash table size than the pre-patched code.

Reported by:	wollman
Tested by:	wollman (earlier version of patch)
Submitted by:	ivoras (earlier patch)
Reviewed by:	jhb (earlier version of patch)
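
To make the knobs concrete, here is roughly how I'd expect this to be
used (a sketch, not a recommendation: the 5000 is just the starting
point suggested above, the UDP value is purely illustrative, and the
patch is assumed to be a diff relative to /usr/src):

  cd /usr/src
  fetch http://people.freebsd.org/~rmacklem/drc4-stable9.patch
  patch < drc4-stable9.patch    # adjust -p to match the diff's paths
  make buildkernel && make installkernel
  # reboot into the patched kernel, then in /etc/sysctl.conf:
  vfs.nfsd.tcphighwater=5000    # let the TCP DRC grow before trimming
  vfs.nfsd.udphighwater=500     # same idea for the UDP cache, if used
  #vfs.nfsd.cachetcp=0          # only to disable the TCP DRC entirely,
                                # as the old NFS server behaved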

> 
> Although this patch is somewhat different code, it should be
> semantically the same as r254337 in head, which is scheduled to be
> MFC'd to stable/9 in a couple of weeks.
> 
> rick
> 
> > Here's /boot/loader.conf:
> > [quote]
> > # ZFS tuning tweaks
> > aio_load="YES" # Async IO system calls
> > autoboot_delay="10" # boot menu delay, in seconds
> > vfs.zfs.arc_max="56868864000" # caps the ARC at ~57GB, leaving ~7GB of RAM for the system
> > vfs.zfs.cache_flush_disable="1"
> > #vfs.zfs.prefetch_disable="1"
> > vfs.zfs.write_limit_override="429496728"
> > 
> > kern.ipc.nmbclusters="264144" # increase the number of network mbuf clusters
> > kern.maxfiles="65535"
> > net.inet.tcp.syncache.hashsize="1024" # size of the syncache hash table
> > net.inet.tcp.syncache.bucketlimit="100" # limit the number of entries permitted in each bucket of the hash table
> > net.inet.tcp.tcbhashsize="32768"
> > 
> > # Link Aggregation loader tweaks, see: https://calomel.org/freebsd_network_tuning.html
> > hw.igb.enable_msix="1"
> > hw.igb.num_queues="0"
> > hw.igb.enable_aim="1"
> > hw.igb.max_interrupt_rate="32000"
> > hw.igb.rxd="2048"
> > hw.igb.txd="2048"
> > hw.igb.rx_process_limit="4096"
> > if_lagg_load="YES"
> > [/quote]
> > 
> > Here's /etc/sysctl.conf:
> > [quote]
> > # $FreeBSD: release/9.1.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
> > #
> > # This file is read when going to multi-user and its contents piped thru
> > # ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
> > #
> > 
> > # Uncomment this to prevent users from seeing information about
> > # processes that are being run under another UID.
> > #security.bsd.see_other_uids=0
> > kern.ipc.somaxconn=1024
> > kern.maxusers=272
> > #kern.maxvnodes=1096848 #increase this if necessary
> > kern.ipc.maxsockbuf=8388608
> > net.inet.tcp.mssdflt=1460
> > net.inet.ip.forwarding=1
> > net.inet.ip.fastforwarding=1
> > dev.igb.2.fc=0
> > dev.igb.3.fc=0
> > dev.igb.4.fc=0
> > dev.igb.5.fc=0
> > dev.igb.2.rx_processing_limit=10000
> > dev.igb.3.rx_processing_limit=10000
> > dev.igb.4.rx_processing_limit=10000
> > dev.igb.5.rx_processing_limit=10000
> > net.inet.ip.redirect=0
> > net.inet.icmp.bmcastecho=0        # do not respond to ICMP packets sent to the .255 broadcast address
> > net.inet.icmp.maskfake=0          # do not fake replies to ICMP Address Mask Request packets
> > net.inet.icmp.maskrepl=0          # replies are not sent for ICMP address mask requests
> > net.inet.icmp.log_redirect=0      # do not log redirected ICMP packet attempts
> > net.inet.icmp.drop_redirect=1     # no redirected ICMP packets
> > net.inet.tcp.drop_synfin=1        # SYN/FIN packets get dropped on initial connection
> > net.inet.tcp.ecn.enable=1         # explicit congestion notification (ECN); warning: some ISP routers abuse it
> > net.inet.tcp.icmp_may_rst=0       # ICMP may not send RST, to avoid spoofed ICMP/UDP floods
> > net.inet.tcp.maxtcptw=15000       # max number of TCP TIME_WAIT states for closing connections
> > net.inet.tcp.msl=5000             # 5-second maximum segment life waiting for an ACK in reply to a SYN-ACK or FIN-ACK
> > net.inet.tcp.path_mtu_discovery=0 # disable MTU discovery since most ICMP packets are dropped by others
> > net.inet.tcp.rfc3042=0            # disable the limited transmit mechanism, which can slow burst transmissions
> > net.inet.ip.rtexpire=60           # default 3600 secs
> > net.inet.ip.rtminexpire=2         # default 10 secs
> > net.inet.ip.rtmaxcache=1024       # default 128 entries
> > [/quote]
> > 
> > Here's /etc/rc.conf:
> > [quote]
> > #ifconfig_igb2="inet xxx.xx.x.xx netmask 255.255.248.0"
> > hostname="xxxxxxxxxxxxxxxxxxx"
> > #
> > # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> > dumpdev="NO"
> > #
> > ### LACP config
> > ifconfig_igb2="up"
> > ifconfig_igb3="up"
> > ifconfig_igb4="up"
> > ifconfig_igb5="up"
> > cloned_interfaces="lagg0"
> > ifconfig_lagg0="laggproto lacp laggport igb2 laggport igb3 laggport igb4 laggport igb5 xxx.xx.x.xx netmask 255.255.248.0"
> > ipv4_addrs_lagg0="xxx.xx.x.xx"
> > defaultrouter="xxx.xx.x.xx"
> > #
> > ### Defaults for SSH, NTP, ZFS
> > sshd_enable="YES"
> > ntpd_enable="YES"
> > zfs_enable="YES"
> > #
> > ## NFS Server
> > rpcbind_enable="YES"
> > nfs_server_enable="YES"
> > mountd_flags="-r -l"
> > nfsd_enable="YES"
> > mountd_enable="YES"
> > rpc_lockd_enable="NO"
> > rpc_statd_enable="NO"
> > nfs_server_flags="-u -t -n 128"
> > nfsv4_server_enable="YES"
> > nfsuserd_enable="YES"
> > [/quote]
> > 
> > Thanks in advance,
> > --
> > Eric Browning

