Problems with ZFS file servers

Bob Healey healer at
Sun Sep 13 19:32:58 UTC 2015


I've been semi-successfully running multi-homed, ZFS-based NFS file 
servers.  Every 30-90 days they become non-responsive on one or more 
interfaces and I have to reboot them.  The only error messages are my 
RHEL 5 clients complaining that the server is unreachable, and the 
output of netstat -i showing rapidly increasing input errors.  I am 
running FreeBSD 10.1-RELEASE patched to 7/2/15.  Installed ports are 
minimal, mainly bash, rsync, portupgrade, and their dependencies.
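
For reference, a minimal sketch of how the growth in input errors 
could be tracked between two netstat -i samples.  It is not part of the 
setup described here.  It assumes the output has a header row containing 
an "Ierrs" column and that the hardware counters sit on the <Link#N> 
rows; the function names are hypothetical.

```python
import subprocess


def parse_ierrs(netstat_output):
    """Return {interface: input-error count} from `netstat -i` style text.

    Assumption: the first non-blank line is a header with an "Ierrs" column.
    """
    rows = [ln.split() for ln in netstat_output.splitlines() if ln.strip()]
    if not rows or "Ierrs" not in rows[0]:
        return {}
    header = rows[0]
    # Index from the right: the Address column can be empty (e.g. lo0),
    # which shifts every field when counting from the left.
    offset = header.index("Ierrs") - len(header)
    counts = {}
    for fields in rows[1:]:
        # Only link-level rows carry per-interface hardware counters.
        if not any(f.startswith("<Link#") for f in fields):
            continue
        if len(fields) < -offset:
            continue
        value = fields[offset]
        if value.isdigit():
            counts[fields[0]] = int(value)
    return counts


def ierr_deltas(before, after):
    """Return {interface: increase} for interfaces whose Ierrs grew."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


def sample():
    """Run `netstat -i` once and parse it (requires netstat on PATH)."""
    result = subprocess.run(
        ["netstat", "-i"], capture_output=True, text=True, check=True
    )
    return parse_ierrs(result.stdout)
```

Calling sample() twice, some seconds apart, and passing both results to 
ierr_deltas() gives the interfaces whose error counters are climbing, 
which could be logged from cron to see which interface degrades first.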

Basic info:
A variety of hosts: some Dell, some IBM, some HP, some Sun (pre-Oracle), 
and some Supermicro whitebox systems.  Ages range from 1 to 5 years.  
RAM varies from 6GB to 64GB.  Network cards are assorted onboard igb, 
em, and bge cards; some hosts also have mxge cards installed.  Disk is 
mostly on mfi or mpt based controllers, with two cciss cards.  Raw disk 
capacity varies between 12TB and 96TB.  CPUs vary from Xeon 54xx chips 
to Opteron 43xx chips and everything in between.  I have some identical 
machines still on Oracle support running Solaris/ZFS that do not exhibit 
these problems under identical loads.

The servers are used as NFS file stores for HPC research clusters.  Each 
has one interface reachable from the publicly routed university network, 
and a second interface with 802.1q vlans to reach each of the internal 
cluster networks that host serves.  Due to my boss's rules regarding 
downtime (no scheduled outages ever, for any reason), the next time I 
know I'll be able to reboot these to test changes is 6/18/16, when the 
annual electrical shutdown occurs.  Otherwise, I can try suggestions as 
things get unhappy with life and require unscheduled reboots.

Bob Healey
Systems Administrator
Biocomputation and Bioinformatics Constellation
and Molecularium
healer at
(518) 276-4407

More information about the freebsd-questions mailing list