Problems with ZFS file servers
healer at rpi.edu
Sun Sep 13 19:32:58 UTC 2015
I've been semi-successfully running multi-homed, ZFS-based NFS file
servers. Every 30-90 days I have to reboot them, or they become
non-responsive on one or more interfaces. The only error messages I get
are my RHEL 5 clients complaining that the server is unreachable, and
netstat -i output showing rapidly increasing input errors. I am running
FreeBSD 10.1-RELEASE patched to 7/2/15. Installed ports are minimal,
mainly bash, rsync, portupgrade, and their dependencies.
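
To put numbers on "rapidly increasing", the Ierrs counter from netstat -i can be sampled twice and compared. The following is only a rough sketch, not something that is running on these servers. It assumes the netstat -i header row contains an "Ierrs" column and that each interface has a <Link#N> row carrying the hardware counters. The function names (parse_ierrs, ierrs_growth, sample_live) are made up for illustration.

```python
import subprocess
import time


def parse_ierrs(netstat_output):
    """Return {interface: input error count} from `netstat -i` style text.

    Assumption: the first non-blank line is a header with an 'Ierrs'
    column, and only the <Link#N> rows hold per-interface totals.
    """
    rows = [line.split() for line in netstat_output.splitlines() if line.strip()]
    if not rows or "Ierrs" not in rows[0]:
        return {}
    header = rows[0]
    # Index from the right, because the Address column may be blank
    # (e.g. for loopback), which shifts the left-hand columns.
    col = header.index("Ierrs") - len(header)
    counts = {}
    for fields in rows[1:]:
        if len(fields) < -col:
            continue
        if not any(f.startswith("<Link#") for f in fields):
            continue
        try:
            counts[fields[0]] = int(fields[col])
        except ValueError:
            continue  # counter shown as '-' or otherwise unparsable
    return counts


def ierrs_growth(before, after):
    """Return {interface: increase} for interfaces whose Ierrs went up."""
    growth = {}
    for name, new_value in after.items():
        if name in before and new_value > before[name]:
            growth[name] = new_value - before[name]
    return growth


def sample_live(interval_seconds=60):
    """Run `netstat -i` twice, interval_seconds apart, and report growth."""
    first = parse_ierrs(subprocess.check_output(["netstat", "-i"], text=True))
    time.sleep(interval_seconds)
    second = parse_ierrs(subprocess.check_output(["netstat", "-i"], text=True))
    return ierrs_growth(first, second)
```

Calling sample_live(60) from cron or a loop and logging the result would show which interface starts accumulating input errors, and how fast, before the clients begin to complain.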
The hardware is a variety of hosts: some Dell, some IBM, some HP, some
Sun (pre-Oracle), and some Supermicro whitebox systems, ranging from 1 to
5 years old. RAM varies from 6GB to 64GB. Network cards are assorted
onboard igb, em, and bge cards, plus some mxge cards. Disk is mostly on
mfi or mpt based controllers, with two cciss cards. Raw disk capacity
varies between 12TB and 96TB. CPUs vary from Xeon 54xx chips to Opteron
43xx chips and everything in between. I have some identical machines
still on Oracle support running Solaris/ZFS that do not exhibit these
problems under identical loads.
The servers are used as NFS file stores for HPC research clusters. Each
has one interface reachable from the publicly routed university network,
and a second interface with 802.1Q VLANs to reach each of the internal
cluster networks that host serves. Due to my boss's rules regarding
downtime (no scheduled outages ever, for any reason), the next time I
know I'll be able to reboot these to test changes is 6/18/16, when the
annual electrical shutdown occurs. Otherwise, I can try suggestions as
things get unhappy with life and require unscheduled reboots.
Biocomputation and Bioinformatics Constellation
healer at rpi.edu