NFS server fail-over - how do you do it?

Mon May 31 12:40:37 PDT 2004

In the last episode (May 31), adp said:
> Very useful information, thanks. We have a very stable NFS server,
> but I am still working hard to put some redundancy into place. I was
> thinking that since NFS is udp-based, that if the primary NFS server
> failed, and the secondary assumed the primary NFS server's IP
> address, that things would at least return to normal (of course, any
> writes that had been in progress would fail horribly). That doesn't
> seem to be the case. During a test we killed the main NFS server and
> brought up the NFS IP as an alias on the backup. Didn't work. Has
> anyone tried anything like this?

That should work, I believe.  NFS is stateless, so as long as some
server starts answering on that IP, the clients should wake up.  You
may get "stale NFS handle" errors on files that were open, or on ones
that had not been synced to the slave when the master failed, but apart
from that you should be okay.  Does a tcpdump show any NFS traffic at
all?
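
To make the "stale NFS handle" point concrete, here is a toy model in
Python.  It is only a sketch: real file handles are opaque to the
client, and their layout is up to the server.  The fsid / inode /
generation fields and every name below (FileHandle, ToyServer,
StaleHandle) are assumptions made up for illustration.  The idea is that
a client keeps using the handle it got from the old server, so the new
server has to recognize that same handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    # Hypothetical handle layout; a real server decides its own.
    fsid: int
    inode: int
    generation: int


class StaleHandle(Exception):
    """Toy stand-in for the 'stale NFS handle' error."""


class ToyServer:
    def __init__(self, fsid):
        self.fsid = fsid
        self.files = {}  # (inode, generation) -> file contents

    def add_file(self, inode, generation, data):
        self.files[(inode, generation)] = data
        return FileHandle(self.fsid, inode, generation)

    def read(self, handle):
        # The server keeps no per-client state: the handle alone must
        # identify the file, or the request is rejected as stale.
        key = (handle.inode, handle.generation)
        if handle.fsid != self.fsid or key not in self.files:
            raise StaleHandle(handle)
        return self.files[key]
```

In this model, a handle issued by one ToyServer can be read from a
second one only if the second has the same fsid and the same
(inode, generation) entry.  A backup whose copy of the data lives under
different identifiers raises StaleHandle even though it answers on the
same IP.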

I have a port of the heartbeat program (from the badly-named
www.linux-ha.org site), which automates the IP failover part, and I
will be submitting it soon.  1.2.1 actually works out of the box on
FreeBSD, but 1.2.2 has problems releasing the IP when you try to move
an active server to standby.
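
For anyone who has not seen this kind of tool, the decision logic for
IP failover can be sketched roughly as below.  This is not heartbeat's
actual code.  The class name, the deadtime parameter and the
"takeover" / "release" return values are all assumptions for
illustration, and actually adding or removing the IP alias is left out.

```python
class HeartbeatMonitor:
    """Runs on the standby: decides when to take over the service IP."""

    def __init__(self, deadtime, now):
        self.deadtime = deadtime    # seconds of silence before failover
        self.last_seen = now        # time of the last beat from the peer
        self.holding_ip = False     # True while we hold the service IP

    def beat(self, now):
        # Call this whenever a heartbeat packet arrives from the peer.
        self.last_seen = now

    def tick(self, now):
        # Call periodically; returns "takeover", "release", or None.
        peer_alive = (now - self.last_seen) < self.deadtime
        if not peer_alive and not self.holding_ip:
            self.holding_ip = True
            return "takeover"       # caller would bring up the IP alias
        if peer_alive and self.holding_ip:
            self.holding_ip = False
            return "release"        # caller would drop the IP alias
        return None
```

With deadtime=10 and a monitor started at time 0, tick(5) returns None,
tick(12) returns "takeover", and a beat(14) followed by tick(15)
returns "release".  Handing the IP back as soon as the peer returns is
just one possible policy, chosen here to keep the sketch short.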

-- 
	Dan Nelson
	dnelson at allantgroup.com