NFS + FreeBSD TCP Behavior with Linux NAT
julian at freebsd.org
Thu Nov 11 20:40:00 UTC 2010
On 11/11/10 6:36 AM, Christopher Penney wrote:
> I have a curious problem I'm hoping someone can help with or at least
> educate me on.
> I have several large Linux clusters and for each one we hide the compute
> nodes behind a head node using NAT. Historically, this has worked very well
> for us: any time a NAT gateway (the head node) reboots, everything
> recovers within a minute or two of it coming back up. This includes NFS
> mounts from Linux and Solaris NFS servers, license server connections, etc.
> Recently, we added a FreeBSD based NFS server to our cluster resources and
> have had significant issues with NFS mounts hanging if the head node
> reboots. We don't have this happen much, but it does occasionally happen.
> I've explored this and it seems the behavior of FreeBSD differs a bit from
> at least Linux and Solaris with respect to TCP recovery. I'm curious if
> someone can explain this or offer any workarounds.
> Here are some specifics from a test I ran:
> Before the reboot, two Linux clients were mounting the FreeBSD server. They
> were both using port 903 locally. On the head node, clientA:903 was remapped
> to headnode:903 and clientB:903 was remapped to headnode:601. There is no
> activity when the reboot occurs. The head node takes a few minutes to come
> back up (we kept it down for several minutes).
> When it comes back up clientA and clientB try to reconnect to the FreeBSD
> NFS server. They both use the same source port, but since the head node's
> conntrack table is cleared it's a race to see who gets what port and this
> time clientA:903 appears as headnode:601 and clientB:903 appears as
> headnode:903 (>>> they essentially switch places as far as the FreeBSD
> server would see<<< ).
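The remapping described above can be inspected on the head node with
conntrack-tools (an assumption on my part that the `conntrack` utility is
installed; port 2049 is the standard NFS-over-TCP port):

```shell
# List the head node's NAT translations for NFS traffic (TCP port 2049).
# Requires the conntrack-tools package; run as root on the head node.
conntrack -L -p tcp --dport 2049
```

Comparing the src/sport-to-reply-port mappings before and after a reboot
would show which client ended up behind which external port.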
> The FreeBSD NFS server, since there were no outstanding ACKs it was waiting
> on, thinks things are OK, so when it gets a SYN from each of the two clients
> it only responds with a bare ACK. Each ACK it replies with is bogus (an
> invalid sequence number) because it belongs to the connection the other
> client was using before the reboot. The client therefore sends a RST back,
> but the RST never reaches the FreeBSD system, since the head node's NAT
> hasn't yet seen a full handshake (which would allow return packets). The end
> result is a "permanent" hang (at least until the server would otherwise
> clean up idle TCP connections).
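The exchange described above could be confirmed on the wire with something
like the following (the interface name bond0 and NFS port 2049 are
assumptions; adjust for the actual setup):

```shell
# Watch for the server's bare ACKs, the clients' reconnect SYNs, and the
# RSTs the clients send back in response to the out-of-window ACKs.
tcpdump -ni bond0 'port 2049 and (tcp[tcpflags] & (tcp-syn|tcp-rst) != 0)'
```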
> This is in stark contrast to the behavior of the other systems we have.
> Other systems respond to the SYN used to reconnect with a SYN/ACK. They
> appear to implicitly tear down the return path based on getting a SYN from a
> seemingly already established connection.
> I'm assuming this is one of the grey areas where there is no specific
> behavior outlined in an RFC? Is there any way to make the FreeBSD system
> more reliable in this situation (like making it implicitly tear down the
> return)? Or is there a way to adjust the NAT setup to allow the RST to
> return to the FreeBSD system? Currently, NAT is set up with simply:
> iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -o bond0 -j SNAT --to 126.96.36.199
> Where 126.96.36.199 is the intranet address and 10.1.0.0/16 is the cluster network.
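One possible workaround on the Linux head node — a sketch, not something
verified for this exact case — is to relax conntrack's TCP window tracking,
so that the server's out-of-window ACK and the client's answering RST are
not classified INVALID and dropped before they can be translated:

```shell
# Disable strict TCP window tracking so out-of-window segments (the bogus
# ACK and the resulting RST) are not marked INVALID and silently dropped.
sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
# Allow conntrack to pick up connections mid-stream (no SYN seen);
# this is usually the default already.
sysctl -w net.netfilter.nf_conntrack_tcp_loose=1
```

With window checking relaxed, the client's RST should make it back to the
FreeBSD server, which would then drop its stale connection state.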
I just added NFS to the subject, because the NFS people are those you need to reach.
> freebsd-net at freebsd.org mailing list