FreeBSD 8.0 - network stack crashes?
Weldon S Godfrey 3
weldon at excelsusphoto.com
Mon Nov 2 15:52:36 UTC 2009
Up until yesterday, we had been running FreeBSD-CURRENT from 12/08. A
couple of months ago we started to see some very odd network behavior.
Something happens to the stack that causes processes accessing the network
to just hang. After the problem occurs, usually (but not always) you
can't ssh in. You can never ssh or telnet out, and nothing can access
the NFS shares on the server. You can still ping everything from the server.
You can't even do a route add, and you can't ssh out even if you use just
the IP address (although pinging hostnames that aren't cached or in the
hosts table still resolves). When you try to ssh out or do a route add from
the box, the process just hangs. You can't Ctrl-C it at all; it hangs
forever. There is nothing in dmesg or messages to indicate an issue. I
have tried bringing the interfaces down and up. On CURRENT-12/08, that may
allow things to work for about 30 seconds.
We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to
happen a lot more often. We assumed that was related to the increase
in network performance. At least on 8.0-RC2, I did see a large number of
input errors with netstat -in on the heavily loaded interface before the
lock-up behavior started. I have replaced the Ethernet cable and moved
ports. The Catalyst 3650 never records any errors. The problem
recurred within about 5 minutes once our load kicked in this morning.
One change in this upgrade: we switched from NFS v2 to v3. When we
downgraded to the previous OS, we stayed on v3. The problem was just
about as bad with v3 on the 12/08 OS.
We went back to RC2 with NFS v2, and it appeared to stabilize to a degree.
It ran for about an hour and a half, and then the issue came up again.
We are currently back on the 12/08 version using NFS v2 and watching things.
We are using a Dell PowerEdge 2950 III. The problem happens both with the
onboard NICs using the bce driver and with an Intel card using the em
driver.
I am hunting down any MTU/duplex/speed problems that could cause this
(I haven't found any so far). Of course, problems on the network ideally
shouldn't freak out the network stack on the server. I don't know how to
troubleshoot this further on the server, since nothing shows up in the
logs and there are no panics, cores, etc.
Any help is appreciated.
Thanks,
Weldon
More information about the freebsd-current mailing list