FreeBSD 8.0 - network stack crashes?

Weldon S Godfrey 3 weldon at excelsusphoto.com
Mon Nov 2 21:11:16 UTC 2009



If memory serves me right, sometime around 10:52am, Weldon S Godfrey 3 told me:

>
> Up until yesterday, we had been running FreeBSD-CURRENT of 12/08.  A couple of 
> months ago we started to see some very odd network behavior. Something happens 
> to the stack that causes processes accessing the network to just hang.  After 
> the problem happens, usually (but not always), you can't ssh in.  Always, you 
> can't ssh or telnet out, and nothing can access the NFS shares on the server. 
> You can ping everything from the server. You can't even do a route add, and 
> you can't ssh even if you use just the IP address (although pinging hostnames 
> that are not cached or in the hosts table still resolves them).  When you try 
> to ssh out or do a route add from the box, the process just hangs.  You can't 
> Control-C it at all; it hangs forever.  There is nothing in dmesg or messages 
> to indicate an issue.  I tried to up/down the interfaces.  In CURRENT-12/08, 
> that may allow things to work for about 30s.
>
> We upgraded to 8.0-RC2 yesterday and, at first, the problem appeared to happen 
> a lot more often.  We expected that was related to the increase in network 
> performance.  At least in 8.0-RC2, I did see a large number of input errors 
> with netstat -in on the heavily loaded interface before it started the locking 
> up behavior.  I have replaced the ethernet cable and moved ports.  The Catalyst 
> 3650 never records any errors.  The problem would reoccur in about 5 minutes 
> once our load kicked in this morning.
>
>
> One change in this upgrade: we switched from NFS v2 to v3.  When we downgraded 
> to the previous OS, we stayed at v3.  The problem was just about as bad with 
> v3 on the 12/08 OS.
>
> We went back to RC2 with NFS v2 and it appeared to stabilize to a degree.
> It ran for about an hour and a half and then the issue came up.
>
> We are currently back to the 12/08 version using NFS v2 and watching things.
>
> We are using a Dell PowerEdge 2950-iii.  The problem happens both when using 
> the onboard NICs with the bce driver and with an Intel card using the em driver.
>
> I am hunting down any MTU/duplex/speed problems that could cause it (haven't 
> found any so far).  Of course, any problems on the network shouldn't (ideally) 
> freak out the network stack on the server.  I don't know how to troubleshoot 
> this further on the server since I am not getting any problems indicated in 
> logging, panics, cores, etc.
>
> Any help is appreciated.
>


I have swapped out the computer, switch, ethernet card, and 3ware card.  We 
are running 8.0-CURRENT of 12/08, which is what we were using before with far 
fewer issues.  No help.

If it happens again, I am going to try a netif restart and a routing 
restart, although I believe I tried that at the beginning and it did not 
help.
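
Since the only hint so far was the input errors reported by netstat -in, 
one option is to record that counter at intervals so any rise before the 
next hang is captured.  Below is a rough Python sketch, not something from 
this thread or tested on this server.  The function names parse_ierrs and 
ierrs_delta are made up for illustration, and it assumes the netstat -in 
header contains an "Ierrs" column and that per-interface totals appear on 
the rows containing "<Link#".

```python
import subprocess


def parse_ierrs(netstat_output):
    """Return {interface: input-error count} from `netstat -in` text.

    Assumes a header row naming an "Ierrs" column and per-interface
    totals on the rows that contain "<Link#".
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if "Ierrs" not in header:
        return {}
    # Index the column from the right: the Address column can be empty
    # (e.g. on lo0), which would shift a left-to-right index.
    offset = header.index("Ierrs") - len(header)
    counts = {}
    for ln in lines[1:]:
        fields = ln.split()
        if "<Link#" not in ln or len(fields) < -offset:
            continue
        value = fields[offset]
        if value.isdigit():
            counts[fields[0]] = int(value)
    return counts


def ierrs_delta(before, after):
    """Return {interface: increase} for interfaces whose count changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def snapshot():
    """Run `netstat -in` on the local machine and parse its output."""
    result = subprocess.run(
        ["netstat", "-in"], capture_output=True, text=True, check=True
    )
    return parse_ierrs(result.stdout)
```

Calling snapshot() twice, a minute or so apart, and passing both results 
to ierrs_delta() would show which interface is accumulating errors.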


More information about the freebsd-current mailing list