em0 watchdog timeouts

Greg Byshenk freebsd at byshenk.net
Mon Oct 5 20:30:02 UTC 2009


On Mon, Oct 05, 2009 at 08:32:14PM +0200, Daniel Bond wrote:
 
> What I need is useful advice/help. I never stated I needed a driver  
> developer.
> 
> I'd like to be able to run my favorite OS on cool hardware, in the  
> future, for a high-performing NFS-server, without problems like I've  
> experienced the past 6months, on a production system.
> Please note that I'm managing a server-park almost completely based on  
> FreeBSD, and I'm running many NFS servers on other hardware, for other  
> services, without issues.
> 
> I've seen several other FreeBSD-users having problems with this too,  
> so I think it's of importance for the project. As I mentioned  
> originally, I'm happy to dispose the hardware to any FreeBSD developer
> that might want to look further into this. Debugging it further is  
> above my skill-set, I don't even know where to begin looking,  
> especially since I can't produce any panics.

I can give one bit of advice that helped me in a similar situation:
check your motherboards.

I run about a dozen fileservers on FreeBSD, and have always been very
happy with their performance, but some months ago I began to experience
problems with one of them.  These problems were 'watchdog timeout'
errors.  I tried all manner of things (different NICs of different types,
changing settings, etc.), but nothing helped over the long term.  At
some point, when very heavy i/o was going on to our Beowulf cluster, the
'watchdog timeouts' would begin.  What was strange is that other 
(supposedly identical) machines handled _more_ i/o without a problem.

Finally, while doing some comparisons, I realized that the motherboard
having the problem was _not_ the same as the others; it was similar, but
not identical.  I changed the motherboard and all the problems went away,
never to reappear.

I don't know if it was a specific problem with that particular
motherboard, or something about that model, but for whatever reason, it
appears that the buses just couldn't handle a RAID card and three active
NICs.


-- 
greg byshenk  -  gbyshenk at byshenk.net  -  Leiden, NL


More information about the freebsd-stable mailing list