em watchdog timeout on UP, 6-stable

Barney Wolff barney at databus.com
Fri Sep 8 17:13:26 PDT 2006


On Fri, Sep 08, 2006 at 09:25:43PM +0400, Gleb Smirnoff wrote:
>   Barney,
> 
> On Tue, Sep 05, 2006 at 02:33:52PM -0400, Barney Wolff wrote:
> B> Updated my Athlon-xp 6-stable system last night, got an em watchdog
> B> timeout for the first time a few hours later, during a fairly
> B> high-traffic period.  System is UP but does have device apic in
> B> the config.  Any chance this is the recent race condition?
> B> Workaround?  ifconfig em0 down, ifconfig em0 up seemed to cure it,
> B> at least for the moment.
> 
> Not clear from your mail whether interface was working after the
> event occurred.

In the watchdog timeout case it was not.  Looking further, I had several
cases where nfs-over-tcp failed under heavy load, but the interface
did not report a failure and continued to work.  The system sending
nfs writes logged "nfs send error 35" and gzip died with "resource
temporarily unavailable".  (I haven't looked at the code, but errno 35
should be EAGAIN.)
In the watchdog timeout case the cpu was very busy with port building
and the system was receiving nfs writes.  But the nfs failures happened
in both directions (I have two systems which back up each other, at
different times).  Before updating from a 6/14/06 6-stable to 9/04/06,
such nfs failures were unknown unless I tried to run both backups
simultaneously.  Both systems are on a cheap Netgear gigabit switch;
the other system runs current, but is a couple of months old.
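
For anyone who wants to decode the "nfs send error 35" number without
digging through the NFS code, here is a minimal sketch in Python.  It
assumes the numbers in the table were copied correctly from FreeBSD's
<sys/errno.h>; the table and the function names are only illustrative
and are not part of any FreeBSD tool.

```python
import errno
import os

# Hand-copied subset of FreeBSD's <sys/errno.h> (an assumption of this
# sketch; extend it as needed).  The numbers differ on other systems,
# e.g. EAGAIN is 11 on Linux, so errno 35 must be read as a BSD value.
FREEBSD_ERRNO = {
    32: "EPIPE",
    35: "EAGAIN",      # same value as EWOULDBLOCK
    54: "ECONNRESET",
    55: "ENOBUFS",
    60: "ETIMEDOUT",
}


def freebsd_errno_name(code):
    """Return the symbolic name for a FreeBSD errno number."""
    return FREEBSD_ERRNO.get(code, "unknown")


def describe(code):
    """Return the name plus the local strerror() text, if available."""
    name = freebsd_errno_name(code)
    # Map the name back to whatever number the local platform uses.
    local = getattr(errno, name, None)
    if local is None:
        return "errno %d: %s" % (code, name)
    return "errno %d: %s (%s)" % (code, name, os.strerror(local))


if __name__ == "__main__":
    print(describe(35))
```

The lookup goes through the symbolic name on purpose, so the message
text comes from the local C library even when the script is run on a
system whose errno numbering differs from FreeBSD's.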

After the watchdog timeout, the link was unidirectional - sending worked
(packets were correctly received on the other system) but receiving
did not work.  Then, after another 9 minutes, it seemed to stop working
in either direction, until I manually did ifconfig em0 down, ifconfig
em0 up hours later.

I can put logs on a webserver if that would be useful.

-- 
Barney Wolff         I never met a computer I didn't like.


More information about the freebsd-stable mailing list