ffs snapshot lockup

Kris Kennaway kris at obsecurity.org
Wed Oct 4 19:57:58 UTC 2006


On Wed, Oct 04, 2006 at 03:53:54PM -0400, Vivek Khera wrote:
> 
> On Oct 4, 2006, at 3:41 PM, Kris Kennaway wrote:
> 
> >>from what i read in the output from kgdb, it seems that something
> >>locked the kernel and we broke to debugger from the watchdog timeout
> >>(I enable software watchdog).
> >
> >Hmm, be careful with that - if you set the timeout too low (and note
> >that for some workloads O(minutes) may even be too low) then you'll
> >get a lot of false positives.
> 
> hmmm... the man page for watchdogd doesn't specify what the default
> timeout is, but that's what we've got running.  [tappity-tappity-tap...]
> source seems to indicate a 16-second timeout.  interesting.

Yes, that's probably way too low.  For example, when creating a snapshot
(as in your workload) your machine may be unresponsive for up to a few
minutes, depending on your filesystem size and I/O load.
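
As a rough sketch of raising the timeout, assuming watchdogd is started
from /etc/rc.conf and that your version supports the -t flag (check
watchdogd(8) on your release), something like the following could be
used.  The 300-second value is only an illustrative guess at "a few
minutes", not a tested recommendation for your workload.

```shell
# /etc/rc.conf fragment (illustrative values, adjust for your system)
watchdogd_enable="YES"
# -t sets the watchdog timeout in seconds; 300 is an assumed value chosen
# to exceed the few minutes a snapshot may leave the machine unresponsive
watchdogd_flags="-t 300"
```

After changing this, watchdogd needs to be restarted for the new timeout
to take effect.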

> so we could be getting hit with a bge interrupt storm and timing  
> out.  i'll turn off fido and see what happens.
> 
> at this point, though, i think i have two separate issues.  one with  
> bge and watchdog timeout, and one with locking of the filesystem with  
> mksnap_ffs, as the symptoms are different.

That sounds plausible.  Many people are reporting issues involving NIC
interrupts, but they're proving elusive to characterize so far (there
may be multiple problems).

kris


