deadlock or bad disk ? RELENG_8
Jeremy Chadwick
freebsd at jdc.parodius.com
Mon Jul 19 20:17:01 UTC 2010
On Mon, Jul 19, 2010 at 08:41:40AM -0400, Mike Tancsa wrote:
> At 11:58 PM 7/18/2010, Jeremy Chadwick wrote:
>
> >So I believe this indicates the message only gets printed during swapin,
> >not swapout. Meaning it's happening during an I/O read from da0.
>
> Yes, and from my existing ssh sessions, it would _seem_ no disk IO
> was completing. i.e., I tried a killall -9 watchdogd, which would need
> to load killall from the disk and read whatever it's linked against.
> However, after hitting enter it was just blocking on trying to read.
> So I would describe it as if the entire system was waiting for that
> "swapper Indefinite wait" to finish, or I could not read anything
> from drives associated with that controller.
Hmm, okay, so it sounds like the controller wedged or arcmsr(4) started
acting oddly. I would open up a case with Areca on the problem,
*especially* if it happens again.
> >So what's hz? Well, I want to assume it's kern.hz, which defaults to
> >1000. 1000*20 = 20000, so the timeout would be 20000/1000 = 20 seconds.
> >That's a pretty long time to be waiting for an I/O read to return.
>
> I think the messages were printing to the serial console faster than
> that, but I could be wrong. If it happens again, I will time it.
Come to think of it, I'm betting you'd get large batches of these
messages if/when it happens. That VM code isn't something I'm familiar
with (nor is msleep(9)); I just dig around and find what I can.
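
For reference, the hz arithmetic quoted above can be sketched in C. This is
only an illustration, assuming the timeout handed to the sleep call is 20*hz
ticks and that kern.hz is at its default of 1000. ticks_to_seconds() is a
hypothetical helper written for this example, not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a timeout expressed in clock ticks
 * into whole seconds, given the tick rate hz (ticks per second).
 */
static int
ticks_to_seconds(int ticks, int hz)
{
	assert(hz > 0);		/* a zero or negative tick rate makes no sense */
	return (ticks / hz);
}

/*
 * Assumption: the timeout argument is 20*hz ticks, as inferred above.
 * Because the timeout scales with hz, the wait comes out to the same
 * number of seconds whatever kern.hz is set to.
 */
static int
swap_wait_seconds(int hz)
{
	return (ticks_to_seconds(20 * hz, hz));
}
```

With hz=1000 that is 20000 ticks, i.e. 20 seconds; with hz=100 it would be
2000 ticks, which is still 20 seconds.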
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |