strange deadlock and magic resurrection with RELENG_6
mike at Reifenberger.com
Thu Mar 23 10:15:54 UTC 2006
I'm running a recent RELENG_6 on i386/SMP (Athlon X2 4800+).
The dmesg output is at http://people.freebsd.org/~mr/dmesg.log.gz
Root is on gmirror volume (2 SATA disks), a backup FS is on graid3
(5 firewire disks). This server acts as a bacula server.
During a backup with bacula I ran into a complete system freeze
(no keyboard, nfs, disk...) after the following lines on the screen:
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=108916879
ad1: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=116030287
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108911183
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108378767
Since I could still ping the system, and after waiting a couple of hours
in the hope that the system would recover by itself, I issued a
flood ping to this machine and voila, after getting the following lines:
Limiting icmp ping response from 261 to 200 packets/sec
Limiting icmp ping response from 283 to 200 packets/sec
Everything went back to normal!
It seems to me that we have a deadlock condition somewhere in the kernel.
But how can I debug this issue when everything is frozen?
BTW: I have already seen the DMA errors in the past, and they seem to be
caused by an interaction between ata and some geom modules. See my earlier
post regarding this issue.
Maybe the same issue has become fatal now after the latest gmirror/graid3 changes?
Has anyone else seen this?
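In case someone wants to check their own logs for the same pattern, here
is a minimal sketch that tallies ata messages like the ones quoted above
per device. The regular expression and the names ATA_MSG, summarize and
SAMPLE are my own assumptions, derived only from the four lines in this
mail; this is not part of any FreeBSD tool, and other ata messages may
well look different.

```python
import re
from collections import Counter

# Assumed message layout, inferred only from the four console lines above:
#   <dev>: <TIMEOUT|WARNING> - <command> <free text> LBA=<number>
ATA_MSG = re.compile(
    r"^(?P<dev>ad\d+): (?P<level>TIMEOUT|WARNING) - "
    r"(?P<cmd>\w+) (?P<detail>.*?)\s*LBA=(?P<lba>\d+)$"
)


def summarize(lines):
    """Count messages per (device, level) and collect the LBAs per device."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = ATA_MSG.match(line.strip())
        if m is None:
            continue  # skip anything that is not an ata message of this shape
        counts[(m["dev"], m["level"])] += 1
        lbas.setdefault(m["dev"], []).append(int(m["lba"]))
    return counts, lbas


# The console lines from this mail, used as sample input.
SAMPLE = [
    "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=108916879",
    "ad1: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=116030287",
    "ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108911183",
    "ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108378767",
]

if __name__ == "__main__":
    counts, lbas = summarize(SAMPLE)
    for (dev, level), n in sorted(counts.items()):
        print(dev, level, n)
    for dev in sorted(lbas):
        print(dev, "LBAs:", lbas[dev])
```

Feeding it the contents of /var/log/messages instead of SAMPLE should show
whether both mirror halves report errors at around the same time.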
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: Michael.Reifenberger at plaut.de | Priv: Michael at Reifenberger.com
http://www.plaut.de | http://www.Reifenberger.com