Infinite loop in GEOM_JOURNAL when device dies
Marcin Wisnicki
mwisnicki+freebsd at gmail.com
Sun Oct 25 01:00:10 UTC 2009
Hello,
My system looks like this:
FreeBSD 7.2-STABLE #3: Sat Oct 17 20:50:32 CEST 2009
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <WD 2500BMV External 1.05> Fixed Direct Access SCSI-4 device
da1: 40.000MB/s transfers
da1: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_JOURNAL: Journal 584260361: da1p1 contains data.
GEOM_JOURNAL: Journal 584260361: da1p1 contains journal.
GEOM_JOURNAL: Journal da1p1 consistent.
/dev/ufs/tank1u on /vol/store/tank1 (ufs, asynchronous, local, noatime, acls, gjournal)
Device da1 is an external WD Passport HDD connected to a powered USB hub.
The UFS filesystem on da1p1.journal is labeled "tank1u".
Unfortunately, from time to time (1 day to many weeks after startup) it
stops working:
umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: BBB bulk-in clear stall failed, IOERROR
umass1: BBB bulk-out clear stall failed, IOERROR
umass1: BBB reset failed, IOERROR
umass1: at uhub4 port 2 (addr 4) disconnected
GEOM_JOURNAL: Error while reading data from da1p1 (error=22).
At this point I usually get:
panic: ufs_dirbad: /vol/store/tank1: bad dir ino 2 at offset 0: mangled entry
That is unfortunate, but at least the system recovers by itself.
However, today it did not panic; instead the following happened:
(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.
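For reference, the error=N numbers in these messages look like ordinary
errno values. The small lookup below decodes the four that appear in
this report (22, 16, 6 and 5). The names and descriptions are copied
from FreeBSD's sys/errno.h and are hardcoded so the result does not
depend on the machine running the script; decode() is just a
hypothetical helper name for illustration.

```python
# errno values as defined in FreeBSD's sys/errno.h. They are hardcoded
# here so the lookup does not depend on the platform running the script.
FREEBSD_ERRNO = {
    5: ("EIO", "Input/output error"),
    6: ("ENXIO", "Device not configured"),
    16: ("EBUSY", "Device busy"),
    22: ("EINVAL", "Invalid argument"),
}


def decode(code):
    """Return 'NAME (description)' for an error number seen in the log."""
    name, text = FREEBSD_ERRNO.get(code, ("?", "not in this table"))
    return f"{name} ({text})"


if __name__ == "__main__":
    # The codes in the order they show up in the console output.
    for code in (22, 16, 6, 5):
        print(f"error={code}: {decode(code)}")
```

Read this way, error=16 (EBUSY) on the destroy attempt would fit the
journal still being held open by the mounted filesystem, and error=6
(ENXIO) would fit reads against a provider that is already gone.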
The system was still working, but when I tried "ls /vol/store", I got
this on the serial console:
(da1:umass-sim1:1:0:0): lost device
GEOM_JOURNAL: Lost provider da1p1.
GEOM_JOURNAL: Cannot destroy journal da1p1 (error=16). Destroy it manually after last close.
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
GEOM_JOURNAL: Error while reading data from da1p1 (error=6).
g_vfs_done():ufs/tank1u[READ(offset=254163107960586240, length=16384)]error = 5
bad block 9261869914, ino 432515
g_vfs_done():ufs/tank1u[READ(offset=7447739922700238848, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(o
17932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
[skip many pages]
g_vfs_done():ufs/tank1u[READ(offset=-9220267016017932288, length=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(of
ength=16384)]error = 5
g_vfs_done():ufs/tank1u[READ(offset=18968309645312, length=16384)]error = 5
[infinite loop ?]
(I have unmangled and reformatted the output for readability.)
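As a rough sanity check on the g_vfs_done() lines, the READ offsets can
be compared with the disk size from the da1 probe line (488397168
sectors of 512 bytes). This sketch assumes the logged offset is a byte
offset printed as a signed 64-bit integer; in_range() and
as_unsigned64() are hypothetical helper names. Since da1p1.journal
cannot be larger than da1 itself, anything beyond the whole-disk size is
certainly out of range.

```python
SECTOR_SIZE = 512
DISK_SECTORS = 488397168                 # from the da1 probe line
DISK_BYTES = DISK_SECTORS * SECTOR_SIZE  # upper bound for any valid offset

# Distinct offsets copied from the g_vfs_done() messages.
LOGGED_OFFSETS = [
    254163107960586240,
    7447739922700238848,
    -9220267016017932288,
    18968309645312,
]


def as_unsigned64(value):
    """Reinterpret a signed 64-bit value as unsigned."""
    return value & 0xFFFFFFFFFFFFFFFF


def in_range(offset, length=16384):
    """True if a read of `length` bytes at `offset` fits on the disk."""
    return 0 <= offset and offset + length <= DISK_BYTES


if __name__ == "__main__":
    print(f"disk size: {DISK_BYTES} bytes")
    for off in LOGGED_OFFSETS:
        status = "ok" if in_range(off) else "OUT OF RANGE"
        print(f"{off:>22} (unsigned {as_unsigned64(off)}): {status}")
```

Under that assumption none of the logged offsets fits on a 250 GB disk,
which suggests the filesystem was issuing reads built from garbage
metadata rather than retrying a legitimate block, though that is only a
guess from the log.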
While I could still ping the machine, nothing in userland worked, and the
console was constantly printing GEOM errors and would not accept any
input, so I had to press the reset button.
I think that gjournal should somehow destroy itself if the underlying
provider dies, just like the provider of an unplugged disk does.
UFS has been supposed to handle disappearing devices for some time now,
and even though this does not really work yet, a panic is better than an
infinite loop.
The SMART log shows some READ DMA EXT errors but no permanent damage. I
have smartd doing periodic testing; the tests complete without failure
and all error counters remain at 0.
BTW, the drive worked fine in Windows; it just "stalled" for a moment sometimes.