geom_mirror was stale, now broken
Paul Mather
paul at gromit.dlib.vt.edu
Wed Feb 16 09:55:41 PST 2005
Because of the annoying ATA regression that crept into 5.3, I
semi-regularly get "TIMEOUT - WRITE_DMA" errors that ultimately cause a
drive to be removed from my geom_mirror configuration. :-(
Previously, the drive suffering the WRITE_DMA problem would be marked as
a "stale" provider during boot. Recently, this appears to have changed,
and the provider is listed as "broken." E.g.:
Feb 16 05:21:45 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: FAILURE - WRITE_DMA timed out
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Cannot update metadata on disk ad2 (error=5).
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 disconnected.
[[...]]
Feb 16 11:48:37 zappa kernel: FreeBSD 6.0-CURRENT #0: Fri Feb 11 09:03:49 EST 2005
[[...]]
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1 created (id=723259611).
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Component ad2 (device raid1) broken, skipping.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 activated.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider mirror/raid1 launched.
One artifact of the mirror provider being marked as "broken" is that I
can no longer simply rebuild onto it. Now, I have to "gmirror forget"
and then "gmirror insert" the "broken" provider back into the mirror and
then rebuild onto it.
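
As a rough sketch, that sequence corresponds to something like the
following. This is a dry run that only prints the commands; the function
name readd_broken_component is made up for illustration, and the mirror
and disk names (raid1, ad2) are taken from the log above. Whether the
explicit rebuild step is needed after the insert may depend on the
mirror's settings, so treat that line as an assumption.

```shell
# Dry-run sketch: prints the recovery commands instead of running them.
# readd_broken_component is a hypothetical helper, not part of gmirror.
readd_broken_component() {
    mirror=$1
    disk=$2
    # Forget components that are no longer connected to the mirror.
    echo "gmirror forget $mirror"
    # Add the disk back to the mirror as a new component.
    echo "gmirror insert $mirror $disk"
    # Assumption: an explicit rebuild is still wanted after the insert.
    echo "gmirror rebuild $mirror $disk"
}

readd_broken_component raid1 ad2
```

Removing the echo wrappers would run the commands for real, which
requires root and should only be done once the disk is known to be good.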
Under what circumstances is a mirror provider considered "broken" as
opposed to "stale"?
BTW, here is the current status of my geom_mirror (it is currently
rebuilding):
Geom name: raid1
State: DEGRADED
Components: 2
Balance: split
Slice: 4096
Flags: NOAUTOSYNC
GenID: 4
SyncID: 12
ID: 723259611
Providers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r6w5e5
Consumers:
1. Name: ad0
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 4
   SyncID: 12
   ID: 1971505175
2. Name: ad2
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 1
   Flags: DIRTY, SYNCHRONIZING, FORCE_SYNC
   GenID: 4
   SyncID: 12
   Synchronized: 89%
   ID: 3025777059

Geom name: raid1.sync
Consumers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r1w0e0
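
A listing like the one above can be condensed to one line per component,
which makes it easier to watch a rebuild. The following is a minimal
sketch that assumes the Name:/State:/Synchronized: field layout shown in
the listing; summarize_components is a hypothetical helper name, not
something gmirror provides.

```shell
# Hypothetical helper: reads `gmirror list`-style text on stdin and prints
# "<component> <state> [<synchronized>]" for each consumer that has a State.
summarize_components() {
    awk '
        # A new geom section starts; we are no longer inside a consumer list.
        $1 == "Geom" && $2 == "name:" { in_consumers = 0 }
        $1 == "Consumers:"            { in_consumers = 1; next }
        !in_consumers                 { next }
        # Lines such as "1. Name: ad0" start a new consumer entry.
        $2 == "Name:"                 { name = $3; order[++n] = name }
        $1 == "State:"                { state[name] = $2 }
        $1 == "Synchronized:"         { sync[name] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                # Consumers without a State line (e.g. under raid1.sync) are skipped.
                if (!(c in state)) continue
                printf "%s %s%s\n", c, state[c], ((c in sync) ? " " sync[c] : "")
            }
        }
    '
}

# Example with a trimmed copy of the listing above.
summarize_components <<'EOF'
Geom name: raid1
State: DEGRADED
Consumers:
1. Name: ad0
State: ACTIVE
2. Name: ad2
State: SYNCHRONIZING
Synchronized: 89%
EOF
```

On a live system the same function could be fed from the real command,
e.g. "gmirror list raid1 | summarize_components".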
I notice the "GenID" of the providers increases at every breakage. What
is the GenID? (It seems like a relatively recent addition.)
I've also noticed that geom_mirror appears less resilient to drive
"failures" nowadays than it did before. I had to do a hard reset this
morning because my system "froze", apparently unable to do any disk
I/O. :-(
Alas, I can't get any crashdumps (I'm using GBDE-encrypted swap on my
geom_mirror), and can't set up a serial console at this time because the
system has only one serial port and it's being used for my olde Apple
LaserWriter II right now. :-)
BTW, I run X. Is it possible to break to the debugger from the regular
console should the system freeze again like it did this morning? If so,
how do I do that?
Cheers,
Paul.
--
e-mail: paul at gromit.dlib.vt.edu
"Without music to decorate it, time is just a bunch of boring production
deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa