geom_mirror was stale, now broken

Paul Mather paul at gromit.dlib.vt.edu
Wed Feb 16 09:55:41 PST 2005


Because of the annoying ATA regression that crept into 5.3, I
semi-regularly get "TIMEOUT - WRITE_DMA" errors that ultimately cause a
drive to be removed from my geom_mirror configuration. :-(

Previously, the drive suffering the WRITE_DMA problem would be marked as
a "stale" provider during boot.  Recently, this appears to have changed,
and the provider is listed as "broken."  E.g.:

Feb 16 05:21:45 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=49981679
Feb 16 05:21:50 zappa kernel: ad2: FAILURE - WRITE_DMA timed out
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Cannot update metadata on disk ad2 (error=5).
Feb 16 05:21:50 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 disconnected.
[[...]]
Feb 16 11:48:37 zappa kernel: FreeBSD 6.0-CURRENT #0: Fri Feb 11 09:03:49 EST 2005
[[...]]
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1 created (id=723259611).
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad2 detected.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Component ad2 (device raid1) broken, skipping.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider ad0 activated.
Feb 16 11:48:37 zappa kernel: GEOM_MIRROR: Device raid1: provider mirror/raid1 launched.


One consequence of the mirror provider being marked as "broken" is that
I can no longer simply rebuild onto it.  Now I have to run "gmirror
forget", then "gmirror insert" the "broken" provider back into the
mirror, and only then rebuild onto it.
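
A minimal sketch of that workaround, for reference.  It only wraps the
three gmirror(8) subcommands named above (forget, insert, rebuild), in
the order described.  The helper name readd_broken_component is
hypothetical, and the mirror and provider names are the ones from this
machine; by default it only returns the commands rather than running
them.

```python
import subprocess


def readd_broken_component(mirror, provider, dry_run=True):
    """Build (and optionally run) the forget/insert/rebuild sequence.

    Hypothetical helper: the name and the dry_run switch are
    illustrative, not part of gmirror itself.
    """
    commands = [
        # Drop the mirror's record of components that are not connected.
        ["gmirror", "forget", mirror],
        # Add the disk back into the mirror as a component.
        ["gmirror", "insert", mirror, provider],
        # Rebuild onto the re-added component, as described above.
        ["gmirror", "rebuild", mirror, provider],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in readd_broken_component("raid1", "ad2"):
        print(" ".join(cmd))
```

Keeping dry_run=True as the default means nothing touches the mirror
until the printed commands have been checked by hand.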

Under what circumstances is a mirror provider considered "broken" as
opposed to "stale"?

BTW, here is the current status of my geom_mirror (it is currently
rebuilding):

Geom name: raid1
State: DEGRADED
Components: 2
Balance: split
Slice: 4096
Flags: NOAUTOSYNC
GenID: 4
SyncID: 12
ID: 723259611
Providers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r6w5e5
Consumers:
1. Name: ad0
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 4
   SyncID: 12
   ID: 1971505175
2. Name: ad2
   Mediasize: 25590620160 (24G)
   Sectorsize: 512
   Mode: r1w1e1
   State: SYNCHRONIZING
   Priority: 1
   Flags: DIRTY, SYNCHRONIZING, FORCE_SYNC
   GenID: 4
   SyncID: 12
   Synchronized: 89%
   ID: 3025777059

Geom name: raid1.sync
Consumers:
1. Name: mirror/raid1
   Mediasize: 25590619648 (24G)
   Sectorsize: 512
   Mode: r1w0e0
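
To keep an eye on a rebuild without reading the whole listing each
time, the status text above can be parsed mechanically.  The following
is only a sketch: it assumes the output keeps the "Geom name:" /
"Consumers:" / "N. Name:" layout shown above, and the function name
consumer_states is hypothetical.  The sample text is an abbreviated
copy of the listing above.

```python
import re

# Abbreviated copy of the status listing shown above.
SAMPLE = """\
Geom name: raid1
State: DEGRADED
Providers:
1. Name: mirror/raid1
   Mode: r6w5e5
Consumers:
1. Name: ad0
   State: ACTIVE
2. Name: ad2
   State: SYNCHRONIZING
   Synchronized: 89%

Geom name: raid1.sync
Consumers:
1. Name: mirror/raid1
   Mode: r1w0e0
"""


def consumer_states(listing, geom):
    """Return {consumer name: {field: value}} for one geom in the listing."""
    consumers = {}
    in_geom = False       # inside the requested "Geom name:" block
    in_consumers = False  # inside that block's "Consumers:" section
    current = None        # field dict of the consumer being read
    for line in listing.splitlines():
        text = line.strip()
        if text.startswith("Geom name:"):
            in_geom = text.split(":", 1)[1].strip() == geom
            in_consumers = False
            current = None
        elif text in ("Providers:", "Consumers:"):
            in_consumers = in_geom and text == "Consumers:"
            current = None
        elif in_consumers:
            match = re.match(r"\d+\.\s+Name:\s+(\S+)", text)
            if match:
                current = consumers.setdefault(match.group(1), {})
            elif current is not None and ":" in text:
                key, _, value = text.partition(":")
                current[key.strip()] = value.strip()
    return consumers


if __name__ == "__main__":
    for name, fields in consumer_states(SAMPLE, "raid1").items():
        print(name, fields.get("State"), fields.get("Synchronized", ""))
```

Passing the geom name explicitly keeps the raid1.sync entry, which
lists mirror/raid1 as its own consumer, out of the result.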


I notice the "GenID" of the providers increases at every breakage.  What
is the GenID?  (It seems like a relatively recent addition.)

I've also noticed that geom_mirror appears less resilient to drive
"failures" nowadays than it was before.  I had to do a hard reset this
morning because my system "froze", apparently unable to do any disk
I/O. :-(

Alas, I can't get any crashdumps (I'm using GBDE-encrypted swap on my
geom_mirror), and can't set up a serial console at this time because the
system has only one serial port and it's being used for my olde Apple
LaserWriter II right now. :-)

BTW, I run X.  Is it possible to break to the debugger from the regular
console should the system freeze again like it did this morning?  If so,
how do I do that?

Cheers,

Paul.
-- 
e-mail: paul at gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa

