gmirror disk fail questions...

Gary Newcombe gary at pattersonsoftware.com
Fri Apr 18 04:38:25 UTC 2008


Hi all,

Yesterday, after users complained of strange things happening in their
accounting package, I rebooted the server, only to find that it never
came back up. gmirror was complaining about ad6 in the RAID, and the
server had hung while bringing the mirror up (this has now happened twice).

uname -a
FreeBSD mesh.lhshoses.com.au 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Thu
Jan 18 22:55:39 EST 2007
gary at mesh.lhshoses.com.au:/usr/obj/usr/src/sys/MESH  i386

After a hard reboot, provider ad4 was available, ad6 timed out and the
server booted.

dmesg

ad4: 76324MB <WDC WD800JD-23LSA0 07.01D07> at ata2-master SATA150
ad6: 76324MB <WDC WD800JD-23LSA0 07.01D07> at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=3803006992).
GEOM_MIRROR: Device gm0: provider ad4 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Trying to mount root from ufs:/dev/mirror/gm0s1a


[mesh:/var/log]# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad4


Looking in /dev/, however, we have:

crw-r-----  1 root  operator    0,  83 17 Apr 13:58 ad4
crw-r-----  1 root  operator    0,  91 17 Apr 13:58 ad4s1
crw-r-----  1 root  operator    0,  84 17 Apr 13:58 ad6
crw-r-----  1 root  operator    0,  92 17 Apr 13:58 ad6a
crw-r-----  1 root  operator    0,  99 17 Apr 13:58 ad6as1
crw-r-----  1 root  operator    0,  93 17 Apr 13:58 ad6b
crw-r-----  1 root  operator    0,  94 17 Apr 13:58 ad6c
crw-r-----  1 root  operator    0, 100 17 Apr 13:58 ad6cs1
crw-r-----  1 root  operator    0,  95 17 Apr 13:58 ad6d
crw-r-----  1 root  operator    0,  96 17 Apr 13:58 ad6e
crw-r-----  1 root  operator    0,  97 17 Apr 13:58 ad6f
crw-r-----  1 root  operator    0,  98 17 Apr 13:58 ad6s1
crw-r-----  1 root  operator    0, 101 17 Apr 13:58 ad6s1a
crw-r-----  1 root  operator    0, 102 17 Apr 13:58 ad6s1b
crw-r-----  1 root  operator    0, 103 17 Apr 13:58 ad6s1c
crw-r-----  1 root  operator    0, 104 17 Apr 13:58 ad6s1d
crw-r-----  1 root  operator    0, 105 17 Apr 13:58 ad6s1e
crw-r-----  1 root  operator    0, 106 17 Apr 13:58 ad6s1f

I am guessing that a failing disk is responsible for the data
corruption, but I have no errors in /var/log/messages or console.log.
On every boot the mirror is marked clean, and there are no warnings
about a failing disk anywhere. Where should I be looking, or what
should I be doing, to get such warnings?
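
For what it's worth, here is a minimal sketch of the kind of check that
could raise such a warning, for example from cron. It reads gmirror
status output on stdin and flags any mirror whose status is not
COMPLETE. The function name check_mirror_status is hypothetical (it is
not part of gmirror), and the sketch assumes the column layout of the
gmirror status output quoted earlier in this message.

```shell
#!/bin/sh
# Sketch only: check_mirror_status is a hypothetical helper, not a gmirror command.
# It reads `gmirror status` output on stdin, so it can be tried on saved output.
# Assumption: each mirror line looks like "mirror/<name>  <STATUS>  <component>".
check_mirror_status() {
    awk '
        # Only lines whose first field is a mirror name carry a status;
        # the header, blank lines and extra component lines are ignored.
        $1 ~ /^mirror\// && $2 != "COMPLETE" {
            printf "WARNING: %s is %s\n", $1, $2
            bad = 1
        }
        # Exit non-zero if any mirror was not COMPLETE.
        END { exit bad + 0 }
    '
}
```

It would be used as: gmirror status | check_mirror_status
A non-zero exit status means at least one mirror was reported as
something other than COMPLETE, so the caller can decide how to alert.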

Also, how come, if ad4 is the working disk, ad4's slices seem to be
labelled as ad6? What's going on here? To me, ad6 appears to have the
correct labelling for the mirror, from ad6s1a-f.

How can I test for sure whether the disk is damaged or dying, or
whether this is just a temporary glitch in the mirror? This is the
first time I've had a gmirror raid give me problems.

Assuming ad6 has been deactivated/disconnected, I was thinking of
trying:

gmirror activate gm0 ad6
gmirror rebuild gm0 ad6

Is this safe?

I haven't tried pulling either disk from the server as I am remote from
the site.

Cheers, Gary.


More information about the freebsd-questions mailing list