Crashed gmirror, single disk marked SYNC and won't boot...

Johan Ström johan at stromnet.se
Tue Aug 21 05:32:44 PDT 2007


Hi

FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7:
Tue Feb 13 18:24:34 CET 2007
johan at elfi.stromnet.se:/usr/obj/usr/src/sys/ROUTER.POLLING  i386

(ROUTER.POLLING is GENERIC + options DEVICE_POLLING and ALTQ, IPSEC,
also pfsync and carp)

This weekend a disk failed on me in a machine running gmirror gm0
with 2 providers (ad0 and ad6). The whole box froze with no screen
output, and on hard reboot I got some LBA errors etc. from ad0. After
a few reboots it came up and ran, though (I wasn't at the screen and
had to do it by phone, so I couldn't really debug very well).

As soon as the box was up, I removed ad0 from the gmirror, so ad6 was
the only provider. Today I got a new disk to replace ad0. Now
remember, ad6 was the only disk in the mirror. I took the box down
cleanly and replaced the disk. ad0 was now gone and instead I had ad4
(ad4 and ad6 are SATA, ad0 was IDE). I changed things so that I booted
off the old SATA disk.

That was the first problem: the boot loader gave me the usual options,
F1 FreeBSD, F5 Disk 2 (or whatever it said). If I pressed F1 I got the
same prompt again, and F5 did nothing at all. Funny! The system
refused to load the loader (or whatever the 1-9 menu thingy is
called), the kernel, or anything else.

So I finally plugged the old ad0 disk back into the machine to at
least get it booted, thinking it would come up on the gmirror. Nope:

(the new ad4 was unplugged at this point)
ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
GEOM_MIRROR: Device gm0 created (id=4029378995).
GEOM_MIRROR: Device gm0: provider ad6 detected.
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
Root mount waiting for: GMIRROR
GEOM_MIRROR: Force device gm0 start due to timeout.
Trying to mount root from ufs:/dev/mirror/gm0s1a

Manual root filesystem specification:
   <fstype>:<device>  Mount <device> using filesystem <fstype>
                        eg. ufs:da0s1a
   ?                  List valid disk boot devices
   <empty line>       Abort manual input

mountroot>

Okay... so why wouldn't it load my mirror from ad6 now? I had just
done a clean shutdown without problems. It didn't even recognize any
slices on ad6s1 (although ad6s1 itself was found).
I entered ad0s1 as root and booted from there. Of course I ended up in
the emergency shell, since fstab pointed at the gmirror devices, which
didn't exist.

Digging further into gmirror, I ran gmirror dump ad6:

Metadata on /dev/ad6:
      magic: GEOM::MIRROR
    version: 3
       name: gm0
        mid: 4029378995
        did: 449032193
        all: 3
      genid: 0
     syncid: 5
   priority: 0
      slice: 4096
    balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
     mflags: NONE
     dflags: SYNCHRONIZING
hcprovider:
   provsize: 160041885696
   MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
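
To make the numbers in the dump above easier to compare, here is a
small sketch in Python (not part of gmirror) that parses text in the
"key: value" layout shown above and reports mediasize and provsize in
GiB, plus whether dflags contains SYNCHRONIZING. The names
parse_gmirror_dump, summarize and SAMPLE are made up for illustration,
and the sketch assumes the dump layout is exactly as pasted here.

```python
GIB = 1024 ** 3  # bytes per GiB


def parse_gmirror_dump(text):
    """Turn 'key: value' lines of gmirror dump output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like GEOM::MIRROR stay whole.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key.startswith("Metadata on"):
            continue  # skip lines without a colon and the header line
        fields[key] = value.strip()
    return fields


def summarize(fields):
    """Report both sizes in GiB and whether the disk is flagged as synchronizing."""
    return {
        "mirror_gib": round(int(fields["mediasize"]) / GIB, 2),
        "provider_gib": round(int(fields["provsize"]) / GIB, 2),
        "synchronizing": "SYNCHRONIZING" in fields["dflags"],
    }


# Values copied from the dump in this mail (only the fields used here).
SAMPLE = """\
Metadata on /dev/ad6:
      magic: GEOM::MIRROR
       name: gm0
mediasize: 20416757248
     dflags: SYNCHRONIZING
   provsize: 160041885696
"""

if __name__ == "__main__":
    print(summarize(parse_gmirror_dump(SAMPLE)))
```

For the dump above this gives a mediasize of about 19 GiB against a
provsize of about 149 GiB, with the synchronizing flag set.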

Some googling indicated that SYNCHRONIZING means the disk is not
"complete" and won't mount. Is that correct? Why would it be in that
state, when I had just shut it down cleanly? And where the f*ck did my
slices go?

I set sysctl kern.geom.mirror.debug=2 and tried to gmirror activate
the mirror:

GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.
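
To pick the main events out of the debug output above, here is a
rough Python sketch that keeps only messages at or below a given
level. It assumes the number in brackets after GEOM_MIRROR is the
debug level of each message; the function name messages_at_or_below
and SAMPLE_LOG are made up for illustration.

```python
import re

# Matches lines such as "GEOM_MIRROR[1]: Adding disk ad6 to gm0."
LINE_RE = re.compile(r"^GEOM_MIRROR\[(\d+)\]: (.*)$")


def messages_at_or_below(log_text, max_level):
    """Return the message text of every log line whose level is <= max_level."""
    kept = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group(1)) <= max_level:
            kept.append(match.group(2))
    return kept


# Log lines copied from this mail.
SAMPLE_LOG = """\
GEOM_MIRROR[1]: Creating device gm0 (id=4029378995).
GEOM_MIRROR[0]: Device gm0 created (id=4029378995).
GEOM_MIRROR[1]: root_mount_hold 0xc3539510
GEOM_MIRROR[1]: Adding disk ad6 to gm0.
GEOM_MIRROR[2]: Adding disk ad6.
GEOM_MIRROR[2]: Disk ad6 connected.
GEOM_MIRROR[1]: Disk ad6 state changed from NONE to NEW (device gm0).
GEOM_MIRROR[0]: Device gm0: provider ad6 detected.
GEOM_MIRROR[2]: Tasting ad6s1.
GEOM_MIRROR[0]: Force device gm0 start due to timeout.
GEOM_MIRROR[1]: root_mount_rel[2169] 0xc3539510
GEOM_MIRROR[2]: No I/O requests for gm0, it can be destroyed.
GEOM_MIRROR[2]: Metadata on ad6 updated.
GEOM_MIRROR[2]: Access ad6 r-1w-1e-1 = 0
GEOM_MIRROR[0]: Device gm0 destroyed.
GEOM_MIRROR[1]: Thread exiting.
GEOM_MIRROR[1]: Consumer ad6 destroyed.
"""

if __name__ == "__main__":
    for message in messages_at_or_below(SAMPLE_LOG, 0):
        print(message)
```

At level 0 this leaves four messages: the device being created, ad6
being detected, the forced start due to timeout, and the device being
destroyed again.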


So... what is going on here? Anyone with some clues? I'm currently
running on the ad0 disk, with no RAID at all. Let's hope it doesn't
die on me (I haven't seen any signs of that since Sunday, when it
froze and gave boot errors, so I'm hoping). The data loss from using
ad0 instead of ad6 is probably minimal; it's a router, so more or less
only logs seem to have been lost. For now I just want to get clear
about what happened here and how to prevent it, and how to get back
onto a gmirror with ad6 and ad4 (to be plugged in) so I can throw ad0
out.


Thanks

--
Johan Ström
Stromnet
johan at stromnet.se
http://www.stromnet.se/




More information about the freebsd-geom mailing list