Crashed gmirror, single disk marked SYNC and won't boot...

Johan Ström johan at stromnet.se
Tue Aug 21 08:53:37 PDT 2007


On Aug 21, 2007, at 16:31 , Pawel Jakub Dawidek wrote:

> On Tue, Aug 21, 2007 at 02:15:08PM +0200, Johan Ström wrote:
>> Hi
>>
>> FreeBSD gw-1.stromnet.se 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #7:
>> Tue Feb 13 18:24:34 CET 2007     johan at elfi.stromnet.se:/usr/obj/usr/
>> src/sys/ROUTER.POLLING  i386
>>
>> (ROUTER.POLLING is GENERIC  + options DEVICE_POLLING  and ALTQ,
>> IPSEC, also pfsync and carp)
>>
>> This weekend I had a disk fail on me in a machine running gmirror
>> gm0 with 2 providers (ad0 and ad6). The whole box froze with no
>> screen output, and on hard reboot I got some LBA errors etc. from ad0.
>> After a few reboots it got up and running, though (I wasn't at the
>> screen and had to do it by phone, so I couldn't really debug very
>> well).
>> As soon as the box was up, I removed ad0 from the gmirror, so ad6 was
>> the only provider. Today I got a new disk to replace ad0.
>> Now remember, ad6 was the only disk in the mirror. I took the box down
>> fine and replaced the disk. ad0 was now gone and instead I had ad4
>> (ad4 and ad6 are SATA, ad0 was IDE). I changed things so that I
>> booted off the old SATA disk.
>> Okay, there came the first problem: the boot loader gave me the usual
>> options, F1 FreeBSD, F5 Disk 2 (or whatever it said). If I pressed F1
>> I got the same prompt again; F5 did nothing at all. Funny! The
>> system refused to load the loader (or whatever the 1-9 menu thingy is
>> called), the kernel, or anything.
>> So I finally plugged the old ad0 disk into the machine to at least
>> get it booted, thinking it would come up on the gmirror. Nope:
>>
>> (got the new ad4 out here)
>> ad0: 38166MB <WDC WD400BB-00CAA1 17.07W17> at ata0-master UDMA100
>> ad6: 152627MB <SAMSUNG HD160JJ ZM100-41> at ata3-master SATA150
>> GEOM_MIRROR: Device gm0 created (id=4029378995).
>> GEOM_MIRROR: Device gm0: provider ad6 detected.
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> Root mount waiting for: GMIRROR
>> GEOM_MIRROR: Force device gm0 start due to timeout.
>> Trying to mount root from ufs:/dev/mirror/gm0s1a
>>
>> Manual root filesystem specification:
>>   <fstype>:<device>  Mount <device> using filesystem <fstype>
>>                        eg. ufs:da0s1a
>>   ?                  List valid disk boot devices
>>   <empty line>       Abort manual input
>>
>> mountroot>
>>
>> Okay... so why wouldn't it load my mirror from ad6 now? I just did a
>> clean shutdown without problems. It didn't even recognize any slices
>> on ad6s1 (although ad6s1 itself was found)...
>
> It loaded your mirror just fine, you confuse things. Gmirror started in
> degraded state, as one could expect, but it seems there is no 'a'
> partition on your gm0s1 slice (or the entire bsdlabel is gone).
> You could try to recreate it based on the bsdlabel from ad0 (if it
> should be the same), but I've no idea how it disappeared. Anyway,
> gmirror seems to work properly.

Okay. So it tries to load, finds no partition table, and then ignores
and unloads gm0?

>
>> Some more digging into gmirror, I did a gmirror dump ad6:
>>
>> Metadata on /dev/ad6:
>>      magic: GEOM::MIRROR
>>    version: 3
>>       name: gm0
>>        mid: 4029378995
>>        did: 449032193
>>        all: 3
>
> You have a 3-way mirror?

Uhm.. never had more than 2 disks in this machine..

>
>>      genid: 0
>>     syncid: 5
>>   priority: 0
>>      slice: 4096
>>    balance: round-robin
>> mediasize: 20416757248
>> sectorsize: 512
>> syncoffset: 0
>>     mflags: NONE
>>     dflags: SYNCHRONIZING
>> hcprovider:
>>   provsize: 160041885696
>>   MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
>
> BTW. Your provider size is 149GB and the mirror only uses 19GB, which
> means you mirrored a 149GB disk with a 19GB disk and you waste 130GB
> (it's unused).

Yes, the ad0 disk was (is) only 40GB, so only the first 40GB of ad6
was in the mirror (the rest was in another slice with its own label,
although when I run fdisk on the disk it seems not to be there at
all).
But hmm, 19? It should be 40 (or somewhere around there at
least). From the ad0 mount:
Filesystem           1K-blocks     Used     Avail Capacity  Mounted on
/dev/ad0s1a             507630    85142    381878    18%    /
/dev/ad0s1e             507630       20    467000     0%    /tmp
/dev/ad0s1f           10154158  1176410   8165416    13%    /usr
/dev/ad0s1d            1506190    80326   1305370     6%    /var
/dev/ad0s1g           24174212  6939804  15300472    31%    /var/squid
swapinfo:
/dev/ad0s1b       1022536        0  1022536     0%

~35GB...
I compared slice 1 on ad0 and ad6; both have exactly the same size.
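
For reference, the arithmetic behind these numbers can be sketched in
Python. The figures are copied from the gmirror dump and the df/swapinfo
output; the helper name is made up for illustration, and the sketch
assumes mediasize/provsize are plain byte counts and the df column is in
1024-byte blocks.

```python
GIB = 1024 ** 3  # bytes per GiB


def bytes_to_gib(n):
    """Convert a byte count to GiB (hypothetical helper)."""
    return n / GIB


# Values from the gmirror dump of ad6.
mediasize = 20416757248   # size of the mirror as recorded in the metadata
provsize = 160041885696   # size of the provider (ad6)

# 1K-blocks from df and swapinfo on ad0s1.
blocks_1k = {
    "/": 507630,
    "/tmp": 507630,
    "/usr": 10154158,
    "/var": 1506190,
    "/var/squid": 24174212,
    "swap": 1022536,
}

total_kib = sum(blocks_1k.values())
total_bytes = total_kib * 1024

print("mediasize: %.2f GiB" % bytes_to_gib(mediasize))
print("provsize:  %.2f GiB" % bytes_to_gib(provsize))
print("unused:    %.2f GiB" % bytes_to_gib(provsize - mediasize))
print("ad0s1 filesystems + swap: %.2f GiB" % bytes_to_gib(total_bytes))
print("fits inside mediasize:", total_bytes <= mediasize)
```

The df figures exclude filesystem overhead, so the real slice is somewhat
larger than the sum computed here.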

>
>> Some googling indicated that SYNCHRONIZING means that it's not
>> "complete" and won't mount? Is that correct? Why would it be in that
>> state then, I just shut it down fine... And where the f*ck did my
>> slices go??..
>
> SYNCHRONIZING means that this component was/is being synchronized. It
> seems that you removed/lost the master disk while it was synchronizing.
> It should work anyway.

Okay, that's odd. ad6 was the only disk in the mirror when I shut down
(shutdown -p now, and it powered off by itself), so it should have
been good.
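
The two oddities in the dump (all: 3 on a machine that only ever had two
disks, and dflags: SYNCHRONIZING after a clean shutdown) can be picked
out mechanically. The following is only a sketch: the field values are
copied from the gmirror dump output quoted in this mail, while the
function names and the expected component count of 2 are assumptions
made for illustration.

```python
# Field values copied from the "gmirror dump ad6" output in this thread.
DUMP = """\
magic: GEOM::MIRROR
version: 3
name: gm0
mid: 4029378995
did: 449032193
all: 3
genid: 0
syncid: 5
priority: 0
slice: 4096
balance: round-robin
mediasize: 20416757248
sectorsize: 512
syncoffset: 0
mflags: NONE
dflags: SYNCHRONIZING
hcprovider:
provsize: 160041885696
MD5 hash: 6e1e8ca80a27e0e1b0460feab595c39f
"""


def parse_dump(text):
    """Split 'key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so "GEOM::MIRROR" stays intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def oddities(fields, expected_components=2):
    """Return notes about fields that do not match what was expected."""
    notes = []
    if int(fields["all"]) != expected_components:
        notes.append("metadata says %s components, expected %d"
                     % (fields["all"], expected_components))
    if "SYNCHRONIZING" in fields["dflags"]:
        notes.append("component is flagged as SYNCHRONIZING")
    return notes


for note in oddities(parse_dump(DUMP)):
    print(note)
```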

>
> BTW. You confuse things again. Your slice is just fine (ad6s1), you
> don't have partitions, AFAIU.

Seems I did, yes, thanks. Disks have slices (which in the
Windows/DOS/Linux world are called partitions), which in turn have
partitions. Check :)
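
To make the naming concrete, here is a small Python sketch that splits a
device name such as ad6s1a into disk, slice and partition. The regular
expression and function name are made up for illustration and only cover
the simple names that appear in this thread (letters a-h for partitions
is an assumption).

```python
import re

# <disk><unit>[s<slice>][<partition letter>], e.g. ad6, ad6s1, ad6s1a
DEVICE_RE = re.compile(
    r"^(?P<disk>[a-z]+\d+)(?:s(?P<slice>\d+))?(?P<partition>[a-h])?$"
)


def split_device(name):
    """Return (disk, slice number or None, partition letter or None)."""
    base = name.rsplit("/", 1)[-1]  # drop /dev/ or /dev/mirror/ prefixes
    match = DEVICE_RE.match(base)
    if match is None:
        raise ValueError("unrecognized device name: %r" % name)
    slice_no = match.group("slice")
    return (
        match.group("disk"),
        int(slice_no) if slice_no is not None else None,
        match.group("partition"),
    )


for dev in ("ad6", "ad6s1", "ad0s1a", "/dev/mirror/gm0s1a"):
    print(dev, "->", split_device(dev))
```

In these terms, ad6s1 (the slice) was still found, while ad6s1a (a
partition inside it) was not.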

>
> All in all, your partition table seems to be gone. If you created it on
> gmirror before (gm0s1) you may still have the same partition table on
> the other half of the mirror. You can try to move it to ad6 with
> bsdlabel and verify if you can see file system inside partitions.

Okay, tried that now. I saved the ad0s1 label and loaded it onto ad6s1.
Now I have the same partition table on ad6s1 as on ad0s1.
Trying to mount any of the partitions, though, gives me "incorrect super
block", and fsck cannot find any superblocks either.

So, what to do now? Just forget ad6 and start from scratch
from ad2? (As I said, the data isn't very old, really.)

I'm thinking about doing a complete reinstall on ad4+ad6, then. Can I do
that? fdisk both disks with one full-size slice each, create a new
gmirror between ad6s1/ad4s1 (or should I go with ad4/ad6?), create the
partitions, and use dump | restore (of course with apps shut down so no
data changes, or at least nothing that I care about) to copy all files
from ad2 to the new mirror. What more do I need to do? bsdlabel -B on
both to write boot blocks? Is there anything else to think about?
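
The plan above can be written down as a dry run. This is only a sketch of
one possible command sequence, not a verified procedure: the Python below
just builds the command lines as strings and never executes anything. The
device and mirror names (ad4, ad6, gm0) come from this thread; the
function name, the /mnt mount point, the choice of flags and the extra
"gmirror clear" step for the stale metadata on ad6 are assumptions that
should be checked against the man pages before running any of it.

```python
def build_plan(disks=("ad4", "ad6"), stale=("ad6",), mirror="gm0",
               source="/", target="/mnt"):
    """Return the planned commands as strings; nothing is executed."""
    slices = [d + "s1" for d in disks]
    plan = []
    # Assumption: wipe old gmirror metadata from disks that carried it.
    for d in stale:
        plan.append("gmirror clear %s" % d)
    # One FreeBSD slice covering each whole disk, plus MBR boot code.
    for d in disks:
        plan.append("fdisk -BI %s" % d)
    # New mirror built on the slices rather than on the whole disks.
    plan.append("gmirror label -v -b round-robin %s %s"
                % (mirror, " ".join(slices)))
    # Standard label and boot blocks, written through the mirror device.
    plan.append("bsdlabel -w -B mirror/%s" % mirror)
    # Only the 'a' partition is shown; other partitions would repeat
    # the newfs/mount/dump steps with their own mount points.
    plan.append("newfs -U /dev/mirror/%sa" % mirror)
    plan.append("mount /dev/mirror/%sa %s" % (mirror, target))
    plan.append("cd %s && dump -0aLf - %s | restore -rf -"
                % (target, source))
    return plan


for command in build_plan():
    print(command)
```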


Thanks for your help..:)


