ZFS help! (solved)
Mike Tancsa
mike at sentex.net
Tue Feb 1 15:17:59 UTC 2011
On 1/31/2011 3:32 PM, Adam Vande More wrote:
> maybe the meta data stuff is stored above it in /tank1/? I don't know. I'm
> pretty sure you can use a newer version of ZFS to rewind the transaction
> groups until you get back to a good state, but there's probably a lot in
> this scenario that would prevent that from being a viable solution. If you
> do get it resolved please post the resolution.
OK, to summarize what happened for the archives. This is RELENG_8 (from
the end of January), on AMD64 with 8G of RAM.
On my DR backup server, which holds backups of backups, I decided to
expand an existing pool. I added a new eSATA cage with an integrated
port multiplier (PM):
2011-01-28.11:45:43 zpool add tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3
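In hindsight, a dry run of the add can catch layout mistakes before
they are committed to the pool; a sketch, assuming the same pool and
device names as above:

```shell
# Sketch: preview the vdev layout 'zpool add' would create without
# modifying the pool (-n is zpool's dry-run flag).
zpool add -n tank1 raidz /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3

# If the printed layout looks right, repeat without -n to commit.
```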
0(offsite)# camcontrol devlist
<WDC WD1001FALS-00J7B1 05.00K05> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD1001FALS-00J7B1 05.00K05> at scbus0 target 1 lun 0 (pass1,ada1)
<WDC WD1001FALS-00J7B1 05.00K05> at scbus0 target 2 lun 0 (pass2,ada2)
<WDC WD1001FALS-00J7B1 05.00K05> at scbus0 target 3 lun 0 (pass3,ada3)
<Port Multiplier 47261095 1f06> at scbus0 target 15 lun 0 (pass4,pmp0)
<WDC WD2001FASS-00U0B0 01.00101> at scbus1 target 0 lun 0 (pass5,ada4)
<WDC WD1501FASS-00W2B0 05.01D05> at scbus1 target 1 lun 0 (pass6,ada5)
<WDC WD1501FASS-00W2B0 05.01D05> at scbus1 target 2 lun 0 (pass7,ada6)
<WDC WD1501FASS-00W2B0 05.01D05> at scbus1 target 3 lun 0 (pass8,ada7)
<WDC WD1501FASS-00W2B0 05.01D05> at scbus1 target 4 lun 0 (pass9,ada8)
<Port Multiplier 47261095 1f06> at scbus1 target 15 lun 0 (pass10,pmp1)
0(offsite)#
The controller is a Sil3134 (siis and ahci drivers).
Shortly after bringing the new set of drives online, the drive cage
failed and began presenting the drives in some odd way such that the
ZFS labels on the drives were no longer readable:
# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
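For reference, a quick way to check all four suspect drives at once
(a sketch, assuming the same device names as in the camcontrol output
above) might look like:

```shell
# Sketch: run 'zdb -l' against each drive in the failed cage and count
# how many of its four ZFS labels fail to unpack.
for d in ada0 ada1 ada2 ada3; do
    bad=$(zdb -l /dev/$d 2>/dev/null | grep -c 'failed to unpack')
    echo "/dev/$d: $bad of 4 labels unreadable"
done
```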
# zpool status -v
  pool: tank1
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       UNAVAIL      0     0     0  insufficient replicas
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ada0    UNAVAIL      0     0     0  cannot open
            ada1    UNAVAIL      0     0     0  cannot open
            ada2    UNAVAIL      0     0     0  cannot open
            ada3    UNAVAIL      0     0     0  cannot open
Pulling the drives out and putting them into a new drive cage allowed me
to see the file system as online, albeit with errors. The next steps
were to delete the two problem files.
On bootup, it looked like:
zpool status -v
  pool: tank1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
        tank1/argus-data:<0xc6>
        /tank1/argus-data/argus-sites-radium
Killed those files via rm, and then zpool status -v showed:

errors: Permanent errors have been detected in the following files:

        tank1/argus-data:<0xc5>
        tank1/argus-data:<0xc6>
        tank1/argus-data:<0xc7>
So I started a scrub, and once it was done there were no errors and all
was clean!
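Condensed, the recovery sequence after swapping cages was roughly the
following (paths as in the error output above):

```shell
# Sketch of the steps described above: remove the two files flagged
# with permanent errors, then scrub so ZFS revisits every block and
# clears the stale error entries.
rm /tank1/argus-data/previous/argus-sites-radium.2011.01.28.16.00
rm /tank1/argus-data/argus-sites-radium
zpool scrub tank1

# Monitor progress; once complete, 'errors:' should report
# 'No known data errors'.
zpool status -v tank1
```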
0(offsite)# zpool status
  pool: tank1
 state: ONLINE
 scrub: scrub completed after 7h32m with 0 errors on Mon Jan 31 23:00:46 2011
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad0     ONLINE       0     0     0
            ad1     ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada6    ONLINE       0     0     0

errors: No known data errors
0(offsite)#
---Mike