ZFS pool faulted (corrupt metadata) but the disk data appears ok...

Michelle Sullivan michelle at sorbs.net
Sun Mar 10 02:35:40 UTC 2019


Turns out the cause of the fire is now known...

https://www.southcoastregister.com.au/story/5945663/homes-left-without-power-after-electrical-pole-destroyed-in-sanctuary-point-accident/

The UPSs couldn’t deal with 11kV coming down the 240V line... (guess I’m lucky no one was killed..)

Anyhow..

Any clues on how to get the pool back would be greatly appreciated.  The “cannot open” disk is the faulted disk that mfid13 was replacing, so being raidz2 all the data should still be there.
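
For anyone willing to help, this is roughly what I’m planning to try next unless someone has a better idea.  It’s only a sketch: /mnt and <txg> are placeholders rather than values from a real run, the mfid device names are taken from the status output quoted below, and I’m assuming -T is supported by this zpool version.

# check that every member device still has readable labels (a bad label
# would explain "one or more of the vdevs are unavailable")
for d in /dev/mfid*; do echo "== $d =="; zdb -l $d | grep -E 'name|guid|state|txg'; done

# dump the uberblocks on one member to find a recent, consistent txg
zdb -lu /dev/mfid0

# try a read-only, no-mount import first so nothing gets written
zpool import -N -f -o readonly=on -R /mnt storage

# if that still fails, try rewinding to an earlier txg from the list above
zpool import -N -f -o readonly=on -T <txg> storage

If it will import read-only, I can at least copy the data off (zfs send or rsync) before trying anything more destructive.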



Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 10 Mar 2019, at 12:29, Michelle Sullivan <michelle at sorbs.net> wrote:
> 
> Michelle Sullivan wrote:
>> Ben RUBSON wrote:
>>>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>>> 
>>>> Ben RUBSON wrote:
>>>> 
>>>>> So disks died because of the carrier, as I assume the second unscathed server was OK...
>>>> 
>>>> Pretty much.
>>>> 
>>>>> Heads must have scratched the platters, but they should have been parked, so... Really strange.
>>>> 
>>>> You'd have thought... though 2 of the drives look like they had wear and tear issues (the 2 not showing red lights) that just weren't picked up by the periodic scrub....  Could be that the recovery showed those up... you know - how you can have an array working fine, but one disk dies and then others fail during the rebuild because of the extra workload.
>>> 
>>> Yes... To try to mitigate this, when I add a new vdev to a pool, I spread the new disks among the existing vdevs and construct the new vdev from the remaining new disk(s) plus disks retrieved from the other vdevs, thus avoiding, where possible, vdevs whose disks all have the same runtime.
>>> However, I only use mirrors; applying this with raid-Z could be a little bit trickier...
>>> 
>> Believe it or not...
>> 
>> # zpool status -v
>>  pool: VirtualDisks
>> state: ONLINE
>> status: One or more devices are configured to use a non-native block size.
>>    Expect reduced performance.
>> action: Replace affected devices with devices that support the
>>    configured block size, or migrate data to a properly configured
>>    pool.
>>  scan: none requested
>> config:
>> 
>>    NAME                       STATE     READ WRITE CKSUM
>>    VirtualDisks               ONLINE       0     0     0
>>      zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>> 
>> errors: No known data errors
>> 
>>  pool: sorbs
>> state: ONLINE
>>  scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on Sat Aug 26 09:26:53 2017
>> config:
>> 
>>    NAME                  STATE     READ WRITE CKSUM
>>    sorbs                 ONLINE       0     0     0
>>      raidz2-0            ONLINE       0     0     0
>>        mfid0             ONLINE       0     0     0
>>        mfid1             ONLINE       0     0     0
>>        mfid7             ONLINE       0     0     0
>>        mfid8             ONLINE       0     0     0
>>        mfid12            ONLINE       0     0     0
>>        mfid10            ONLINE       0     0     0
>>        mfid14            ONLINE       0     0     0
>>        mfid11            ONLINE       0     0     0
>>        mfid6             ONLINE       0     0     0
>>        mfid15            ONLINE       0     0     0
>>        mfid2             ONLINE       0     0     0
>>        mfid3             ONLINE       0     0     0
>>        spare-12          ONLINE       0     0     3
>>          mfid13          ONLINE       0     0     0
>>          mfid9           ONLINE       0     0     0
>>        mfid4             ONLINE       0     0     0
>>        mfid5             ONLINE       0     0     0
>>    spares
>>      185579620420611382  INUSE     was /dev/mfid9
>> 
>> errors: No known data errors
>> 
>> 
>> It would appear that when I replaced the damaged drives, it picked one of them up as still being rebuilt from back in August (before it was packed up to go), which is why it saw 'corrupted metadata' and spent the last 3 weeks importing: it rebuilt the drive as it was importing.  No data loss that I can determine. (It literally just finished in the middle of the night here.)
>> 
> 
> And back to this little nutmeg...
> 
> We had a fire last night ... and the same pool was resilvering again...  The metadata got corrupted..  "zpool import -fFX" worked and it started rebuilding; then, during the early hours, when the pool was about 50% rebuilt/resilvered (one vdev), there was at least one more issue on the powerline...  The UPSs went out after multiple hits and now I can't get it imported.  The server was in single-user mode, booted from a FreeBSD 12 USB stick, so it was only resilvering...  "zdb -AAA -L -uhdi -FX -e storage" returns sanely...
> 
> Does anyone have any thoughts on how I might get the data back / the pool to import?  ("zpool import -fFX storage" spends a long time working and eventually comes back saying it is unable to import because one or more of the vdevs are unavailable - however, as far as I can tell they are all there.)
> 
> Thanks,
> 
> -- 
> Michelle Sullivan
> http://www.mhix.org/
> 
