ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Michelle Sullivan
michelle at sorbs.net
Sun Mar 10 02:29:44 UTC 2019
Michelle Sullivan wrote:
> Ben RUBSON wrote:
>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>
>>> Ben RUBSON wrote:
>>>
>>>> So disks died because of the carrier, as I assume the second
>>>> unscathed server was OK...
>>>
>>> Pretty much.
>>>
>>>> Heads must have scratched the platters, but they should have been
>>>> parked, so... Really strange.
>>>
>>> You'd have thought... though 2 of the drives look like they were
>>> wear and tear issues (the 2 not showing red lights) that just weren't
>>> picked up by the periodic scrub... Could be that the recovery showed
>>> one of them up... you know - how you can have an array working fine,
>>> but one disk dies and then others fail during the rebuild because of
>>> the extra workload.
>>
>> Yes... To try to mitigate this, when I add a new vdev to a pool, I
>> spread the new disks I have among the existing vdevs, and construct
>> the new vdev from the remaining new disk(s) plus the older disks
>> retrieved from the other vdevs - thus, where possible, avoiding vdevs
>> whose disks all have the same runtime.
>> However, I only use mirrors; applying this with raidz could be a
>> little trickier...
>>
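[Sketching that shuffle for the archives - pool and device names below
are hypothetical: two new disks da20/da21, and an aged da3 sitting in an
existing mirror of a pool "tank":

# zpool replace tank da3 da20      (da20 resilvers in; da3 detaches when done)
# zpool add tank mirror da21 da3   (new vdev pairs the second new disk with the freed older one)

]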
> Believe it or not...
>
> # zpool status -v
>   pool: VirtualDisks
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured pool.
>   scan: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         VirtualDisks               ONLINE       0     0     0
>           zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>
> errors: No known data errors
>
>   pool: sorbs
>  state: ONLINE
>   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on
>         Sat Aug 26 09:26:53 2017
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         sorbs                     ONLINE       0     0     0
>           raidz2-0                ONLINE       0     0     0
>             mfid0                 ONLINE       0     0     0
>             mfid1                 ONLINE       0     0     0
>             mfid7                 ONLINE       0     0     0
>             mfid8                 ONLINE       0     0     0
>             mfid12                ONLINE       0     0     0
>             mfid10                ONLINE       0     0     0
>             mfid14                ONLINE       0     0     0
>             mfid11                ONLINE       0     0     0
>             mfid6                 ONLINE       0     0     0
>             mfid15                ONLINE       0     0     0
>             mfid2                 ONLINE       0     0     0
>             mfid3                 ONLINE       0     0     0
>             spare-12              ONLINE       0     0     3
>               mfid13              ONLINE       0     0     0
>               mfid9               ONLINE       0     0     0
>             mfid4                 ONLINE       0     0     0
>             mfid5                 ONLINE       0     0     0
>         spares
>           185579620420611382      INUSE     was /dev/mfid9
>
> errors: No known data errors
>
>
> It would appear that when I replaced the damaged drives it picked one
> of them up as still being resilvered from back in August (before the
> server was packed up to go), which is why it saw 'corrupted metadata'
> and spent the last 3 weeks importing the pool - it rebuilt it as it
> was importing it. No data loss that I can determine. (It literally
> just finished in the middle of the night here.)
>
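[Side note on the block-size warning above: the usual cure is migrating
to a pool created with a matching ashift rather than fixing it in place.
A rough sketch only - the sysctl value, zvol size and dataset names are
assumptions, not what is actually configured here:

# sysctl vfs.zfs.min_auto_ashift=13                  (so the new pool uses 8192B blocks)
# zfs create -V 1T -o volblocksize=8k sorbs/VirtualDisks2
# zpool create VirtualDisks2 zvol/sorbs/VirtualDisks2
# zfs snapshot -r VirtualDisks@migrate
# zfs send -R VirtualDisks@migrate | zfs receive -F VirtualDisks2

]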
And back to this little nutmeg...

We had a fire last night... and the same pool ended up resilvering
again. The metadata got corrupted; "zpool import -fFX" worked and it
started rebuilding, but in the early hours, when the pool was at roughly
50% rebuilt/resilvered (one vdev), there was at least one more issue on
the power line... The UPSs gave out after multiple hits and now I can't
get the pool imported. The server was in single-user mode on a
FreeBSD 12 USB stick, so it was only resilvering. "zdb -AAA -L -uhdi
-FX -e storage" returns sanely...

Does anyone have any thoughts on how I might get the data back / get the
pool to import? ("zpool import -fFX storage" spends a long time working
and eventually comes back with "unable to import as one or more of the
vdevs are unavailable" - however they are all there as far as I can
tell.)
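For reference, the shape of what I plan to poke at next unless someone
warns me off, gentlest first (the pool is "storage" as above; the device
path is only an example):

# zdb -l /dev/mfid0                                    (are all four labels readable on each member?)
# zpool import -d /dev -o readonly=on -N -fF storage   (plain rewind, read-only, no mounts)
# zpool import -d /dev -o readonly=on -N -fFX storage  (extreme rewind if the above fails)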
Thanks,
--
Michelle Sullivan
http://www.mhix.org/