ZFS pool faulted (corrupt metadata) but the disk data appears ok...

Michelle Sullivan michelle at sorbs.net
Sun Feb 18 20:13:19 UTC 2018


Ben RUBSON wrote:
> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>
>> Ben RUBSON wrote:
>>
>>> So disks died because of the carrier, as I assume the second 
>>> unscathed server was OK...
>>
>> Pretty much.
>>
>>> Heads must have scratched the platters, but they should have been 
>>> parked, so... Really strange.
>>
>> You'd have thought... though 2 of the drives look like it was wear 
>> and tear issues (the 2 not showing red lights) just not picked up by 
>> the periodic scrub....  Could be that the recovery showed that one 
>> up... you know - how you can have an array working fine, but one disk 
>> dies, then others fail during the rebuild because of the extra workload.
>
> Yes... To try to mitigate this, when I add a new vdev to a pool, I 
> spread the new disks among the existing vdevs and construct the new 
> vdev from the remaining new disk(s) plus the disks retrieved from the 
> other vdevs. This avoids, where possible, vdevs whose disks all have 
> the same runtime.
> However I only use mirrors; applying this with RAID-Z could be a 
> little bit trickier...
>
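For reference, the shuffle you describe would look something like this 
with mirrors (entirely hypothetical pool "tank" with an existing mirror 
da0+da1 and two new disks da2+da3 - not the commands used here):

# zpool attach tank da0 da2
   (resilver; the new disk da2 is now part of the existing mirror)
# zpool detach tank da1
   (the old disk da1 is now free)
# zpool add tank mirror da1 da3
   (new vdev built from the freed old disk + the remaining new disk)

which leaves one old and one new disk in each vdev.
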
Believe it or not...

# zpool status -v
   pool: VirtualDisks
  state: ONLINE
status: One or more devices are configured to use a non-native block size.
     Expect reduced performance.
action: Replace affected devices with devices that support the
     configured block size, or migrate data to a properly configured
     pool.
   scan: none requested
config:

     NAME                       STATE     READ WRITE CKSUM
     VirtualDisks               ONLINE       0     0     0
        zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native

errors: No known data errors

   pool: sorbs
  state: ONLINE
   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on Sat Aug 26 09:26:53 2017
config:

     NAME                  STATE     READ WRITE CKSUM
     sorbs                 ONLINE       0     0     0
       raidz2-0            ONLINE       0     0     0
         mfid0             ONLINE       0     0     0
         mfid1             ONLINE       0     0     0
         mfid7             ONLINE       0     0     0
         mfid8             ONLINE       0     0     0
         mfid12            ONLINE       0     0     0
         mfid10            ONLINE       0     0     0
         mfid14            ONLINE       0     0     0
         mfid11            ONLINE       0     0     0
         mfid6             ONLINE       0     0     0
         mfid15            ONLINE       0     0     0
         mfid2             ONLINE       0     0     0
         mfid3             ONLINE       0     0     0
         spare-12          ONLINE       0     0     3
           mfid13          ONLINE       0     0     0
           mfid9           ONLINE       0     0     0
         mfid4             ONLINE       0     0     0
         mfid5             ONLINE       0     0     0
     spares
       185579620420611382  INUSE     was /dev/mfid9

errors: No known data errors
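
(Side note on the block size warning on VirtualDisks above: the fix 
would presumably be to migrate onto a zvol/pool created with a matching 
block size - roughly, with hypothetical names and size, something like:

# zfs create -V 1T -o volblocksize=8k sorbs/VirtualDisks2
# sysctl vfs.zfs.min_auto_ashift=13
# zpool create VirtualDisks2 zvol/sorbs/VirtualDisks2
# zfs snapshot -r VirtualDisks@migrate
# zfs send -R VirtualDisks@migrate | zfs receive -d -F VirtualDisks2

Untested sketch; names and size made up.)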


It would appear that, when I replaced the damaged drives, it picked one 
of them up as still being rebuilt from back in August (before the 
server was packed up to go), which is why it saw the pool as having 
'corrupted metadata' and spent the last 3 weeks importing it - it 
rebuilt the disk as it was importing.  No data loss that I can 
determine (it literally just finished in the middle of the night here).
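
(The spare is still showing INUSE under spare-12 above, so the 
remaining tidy-up is to detach one side of that pair - roughly either:

# zpool detach sorbs mfid13
   (keep the spare in place permanently, if mfid13 is to be replaced)

or

# zpool detach sorbs mfid9
   (return mfid9 to the spares list, if mfid13 turns out to be healthy)

depending on which disk is actually to stay.)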

-- 
Michelle Sullivan
http://www.mhix.org/


