ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Michelle Sullivan
michelle at sorbs.net
Sun Mar 10 02:29:44 UTC 2019
Michelle Sullivan wrote:
> Ben RUBSON wrote:
>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>
>>> Ben RUBSON wrote:
>>>
>>>> So disks died because of the carrier, as I assume the second
>>>> unscathed server was OK...
>>>
>>> Pretty much.
>>>
>>>> Heads must have scratched the platters, but they should have been
>>>> parked, so... Really strange.
>>>
>>> You'd have thought... though 2 of the drives look like they were
>>> wear and tear issues (the 2 not showing red lights) that just weren't
>>> picked up by the periodic scrub... Could be that the recovery showed
>>> one of them up... you know - how you can have an array working fine,
>>> but one disk dies and then others fail during the rebuild because of
>>> the extra workload.
>>
>> Yes... To try to mitigate this, when I add a new vdev to a pool, I
>> spread the new disks I have among the existing vdevs, and construct
>> the new vdev from the remaining new disk(s) plus the older disks
>> retrieved from the other vdevs - thus, where possible, avoiding vdevs
>> whose disks all have the same runtime.
>> However, I only use mirrors; applying this with raidz could be a
>> little trickier...
>>
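[Sketching that shuffle for the archives - pool and device names below
are hypothetical: two new disks da20/da21, and an aged da3 sitting in an
existing mirror of a pool "tank":

# zpool replace tank da3 da20      (da20 resilvers in; da3 detaches when done)
# zpool add tank mirror da21 da3   (new vdev pairs the second new disk with the freed older one)

]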
> Believe it or not...
>
> # zpool status -v
>   pool: VirtualDisks
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured pool.
>   scan: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         VirtualDisks               ONLINE       0     0     0
>           zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>
> errors: No known data errors
>
>   pool: sorbs
>  state: ONLINE
>   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on
>         Sat Aug 26 09:26:53 2017
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         sorbs                     ONLINE       0     0     0
>           raidz2-0                ONLINE       0     0     0
>             mfid0                 ONLINE       0     0     0
>             mfid1                 ONLINE       0     0     0
>             mfid7                 ONLINE       0     0     0
>             mfid8                 ONLINE       0     0     0
>             mfid12                ONLINE       0     0     0
>             mfid10                ONLINE       0     0     0
>             mfid14                ONLINE       0     0     0
>             mfid11                ONLINE       0     0     0
>             mfid6                 ONLINE       0     0     0
>             mfid15                ONLINE       0     0     0
>             mfid2                 ONLINE       0     0     0
>             mfid3                 ONLINE       0     0     0
>             spare-12              ONLINE       0     0     3
>               mfid13              ONLINE       0     0     0
>               mfid9               ONLINE       0     0     0
>             mfid4                 ONLINE       0     0     0
>             mfid5                 ONLINE       0     0     0
>         spares
>           185579620420611382      INUSE     was /dev/mfid9
>
> errors: No known data errors
>
>
> It would appear that when I replaced the damaged drives it picked one
> of them up as still being resilvered from back in August (before the
> server was packed up to go), which is why it saw 'corrupted metadata'
> and spent the last 3 weeks importing the pool - it rebuilt it as it
> was importing it. No data loss that I can determine. (It literally
> just finished in the middle of the night here.)
>
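[Side note on the block-size warning above: the usual cure is migrating
to a pool created with a matching ashift rather than fixing it in place.
A rough sketch only - the sysctl value, zvol size and dataset names are
assumptions, not what is actually configured here:

# sysctl vfs.zfs.min_auto_ashift=13                  (so the new pool uses 8192B blocks)
# zfs create -V 1T -o volblocksize=8k sorbs/VirtualDisks2
# zpool create VirtualDisks2 zvol/sorbs/VirtualDisks2
# zfs snapshot -r VirtualDisks@migrate
# zfs send -R VirtualDisks@migrate | zfs receive -F VirtualDisks2

]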
And back to this little nutmeg...

We had a fire last night... and the same pool ended up resilvering
again. The metadata got corrupted; "zpool import -fFX" worked and it
started rebuilding, but in the early hours, when the pool was at roughly
50% rebuilt/resilvered (one vdev), there was at least one more issue on
the power line... The UPSs gave out after multiple hits and now I can't
get the pool imported. The server was in single-user mode on a
FreeBSD 12 USB stick, so it was only resilvering. "zdb -AAA -L -uhdi
-FX -e storage" returns sanely...

Does anyone have any thoughts on how I might get the data back / get the
pool to import? ("zpool import -fFX storage" spends a long time working
and eventually comes back with "unable to import as one or more of the
vdevs are unavailable" - however they are all there as far as I can
tell.)
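For reference, the shape of what I plan to poke at next unless someone
warns me off, gentlest first (the pool is "storage" as above; the device
path is only an example):

# zdb -l /dev/mfid0                                    (are all four labels readable on each member?)
# zpool import -d /dev -o readonly=on -N -fF storage   (plain rewind, read-only, no mounts)
# zpool import -d /dev -o readonly=on -N -fFX storage  (extreme rewind if the above fails)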
Thanks,
--
Michelle Sullivan
http://www.mhix.org/