ZFS pool faulted (corrupt metadata) but the disk data appears ok...

Ben RUBSON ben.rubson at gmail.com
Fri Feb 2 16:12:06 UTC 2018


On 02 Feb 2018 11:51, Michelle Sullivan wrote:

> Ben RUBSON wrote:
>> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>>
>> Hi Michelle,
>>
>>> Michelle Sullivan wrote:
>>>> Michelle Sullivan wrote:
>>>>> So far (a few hours in) zpool import -fFX has not faulted with this
>>>>> image... it's currently using about 16G of the 32G of memory; the
>>>>> 9.2-p15 kernel died within minutes, out of memory (all 32G and swap),
>>>>> so I'm more optimistic at the moment...  Fingers crossed.
>>>> And the answer:
>>>>
>>>> 11-STABLE on a USB stick.
>>>>
>>>> Remove the drive that was replacing the hot spare (i.e. the
>>>> replacement drive for the one that initially died)
>>>> zpool import -fFX storage
>>>> zpool export storage
>>>>
>>>> reboot back to 9.x
>>>> zpool import storage
>>>> re-insert the replacement drive.
>>>> reboot
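
Putting those steps together, a minimal sketch of the recovery sequence
(assuming the pool is named "storage", as in the report above):

  # From the 11-STABLE rescue environment, with the in-progress
  # replacement drive physically removed:
  zpool import -fFX storage   # -f force, -F rewind to an earlier txg,
                              # -X extreme rewind (last-resort option)
  zpool export storage
  # Reboot into the original 9.x system, then:
  zpool import storage
  # Re-insert the replacement drive and reboot.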
>>> Got to thank people for this again; it saved me again, this time on a
>>> non-FreeBSD system (with a lot of use of a modified recoverdisk for
>>> OS X - thanks phk@)...  Lost 3 disks out of a raidz2, and 2 more had
>>> read errors on some sectors...  I don't know how much (if any) data
>>> I've lost, but at least it's not a rebuild from backup of all 48TB...
>>
>> What about the root cause?
>
> 3 disks died whilst the server was in transit from Malta to Australia
> (and I'm surprised that was all, considering the state of some of the
> stuff that came out of the container - there's a 3kVA UPS that is
> completely destroyed despite good packing.)
>> Sounds like you had 5 disks dying at the same time?
>
> Turns out that one of the 3 that had 'red lights on' had bad sectors; the
> other 2 were just excluded by the BIOS...  I did a byte copy onto new
> drives, found no read errors, so I put them back in and forced them
> online.  The other one had 78k of unreadable bytes, so a new disk went in
> and I convinced the controller that it was the same disk as the one it
> replaced.  The export/import then turned up unrecoverable read errors on
> 2 more disks that nothing had flagged previously, so I byte-copied them
> onto new drives, and the import -fFX is currently running (5 hours so
> far)...
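
A sketch of the byte-copy approach described above, using FreeBSD's
recoverdisk(1); the device names and worklist path here are hypothetical:

  # Copy a failing disk onto a fresh one; recoverdisk retries bad
  # regions with progressively smaller block sizes, and -w saves the
  # list of remaining blocks so the copy can be resumed later:
  recoverdisk -w /root/da3.worklist /dev/da3 /dev/da9
  # After the pool is imported, bring a previously-excluded device
  # back into service:
  zpool online storage da3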
>
>> Do you periodically run long SMART tests?
>
> Yup (fully automated.)
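
For reference, one way to automate such long tests is smartd(8); a
minimal sketch of a smartd.conf entry, with hypothetical device names:

  # Monitor all attributes (-a), run a short self-test daily at 02:00
  # and a long self-test every Saturday at 03:00:
  /dev/da0 -a -s (S/../.././02|L/../../6/03)
  /dev/da1 -a -s (S/../.././02|L/../../6/03)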
>
>> Zpool scrubs?
>
> Both servers took a zpool scrub before they were packed into the
> containers... the second one came out unscathed... but then most stuff in
> the second container came out unscathed, unlike the first...
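
A one-off scrub like the pre-shipping one above can be run and checked by
hand, and FreeBSD's periodic(8) can handle recurring ones; a minimal
sketch, again assuming the pool name "storage":

  zpool scrub storage
  zpool status -v storage   # watch progress and any repaired errors

  # /etc/periodic.conf
  daily_scrub_zfs_enable="YES"
  daily_scrub_zfs_default_threshold="35"   # days between scrubs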

What a story! Thanks for the details.

So the disks died because of the carrier, I assume, as the second server
came out unscathed...
The heads must have scratched the platters, but they should have been
parked, so... really strange.

Hope you'll recover your whole pool.

Ben


