ZFS panic under extreme circumstances (2/3 disks corrupted)
serenity at exscape.org
Mon May 25 16:12:59 UTC 2009
On May 25, 2009, at 05:39 PM, Freddie Cash wrote:
> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman
> <serenity at exscape.org> wrote:
>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>> So, I was playing around with RAID-Z and self-healing...
>> Yet another follow-up to this.
>> It appears that all traces of errors vanish after a reboot. So, say
>> you have
>> a dying disk; ZFS repairs the data for you, and you don't notice
>> (unless you
>> check zpool status). Then you reboot, and as far as I can tell
>> there's NO (easy?) way to find out that something is wrong with
>> your hardware!
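One way to at least notice this before a reboot erases the evidence: on FreeBSD, the stock periodic(8) scripts can report pool health in the daily status mail, so a degraded pool shows up even if you never run zpool status by hand. A minimal sketch, assuming the 404.status-zfs periodic script is present on your version:

```sh
# /etc/periodic.conf -- have the daily periodic run include
# "zpool status -x" output, which is non-empty when any pool
# has errors or a faulted device.
daily_status_zfs_enable="YES"
```

This only helps while the error counters are still set, of course; it does nothing about the counters being lost across a reboot.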
> On our storage server that was initially configured using 1 large
> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
> south. "zpool status" was full of errors. And the error counts
> survived reboots. Either that, or the drive was so bad that the error
> counts started increasing right away after a boot. After a week of
> fighting with it to get the new drive to resilver and get added to the
> vdev, we nuked it and re-created it using 3 raidz2 vdevs each
> comprised of 8 drives.
> (Un)fortunately, that was the only failure we've had so far, so we
> can't really confirm or deny the "error counts reset after reboot"
> behavior.
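As an aside on the layout change above: splitting 24 drives into three 8-drive raidz2 vdevs also costs usable capacity, since each raidz2 vdev gives up two drives to parity. A quick back-of-the-envelope check (the 24/8 figures are from the setup described above, the arithmetic is just the standard raidz2 rule):

```shell
#!/bin/sh
# Usable data disks in a raidz2 layout: (disks per vdev - 2) * vdevs.
single=$(( (24 - 2) * 1 ))   # one 24-disk raidz2 vdev
multi=$(( (8 - 2) * 3 ))     # three 8-disk raidz2 vdevs
echo "$single $multi"        # prints "22 18"
```

So the safer layout trades four drives' worth of space for shorter resilvers and the ability to survive two failures per vdev.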
Was this on FreeBSD?
I have another unfortunate thing to note regarding this: after a
reboot, it's impossible even to tell *which disk* went bad, even if
the pool is "uncleared" but otherwise "healed". It simply says that
a device has failed, with no clue as to which one, since they're all