Raidz2 pool with single disk failure is faulted

Javier Martín Rueda jmrueda at diatel.upm.es
Mon Feb 2 23:09:06 PST 2009


Wes Morgan wrote:
> On Tue, 3 Feb 2009, Javier Martín Rueda wrote:
>
>> On a FreeBSD 7.1-PRERELEASE amd64 system I had a raidz2 pool made up 
>> of 8 disks. Due to some things I tried in the past, the pool was 
>> currently like this:
>>
>>       z1              ONLINE
>>         raidz2        ONLINE
>>           mirror/gm0  ONLINE
>>           mirror/gm1  ONLINE
>>           da2         ONLINE
>>           da3         ONLINE
>>           da4         ONLINE
>>           da5         ONLINE
>>           da6         ONLINE
>>           da7         ONLINE
>>
>> da2 to da7 were originally mirror/gm2 to mirror/gm7, but I replaced 
>> them little by little, eliminating the corresponding gmirrors at the 
>> same time. I don't think this is relevant to what I'm going to 
>> explain, but I mention it just in case...
>>
>> One day, after a system reboot, one of the disks (da4) was dead, and 
>> FreeBSD renamed all of the other disks that used to come after it 
>> (da5 became da4, da6 became da5, and da7 became da6). The pool was 
>> unavailable (da4 to da6 marked as corrupt and da7 as unavailable), 
>> because I suppose ZFS couldn't match the contents of the last 3 disks 
>> to their new names. I was able to fix this by inserting a blank new 
>> disk and rebooting: the disk names were correct again, and the pool 
>> showed up as degraded (da4 unavailable) but usable. I resilvered the 
>> pool and everything was back to normal.
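
For reference, resilvering onto the blank disk is just the standard
device replacement; something along these lines, with the new disk
showing up as da4 again (illustrative, not the exact invocations):

# zpool replace z1 da4        (start resilvering onto the new disk)
# zpool status z1             (watch the resilver progress)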
>>
>> Yesterday, another disk died after a system reboot and the pool was 
>> unavailable again because of the automatic renaming of the SCSI 
>> disks. However, this time I didn't replace it with a blank disk, but 
>> with another identical disk that I had used in the past in a 
>> different ZFS pool on a different computer, one with the same name 
>> (z1) and the same characteristics (raidz2, 8 disks). That disk hadn't 
>> been erased and its pool hadn't been destroyed, so it still had 
>> whatever ZFS had stored on it.
>>
>> After rebooting, it seems ZFS got confused when it found two 
>> different active pools with the same name, and it faulted the pool. I 
>> stopped ZFS and wiped the beginning and end of the disk with zeroes, 
>> but the problem persisted. Finally, I tried to export and import the 
>> pool, as I read somewhere that it may help, but zpool import 
>> complains about an I/O error (which I imagine is fictitious, because 
>> all of the disks are fine; I can read from them with dd with no 
>> problem).
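
Wiping the labels amounts to zeroing the first and last 512k of the
disk (ZFS keeps two copies of its label at each end); something along
these lines, with the size taken from diskinfo:

# dd if=/dev/zero of=/dev/da4 bs=512k count=1                  (front labels)
# sz=`diskinfo da4 | awk '{print $3}'`                         (media size in bytes)
# dd if=/dev/zero of=/dev/da4 bs=512k seek=$((sz/524288 - 1))  (tail labels)

(the last dd simply runs until it hits the end of the device)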
>>
>> The current situation is this:
>>
>> # zpool import
>> pool: z1
>>   id: 8828203687312199578
>> state: FAULTED
>> status: One or more devices contains corrupted data.
>> action: The pool cannot be imported due to damaged devices or data.
>>       The pool may be active on another system, but can be imported
>>       using the '-f' flag.
>>  see: http://www.sun.com/msg/ZFS-8000-5E
>> config:
>>
>>       z1              FAULTED   corrupted data
>>         raidz2        ONLINE
>>           mirror/gm0  ONLINE
>>           mirror/gm1  ONLINE
>>           da2         ONLINE
>>           da3         ONLINE
>>           da4         UNAVAIL   corrupted data
>>           da5         ONLINE
>>           da6         ONLINE
>>           da7         ONLINE
>> # zpool import -f z1
>> cannot import 'z1': I/O error
>>
>> By the way, before exporting the pool, the CKSUM column in "zpool 
>> status" showed 6 errors. However, zpool status -v didn't give any 
>> additional information.
>>
>> How come the pool is faulted if it is raidz2 and 7 out of 8 disks are 
>> reported as fine? Any idea how to recover the pool? The data has to 
>> be in there, as I haven't done any other destructive operation as far 
>> as I can tell, and I imagine it comes down to some stupid little 
>> detail.
>>
>> I have dumped all of the labels on the 8 disks with zdb -l, and I 
>> don't see anything peculiar. The labels are fine on the 7 online 
>> disks, and there is no label at all on da4.
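
That is, for each member device something like:

# zdb -l /dev/da2

which prints the four vdev labels (pool name, pool GUID, txg, and the
cached vdev tree).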
>>
>> Is there some kind of diagnostic tool for ZFS, similar to dumpfs?
>>
>> I can provide additional information if needed.
>
> I would try removing /boot/zfs/zpool.cache and re-importing, and if 
> that doesn't work, detach the da4 device (camcontrol stop da4 or so) 
> and see if it will import.
>
> Also, make sure you wiped at least 512k from the front of the drive.
I tried all that, but nothing worked.
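
In other words, roughly this sequence (reconstructed, so the exact
order and invocations may have differed slightly):

# rm /boot/zfs/zpool.cache
# zpool import -f z1
# dd if=/dev/zero of=/dev/da4 bs=512k count=1      (wipe the front of da4)
# zpool import -f z1
# camcontrol stop da4                              (spin down da4)
# zpool import -f z1

and the import failed every time.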

I've tried to trace what's going on in the kernel when I try to import 
the pool. The problem seems to be in dsl_pool_open(). In the first part 
of the function, there is this call:

        err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
            DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1,
            &dp->dp_root_dir_obj);
        if (err)
                goto out;

zap_lookup() was returning EIO, but I don't think it is a real I/O 
problem but rather a checksumming problem, because I also got these 
messages:

zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 15 error 86
retry #1 for read to raidz offset 6eb6d6d9000
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 15 error 86
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 16 error 86
zio 0xffffff000bcb5810 vdev raidz offset 6eb6d6d9000 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 15 error 86
retry #1 for read to raidz offset c03bf45800
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 15 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 16 error 86
zio 0xffffff005a5dbac0 vdev raidz offset c03bf45800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 15 error 86
retry #1 for read to raidz offset 5902760f800
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 15 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 16 error 86
zio 0xffffff005a534ac0 vdev raidz offset 5902760f800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 15 error 86
retry #1 for read to <unknown> offset 0
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 15 error 86
retry #1 for read to raidz offset 6eb6d6d9000
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 15 error 86
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 16 error 86
zio 0xffffff0003ebbac0 vdev raidz offset 6eb6d6d9000 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 15 error 86
retry #1 for read to raidz offset c03bf45800
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 15 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 16 error 86
zio 0xffffff0003eba2b0 vdev raidz offset c03bf45800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 15 error 86
retry #1 for read to raidz offset 5902760f800
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 15 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 16 error 86
zio 0xffffff000bc93ac0 vdev raidz offset 5902760f800 stage 17 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 14 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 15 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 16 error 86
zio 0xffffff000bcb5ac0 vdev <unknown> offset 0 stage 17 error 86
zio 0xffffff0003f44000 vdev <unknown> offset 0 stage 17 error 5

Error 86 seems to be ECKSUM, so I decided to disable checksumming and 
see what happened. To disable checksumming, I edited 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio_checksum.c 
and simply put a "return 0;" at the beginning of zio_checksum_error().

I tried to import again, and it still didn't work. Only this time, 
zap_lookup() was returning ENOENT, which I imagine means that ZFS 
cannot locate the root of the pool or something like that.

Any ideas?


