raidz2 loses a single disk and becomes difficult to recover
Alex Trull
alextzfs at googlemail.com
Mon Oct 12 19:49:40 UTC 2009
I managed to cleanly recover all critical data by cloning the most recent
snapshots of all my filesystems (which worked even for those filesystems
that had disappeared from 'zfs list') and moving back to UFS2.
The 'live' filesystems had become pretty much corrupt since those snapshots
were taken.
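The clone-based rescue can be sketched as a dry run. Everything here is illustrative: the `plan_rescue` helper, the `@latest` snapshot placeholder, and the /rescue UFS2 target are assumptions, not commands taken from this thread.

```shell
#!/bin/sh
# Dry-run sketch of the rescue: for each dataset, clone its most recent
# snapshot, then copy the clone's contents off to stable storage.
plan_rescue() {
    for fs in "$@"; do
        # In practice the newest snapshot would come from something like:
        #   zfs list -H -t snapshot -o name -S creation -d 1 "$fs" | head -n 1
        snap="${fs}@latest"                     # placeholder snapshot name
        echo "zfs clone ${snap} ${fs}-rescue"   # clones are cheap and browsable
        echo "rsync -a /${fs}-rescue/ /rescue/${fs}/"
    done
}

# print the commands that would be run for two of the datasets involved
plan_rescue fatman/backup fatman/jail/mail
```

Dropping the `echo`s would turn the plan into the real thing; copying the data off immediately matters here, since (as noted just below) the clones did not survive reboots intact.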
An interesting note: even when I promoted those clones, the contents of the
snapshots became gobbledygook after a reboot (invalid byte sequence errors
on numerous files).
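Given that the clones turned to mush after a reboot, an alternative worth noting is serializing the snapshots to files on stable storage straight away, so nothing depends on the pool surviving another import. A sketch only; the dataset and snapshot names are placeholders, not taken from this thread:

```sh
# stream each snapshot to a file on the UFS2 rescue disk; the stream can be
# restored later with `zfs receive`, even into a brand-new pool
zfs send fatman/backup@latest > /rescue/fatman-backup.zfs
zfs send fatman/jail/mail@latest > /rescue/fatman-jail-mail.zfs

# later, on a healthy pool:
# zfs receive newpool/backup < /rescue/fatman-backup.zfs
```

One caveat: `zfs send` reads the snapshot through the same pool metadata, so if the pool's state is rotting, sooner is better.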
As it stands I managed to recover 100% of the data, so I'm out of the woods.
How does a dual-parity array lose its mind when only one disk is lost?
Might it have been related to the old TXGid I found on ad16 and ad17?
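On that ad16/ad17 question: the zdb survey quoted below shows both drives carrying labels at txg 46408223 while the rest of the pool is at 46488654. A tiny hypothetical helper (the function name and "device txg" input format are made up here, mimicking pairs scraped from `zdb -l`) can flag that kind of lag mechanically:

```shell
#!/bin/sh
# find_stale_labels: read "device txg" pairs on stdin and report devices
# whose label txg lags the newest txg seen across the pool.
find_stale_labels() {
    awk '{ txg[$1] = $2 + 0; if ($2 + 0 > max) max = $2 + 0 }
         END { for (d in txg)
                   if (txg[d] < max)
                       print d " has stale label txg " txg[d] " (latest is " max ")" }'
}

# example, using the txgs from the zdb survey quoted below
printf '%s\n' 'ad10 46488654' 'ad16 46408223' 'ad17 46408223' | find_stale_labels
```

One possible (hedged) reading: vdevs whose labels quietly lag the pool by some 80,000 txgs without ever logging read/write/checksum errors might mean writes were not reaching those disks, which could help explain how a raidz2 ran out of replicas after a single failure.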
--
Alex
2009/10/11 Alex Trull <alextzfs at googlemail.com>
> Well, after trying a lot of things (zpool import with and without the cache
> file in place, etc.), it eventually managed to mount the pool, at least
> partially, with errors:
>
> zfs list output:
> cannot iterate filesystems: I/O error
> NAME                     USED  AVAIL  REFER  MOUNTPOINT
> fatman                  1.40T  1.70T  51.2K  /fatman
> fatman/backup            100G  99.5G  95.5G  /fatman/backup
> fatman/jail              422G  1.70T  60.5K  /fatman/jail
> fatman/jail/havnor       198G  51.7G   112G  /fatman/jail/havnor
> fatman/jail/mail        19.4G  30.6G  13.0G  /fatman/jail/mail
> fatman/jail/syndicate   16.6G   103G  10.5G  /fatman/jail/syndicate
> fatman/jail/thirdforces  159G  41.4G  78.1G  /fatman/jail/thirdforces
> fatman/jail/web         24.8G  25.2G  22.3G  /fatman/jail/web
> fatman/stash             913G  1.70T   913G  /fatman/stash
>
> (end of the dmesg)
> JMR: vdev_uberblock_load_done ubbest ub_txg=46475461 ub_timestamp=1255231841
> JMR: vdev_uberblock_load_done ub_txg=46481476 ub_timestamp=1255234263
> JMR: vdev_uberblock_load_done ubbest ub_txg=46481476 ub_timestamp=1255234263
> JMR: vdev_uberblock_load_done ubbest ub_txg=46475459 ub_timestamp=1255231780
> JMR: vdev_uberblock_load_done ubbest ub_txg=46475458 ub_timestamp=1255231750
> JMR: vdev_uberblock_load_done ub_txg=46481473 ub_timestamp=1255234263
> JMR: vdev_uberblock_load_done ubbest ub_txg=46481473 ub_timestamp=1255234263
> JMR: vdev_uberblock_load_done ubbest ub_txg=46481472 ub_timestamp=1255234263
> Solaris: WARNING: can't open objset for fatman/jail/margaret
> Solaris: WARNING: can't open objset for fatman/jail/margaret
> Solaris: WARNING: ZFS replay transaction error 86, dataset fatman/jail/havnor, seq 0x25442, txtype 9
>
> Solaris: WARNING: ZFS replay transaction error 86, dataset fatman/jail/mail, seq 0x1e200, txtype 9
>
> Solaris: WARNING: ZFS replay transaction error 86, dataset fatman/jail/thirdforces, seq 0x732e3, txtype 9
>
> [root at potjie /fatman/jail/mail]# zpool status -v
>  pool: fatman
> state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: resilver in progress for 0h4m, 0.83% done, 8h21m to go
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         fatman           DEGRADED     0     0    34
>           raidz2         DEGRADED     0     0   384
>             replacing    DEGRADED     0     0     0
>               da2/old    REMOVED      0    24     0
>               da2        ONLINE       0     0     0  1.71G resilvered
>             ad4          ONLINE       0     0     0  21.3M resilvered
>             ad6          ONLINE       0     0     0  21.4M resilvered
>             ad20         ONLINE       0     0     0  21.3M resilvered
>             ad22         ONLINE       0     0     0  21.3M resilvered
>             ad17         ONLINE       0     0     0  21.3M resilvered
>             da3          ONLINE       0     0     0  21.3M resilvered
>             ad10         ONLINE       0     0     1  21.4M resilvered
>             ad16         ONLINE       0     0     0  21.2M resilvered
>         cache
>           ad18           ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
> fatman/jail/margaret:<0x0>
> fatman/jail/syndicate:<0x0>
> fatman/jail/mail:<0x0>
> /fatman/jail/mail/tmp
> fatman/jail/havnor:<0x0>
> fatman/jail/thirdforces:<0x0>
> fatman/backup:<0x0>
>
> jail/margaret and backup aren't showing up in 'zfs list'.
> jail/syndicate is showing up but isn't viewable.
>
> It seems the latest content on the better-looking zfs filesystems is quite
> recent.
>
> Any thoughts about what is going on?
>
> I have plenty of snapshots on these zfs filesystems - any suggestions on
> trying to get them back?
>
> --
> Alex
>
> 2009/10/11 Alex Trull <alextzfs at googlemail.com>
>
> Hi All,
>>
>> My raidz2 pool broke this morning on RELENG_7 with ZFS v13.
>>
>> The system failed this morning and came back without the pool, having lost
>> a disk.
>>
>> This is how I found the system:
>>
>>  pool: fatman
>> state: FAULTED
>> status: One or more devices could not be used because the label is missing
>>         or invalid. There are insufficient replicas for the pool to continue
>>         functioning.
>> action: Destroy and re-create the pool from a backup source.
>>    see: http://www.sun.com/msg/ZFS-8000-5E
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         fatman      FAULTED      0     0     1  corrupted data
>>           raidz2    DEGRADED     0     0     6
>>             da2     FAULTED      0     0     0  corrupted data
>>             ad4     ONLINE       0     0     0
>>             ad6     ONLINE       0     0     0
>>             ad20    ONLINE       0     0     0
>>             ad22    ONLINE       0     0     0
>>             ad17    ONLINE       0     0     0
>>             da2     ONLINE       0     0     0
>>             ad10    ONLINE       0     0     0
>>             ad16    ONLINE       0     0     0
>>
>> Initially, it complained that da3 had moved to da2 (the original da2 had
>> failed and was no longer seen).
>>
>> I replaced the original da2 and bumped what was originally da3 back up to
>> da3 using the controller's ordering.
>>
>> [root at potjie /dev]# zpool status
>>  pool: fatman
>> state: FAULTED
>> status: One or more devices could not be used because the label is missing
>>         or invalid. There are insufficient replicas for the pool to continue
>>         functioning.
>> action: Destroy and re-create the pool from a backup source.
>>    see: http://www.sun.com/msg/ZFS-8000-5E
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         fatman      FAULTED      0     0     1  corrupted data
>>           raidz2    ONLINE       0     0     6
>>             da2     UNAVAIL      0     0     0  corrupted data
>>             ad4     ONLINE       0     0     0
>>             ad6     ONLINE       0     0     0
>>             ad20    ONLINE       0     0     0
>>             ad22    ONLINE       0     0     0
>>             ad17    ONLINE       0     0     0
>>             da3     ONLINE       0     0     0
>>             ad10    ONLINE       0     0     0
>>             ad16    ONLINE       0     0     0
>>
>> The issue looks very similar to this one (JMR's issue):
>> http://freebsd.monkey.org/freebsd-fs/200902/msg00017.html
>>
>> I've tried the methods there without much result.
>>
>> Using JMR's patches/debugs to see what is going on, this is what I got:
>>
>> JMR: vdev_uberblock_load_done ubbest ub_txg=46488653 ub_timestamp=1255246834
>> JMR: vdev_uberblock_load_done ub_txg=46475459 ub_timestamp=1255231780
>> JMR: vdev_uberblock_load_done ubbest ub_txg=46488653 ub_timestamp=1255246834
>> JMR: vdev_uberblock_load_done ub_txg=46475458 ub_timestamp=1255231750
>> JMR: vdev_uberblock_load_done ubbest ub_txg=46488653 ub_timestamp=1255246834
>> JMR: vdev_uberblock_load_done ub_txg=46481473 ub_timestamp=1255234263
>> JMR: vdev_uberblock_load_done ubbest ub_txg=46488653 ub_timestamp=1255246834
>> JMR: vdev_uberblock_load_done ub_txg=46481472 ub_timestamp=1255234263
>> JMR: vdev_uberblock_load_done ubbest ub_txg=46488653 ub_timestamp=1255246834
>>
>> But JMR's patch still doesn't let me import, even with a decremented txg.
>>
>> I then had a look around the drives using zdb and a dirty script:
>>
>> [root at potjie /dev]# ls /dev/ad* /dev/da2 /dev/da3 | awk '{print "echo "$1";zdb -l "$1" |grep txg"}' | sh
>> /dev/ad10
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> /dev/ad16
>> txg=46408223 <- old TXGid ?
>> txg=46408223
>> txg=46408223
>> txg=46408223
>> /dev/ad17
>> txg=46408223 <- old TXGid ?
>> txg=46408223
>> txg=46408223
>> txg=46408223
>> /dev/ad18 (ssd)
>> /dev/ad19 (spare drive, removed from pool some time ago)
>> txg=0
>> create_txg=0
>> txg=0
>> create_txg=0
>> txg=0
>> create_txg=0
>> txg=0
>> create_txg=0
>> /dev/ad20
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> /dev/ad22
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> /dev/ad4
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> /dev/ad6
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> /dev/da2 <- new drive that replaced the broken da2
>> /dev/da3
>> txg=46488654
>> txg=46488654
>> txg=46488654
>> txg=46488654
>>
>> I had not previously seen any checksum or other errors on ad16 and ad17,
>> and I do check regularly.
>>
>> Any thoughts on what to try next?
>>
>> Regards,
>>
>> Alex
>>
>>
>
More information about the freebsd-fs mailing list