ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Michelle Sullivan
michelle at sorbs.net
Fri Feb 6 11:28:21 UTC 2015
Stefan Esser wrote:
> Am 06.02.2015 um 03:20 schrieb Michelle Sullivan:
>
>> 2. zpool import -f -n -F -X storage and see if the system would
>> give you a proposal.
>>
>>
>>> This crashes (without -n) the machine out of memory... there's
>>> 32G of RAM. /boot/loader.conf contains:
>>>
>>> vfs.zfs.prefetch_disable=1
>>> #vfs.zfs.arc_min="8G"
>>> #vfs.zfs.arc_max="16G"
>>> #vm.kmem_size_max="8"
>>> #vm.kmem_size="6G"
>>> vfs.zfs.txg.timeout="5"
>>> kern.maxvnodes=250000
>>> vfs.zfs.write_limit_override=1073741824
>>> vboxdrv_load="YES"
>>>
>
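For reference, the rewind import Stefan suggests can be staged as a dry run first. A minimal sketch (POOL is a placeholder; this only prints the command so nothing is touched):

```shell
# Per zpool(8): -f forces import, -F rewinds to an earlier consistent
# TXG, -X allows an extreme rewind, and -n only reports what -F would
# do without actually committing the rewind.
POOL=storage
echo "zpool import -f -n -F -X $POOL"
```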
> I've recovered two "lost" ZFS pools (one on my system, the other on
> someone else's) by identifying a TXG whose state at least allowed
> copying the data to a fresh pool.
>
> The main tool was zdb, which contains userland implementations of
> the kernel code that panics on import. You can use zdb to test what
> is left of your pool (and then try to find a way to rescue most of
> it), if you add options that make errors non-fatal and that skip
> some consistency checks, e.g.:
>
> # zdb -AAA -L -u %POOL%
>
> You may need to add -e, and possibly also -p %PATH_TO_DEVS%, before
> the pool name.
>
>
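Building that invocation with -e and -p looks like this (a sketch that only prints the command; DEVDIR=/dev is an assumption, point it at whichever directory actually holds the pool's device nodes):

```shell
# -AAA makes assertion failures non-fatal, -L skips leak checking,
# -u dumps the uberblock, -e reads the pool without importing it,
# -p names the directory containing the vdev devices.
POOL=storage
DEVDIR=/dev
echo "zdb -AAA -L -u -e -p $DEVDIR $POOL"
```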
root at colossus:~ # zdb -AAA -L -e storage
Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 10618504954404185222
        name: 'storage'
        state: 0
        hostid: 4203774842
        hostname: 'colossus'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 10618504954404185222
            children[0]:
                type: 'raidz'
                id: 0
                guid: 12489400212295803034
                nparity: 2
                metaslab_array: 34
                metaslab_shift: 38
                ashift: 9
                asize: 45000449064960
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 3998695725653225547
                    phys_path: '/dev/mfid0'
                    whole_disk: 1
                    DTL: 168
                    create_txg: 4
                    path: '/dev/mfid15'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 10795471632546545577
                    phys_path: '/dev/mfid1'
                    whole_disk: 1
                    DTL: 167
                    create_txg: 4
                    path: '/dev/mfid13'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 15820272272734706674
                    phys_path: '/dev/mfid2'
                    whole_disk: 1
                    DTL: 166
                    create_txg: 4
                    path: '/dev/mfid0'
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 3928579496187019848
                    phys_path: '/dev/mfid3'
                    whole_disk: 1
                    DTL: 165
                    create_txg: 4
                    path: '/dev/mfid1'
                children[4]:
                    type: 'disk'
                    id: 4
                    guid: 7125052278051590304
                    phys_path: '/dev/mfid4'
                    whole_disk: 1
                    DTL: 164
                    create_txg: 4
                    path: '/dev/mfid2'
                children[5]:
                    type: 'disk'
                    id: 5
                    guid: 14370198745088794709
                    phys_path: '/dev/mfid5'
                    whole_disk: 1
                    DTL: 163
                    create_txg: 4
                    path: '/dev/mfid3'
                children[6]:
                    type: 'disk'
                    id: 6
                    guid: 1843597351388951655
                    phys_path: '/dev/mfid6'
                    whole_disk: 1
                    DTL: 162
                    create_txg: 4
                    path: '/dev/mfid4'
                children[7]:
                    type: 'replacing'
                    id: 7
                    guid: 2914889727426054645
                    whole_disk: 0
                    create_txg: 4
                    children[0]:
                        type: 'disk'
                        id: 0
                        guid: 10956220251832269421
                        phys_path: '/dev/mfid15'
                        whole_disk: 1
                        DTL: 179
                        create_txg: 4
                        path: '/dev/mfid11'
                    children[1]:
                        type: 'disk'
                        id: 1
                        guid: 2463756237300743131
                        phys_path: '/dev/mfid13'
                        whole_disk: 1
                        DTL: 181
                        create_txg: 4
                        resilvering: 1
                        path: '/dev/mfid12'
                children[8]:
                    type: 'disk'
                    id: 8
                    guid: 8864096842672670007
                    phys_path: '/dev/mfid7'
                    whole_disk: 1
                    DTL: 160
                    create_txg: 4
                    path: '/dev/mfid5'
                children[9]:
                    type: 'disk'
                    id: 9
                    guid: 4650681673751655245
                    phys_path: '/dev/mfid8'
                    whole_disk: 1
                    DTL: 159
                    create_txg: 4
                    path: '/dev/mfid14'
                children[10]:
                    type: 'disk'
                    id: 10
                    guid: 8432109430432996813
                    phys_path: '/dev/mfid9'
                    whole_disk: 1
                    DTL: 158
                    create_txg: 4
                    path: '/dev/mfid6'
                children[11]:
                    type: 'disk'
                    id: 11
                    guid: 414941847968750824
                    phys_path: '/dev/mfid10'
                    whole_disk: 1
                    DTL: 157
                    create_txg: 4
                    path: '/dev/mfid7'
                children[12]:
                    type: 'disk'
                    id: 12
                    guid: 7335375930620195352
                    phys_path: '/dev/mfid11'
                    whole_disk: 1
                    DTL: 156
                    create_txg: 4
                    path: '/dev/mfid8'
                children[13]:
                    type: 'disk'
                    id: 13
                    guid: 5100737174610362
                    phys_path: '/dev/mfid12'
                    whole_disk: 1
                    DTL: 155
                    create_txg: 4
                    path: '/dev/mfid9'
                children[14]:
                    type: 'disk'
                    id: 14
                    guid: 15695558693726858796
                    phys_path: '/dev/mfid14'
                    whole_disk: 1
                    DTL: 174
                    create_txg: 4
                    path: '/dev/mfid10'
Segmentation fault (core dumped)
root at colossus:~ # zdb -AAA -L -u -e storage
Segmentation fault (core dumped)
> Other commands to try instead of -u are e.g. -d and -h.
>
root at colossus:~ # zdb -AAA -L -d -e storage
Segmentation fault (core dumped)
root at colossus:~ # zdb -AAA -L -h -e storage
Segmentation fault (core dumped)
> If you can get a history list, then you may want to add -T %TXG%
> with some txg number from the past, to see whether you get better
> results.
>
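Scanning a few past TXGs as suggested can be scripted. A sketch (the TXG numbers below are placeholders; take real candidates from the uberblocks shown by 'zdb -u' or from the pool history; this only prints the commands, nothing is executed against the pool):

```shell
# Walk backwards through candidate TXGs and print the zdb command
# that would test each one, combining Stefan's flags from above.
POOL=storage
for TXG in 361400 361390 361380; do
    echo "zdb -AAA -L -u -e -T $TXG $POOL"
done
```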
> You may want to set "vfs.zfs.debug=1" in loader.conf to prevent the
> kernel from panicking during import, BTW. But be careful: this can
> lead to undetected inconsistencies and is only a last resort for a
> read-only mounted pool whose contents are to be copied out (once
> you are able to import it).
>
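Concretely, that last-resort knob is a single line in /boot/loader.conf (remove it again once the data is copied off, since it masks real inconsistencies):

```
# last resort for a read-only rescue import, per the advice above
vfs.zfs.debug=1
```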
> Good luck, STefan
>
--
Michelle Sullivan
http://www.mhix.org/