zfs corruption (again) due to interrupted resilver and power faults.

Michelle Sullivan michelle at sorbs.net
Sun Mar 31 13:11:59 UTC 2019


Stefan Esser wrote:
> Am 20.03.19 um 08:15 schrieb Michelle Sullivan:
>> Michelle Sullivan wrote:
>>> Trying now, thanks (and no, I hadn’t; wasn’t aware of the sysctl)
>> Failed with the same old...
>>
>> http://flashback.sorbs.net/packages/zfs/image6.jpeg
> Hi Michelle,
>
> when I was in a somewhat similar situation, I recovered my pool
> (at least to copy it to new disk drives) by patching the ZFS code
> to ignore certain error aborts.
>
> Testing is possible with zdb, since it uses the same source files
> as the kernel module for all ZFS accesses.
>
> I identified the test that failed and made it non-fatal (issue a
> warning but continue). This led to inconsistent checksums, since
> they were not correctly updated in the failure case. I had to make
> these checksum checks non-fatal, too.
>
> All testing can be done by issuing zdb commands, but I do not
> remember the exact options. Option -AAA is required at a minimum to
> make most checks non-fatal, but it was not sufficient on its own.
>
> I cannot offer any more specific help, I'm afraid.
>
> Good luck in recovering your pool!
>
> Regards, Stefan
Finally made progress.

Booted 12-STABLE from a USB key, installed it to an external USB drive
and booted from that.

Built a debug kernel, installed and booted it, then installed mdb.
After playing with it and getting 'no symbol' errors, I finally worked
it out. This worked:

root@colossus:/usr/src # mdb -Mkwe "spa_load_verify_metadata/W 0"
Preloading module symbols: [ kernel uhid.ko ums.ko mac_ntpd.ko zfs.ko opensolaris.ko ]
zfs.ko`spa_load_verify_metadata:0x1             =       0x0
Segmentation fault (core dumped)
root@colossus:/usr/src #

(I had already worked out with mdb that setting spa_load_verify_metadata
to 0 results in a 'LOADED' state.)
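For anyone repeating this: the one-liner above writes 0 directly into the kernel variable spa_load_verify_metadata, and since mdb segfaulted straight afterwards it is worth confirming that the write actually took. A minimal sh sketch, assuming mdb always echoes a /W write in the `symbol:old = new` form seen in the transcript (that single line is the only output known here); write_took_effect and disable_load_verify_metadata are hypothetical helper names, not part of mdb or ZFS:

```sh
#!/bin/sh
# Hypothetical helpers, not part of mdb or ZFS.
# ASSUMPTION: mdb echoes a /W write as "symbol:old = new", e.g.
#   zfs.ko`spa_load_verify_metadata:0x1             =       0x0

# Succeeds if any line on stdin ends in "= <expected value>".
write_took_effect() {
    awk -v want="$1" '
        NF >= 2 && $(NF - 1) == "=" && $NF == want { found = 1 }
        END { exit !found }
    '
}

# Clear spa_load_verify_metadata in the running kernel (needs root and
# the zfs module loaded).
#   -M  preload all module symbols
#   -k  debug the live kernel, -w  allow writes
#   -e  run one command and exit
# mdb crashed after the write in the transcript, so success is judged
# from its echoed output rather than from its exit status.
disable_load_verify_metadata() {
    mdb -Mkwe "spa_load_verify_metadata/W 0" 2>&1 | write_took_effect 0x0
}
```

Checking the echoed value rather than mdb's exit status is deliberate here: a crash after a successful write (as happened above) would otherwise be reported as a failure.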

Then I was able to run the following. I had already identified
transaction group (txg) 24628146 as the latest, but the latest one that
was complete (committed and not corrupt) was 24628138, so...

root@colossus:/usr/src # zpool import -fT 24628138 storage
cannot mount 'storage': Input/output error
Unsupported share protocol: 1.
root@colossus:/usr/src # zpool status -v
   pool: storage
  state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
     continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
   scan: resilver in progress since Thu Mar  7 19:06:14 2019
     14.9T scanned at 2.06G/s, 13.4T issued at 615M/s, 28.8T total
     863G resilvered, 46.39% done, 0 days 07:19:25 to go
config:

     NAME                    STATE     READ WRITE CKSUM
     storage                 ONLINE       0     0     2
       raidz2-0              ONLINE       0     0     8
         mfid8               ONLINE       0     0     0
         mfid7               ONLINE       0     0     0
         mfid12              ONLINE       0     0     0
         mfid11              ONLINE       0     0     0
         mfid0               ONLINE       0     0     0
         mfid5               ONLINE       0     0     0
         mfid4               ONLINE       0     0     0
         mfid3               ONLINE       0     0     0
         mfid2               ONLINE       0     0     0
         spare-9             ONLINE       0     0 4.38K
           mfid14            ONLINE       0     0     0
           mfid15            ONLINE       0     0     0
         mfid10              ONLINE       0     0     0
         mfid6               ONLINE       0     0     0
         mfid13              ONLINE       0     0     0
         mfid9               ONLINE       0     0     0
         mfid1               ONLINE       0     0     0
     spares
       12144659313369122799  INUSE     was /dev/mfid15

errors: Permanent errors have been detected in the following files:

         <metadata>:<0x5d>
         storage:<0x0>
root@colossus:/usr/src #
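On picking the rewind point: the import above goes back to txg 24628138 rather than the newest one, 24628146. One way to see the candidate txgs is to dump the uberblocks stored in a member disk's labels with `zdb -lu <device>`. The sketch below only collects and sorts what zdb reports, newest first; it assumes each uberblock is printed with a `txg = N` line, and list_uberblock_txgs is a hypothetical name. Deciding which txg is actually complete still has to be done by hand, as it was here.

```sh
#!/bin/sh
# Hypothetical helper: read `zdb -lu <device>` output on stdin and print
# each distinct uberblock txg once, newest first.
# ASSUMPTION: uberblocks appear as lines of the form "txg = 24628146";
# the label config's own "txg: N" line is deliberately not matched.
list_uberblock_txgs() {
    awk '$1 == "txg" && $2 == "=" { print $3 }' | sort -n -r -u
}
```

For example, `zdb -lu /dev/mfid8 | list_uberblock_txgs` on one member disk. Trying candidates from the top with a read-only import first (`zpool import -o readonly=on -fT <txg> storage`) avoids writing new txgs on top of the damaged ones before a good rewind point is found.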

So the pool currently appears to be imported but not mounted (I don't
care about that), and it is resilvering. When the resilver completes I
intend to scrub, export and re-import the pool, which hopefully will
resolve the issues. I'll let you all know, but for the forums and
archives...

This is a godsend:
https://www.delphix.com/blog/openzfs-pool-import-recovery

To get mdb working you *must* currently use -M to preload the modules.
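The plan described earlier (let the resilver finish, scrub, export, re-import) can be left to run unattended. A rough sh sketch, assuming `zpool status` keeps printing `resilver in progress` or `scrub in progress` on its scan: line while those are running, as in the status output in this post; the helper names are hypothetical and the ten-minute poll interval is arbitrary:

```sh
#!/bin/sh
# Hypothetical helpers. ASSUMPTION: while a resilver or scrub runs,
# `zpool status` prints "resilver in progress" / "scrub in progress"
# on its "scan:" line, as in the status output in this post.

# Succeeds while the status text on stdin reports a running scan of kind $1.
scan_running() {
    grep -q "$1 in progress"
}

# Block until the scan of kind $2 on pool $1 is no longer reported.
wait_for_scan() {
    while zpool status "$1" | scan_running "$2"; do
        sleep 600    # arbitrary 10-minute poll interval
    done
}

# Let the resilver finish, scrub, then export and re-import the pool.
finish_recovery() {
    pool=$1
    wait_for_scan "$pool" resilver
    zpool scrub "$pool" || return 1
    wait_for_scan "$pool" scrub
    zpool status -v "$pool"    # re-check the permanent-error list
    zpool export "$pool" && zpool import "$pool"
}
```

Usage would be `finish_recovery storage`. Polling the status text keeps the sketch to commands already used in this thread.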

Regards,

Michelle

-- 
Michelle Sullivan
http://www.mhix.org/



More information about the freebsd-fs mailing list