zfs corruption (again) due to interrupted resilver and power faults.
Michelle Sullivan
michelle at sorbs.net
Sun Mar 31 13:11:59 UTC 2019
Stefan Esser wrote:
> On 20.03.19 at 08:15, Michelle Sullivan wrote:
>> Michelle Sullivan wrote:
>>> Trying now thanks (and no I hadn’t - wasn’t aware of the sysctl)
>> Failed with the same old...
>>
>> http://flashback.sorbs.net/packages/zfs/image6.jpeg
> Hi Michelle,
>
> when I was in a somewhat similar situation, I recovered my pool
> (at least to copy it to new disk drives) by patching the ZFS code
> to ignore certain error aborts.
>
> Testing is possible with zdb, since it uses the same source files
> as the kernel module for all ZFS accesses.
>
> I identified the test that failed and made it non-fatal (issue a
> warning but continue). This led to inconsistent checksums, since
> they were not correctly updated in the failure case. I had to make
> these checksum checks non-fatal, too.
>
> All testing can be done by issuing zdb commands, but I do not
> remember the exact options. Option -AAA is at least required, to
> make most checks non-fatal, but it was not sufficient.
>
> I cannot offer any more specific help, I'm afraid.
>
> Good luck in recovering your pool!
>
> Regards, Stefan
Finally made progress.
I booted 12-STABLE from a USB key, installed it to an external USB drive
and booted from that.
I then built a debug kernel, installed and booted it, and installed mdb.
After some experimenting, and a run of "no symbol" errors, I finally
worked it out. This worked:
root@colossus:/usr/src # mdb -Mkwe "spa_load_verify_metadata/W 0"
Preloading module symbols: [ kernel uhid.ko ums.ko mac_ntpd.ko zfs.ko opensolaris.ko ]
zfs.ko`spa_load_verify_metadata:0x1 = 0x0
Segmentation fault (core dumped)
root@colossus:/usr/src #
(I had already worked out with mdb that setting
*spa_load_verify_metadata=0* results in a 'LOADED' state.)
Then I was able to run the following. I had already identified
transaction 24628146 as the latest, but the latest one that was 'complete'
(committed/uncorrupted) was 24628138, so:
root@colossus:/usr/src # zpool import -fT 24628138 storage
cannot mount 'storage': Input/output error
Unsupported share protocol: 1.
root@colossus:/usr/src # zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar 7 19:06:14 2019
        14.9T scanned at 2.06G/s, 13.4T issued at 615M/s, 28.8T total
        863G resilvered, 46.39% done, 0 days 07:19:25 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     2
          raidz2-0                ONLINE       0     0     8
            mfid8                 ONLINE       0     0     0
            mfid7                 ONLINE       0     0     0
            mfid12                ONLINE       0     0     0
            mfid11                ONLINE       0     0     0
            mfid0                 ONLINE       0     0     0
            mfid5                 ONLINE       0     0     0
            mfid4                 ONLINE       0     0     0
            mfid3                 ONLINE       0     0     0
            mfid2                 ONLINE       0     0     0
            spare-9               ONLINE       0     0 4.38K
              mfid14              ONLINE       0     0     0
              mfid15              ONLINE       0     0     0
            mfid10                ONLINE       0     0     0
            mfid6                 ONLINE       0     0     0
            mfid13                ONLINE       0     0     0
            mfid9                 ONLINE       0     0     0
            mfid1                 ONLINE       0     0     0
        spares
          12144659313369122799    INUSE     was /dev/mfid15

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x5d>
        storage:<0x0>
root@colossus:/usr/src #
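For anyone reading this in the archives, the rows worth watching in output
like the above are the ones with a non-zero CKSUM count. The following is
only a rough sketch of how to filter them out. The function name
cksum_errors is made up for illustration, and it assumes the five-column
NAME STATE READ WRITE CKSUM layout shown in this status output.

```shell
# Hypothetical helper (not part of ZFS): read `zpool status` text on stdin
# and print "name cksum" for every config row with a non-zero CKSUM column.
cksum_errors() {
    awk '
        # The header row marks the start of the vdev table.
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
        # The spares list and the errors section end the table.
        $1 == "spares" || $1 == "errors:" { in_config = 0 }
        # Assumed layout: NAME STATE READ WRITE CKSUM [optional note].
        in_config && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Usage, assuming the pool name "storage" from this thread:
#   zpool status storage | cksum_errors
```

Fed the status output above, this should list only the storage, raidz2-0
and spare-9 rows.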
So the pool currently appears to be imported but not mounted (which I
don't care about), and it is resilvering. When that completes I intend to
scrub, export and reimport, which will hopefully resolve the issues. I
will let you all know. In the meantime, for the forums and archives:
This is a God-send:
https://www.delphix.com/blog/openzfs-pool-import-recovery
To get mdb working you *must* currently use -M to preload the modules.
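The follow-up plan mentioned above (wait for the resilver, scrub, export,
reimport) could be scripted roughly as below. This is a sketch under
assumptions, not something that was run against this pool: the names
scan_running and finish_recovery are made up, the 10-minute polling
interval is arbitrary, and a plain "zpool import" may not be enough if the
pool still needs the options used earlier in this message.

```shell
# Hypothetical helper: succeeds while the `zpool status` text on stdin
# reports the given scan type ("resilver" or "scrub") as in progress.
scan_running() {
    grep -q "$1 in progress"
}

# Hypothetical wrapper for the plan: wait for the resilver, start a scrub,
# wait for the scrub, then export and re-import the pool.
finish_recovery() {
    pool=${1:-storage}    # pool name from this thread, used as the default
    while zpool status "$pool" | scan_running resilver; do
        sleep 600         # poll every 10 minutes (arbitrary choice)
    done
    zpool scrub "$pool"   # returns immediately; the scrub runs in the background
    while zpool status "$pool" | scan_running scrub; do
        sleep 600
    done
    zpool export "$pool" && zpool import "$pool"
}

# Usage: finish_recovery storage
```

Polling "zpool status" is used here because it only relies on the
"resilver in progress" wording visible in the output above.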
Regards,
Michelle
--
Michelle Sullivan
http://www.mhix.org/
More information about the freebsd-fs
mailing list