ZFS pool faulted (corrupt metadata) but the disk data appears ok...

Stefan Esser se at freebsd.org
Mon Feb 9 15:14:38 UTC 2015


On 09.02.2015 at 14:19, Michelle Sullivan wrote:
> Stefan Esser wrote:
>>
>> The point where zdb seg faults hints at the data structure that is
>> corrupt. You may get some output before the seg fault if you add
>> a number of -v options (they add up to higher verbosity).
>>
>> Else, you may be able to look at the core and identify the function
>> that fails. You'll most probably need zdb and libzfs compiled with
>> "-g" to get any useful information from the core, though.
>>
>> For my failed pool, I noticed that internal assumptions were
>> violated, due to some free space occurring in more than one entry.
>> I had to special-case the test in some function to ignore this
>> situation (I knew that I only ever wanted to mount that pool
>> R/O to rescue my data). But skipping the test did not suffice,
>> since another assert triggered (after skipping the NULL dereference,
>> the calculated sum of free space did not match the recorded sum, so I
>> had to disable that assert, too). With these two patches I was able
>> to recover the pool starting at a TXG less than 100 transactions back,
>> which was sufficient for my purpose ...
>>   
> 
> Question is, will zdb 'fix' things, or is it just a debug utility (for
> displaying)?

The purpose of zdb is to access the pool without having to import
it (which tends to crash the kernel) and to possibly identify a safe
TXG to go back to. Once you have found a TXG at which zdb survives
accesses to the critical data structures of your pool, you can try to
import the pool at that TXG to rescue your data.
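
Something along these lines could be used to look at the active
uberblock and then probe an older TXG without importing (untested
sketch; the TXG is just a placeholder, the pool name is taken from
your command line below):

  # Show the active uberblock of the exported pool (TXG + timestamp):
  zdb -e -u storage

  # Probe a candidate TXG; -t limits the highest TXG zdb will consider,
  # -AAA ignores assertions, -L skips leak checking. Add -v (repeatedly)
  # for more output before a possible seg fault:
  zdb -e -AAA -L -t <txg> -d storage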

> If it is just a debug tool and won't fix anything, I'm quite happy to
> roll back transactions; the question is how (presumably after one finds
> the corrupt point - I'm quite happy to just do it by hand until I get
> success - it will save 2+ months of work - I did get an output with a
> date/time that indicates where the rollback would go to...)
> 
> In the meantime this appears to be working without crashing - it's been
> running for days now...
> 
>   PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  4332 root           209  22    0 23770M 23277M uwait   1 549:07 11.04% zdb -AAA -L -uhdi -FX -e storage

Options -u and -h do not take much time; -i depends on how much was
in the intent log (and recovery should be possible without it, if your
kernel is not too old with regard to supported ZFS features).

zdb -d takes a long time, and if it succeeds, you should be able to
recover your data. But zdb -m should also run to completion (and ISTR
that, in my case, that was where the kernel blew up when trying to
import the pool).
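
If the combined run does not get through, it may help to run the two
long passes separately, so you can see which of them trips over the
corruption (same options and pool name as in your command line above):

  zdb -e -AAA -L -d storage    # walk datasets / object sets
  zdb -e -AAA -L -m storage    # dump metaslabs / space maps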

Using the debugger to analyze the failing instruction let me work around
the inconsistency with two small patches (one skipped a consistency
check, the second fixed up the sum of free space, which was miscalculated
because the free block that led to the panic had been omitted).
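
In case zdb dumps core, a backtrace of the core file points at the
failing function; this assumes zdb and libzfs were rebuilt with "-g"
as mentioned above, and that the core was written as zdb.core in the
current directory:

  gdb /usr/sbin/zdb zdb.core
  (gdb) bt            # backtrace: which function tripped the assert
  (gdb) frame <n>     # select the interesting frame
  (gdb) info locals   # inspect the data structure being checked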

After I had tested these patches with zdb, I was able to import the pool
with a kernel that included the exact same patches. You obviously do not
want to perform any other activities with the patched kernel, since it
lacks some internal checks - it is purely needed for the one-time
backup operation of the failed pool.
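
The import itself is best done strictly read-only and without mounting
anything automatically; something like the following should be close
(untested here, the altroot and the rewind options only apply if you
actually need them):

  # Read-only import under an alternate root, datasets not mounted:
  zpool import -o readonly=on -f -N -R /mnt storage

  # If the current TXG cannot be imported, -F asks for a rewind to an
  # earlier TXG, and -X (together with -F) allows a more extreme rewind:
  zpool import -o readonly=on -f -F -X -R /mnt storage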


So, zdb and even the patches that make zdb dump your pool's internal
state will not directly give you access to your data. But if you manage
to print all state with "zdb -dm", chances are very good that you'll
be able to import the pool - possibly with temporary hacks to libzfs
that skip corrupt data elements (if they are not strictly required for
read accesses to your data).

After that has succeeded, you have a good chance of copying off your
data using a kernel that has the exact same patches in the ZFS driver
... (if any are required at all, as they were in my case).
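
For the actual copy, an existing snapshot can be sent to a healthy pool
with zfs send/receive, or the mounted file systems can simply be copied
at the file level; the pool and dataset names below are placeholders:

  # Send an existing snapshot (a read-only pool cannot take new ones):
  zfs send -R storage/data@last | zfs receive -u backup/data

  # Or copy at the file level from the read-only mounts:
  rsync -a /mnt/storage/ /backup/storage/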

Regards, STefan

