ZFS Kernel Panic on 10.0-RELEASE

Steven Hartland killing at multiplay.co.uk
Tue Jun 3 00:29:09 UTC 2014


----- Original Message ----- 
From: "Mike Carlson" <mike at bayphoto.com>
To: "Steven Hartland" <killing at multiplay.co.uk>; <freebsd-fs at freebsd.org>
Sent: Monday, June 02, 2014 11:57 PM
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE


> On 6/2/2014 2:15 PM, Steven Hartland wrote:
>> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
>>
>>>> That's the line I gathered it was on, but now I need to know what
>>>> the value of vd is, so what you need to do is:
>>>> print vd
>>>>
>>>> If that's valid then:
>>>> print *vd
>>>>
>>> It reports:
>>>
>>> (kgdb) print *vd
>>> No symbol "vd" in current context.
>>
>> Damn optimiser :(
>>
>>> Should I rebuild the kernel with additional options?
>>
>> Likely won't help, as a kernel built with zero optimisations tends to
>> fail to build in my experience :(
>>
>> Can you try applying the attached patch to your src e.g.
>> cd /usr/src
>> patch < zfs-dsize-dva-check.patch
>>
>> Then rebuild, install the kernel and reproduce the issue again.
>>
>> Hopefully it will provide some more information on the cause, but
>> I suspect you might be seeing the effect of having some corruption.
>
> Well, after building the kernel with your patch, installing it and 
> booting off of it, the system does not panic.
>
> It reports this when I mount the filesystem:
>
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>
> Here are the results; I can now mount the file system!
>
>    root at working-1:~ # zfs set canmount=on zroot/data/working
>    root at working-1:~ # zfs mount zroot/data/working
>    root at working-1:~ # df
>    Filesystem                 1K-blocks       Used      Avail Capacity  Mounted on
>    zroot                     2677363378    1207060 2676156318     0%    /
>    devfs                              1          1          0   100%    /dev
>    /dev/mfid10p1              253911544    2827824  230770800     1%    /dump
>    zroot/home                2676156506        188 2676156318     0%    /home
>    zroot/data                2676156389         71 2676156318     0%    /mnt/data
>    zroot/usr/ports/distfiles 2676246609      90291 2676156318     0%    /mnt/usr/ports/distfiles
>    zroot/usr/ports/packages  2676158702       2384 2676156318     0%    /mnt/usr/ports/packages
>    zroot/tmp                 2676156812        493 2676156318     0%    /tmp
>    zroot/usr                 2679746045    3589727 2676156318     0%    /usr
>    zroot/usr/ports           2676986896     830578 2676156318     0%    /usr/ports
>    zroot/usr/src             2676643553     487234 2676156318     0%    /usr/src
>    zroot/var                 2676650671     494353 2676156318     0%    /var
>    zroot/var/crash           2676156388         69 2676156318     0%    /var/crash
>    zroot/var/db              2677521200    1364882 2676156318     0%    /var/db
>    zroot/var/db/pkg          2676198058      41740 2676156318     0%    /var/db/pkg
>    zroot/var/empty           2676156387         68 2676156318     0%    /var/empty
>    zroot/var/log             2676168522      12203 2676156318     0%    /var/log
>    zroot/var/mail            2676157043        725 2676156318     0%    /var/mail
>    zroot/var/run             2676156508        190 2676156318     0%    /var/run
>    zroot/var/tmp             2676156389         71 2676156318     0%    /var/tmp
>    zroot/data/working        7664687468 4988531149 2676156318    65%    /mnt/data/working
>    root at working-1:~ # ls /mnt/data/working/
>    DONE_ORDERS             DP2_CMD NEW_MULTI_TESTING       PROCESS
>    RECYCLER                XML_NOTIFICATIONS       XML_REPORTS

That does indeed seem to indicate some on-disk corruption.

There are a number of cases in the code which have a similar check, but
I'm afraid I don't know the implications of the corruption you're
seeing; others here may.

The attached updated patch will enforce the safe panic in this case
unless the sysctl vfs.zfs.recover is set to 1 (which can now also be
done on the fly).
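
As before that means a kernel rebuild after applying it (reverting the
first patch with patch -R first if it doesn't apply cleanly). Roughly,
assuming the stock GENERIC config (adjust KERNCONF if you use a custom
one):

cd /usr/src
patch < zfs-dsize-dva-check.patch
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now

Once you're back up, you can then enable recovery mode at runtime with:

sysctl vfs.zfs.recover=1

or put vfs.zfs.recover=1 in /boot/loader.conf to have it set from boot.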

I'd recommend backing the data up off the pool and restoring it
elsewhere.
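
If you have somewhere to send it, a snapshot plus zfs send / receive is
probably the easiest way to get it off; a rough sketch, where "backup"
is just a placeholder name for whatever spare pool you have:

zfs snapshot -r zroot/data/working@evac
# "backup" below is only an example destination pool
zfs send -R zroot/data/working@evac | zfs receive -u backup/working

Failing that, a plain copy with rsync or tar to another machine will do.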

It would be interesting to see the output of the following command
on your pool:
zdb -uuumdC <pool>
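
The output can be fairly large, so best redirected to a file you can
attach or put somewhere, e.g. (using your pool name from the df output
above):

zdb -uuumdC zroot > /tmp/zdb-zroot.txt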

    Regards
    Steve
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfs-dsize-dva-check.patch
Type: application/octet-stream
Size: 1188 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20140603/76e984ea/attachment.obj>

