ZFS Kernel Panic on 10.0-RELEASE

Mike Carlson mike at bayphoto.com
Tue Jun 3 00:54:47 UTC 2014


On 6/2/2014 5:37 PM, Mike Carlson wrote:
> On 6/2/2014 5:29 PM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
>> To: "Steven Hartland" <killing at multiplay.co.uk>; 
>> <freebsd-fs at freebsd.org>
>> Sent: Monday, June 02, 2014 11:57 PM
>> Subject: Re: ZFS Kernel Panic on 10.0-RELEASE
>>
>>
>>> On 6/2/2014 2:15 PM, Steven Hartland wrote:
>>>> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
>>>>
>>>>>> That's the line I gathered it was on, but no, I need to know what
>>>>>> the value of vd is, so what you need to do is:
>>>>>> print vd
>>>>>>
>>>>>> If that's valid, then:
>>>>>> print *vd
>>>>>>
>>>>> It reports:
>>>>>
>>>>> (kgdb) print *vd
>>>>> No symbol "vd" in current context.
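>>>>>
>>>>> (For anyone following along: the vmcore was inspected in the usual
>>>>> way; a minimal sketch, assuming a GENERIC kernel and the first dump:
>>>>>
>>>>>    kgdb /boot/kernel/kernel /var/crash/vmcore.0
>>>>>    (kgdb) bt
>>>>>    (kgdb) frame N    # whichever frame the backtrace shows for dva_get_dsize_sync()
>>>>>    (kgdb) print vd
>>>>>
>>>>> where N is a placeholder for that frame number.)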
>>>>
>>>> Damn optimiser :(
>>>>
>>>>> Should I rebuild the kernel with additional options?
>>>>
>>>> Likely won't help, as a kernel with zero optimisations tends to fail
>>>> to build in my experience :(
>>>>
>>>> Can you try applying the attached patch to your src e.g.
>>>> cd /usr/src
>>>> patch < zfs-dsize-dva-check.patch
>>>>
>>>> Then rebuild and install the kernel, and reproduce the issue again.
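>>>>
>>>> For reference, a minimal sketch of the full sequence, assuming the
>>>> stock GENERIC kernel configuration:
>>>>
>>>>    cd /usr/src
>>>>    patch < zfs-dsize-dva-check.patch
>>>>    make buildkernel KERNCONF=GENERIC
>>>>    make installkernel KERNCONF=GENERIC
>>>>    shutdown -r now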
>>>>
>>>> Hopefully it will provide some more information on the cause, but
>>>> I suspect you might be seeing the effects of some corruption.
>>>
>>> Well, after building the kernel with your patch, installing it and 
>>> booting off of it, the system does not panic.
>>>
>>> It reports this when I mount the filesystem:
>>>
>>>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>>>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>>>    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
>>>
>>> Here are the results; I can now mount the file system!
>>>
>>>    root at working-1:~ # zfs set canmount=on zroot/data/working
>>>    root at working-1:~ # zfs mount zroot/data/working
>>>    root at working-1:~ # df
>>>    Filesystem                 1K-blocks       Used      Avail Capacity  Mounted on
>>>    zroot                     2677363378    1207060 2676156318       0%  /
>>>    devfs                              1          1          0     100%  /dev
>>>    /dev/mfid10p1              253911544    2827824  230770800       1%  /dump
>>>    zroot/home                2676156506        188 2676156318       0%  /home
>>>    zroot/data                2676156389         71 2676156318       0%  /mnt/data
>>>    zroot/usr/ports/distfiles 2676246609      90291 2676156318       0%  /mnt/usr/ports/distfiles
>>>    zroot/usr/ports/packages  2676158702       2384 2676156318       0%  /mnt/usr/ports/packages
>>>    zroot/tmp                 2676156812        493 2676156318       0%  /tmp
>>>    zroot/usr                 2679746045    3589727 2676156318       0%  /usr
>>>    zroot/usr/ports           2676986896     830578 2676156318       0%  /usr/ports
>>>    zroot/usr/src             2676643553     487234 2676156318       0%  /usr/src
>>>    zroot/var                 2676650671     494353 2676156318       0%  /var
>>>    zroot/var/crash           2676156388         69 2676156318       0%  /var/crash
>>>    zroot/var/db              2677521200    1364882 2676156318       0%  /var/db
>>>    zroot/var/db/pkg          2676198058      41740 2676156318       0%  /var/db/pkg
>>>    zroot/var/empty           2676156387         68 2676156318       0%  /var/empty
>>>    zroot/var/log             2676168522      12203 2676156318       0%  /var/log
>>>    zroot/var/mail            2676157043        725 2676156318       0%  /var/mail
>>>    zroot/var/run             2676156508        190 2676156318       0%  /var/run
>>>    zroot/var/tmp             2676156389         71 2676156318       0%  /var/tmp
>>>    zroot/data/working        7664687468 4988531149 2676156318      65%  /mnt/data/working
>>>    root at working-1:~ # ls /mnt/data/working/
>>>    DONE_ORDERS             DP2_CMD NEW_MULTI_TESTING PROCESS
>>>    RECYCLER                XML_NOTIFICATIONS       XML_REPORTS
>>
>> That does indeed seem to indicate some on-disk corruption.
>>
>> There are a number of cases in the code which have a similar check,
>> but I'm afraid I don't know the implications of the corruption you're
>> seeing; others may.
>>
>> The attached updated patch will enforce the safe panic in this case
>> unless the sysctl vfs.zfs.recover is set to 1 (which can also now be
>> done on the fly).
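>>
>> For example, with the patched kernel that would be something like:
>>
>>    sysctl vfs.zfs.recover=1
>>
>> or, to have it in place before the pool is imported at boot, set
>> vfs.zfs.recover=1 in /boot/loader.conf.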
>>
>> I'd recommend backing up the data off the pool and restoring it
>> elsewhere.
>>
>> It would be interesting to see the output of the following command
>> on your pool:
>> zdb -uuumdC <pool>
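>>
>> In this case, going by the df output above, the pool looks to be
>> "zroot", so presumably:
>>
>>    zdb -uuumdC zroot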
>>
>>    Regards
>>    Steve
>
Scratch that last one; the cachefile had to be reset on the pool to 
/boot/zfs/zpool.cache.
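
For reference, that reset would normally be something along the lines of:

   zpool set cachefile=/boot/zfs/zpool.cache zroot

(assuming the pool is zroot, as in the df output above).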

So I'm running it now, and it's taking so long to traverse all blocks 
that it is telling me it's going to take around 5400 HOURS.

I guess I'll report back in 90 days?

