zpool import reboots computer

Martin Ranne martin.ranne at kockumsonics.com
Mon Jan 23 18:33:07 UTC 2012


>On 2012-01-23 15:59, Andriy Gapon wrote: 
>>on 23/01/2012 16:38 Martin Ranne said the following:
>>>To me it looks like in the vdev_mirror_child_select function mc->mc_vd could be
>>>NULL although the code doesn't expect it.  You can add some code to the function
>>>to check if the hypothesis is correct and to skip a loop if mc->mc_vd is NULL.
>>>Such a hack is probably not needed in general, but given that your pool could be
>>>corrupted, this could be your chance to get access to it.

>>>BTW, restoring from backups is what is usually recommended first in a situation
>>>like this.

>>I know it would be recommended first to restore from backup but there were backup failures.

>>Am back after the weekend. I have done the hack in vdev_mirror_child_select function as per the code below.
>>if (mc->mc_tried || mc->mc_skipped)
>>        continue;
>># hack start
>>if (mc->mc_vd == NULL)
>>        break;
>># hack end
>>if (!vdev_readable(mc->mc_vd)) {
>>I am no longer getting faults at virtual addresses 0x38 and 0x88; instead I get two at 0x88. The function it stops at is zio_vdev_child_io. Is there another hack I could do there?
>You could try a similar hack in vdev_mirror_io_start().
>Please note that there are two loops in there.

>BTW, if you run kgdb /path/to/kernel/that/panicked, you can do e.g. 'info line
>*zio_vdev_child_io+0x25' to see on what line the trap occurred.
>I have now tried the hack in vdev_mirror_io_start() as shown below, together with the one I previously did in vdev_mirror_child_select(). Unfortunately I get the same crash as I sent earlier today. It takes time to get into DDB for a crash, as the computer freezes 19 times out of 20 when I do the zpool import, and if I try to save a dump the computer freezes, so I cannot use that.

I have done some checking and found that mc->mc_vd == NULL in vdev_mirror_io_start(), in this while loop:

while (children--) { 
    mc = &mm->mm_child[c];
    zio_nowait(zio_vdev_child_io(zio, zio->io_bp,
        mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size,
        zio->io_type, zio->io_priority, 0,
        vdev_mirror_child_done, mc));
    c++;
}

If I set a break before it runs zio_nowait(), it will still crash the kernel.
What can I check next so that it is able to continue? Is it possible to have it ignore the child where mc_vd is NULL? I am also looking into what more I can do to debug it (adding code to print to the console, as I cannot use kernel dumps).
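
As a rough illustration of the "ignore the child where mc_vd is NULL" idea, here is a standalone userland sketch. It is not the real ZFS code: struct mock_vdev, struct mock_child, struct mock_map and mock_issue_children() are hypothetical stand-ins that model only the fields needed to show the loop shape, and the actual child I/O call is replaced by setting a flag. The point it shows is using continue (skip this child, carry on with the rest) rather than break (stop at the first bad child).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ZFS structures; only the fields
 * needed for this illustration are modelled. */
struct mock_vdev {
	int id;
};

struct mock_child {
	struct mock_vdev *mc_vd;	/* may be NULL on a damaged pool */
	int issued;			/* set to 1 where child I/O would be issued */
};

struct mock_map {
	int mm_children;
	struct mock_child *mm_child;
};

/*
 * Walk all children and "issue" I/O only to those with a non-NULL vdev.
 * Returns the number of children that received I/O.
 */
static int
mock_issue_children(struct mock_map *mm)
{
	int c, issued = 0;

	assert(mm != NULL);
	for (c = 0; c < mm->mm_children; c++) {
		struct mock_child *mc = &mm->mm_child[c];

		if (mc->mc_vd == NULL)
			continue;	/* skip this child, keep going with the rest */

		/* The real loop would call zio_nowait(zio_vdev_child_io(...)) here. */
		mc->issued = 1;
		issued++;
	}
	return (issued);
}
```

With three children where the middle one has mc_vd == NULL, this returns 2 and leaves the middle child untouched. In the real function, skipping alone may not be enough: the parent zio presumably still needs the skipped child to be accounted for (for example via mc_skipped, which vdev_mirror_child_select() already checks), so treat this only as the shape of the loop, not as a complete fix.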


>>Crash and bt below.
>>Fatal trap 12: page fault while in kernel mode
>>cpuid = 1;
>>apic id = 01
>>Fatal trap 12: page fault while in kernel mode
>>fault virtual address   = 0x88
>>cpuid = 5; fault code           = supervisor read data, page not present
>>apic id = 05
>>instruction pointer     = 0x20:0xffffffff814a7ee5
>>fault virtual address   = 0x88
>>stack pointer           = 0x28:0xffffff8c0d564f00
>>fault code              = supervisor read data, page not present
>>frame pointer           = 0x28:0xffffff8c0d564f70
>>instruction pointer     = 0x20:0xffffffff814a7ee5
>>code segment            = base 0x0, limit 0xfffff, type 0x1b
>>stack pointer           = 0x28:0xffffff8c1009aad0
>>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>>frame pointer           = 0x28:0xffffff8c1009ab40
>>processor eflags        = code segment          = base 0x0, limit 0xfffff, type 0x1b
>>interrupt enabled,                      = DPL 0, pres 1, long 1, def32 0, gran 1
>>resume, processor eflags        = IOPL = 0
>>interrupt enabled, current process              = resume, 0 (system_taskq_3)
>>I[ thread pid 0 tid 100099 ]
>>Stopped at      zio_vdev_child_io+0x25: cmpq    $0, 0x88(%r10)
>>db> bt
>>Tracing pid 0 tid 100099 td 0xfffffe000ee4e460
>>zio_vdev_child_io() at zio_vdev_child_io+0x25
>>vdev_mirror_io_start() at vdev_mirror_io_start+0x16c
>>zio_vdev_io_start() at zio_vdev_io_start+0x232
>>zio_execute() at zio_execute+0xc3
>>zio_gang_assemble() at zio_gang_assemble+0x1b
>>zio_execute() at zio_execute+0xc3
>>arc_read_nolock() at arc_read_nolock+0x6d1
>>arc_read() at arc_read+0x93
>>traverse_prefetcher() at traverse_prefetcher+0x103
>>traverse_visitbp() at traverse_visitbp+0x21c
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x3ff
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_visitbp() at traverse_visitbp+0x316
>>traverse_dnode() at traverse_dnode+0x7c
>>traverse_visitbp() at traverse_visitbp+0x48c
>>traverse_prefetch_thread() at traverse_prefetch_thread+0x78
>>taskq_run() at taskq_run+0x13
>>taskqueue_run_locked() at taskqueue_run_locked+0x85
>>taskqueue_thread_loop() at taskqueue_thread_loop+0x46
>>fork_exit() at fork_exit+0x11f
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffff8c0d565d00, rbp = 0 ---
>>db>
>>
>>
>>//Martin Ranne


More information about the freebsd-fs mailing list