FreeBSD 11.0-RELEASE - ZFS deadlock

Henri Hennebert hlh at restart.be
Mon Nov 14 12:01:01 UTC 2016



On 11/14/2016 12:45, Andriy Gapon wrote:
> On 14/11/2016 11:35, Henri Hennebert wrote:
>>
>>
>> On 11/14/2016 10:07, Andriy Gapon wrote:
>>> Hmm, I've just noticed another interesting thread:
>>> Thread 668 (Thread 101245):
>>> #0  sched_switch (td=0xfffff800b642aa00, newtd=0xfffff8000285f000, flags=<value
>>> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>>> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
>>> /usr/src/sys/kern/kern_synch.c:455
>>> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>>> /usr/src/sys/kern/subr_sleepqueue.c:646
>>> #3  0xffffffff805614b1 in _sleep (ident=<value optimized out>, lock=<value
>>> optimized out>, priority=<value optimized out>, wmesg=0xffffffff809c51bc
>>> "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
>>> /usr/src/sys/kern/kern_synch.c:229
>>> #4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, wmesg=<value
>>> optimized out>) at /usr/src/sys/vm/vm_page.c:753
>>> #5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
>>> msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
>>> #6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value
>>> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
>>> /usr/src/sys/vm/vm_fault.c:495
>>> #7  0xffffffff80885448 in vm_fault (map=0xfffff80011d66000, vaddr=<value
>>> optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
>>> /usr/src/sys/vm/vm_fault.c:273
>>> #8  0xffffffff808d3c49 in trap_pfault (frame=0xfffffe0101836c00, usermode=1) at
>>> /usr/src/sys/amd64/amd64/trap.c:741
>>> #9  0xffffffff808d3386 in trap (frame=0xfffffe0101836c00) at
>>> /usr/src/sys/amd64/amd64/trap.c:333
>>> #10 0xffffffff808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
>>
>> This thread belongs to another program from the news system:
>> 668 Thread 101245 (PID=49124: innfeed)  sched_switch (td=0xfffff800b642aa00,
>> newtd=0xfffff8000285f000, flags=<value optimized out>) at
>> /usr/src/sys/kern/sched_ule.c:1973
>>
>>>
>>> I strongly suspect that this is thread that we were looking for.
>>> I think that it has the vnode lock in the shared mode while trying to fault in a
>>> page.
>>>
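
If that suspicion is right, the classic interleaving would look roughly like
this (a sketch only; thread B's exact code path is an assumption, not
something taken from this dump):

    // Thread A (innfeed, 101245)          Thread B (hypothetical)
    // --------------------------          -----------------------
    // vn_lock(vp, LK_SHARED);             vm_page_xbusy(m);  // busies the page
    // ...touches a mapped page...
    // vm_fault_hold():                    vn_lock(vp, LK_EXCLUSIVE);
    //   m is busy, so                       // blocks: A holds the vnode
    //   vm_page_busy_sleep(m, "vmpfw");     // lock in shared mode
    //   // blocks: B holds m busy
    //
    // Neither thread can proceed: the vnode lock and the page busy state
    // are taken in opposite orders.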

--clip--

>
> Okay.  Luckily for us, it seems that 'm' is available in frame 5.  It also
> happens to be the first field of 'struct faultstate'.  So, could you please go
> to frame 5 and print '*m' and '*(struct faultstate *)m'?
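
For context, 'struct faultstate' is declared at the top of
/usr/src/sys/vm/vm_fault.c; in 11.0 its fields come in the order printed
below (layout reproduced from memory, so treat the exact declaration as an
approximation):

    struct faultstate {
            vm_page_t m;
            vm_object_t object;
            vm_pindex_t pindex;
            vm_page_t first_m;
            vm_object_t first_object;
            vm_pindex_t first_pindex;
            vm_map_t map;
            vm_map_entry_t entry;
            int lookup_still_valid;
            struct vnode *vp;
    };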
>
(kgdb) fr 4
#4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, 
wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753		msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xfffff800dc5d85b0, tqe_prev = 0xfffff800debf3bd0},
      s = {ss = {sle_next = 0xfffff800dc5d85b0}, pv = 0xfffff800debf3bd0},
      memguard = {p = 18446735281313646000, v = 18446735281353604048}},
    listq = {tqe_next = 0x0, tqe_prev = 0xfffff800dc5d85c0},
    object = 0xfffff800b62e9c60, pindex = 11, phys_addr = 3389358080,
    md = {pv_list = {tqh_first = 0x0, tqh_last = 0xfffff800df68cd78},
      pv_gen = 426, pat_mode = 6},
    wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0,
    aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
    segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0',
    valid = 0 '\0', dirty = 0 '\0'}
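
A note on busy_lock = 6: the busy state is a bit field defined in
sys/vm/vm_page.h, so the value decodes as "exclusively busied, with
waiters", which matches this thread sleeping on "vmpfw":

    // From sys/vm/vm_page.h:
    //   #define VPB_BIT_SHARED     0x01
    //   #define VPB_BIT_EXCLUSIVE  0x02
    //   #define VPB_BIT_WAITERS    0x04
    //
    // busy_lock == 6 == (VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS):
    // some other thread holds the page exclusive-busy, and at least
    // one thread (this one) is queued waiting for it.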
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xfffff800dc5d85b0, object = 0xfffff800debf3bd0, pindex = 0,
    first_m = 0xfffff800dc5d85c0, first_object = 0xfffff800b62e9c60,
    first_pindex = 11, map = 0xca058000, entry = 0x0,
    lookup_still_valid = -546779784, vp = 0x6000001aa}
(kgdb)
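
Taken together, the two prints tell a consistent story: the second one is
just the same vm_page's bytes reinterpreted as a faultstate. Assuming the
usual amd64 layouts of the two structures, the fields overlap like this:

    faultstate field      overlaps vm_page field       value
    ----------------      ----------------------       -----
    m                     plinks.q.tqe_next            0xfffff800dc5d85b0
    object                plinks.q.tqe_prev            0xfffff800debf3bd0
    pindex                listq.tqe_next               0
    first_m               listq.tqe_prev               0xfffff800dc5d85c0
    first_object          object                       0xfffff800b62e9c60
    first_pindex          pindex                       11
    map                   phys_addr                    0xca058000 (== 3389358080)
    entry                 md.pv_list.tqh_first         0x0
    lookup_still_valid    tqh_last (low 32 bits)       -546779784
    vp                    pv_gen + pat_mode            0x6000001aa

So 'map', 'vp', and friends in $2 are not real pointers; the solid data here
is the vm_page itself, which shows a valid-looking object/pindex pair.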

