FreeBSD 11.0 RELEASE - ZFS deadlock

Henri Hennebert hlh at restart.be
Mon Nov 14 09:35:24 UTC 2016



On 11/14/2016 10:07, Andriy Gapon wrote:
> On 13/11/2016 15:28, Henri Hennebert wrote:
>> On 11/13/2016 11:06, Andriy Gapon wrote:
>>> On 12/11/2016 14:40, Henri Hennebert wrote:

> [snip]
>
> Could you please show 'info local' in frame 14?
> I expected that 'nd' variable would be defined there and it may contain some
> useful information.
>
No luck there:

(kgdb) fr 14
#14 0xffffffff80636838 in kern_statat (td=0xfffff80009ba0500, 
flag=<value optimized out>, fd=-100, path=0x0,
     pathseg=<value optimized out>, sbp=<value optimized out>, 
hook=0x800e2a388) at /usr/src/sys/kern/vfs_syscalls.c:2160
2160		if ((error = namei(&nd)) != 0)
(kgdb) info local
rights = <value optimized out>
nd = <value optimized out>
error = <value optimized out>
sb = <value optimized out>
(kgdb)


>> I also tried to get information from the execve of the other threads:
>>
>> for tid 101250:
>> (kgdb) fr 10
>> #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000,
>> uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
>> 218            error = kern_execve(td, &args, NULL);
>> (kgdb) print *uap
>> $4 = {fname_l_ = 0xfffffe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
>> <Address 0x8028f9060 out of bounds>,
>>   fname_r_ = 0xfffffe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010184fb88
>> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>>   argv_r_ = 0xfffffe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010184fb90
>> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>>   envv_r_ = 0xfffffe010184fb98 ""}
>> (kgdb)
>>
>> for tid 101243:
>>
>> (kgdb) f 15
>> #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500,
>> uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> 218            error = kern_execve(td, &args, NULL);
>> (kgdb) print *uap
>> $5 = {fname_l_ = 0xfffffe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0 <Address
>> 0x80285cfc0 out of bounds>,
>>   fname_r_ = 0xfffffe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010182cb88
>> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>>   argv_r_ = 0xfffffe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010182cb90
>> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>>   envv_r_ = 0xfffffe010182cb98 ""}
>> (kgdb)
>
> I think that you see garbage in those structures because they contain pointers
> to userland data.
>
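
A side note on the "garbage" strings above, offered as a sketch rather than a
definitive reading: as far as I understand, the *_l_ and *_r_ members are
zero-length padding arrays on amd64, so kgdb prints whatever bytes sit at the
same address as the neighbouring pointer. Decoding those bytes as a
little-endian 64-bit value appears to give back the pointer itself. The helper
name le_pointer is made up for illustration, and I am assuming the non-ASCII
characters in the output were rendered as ISO-8859-1.

```python
import struct


def le_pointer(raw: bytes) -> int:
    """Interpret up to 8 bytes as a little-endian 64-bit pointer.

    kgdb stops printing a char array at the first NUL byte, so the
    missing high bytes are assumed to be zero.
    """
    padded = raw[:8].ljust(8, b"\x00")
    return struct.unpack("<Q", padded)[0]


# Bytes kgdb displayed as strings for tid 101250 (assumed ISO-8859-1:
# '¶' = 0xb6, 'ÿ' = 0xff, 'Ü' = 0xdc).
fname_l = b"`\220\217\002\b"          # shown for fname_l_
argv_l = b"`\xb6\xff\xff\xff\177"     # shown for fname_r_ and argv_l_
envv_l = b"\x08\xdc\xff\xff\xff\177"  # shown for argv_r_ and envv_l_

for name, raw in (("fname", fname_l), ("argv", argv_l), ("envv", envv_l)):
    print(f"{name}: {le_pointer(raw):#x}")
```

If that reading is right, the strings carry no information beyond the fname,
argv and envv values already printed; the file name itself lives in userland
memory, which is why kgdb reports it as out of bounds.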
> Hmm, I've just noticed another interesting thread:
> Thread 668 (Thread 101245):
> #0  sched_switch (td=0xfffff800b642aa00, newtd=0xfffff8000285f000, flags=<value
> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:455
> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
> /usr/src/sys/kern/subr_sleepqueue.c:646
> #3  0xffffffff805614b1 in _sleep (ident=<value optimized out>, lock=<value
> optimized out>, priority=<value optimized out>, wmesg=0xffffffff809c51bc
> "vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
> /usr/src/sys/kern/kern_synch.c:229
> #4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, wmesg=<value
> optimized out>) at /usr/src/sys/vm/vm_page.c:753
> #5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
> msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
> #6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value
> optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
> /usr/src/sys/vm/vm_fault.c:495
> #7  0xffffffff80885448 in vm_fault (map=0xfffff80011d66000, vaddr=<value
> optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
> /usr/src/sys/vm/vm_fault.c:273
> #8  0xffffffff808d3c49 in trap_pfault (frame=0xfffffe0101836c00, usermode=1) at
> /usr/src/sys/amd64/amd64/trap.c:741
> #9  0xffffffff808d3386 in trap (frame=0xfffffe0101836c00) at
> /usr/src/sys/amd64/amd64/trap.c:333
> #10 0xffffffff808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

This thread belongs to another program from the news system:
668 Thread 101245 (PID=49124: innfeed)  sched_switch 
(td=0xfffff800b642aa00, newtd=0xfffff8000285f000, flags=<value optimized 
out>) at /usr/src/sys/kern/sched_ule.c:1973

>
> I strongly suspect that this is thread that we were looking for.
> I think that it has the vnode lock in the shared mode while trying to fault in a
> page.
>
> Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
> It'd be interesting to understand why this thread is waiting here.
> So, please also print '*fs.m' and '*fs.object'.

No luck :-(
(kgdb) fr 6
#6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, 
vaddr=<value optimized out>, fault_type=4 '\004',
     fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:495
495						vm_page_sleep_if_busy(fs.m, "vmpfw");
(kgdb) print fs
Cannot access memory at address 0xffff00001fa0
(kgdb)
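
One observation, hedged: the address kgdb complains about does not look like a
valid amd64 address at all. Assuming 48-bit virtual addresses (4-level paging),
bits 63..47 of a usable address must all be equal, and 0xffff00001fa0 fails
that check. So my guess is that kgdb computed the location of 'fs' wrongly
because of the optimization, rather than the memory being absent from the dump.
The function name below is hypothetical, just a quick way to check addresses:

```python
def is_canonical_amd64(addr: int) -> bool:
    """Return True if bits 63..47 of a 64-bit address are all equal.

    Assumes 48-bit virtual addresses (4-level paging).
    """
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47  # the upper 17 bits
    return top == 0 or top == 0x1FFFF


# Address from the "Cannot access memory" message.
print(is_canonical_amd64(0xFFFF00001FA0))
# For comparison: the vm_page pointer 'm' from frames 4 and 5.
print(is_canonical_amd64(0xFFFFF800DF68CD40))
```

The page pointer m=0xfffff800df68cd40 from frame 5 passes the same check, so it
might still be possible to inspect that page directly instead of going through
'fs'.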

Henri


More information about the freebsd-stable mailing list