FreeBSD 11.0-RELEASE - ZFS deadlock

Andriy Gapon avg at FreeBSD.org
Mon Nov 14 09:08:48 UTC 2016


On 13/11/2016 15:28, Henri Hennebert wrote:
> On 11/13/2016 11:06, Andriy Gapon wrote:
>> On 12/11/2016 14:40, Henri Hennebert wrote:
>>> I attach it
>>
>> Thank you!
>> So, these two threads are trying to get the lock in the exclusive mode:
>> Thread 687 (Thread 101243):
>> #0  sched_switch (td=0xfffff800b642b500, newtd=0xfffff8000285ea00, flags=<value
>> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0xffffffff8052f854 in sleeplk (lk=<value optimized out>, flags=<value
>> optimized out>, ilk=<value optimized out>, wmesg=0xffffffff813be535 "zfs",
>> pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0xffffffff8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value
>> optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
>> pri=<value optimized out>, timo=<value optimized out>, file=<value optimized
>> out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0xffffffff80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
>> #6  0xffffffff8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value
>> optimized out>) at vnode_if.c:2087
>> #7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
>> file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>> #8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
>> td=0xfffff800b642b500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0xffffffff806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value
>> optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
>> ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0xffffffff806133dc in vfs_cache_lookup (ap=<value optimized out>) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
>> optimized out>) at vnode_if.c:127
>> #12 0xffffffff8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
>> #13 0xffffffff8061c492 in namei (ndp=<value optimized out>) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0xffffffff80509395 in kern_execve (td=<value optimized out>, args=<value
>> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500,
>> uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0xffffffff808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
>> subr_syscall.c:135
>> #17 0xffffffff808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
>>
>> Thread 681 (Thread 101147):
>> #0  sched_switch (td=0xfffff80065f4e500, newtd=0xfffff8000285f000, flags=<value
>> optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
>> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
>> /usr/src/sys/kern/kern_synch.c:455
>> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
>> /usr/src/sys/kern/subr_sleepqueue.c:646
>> #3  0xffffffff8052f854 in sleeplk (lk=<value optimized out>, flags=<value
>> optimized out>, ilk=<value optimized out>, wmesg=0xffffffff813be535 "zfs",
>> pri=<value optimized out>, timo=51) at /usr/src/sys/kern/kern_lock.c:222
>> #4  0xffffffff8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value
>> optimized out>, ilk=<value optimized out>, wmesg=<value optimized out>,
>> pri=<value optimized out>, timo=<value optimized out>, file=<value optimized
>> out>, line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
>> #5  0xffffffff80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
>> #6  0xffffffff8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value
>> optimized out>) at vnode_if.c:2087
>> #7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=548864,
>> file=<value optimized out>, line=<value optimized out>) at vnode_if.h:859
>> #8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=548864,
>> td=0xfffff80065f4e500) at /usr/src/sys/kern/vfs_subr.c:2523
>> #9  0xffffffff806118b9 in cache_lookup (dvp=<value optimized out>, vpp=<value
>> optimized out>, cnp=<value optimized out>, tsp=<value optimized out>,
>> ticksp=<value optimized out>) at /usr/src/sys/kern/vfs_cache.c:686
>> #10 0xffffffff806133dc in vfs_cache_lookup (ap=<value optimized out>) at
>> /usr/src/sys/kern/vfs_cache.c:1081
>> #11 0xffffffff80935777 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value
>> optimized out>) at vnode_if.c:127
>> #12 0xffffffff8061cdf1 in lookup (ndp=<value optimized out>) at vnode_if.h:54
>> #13 0xffffffff8061c492 in namei (ndp=<value optimized out>) at
>> /usr/src/sys/kern/vfs_lookup.c:306
>> #14 0xffffffff80509395 in kern_execve (td=<value optimized out>, args=<value
>> optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:443
>> #15 0xffffffff80508ccc in sys_execve (td=0xfffff80065f4e500,
>> uap=0xfffffe01016b8b80) at /usr/src/sys/kern/kern_exec.c:218
>> #16 0xffffffff808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
>> subr_syscall.c:135
>> #17 0xffffffff808b7ddb in Xfast_syscall () at
>> /usr/src/sys/amd64/amd64/exception.S:396
> 
> These two threads are innd processes. In core.txt.4:
> 
>    8 14789 29165   0   24  4   40040   6612 zfs      DN    - 0:00.00 [innd]
>    8 29165     1   0   20  0   42496   6888 select   Ds    - 0:01.33 [innd]
>    8 49778 29165   0   24  4   40040   6900 zfs      DN    - 0:00.00 [innd]
>    8 82034 29165   0   24  4     132      0 zfs      DN    - 0:00.00 [innd]
> 
> the corresponding 'info threads' entries are:
> 
>   687 Thread 101243 (PID=49778: innd)  sched_switch (td=0xfffff800b642b500,
> newtd=0xfffff8000285ea00, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
>   681 Thread 101147 (PID=14789: innd)  sched_switch (td=0xfffff80065f4e500,
> newtd=0xfffff8000285f000, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
>   669 Thread 101250 (PID=82034: innd)  sched_switch (td=0xfffff800b6429000,
> newtd=0xfffff8000285ea00, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
>   665 Thread 101262 (PID=29165: innd)  sched_switch (td=0xfffff800b6b54a00,
> newtd=0xfffff8000285ea00, flags=<value optimized out>) at
> /usr/src/sys/kern/sched_ule.c:1973
> 
> So your missing thread must be 101250:

Oh, I missed this thread...
But it looks like another LK_EXCLUSIVE contender rather than the lock owner.

> (kgdb) tid 101250
> [Switching to thread 669 (Thread 101250)]#0  sched_switch
> (td=0xfffff800b6429000, newtd=0xfffff8000285ea00,
>     flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
> 1973            cpuid = PCPU_GET(cpuid);
> Current language:  auto; currently minimal
> (kgdb) bt
> #0  sched_switch (td=0xfffff800b6429000, newtd=0xfffff8000285ea00, flags=<value
> optimized out>)
>     at /usr/src/sys/kern/sched_ule.c:1973
> #1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:455
> #2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
> /usr/src/sys/kern/subr_sleepqueue.c:646
> #3  0xffffffff8052f854 in sleeplk (lk=<value optimized out>, flags=<value
> optimized out>, ilk=<value optimized out>,
>     wmesg=0xffffffff813be535 "zfs", pri=<value optimized out>, timo=51) at
> /usr/src/sys/kern/kern_lock.c:222
> #4  0xffffffff8052f39d in __lockmgr_args (lk=<value optimized out>, flags=<value
> optimized out>, ilk=<value optimized out>,
>     wmesg=<value optimized out>, pri=<value optimized out>, timo=<value
> optimized out>, file=<value optimized out>,
>     line=<value optimized out>) at /usr/src/sys/kern/kern_lock.c:958
> #5  0xffffffff80616a8c in vop_stdlock (ap=<value optimized out>) at lockmgr.h:98
> #6  0xffffffff8093784d in VOP_LOCK1_APV (vop=<value optimized out>, a=<value
> optimized out>) at vnode_if.c:2087
> #7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>, flags=525312,
> file=<value optimized out>, line=<value optimized out>)
>     at vnode_if.h:859
> #8  0xffffffff804e16cf in exec_elf64_imgact (imgp=<value optimized out>) at
> /usr/src/sys/kern/imgact_elf.c:899
> #9  0xffffffff8050983d in kern_execve (td=<value optimized out>, args=<value
> optimized out>, mac_p=0x0)
>     at /usr/src/sys/kern/kern_exec.c:602
> #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000,
> uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
> #11 0xffffffff808d449e in amd64_syscall (td=<value optimized out>, traced=0) at
> subr_syscall.c:135
> #12 0xffffffff808b7ddb in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:396
> #13 0x000000080217edaa in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
> 
>>
>> And the original stuck thread wants to get the lock in the shared mode.
>> And there should be another thread that already holds the lock in the shared
>> mode.  But I am not able to identify it.  I wonder if the original thread could
>> be trying to get the lock recursively...
>>
>> It would be interesting to get more details from thread 101112.
>> You can switch to it using tid command, you can use 'fr' to select frames, 'info
>> local' and 'info args' to see what variables are available (not optimized out)
>> and then you can print any that look interesting.  It would be nice to get a file
>> path and a directory vnode where the lookup is called.
>>
>> Thank you.
>>
> I found nothing interesting:
> 
> (kgdb) fr 15
[snip]

Could you please show 'info local' in frame 14?
I expect that the 'nd' variable is defined there, and it may contain some
useful information.

> I also tried to get information from the execve of the other threads:
> 
> for tid 101250:
> (kgdb) fr 10
> #10 0xffffffff80508ccc in sys_execve (td=0xfffff800b6429000,
> uap=0xfffffe010184fb80) at /usr/src/sys/kern/kern_exec.c:218
> 218            error = kern_execve(td, &args, NULL);
> (kgdb) print *uap
> $4 = {fname_l_ = 0xfffffe010184fb80 "`\220\217\002\b", fname = 0x8028f9060
> <Address 0x8028f9060 out of bounds>,
>   fname_r_ = 0xfffffe010184fb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010184fb88
> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>   argv_r_ = 0xfffffe010184fb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010184fb90
> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>   envv_r_ = 0xfffffe010184fb98 ""}
> (kgdb)
> 
> for tid 101243:
> 
> (kgdb) f 15
> #15 0xffffffff80508ccc in sys_execve (td=0xfffff800b642b500,
> uap=0xfffffe010182cb80) at /usr/src/sys/kern/kern_exec.c:218
> 218            error = kern_execve(td, &args, NULL);
> (kgdb) print *uap
> $5 = {fname_l_ = 0xfffffe010182cb80 "ÀÏ\205\002\b", fname = 0x80285cfc0 <Address
> 0x80285cfc0 out of bounds>,
>   fname_r_ = 0xfffffe010182cb88 "`¶ÿÿÿ\177", argv_l_ = 0xfffffe010182cb88
> "`¶ÿÿÿ\177", argv = 0x7fffffffb660,
>   argv_r_ = 0xfffffe010182cb90 "\bÜÿÿÿ\177", envv_l_ = 0xfffffe010182cb90
> "\bÜÿÿÿ\177", envv = 0x7fffffffdc08,
>   envv_r_ = 0xfffffe010182cb98 ""}
> (kgdb)

I think that you see garbage in those structures because they contain pointers
to userland data.

Hmm, I've just noticed another interesting thread:
Thread 668 (Thread 101245):
#0  sched_switch (td=0xfffff800b642aa00, newtd=0xfffff8000285f000, flags=<value
optimized out>) at /usr/src/sys/kern/sched_ule.c:1973
#1  0xffffffff80561ae2 in mi_switch (flags=<value optimized out>, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:455
#2  0xffffffff805ae8da in sleepq_wait (wchan=0x0, pri=0) at
/usr/src/sys/kern/subr_sleepqueue.c:646
#3  0xffffffff805614b1 in _sleep (ident=<value optimized out>, lock=<value
optimized out>, priority=<value optimized out>, wmesg=0xffffffff809c51bc
"vmpfw", sbt=0, pr=<value optimized out>, flags=<value optimized out>) at
/usr/src/sys/kern/kern_synch.c:229
#4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40, wmesg=<value
optimized out>) at /usr/src/sys/vm/vm_page.c:753
#5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
#6  0xffffffff80886be9 in vm_fault_hold (map=<value optimized out>, vaddr=<value
optimized out>, fault_type=4 '\004', fault_flags=0, m_hold=0x0) at
/usr/src/sys/vm/vm_fault.c:495
#7  0xffffffff80885448 in vm_fault (map=0xfffff80011d66000, vaddr=<value
optimized out>, fault_type=4 '\004', fault_flags=<value optimized out>) at
/usr/src/sys/vm/vm_fault.c:273
#8  0xffffffff808d3c49 in trap_pfault (frame=0xfffffe0101836c00, usermode=1) at
/usr/src/sys/amd64/amd64/trap.c:741
#9  0xffffffff808d3386 in trap (frame=0xfffffe0101836c00) at
/usr/src/sys/amd64/amd64/trap.c:333
#10 0xffffffff808b7af1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236

I strongly suspect that this is the thread that we were looking for.
I think that it has the vnode lock in the shared mode while trying to fault in a
page.

Could you please check that by going to frame 6 and printing 'fs' and '*fs.vp'?
It'd be interesting to understand why this thread is waiting here.
So, please also print '*fs.m' and '*fs.object'.

Thanks!

-- 
Andriy Gapon


More information about the freebsd-stable mailing list