kern/188063: deadlock between syncache(4) and pf(4)

Mathieu sigsys at gmail.com
Sat Mar 29 23:10:00 UTC 2014


>Number:         188063
>Category:       kern
>Synopsis:       deadlock between syncache(4) and pf(4)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Mar 29 23:10:00 UTC 2014
>Closed-Date:
>Last-Modified:
>Originator:     Mathieu
>Release:        9.2-RELEASE-p3
>Organization:
>Environment:
FreeBSD 9.2-RELEASE-p3 amd64
>Description:
We have a server that becomes unresponsive every few weeks.  When it happens, the NICs seem dead and user processes hang in the "tcp" state.  The only way to recover is to reboot.  This time, I got it to dump core before rebooting.

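IIUC, this is a deadlock on an inpcb lock and a syncache_head lock between the "swi1: netisr 0" and "swi4: clock" threads.  In the kgdb session below, thread 100011 (netisr) is blocked in syncache_lookup() on a mutex owned by thread 100013 (clock), while thread 100013 is blocked in in_pcblookup_hash(), called from pf, on an rwlock owned by thread 100011.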

I have no idea where to go from here...
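
The suspected problem is a lock-order reversal: each thread takes the same two locks in the opposite order. The following is only a minimal sketch of that check, not anything taken from the kernel. The lock labels "inpcb" and "syncache_head", and the helper names order_pairs() and find_reversals(), are made up for illustration; the acquisition orders are my reading of the two backtraces below.

```python
# Lock acquisition order per thread, as read from the two backtraces.
# The lock names are labels chosen for this sketch, not kernel symbols.
acquired = {
    "swi1: netisr 0": ["inpcb", "syncache_head"],  # tcp_input -> syncache_chkrst
    "swi4: clock": ["syncache_head", "inpcb"],     # syncache_timer -> pf_socket_lookup
}


def order_pairs(locks):
    """Return every (earlier, later) pair implied by one acquisition sequence."""
    return {(a, b) for i, a in enumerate(locks) for b in locks[i + 1:]}


def find_reversals(acquired):
    """Return lock pairs that two threads take in opposite orders."""
    seen = {}        # (first, second) -> thread that established this order
    reversals = []
    for thread, locks in acquired.items():
        for a, b in sorted(order_pairs(locks)):
            if (b, a) in seen:
                reversals.append((seen[(b, a)], thread, (b, a)))
            seen.setdefault((a, b), thread)
    return reversals


reversals = find_reversals(acquired)
for first, second, (a, b) in reversals:
    print(f"{first} takes {a} then {b}; {second} takes them in the opposite order")
```

If this reading of the backtraces is right, the sketch reports one reversal between the two threads.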


(kgdb) tid 100011
[Switching to thread 41 (Thread 100011)]#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088, tid=18446741874755597456, opts=<value optimized out>,
    file=<value optimized out>, line=<value optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:466
466                     turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE);
(kgdb) bt
#0  sched_switch (td=0xfffffe0004217490, newtd=0xfffffe0004209490,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff8090d4f4 in mi_switch (flags=259, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8094f446 in turnstile_wait (ts=<value optimized out>,
    owner=0xfffffe0004216920, queue=<value optimized out>)
    at /usr/src/sys/kern/subr_turnstile.c:753
#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088,
    tid=18446741874755597456, opts=<value optimized out>,
    file=<value optimized out>, line=<value optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:466
#4  0xffffffff80ab3c97 in syncache_lookup (inc=0xffffff80002a2910,
    schp=<value optimized out>) at /usr/src/sys/netinet/tcp_syncache.c:500
#5  0xffffffff80ab424c in syncache_chkrst (inc=0xffffff80002a2910,
    th=0xfffffe005157ab7c) at /usr/src/sys/netinet/tcp_syncache.c:528
#6  0xffffffff80aabc33 in tcp_input (m=0xfffffe005157ab00,
    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1184
#7  0xffffffff80a3c5aa in ip_input (m=0xfffffe005157ab00)
    at /usr/src/sys/netinet/ip_input.c:760
#8  0xffffffff809db591 in swi_net (arg=<value optimized out>)
    at /usr/src/sys/net/netisr.c:806
#9  0xffffffff808d451d in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffffe0004221c00)
    at /usr/src/sys/kern/kern_intr.c:1272
#10 0xffffffff808d5d0d in ithread_loop (arg=0xfffffe00042036c0)
    at /usr/src/sys/kern/kern_intr.c:1285
#11 0xffffffff808d099f in fork_exit (
    callout=0xffffffff808d5c70 <ithread_loop>, arg=0xfffffe00042036c0,
    frame=0xffffff80002a2b00) at /usr/src/sys/kern/kern_fork.c:992
#12 0xffffffff80ce603e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#13 0x0000000000000000 in ?? ()
(kgdb) frame 3
#3  0xffffffff808ef68e in _mtx_lock_sleep (m=0xffffff80010a7088,
    tid=18446741874755597456, opts=<value optimized out>,
    file=<value optimized out>, line=<value optimized out>)
    at /usr/src/sys/kern/kern_mutex.c:466
466                     turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE);
(kgdb) p ((struct thread *)(m->mtx_lock&~15))->td_tid
$1 = 100013
(kgdb) tid 100013
[Switching to thread 43 (Thread 100013)]#0  sched_switch (
    td=0xfffffe0004216920, newtd=0xfffffe0004209920,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1920
1920                    cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffffe0004216920, newtd=0xfffffe0004209920,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1920
#1  0xffffffff8090d4f4 in mi_switch (flags=259, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff8094f446 in turnstile_wait (ts=<value optimized out>,
    owner=0xfffffe0004216920, queue=<value optimized out>)
    at /usr/src/sys/kern/subr_turnstile.c:753
#3  0xffffffff809014b2 in _rw_rlock (rw=0xfffffe0051850a98,
    file=<value optimized out>, line=0) at /usr/src/sys/kern/kern_rwlock.c:477
#4  0xffffffff80a35771 in in_pcblookup_hash (pcbinfo=0xffffffff81434020,
    faddr=<value optimized out>, fport=19210, laddr={s_addr = 1827520685},
    lport=<value optimized out>, lookupflags=2, ifp=0x0)
    at /usr/src/sys/netinet/in_pcb.c:1805
#5  0xffffffff81a1da99 in pf_socket_lookup () from /boot/kernel/pf.ko
#6  0xffffffff81a248a5 in pf_test_rule () from /boot/kernel/pf.ko
#7  0xffffffff81a2834c in pf_test () from /boot/kernel/pf.ko
#8  0xffffffff81a2f961 in pf_check_out () from /boot/kernel/pf.ko
#9  0xffffffff809dbbee in pfil_run_hooks (ph=<value optimized out>,
    mp=0xffffff80002ac7f8, ifp=0x6e00, dir=115288696, inp=0x4b0a)
    at /usr/src/sys/net/pfil.c:82
#10 0xffffffff80a3ecb9 in ip_output (m=0xfffffe0006df2a00,
    opt=<value optimized out>, ro=0xffffff80002ac810, flags=0, imo=0x0,
    inp=0x0) at /usr/src/sys/netinet/ip_output.c:504
#11 0xffffffff80ab398f in syncache_respond (sc=0xfffffe0173157000)
    at /usr/src/sys/netinet/tcp_syncache.c:1525
#12 0xffffffff80ab3afa in syncache_timer (xsch=<value optimized out>)
    at /usr/src/sys/netinet/tcp_syncache.c:460
#13 0xffffffff80919ee8 in softclock (arg=<value optimized out>)
    at /usr/src/sys/kern/kern_timeout.c:520
#14 0xffffffff808d451d in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffffe0004221800)
    at /usr/src/sys/kern/kern_intr.c:1272
#15 0xffffffff808d5d0d in ithread_loop (arg=0xfffffe0004203680)
    at /usr/src/sys/kern/kern_intr.c:1285
#16 0xffffffff808d099f in fork_exit (
    callout=0xffffffff808d5c70 <ithread_loop>, arg=0xfffffe0004203680,
    frame=0xffffff80002acb00) at /usr/src/sys/kern/kern_fork.c:992
#17 0xffffffff80ce603e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:606
#18 0x0000000000000000 in ?? ()
(kgdb) frame 3
#3  0xffffffff809014b2 in _rw_rlock (rw=0xfffffe0051850a98,
    file=<value optimized out>, line=0) at /usr/src/sys/kern/kern_rwlock.c:477
477                     turnstile_wait(ts, rw_owner(rw), TS_SHARED_QUEUE);
(kgdb) p ((struct thread *)(rw->rw_lock&~15))->td_tid
$2 = 100011
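
The two "p ...->td_tid" results above close a loop: each thread waits on a lock owned by the other. As a rough sketch of how that chain can be followed mechanically, the snippet below copies the lock addresses and thread ids from this session into two tables and walks them. The function name find_cycle() and the table layout are assumptions made for illustration; the lock type comments reflect my interpretation of the backtraces.

```python
# Wait-for relationships copied from the kgdb session above.
# waits_on: thread id -> address of the lock it is blocked on
# owner:    lock address -> thread id that kgdb reported as the owner
waits_on = {
    100011: 0xFFFFFF80010A7088,  # mutex, blocked in syncache_lookup()
    100013: 0xFFFFFE0051850A98,  # rwlock, blocked in in_pcblookup_hash()
}
owner = {
    0xFFFFFF80010A7088: 100013,  # $1
    0xFFFFFE0051850A98: 100011,  # $2
}


def find_cycle(start, waits_on, owner):
    """Follow thread -> lock -> owner links; return the cycle of tids, or None."""
    path = []
    tid = start
    while tid is not None and tid not in path:
        path.append(tid)
        lock = waits_on.get(tid)
        tid = owner.get(lock) if lock is not None else None
    if tid is None:
        return None  # chain ended at a thread that is not blocked
    return path[path.index(tid):] + [tid]


cycle = find_cycle(100011, waits_on, owner)
print(cycle)
```

A non-None result means the chain of owners returns to a thread already visited, so none of the threads in the cycle can make progress.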

>How-To-Repeat:

>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

