NICs locking up, "*tcp_sc_h"

Nick Withers nick at nickwithers.com
Sat Mar 14 01:51:07 PDT 2009


On Fri, 2009-03-13 at 09:49 +0000, Robert Watson wrote:
> On Fri, 13 Mar 2009, Robert Watson wrote:
> 
> > Sounds like a lock leak -- if you're running INVARIANTS, then "show allocks"
>                                                   ^^^^ should read WITNESS
> > and "show allchains" would be useful.  I've had a report of a TCP lock leak 
> > possibly in tcp_input(), but haven't managed to track it down yet -- this 
> > could well be it as well.

Right, here we go!

Here's "show alllocks"' output:
____

Process 31 (irp20: fxp0+) thread 0xffffff00012016e0 (100030)
exclusive rw tcpinp r = 0 (0xffffff000392d570) locked
@ /usr/src/sys/netinet/tcp_input.c:480
exclusive rw tcp r = 0 (0xffffffff806dcbe8) locked
@ /usr/src/sys/netinet/tcp_input.c:400
Process 17 (swi6: Giant taskq) thread 0xffffff0001173000 (100016)
exclusive sleep mutex Giant r = 0 (0xffffffff80652520) locked
@ /usr/src/sys/kern/kern_intr.c:1087
Process 12 (swi4: clock) thread 0xffffff00010c6370 (100003)
shared rw IPFW static rules r = 0 (0xffffffff806db2b8) locked
@ /usr/src/sys/netinet/ip_fw2.c:2460
shared rw PFil hoow read/write mutex r = 0 (0xffffffff806dba28) locked
@ /usr/src/sys/net/pfil.c:73
exclusive sleep mutex tcp_sc_head r = 0 (0xfffffffe8036c8f8) locked
@ /usr/src/sys/kern/kern_timeout.c:241
____

...and here's "show allchains"':
____

chain 1:
 thread 100031 (pid 32, irp22: rl0) blocked on lock 0xffffffff806dcbe8
(rw) "tcp"
 thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8
(sleep mutex) "tcp_sc_head"
 thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8
(rw) "tcp"
 thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8
(sleep mutex) "tcp_sc_head"
 thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8
(rw) "tcp"
 thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8
(sleep mutex) "tcp_sc_head"
 thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8
(rw) "tcp"
(...and so on, these last two seeming to go on forever.)
____

This is with fxp0 and rl0 lagg(4)-ed

For completeness...
  - "ps" in DDB shows that:
    - PID 32 ("[irp22: rl0]") is in state "LL" on "*tcp"
    - PID 31 ("[irq20: fxp0+]") is in state "LL" on "*tcp_sc_h"
    - PID 12 ("[swi4: clock]") is in state "LL" on "*tcp"
  - "where 32" gives:
____

sched_switch() at sched_switch+0xdf
mi_switch() at mi_switch+0x18b
turnstile_wait() at turnstile_wait+0x1c4
_rw_lock_hard() at _rw_lock_hard+0xa3
_rw_wlock() at _rw_wlock+0x54
tcp_input() at tcp_input+0x318
ip_input() at ip_input+0xaf
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
rl_rxeof() at rl_rxeof+0x1c8
rl_intr() at rl_intr+0x138
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe91e0cd30, rbp = 0 ---
____

  - "where 31" gives:
____

Tracing PID 31 tid 100030 td 0xffffff00012016e0
sched_switch() at sched_switch+0xdf
mi_switch() at mi_switch+0x18b
turnstile_wait() at turnstile_wait+0x1c4
_mtx_lock_sleep() at _mtx_lock_sleep+0x76
_mtx_lock_flags() at _mtx_lock_flags+0x95
syncache_lookup() at syncache_lookup+0xee
syncache_expand() at syncache_expand+0x38
tcp_input() at tcp_input+0x99b
ip_input() at ip_input+0xaf
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
fxp_intr() at fxp_intr+0x224
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe80174d30, rbp = 0 ---
____

  - "where 12" gives:
____

sched_switch() at sched_switch+0xdf
mi_switch() at mi_switch+0x18b
turnstile_wait() at turnstile_wait+0x1c4
_rw_rlock() at _rw_rlock+0x9c
ipfw_chk() at ipfw_chk+0x3ac1
ipfw_check_out() at ipfw_check_out+0xb1
pfil_run_hooks() at pfil_run_hooks+0xac
ip_output() at ip_output+0x357
syncache_respond() at syncache_respond+0x2fd
syncache_timer() at syncache_timer+0x15a
softclock() at softclock+0x270
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe80017d30, rbp = 0 ---
____

  - Before having entered the debugger, the following were logged:
____

lock order reversal:
 1st 0xffffff00032947b0 tcpinp (tcpinp)
@ /usr/src/sys/netinet/tcp_timer.c:169
 2nd 0xffffffff806dba28 PFil hook read/write mutex (PFil hook read/write
mutex) @ /usr/src/sys/net/pfil.c:73
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x565
_rw_rlock() at _rw_rlock+0x25
pfil_run_hooks() at pfil_run_hooks+0x44
ip_output() at ip_output+0x357
tcp_output() at tcp_output+0xa1d
tcp_timer_delack() at tcp_timer_delack+0xc7
softclock() at softclock+0x270
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe80017d30, rbp = 0 ---
____

...and:
____

lock order reversal:
 1st 0xfffffffe80365df0 tcp_sc_head (tcp_sc_head)
@ /usr/src/sys/kern/kern_timeout.c:241
 2nd 0xffffffff806dba28 PFil hook read/write mutex (PFil hook read/write
mutex) @ /usr/src/sys/net/pfil.c:73
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x565
_rw_rlock() at _rw_rlock+0x25
pfil_run_hooks() at pfil_run_hooks+0x44
ip_output() at ip_output+0x357
syncache_respond() at syncache_respond+0x2fd
syncache_timer() at syncache_timer+0x15a
softclock() at softclock+0x270
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe80017d30, rbp = 0 ---
____

On reboot:
____

lock order reversal:
 1st 0xffffffff806db2b8 IPFW static rules (IPFW static rules)
@ /usr/src/sys/netinet/ip_fw2.c:2460
 2nd 0xffffffff806dcbe8 tcp (tcp) @ /usr/src/sys/netinet/ip_fw2.c:2009
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
witness_checkorder() at witness_checkorder+0x565
_rw_rlock() at _rw_rlock+0x25
ipfw_chk() at ipfw_chk+0x3ac1
ipfw_check_out() at ipfw_check_out+0xb1
pfil_run_hooks() at pfil_run_hooks+0xac
ip_output() at ip_output+0x357
tcp_respond() at tcp_respond+0x33a
tcp_dropwithreset() at tcp_dropwithreset+0x131
tcp_do_segment() at tcp_do_segment+0xd93
tcp_input() at tcp_input+0x8dd
ip_input() at ip_input+0xaf
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
fxp_intr() at fxp_intr+0x224
ithread_loop() at ithread_loop+0xe9
fork_exit() at fork_exit+0x112
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe80174d30, rbp = 0 ---
____

(Haven't actually checked whether any of these are known innocuous).

Anything else that might be useful?

> Robert N M Watson
> Computer Laboratory
> University of Cambridge
-- 
Nick Withers
email: nick at nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: This is a digitally signed message part
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20090314/19098c4f/attachment.pgp


More information about the freebsd-stable mailing list