NICs locking up, "*tcp_sc_h"
Nick Withers
nick at nickwithers.com
Sat Mar 14 18:43:23 PDT 2009
On Sat, 2009-03-14 at 18:01 +0000, Robert Watson wrote:
> On Sat, 14 Mar 2009, Nick Withers wrote:
>
> > Right, here we go!
> ...
>
> Turns out that the problem is a lock cycle triggered by the syncache calling,
> indirectly, the firewall during output, and the firewall trying to look up the
> connection for the packet. Thread one:
>
> > Tracing PID 31 tid 100030 td 0xffffff00012016e0
> > sched_switch() at sched_switch+0xdf
> > mi_switch() at mi_switch+0x18b
> > turnstile_wait() at turnstile_wait+0x1c4
> > _mtx_lock_sleep() at _mtx_lock_sleep+0x76
> > _mtx_lock_flags() at _mtx_lock_flags+0x95
> > syncache_lookup() at syncache_lookup+0xee
> > syncache_expand() at syncache_expand+0x38
> > tcp_input() at tcp_input+0x99b
> > ip_input() at ip_input+0xaf
> > ether_demux() at ether_demux+0x1b9
> > ether_input() at ether_input+0x1bb
> > fxp_intr() at fxp_intr+0x224
> > ithread_loop() at ithread_loop+0xe9
> > fork_exit() at fork_exit+0x112
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xfffffffe80174d30, rbp = 0 ---
>
> This thread holds TCP locks and is trying to acquire the syncache lock.
> Thread two:
>
> > sched_switch() at sched_switch+0xdf
> > mi_switch() at mi_switch+0x18b
> > turnstile_wait() at turnstile_wait+0x1c4
> > _rw_rlock() at _rw_rlock+0x9c
> > ipfw_chk() at ipfw_chk+0x3ac1
> > ipfw_check_out() at ipfw_check_out+0xb1
> > pfil_run_hooks() at pfil_run_hooks+0xac
> > ip_output() at ip_output+0x357
> > syncache_respond() at syncache_respond+0x2fd
> > syncache_timer() at syncache_timer+0x15a
> > softclock() at softclock+0x270
> > ithread_loop() at ithread_loop+0xe9
> > fork_exit() at fork_exit+0x112
> > fork_trampoline() at fork_trampoline+0xe
>
> This is the syncache timer holding syncache locks, calling IP output, and IPFW
> trying to acquire TCP locks.
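
For readers following the two traces, the cycle can be shown with a toy lock-order checker, loosely in the spirit of what WITNESS does. This is only a sketch under stated assumptions: the lock classes LK_TCP and LK_SYNCACHE and the helpers record_order() and replay_traces() are hypothetical names invented for illustration, not kernel interfaces, and each group of real locks is collapsed into a single class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes standing in for the real kernel locks. */
enum lock_class { LK_TCP, LK_SYNCACHE, LK_COUNT };

/* order_seen[a][b] is true once some path took b while holding a. */
static bool order_seen[LK_COUNT][LK_COUNT];

/*
 * Record "acquire `want` while holding `held`".
 * Returns true if the opposite order was recorded earlier,
 * which means the two paths can deadlock against each other.
 */
static bool
record_order(enum lock_class held, enum lock_class want)
{
	assert(held < LK_COUNT && want < LK_COUNT && held != want);
	order_seen[held][want] = true;
	return order_seen[want][held];
}

/* Replay the two paths from the traces; true if they form a cycle. */
static bool
replay_traces(void)
{
	/* Thread one: tcp_input() holds TCP locks, wants the syncache lock. */
	bool input_reversed = record_order(LK_TCP, LK_SYNCACHE);
	/* Thread two: syncache_timer() holds the syncache lock; ipfw wants TCP locks. */
	bool timer_reversed = record_order(LK_SYNCACHE, LK_TCP);
	return !input_reversed && timer_reversed;
}
```

The first path only establishes the documented order (TCP before syncache). The second path is the one that reports the reversal, which matches the description of the timer path as the offender.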
>
> Am I right in thinking that you are using uid/gid/jail firewall rules?
You are indeed.
> They
> suffer from a fundamental architectural problem in that they require reaching
> "up" to a higher level of the stack at times when it's not always a good idea
> to do so. In general we solve the problem by passing "down" the inpcb for a
> connection in the output path so that TCP doesn't have to look it up --
> however, in the case of the syncache we actually don't have the inpcb easily
> in hand (or at least, we have it, but we can't just lock it because syncache
> locks are after TCP locks in the lock order...). It transpires that what the
> firewall really wants is not the inpcb, but the credential, but those are
> interfaces we can't change right now.
Thanks for the explanation!
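
The "pass it down" idea above can be sketched roughly as follows. This is a simplified illustration, not the real ipfw or inpcb interfaces: struct cred, struct conn, fw_check_uid() and the FW_* verdicts are all hypothetical stand-ins. The point is only that a uid rule needs no lookup, and hence no TCP locks, when the caller already hands the connection (or just its credential) to the output path.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the credential and the inpcb. */
struct cred { unsigned uid; };
struct conn { struct cred cred; };

enum verdict { FW_ALLOW, FW_DENY, FW_NEED_LOOKUP };

/*
 * Evaluate a "uid" rule for an outgoing packet.
 * `conn` is whatever the upper layer passed down, or NULL if it had
 * nothing to pass (the syncache case described above).
 */
static enum verdict
fw_check_uid(const struct conn *conn, unsigned allowed_uid)
{
	if (conn == NULL) {
		/* Would have to reach "up" and take TCP locks to find the owner. */
		return FW_NEED_LOOKUP;
	}
	/* Connection was passed down: no lookup and no extra locking needed. */
	return conn->cred.uid == allowed_uid ? FW_ALLOW : FW_DENY;
}
```

In this sketch the FW_NEED_LOOKUP branch corresponds to the syncache retransmit path in the second trace, where nothing is passed down and the firewall ends up waiting on TCP locks.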
> I'll need to think a bit about a proper fix for this, but you'll find the
> problem likely goes away if you eliminate all uid/gid/jail rules from your
> firewall. You could also tweak the syncache logic not to use a retransmit
> timer, which might slightly extend the time it takes for systems to connect to
> your host in the presence of packet loss, but would eliminate this
> transmission path entirely. We'll need a real and more general fix, however,
> to commit, and I'll look and see what I can come up with.
Brilliant, thanks very much. I'll work without uid rules for the time
being, then.
Ta for your time and help on this!
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
--
Nick Withers
email: nick at nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446
More information about the freebsd-stable
mailing list