svn commit: r205391 - head/sys/netinet
Robert Watson
rwatson at FreeBSD.org
Sun Mar 21 10:15:59 UTC 2010
On Sat, 20 Mar 2010, Kip Macy wrote:
> - spread tcp timer callout load evenly across cpus if net.inet.tcp.per_cpu_timers is set to 1
> - don't default to acquiring tcbinfo lock exclusively in rexmt
>
> MFC after: 7 days
In the future, it would be helpful if you could make independent changes such
as these as separate commits. It makes post-commit review easier, but also
means that elements of the change can be more easily backed out or merged
separately.
My experience with TCP timer and locking changes is that 7 days is a very
short merge time. I would generally suggest a minimum of a month -- while
bugs crop up more slowly in HEAD testing, the bug reports are better and
you're more likely to get people with expertise looking at them there (without
the risk of taking out large numbers of systems :-). This is especially true
currently, when several folks actively have their hands in TCP.
For more complex changes, I've generally been going with a three month MFC
timeout.
> +static int per_cpu_timers = 0;
> +SYSCTL_INT(_net_inet_tcp, OID_AUTO, per_cpu_timers, CTLFLAG_RW,
> + &per_cpu_timers , 0, "run tcp timers on all cpus");
> +
> +#define INP_CPU(inp) (per_cpu_timers ? (!CPU_ABSENT(((inp)->inp_flowid % (mp_maxid+1))) ? \
> + ((inp)->inp_flowid % (mp_maxid+1)) : curcpu) : 0)
The 'curcpu' case here violates an invariant we have been trying hard to
maintain: that callouts for a single inpcb/tcpcb execute on only one CPU at a
time. While I don't have any specific bugs in mind (well, perhaps other than
the well-known TCP timer race that turns out does occur with moderate
frequency), it is fairly certain that increasing single connection parallelism
in callouts would significantly increase the chances of hitting those
bugs/features. Long and hard experience suggests that changing assumptions in
the TCP timer code can have subtle but ultimately catastrophic consequences
(both Andre and I have run into this in the past decade).
Since the 'curcpu' case above is somewhat unlikely in the hardware you're
using, can I suggest changing it to fall back to CPU 0 in that case as well?
This would maintain the parallelism you're trying to accomplish but avoid that
edge case that could have hard to to track down consequences. This would
increase my comfort level with an MFC before 8.1.
> @@ -478,11 +485,22 @@ tcp_timer_rexmt(void * xtp)
> if (++tp->t_rxtshift > TCP_MAXRXTSHIFT) {
> tp->t_rxtshift = TCP_MAXRXTSHIFT;
> TCPSTAT_INC(tcps_timeoutdrop);
> + in_pcbref(inp);
> + INP_INFO_RUNLOCK(&V_tcbinfo);
> + INP_WUNLOCK(inp);
> + INP_INFO_WLOCK(&V_tcbinfo);
> + INP_WLOCK(inp);
> + if (in_pcbrele(inp)) {
> + INP_INFO_WUNLOCK(&V_tcbinfo);
> + CURVNET_RESTORE();
> + return;
> + }
> tp = tcp_drop(tp, tp->t_softerror ?
> tp->t_softerror : ETIMEDOUT);
> + headlocked = 1;
> goto out;
> }
> - INP_INFO_WUNLOCK(&V_tcbinfo);
> + INP_INFO_RUNLOCK(&V_tcbinfo);
> headlocked = 0;
> if (tp->t_rxtshift == 1) {
> /*
Recent survey results for tcp_timer_race leave me a bit worried about changes
that open up greater potential races in the TCP timer code, and this one does
worry me. When tcp_timer_race occurs, holding tcbinfo continuously across the
timer is what helps mitigate the race. With dozens of sites reporting
significantly non-zero instance of the bug, I'm worried that this change could
allow a conversion from "silently mitigated with a counter bump" to "the
system panics or similar". I need to think a bit more about the exact nature
of the bug, but it could be that MFC'ing this part of the change before
tcp_timer_race is fixed could have unfortunate stability consequences.
(Since you're working only with long-lived connections with relatively little
turnover, you may not see this in testing -- however, you can check
net.inet.tcp.timer_race to see if production systems see it). Interestingly,
it wasn't just 8-core systems that appeared in the reports, there were also
uniprocessor systems.
On an unrelated note: I think it would be useful to rename 'headlocked' in
this function to 'headwlocked', since it no longer tracks whether tcbinfo is
locked at all, it just tracks whether it is write-locked.
Robert N M Watson
Computer Laboratory
University of Cambridge
More information about the svn-src-head
mailing list