Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)
Robert Watson
rwatson at FreeBSD.org
Mon Mar 8 20:33:33 UTC 2010
On Mon, 8 Mar 2010, Doug Hardie wrote:
> I run a number of 4 core systems with em interfaces. These are production
> systems that are unmanned and located a long way from me. Under unusual
> conditions it can take up to 6 hours to get there. I have been waiting to
> switch to 8.0 because of the discussions on the em device and now it sounds
> like I had better just skip 8.x and wait for 9. 7.2 is working just fine.
Not sure that any information in this survey thread should be relevant to that
decision. This race has existed since before FreeBSD, having appeared in the
original BSD network stack, and is just as present in FreeBSD 7.x as 8.x or
9.x. When I learned about the race during the early 7.x development cycle, I
added a counter/statistic to measure how often it happened in practice. I was
not able to trigger it in my own testing, so I left the counter in 7.0 and
later so that we could perform this survey as core counts and similar factors
increase.
The two likely outcomes were "it is never exercised" and "it is exercised but
only very infrequently", neither really justifying the quite complex change to
correct it given requirements at the time. On-going development work on the
virtual network stack is what justifies correcting the bug at this point,
moving from detecting and handling the race to preventing it from occurring as
an invariant. The motivation here, BTW, is that we'd like to eliminate the
type-stable storage requirement for connection state (which ensures that
memory once used for a connection block is only ever used for connection
blocks in the future), allowing memory to be fully freed when a virtual
network stack is destroyed. Using type-stable storage helped address this
bug, but was primarily present to reduce the overhead of monitoring using
netstat(1). We'll now need to use a slightly more expensive solution (true
reference counts) in that context, although in practice it will almost
certainly be an unmeasurable cost.
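For readers less familiar with the distinction, here is a minimal userspace sketch of the "true reference counts" approach, assuming C11 atomics. It is not FreeBSD kernel code: struct conn, conn_alloc(), conn_ref() and conn_rele() are hypothetical names chosen for illustration. Each holder of a pointer (for example a monitoring tool walking the connection list) takes a reference, and the memory is returned to the general allocator only when the last reference is dropped.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection block; field names are illustrative only. */
struct conn {
    atomic_uint refcount;   /* number of outstanding references */
    int state;              /* stand-in for real TCP connection state */
};

/* Allocate a connection that starts with one reference, owned by the creator. */
struct conn *conn_alloc(void)
{
    struct conn *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    atomic_init(&c->refcount, 1);
    c->state = 0;
    return c;
}

/* Take an extra reference, e.g. while a monitoring tool inspects the block. */
void conn_ref(struct conn *c)
{
    atomic_fetch_add_explicit(&c->refcount, 1, memory_order_relaxed);
}

/* Drop a reference; returns true if this call released the memory. */
bool conn_rele(struct conn *c)
{
    if (atomic_fetch_sub_explicit(&c->refcount, 1, memory_order_acq_rel) == 1) {
        /* Last reference gone: memory goes back to the general allocator,
         * not to a connection-only (type-stable) pool. */
        free(c);
        return true;
    }
    return false;
}
```

With type-stable storage, by contrast, freed blocks go back to a pool reserved for connection blocks, so a stale pointer still points at something connection-shaped that can be revalidated; reference counting removes that requirement at the cost of an atomic update per reference taken or dropped.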
Which is to say that while there might be something in the em/altq/... thread
to reasonably lead you to avoid 8.0, nothing in the TCP timer race thread
should do so, since it affects 7.2 just as much as 8.0. Even if you do see a
non-zero counter, that's not a matter for operational concern; it's just useful,
from a network stack developer's perspective, for understanding timing and
behaviors in the stack. :-)
Robert
More information about the freebsd-stable mailing list