Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)
Rick Macklem
rmacklem at uoguelph.ca
Thu Nov 5 16:29:00 UTC 2009
Rick Macklem wrote:
> I can now reproduce what I think others were seeing as slow reconnects
> when using NFSv3 over TCP against a server that disconnects inactive
> TCP connections. I have had some luck figuring out what is going on
> and can reproduce it fairly easily, but I really need help from someone
> who understands the FreeBSD TCP stack.
>
Ok, I haven't made much progress on this, but here's what little I
currently know about it.
The problem occurs after a server (in my case a Solaris 10 server) has
dropped an inactive TCP connection for an NFS over TCP mount. When the
client makes a new connection it, for some reason, sends a RST at almost
exactly the same time as the first RPC request on the new TCP connection,
causing the server to shut the new connection down.
Things I now know don't affect this:
- Doing the soshutdown() and soclose() on the old connection. I commented
  them out and it still happened.
- Avoiding the sobind() on the new connection, done before the
  soconnect().
- Using a non-reserved port#.
(The above tests shot down pretty well all the "theories" I could come up
with.)
The only thing I've found that avoids the problem:
- Putting a 2 sec delay right before the soconnect() call. (A 1 sec delay
  made it hard to reproduce, and I have not yet reproduced it with a
  2 sec delay.)
Not much of a fix, though.
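For anyone who wants to try the same "wait, then connect" experiment from
a userland test program rather than in the kernel RPC code, here is a
minimal sketch. It is only an analogue: it uses sleep(3) and connect(2)
instead of the kernel's soconnect(), and connect_after_delay() is a
made-up name for illustration, not something in the tree.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not FreeBSD code): wait delay_sec seconds, then
 * call connect(2). Returns whatever connect(2) returns, i.e. 0 on
 * success or -1 with errno set.
 */
static int
connect_after_delay(int fd, const struct sockaddr *sa, socklen_t salen,
    unsigned int delay_sec)
{
    unsigned int left = delay_sec;

    /* sleep(3) returns the unslept time if a signal interrupts it. */
    while (left > 0)
        left = sleep(left);
    return (connect(fd, sa, salen));
}
```

Calling it with delay_sec set to 2 would mirror the delay that hid the
problem for me; 0 gives the plain connect() behaviour for comparison.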
Now, here's where someone may be able to help.
Grepping around, I found 4 places where the TCP stack calls ip_output()
(one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and
tcp_syncache.c) and I put a printf like this just before each of them:
	if (flags & TH_RST)
		printf("sent a reset\n");
(The exact code varies for each, because the TCP header flags live in a
different place at each call site, and each has a different printf
message.)
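For anyone wanting to repeat the check outside the kernel (say, against
bytes pulled out of a packet capture), the test boils down to one bit in
the TCP header. A minimal sketch, assuming the buffer starts at the TCP
header; tcp_header_has_rst() and the SKETCH_ constants are made-up names
for illustration, not kernel code, although 0x04 is the value of TH_RST
in <netinet/tcp.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for this sketch; 0x04 matches TH_RST. */
#define SKETCH_TH_RST        0x04
#define SKETCH_FLAGS_OFFSET  13   /* flags byte within the TCP header */
#define SKETCH_MIN_TCP_HDR   20   /* TCP header without options */

/*
 * Return 1 if the TCP header starting at hdr has RST set, 0 if it does
 * not, and -1 if the buffer is too short to hold a TCP header.
 */
static int
tcp_header_has_rst(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < SKETCH_MIN_TCP_HDR)
        return (-1);
    return ((hdr[SKETCH_FLAGS_OFFSET] & SKETCH_TH_RST) != 0);
}
```

Running something like this over the segments seen on the wire would at
least confirm which ones carry the RST bit, independent of the printfs.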
Now, the weird part is that when the extraneous RST is sent to the
server, I don't get any printf. (I do get a few of these, but at other
times, for what appear to be legitimate RSTs.)
I can't see anywhere else that the TCP stack would send an RST, so I'm
stuck w.r.t. figuring out what is sending them.
Does anyone know of another place where the TCP stack would make the send
happen? (Or is it queued earlier, when I see the printf message, and then
the send is "triggered" inside the IP layer when the first data is sent on
the new connection?)
rick, who is getting sick of looking at this:-)
More information about the freebsd-current
mailing list