FreeBSD 7.0: sockets stuck in CLOSED state...

Robert Watson rwatson at FreeBSD.org
Wed Jun 25 19:35:32 UTC 2008


On Wed, 25 Jun 2008, Ali Niknam wrote:

> Recently i've been upgrading some of my machines from FreeBSD 6.x amd64 to 
> FreeBSD 7.0 amd64.
>
> After upgrading I noticed a weird error/bug. It seems that after several 
> thousand TCP connections some seem to hang in 'CLOSED' state.

Sounds like there's a bug somewhere.  Before we start trying to track it down, 
I'll tell you a little more about how this works so that we can interpret the 
output you're seeing.

In FreeBSD, as with all UNIX/Berkeley sockets systems, each socket is actually 
represented by a set of data structures representing different layers of 
abstraction.  At the top level is struct file, representing a file descriptor. 
Next down is struct socket, representing a socket.  Then the protocol code has 
struct inpcb, representing a generic IP connection, and struct tcpcb (or 
struct tcptw once we enter TIMEWAIT), representing a TCP connection. 
Confusingly, these data structures don't always exist all at once.  For 
example, if you close the file descriptor, freeing struct file, the socket and 
protocol state may persist for some time until the TCP connection closes (all 
data has been sent, or various other close modes).
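
Purely as an illustration (this is my own sketch, not anything from the tree 
or from your application), you can watch this from userland: the program below 
connects to itself over loopback and then close()s the client's file 
descriptor.  The struct file is gone at that point, but "netstat -na" run 
while the program sleeps should still show the client side of the connection 
(typically in FIN_WAIT_2, since the other end never closes), illustrating 
protocol state outliving the descriptor.

/*
 * Sketch: protocol state outlives the file descriptor.  Error handling
 * omitted for brevity.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof(sin);
        int lsock, csock, asock;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sin.sin_port = 0;                       /* kernel picks a port */

        lsock = socket(PF_INET, SOCK_STREAM, 0);
        bind(lsock, (struct sockaddr *)&sin, sizeof(sin));
        listen(lsock, 1);
        getsockname(lsock, (struct sockaddr *)&sin, &len);

        csock = socket(PF_INET, SOCK_STREAM, 0);
        connect(csock, (struct sockaddr *)&sin, sizeof(sin));
        asock = accept(lsock, NULL, NULL);

        close(csock);                           /* struct file goes away... */
        printf("client fd closed; port %u; run netstat -na now\n",
            ntohs(sin.sin_port));
        sleep(60);                              /* ...TCP state lingers on */
        close(asock);
        close(lsock);
        return (0);
}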

One important difference between FreeBSD 6.x and FreeBSD 7.x is that, in 
FreeBSD 7.x, we've reduced the degree to which these data structures exist in 
isolation.  If you look at the mailing list threads discussing the change, 
you'll see it described as "strengthening invariants".  The most important 
part of the change was making it an invariant that so->so_pcb, the pointer 
from the socket to the protocol layer state, always remains stable and valid. 
This had a number of benefits: because the pointer is always stable, it no 
longer requires locks to follow, lowering overhead and improving parallelism. 
It also simplifies the code by removing lots of error handling, and improves 
code stability by avoiding the inevitable bugs associated with complex error 
handling.  If you look at bug reports over the years, we've had quite a few 
panics reported (and fixed) caused by the disappearance of protocol layer 
state, such as when a connection is reset while still in use by a process; 
these are now all believed to be eliminated.

So the code is faster, cleaner, and more stable.  But there are a few 
interesting side effects.  One is that we retain state at the TCP layer for 
longer than we used to.  Specifically, if a TCP connection closes, the inpcb 
remains allocated until the file descriptor is closed (i.e., the application 
notices the connection has closed and invokes close() on the file descriptor). 
This has a few impacts: one is that TCP connections now appear in netstat in 
the CLOSED state for longer than before, and another is that open sockets that 
are associated with CLOSED TCP connections now count against the global 
resource limit on the number of simultaneous TCP connections.
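
To make that concrete, here is a small sketch (again mine, not your code) that 
I would expect to reproduce the kind of netstat output you quote on 7.x: the 
"server" side accepts a connection and deliberately never close()s it, while 
the "client" side writes a few bytes and then aborts the connection with 
SO_LINGER so that the TCP side is completely torn down.  The descriptor is 
still open, so the connection should sit in CLOSED (quite possibly with bytes 
still counted in Recv-Q) until the process calls close() or exits.

/*
 * Sketch: an open file descriptor whose TCP connection has already been
 * torn down shows up as CLOSED in netstat on 7.x.  Error handling omitted.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof(sin);
        struct linger l = { 1, 0 };     /* abortive close: RST on close() */
        char buf[] = "unread bytes";
        int lsock, csock, asock;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        lsock = socket(PF_INET, SOCK_STREAM, 0);
        bind(lsock, (struct sockaddr *)&sin, sizeof(sin));
        listen(lsock, 1);
        getsockname(lsock, (struct sockaddr *)&sin, &len);

        csock = socket(PF_INET, SOCK_STREAM, 0);
        connect(csock, (struct sockaddr *)&sin, sizeof(sin));
        asock = accept(lsock, NULL, NULL);

        write(csock, buf, sizeof(buf));         /* never read via asock */
        setsockopt(csock, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
        close(csock);                           /* resets the connection */

        /* asock is never close()d: "netstat -na" should now show the
         * connection as CLOSED for as long as we hold the descriptor. */
        printf("holding fd %d open; run netstat -na (port %u)\n",
            asock, ntohs(sin.sin_port));
        sleep(120);
        return (0);
}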

I say "longer than before", but I should be clear that, in practice, assuming 
all is working properly, there's no measurable behavioral change *except* for 
improved performance, cleanliness, and stability.  This is because 
applications generally open a socket, run a protocol, and when the protocol 
wraps up, they then close() the file descriptor in order to close the 
connection.

So, with that introduction, we're interested in resolving:

(1) Is this an application bug (leaking file descriptors) that only manifests
     in 7.x due to changes in kernel state management, leading to the sockets
     being visible in netstat and counting against the resource limit?

(2) Is this a *new* bug in TCP in 7.x, perhaps a result of the state-related
     changes I've described?

(3) Is this an *old* bug in TCP that is only now manifesting because of the
     changes in kernel state management?

The first is the easiest to resolve, as all we need to do is see whether the 
number of file descriptors for the application goes upwards in an improbable 
manner.  You can use fstat, procstat, sockstat, or various other tools (such 
as lsof) to see whether the process is leaking file descriptors.  You can also 
instrument your application to keep track of the file descriptor numbers being 
returned to see whether, perhaps, that number only goes up over time, and gets 
really big.
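
If you go the instrumentation route, something as simple as the helper below 
is usually enough to settle it (hypothetical code, the names are mine): log 
the descriptor returned by accept() and, every so often, how many descriptors 
the process has open.  If either number only ever grows, you have a leak.

/*
 * Sketch: count the descriptors this process has open by probing each
 * slot with fcntl().
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
count_open_fds(void)
{
        int fd, n = 0;

        for (fd = 0; fd < getdtablesize(); fd++)
                if (fcntl(fd, F_GETFD) != -1)
                        n++;
        return (n);
}

int
main(void)
{
        /*
         * In the real server you would call this from the accept loop,
         * e.g.:  fprintf(stderr, "accept -> fd %d, %d open\n", newfd,
         *            count_open_fds());
         */
        printf("%d descriptors open\n", count_open_fds());
        return (0);
}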

If it turns out that your application *is* properly closing sockets, then we 
need to decide if perhaps we're looking at a race in close and state 
management.  In particular, I'll need the output of "netstat -na", "vmstat 
-z", and "vmstat -m" from the machine once it's in its rather wedged-up state. 
It would be most helpful if you could actually shut down to single-user mode, 
killing all user processes, then wait ten minutes, and capture the output of 
the above commands to files that you can then e-mail to me.

Without accusing you of having buggy code, I should say that I think there's a 
reasonable chance that what you're seeing is an interaction between an 
existing leak of resources in the application and the way the kernel state 
management has changed.  The output from netstat pretty precisely matches 
what you'd expect: lots of TCP connections in the CLOSED state, reflecting a 
series of connections built by an application but then not properly discarded. 
Likewise, when the application is killed, all of the connections go away -- 
most likely because the file descriptors are all closed, allowing them to be 
garbage collected and connection state freed.  If it is this sort of bug, then 
most likely you're missing a call to close() in a work loop somewhere, and in 
some exceptional case, you fall out of the loop without calling close().
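
The classic shape of that bug looks something like the first function below 
(purely hypothetical code, just to illustrate the pattern): every normal path 
through the handler calls close(), but one early error return does not, and 
under load that path eventually dominates.  The second function shows the 
usual fix, i.e. making sure every exit path releases the descriptor.

/*
 * Hypothetical per-connection handler, showing how a descriptor leak
 * typically hides in an error path.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <stdint.h>
#include <unistd.h>

void
handle_connection_leaky(int fd)
{
        uint8_t lenbuf[2];

        /* Read the two-byte DNS/TCP length prefix. */
        if (recv(fd, lenbuf, sizeof(lenbuf), MSG_WAITALL) <= 0)
                return;                 /* BUG: early return leaks fd */

        /* ... read the request, send the reply ... */

        close(fd);                      /* only reached on the happy path */
}

void
handle_connection_fixed(int fd)
{
        uint8_t lenbuf[2];

        if (recv(fd, lenbuf, sizeof(lenbuf), MSG_WAITALL) > 0) {
                /* ... read the request, send the reply ... */
        }
        close(fd);                      /* always close, even on EOF/error */
}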

If it turns out that you can get to single-user, wait ten minutes to make sure 
all the connections wind down, and there are still connections visible in 
netstat, then we may indeed be looking at a kernel bug, and the debugging 
information using netstat and vmstat will allow us to start to investigate.

Robert N M Watson
Computer Laboratory
University of Cambridge


>
> netstat -n gives:
> ...
> tcp4      0       0  1.2.3.4.*          4.5.6.7.42149       CLOSED
> tcp4      39      0  1.2.3.4.*          4.5.6.7.54103       CLOSED
> tcp4      35      0  1.2.3.4.*          4.5.6.7.41718       CLOSED
> tcp4      38      0  1.2.3.4.*          4.5.6.7.55618       CLOSED
> tcp4      41      0  1.2.3.4.*          4.5.6.7.44230       CLOSED
> tcp4      39      0  1.2.3.4.*          4.5.6.7.49439       CLOSED
> ...
>
> These never go away; they gradually increase and increase until the 
> application starts giving errors (probably because some socket or 
> filedescriptor limit is reached). When the application is killed these 
> entries disappear.
>
> The application in question is a self-written DNS server, multithreaded, and 
> running fine for years without any trouble on both FreeBSD 5.x and 6.x, and 
> on both 32-bit and 64-bit builds of 6.x.
>
> Of course that doesn't mean that the application is error-free; however, after 
> doing extensive testing I really can not find anything wrong with the 
> application itself, so I'm thinking maybe there's a change somewhere that 
> causes this? I know that tcp/network has been completely redone...
>
> What basically happens in the application is this:
> - one main tcp thread runs an infinite while loop waiting for new 
> connections to arrive
> - as soon as one arrives a new thread is spawned that handles the newly 
> created stream
> - it reads some bytes, writes some bytes, then closes it
> - thread exits
>
> What appears to happen is this: after the new thread is spawned it tries to 
> read 2 bytes (DNS tcp length information). It gets back 0 bytes (EOF) and 
> therefore closes the socket and calls pthread_exit. However, in netstat that 
> same stream often appears to have bytes 'stuck' in the incoming queue...
>
> I really can't see how this can cause hanging sockets in 'CLOSED' state. Even 
> if the incoming queue isn't read entirely, a call to close() should close it. 
> Also I really can't find any documentation in netstat, or elsewhere, about 
> the 'CLOSED' state...
>
>
> Any help would greatly be appreciated!
>
>
> Kind Regards,
>
>
> Ali Niknam
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>

