Sockets stuck in FIN_WAIT_1

Thu May 29 21:32:45 UTC 2008

:I think we're onto something here, but for some reason it doesn't make  
:any sense.  I have keepalives turned OFF in Apache:
:
:When I tcpdump this, I see something sending ack's back and forth  
:every 60 seconds, but what?  Apache?  I'm not sure why.   I don't see  
:any timeouts in Apache for ~60 seconds.  As you can see, sometimes we  
:send an ack, but never see a reply.  I'm gathering the OS level  
:keepalives don't come into play because this session is not considered  
:idle?
:
:
:20:13:07.640426 IP 1.1.1.1.80 > 2.2.2.2.33379: .  
:4208136508:4208136509(1) ack 1471446041 win 520 <nop,nop,timestamp  
:3019088951 5004131>
:20:13:07.736505 IP 2.2.2.2.33379 > 1.1.1.1.80: . ack 0 win 0  
:<nop,nop,timestamp 5022148 3019088951>
:20:14:07.702647 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520  
:<nop,nop,timestamp 3019148951 5022148>
:20:15:07.764920 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520  
:<nop,nop,timestamp 3019208951 5022148>
:20:15:07.860988 IP 2.2.2.2.33379 > 1.1.1.1.80: . ack 0 win 0  
:<nop,nop,timestamp 5058183 3019208951>
:20:16:07.827262 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520  
:...

    Yah, the connection is valid so keepalives do not come into play.
    What is happening is that 1.1.1.1 wants to send something to 2.2.2.2,
    but 2.2.2.2 is telling 1.1.1.1 that it has no buffer space (win 0).

    This forces the TCP stack on 1.1.1.1 (the kernel, not the apache server)
    to 'probe' the connection, which it appears to be doing once a minute.
    It is probing the connection waiting for 2.2.2.2 to tell it that buffer
    space is available (win != 0).

    The connection remains valid because 2.2.2.2 continues to respond to
    the probes.
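
    To check that pattern across a longer capture, something like the
    following might help.  It is only a sketch: it assumes tcpdump's text
    output with each packet rejoined onto one line (the sample below is
    taken from the dump quoted above with the option lists dropped), and
    the names parse() and summarize() are made up for illustration.  It
    reports the gaps between packets sent by the server and counts the
    win 0 replies coming back.

```python
import re
from datetime import datetime

# Matches the start of a tcpdump text line such as
# "20:13:07.736505 IP 2.2.2.2.33379 > 1.1.1.1.80: . ack 0 win 0"
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\S+) > (?P<dst>\S+): "
    r".*?\bwin (?P<win>\d+)"
)


def parse(line):
    """Return (timestamp, src, dst, window) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
    return ts, m.group("src"), m.group("dst"), int(m.group("win"))


def summarize(lines, server):
    """Return (gaps in whole seconds between server packets, count of win 0 replies)."""
    probes = []
    zero_window = 0
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, src, dst, win = rec
        if src == server:
            probes.append(ts)          # packet sent by the server (the probe)
        elif dst == server and win == 0:
            zero_window += 1           # peer still advertising no buffer space
    gaps = [round((b - a).total_seconds()) for a, b in zip(probes, probes[1:])]
    return gaps, zero_window


SAMPLE = [
    "20:13:07.736505 IP 2.2.2.2.33379 > 1.1.1.1.80: . ack 0 win 0",
    "20:14:07.702647 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520",
    "20:15:07.764920 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520",
    "20:15:07.860988 IP 2.2.2.2.33379 > 1.1.1.1.80: . ack 0 win 0",
    "20:16:07.827262 IP 1.1.1.1.80 > 2.2.2.2.33379: . 0:1(1) ack 1 win 520",
]

print(summarize(SAMPLE, "1.1.1.1.80"))
```

    Steady gaps of about 60 seconds together with a nonzero count of
    win 0 replies would match the probing described above.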

    Now, the connection is also in a half-closed state, which means that
    one direction is closed.  I can't tell which direction that is but my
    guess is that 1.1.1.1 (the apache server) closed the 1.1.1.1->2.2.2.2
    direction and the 2.2.2.2 box has a broken TCP implementation and can't
    deal with it.

:I'm finding several of these sessions doing the same exact thing....
:
:-- 
:Robert Blayzor, BOFH
:INOC, LLC

    I can suggest two things.  First, the TCP connection is good but you
    still may be able to tell Apache, in the Apache configuration file, to
    time out after a certain period of time and clear the connection.

    Secondly, it may be beneficial to identify exactly what the client and
    server were talking about which caused the client to hang with a live
    TCP connection.  The only way to do that is to tcpdump EVERYTHING going
    on related to the Apache server, save it to a big-ass disk partition
    (like 500G), and then when you see a stuck connection go back through
    the tcpdump log file and locate it, grep it out, and review what exactly
    it was talking about.  You'd have to tcpdump with options to tell it to
    dump the TCP data payloads.
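
    A rough sketch of the two tcpdump invocations that would involve,
    built as argument lists so they can be handed to subprocess or pasted
    into a shell.  The interface name em0 and the capture path are
    placeholders, not anything taken from your setup, and the helper names
    are made up.  The flags themselves are standard tcpdump: -s 0 keeps
    whole packets, -w writes raw packets to a file, -r reads the file
    back, and -A prints the payload as ASCII.

```python
import shlex


def capture_cmd(iface, server_port, out_file):
    """Command to record full packets for everything on the web server port."""
    return [
        "tcpdump", "-n",        # -n: do not resolve names
        "-i", iface,            # interface to listen on (placeholder value below)
        "-s", "0",              # snap length 0: keep the whole packet, payload included
        "-w", out_file,         # write raw packets to a file for later review
        "tcp", "port", str(server_port),
    ]


def review_cmd(cap_file, client_ip, client_port):
    """Command to pull one stuck connection back out of the saved capture."""
    return [
        "tcpdump", "-n",
        "-A",                   # print each packet's payload as ASCII
        "-r", cap_file,         # read from the saved capture instead of the wire
        "host", client_ip, "and", "tcp", "port", str(client_port),
    ]


# Hypothetical values for illustration only.
print(shlex.join(capture_cmd("em0", 80, "/capture/apache.pcap")))
print(shlex.join(review_cmd("/capture/apache.pcap", "2.2.2.2", 33379)))
```

    Filtering the saved file by the client's address and port is what
    stands in for "grep it out" here, since the capture is binary.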

    It seems likely that the client is running an applet or javascript that
    receives a stream over the connection, and that applet or javascript
    program has locked up.  The data sent from the server then builds up,
    the client's buffer space runs out, and the client starts advertising
    the 0 window.
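
    The effect is easy to reproduce on a single machine.  The sketch
    below is not your Apache setup, just a loopback stand-in: the
    "client" connects and then never calls recv(), playing the part of
    the locked-up applet, while the "server" keeps sending until the
    kernel will not accept any more data.  The function name, chunk size,
    and half-second stall threshold are arbitrary choices for
    illustration.

```python
import select
import socket


def fill_until_blocked(chunk=65536, cap=256 * 1024 * 1024, stall=0.5):
    """Send to a peer that never reads; return (bytes_accepted, stalled)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free loopback port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _ = listener.accept()
    server.setblocking(False)

    payload = b"x" * chunk
    sent = 0
    stalled = False
    try:
        while sent < cap:
            # Wait for room in the send buffer; no room for `stall` seconds
            # means the peer's receive buffer and our send buffer are both full.
            _, writable, _ = select.select([], [server], [], stall)
            if not writable:
                stalled = True
                break
            try:
                sent += server.send(payload)
            except BlockingIOError:
                continue
    finally:
        # The client socket is closed without ever having read a byte.
        for s in (server, client, listener):
            s.close()
    return sent, stalled


sent, stalled = fill_until_blocked()
print("kernel accepted", sent, "bytes; sender stalled:", stalled)
```

    Running tcpdump on the loopback interface while this sits stalled
    should show the same kind of win 0 replies as in your capture, though
    the exact byte count depends on the system's socket buffer settings.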

					-Matt
					Matthew Dillon 
					<dillon at backplane.com>