SO_LINGER on socket with non-blocking I/O

Julian Cowley julian at tiki-tech.com
Thu Jun 10 16:18:44 PDT 2004


On Thu, 10 Jun 2004, Chuck Swiger wrote:
> Julian Cowley wrote:
> > I've been developing an application that attempts to send data from
> > one host to another via TCP.  The intent is for the data transfer
> > to be as reliable as possible, and to log whenever it detects
> > that it has lost data (this is for a reliable syslog protocol,
> > if you're wondering).  Because my application doesn't (yet) have
> > application-level acknowledgments, it has to depend on TCP to make
> > sure the data gets through reliably.
>
> OK.  TCP is really good at doing what you've asked.  :-)

Yes, we decided to switch to using TCP just for this reason.  I suppose
what I'm trying to get at here is that while using TCP for its
reliability is easy, getting the kernel to report its progress at the
lower levels is turning out to be more difficult than I first thought.
This has gone past my original question about blocking, since what
I really want to do is detect when the underlying TCP session is
encountering difficulty when closing the connection (if I know for
certain that the TCP session closed successfully, then I can be sure all
of the data was sent).  That's where I was hoping SO_LINGER would help.

I've started to do some testing to find out how SO_LINGER works (when
I complete the results I'll post them here).  So far, I've found
it to be fairly disappointing -- in my testing on an OpenBSD system
(haven't been able to do this on an actual FreeBSD system yet), setting
SO_LINGER only causes the close() to pause.  If the timer expires and
the TCP session still hasn't completed, the close() returns success
and the TCP shutdown continues in the background (what they call a
"graceful shutdown").  In my mind, this makes SO_LINGER completely
useless!  If the close() always returns success, I can't find out
if the TCP session closed successfully.  Setting SO_LINGER causes
the close() to just, well, linger, but what's the point in that?
I'm hoping that FreeBSD will work differently, but I'm not betting
on it since most BSD-derived kernels work the same in this respect.
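For anyone who wants to repeat the test, here is a minimal sketch in C of the close path I'm describing. linger_close() is a hypothetical helper of my own, not part of any API. It only sets SO_LINGER through the standard setsockopt() call (struct linger with l_onoff/l_linger) and returns whatever close() returns, so that the result can be logged. Given the OpenBSD behavior above, a 0 return should not be taken as proof that the peer received everything.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable SO_LINGER with a timeout of `seconds`,
 * then close the socket. On a blocking socket, close() may wait up to
 * that long for queued data to be sent and acknowledged. Returns the
 * result of close(), or -1 if setsockopt() failed (close() is still
 * attempted in that case); errno describes the failure. */
int linger_close(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* turn lingering on */
    lg.l_linger = seconds;   /* linger timeout, in seconds */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1) {
        int saved = errno;   /* report setsockopt()'s error, not close()'s */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A caller would check it as usual, e.g. if (linger_close(fd, 10) == -1), log strerror(errno).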

> > When closing the socket, I want to make sure that the remaining data
> > got through to the other end (or otherwise log something if it didn't).
> > I've set SO_LINGER on the socket for this purpose, but one caveat is
> > that I also have the socket in non-blocking mode.
>
> When your local TCP issues a close(), the TCP stack will iterate through a
> series of steps (the FIN-WAIT stages) to ensure that any remaining data will
> be sent and acknowledged before your local machine actually releases the
> socket.  See RFC-793, "3.5.  Closing a Connection"
>
>    CLOSE is an operation meaning "I have no more data to send."  The
>    notion of closing a full-duplex connection is subject to ambiguous
>    interpretation, of course, since it may not be obvious how to treat
>    the receiving side of the connection.  We have chosen to treat CLOSE
>    in a simplex fashion.  The user who CLOSEs may continue to RECEIVE
>    until he is told that the other side has CLOSED also.  Thus, a program
>    could initiate several SENDs followed by a CLOSE, and then continue to
>    RECEIVE until signaled that a RECEIVE failed because the other side
>    has CLOSED.  We assume that the TCP will signal a user, even if no
>    RECEIVEs are outstanding, that the other side has closed, so the user
>    can terminate his side gracefully.  A TCP will reliably deliver all
>    buffers SENT before the connection was CLOSED so a user who expects no
>    data in return need only wait to hear the connection was CLOSED
>    successfully to know that all his data was received at the destination
>    TCP.  Users must keep reading connections they close for sending until
>    the TCP says no more data.

It's interesting that the actual RFC differs from the Berkeley socket
implementation in subtle ways.  In the Berkeley model, close() doesn't
implement the CLOSE described above.  What they describe seems more
like what "shutdown(s, SHUT_WR)" does, which is different.
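To make the comparison concrete, a rough Berkeley-sockets approximation of the RFC's CLOSE might look like the sketch below. shutdown_and_drain() is a hypothetical name, and the sketch assumes a blocking socket and a peer application that closes its side only after it has read all of our data.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper approximating RFC 793's CLOSE: stop sending with
 * shutdown(SHUT_WR), then keep reading until the peer closes its side
 * (read() returns 0). Any data still arriving from the peer is discarded.
 * Returns 0 on a clean EOF, -1 on error (errno set, e.g. ECONNRESET).
 * The descriptor is left open; the caller still has to close() it. */
int shutdown_and_drain(int fd)
{
    char buf[4096];
    ssize_t n;

    if (shutdown(fd, SHUT_WR) == -1)   /* sends our FIN after queued data */
        return -1;
    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n == 0)
            return 0;                  /* peer closed its side cleanly */
        if (n == -1) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal; retry */
            return -1;                 /* e.g. the peer reset the connection */
        }
        /* n > 0: late data from the peer; ignore it and keep reading */
    }
}
```

Seeing EOF rather than an error such as ECONNRESET is probably the closest user code can get to the RFC's "the connection was CLOSED successfully".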

> > My question is, what is the behavior of close() on a socket in
> > non-blocking mode when SO_LINGER is set (to a non-zero time)?
> >
> > There seem to be two, possibly three, possibilities according to
> > some web searches I've done:
> >
> > 1) the close() call immediately returns with an EWOULDBLOCK (EAGAIN)
> >    error.
> > 2) the call blocks anyway regardless of the non-blocking mode setting.
> > 3) the call returns immediately after the connection is forcibly reset,
> >    possibly losing any queued data that was to be sent.
> >
> > I'm pretty sure the third possibility only happens when SO_LINGER is
> > set with a linger time of 0 seconds.
>
> Remember that the in-process reference to a socket's descriptor is not the
> same thing as the kernel's reference to the underlying TCB (or whatever
> FreeBSD calls the TCP control block).  Even if you close() the descriptor, the
> system ought to continue to process any unsent data until the TCP stack
> succeeds or times out the TCP connection.

Yes, that's the way it works.  If SO_LINGER is not used, the close()
returns immediately and the TCP shutdown continues gracefully in the
background, doing retries if necessary even after the process that
opened the socket has exited.  In the absence of SO_LINGER, this is
perfectly logical and acceptable behavior.

BTW, I think the kernel sources call it a PCB (struct inpcb, the Internet
protocol control block, with the TCP-specific state kept in a struct
tcpcb), but I could be misinterpreting them.

> It may be the case that what you want to use is shutdown(2), instead.

Thanks, that's a really good idea.  I'll start looking into this.
I'm hoping that since shutdown() allows the socket to remain open
(even if the underlying TCP is shut down), this may provide a way to
get more information out of the kernel.
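A non-blocking variant of that idea, assuming poll() is available, might look like the sketch below. shutdown_wait_eof() is hypothetical: it half-closes the socket, waits for the peer's EOF, and reports the outcome as an errno-style code while leaving the descriptor open. For simplicity the timeout applies to each poll() call rather than to the whole wait.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: half-close with shutdown(SHUT_WR), then wait up
 * to timeout_ms (per poll() call) for the peer's EOF. Returns 0 on EOF,
 * ETIMEDOUT if the peer did not close in time, or another errno value
 * (such as ECONNRESET) describing the failure. The descriptor stays open
 * either way, so the caller decides when to close() it. Works on
 * blocking or non-blocking sockets. */
int shutdown_wait_eof(int fd, int timeout_ms)
{
    char buf[4096];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (shutdown(fd, SHUT_WR) == -1)
        return errno;
    for (;;) {
        int r = poll(&pfd, 1, timeout_ms);
        if (r == -1) {
            if (errno == EINTR)
                continue;              /* interrupted; wait again */
            return errno;
        }
        if (r == 0)
            return ETIMEDOUT;          /* no EOF from the peer in time */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return 0;                  /* peer closed its side */
        if (n == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;              /* spurious wakeup; poll again */
            return errno;              /* e.g. ECONNRESET */
        }
        /* n > 0: discard late data from the peer */
    }
}
```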

> In other words, possibility #1 is probably what should happen.  #2 may happen
> if the local platform doesn't handle non-blocking I/O very well.  #3 should
> only happen if you are using a TCP stack which is broken, but some people seem
> to prefer that, so who can say?

I hope to be able to find out during my testing.
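For the linger-time-0 case, the usual "abortive close" can be sketched as below (abort_close() is a hypothetical helper). On the stacks I know of, this makes close() discard unsent data and send an RST instead of a FIN, which matches possibility 3. That is the opposite of what a reliable-delivery application wants, so it is mainly useful as a control case when testing.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: abortive close. With l_onoff = 1 and l_linger = 0,
 * close() drops any unsent data and resets the connection (RST) instead
 * of doing the normal FIN exchange. Returns close()'s result, or -1 with
 * the descriptor left open if setsockopt() fails. */
int abort_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1)
        return -1;
    return close(fd);
}
```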



More information about the freebsd-questions mailing list