POLLHUP on never connected socket

Fri Sep 2 10:40:20 UTC 2011

On Fri, Sep 02, 2011 at 12:28:24PM +0300, Andriy Gapon wrote:
> I see a problem where FreeBSD kernel (recent head) returns POLLHUP
> _alone_ (0x10) for a socket that has never been connected - a client
> socket for which connect(2) failed.  There is also a piece of software
> that doesn't expect that flag and exhibits illogical behavior because
> of it.

That would be a bug in that software. Returning POLLHUP alone is clearly
valid.

> This is how POSIX describes POLLHUP:
> POLLHUP
>     The device has been disconnected. This event and POLLOUT are
> mutually-exclusive; a stream can never be writable if a hangup has
> occurred.  However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or
> POLLPRI are not mutually-exclusive. This flag is only valid in the
> revents bitmask; it shall be ignored in the events member.

Together with the description of POLLIN which says that data may be read
without blocking, this seems to mean that if POLLHUP is set alone, there
is no point in calling read() since it will return 0 or -1 immediately.
If POLLHUP and POLLIN are both set, there is still data in the buffer
(in FreeBSD, POLLHUP | POLLIN is often returned even if there is no
data, to keep buggy applications from breaking).

A complicating factor is that there are situations where read() returns
0 and yet there is no hangup. For example, if a user types the EOF
character (such as ^D) in a terminal, this causes read() to return 0 but
the terminal remains writable (and writing to it may even block). This
is the "zero length message" which POSIX places under the obsolescent
STREAMS marking, even though it applies to terminals in all
implementations.

Similarly, if the other side has done the equivalent of
shutdown(SHUT_WR), read() will return 0 but POLLHUP must not be returned
as it would prevent the application from waiting for space in the
outgoing socket buffer; therefore POLLIN must be returned. Similar
considerations apply to other shutdown() calls (but note that
shutdown(SHUT_RD) is not sent to the other side).
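
The half-close case can be observed with a small test along these
lines. This is a sketch only: it uses an AF_UNIX socketpair for
convenience, the function name is made up, and whether POLLHUP shows up
in the result depends on the kernel being tested.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Half-close one end of a socketpair and poll the other end.
   Returns the revents seen (or -1 on error); *nread receives the
   result of the read() that follows. */
static int poll_after_peer_shutdown(ssize_t *nread)
{
    int sv[2];
    struct pollfd pfd;
    char buf[16];
    int revents = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* The peer says it will send nothing more. */
    if (shutdown(sv[1], SHUT_WR) == 0) {
        pfd.fd = sv[0];
        pfd.events = POLLIN | POLLOUT;
        pfd.revents = 0;
        if (poll(&pfd, 1, 0) >= 0) {
            revents = pfd.revents;
            /* EOF is pending, so this returns immediately. */
            *nread = read(sv[0], buf, sizeof(buf));
        }
    }

    close(sv[0]);
    close(sv[1]);
    return revents;
}
```

With the behaviour described above, the result should contain POLLIN
and read() should return 0, while our end can still be written to.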

Conversely, read() may return 0 or fail even though poll() did not
return POLLHUP, if the connection was closed in the meantime.

> For me "disconnected" _implies_ that the device should have been
> connected first.  But this is not explicitly said anywhere.  Also, I
> think it's possible that a socket gets connected and immediately
> disconnected (before poll(2) is called), then the POLLHUP would be
> appropriate in any interpretation.  So, I am inclined to think that
> the software should check for POLLHUP.  But I would like to ask your
> opinion since the problem appears to be FreeBSD-specific.

There was a connection attempt in progress that failed at some point,
which seems close enough to "disconnected". Alternatively, POLLERR could
be set, but that could trigger more bugs in applications.
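
To see what a particular kernel reports in this situation, something
like the following could be used. It is a sketch under assumptions: it
uses a blocking connect(2) to a loopback port that was just released
and is therefore very probably closed, the helper names are made up,
and the flags returned are exactly the platform-dependent part, so the
code only collects them.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind to port 0 on loopback, note the port chosen, and release it.
   Returns the port in network byte order, or 0 on error. */
static in_port_t pick_closed_port(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    in_port_t port = 0;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return 0;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
        getsockname(s, (struct sockaddr *)&sin, &len) == 0)
        port = sin.sin_port;
    close(s);
    return port;
}

/* Returns the revents poll() reports for a client socket whose
   connect() was attempted against that port, or -1 on setup error. */
static int revents_after_failed_connect(void)
{
    struct sockaddr_in sin;
    struct pollfd pfd;
    int revents = -1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s == -1)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = pick_closed_port();
    if (sin.sin_port != 0) {
        /* Expected to fail, normally with ECONNREFUSED. */
        (void)connect(s, (struct sockaddr *)&sin, sizeof(sin));
        pfd.fd = s;
        pfd.events = POLLIN | POLLOUT;
        pfd.revents = 0;
        if (poll(&pfd, 1, 1000) >= 0)
            revents = pfd.revents;
    }
    close(s);
    return revents;
}
```

Printing the result in hex makes it easy to compare against the 0x10
(POLLHUP alone) from the original report.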

Comparisons with things like "connected and immediately disconnected"
and "connection closed after poll() returned something" are useful in
reasoning about this.

Ports people have complained about poll() behaviour before. Are there
configure scripts that attempt to check if we ever return POLLHUP alone
and only check for POLLIN if not?

-- 
Jilles Tjoelker