kern/94772: FIFOs (named pipes) + select() == broken

Thu Mar 23 02:20:17 UTC 2006

The following reply was made to PR kern/94772; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Oliver Fromme <olli at lurza.secnetix.de>
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 13:13:15 +1100 (EST)

 On Wed, 22 Mar 2006, Oliver Fromme wrote:
 > Oliver Fromme wrote:
 > > Bruce Evans wrote:

 > > > See also:
 > > >
 > > > PR 76125 (about the same bug)

 > The first one (76125) seems completely unrelated.  Typo?

 Yes, it's actually 76525.

 > > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
 > > has O_NONBLOCK set and no writer has opened the FIFO, then
 > > select() doesn't block.
 >
 > Actually, it's not a bug.  I've read SUSv3 wrong.  That
 > behaviour is perfectly fine.  In fact, SUSv3 (a.k.a.
 > POSIX-2001) requires that select() doesn't block in that
 > case, and the behaviour of select() and poll() must be
 > independet of whether O_NONBLOCK is set or not.

 I have tried to find POSIX saying that many times since I think
 it is the correct behaviour, but I couldn't find it for either
 select() or poll() before today.  Now I can find it for [p]select()
 but not for poll().  From POSIX.1-2001-draft7.txt for pselect():

 %%%
 31193              A descriptor shall be considered ready for reading when a call to an input function with
 31194              O_NONBLOCK clear would not block, whether or not the function would transfer data
 31195              successfully. (The function might return data, an end-of-file indication, or an error other than
 31196              one indicating that it is blocked, and in each of these cases the descriptor shall be considered
 31197              ready for reading.)
 %%%

 Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
 so if O_NONBLOCK is set then pselect() for read must never block either.
 This requires the behaviour of pselect() dependent on O_NONBLOCK but not on
 the current or previous presence of a writer.

 I still can't find similar words for poll().  The spec for poll() seems to
 be fuzzier in general, and the closest I could find is:

 %%%
 27931               POLLIN             Data other than high-priority data may be read without blocking.
 %%%

 "Data" doesn't seem to be defined anywhere.  Is null data (at EOF) data?...
 poll() presumably does depend on the previous presence of a writer, since
 POLLHUP only makes sense if there was a previous presence.  But POLLHUP
 seems to be specified even more fuzzily than POLLIN.  Clearly previous
 presences of writers shouldn't count if the previous set of readers, writers
 and data all went away, but this doesn't seem to be specified in detail, and
 what happens with multiple readers and/or writers living across sessions
 either intentionally or due to races or bugs?

 I intened to check the behaviour for this in my test programs but don't
 seem to have done it.  I intended to follow Linux's behaviour even if this
 is nonstandard.  Linux used to have some special cases including a gripe
 in a comment about having to have them to match Sun's behaviour, but I
 couldn't find these when I last checked.  Perhaps the difference is
 precisely between select() and poll(), to follow the standard for select()
 and exploit the fuzziness for poll().

 > > I would be happy to help out, but I'm not familiar with
 > > that part of the kernel code.
 >
 > Well, now (a few hours later) I'm a little bit more
 > familiar with it.  Patch attached below, and also
 > available from this URL:
 > http://www.secnetix.de/~olli/tmp/fifodiff.txt
 >
 > With that patch, my own test program (the one from
 > the top of this PR) runs successfully, i.e. EOF is
 > recognized correctly in all cases that I have tested
 > (with and without select()), and it also behaves
 > standard-compliant when O_NONBLOCK is used (see
 > above).

 I'll add tests for the O_NONBLOCK behaviour before mailing the
 test for poll().

 > The patch addresses the following things:
 > - implement POLLHUP in sopoll().
 > - remove POLLINIGNEOF.
 > - selscan() doesn't need to be patched: it already
 >   works as expected when fo_poll() returns POLLHUP.
 > - I don't think the comment is entirely incorrect,
 >   but I'm not sure, so I left it alone.

 Ah, selscan() just uses the result of fo_poll() as a boolean to decide
 whether a descriptor is ready (I thought that it checked only for the
 bits that it asked for).  fo_poll() returns revents.  Thus selscan()
 returns for when one of the output-only parameter bits like POLLHUP
 is set even if none of the input-output parameter bits are set.  I think
 the comment should say this more directly.

 > --- src/sys/fs/fifofs/fifo_vnops.c.orig	Tue Mar 21 09:42:32 2006
 > +++ src/sys/fs/fifofs/fifo_vnops.c	Wed Mar 22 18:49:27 2006
 > @@ -661,31 +661,11 @@
 > ...

 Good.
 > --- src/sys/kern/uipc_socket.c.orig	Wed Dec 28 19:05:13 2005
 > +++ src/sys/kern/uipc_socket.c	Wed Mar 22 18:44:08 2006
 > @@ -2036,23 +2036,26 @@
 > 		if (soreadable(so))
 > 			revents |= events & (POLLIN | POLLRDNORM);
 >
 > -	if (events & POLLINIGNEOF)
 > -		if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
 > -		    !TAILQ_EMPTY(&so->so_comp) || so->so_error)
 > -			revents |= POLLINIGNEOF;
 > -

 Good.

 > -	if (events & (POLLOUT | POLLWRNORM))
 > -		if (sowriteable(so))
 > -			revents |= events & (POLLOUT | POLLWRNORM);
 > +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
 > +		revents |= events & (POLLOUT | POLLWRNORM);
 > +	else {
 > +		/*
 > +		 * POLLOUT and POLLHUP shall not both be set.
 > +		 * Therefore check only for POLLHUP if POLLOUT
 > +		 * has not been set.  (Note that POLLHUP need
 > +		 * not be in events; it's always checked.)
 > +		 */
 > +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
 > +		    so->so_rcv.sb_cc == 0)
 > +			revents |= POLLHUP;
 > +	}

 I think SBS_CANTSENDMORE in so_snd should be checked here.  This flag has
 already been checked to be clear for in sowritable() in most cases.  I
 think the receiver count shouldn't be checked here.  I'm surprised that
 my test succeeds with this -- doesn't it prevent POLLHUP being set in the
 hangup+<old data to read> case?  Old versions of fifo_vnops.c had bugs
 from confusing these flags and/or the sender/receiver.  I hope these
 are all fixed now.

 This might be clearer with SBS_CANTSENDMORE checked first.
 SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
 and I think it is clearer to not even look at the output bits in
 `events' in the hangup case.  I think that none of the other checks in
 sowriteable() is related to hangup, but I don't understand the
 PR_CONNREQUIRED one.

 >...

 The rest looks good.

 This also fixes poll() on sockets.  Sockets are more often used than named
 pipes so the change needs a few weeks of testing before MFC.  Applications
 might be confused by poll() actually setting POLLHUP.  It sets only POLLIN
 for hangup now (this is because SBS_CANTRCVMORE implies soreadable()).

 Bruce