kern/94772: FIFOs (named pipes) + select() == broken
Bruce Evans
bde at zeta.org.au
Thu Mar 23 02:20:17 UTC 2006
The following reply was made to PR kern/94772; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: Oliver Fromme <olli at lurza.secnetix.de>
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 13:13:15 +1100 (EST)
On Wed, 22 Mar 2006, Oliver Fromme wrote:
> Oliver Fromme wrote:
> > Bruce Evans wrote:
> > > See also:
> > >
> > > PR 76125 (about the same bug)
> The first one (76125) seems completely unrelated. Typo?
Yes, it's actually 76525.
> > (By the way, DEC UNIX 4.0D _does_ have a bug: If the FIFO
> > has O_NONBLOCK set and no writer has opened the FIFO, then
> > select() doesn't block.
>
> Actually, it's not a bug. I've read SUSv3 wrong. That
> behaviour is perfectly fine. In fact, SUSv3 (a.k.a.
> POSIX-2001) requires that select() doesn't block in that
> case, and the behaviour of select() and poll() must be
> independet of whether O_NONBLOCK is set or not.
I have tried to find POSIX saying that many times since I think
it is the correct behaviour, but I couldn't find it for either
select() or poll() before today. Now I can find it for [p]select()
but not for poll(). From POSIX.1-2001-draft7.txt for pselect():
%%%
31193 A descriptor shall be considered ready for reading when a call to an input function with
31194 O_NONBLOCK clear would not block, whether or not the function would transfer data
31195 successfully. (The function might return data, an end-of-file indication, or an error other than
31196 one indicating that it is blocked, and in each of these cases the descriptor shall be considered
31197 ready for reading.)
%%%
Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
so if O_NONBLOCK is set then pselect() for read must never block either.
This requires the behaviour of pselect() dependent on O_NONBLOCK but not on
the current or previous presence of a writer.
I still can't find similar words for poll(). The spec for poll() seems to
be fuzzier in general, and the closest I could find is:
%%%
27931 POLLIN Data other than high-priority data may be read without blocking.
%%%
"Data" doesn't seem to be defined anywhere. Is null data (at EOF) data?...
poll() presumably does depend on the previous presence of a writer, since
POLLHUP only makes sense if there was a previous presence. But POLLHUP
seems to be specified even more fuzzily than POLLIN. Clearly previous
presences of writers shouldn't count if the previous set of readers, writers
and data all went away, but this doesn't seem to be specified in detail, and
what happens with multiple readers and/or writers living across sessions
either intentionally or due to races or bugs?
I intened to check the behaviour for this in my test programs but don't
seem to have done it. I intended to follow Linux's behaviour even if this
is nonstandard. Linux used to have some special cases including a gripe
in a comment about having to have them to match Sun's behaviour, but I
couldn't find these when I last checked. Perhaps the difference is
precisely between select() and poll(), to follow the standard for select()
and exploit the fuzziness for poll().
> > I would be happy to help out, but I'm not familiar with
> > that part of the kernel code.
>
> Well, now (a few hours later) I'm a little bit more
> familiar with it. Patch attached below, and also
> available from this URL:
> http://www.secnetix.de/~olli/tmp/fifodiff.txt
>
> With that patch, my own test program (the one from
> the top of this PR) runs successfully, i.e. EOF is
> recognized correctly in all cases that I have tested
> (with and without select()), and it also behaves
> standard-compliant when O_NONBLOCK is used (see
> above).
I'll add tests for the O_NONBLOCK behaviour before mailing the
test for poll().
> The patch addresses the following things:
> - implement POLLHUP in sopoll().
> - remove POLLINIGNEOF.
> - selscan() doesn't need to be patched: it already
> works as expected when fo_poll() returns POLLHUP.
> - I don't think the comment is entirely incorrect,
> but I'm not sure, so I left it alone.
Ah, selscan() just uses the result of fo_poll() as a boolean to decide
whether a descriptor is ready (I thought that it checked only for the
bits that it asked for). fo_poll() returns revents. Thus selscan()
returns for when one of the output-only parameter bits like POLLHUP
is set even if none of the input-output parameter bits are set. I think
the comment should say this more directly.
> --- src/sys/fs/fifofs/fifo_vnops.c.orig Tue Mar 21 09:42:32 2006
> +++ src/sys/fs/fifofs/fifo_vnops.c Wed Mar 22 18:49:27 2006
> @@ -661,31 +661,11 @@
> ...
Good.
> --- src/sys/kern/uipc_socket.c.orig Wed Dec 28 19:05:13 2005
> +++ src/sys/kern/uipc_socket.c Wed Mar 22 18:44:08 2006
> @@ -2036,23 +2036,26 @@
> if (soreadable(so))
> revents |= events & (POLLIN | POLLRDNORM);
>
> - if (events & POLLINIGNEOF)
> - if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat ||
> - !TAILQ_EMPTY(&so->so_comp) || so->so_error)
> - revents |= POLLINIGNEOF;
> -
Good.
> - if (events & (POLLOUT | POLLWRNORM))
> - if (sowriteable(so))
> - revents |= events & (POLLOUT | POLLWRNORM);
> + if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
> + revents |= events & (POLLOUT | POLLWRNORM);
> + else {
> + /*
> + * POLLOUT and POLLHUP shall not both be set.
> + * Therefore check only for POLLHUP if POLLOUT
> + * has not been set. (Note that POLLHUP need
> + * not be in events; it's always checked.)
> + */
> + if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
> + so->so_rcv.sb_cc == 0)
> + revents |= POLLHUP;
> + }
I think SBS_CANTSENDMORE in so_snd should be checked here. This flag has
already been checked to be clear for in sowritable() in most cases. I
think the receiver count shouldn't be checked here. I'm surprised that
my test succeeds with this -- doesn't it prevent POLLHUP being set in the
hangup+<old data to read> case? Old versions of fifo_vnops.c had bugs
from confusing these flags and/or the sender/receiver. I hope these
are all fixed now.
This might be clearer with SBS_CANTSENDMORE checked first.
SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
and I think it is clearer to not even look at the output bits in
`events' in the hangup case. I think that none of the other checks in
sowriteable() is related to hangup, but I don't understand the
PR_CONNREQUIRED one.
>...
The rest looks good.
This also fixes poll() on sockets. Sockets are more often used than named
pipes so the change needs a few weeks of testing before MFC. Applications
might be confused by poll() actually setting POLLHUP. It sets only POLLIN
for hangup now (this is because SBS_CANTRCVMORE implies soreadable()).
Bruce
More information about the freebsd-bugs
mailing list