kern/94772: FIFOs (named pipes) + select() == broken
Oliver Fromme
olli at lurza.secnetix.de
Thu Mar 23 10:10:18 UTC 2006
The following reply was made to PR kern/94772; it has been noted by GNATS.
From: Oliver Fromme <olli at lurza.secnetix.de>
To: bde at zeta.org.au (Bruce Evans)
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 11:08:50 +0100 (CET)
Hi,
I'm answering several emails at once,
to keep things simple.
Bruce Evans wrote:
> Oliver Fromme wrote:
> > Bruce Evans wrote:
> > > Here is a program that tests more cases. I made it give no output
> > > ...
> >
> > It does not produce any output on Solaris 9, NetBSD 3.0,
> > DEC UNIX 4.0D and Linux 2.4.32. (I had to replace signal()
> > with sigset() on Solaris, add a few missing #includes and
> > write small replacements for err() and warnx().)
>
> I thought that the signal() was portable.
Unfortunately, it's not. SysV (e.g. Solaris) has different
semantics: When the signal handler is executed, the signal's
disposition is set to SIG_DFL. That means that the handler
is only executed once, unless you call signal() again. The
solution is to use sigset() which behaves more like BSD's
signal(). On the other hand, FreeBSD doesn't know sigset()
at all.
On Linux, the situation is even more complex: When using
libc4 or libc5, signal() has SysV semantics, and when using
glibc2, it has BSD semantics. However, when using glibc2
with -D_XOPEN_SOURCE=500, it's again SysV, and in this
latter case sigset() is defined in the header file (not
in the other cases).
Bottom line: For portable programs, neither signal() nor
sigset() should be used. Instead, sigaction() should be
used, which behaves the same on BSD and SysV, and should
be supported everywhere.
> Under FreeBSD, <stdlib.h>
> of all things is the only missing include.
FreeBSD generally seems to require fewer includes than the
standard says. I had to add <sys/types.h>, <stdlib.h>,
<string.h> and <stdio.h> (although the latter two were probably
needed only because of my err() and warnx() replacements).
> I stopped trying to avoid
> using the err() family in test programs when Linux got them 6-8 years
> ago.
Yes, but Solaris and DEC UNIX (and probably other commercial
UNIX systems) don't have them. Fortunately, it was easy
to write replacements in this case, because they were only
called with single constant strings.
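Roughly, such replacements can look like the sketch below. This is
not the exact code I used; the names my_err(), my_warnx() and progname
are invented for illustration, and the functions only handle a single
constant string (no format arguments), which is all that was needed.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Program name used as message prefix (assumed; set it as you like). */
const char *progname = "selectp";

/* Like warnx("msg"): print "prog: msg" to stderr, no errno text. */
void my_warnx(const char *msg)
{
    fprintf(stderr, "%s: %s\n", progname, msg);
}

/* Like err(status, "msg"): print "prog: msg: <strerror>" and exit. */
void my_err(int status, const char *msg)
{
    int saved_errno = errno;    /* fprintf() may clobber errno */

    fprintf(stderr, "%s: %s: %s\n", progname, msg, strerror(saved_errno));
    exit(status);
}
```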
> > > (By the way, DEC UNIX 4.0D _does_ have a bug: If the FIFO
> > > has O_NONBLOCK set and no writer has opened the FIFO, then
> > > select() doesn't block.
> >
> > Actually, it's not a bug. I've read SUSv3 wrong. That
> > behaviour is perfectly fine. In fact, SUSv3 (a.k.a.
> > POSIX-2001) requires that select() doesn't block in that
> > case, and the behaviour of select() and poll() must be
> > independent of whether O_NONBLOCK is set or not.
>
> I have tried to find POSIX saying that many times since I think
> it is the correct behaviour, but I couldn't find it for either
> select() or poll() before today. Now I can find it for [p]select()
> but not for poll(). From POSIX.1-2001-draft7.txt for pselect():
>
> %%%
> 31193 A descriptor shall be considered ready for reading when a call to an input function with
> 31194 O_NONBLOCK clear would not block, whether or not the function would transfer data
> 31195 successfully. (The function might return data, an end-of-file indication, or an error other than
> 31196 one indicating that it is blocked, and in each of these cases the descriptor shall be considered
> 31197 ready for reading.)
> %%%
I've got SUSv3 a.k.a. IEEE Std 1003.1-2001 ("POSIX"). You
can download it from The Open Group's website (you have to
register with them, but it's free). However, I don't know
how much it differs from the draft that you have.
The above paragraph from the select() spec seems to be the
same.
> Other parts of POSIX make it clear that O_NONBLOCK reads must never block,
That's right, but it does not matter for select()/poll().
> so if O_NONBLOCK is set then pselect() for read must never block either.
No, I think that's not right. The standard clearly says
that select() should always behave as if O_NONBLOCK was not
set: "A descriptor shall be considered ready for reading
when a call to an input function with O_NONBLOCK clear
would not block".
For poll there is a similar statement which is even clearer:
"The poll() function shall not be affected by the O_NONBLOCK
flag."
Therefore: select() and poll() are not dependent on the
O_NONBLOCK flag. They should always behave as if it was
not set.
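To illustrate the point, here is a small sketch that asks select()
whether a descriptor is readable, using a zero timeout so it never
blocks, plus a helper to toggle O_NONBLOCK. The function names
select_readable() and set_nonblock() are my own. On a conforming
system the answer from select_readable() should be the same before
and after toggling the flag.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Ask select() whether fd is ready for reading right now.
 * Returns 1 if ready, 0 if not, -1 on error.
 */
int select_readable(int fd)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 0;              /* zero timeout: just a snapshot */
    tv.tv_usec = 0;
    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Set (on != 0) or clear (on == 0) O_NONBLOCK on fd. */
int set_nonblock(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}
```

For example, with an ordinary pipe() whose write end is still open,
the read end should be reported as not ready while the pipe is empty
and as ready once a byte has been written, with or without O_NONBLOCK.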
Furthermore, the standard says a few things about the
read() function when used on (nameless) pipes or FIFOs:
[quote begin]
When attempting to read from an empty pipe or FIFO:
* If no process has the pipe open for writing, read()
shall return 0 to indicate end-of-file.
* If some process has the pipe open for writing and
O_NONBLOCK is set, read() shall return -1 and set
errno to [EAGAIN].
* If some process has the pipe open for writing and
O_NONBLOCK is clear, read() shall block the calling
thread until some data is written or the pipe is
closed by all processes that had the pipe open for
writing.
[quote end]
That clearly means that select() should _not_ block when
no process has the FIFO open for writing. (Because the
select() behaviour depends on the behaviour of read() as
if the O_NONBLOCK flag is clear.)
Furthermore, it also means that it does _not_ matter whether
there was a writer previously or not.
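The first two read() cases quoted above can be checked with a small
probe like the following. It is only a sketch: probe_fifo_read() and
its parameters are names I made up, and the path is supplied by the
caller. It creates a FIFO, opens the read end with O_NONBLOCK, reads
once with no writer, then opens a writer and reads again.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Record what read() returns on an empty FIFO
 *   - with no writer            (*no_writer; POSIX says 0, i.e. EOF)
 *   - with a writer, O_NONBLOCK (*with_writer; POSIX says -1/EAGAIN,
 *                                errno stored in *with_writer_errno)
 * Returns 0 if the probe itself ran, -1 if setup failed.
 */
int probe_fifo_read(const char *path, ssize_t *no_writer,
    ssize_t *with_writer, int *with_writer_errno)
{
    char c;
    int rfd, wfd;

    if (mkfifo(path, 0600) == -1)
        return -1;
    /* With O_NONBLOCK, opening for reading does not wait for a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd == -1) {
        unlink(path);
        return -1;
    }
    *no_writer = read(rfd, &c, 1);

    /* A reader exists, so this open returns immediately. */
    wfd = open(path, O_WRONLY);
    if (wfd == -1) {
        close(rfd);
        unlink(path);
        return -1;
    }
    errno = 0;
    *with_writer = read(rfd, &c, 1);
    *with_writer_errno = errno;

    close(wfd);
    close(rfd);
    unlink(path);
    return 0;
}
```

The third case (writer present, O_NONBLOCK clear) is left out on
purpose, because read() would block there.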
> > -	if (events & (POLLOUT | POLLWRNORM))
> > -		if (sowriteable(so))
> > -			revents |= events & (POLLOUT | POLLWRNORM);
> > +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
> > +		revents |= events & (POLLOUT | POLLWRNORM);
> > +	else {
> > +		/*
> > +		 * POLLOUT and POLLHUP shall not both be set.
> > +		 * Therefore check only for POLLHUP if POLLOUT
> > +		 * has not been set. (Note that POLLHUP need
> > +		 * not be in events; it's always checked.)
> > +		 */
> > +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
> > +		    so->so_rcv.sb_cc == 0)
> > +			revents |= POLLHUP;
> > +	}
>
> I think SBS_CANTSENDMORE in so_snd should be checked here.
Agreed.
> I think the receiver count shouldn't be checked here.
Agreed. That would correctly handle the case where both
POLLIN and POLLHUP can be set at the same time.
> I'm surprised that
> my test succeeds with this -- doesn't it prevent POLLHUP being set in the
> hangup+<old data to read> case?
Yes, I think it prevents that (i.e. POLLHUP would act more
like a "POLLEOF"). That's not correct behaviour, of course.
I'll fix that.
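For reference, the "hangup plus old data" case can be observed from
userland with something like the sketch below. Note the assumptions:
it uses an anonymous pipe rather than a socket or FIFO, and the names
poll_revents() and hangup_with_old_data() are invented here. On a
system that handles this case as discussed, the returned revents
should contain both POLLIN and POLLHUP.

```c
#include <poll.h>
#include <unistd.h>

/* Return the revents poll() reports for fd right now, or -1 on error. */
int poll_revents(int fd, short events)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;
    if (poll(&pfd, 1, 0) == -1)     /* timeout 0: never block */
        return -1;
    return pfd.revents;
}

/*
 * Write one byte into a pipe, close the write end (the "hangup"),
 * then poll the read end while the old byte is still unread.
 * Returns the revents seen, or -1 if setup failed.
 */
int hangup_with_old_data(void)
{
    int p[2], revents;

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);
    revents = poll_revents(p[0], POLLIN);
    close(p[0]);
    return revents;
}
```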
> This might be clearer with SBS_CANTSENDMORE checked first.
> SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
> and I think it is clearer to not even look at the output bits in
> `events' in the hangup case.
So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
set without checking if the caller has requested POLLOUT in
the events mask? That sounds reasonable, because POLLOUT
certainly can't be returned in that case. It makes the
code more complex, though.
I'll have a look at that and try to implement it that way.
> This also fixes poll() on sockets. Sockets are more often used than named
> pipes so the change needs a few weeks of testing before MFC.
I see.
Bruce Evans wrote:
> Bruce Evans wrote:
> > I intended to check the behaviour for this in my test programs but don't
> > seem to have done it. I intended to follow Linux's behaviour even if this
> > is nonstandard. Linux used to have some special cases including a gripe
> > in a comment about having to have them to match Sun's behaviour, but I
> > couldn't find these when I last checked. Perhaps the difference is
> > precisely between select() and poll(), to follow the standard for select()
> > and exploit the fuzziness for poll().
>
> I added the check.
I'll try that later today. (At least I hope to have enough
time for it.)
> select() on a named pipe:
> % selectp: state 0: expected set; got clear
> [...]
> Now there is an extra failure for state 0. Some complications will be
> required to fix this without breaking poll() on named pipe. State 0 is
> when the read descriptor is open with O_NONBLOCK and there has "never"
> been a writer. In this state, select() on the read descriptor must
> succeed to conform to POSIX, but poll() on the read descriptor must
> block to conform to Linux. I think the Linux behaviour is what happens
> naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
Now that might be debatable. SUSv3 says that POLLHUP means
that the device is disconnected. That doesn't sound like
it should make a difference if there was a previous writer
or not. In fact, when I open a FIFO which doesn't have a
writer currently, there's no way to know if there was a
writer previously (before I opened the FIFO) who "hung it
up".
Personally I think that Linux is in error. POLLHUP should
be set when "the device is disconnected" (SUSv3), i.e. when
there is no writer, period.
However, I see your point that it might be more beneficial
to be Linux-compatible rather than standards-compliant.
> and there is no input so sopoll() won't set POLLIN, so sopoll() won't
> set any flags in revents and poll() will block. An extra flag seems to
> be necessary to distinguish this state so that select() doesn't block.
Yes, if we want to be Linux-compatible. That'll make the
code a lot more complicated. *sigh*
Best regards
Oliver
--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.
"The ITU has offered the IETF formal alignment with its
corresponding technology, Penguins, but that won't fly."
-- RFC 2549