kern/94772: FIFOs (named pipes) + select() == broken

Thu Mar 23 10:10:18 UTC 2006

The following reply was made to PR kern/94772; it has been noted by GNATS.

From: Oliver Fromme <olli at lurza.secnetix.de>
To: bde at zeta.org.au (Bruce Evans)
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 11:08:50 +0100 (CET)

 Hi,

 I'm answering on several emails at once,
 to make things simpler.

 Bruce Evans wrote:
  > Oliver Fromme wrote:
  > > Bruce Evans wrote:
  > > > Here is a program that tests more cases.  I made it give no output
  > > > ...
  > >
  > > It does not produce any output on Solaris 9, NetBSD 3.0,
  > > DEC UNIX 4.0D and Linux 2.4.32.  (I had to replace signal()
  > > with sigset() on Solaris, add a few missing #includes and
  > > write small replacements for err() and warnx().)
  > 
  > I thought that the signal() was portable.

 Unfortunately, it's not.  SysV (e.g. Solaris) has different
 semantics:  When the signal handler is executed, the signal's
 disposition is set to SIG_DFL.  That means that the handler
 is only executed once, unless you call signal() again.  The
 solution is to use sigset() which behaves more like BSD's
 signal().  On the other hand, FreeBSD doesn't know sigset()
 at all.

 On Linux, the situation is even more complex:  When using
 libc4 or libc5, signal() has SysV semantics, and when using
 glibc2, it has BSD semantics.  However, when using glibc2
 with -D_XOPEN_SOURCE=500, it's again SysV, and in this
 latter case sigset() is defined in the header file (not
 in the other cases).

 Bottom line:  For portable programs, neither signal() nor
 sigset() should be used.  Instead, sigaction() should be
 used, which behaves the same on BSD and SysV, and should
 be supported everywhere.

  > Under FreeBSD, <stdlib.h>
  > of all things is the only missing include.

 FreeBSD generally seems to require less includes than the
 standard says.  I had to add <sys/types.h>, <stdlib.h>,
 <string.h> and <stdio.h> (although the latter two probably
 only because of my err() and warnx() replacements).

  > I stopped trying to avoid
  > using the err() family in test programs when Linux got them 6-8 years
  > ago.

 Yes, but Solaris and DEC UNIX (and probably other commercial
 UNIX systems) don't have them.  Fortunately, it was easy
 to write replacements in this case, because they were only
 called with single constant strings.

  > > > (By the way, DEC UNIX 4.0D _does_ have a bug:  If the FIFO
  > > > has O_NONBLOCK set and no writer has opened the FIFO, then
  > > > select() doesn't block.
  > >
  > > Actually, it's not a bug.  I've read SUSv3 wrong.  That
  > > behaviour is perfectly fine.  In fact, SUSv3 (a.k.a.
  > > POSIX-2001) requires that select() doesn't block in that
  > > case, and the behaviour of select() and poll() must be
  > > independet of whether O_NONBLOCK is set or not.
  > 
  > I have tried to find POSIX saying that many times since I think
  > it is the correct behaviour, but I couldn't find it for either
  > select() or poll() before today.  Now I can find it for [p]select()
  > but not for poll().  From POSIX.1-2001-draft7.txt for pselect():
  > 
  > %%%
  > 31193              A descriptor shall be considered ready for reading when a call to an input function with
  > 31194              O_NONBLOCK clear would not block, whether or not the function would transfer data
  > 31195              successfully. (The function might return data, an end-of-file indication, or an error other than
  > 31196              one indicating that it is blocked, and in each of these cases the descriptor shall be considered
  > 31197              ready for reading.)
  > %%%

 I've got SUSv3 a.k.a. IEEE Std 1003.1-2001 ("POSIX").  You
 can download it from The Open Group's website (you have to
 register with them, but it's free).  However, I don't know
 how much it differs from the draft that you have.

 The above paragraph from the select() spec seems to be the
 same.

  > Other parts of POSIX make it clear that O_NONBLOCK reads must never block,

 That's right, but it does not matter for select()/poll().

  > so if O_NONBLOCK is set then pselect() for read must never block either.

 No, I think that's not right.  The standard clearly says
 that select() should always behave as if O_NONBLOCK was not
 set:  "A descriptor shall be considered ready for reading
 when a call to an input function with O_NONBLOCK clear
 would not block".

 For poll there is a similar statement which is even clearer:
 "The poll() function shall not be affected by the O_NONBLOCK
 flag."

 Therefore:  select() and poll() are not dependent on the
 O_NONBLOCK flag.  They should always behave as if it was
 not set.

 Furthermore, the standard says a few things about the
 read() function when used on (nameless) pipes or FIFOs:

 [quote begin]
    When attempting to read from an empty pipe or FIFO:                        

     * If no process has the pipe open for writing, read()
       shall return 0 to indicate end-of-file.                                               

     * If some process has the pipe open for writing and
       O_NONBLOCK is set, read() shall return -1 and set
       errno to [EAGAIN].                      

     * If some process has the pipe open for writing and
       O_NONBLOCK is clear, read() shall block the calling
       thread until some data is written or the pipe is
       closed by all processes that had the pipe open for
       writing.                                                      
 [quote end]

 That clearly means that select() should _not_ block when
 no process has the FIFO open for writing.  (Because the
 select() behaviour depends on the behaviour of read() as
 if the O_NONBLOCK flag is clear.)

 Furthermore, it als means that it does _not_ matter if
 there was a a writer previously or not.

  > > -	if (events & (POLLOUT | POLLWRNORM))
  > > -		if (sowriteable(so))
  > > -			revents |= events & (POLLOUT | POLLWRNORM);
  > > +	if (events & (POLLOUT | POLLWRNORM) && sowriteable(so))
  > > +		revents |= events & (POLLOUT | POLLWRNORM);
  > > +	else {
  > > +		/*
  > > +		 * POLLOUT and POLLHUP shall not both be set.
  > > +		 * Therefore check only for POLLHUP if POLLOUT
  > > +		 * has not been set.  (Note that POLLHUP need
  > > +		 * not be in events; it's always checked.)
  > > +		 */
  > > +		if (so->so_rcv.sb_state & SBS_CANTRCVMORE &&
  > > +		    so->so_rcv.sb_cc == 0)
  > > +			revents |= POLLHUP;
  > > +	}
  > 
  > I think SBS_CANTSENDMORE in so_snd should be checked here.

 Agreed.

  > I think the receiver count shouldn't be checked here.

 Agreed.  That would handle the case correctly where both
 POLLIN and POLLHUP can be set at the same time.

  > I'm surprised that
  > my test succeeds with this -- doesn't it prevent POLLHUP being set in the
  > hangup+<old data to read> case?

 Yes, I think it prevents that (i.e. POLLHUP would act more
 like a "POLLEOF").  That's not correct behaviour, of course.
 I'll fix that.

  > This might be clearer with SBS_CANTSENDMORE checked first.
  > SBS_CANTSENDMORE set implies !sowriteable() so the behaviour is the same,
  > and I think it is clearer to not even look at the output bits in
  > `events' in the hangup case.

 So you mean in the SBS_CANTSENDMORE case, POLLHUP should be
 set without checking if the caller has requested POLLOUT in
 the events mask?  That sounds reasonable, because POLLOUT
 certainly can't be returned in that case.  It makes the
 code more complex, though.

 I'll have a look at that and try to implement it that way.

  > This also fixes poll() on sockets.  Sockets are more often used than named
  > pipes so the change needs a few weeks of testing before MFC.

 I see.

 Bruce Evans wrote:
  > Bruce Evans wrote:
  > > I intened to check the behaviour for this in my test programs but don't
  > > seem to have done it.  I intended to follow Linux's behaviour even if this
  > > is nonstandard.  Linux used to have some special cases including a gripe
  > > in a comment about having to have them to match Sun's behaviour, but I
  > > couldn't find these when I last checked.  Perhaps the difference is
  > > precisely between select() and poll(), to follow the standard for select()
  > > and exploit the fuzziness for poll().
  > 
  > I added the check.

 I'll try that later today.  (At least I hope to have enough
 time for it.)

  > select() on a named pipe:
  > % selectp: state 0: expected set; got clear
  > [...]
  > Now there is an extra failure for state 0.  Some complications will be
  > required to fix this without breaking poll() on named pipe.  State 0 is
  > when the read descriptor is open with O_NONBLOCK and there has "never"
  > been a writer.  In this state, select() on the read descriptor must
  > succeed to conform to POSIX, but poll() on the read descriptor must
  > block to conform to Linux.  I think the Linux behaviour is what happens
  > naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,

 Now that might be debatable.  SUSv3 says that POLLHUP means
 that the device is disconnected.  That doesn't sound like
 it should make a difference if there was a previous writer
 or not.  In fact, when I open a FIFO which doesn't have a 
 writer currently, there's no way to know if there was a
 writer previously (before I opened the FIFO) who "hung it
 up".

 Personally I think that Linux is in error.  POLLHUP should
 be set when "the device is disconnected" (SUSv3), i.e. when
 there is no writer, period.

 However, I see your point that it might be more beneficial
 to be Linux-compliant instead of standard-compliant.

  > and there is no input so sopoll() won't set POLLIN, so sopoll() won't
  > set any flags in revents and poll() will block.  An extra flag seems to
  > be necessary to distinguish this state so that select() doesn't block.

 Yes, if we want to be Linux-compliant.  That'll make the
 code a lot more complicated.  *sigh*

 Best regards
    Oliver

 -- 
 Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
 Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
 Any opinions expressed in this message may be personal to the author
 and may not necessarily reflect the opinions of secnetix in any way.

 "The ITU has offered the IETF formal alignment with its
 corresponding technology, Penguins, but that won't fly."
         -- RFC 2549