kern/94772: FIFOs (named pipes) + select() == broken

Thu Mar 23 05:10:21 UTC 2006

The following reply was made to PR kern/94772; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Oliver Fromme <olli at lurza.secnetix.de>
Cc: bug-followup at freebsd.org
Subject: Re: kern/94772: FIFOs (named pipes) + select() == broken
Date: Thu, 23 Mar 2006 16:02:54 +1100 (EST)

 On Thu, 23 Mar 2006, Bruce Evans wrote:

 > On Wed, 22 Mar 2006, Oliver Fromme wrote:
 >> Oliver Fromme wrote:
 >> > Bruce Evans wrote:

 > I intended to check the behaviour for this in my test programs but don't
 > seem to have done it.  I intended to follow Linux's behaviour even if this
 > is nonstandard.  Linux used to have some special cases including a gripe
 > in a comment about having to have them to match Sun's behaviour, but I
 > couldn't find these when I last checked.  Perhaps the difference is
 > precisely between select() and poll(), to follow the standard for select()
 > and exploit the fuzziness for poll().

 I added the check.  Linux-2.6.10 in fact acts as guessed above.  So the
 check for select() is for the behaviour specified by POSIX (select() on
 a read descriptor that is in nonblocking mode and is for a fifo that has
 never had a writer returns success), while the check for poll() is
 for exactly the opposite behaviour (poll() blocks instead of returning
 with POLLIN set; the test actually uses a nonblocking poll() and only
 checks that POLLIN is not set, since a test that poll() blocks would
 be messier and I think I understand at least the FreeBSD implementation
 well enough to know that this test is equivalent).

 > I'll add tests for the O_NONBLOCK behaviour before mailing the
 > test for poll().

 First a small change to add it to the select() test:

 %%%
 --- select.c~	Sun Feb 12 23:42:30 2006
 +++ select.c	Thu Mar 23 13:47:23 2006
 @@ -30,7 +30,19 @@
   		err(1, "open for read");
   #endif
 -	kill(ppid, SIGUSR1);
 +	if (fd >= FD_SETSIZE)
 +		errx(1, "fd = %d too large for select()", fd);
 +
 +#ifdef NAMEDPIPE
 +	FD_ZERO(&rfds);
 +	FD_SET(fd, &rfds);
 +	tv.tv_sec = 0;
 +	tv.tv_usec = 0;
 +	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
 +		err(1, "select");
 +	if (!FD_ISSET(fd, &rfds))
 +		warnx("state 0: expected set; got clear");
 +#endif

 -	/* XXX should check that fd fits in rfds. */
 +	kill(ppid, SIGUSR1);

   	usleep(1);
 %%%

 poll() test:

 %%%
 #include <sys/poll.h>
 #include <sys/stat.h>

 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdlib.h>
 #include <unistd.h>

 static pid_t cpid;
 static pid_t ppid;
 static volatile sig_atomic_t state;

 static void
 catch(int sig)
 {
  	state++;
 }

 #ifdef USE_POLLINIGNEOF
 /*
   * FreeBSD's POLLINIGNEOF (which causes half of the bugs when the kernel
   * uses it) can be used to fix up the broken cases 3 and 6a if the kernel
   * uses it, i.e., for named pipes but not for pipes.  Note that the sense
   * of POLLINIGNEOF is reversed when passed to the kernel -- it means
   * don't-ignore-EOF in .events and if it is set there then it means
   * not-POLLHUP in .revents.
   *
   * This leaves the following broken cases:
   * state 6 (hangup but data available) for poll on a named pipe:
   *         should have POLLIN | POLLHUP, but have POLLIN only.  In this
   *         case, we don't try POLLINIGNEOF since the resulting pair of revents
   *         cannot be distinguished from the pair for a case in which POLLIN
   *         only is correct.
   * state 6a (hangup and no data available) for poll on a plain pipe:
   *         should have POLLHUP only, but have POLLIN | POLLHUP.  This is
   *         what I thought is correct, but it is not what Linux-2.6.10 does
   *         for named pipes.  FreeBSD's select() currently depends on POLLIN
   *         being set in this case, and Linux's select() acts the same as
   *         FreeBSD's select() in this case.
   * states 3 and 6a (hangup and no data available) for select on a named pipe:
   *         should have FD_SET() set as in old-FreeBSD and Linux-2.6.10, but
   *         have FD_SET() clear.  The POLLINIGNEOF changes just broke select()
   *         here.  So what was the PR (34020?) which inspired these changes
   *         about?  poll() only?  This regression test uses nonblocking mode
   *         for all polls and a timeout of 0 for all selects so that the
   *         kernel state can be seen without blocking for long.  I hope that
   *         the select() blocks iff the resulting .revents indicates that it
   *         should block (it shouldn't block if it would set POLLIN).
   */
 int
 mypoll(struct pollfd *fds, nfds_t nfds, int timeout)
 {
  	struct pollfd mypfd;
  	int r;

  	r = poll(fds, nfds, timeout);
  	if (nfds != 1 || timeout != 0 || fds[0].revents & POLLIN)
  		return (r);
  	mypfd = fds[0];
  	mypfd.events |= POLLINIGNEOF;
  	r = poll(&mypfd, 1, 0);
  	if (r >= 0) {
  		if (mypfd.revents &= POLLIN) {
  			mypfd.revents &= ~POLLIN;
  			mypfd.revents |= POLLHUP;
  		}
  		fds[0].revents = mypfd.revents;
  	}
  	return (r);
 }
 #define	poll(fds, nfds, timeout)	mypoll((fds), (nfds), (timeout))
 #endif

 static void
 child(int fd)
 {
  	struct pollfd pfd;
  	char buf[256];

 #ifdef NAMEDPIPE
  	pfd.fd = open("p", O_RDONLY | O_NONBLOCK);
  	if (pfd.fd < 0)
  		err(1, "open for read");
 #else
  	pfd.fd = fd;
 #endif
  	pfd.events = POLLIN;

 #ifdef NAMEDPIPE
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 0: expected 0; got %#x", pfd.revents);
 #endif

  	kill(ppid, SIGUSR1);

  	usleep(1);
  	while (state != 1)
  		;
 #ifndef NAMEDPIPE
  	/*
  	 * The connection cannot be reestablished.  Use the code that delays
  	 * the read until after the writer disconnects since that case is
  	 * more interesting.
  	 */
  	state = 4;
  	goto state4;
 #endif
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 1: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);

  	usleep(1);
  	while (state != 2)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLIN)
  		warnx("state 2: expected POLLIN; got %#x", pfd.revents);
  	if (read(pfd.fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 2a: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);

  	usleep(1);
  	while (state != 3)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLHUP)
  		warnx("state 3: expected POLLHUP; got %#x",
  		    pfd.revents);
  	kill(ppid, SIGUSR1);

  	/*
  	 * Now we expect a new writer, and a new connection too since
  	 * we read all the data.  The only new point is that we didn't
  	 * start quite from scratch since the read fd is not new.  Check
  	 * startup state as above, but don't do the read as above.
  	 */
  	usleep(1);
  	while (state != 4)
  		;
 state4:
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != 0)
  		warnx("state 4: expected 0; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);

  	usleep(1);
  	while (state != 5)
  		;
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLIN)
  		warnx("state 5: expected POLLIN; got %#x", pfd.revents);
  	kill(ppid, SIGUSR1);

  	usleep(1);
  	while (state != 6)
  		;
  	/*
  	 * Now we have no writer, but should still have data from the old
  	 * writer. Check that we have both a data condition and a hangup
  	 * condition, and that the data can be read in the usual way.
  	 * Since Linux does this, programs must not quit reading when they
  	 * see POLLHUP; they must see POLLHUP without POLLIN (or another
  	 * input condition) before they decide that there is EOF.  gdb-6.1.1
  	 * is an example of a broken program that quits on POLLHUP only --
  	 * see its event-loop.c.
  	 */
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != (POLLIN | POLLHUP))
  		warnx("state 6: expected POLLIN | POLLHUP; got %#x",
  		    pfd.revents);
  	if (read(pfd.fd, buf, sizeof buf) != 1)
  		err(1, "read");
  	if (poll(&pfd, 1, 0) < 0)
  		err(1, "poll");
  	if (pfd.revents != POLLHUP)
  		warnx("state 6a: expected POLLHUP; got %#x",
  		    pfd.revents);
  	close(pfd.fd);
  	kill(ppid, SIGUSR1);
  	exit(0);
 }

 static void
 parent(int fd)
 {
  	usleep(1);
  	while (state != 1)
  		;
 #ifdef NAMEDPIPE
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
 #endif
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 2)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 3)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 4)
  		;
 #ifndef NAMEDPIPE
  	return;
 #endif
  	fd = open("p", O_WRONLY | O_NONBLOCK);
  	if (fd < 0)
  		err(1, "open for write");
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 5)
  		;
  	if (write(fd, "", 1) != 1)
  		err(1, "write");
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 6)
  		;
  	if (close(fd) != 0)
  		err(1, "close for write");
  	kill(cpid, SIGUSR1);

  	usleep(1);
  	while (state != 7)
  		;
 }

 int
 main(void)
 {
  	int fd[2];
  	int i;

 #ifdef NAMEDPIPE
  	if (mkfifo("p", 0666) != 0 && errno != EEXIST)
  		err(1, "mkfifo");
 #endif
  	signal(SIGUSR1, catch);
  	ppid = getpid();
  	for (i = 0; i < 2; i++) {
 #ifndef NAMEDPIPE
  		if (pipe(fd) != 0)
  			err(1, "pipe");
 #else
  		fd[0] = -1;
  		fd[1] = -1;
 #endif
  		state = 0;
  		switch (cpid = fork()) {
  		case -1:
  			err(1, "fork");
  		case 0:
  			(void)close(fd[1]);
  			child(fd[0]);
  			break;
  		default:
  			(void)close(fd[0]);
  			parent(fd[1]);
  			break;
  		}
  	}
  	return (0);
 }
 %%%

 The error output of these is null under Linux-2.6.10, but under
 FreeBSD-5.oldcurrent it is:

 poll() on a nameless pipe:
 % poll: state 6a: expected POLLHUP; got 0x11
 % poll: state 6a: expected POLLHUP; got 0x11

 No change for this.  For poll(), Linux consistently doesn't set POLLIN when
 there is only null data, so we check for this.
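 Since a hangup with no data may show up as either POLLHUP alone (Linux)
 or POLLIN | POLLHUP (the FreeBSD output above), a reader that wants to
 work with both has to treat a 0-byte read() as EOF as well.  A minimal
 sketch of such a loop follows; drain_until_eof() is a hypothetical helper
 written for illustration and is not part of the test programs.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the test programs): read from fd until
 * EOF and return the number of bytes read, or -1 on error.  EOF is
 * declared only when read() returns 0 or when a hangup/error condition
 * is reported without POLLIN, never on POLLHUP alone while POLLIN is
 * still set.
 */
static long
drain_until_eof(int fd)
{
	struct pollfd pfd;
	char buf[256];
	long total = 0;
	ssize_t n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	for (;;) {
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) < 0)
			return (-1);
		if (pfd.revents & POLLIN) {
			/* Data (possibly null data) is pending; read it first. */
			n = read(fd, buf, sizeof buf);
			if (n < 0)
				return (-1);
			if (n == 0)
				return (total);	/* null data: EOF */
			total += n;
			continue;
		}
		/* No input condition left; a hangup now really means EOF. */
		if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
			return (total);
	}
}
```

 The sketch does not retry on EINTR; a real program probably should.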

 poll() on a named pipe:
 % pollp: state 3: expected POLLHUP; got 0
 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
 % pollp: state 6a: expected POLLHUP; got 0
 % pollp: state 3: expected POLLHUP; got 0
 % pollp: state 6: expected POLLIN | POLLHUP; got 0x1
 % pollp: state 6a: expected POLLHUP; got 0

 No change for this, except I didn't compile with POLLINIGNEOF used so
 states 3 and 6a don't get fixed up.

 select() on a nameless pipe:
 <no output>

 No change for this.  Here it doesn't matter if hangup is indicated by
 POLLHUP or POLLIN | POLLHUP -- selscan() converts both to data-ready
 although it's null data.
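 The conversion described above can be modelled in userland roughly as
 follows.  This is only a simplified sketch of the idea, not the kernel's
 selscan() code; readable_for_select() is a hypothetical name, and the
 real kernel tests more flags than the two shown here.

```c
#include <poll.h>
#include <stdbool.h>

/*
 * Hypothetical model (not kernel code): decide whether a descriptor
 * belongs in select()'s read set given poll-style revents.  Hangup
 * counts as "ready for reading" whether or not POLLIN accompanies it,
 * because a read() would return immediately with 0 bytes.
 */
static bool
readable_for_select(short revents)
{
	return ((revents & (POLLIN | POLLHUP)) != 0);
}
```

 With this mapping, POLLHUP and POLLIN | POLLHUP are indistinguishable to
 a select() caller, which is why the nameless pipe case produces no output.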

 select() on a named pipe:
 % selectp: state 0: expected set; got clear
 % selectp: state 3: expected set; got clear
 % selectp: state 6a: expected set; got clear
 % selectp: state 0: expected set; got clear
 % selectp: state 3: expected set; got clear
 % selectp: state 6a: expected set; got clear

 Now there is an extra failure for state 0.  Some complications will be
 required to fix this without breaking poll() on a named pipe.  State 0 is
 when the read descriptor is open with O_NONBLOCK and there has "never"
 been a writer.  In this state, select() on the read descriptor must
 succeed to conform to POSIX, but poll() on the read descriptor must
 block to conform to Linux.  I think the Linux behaviour is what happens
 naturally -- the socket isn't hung up so sopoll() won't set POLLHUP,
 and there is no input so sopoll() won't set POLLIN, so sopoll() won't
 set any flags in revents and poll() will block.  An extra flag seems to
 be necessary to distinguish this state so that select() doesn't block.
 POLLINIGNEOF was supposed to be this flag.
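 To look at state 0 in isolation, something like the following sketch can
 be used.  It opens a fifo for reading with O_NONBLOCK while there is no
 writer and reports what a zero-timeout select() and a zero-timeout poll()
 say about it.  probe_state0() and its parameters are hypothetical names
 for illustration; the results depend on the kernel, so the sketch only
 records them and does not assert which answer is right.

```c
#include <sys/select.h>
#include <sys/stat.h>

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (if needed) and open the fifo at path for
 * nonblocking reading with no writer present, then store select()'s
 * verdict (1 = read set, 0 = clear) in *sel_ready and poll()'s flags in
 * *revents.  Returns 0 on success, -1 on error.
 */
static int
probe_state0(const char *path, int *sel_ready, short *revents)
{
	struct pollfd pfd;
	struct timeval tv;
	fd_set rfds;
	int fd, ret = -1;

	if (mkfifo(path, 0666) != 0 && errno != EEXIST)
		return (-1);
	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return (-1);
	if (fd >= FD_SETSIZE)	/* too large for select() */
		goto out;

	/* Zero timeout: sample the state without blocking. */
	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = 0;
	tv.tv_usec = 0;
	if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
		goto out;
	*sel_ready = FD_ISSET(fd, &rfds) != 0;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;
	if (poll(&pfd, 1, 0) < 0)
		goto out;
	*revents = pfd.revents;
	ret = 0;
out:
	close(fd);
	return (ret);
}
```

 Under the behaviour described above, the expected pair would be
 *sel_ready == 1 together with *revents == 0.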

 Bruce