kern/53447: poll(2) semantics differ from susV3/POSIX

clemens fischer ino-qc at spotteswoode.de.eu.org
Wed Jun 18 07:10:16 PDT 2003


>Number:         53447
>Category:       kern
>Synopsis:       poll(2) semantics differ from susV3/POSIX
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Jun 18 07:10:13 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Clemens Fischer ino-qc at spotteswoode.dnsalias.org
>Release:        FreeBSD 4.8-STABLE i386
>Organization:
>Environment:
System: FreeBSD private.spanker 4.8-STABLE FreeBSD 4.8-STABLE #3: Thu May 29 06:19:09 CEST 2003 root at private.spanker:/usr/src/sys/compile/n1 i386

the program is fnord (http://www.fefe.de/fnord/), a small HTTP server,
running on freebsd-4.8/i386 and serving a wiki CGI.

>Description:

a colleague and i independently made the same observation.  we are
running a wiki on a small HTTP server, and every page it served ended
with the error message "Looks like the CGI crashed.", even though the
page itself was complete.  we tracked this down to the code in the
server that reads the CGI's output through a pipe using poll(2) and
read(2).  the same code runs without problems on linux.  the problem
is reproducible, and we can patch fnord to work around it.

this is part of the discussion thread on the mailing list:

  > i had the same problem on my freebsd-4.8-stable.  every page had
  > "looks like your CGI crashed" at the bottom, but they actually
  > worked fine.  after applying the patch the problem has
  > disappeared.

  Mhh, then this is apparently a problem with BSD poll() semantics.

  poll is expected to set the POLLHUP bit on EOF, but FreeBSD
  apparently does not; instead it signals POLLIN, and read() then
  returns 0.  Is someone involved with the FreeBSD crowd who can post
  a bug report for this?

  ---
  See the Single UNIX Specification:

    http://www.opengroup.org/onlinepubs/007904975/functions/poll.html

  POLLHUP shall be set if the device has been disconnected, i.e. for
  sockets, if the other side has called shutdown or close.  We are
  polling on a pipe from the CGI.  When the CGI is done, the pipe is
  closed, and we should receive POLLHUP.  That is exactly what this
  return bit is for.

>How-To-Repeat:

this is an excerpt of fnord's code.  polling the pipe from the CGI in
this way produces the expected page output, but the last line always
reads: "Looks like the CGI crashed.".

static void start_cgi(int nph,const char* pathinfo,const char *const *envp) {
  size_t size=0;
  int n;
  int pid;
  char ibuf[8192],obuf[8192];
  int fd[2],df[2];

  if (pipe(fd)||pipe(df)) {
    badrequest(500,"Internal Server Error","Server Resource problem.");
  }

  if ((pid=fork())) {
    if (pid>0) {
      struct pollfd pfd[2];
      int nr=1;
      int startup=1;

      signal(SIGCHLD,cgi_child);
      signal(SIGPIPE,SIG_IGN);		/* NO! no signal! */

      close(df[0]);
      close(fd[1]);

      pfd[0].fd=fd[0];
      pfd[0].events=POLLIN;
      pfd[0].revents=0;

      pfd[1].fd=df[1];
      pfd[1].events=POLLOUT;
      pfd[1].revents=0;

      if (post_len) ++nr;	/* have post data */
      else close(df[1]);	/* no post data */

      while(poll(pfd,nr,-1)!=-1) {
	/* read from cgi */
	if (pfd[0].revents&POLLIN) {
	  n=read(fd[0],ibuf,sizeof(ibuf));
	  // if (n<=0) goto cgi_500;             this is the original code
          if (n<=0 && errno!=0) goto cgi_500; // this is the workaround
	  /* startup */
	  if (startup) {
	    startup=0;
	    ...
	  }
	  /* non startup */
	  else {
	    buffer_put(buffer_1,ibuf,n);
	  }
	  size+=n;
	  if (pfd[0].revents&POLLHUP) break;
	}
	/* write to cgi the post data */
	else if (nr>1 && pfd[1].revents&POLLOUT) {
	  if (post_miss) {
	    write(df[1],post_miss,post_mlen);
	    post_miss=0;
	  }
	  else if (post_mlen<post_len) {
	    n=read(0,obuf,sizeof(obuf));
	    if (n<1) goto cgi_500;
	    post_mlen+=n;
	    write(df[1],obuf,n);
	  }
	  else {
	    --nr;
	    close(df[1]);
	  }
	}
	else if (pfd[0].revents&POLLHUP) break;
	else {
cgi_500:  if (startup)
	    badrequest(500,"Internal Server Error","Looks like the CGI crashed.");
	  else {
	    buffer_puts(buffer_1,"\n\n");
	    buffer_puts(buffer_1,"Looks like the CGI crashed.");
	    buffer_puts(buffer_1,"\n\n");
	    break;
	  }
	}
      }

      buffer_flush(buffer_1);
      dolog(size);
      ...
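a standalone reproducer that does not depend on fnord might look like
the sketch below.  it is an illustrative addition, not part of fnord,
and the function name probe_pipe_eof() is hypothetical.  it closes the
write end of a fresh pipe, polls the read end with a 1 s timeout and
reads once.  going by this report, freebsd-4.8 would be expected to
return revents == POLLIN with read() returning 0.  on linux, POLLHUP is
typically reported for an empty pipe whose writer has gone away.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: close the write end of a fresh pipe, then
 * poll(2) and read(2) the read end once.  The reported revents and
 * the read(2) result are stored in *revents and *nread.
 * Returns poll(2)'s return value, or -1 if pipe(2) fails.
 */
int probe_pipe_eof(short *revents, ssize_t *nread)
{
    int fd[2];
    struct pollfd pfd;
    char buf[16];
    int rc;

    if (pipe(fd) == -1)
        return -1;
    close(fd[1]);               /* writer gone: the reader should see EOF */

    pfd.fd = fd[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 1000);   /* 1 s timeout, so a broken case cannot hang */
    *revents = pfd.revents;

    /* only read if poll(2) reported an event; -1 marks "not attempted" */
    *nread = (rc > 0) ? read(fd[0], buf, sizeof(buf)) : -1;

    close(fd[0]);
    return rc;
}
```

calling it from a small main() and printing whether (revents & POLLHUP)
is set shows which behaviour the local kernel implements.  in both
cases read() should return 0, which signals EOF.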

>Fix:

i have classified the problem's severity as "serious".  for poll(2)
loops like this one a workaround is easy to find, but people porting
SUSv3-compliant software to freebsd will have to apply it to every use
of poll(2), so the difference can well become critical for people who
aren't aware of it.

here's a typical pattern found in linux application code:

      while (poll(pfd,nr,-1) != -1) {
	/* read from cgi */
	if (pfd[0].revents & POLLIN) {
	  n = read(fd[0], ibuf, sizeof(ibuf));
	  if (n<=0) goto cgi_500;                // <-
	  ...
        }
      }

and here's what makes it run reliably on freebsd-4.8:

      while (poll(pfd,nr,-1) != -1) {
	/* read from cgi */
	if (pfd[0].revents & POLLIN) {
	  n = read(fd[0], ibuf, sizeof(ibuf));
          if (n<=0 && errno!=0) goto cgi_500;    // <-
	  ...
        }
      }
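the errno test in this workaround is fragile.  a read(2) that returns 0
is not a failure, and POSIX says no function sets errno to 0 and leaves
its value after a successful call unspecified.  errno != 0 therefore
only reflects whatever an earlier failing call left behind.  a sketch
of a loop that does not depend on POLLHUP at all, assuming a plain
blocking pipe descriptor, follows.  drain_until_eof() is a hypothetical
helper, not fnord code.  it uses poll(2) only to wait, treats
read() < 0 as the error case (retrying on EINTR, e.g. when SIGCHLD
arrives) and treats read() == 0 as a normal end of output, whichever of
POLLIN and POLLHUP the kernel reported.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read everything from fd until EOF, using
 * poll(2) only to wait.  EOF is detected by read(2) returning 0,
 * whether poll(2) reported POLLIN, POLLHUP or both.
 * Returns the number of bytes stored in out (at most cap; anything
 * beyond that is read and discarded), or -1 on error.
 */
ssize_t drain_until_eof(int fd, char *out, size_t cap)
{
    struct pollfd pfd;
    size_t used = 0;
    char buf[8192];

    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: wait again */
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;
        if (!(pfd.revents & (POLLIN | POLLHUP)))
            continue;

        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;              /* a real read error */
        }
        if (n == 0)
            return (ssize_t)used;   /* EOF: the writer closed the pipe */

        size_t room = cap - used;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(out + used, buf, take);
        used += take;
    }
}
```

applied to the loop in this report, the equivalent change would be to
replace the errno test with two separate checks: n < 0 as the error
path and n == 0 as a normal end of output.  whether an empty reply
during startup should still count as a crash is a separate decision
for the server.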

  clemens
>Release-Note:
>Audit-Trail:
>Unformatted:

