pipe/fifo code merged.

Wed Jan 11 00:14:47 UTC 2012

On Tue, Jan 10, 2012 at 1:04 PM, Bruce Evans <brde at optusnet.com.au> wrote:
> On Tue, 10 Jan 2012, Bruce Evans wrote:
>
>> I think you don't want me to read the patch, since I would see too much
>> detail starting with style bugs.  Anyway..
>> ...
>
>
> One more set of details.
>
> % +             PIPE_UNLOCK(rpipe);
> % +             if (fifo_iseof(fp))
> % +                     events |= POLLINIGNEOF;
> % +             PIPE_LOCK(rpipe);
>
> This is new code (needed to force POLLIGNEOF for fifos).
>
> It is a layering violation to call the fifo code for non-fifos here.
> fifo_iseof() handles this internally by checking fp->vnode->v_fifoinfo.
> The pipe layer should know if it is dealing with a fifo in a better way
> than that.

I fixed this in
http://www.trematerra.net/patches/pipefifo_merge2.4.diff

>
> I don't like unlocking in the middle in general, and here it gives
> races.  We will miss setting POLLIN | POLLRDNORM for certain changes
> if they weren't set earlier and the state changed while unlocked.  Why
> unlock anyway or lock in fifo_iseof()?  Only fi_seqcount == fi_wgen
> is checked under the lock there.  Races in that check are probably
> just as harmless as races here.  And locking doesn't even prevent them,
> since if fi_seqcount or fi_wgen can change underneath us, they can also
> change just after we check them.  They rarely change compared with the
> buffer count raced with above.

fixed that too as you suggest.

> % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
> %       struct ucred *active_cred;
> %       struct thread *td;
> %  {
> % -     struct pipe *rpipe = fp->f_data;
> % +     struct pipeinfo *pip = fp->f_data;
> % +     struct pipe *rpipe;
> %       struct pipe *wpipe;
> %       int revents = 0;
> %  #ifdef MAC
> %       int error;
> %  #endif
> % % -   wpipe = rpipe->pipe_peer;
>
> % +     rpipe = pip->pi_rpipe;
> % +     wpipe = pip->pi_wpipe->pipe_peer;
> %       PIPE_LOCK(rpipe);
> %  #ifdef MAC
> %       error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
> %       if (error)
> % -             goto locked_error;
> % +             return (0);
>
> Seems to be broken.  The unlock is now missing.

fixed.

>
> % -static int
> % -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct
> thread *td)
> % -{
> % -     struct fifoinfo *fip;
> % -     struct file filetmp;
> % -     int levents, revents = 0;
> % -
> % -     fip = fp->f_data;
> % -     levents = events &
> % -         (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
> % -     if ((fp->f_flag & FREAD) && levents) {
> % -             filetmp.f_data = fip->fi_readsock;
> % -             filetmp.f_cred = cred;
> % -             mtx_lock(&fifo_mtx);
> % -             if (fp->f_seqcount == fip->fi_wgen)
> % -                     levents |= POLLINIGNEOF;
> % -             mtx_unlock(&fifo_mtx);
> % -             revents |= soo_poll(&filetmp, levents, cred, td);
> % -     }
> % -     levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
> % -     if ((fp->f_flag & FWRITE) && levents) {
> % -             filetmp.f_data = fip->fi_writesock;
> % -             filetmp.f_cred = cred;
> % -             revents |= soo_poll(&filetmp, levents, cred, td);
> % -     }
> % -     return (revents);
> % -}
>
> This was reasonably clean.  My version is cleaner:
> - POLLIGNEOF is an old mistake of mine.  I tried to kill it, but kib@
>  propagated it to sys_pipe.c too, where it has survived another release
>  or two.  In my version, I still have it in the call to soo_poll() but
>  don't have it in the `levents = events & ...' mask.  Thus it is a
>  pure kernel flag, and acts the same as your isfifo flag -- it tells
>  the socket layer to do something unusual because this is a fifo.  It
>  is not needed any more, since the pipe layer is close to the fifo
>  layer so it can just do something unusual.  It can determine whether
>  the pipe is a fifo without passing around flags (the flag should be
>  in pipe_state).
> - My version is missing the FREAD and FWRITE checks.  These seem to be
>  necessary, but I think they don't belong at this level.  Also, the
>  error handling for them seems quite broken (nonexistent).  I think
>  POLLERR is supposed to be returned for attempts to poll for an
>  impossible condition, but the FREAD and FWRITE checks give a return
>  of 0.  And returning 0 is much worse than returning success, since
>  it will cause at least poll() to block() when it should return,
>  Here is the commit that added these checks:
>
> % ----------------------------
> % revision 1.118
> % date: 2005/09/12 10:16:18;  author: rwatson;  state: Exp;  lines: +2 -2
> % Only poll the fifo for read events if the fifo is attached to a readable
> % file descriptor.  Otherwise, the read end of a fifo might return that it
> % is writable (which it isn't).
>
> But it should return (with POLLERR).  This is an error condition and
> should be detected.
>
> POSIX is fuzzy about this.  It only says that POLLERR is for when an error
> occurred.  It defines the POLLNVAL error clearly as meaning that the fd
> is invalid.  Well that is not so clear.  A non-open fd is clearly invalid.
> This is handled in upper layers.  Polling for a direction that can't work
> can be considered as an invalid fd too, unless "invalid" has its technical
> meaning.  Linux-2.6.10 sets POLLERR for reading from a pipe or fifo with
> no readers, and has an XXX comment saying that most Unices don't do this
> for fifos.  This seems wrong to me, and FreeBSD doesn't do it for any of
> pipes, fifos or sockets.  But for pipes, there is tricky EOF handling
> associated with this condition.  I can't see anywhere where Linux gives
> this based on the open mode.
>
> % % Only poll the fifo for write events if the fifo attached to a writable
> % file descriptor.  Otherwise, the write end of a fifo might return that
> % it is readable (which it isn't).
>
> Seems to be necessary too.  I can't see anywhere where Linux returns
> POLLERR for i/o errors or unwritable files.
>
> % % In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
> % has undefined behavior), we poll for both.
> % % MFC after:  3 days
> % ----------------------------
>
> select() is interestingly different than poll().  It can't return POLLERR.
> Thus, the old broken behaviour gave the best close to possible behaviour
> for select() at the usual level.  The POLLERR's should make it return
> success, and the false successes in the kernel would have done the same.
> Only cases where there were no false successes in the kernel were broken.
>
>
> I don't like defaults set by initializations in declarations 'revents
> = 0'.  Both the default and the return of 0 here seem to be wrong.
> This is an error condition, so I think POLLERR should be returned, as
> about.  Otherwise, poll() will probably block.  And the block is not
> just transient, at least in the above since the error condition can
> never go away.  You will only be saved from blocking forever if there
> is success on some other file descriptor or event.
>
> %  #endif
> % -     if (events & (POLLIN | POLLRDNORM))
> % -             if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % -                 (rpipe->pipe_buffer.cnt > 0))
> % -                     revents |= events & (POLLIN | POLLRDNORM);
> % +     if (fp->f_flag & FREAD) {
> % +             if (events & (POLLIN | POLLRDNORM))
> % +                     if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % +                         (rpipe->pipe_buffer.cnt > 0))
> % +                             revents |= events & (POLLIN | POLLRDNORM);
>
> The change in fifos_vnops.c was done cleanly by adding the FREAD check
> to the events mask check.  With fifos now polled here, it is needed
> (modulo bugs) here too.  But here it makes the important changes for
> fifos, if any, unreadable by indenting everything.
>
> % -     if (events & (POLLOUT | POLLWRNORM))
> % -             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % -                 (wpipe->pipe_state & PIPE_EOF) ||
> % -                 (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
> % -                  ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >=
> PIPE_BUF ||
> % -                      wpipe->pipe_buffer.size == 0)))
> % -                     revents |= events & (POLLOUT | POLLWRNORM);
> % +             PIPE_UNLOCK(rpipe);
> % +             if (fifo_iseof(fp))
> % +                     events |= POLLINIGNEOF;
> % +             PIPE_LOCK(rpipe);
> % % -   if ((events & POLLINIGNEOF) == 0) {
> % -             if (rpipe->pipe_state & PIPE_EOF) {
> % -                     revents |= (events & (POLLIN | POLLRDNORM));
> % -                     if (wpipe->pipe_present != PIPE_ACTIVE ||
> % -                         (wpipe->pipe_state & PIPE_EOF))
> % -                             revents |= POLLHUP;
> % +             if ((events & POLLINIGNEOF) == 0) {
> % +                     if (rpipe->pipe_state & PIPE_EOF) {
> % +                             revents |= (events & (POLLIN | POLLRDNORM));
> % +                             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                                 (wpipe->pipe_state & PIPE_EOF))
> % +                                     revents |= POLLHUP;
> % +                     }
> %               }
> %       }
> % +     if (fp->f_flag & FWRITE)
> % +             if (events & (POLLOUT | POLLWRNORM))
> % +                     if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                         (wpipe->pipe_state & PIPE_EOF) || % +
>             (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
> % +                     ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt)
>>=
> % +                             PIPE_BUF || wpipe->pipe_buffer.size == 0)))
> % +                             revents |= events & (POLLOUT | POLLWRNORM);
> % %     if (revents == 0) {
> % -             if (events & (POLLIN | POLLRDNORM)) {
> % -                     selrecord(td, &rpipe->pipe_sel);
> % -                     if (SEL_WAITING(&rpipe->pipe_sel))
> % -                             rpipe->pipe_state |= PIPE_SEL;
> % -             }
> % +             if (fp->f_flag & FREAD)
> % +                     if (events & (POLLIN | POLLRDNORM)) {
> % +                             selrecord(td, &rpipe->pipe_sel);
> % +                             if (SEL_WAITING(&rpipe->pipe_sel))
> % +                                     rpipe->pipe_state |= PIPE_SEL;
> % +                     }
> % % -           if (events & (POLLOUT | POLLWRNORM)) {
> % -                     selrecord(td, &wpipe->pipe_sel);
> % -                     if (SEL_WAITING(&wpipe->pipe_sel))
> % -                             wpipe->pipe_state |= PIPE_SEL;
> % -             }
> % +             if (fp->f_flag & FWRITE)
> % +                     if (events & (POLLOUT | POLLWRNORM)) {
> % +                             selrecord(td, &wpipe->pipe_sel);
> % +                             if (SEL_WAITING(&wpipe->pipe_sel))
> % +                                     wpipe->pipe_state |= PIPE_SEL;
> % +                     }
> %       }
> % -#ifdef MAC
> % -locked_error:
> % -#endif
> %       PIPE_UNLOCK(rpipe);
> % %     return (revents);
>
> It seems that not much really changed here.  To avoid indentation and
> fix bugs, the FREAD and FWRITE checks should be done up front.  I think
> they can be done before locking and mac checking.  Something like:
>
>        if ((fp->f_flag & FREAD) && (events & (POLLIN | POLLRDNORM))
>                return (POLLERR);
>        if ((fp->f_flag & FWRITE) && (events & (POLLOUT | POLLWRNORM))
>                return (POLLERR);
>        if (events & POLLINIGNEOF)
>                return (POLLER);        /* try to kill this too */
>
> Since the diff for pipe_poll() was unreadable, here it is again with
> the old lines removed.  A few more problems are now obvious:
>
> % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
> %       struct ucred *active_cred;
> %       struct thread *td;
> %  {
> % +     struct pipeinfo *pip = fp->f_data;
> % +     struct pipe *rpipe;
> %       struct pipe *wpipe;
> %       int revents = 0;
> %  #ifdef MAC
> %       int error;
> %  #endif
> % % +   rpipe = pip->pi_rpipe;
> % +     wpipe = pip->pi_wpipe->pipe_peer;
> %       PIPE_LOCK(rpipe);
> %  #ifdef MAC
> %       error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
> %       if (error)
> % +             return (0);
> %  #endif
> % +     if (fp->f_flag & FREAD) {
> % +             if (events & (POLLIN | POLLRDNORM))
> % +                     if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % +                         (rpipe->pipe_buffer.cnt > 0))
> % +                             revents |= events & (POLLIN | POLLRDNORM);
> %
>
> This style bug (extra blank line) was common in old code.  It helps make
> the diffs unreadable too.
>

>
> % % +           if ((events & POLLINIGNEOF) == 0) {
> % +                     if (rpipe->pipe_state & PIPE_EOF) {
> % +                             revents |= (events & (POLLIN | POLLRDNORM));
> % +                             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                                 (wpipe->pipe_state & PIPE_EOF))
> % +                                     revents |= POLLHUP;
> % +                     }
>
> This is old code, reindented.  It was not needed, since it used to
> just check for the POLLINIGNEOF mistake in the user events.  Now it
> is needed to give the modified (POLLINIGNEOF) semantics from the kernel
> flag for fifos.  It is much uglier than the corresponding code in the
> old fifo_poll_f().  That begins with putting the relevant user events
> in levents.  So that it doesn't have to repeat the long mask expressions.
> Well, that's about the limits of the cleanups.  Something like the
> above is still needed to give the semantics change.
>
> The socket layer still has code that corresponds exactly to the above.
> It is now not needed, since it now only supports the POLLINIGNEOF
> mistake in the user events.  One copy of this code is bad enough.
>

It seems to me that your concerns aren't related to the patch.
I'll try to address them when the patch will be into the tree,

--
Gianni