pipe/fifo code merged.

Giovanni Trematerra gianni at freebsd.org
Tue Jan 10 23:04:58 UTC 2012


On Tue, Jan 10, 2012 at 10:41 AM, Bruce Evans <brde at optusnet.com.au> wrote:
> On Mon, 9 Jan 2012, Giovanni Trematerra wrote:
>
>> On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans <brde at optusnet.com.au> wrote:
>>>
>>>
>>> I would go the other way, and pessimize pipes to be like fifos.  Then
>>> optimize the socket layer under both.  Fifos are not important, but
>>> they are implemented on top of the socket layer which is important.
>>> Pipes are important. ...
>>> ...
>>>
>>> Linux-2.6.10 implements fifos as a small wrapper around pipes, while
>>> FreeBSD implements them as a large wrapper around sockets.  I hope the
>>> former is what you do -- share most pipe code, without making it more
>>> complicated, and with making the fifo wrapper much simpler.  The Linux
>>> code is much simpler and smaller, since for pipes it it doesn't
>>> implement direct mode, and for sockets it doesn't have to interact with
>>> the complicated socket layer.
>>
>>
>> If you read the patch, as I think you didn't, you'd see that there's no
>> wrapper
>> at all. fifo's code is just fifo_open, fifo_close and another couple of
>> helper
>> functions to deal with VFS, all the remaining code is shared with pipes
>> and
>> no complicated code was added.
>
>
> I think you don't want me to read the patch, since I would see too much
> detail starting with style bugs.  Anyway..

Thanks a lot for your review. I really appreciated it.

[skip]

I'll do my best to fix style bugs.

>
> In this file, I have most experience fixing this function (and open
> and close so that select and poll work).  The above looks simple, but
> has a complex interaction with layers above and below it.  Most of the
> details are in the socket layer.  You had to reimplement these in the
> pipe layer.  The most delicate point involving fs_wgen seems to be
> reimplemented correctly in fifo_iseof().  Before I fixed this for
> fifos, poll and select on pipes (especially for EOF) was less broken
> than for fifos, partly because pipes are simpler -- they can't be
> reopened.  My tests in /usr/src/tools/regression/poll/ are hopefully
> enough to detect any regressions.  Some of the tests are intentionally
> left broken and/or expected to fail, to be bug for bug compatible with
> old kernel bugs.
>

ok. I'll try that regression test

>
> % @@ -295,42 +328,133 @@ pipe_zone_ctor(void *mem, int size, void
> %  static int
> %  pipe_zone_init(void *mem, int size, int flags)
> %  {
> % -     struct pipepair *pp;
> % +     struct umapipe *up;
> % +     struct pipeinfo *pip;
> % +     struct timespec ctime;
> % % -   KASSERT(size == sizeof(*pp), ("pipe_zone_init: wrong size"));
> % +     KASSERT(size == sizeof(*up), ("pipe_zone_init: wrong size"));
> % % -   pp = (struct pipepair *)mem;
> % +     up = (struct umapipe *)mem;
> % +     vfs_timestamp(&ctime);
> % +     pip = &up->pip[0];
> % +     pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
> % +     pip->pi_ino = -1;
> % % -   mtx_init(&pp->pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
> % +     pip = &up->pip[1];
> % +     pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
> % +     pip->pi_ino = -1;
> % +
> % +     mtx_init(&up->pp.pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
> %       return (0);
>
> Timestamps seem to be broken.  jhb pointed out a problem in them, without
> much detail (but I forget the exact detail).  Here it can't be right to
> have timestamps on both sides, since timestamps are a property of the
> file at its lowest level, not the file descriptor or even the file at the
> open file (fcntl) level.  For fifos, timestamps are even more a property
> of the file.
>



> % ...
> % +static int
> % +fifo_zone_init(void *mem, int size, int flags)
> % +{
> % +     struct umafifo *up;
> % +     struct pipeinfo *pip;
> % +
> % +     KASSERT(size == sizeof(*up), ("fifo_zone_init: wrong size"));
> % +
> % +     up = (struct umafifo *)mem;
> % +     pip = &up->pip[0];
> % +     vfs_timestamp(&pip->pi_ctime);
> % +     pip->pi_atime = pip->pi_mtime = pip->pi_ctime;
>
> For fifos, is wrong to have even 1 timestamp at this level (except
> unused ones won't hurt).  Fifos and their timstamps persist as disk
> files, and the timestamps for these disk files are managed by the
> underlying file system.

> Any timestamps at this level can only give
> possibilities for inconsistencies.  For example, this level uses
> vfs_timestamp(), but the file system level might use a different
> timestamp method, either because it is buggy or because it cannot
> represent timestamps with the granularity that vfs_timestamp() gives.
> (It is a bug that vfs_timestamp() is global.)
>
> % @@ -1219,7 +1403,7 @@ pipe_write(fp, uio, active_cred, flags, %       }
> % %     if (error == 0)
> % -             vfs_timestamp(&wpipe->pipe_mtime);
> % +             vfs_timestamp(&pip->pi_mtime);
> % %     /*
> %        * We have something to offer,
>
> It's doing timstamps in the same way for fifos as for pipes.  There must
> by a problem for stat() too.  The old fifo code uses fo_stat() to get
> back to the underlying file system which knows all the attributes (not
> just timestamps).  I can't see anything like that here.  The new fifo
> code seems to just use pipe_stat() which gives many fake attributes
> which are likely to differ from ones in the file system.
>
> % @@ -1492,12 +1726,12 @@ pipe_free_kmem(cpipe)
> %   * shutdown the pipe
> %   */
> %  static void
> % -pipeclose(cpipe)
> % +pipeclose(cpipe, isfifo)
> %       struct pipe *cpipe;
> % +     int isfifo;
> %  {
>
> I don't see any reason to have a different zone for fifos.  This
> complicates some interfaces ...

Just to not waste a sizeof(struct pipeinfo ) bytes.

>
> % @@ -1570,21 +1798,34 @@ pipeclose(cpipe)
> %  #ifdef MAC
> %               mac_pipe_destroy(pp);
> %  #endif
> % -             uma_zfree(pipe_zone, cpipe->pipe_pair);
> % +             uma_zfree(isfifo ? fifo_zone : pipe_zone, cpipe->pipe_pair);
>
> The new isfifo parameter is only used here.

That's the only way to understand which zone I need to use.

>
> % diff -r fee0771aad22 sys/sys/pipe.h
> % --- a/sys/sys/pipe.h  Sun Jan 01 19:00:13 2012 +0100
> % +++ b/sys/sys/pipe.h  Mon Jan 09 00:13:55 2012 +0100
> % @@ -28,6 +28,8 @@
> %  #error "no user-servicable parts inside"
> %  #endif
> % % +#include <sys/selinfo.h>
> % +
>
> This namespace pollution was intentionally left out.
>
> %  /*
> %   * Pipe buffer size, keep moderate in value, pipes take kva space.
> %   */
> % @@ -103,16 +105,12 @@ struct pipe {
> %       struct  pipebuf pipe_buffer;    /* data storage */
> %       struct  pipemapping pipe_map;   /* pipe mapping for direct I/O */
> %       struct  selinfo pipe_sel;       /* for compat with select */
>
> <sys/selinfo.h> was a prerequisite for this file.

I'll fix it.

>
> % -     struct  timespec pipe_atime;    /* time of last access */
> % -     struct  timespec pipe_mtime;    /* time of last modify */
> % -     struct  timespec pipe_ctime;    /* time of status change */
> %       struct  sigio *pipe_sigio;      /* information for async I/O */
> %       struct  pipe *pipe_peer;        /* link with other direction */
> %       struct  pipepair *pipe_pair;    /* container structure pointer */
> %       u_int   pipe_state;             /* pipe status info */
> %       int     pipe_busy;              /* busy flag, mostly to handle
> rundown sanely */
> %       int     pipe_present;           /* still present? */
> % -     ino_t   pipe_ino;               /* fake inode for stat(2) */
>
> Both this and the timestamps should never have been here, since they
> are per-"disk"-file but they were per-pipe-endpoint (see below).
>
> %  };
> % %  /*
> % @@ -138,5 +136,24 @@ struct pipepair {
> %  #define PIPE_UNLOCK(pipe)    mtx_unlock(PIPE_MTX(pipe))
> %  #define PIPE_LOCK_ASSERT(pipe, type)  mtx_assert(PIPE_MTX(pipe), (type))
> % % +#define PIPE_CNT(pipe)     ((pipe->pipe_state & PIPE_DIRECTW) ? \
> % +             pipe->pipe_map.cnt : pipe->pipe_buffer.cnt)
> % +
> % +/*
> % + *  Per-file descriptor structure.
> % + */
>
> I was very confused by this comment.  It is sort of backwards.  This
> structure is for the lowest level, which is the pipepair level for
> pipes and the disk level for for fifos.
>  (But we have to expand the disk level, first from dinodes/directory
>  entries to inodes/directory blocks, then from inodes to vnodes, then
>  append pipepairs).  The open file level is a level or two above that,
>  and the file descriptor level is further above.
>
>  The levels are stacked even more confusingly for pipes.  Above the
>  pipepair level, there is the pipe level, but this is modes sideways
>  than fully above.  Open files are more at the level of pipes than
>  pipepairs.  "pipe" in normal usage can mean either "pipe" or
>  "pipepair" in this implementation.)
> The timestamps and inode were at a wrong level.  You made a step towards
> fixing this by moving them down, but the comment says that they are now
> at the highest level.

The comment is wrong I'll change it.

>
> % +struct pipeinfo {
> % +     struct  pipe    *pi_rpipe;      /* pipe we read from */
> % +     struct  pipe    *pi_wpipe;      /* pipe we write to */
> % +     struct  timespec pi_atime;      /* time of last access */
> % +     struct  timespec pi_mtime;      /* time of last modify */
> % +     struct  timespec pi_ctime;      /* time of status change */
> % +     ino_t   pi_ino;         /* fake pipe inode for stat(2) */
>
> Indentation error.
>
> % +};
>
> I was confused by the layering for this struct too.  This struct seems
> to be needed only to swap rpipe with wpipe for the 2 ends of a pipe.
> This is confusing, but I can't see any better way at the moment.
> Putting the other fields in it just gives confusion which leads to
> bugs and minor resource wastage.  All the other fields must be per-file
> at the lowest level, so any duplication of them gives either bugs
> (if they are different) or just wastes resources time to write them
> and check that they are the same, and and space to hold copies).
> These fields belong in the pipepair struct.  This moves them down
> (more sideways) another level.
>


> POSIX is fuzzy about whether the attributes are unique for the 2 ends
> of a pipe.  It requires all st_ timestamps and some other attributes
> to be "meaningful" unless otherwise specified and doesn't specify
> anything otherwise for pipes, at least in an old draft.  But for pipe()
> it says:
>
> 27884            Upon successful completion, pipe( ) shall mark for update
> the st_atime, st_ctime, and st_mtime
> 27885            fields of the pipe.
>
> Assuming that this part is not fuzzy, "the ... st_atime... fields of
> the pipe" in it must refer to unique fields.
>
> Similarly for pi_ino.
>
> These fields shouldn't be used at all for fifos, as mentioned above.
> File systems still maintain separate non-copies which pipe_stat()
> hides.  This doesn't matter much for timestamps and st_ino.  It
> matters for modes and permissions, etc.

I'll fix it
Just to be clear. timestamps of last access or last modify aren't saved
back into file system for fifo.
it's hard to fix this now so I might look at it later.

>
> After moving timestamps and pi_ino into the pipepair struct, the new
> info struct is reduced to 2 pointers into the pipepair struct.
> Unfortunately, the pointer to the pipeinfo struct cannot be simply
> replaced by a pointer to the pipepair struct (and dereferencing the
> latter instead of the former) because of complications for the
> separate ends of a pipe:
>
> @ @@ -372,17 +554,17 @@ kern_pipe(struct thread *td, int fildes[
> @        * to avoid races against processes which manage to dup() the read
> @        * side while we are blocked trying to allocate the write side.
> @        */
> @ -     finit(rf, FREAD | FWRITE, DTYPE_PIPE, rpipe, &pipeops);
> @ +     finit(rf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[0], &pipeops);
> @       error = falloc(td, &wf, &fd, 0);
> @       if (error) {
> @               fdclose(fdp, rf, fildes[0], td);
> @               fdrop(rf, td);
> @               /* rpipe has been closed by fdrop(). */
> @ -             pipeclose(wpipe);
> @ +             pipeclose(wpipe, 0);
> @               return (error);
> @       }
> @       /* An extra reference on `wf' has been held for us by falloc(). */
> @ -     finit(wf, FREAD | FWRITE, DTYPE_PIPE, wpipe, &pipeops);
> @ +     finit(wf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[1], &pipeops);
>
> One end used to get rpipe and the other end wpipe.  This is used mainly
> by read/write to go in the correct direction.  Now, the 2 ends get
> pointers to different pipeinfo structs, with the only differences in
> the pipeinfo structs being:
> - swap rpipe and wpipe.  This is used mainly by read/write as before

rpipe and wpipe aren't swapped.

for pipes:
pip[0].pi_rpipe == pip[0].pi_wpipe
pip[1].pi_rpipe == pip[1].pi_wpipe

pipe[0].pi_rpipe != pipe[1].pi_rpipe
pipe[0].pi_wpipe != pipe[1].pi_wpipe

for fifos
pip[0].pi_rpipe != pip[0].pi_wpipe

> - different timestamps (to implement bugs and resource wastage)
> - pi_ino should be the same in both (just waste space).

Well posix says:
"The st_ino and st_dev fields taken together uniquely identify the file
within the system."
So that's a matter of what a file is: pipepair or and end of if?

>
> For fifos, rpipe == wpipe so the 2 pipeinfo structs should be the same
> and the complications are not needed.
>

Err, that's true for pipes.

>
> There are some more complications resource wastages related to having
> 2 pipe ends when only 1 is needed:
> - struct pipe is in struct pipepair twice.  This can't quite be fixed
>  by moving it to struct pipeinfo.  If both pipe structs for malloc()ed,
>  then the 2 pointers in struct pipeinfo would be enough for accessing
>  them, but I think the space wastage is too small to fix like this.
> - some code was symmetrical relative to rpipe and wpipe and it didn't
>  care in which order they were in.  Now rpipe can be equal to wpipe,
>  some of this code is no longer symmetrical because it does things
>  like free(rpipe), while the rest of it is still symmetrical because
>  it does things like initalizing rpipe->foo to the same value as
>  wpipe->foo.  You had to make some changes in this area.  Example
>  of a change in this area:
>
> % @@@ -349,18 +473,76 @@ kern_pipe(struct thread *td, int fildes[
> % @     /* Only the forward direction pipe is backed by default */
> % @     if ((error = pipe_create(rpipe, 1)) != 0 ||
> % @         (error = pipe_create(wpipe, 0)) != 0) {
> % @-            pipeclose(rpipe);
> % @-            pipeclose(wpipe);
> % @+            pipeclose(rpipe, isfifo);
> % @+            pipeclose(wpipe, isfifo);
>
>  Not really symmetrical.  There must be 2 closes if there are 2 open
>  fd's (1 in each direction).  But what if there is only 1 open fd for
>  a fifo with rpipe = wpipe?  pipe_dtor() is more obviously correct,
>  since it only does 1 pipeclose() for fifos.  The new isfifo flag is
>  only used in pipeclose() to select the zone.

That's needed to free UMA allocated memory.

>
> % @             return (error);
> % @     }
> % @ % @         rpipe->pipe_state |= PIPE_DIRECTOK;
> % @     wpipe->pipe_state |= PIPE_DIRECTOK;
> % @ % @+        if (isfifo) {
> % @+            up->pip[0].pi_rpipe = rpipe;
> % @+            up->pip[0].pi_wpipe = wpipe;
> % @+    } else {
> % @+            up->pip[0].pi_rpipe = rpipe;
> % @+            up->pip[0].pi_wpipe = rpipe;
> % @+            up->pip[1].pi_rpipe = wpipe;
> % @+            up->pip[1].pi_wpipe = wpipe;
> % @+    }
>
>  This seems slightly broken.  Fifos should only use 1 pipeinfo struct,
>  but other fifo code initializes both up->pip[0] and up->pip[1].

your interpretation seems wrong.

> Having only 1 pipe end initialized would destroy any remaining symmetry
> but would make it clear which 1 is actually used.
>
> % +
> % +extern struct fileops pipeops;
> % +
> % +int pipe_ctor(struct pipeinfo **ppip, struct thread *td);
> % +void pipe_dtor(struct pipeinfo *pip);
>
> Indentation errors.
>
> % %  #endif /* !_SYS_PIPE_H_ */
>
> Bruce


More information about the freebsd-arch mailing list