Versioning File System for FreeBSD?

Wed Jun 24 21:57:40 UTC 2009

On Wed, Jun 24, 2009 at 09:11:25PM +0200, Roland Smith wrote:
> > Yes, that's one possibility. But just like Subversion (which I'm
> > using extensively here), it's not really transparent.
> 
> What is? If you have to extend the API like you propose below, all
> programs that want to use that feature have to be changed. So if you're
> going around changing your program, why not have it interface to an existing
> revision control that you are already familiar with? That seems a lot
> easier than tacking revision control onto a filesystem!

Yep, you're right. I thought about a way to extend the API in a
backwards-compatible way, but that's not as easy or straightforward
as it seems. In fact, it opens a whole can of worms.

If the versioned file system isn't also POSIX compatible (where
everything happens in HEAD unless specified otherwise), it's
practically useless.

> Git is very good at efficiently storing the differences between
> commits. And every copy of a directory under git control is a full-blown
> repository, so you can experiment with a copy without fear of fouling up
> your precious repository.

That's true as well. I'm not very familiar with git (as opposed
to Subversion), but I clearly see its advantages.

> > I was actually thinking of a real versioning file system, with an
> > extended POSIX API (yet to be defined), to access all revisions of
> > a file system, just like with Subversion revisions.
> > 
> > As an example: opendir(2) would grow an additional and optional
> > argument "revision" to select either HEAD or some revision of the
> > directory:
> > 
> > DIR *dirp;
> > dirp = opendir("/path/to/dir", 0);   /* open /path/to/dir at HEAD */
> > dirp = opendir("/path/to/dir");      /* same as above, POSIX compat */
> > dirp = opendir("/path/to/dir", 323); /* open dir at revision 323 */
> > 
> > /* From here on, readdir() would retrieve /path/to/dir entries
> >    at the specified revision. */
> > 
> > open(2) could open a file at an earlier revision:
> > 
> > int fd;
> > 
> > /* open file in HEAD */
> > fd = open("/path/to/file", O_RDONLY);
> > 
> > /* open same path, but at revision 323 */
> > fd = open("/path/to/file", O_RDONLY, 0666 /* mode */, 323);
> 
> There is some ambiguity here. Does 323 refer to a single file, or to the
> state of its parent directory? If changing a file doesn't update the
> version of its parent directory, then why have version numbers for
> directories? On the other hand, if changing a file updates the revision
> for the file and its parent directory, the revision for the root
> directory will increase quite rapidly!

Quite true!
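
A toy model of the second option (a change bumps the file and every
directory above it) shows how fast the root's revision grows. This is
only an in-memory sketch; struct vnode_rev and bump_revision() are
made-up names, not an existing kernel structure or an on-disk format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one per file or directory. */
struct vnode_rev {
    struct vnode_rev *parent;   /* NULL for the root directory */
    unsigned long     rev;      /* revision of this file or directory */
};

/* Record one change to n: bump n and every ancestor up to the root. */
void
bump_revision(struct vnode_rev *n)
{
    assert(n != NULL);
    for (; n != NULL; n = n->parent)
        n->rev++;
}
```

With two files in one directory, two changes to the first file and one
to the second leave the files at revisions 2 and 1, but the directory
and the root both at 3. The root ends up counting every change made
anywhere in the tree, much like a global Subversion revision number.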

I see even more ambiguity here: What about a versioned file pointed
to by hard links from two versioned directories?

And even if the semantics were absolutely sound (can they be?), all
this metadata really needs to be handled at the block level, e.g. as
described in that paper.

> > unlink(2) would remove an entry from a directory, and bump the revision
> > of the directory. Accessing that path from the new revision wouldn't be
> > possible, but the file would still be there in an earlier revision.
> > 
> > Modifying a file would also create new revisions (e.g. at each
> > write(2), or at each close(2), that should be selectable).
> 
> I don't know what you want to use this for, but a simple trick (used
> e.g. by Pro/Engineer) is to have your application append a version
> number after the filename (e.g. "foo.prt.1") that is incremented every
> time the file is saved. This does waste some disk space (or provides
> redundancy, take your pick).

Yes, that's always possible. But that would defeat transparency.

And there's another problem here: what if two processes concurrently
save (commit?) the same file, and there's a merging conflict?
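
For the numbered-suffix trick, at least the choice of the next number
can be made race-free in userland: open(2) with O_CREAT | O_EXCL fails
with EEXIST if the name is already taken, so two savers can never end
up with the same version. Merging the contents is a separate problem
that this does not touch. A rough sketch, where create_next_version()
is a made-up helper and the linear scan from 1 (which also reuses gaps
left by deleted versions) is just the simplest thing that works:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create "<base>.<n>" for the lowest n >= 1 that does not exist yet.
 * O_EXCL makes the creation atomic, so concurrent callers never get
 * the same n. Returns an open fd and stores n in *version, or
 * returns -1 with errno set.
 */
int
create_next_version(const char *base, unsigned *version)
{
    char path[1024];
    unsigned n;
    int fd, len;

    assert(base != NULL && version != NULL);
    for (n = 1; n != 0; n++) {
        len = snprintf(path, sizeof(path), "%s.%u", base, n);
        if (len < 0 || (size_t)len >= sizeof(path)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            *version = n;
            return fd;
        }
        if (errno != EEXIST)
            return -1;      /* real error, give up */
        /* name taken (possibly by a concurrent saver), try the next */
    }
    errno = EOVERFLOW;      /* ran out of version numbers */
    return -1;
}
```

The caller writes the new contents to the returned descriptor and
closes it. Every application still has to cooperate, so this does
nothing for transparency.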

> > Of course, there would be additional API calls to traverse the
> > list of revisions, to access meta data (properties?, tags?,
> > commit logs?, ...) etc., so that the file system remains manageable.
> 
> VMS had a filesystem that used versioning: [http://en.wikipedia.org/wiki/Files-11]

I was thinking about this before starting this thread. But file
versioning (as opposed to full versioning that also includes
directory versioning) is probably relatively easy to implement.
At least, its semantics are unambiguous.

> > I didn't try them (yet), but on Linux, there are some experimental
> > versioning file systems like:
> >   http://www.ext3cow.com/Welcome.html
> >   http://tux3.org/
> > 
> > and there's an (unmaintained?) FUSE file system at:
> >   http://wayback.sourceforge.net/
> > 
> > all of which differ in the way the POSIX API should be extended and
> > the semantics for versioning. But there's apparently nothing yet
> > in the works for FreeBSD. Perhaps some layer on top of existing
> > file systems, or an extension of UFS/FFS that stores versioning
> > meta data directly at the block level?

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/