[FiST] Re: Overlayfs for FiST?

Wed Apr 23 20:33:57 PDT 2003

Hi,

In my opinion...	(O.K. I'll take a dive into this.)

This all sounds a lot like (Free)BSD's unionfs.  I have tried using
unionfs for various tasks, including some related to security.  It
works quite well, but I noticed there are complicated cases where a
more complex hierarchy of overlays would be required before overlays
become practical.  UnionFS seems a successful proof of concept: it
demonstrates how overlays can work in simple situations, and how
namespaces can be joined using vnode stacking.  Doing overlays in
fist would be great: hopefully then, as the BSD templates further
mature, an overlayfs could be used in multiplatform environments,
much as cryptfs could be.

My concern would be that code quality on any given target platform
not be sacrificed for greater portability.  Fist can likely produce
well-optimized code if the ports are closely worked into the source
bases of the target operating systems.  I've been trying the FreeBSD
templates on and off, and it is very good to see the progress thus
far.  I spent a few nights 1-2 months ago trying to get the makefiles
to integrate with the BSD makefiles under 5.0-RELEASE.  Some
progress, but nothing usable yet.  The 4.x-RELEASE templates are
better off since the last update.

Another issue with overlays is implementing the mechanisms to
migrate changes between different working copies, or layers.  At
that point, it would even seem to be related to revisioning.

Authority and immutability would also seem to apply here, in that
the filestore could, depending on a trust assessment (for instance
by ACL), assign different authority to an update.

Some of my ideas:
- union mounts should have a filtering mask, which determines
  which layer the changes are effected on.  Parameters could be
  anything which fist potentially allows: filenames, userid,
  timestamp, size, accesses, acl, etc...
- union mounts should allow three or more filesystems to be stacked
  <above> 1 2 3 ... <below> (coalesced into one namespace),
  where they can, for instance, occupy different sections of the
  namespace.  The stack should also allow dynamic reconfiguration
  after the initial union mount point is set.  This would imply
  that each store can carry a relative weight for both retrieval
  and storage.
(Here is where fan-outs would come into play, I would imagine.
With the introduction of GEOM in FreeBSD, which I haven't had the
chance to fully explore, things are getting more advanced, at least
at the device level.)
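
The filtering-mask and weighting ideas above could be sketched
roughly as follows.  This is only an illustration in Python, not fist
syntax: the Change and Layer types, the route() helper, and the layer
names and weights are all made up for the example, and the mask is
modelled as a plain predicate over a few of the parameters listed
(filename, userid, size).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    """A pending update, described by a few of the attributes a mask could test."""
    path: str
    uid: int
    size: int


@dataclass
class Layer:
    """One member filesystem of the union (hypothetical model)."""
    name: str
    weight: int                        # higher weight is tried first
    accepts: Callable[[Change], bool]  # the "filtering mask" for this layer


def route(change: Change, layers: List[Layer]) -> str:
    """Return the name of the highest-weight layer whose mask accepts the change."""
    for layer in sorted(layers, key=lambda l: l.weight, reverse=True):
        if layer.accepts(change):
            return layer.name
    raise LookupError("no layer accepts " + change.path)


# Three stacked layers: <above> userdata, scratch, base <below>.
layers = [
    Layer("base", 0, lambda c: False),    # never accepts writes (read-only)
    Layer("scratch", 10, lambda c: True), # catch-all for everything else
    Layer("userdata", 20,
          lambda c: c.uid >= 1000 and c.path.startswith("/home/")),
]

print(route(Change("/home/al/notes", 1001, 120), layers))
print(route(Change("/etc/rc.conf", 0, 512), layers))
```

Here an update by uid 1001 under /home/ lands in "userdata", while
anything else falls through to "scratch"; "base" never accepts
writes, which is one way to model a read-only layer.  Changing a
weight or swapping a predicate at run time would correspond to the
dynamic reconfiguration mentioned above.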

Let me know if these are the types of things you had in mind.  If
these concepts are one and the same from the standpoint of theory,
I'm not certain which terminology I would prefer.

The term union seems to suggest that the namespaces are being
brought together to create a combined system composed of numerous
member filesystems.  The term overlay, by contrast, seems to suggest
that sections of the filesystem are being ignored, and instead
emphasizes the intersections and overriding factors of the combined
system.  Perhaps both apply.

One thing is for sure: this problem goes deeper than simple
overlays.  Explored to its roots, the issue has to do with more than
combining two sources.  That is not to diminish the role of the
namespace join.

On Wed, Apr 23, 2003 at 23:50:33PM +0200, Wout Mertens wrote:
> Hi Erez,
> 
> On Wed, 23 Apr 2003, Erez Zadok wrote:
> 
> > I'll CC the fist list (where this message is suitable) for other people's
> > comments.
> >
> > In message <Pine.GSO.4.53.0304230843220.9711 at bru-cse-075.cisco.com>, Wout Mertens writes:
> > > Hi there,
> > >
> > > I'm trying to boot a Linux 2.4 thin client from a readonly nfs root, and I
> > > keep going back to my childhood dream, a filesystem that you can overlay
> > > over another filesystem and that keeps the changes you make to it.

The idea of using NFS for both the base image and the overlay is a
good example of the types of applications possible.  To what extent
this intersects with existing network filesystems might be of
interest.

> > >
> > > The idea would be that the filesystem would keep track of additions,
> > > renames, deletions, permissions and so forth, but not touching the
> > > filesystem below it. If this is then done with a tmpfs backing store, you
> > > get a nonpersistent fs.
> > >
> > > Right now I solve the problem by copying all files to tmpfs, but this is
> > > wasteful.
> > >
> > > So I was wondering if you have implementation hints, maybe you considered
> > > the same things, or you have a half-finished .fist file lying around...
> > >
> > > Thanks!
> > >
> > > Wout.
> >
> > So if I understand you right, you want the f/s to read from one source, but
> > when writing, it should write to another location, right?
> >
> > Do you just want to keep the latest update to files that have been modified,
> > or a historical detailed log of all activity (perhaps one that can be rolled
> > back).  The latter of course is more complex.
> >
> > Once a file is modified and written, what happens if you try to re-read it?
> > Do you get the original unmodified version, or the one just written?  The
> > latter is a special case of a write-through cachefs (such as Solaris's) but
> > one which doesn't write through any changes.
> 
> I want to be able to perform all file operations on files in a certain
> filesystem, where the changes are kept somewhere else. In my specific
> case, I start fresh and I'll throw away the changes afterwards, but they
> could also be kept. Activity log is not necessary in my case.
> 
> It is related to a write-through cachefs, but an important difference is
> that deletions, attr changes, etc. should also be handled. cachefs is much
> simpler to implement, I would think.
> 
> Maybe I'm being too complicated and the best way would be to just keep the
> block level changes on the raw device but not apply them, but then that
> wouldn't work for nfs, my goal filesystem, which has no raw device.

Makes me think of revisioning in the filesystem.  Even if you didn't
"commit" changes to a file, they could still exist under a different
revision name.  There has been, and will be, much conversation on
this topic (inevitably), as it becomes apparent at various stages
that it was both a good thing and a bad thing that a VMS-like model
wasn't adopted.

> Besides, then you wouldn't be able to see what the changes were, useful
> for sandboxed stuff. (Although you should need per-user-visible mounts as
> well then)

One less exciting application is the idea of using multiple layers
of storage to maintain working copies of data.  For instance, an
overlay could hold special-purpose attribute/meta-data directories
and files to avoid filesystem pollution.  It's frustrating to have to
clean up after tools that leave their messy attribute directories
all over a filesystem.  The only other ways to eliminate them are to
set very restrictive permissions and risk breaking something, use a
special copy of the data in an isolated location (sandbox), or spend
the time fixing each potential offender.  Luckily build tools and
source repositories avoid this type of problem where possible by
placing their working files in an object tree in the first place.

The concept of an attribute itself is, to me, somewhat risky,
since if abused, it's no longer an attribute.  What constitutes
meta-data anyway?

I don't accept the "dot directories are out, so don't worry about
them" dictum.  It's been abused so much that .dir has lost its
meaning almost entirely: just look at your home directory to find
out why.  I have, for example:
  .kde/ .w3m/ .netscape/ .procmail/ .emacs/
almost none of this is "special data"!
  alias ls='ls -a'

Under FreeBSD, for instance, netatalk is one package that spews
random directories to back proprietary Mac attributes, and assumes
that since the attributes are in .thing directories, it's OK to add
them at any point.

> > It seems to me that one way or another, you'll be needing a fan-out file
> > stackable system: one that can have one branch that it treats as read-only
> > (say, nfs), and another branch that's a writable directory (perhaps even a
> > local disk based f/s).
> 
> I agree. The hard part is deciding on a nice way of keeping the changes.
> Possibly something with a subdirectory per type of change, with regular
> files replacing/adding to the original filesystem just being in the
> corresponding directory on the backing store.
> 
> Original
>  |--a/
>  |   `--b.txt
>  `--c/
>      |--d/
>      `--e.txt
> 
> Backing Store
>  |--a/
>  |   |--b.txt (newer file)
>  |   `--f.txt
>  |--c/
>  |   `--.deletions
>  |       `--d/
>  |       `--e.txt (0-length file)
>  `--g.txt

Reminds me of "white-out" entries in BSD.  There is a good paper on
the union fs by Jan-Simon Pendry, and of course McKusick's book
covers this as well.

> 
> Overlay
>  |--a/
>  |   |--b.txt (newer file)
>  |   `--f.txt
>  |--c/
>  `--g.txt
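
A minimal Python sketch of how the three trees above relate, under
some assumptions of mine: each tree is modelled as a set of relative
paths (directories end in "/"), an entry under a ".deletions"
directory hides the same-named entry in the original, and anything
else in the backing store takes precedence over the original.  The
overlay_view() helper is hypothetical and is not part of fist or
unionfs.

```python
def overlay_view(original, backing):
    """Merge two path sets into the visible overlay.

    Returns a dict mapping each visible path to the tree that supplies it.
    """
    deleted = set()
    upper = set()
    for path in backing:
        parts = path.split("/")
        if ".deletions" in parts:
            i = parts.index(".deletions")
            rest = parts[i + 1:]
            if any(rest):  # skip the ".deletions/" directory entry itself
                deleted.add("/".join(parts[:i] + rest))
        else:
            upper.add(path)

    def masked(path):
        # Hidden if deleted itself, or if it lives under a deleted directory.
        return any(path == d or (d.endswith("/") and path.startswith(d))
                   for d in deleted)

    view = {p: "original" for p in original if not masked(p)}
    view.update({p: "backing" for p in upper})  # backing store wins
    return view


original = {"a/", "a/b.txt", "c/", "c/d/", "c/e.txt"}
backing = {"a/", "a/b.txt", "a/f.txt", "c/", "c/.deletions/",
           "c/.deletions/d/", "c/.deletions/e.txt", "g.txt"}

for path, source in sorted(overlay_view(original, backing).items()):
    print(path, source)
```

With the example trees, c/d/ and c/e.txt are hidden by their
.deletions entries, and a/b.txt is served from the backing store
rather than the original.  Letting the backing store win also means a
file that was deleted and later re-created would reappear, which is
roughly how BSD white-outs behave as I understand them.
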
> 
> > True fan-out file system support in fist has been on my todo list for
> > several years.  It's not an easy task: many OS design assumptions are easily
> > broken and have to be addressed.  We recently did a prototype two-branch
> > read-only unionfs (fan-out) in linux 2.4; we hope to polish it up, add full
> > write support, multiple branches, and more, then make it available by
> > summer's end.
> 
> Looking forward to that :)
> 
> Cheers,
> 
> Wout.
> _______________________________________________
> FiST mailing list
> FiST at lists.cs.columbia.edu
> http://lists.cs.columbia.edu/mailman/listinfo/fist

Allan Fields