Prefaulting for i/o buffers

Sat Feb 25 19:47:51 UTC 2012

On Sat, Feb 25, 2012 at 06:45:00PM +0100, Attilio Rao wrote:
> On 25 February 2012 16:13, Pawel Jakub Dawidek <pjd at freebsd.org> wrote:
> > My personal opinion about rangelocks and many other VFS features we
> > currently have is that they are a good idea in theory, but in practice
> > they tend to overcomplicate VFS.
> >
> > I am of the opinion that we should move as much stuff as we can to
> > individual file systems. We try to implement everything in VFS itself
> > in the hope that this will simplify the file systems we have. It then
> > turns out that only one file system really uses this stuff (most of the
> > time it is UFS), and this is a PITA for all the other file systems as
> > well as for maintaining VFS. VFS has become so complicated over the
> > years that maybe only a few people can understand it, and every single
> > change to VFS carries a huge risk of breaking some unrelated part.
> 
> I think this is questionable for the following reasons:
> - If the problem is that file system writers have trouble
> understanding the necessary locking, we should really provide cleaner
> and more complete documentation. One could say the same about our VM
> subsystem, but at least there we have plenty of comments that
> help in understanding how to handle vm_object and vm_page locking
> throughout their lifetimes.

Documentation is not the answer here. If the code is complex, it is
harder to learn no matter how good the documentation is. That makes
fewer people willing to learn it in the first place, and it makes the
code buggier, because there are more edge/special cases you can forget
about.

> - Our primitives may be more complicated than the
> 'all-in-the-filesystem' ones, but at least they offer a complete and
> centralized view of the resources we have allocated across the whole
> system, and they allow building better policies for managing
> them. One problem I see here is that those policies are not fully
> implemented, not tuned, or simply outdated, which removes one of the
> biggest benefits we get from making vnodes so generic.

Again, this is a nice theory that is far from reality. You will never
be able to have control over all the resources allocated by file
systems.

> About the points I raised myself:
> - As long as the same path has both range locking and vnode
> locking, I don't think it is a good idea to keep them separate forever.
> Merging them seems to me an important evolution: it would not only
> shrink the number of primitives but also introduce less overhead and
> likely revamp vnode scalability (though I think this needs a deep
> investigation).
> - As for the ZFS rangelocks absorbing the VFS ones, I think this is a
> minor point, but still, if you think it can be done efficiently and
> without losing performance, I don't see why not do it. You already
> wrote rangelocks for ZFS, so you have gained a lot of experience in
> this area and can comment on the fallout, etc., but I don't see a good
> reason not to do it, unless it is just too difficult. This is not about
> generalizing a new mechanism; it is about using a general mechanism in
> a specific implementation, if possible.

I did not implement rangelocking for ZFS. It came with ZFS when I ported
it. As long as we want to merge changes from upstream (which is now
IllumOS), we don't want to make huge changes just for the sake of proving
that this is a general-purpose mechanism used by more than one file
system.
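For readers unfamiliar with the term, the core idea of a range lock fits in a few dozen lines: threads lock byte ranges of one file instead of the whole vnode, so writers to disjoint ranges do not wait for each other. The following is a minimal user-space sketch, assuming POSIX threads; the rl_* names and structures are hypothetical and are neither the ZFS implementation nor any proposed VFS interface. It only grants exclusive ranges, whereas a real file system lock would typically also distinguish shared (reader) ranges from exclusive (writer) ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical range lock: one granted half-open range [start, end). */
struct rl_entry {
	size_t start, end;
	struct rl_entry *next;
};

struct rangelock {
	pthread_mutex_t mtx;    /* protects the 'held' list */
	pthread_cond_t cv;      /* broadcast whenever a range is released */
	struct rl_entry *held;  /* currently granted ranges */
};

void rl_init(struct rangelock *rl)
{
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->cv, NULL);
	rl->held = NULL;
}

/* Caller must hold rl->mtx. True if [start, end) intersects a held range. */
static bool rl_overlaps(const struct rangelock *rl, size_t start, size_t end)
{
	for (const struct rl_entry *e = rl->held; e != NULL; e = e->next)
		if (start < e->end && e->start < end)
			return true;
	return false;
}

/* Caller must hold rl->mtx and have checked that the range is free. */
static struct rl_entry *rl_grant(struct rangelock *rl, size_t start, size_t end)
{
	struct rl_entry *e = malloc(sizeof(*e));
	if (e == NULL)
		return NULL;            /* allocation failure: caller must check */
	e->start = start;
	e->end = end;
	e->next = rl->held;
	rl->held = e;
	return e;
}

/* Block until [start, end) overlaps no granted range, then take it. */
struct rl_entry *rl_lock(struct rangelock *rl, size_t start, size_t end)
{
	assert(start < end);
	pthread_mutex_lock(&rl->mtx);
	while (rl_overlaps(rl, start, end))
		pthread_cond_wait(&rl->cv, &rl->mtx);
	struct rl_entry *e = rl_grant(rl, start, end);
	pthread_mutex_unlock(&rl->mtx);
	return e;
}

/* Non-blocking variant: returns NULL if any part of the range is busy. */
struct rl_entry *rl_trylock(struct rangelock *rl, size_t start, size_t end)
{
	assert(start < end);
	pthread_mutex_lock(&rl->mtx);
	struct rl_entry *e =
	    rl_overlaps(rl, start, end) ? NULL : rl_grant(rl, start, end);
	pthread_mutex_unlock(&rl->mtx);
	return e;
}

/* Release a granted range and wake all waiters so they can re-check. */
void rl_unlock(struct rangelock *rl, struct rl_entry *e)
{
	assert(e != NULL);
	pthread_mutex_lock(&rl->mtx);
	for (struct rl_entry **pp = &rl->held; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			break;
		}
	}
	pthread_mutex_unlock(&rl->mtx);
	free(e);
	pthread_cond_broadcast(&rl->cv);
}
```

A writer would call rl_lock(&rl, off, off + len) before modifying bytes [off, off + len) and rl_unlock() afterwards. Two writers to [0, 10) and [10, 20) proceed concurrently, while a writer to [5, 15) waits for both, which a single per-vnode exclusive lock cannot express.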

Attilio, don't get me wrong. In 99% of cases it is good to make code more
general, universal and reusable, but we can't ignore reality.

There are reasons why file systems like XFS, ReiserFS and others were
never fully ported. I'm not saying VFS complexity was the only reason,
but I'm sure it was one of them.

Our VFS is very UFS-centric. We make so many assumptions that sound
fine only for UFS. I saw plenty of those while working on ZFS, such as:

- "Every file system needs a cache. Let's make it general, so that all
  file systems can use it!" Well, for VFS each file system is a separate
  entity, which is not the case for ZFS. ZFS can cache a block only once,
  even if it is used by one file system, 10 clones and 100 snapshots,
  which are all separate mount points from the VFS perspective. The
  buffer cache would cache the same block 111 times.

- "rmdir(2) on a mountpoint is a bad idea, let's deny it at VFS level."
  It is a bad idea, indeed, but in ZFS it is a nice way to remove a
  snapshot: by rmdiring the .zfs/snapshot/<name> directory.

- No one implemented rangelocking in VFS, so no file system can use it,
  even if the given file system has all the code to do it.

etc.
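The caching argument in the first item above can be illustrated with a toy model. This sketch is a deliberate simplification, assuming a cache keyed by (mount, logical block) on one side and a pool-wide cache keyed by physical block address on the other, roughly the difference between a per-mount buffer cache and ZFS's pool-wide ARC. All names and constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAXENT 256   /* hypothetical capacity of each toy cache */

/* Per-mount cache: an entry is identified by (mount, logical block), so the
 * same on-disk block seen through different mounts is cached repeatedly. */
struct mount_cache {
	struct { int mount_id; long lblkno; } ent[MAXENT];
	size_t n;
};

static void mount_cache_read(struct mount_cache *c, int mount_id, long lblkno)
{
	for (size_t i = 0; i < c->n; i++)
		if (c->ent[i].mount_id == mount_id && c->ent[i].lblkno == lblkno)
			return;                       /* cache hit */
	assert(c->n < MAXENT);
	c->ent[c->n].mount_id = mount_id;     /* miss: add another copy */
	c->ent[c->n].lblkno = lblkno;
	c->n++;
}

/* Pool-wide cache: an entry is identified by its physical address only. */
struct pool_cache {
	long paddr[MAXENT];
	size_t n;
};

static void pool_cache_read(struct pool_cache *c, long paddr)
{
	for (size_t i = 0; i < c->n; i++)
		if (c->paddr[i] == paddr)
			return;                       /* cache hit */
	assert(c->n < MAXENT);
	c->paddr[c->n++] = paddr;             /* miss: add the only copy */
}

/* Read one shared block through nmounts mount points; return copies cached. */
static size_t copies_per_mount_cache(int nmounts)
{
	static struct mount_cache c;
	c.n = 0;
	for (int m = 0; m < nmounts; m++)
		mount_cache_read(&c, m, 7);   /* same logical block in every mount */
	return c.n;
}

static size_t copies_pool_cache(int nmounts)
{
	static struct pool_cache c;
	c.n = 0;
	for (int m = 0; m < nmounts; m++)
		pool_cache_read(&c, 1000);    /* same physical block underneath */
	return c.n;
}
```

With one file system, 10 clones and 100 snapshots, copies_per_mount_cache(1 + 10 + 100) reports 111 cached copies, while copies_pool_cache(111) reports a single one.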

I'm also sure it would be much easier for Jeff to make VFS MP-safe if it
were less complex.

When looking at the big picture, it would be nice to have all this
general stuff like rangelocking, quota, buffer cache, etc. as libraries
for file systems to use rather than something mandatory. If I develop a
file system for FreeBSD only and don't want to reinvent the wheel, I can
use those libraries. If I port a file system to FreeBSD or develop a file
system that doesn't really need those libraries, I'm not forced to use
them.
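The optional-library model described above can be sketched as follows. This is a hypothetical illustration, not existing FreeBSD code: the VFS layer only dispatches to the file system, and each file system decides for itself whether to call a shared helper (here a counting stand-in for range locking) or to rely on its own mechanism. All names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared "library" helper; a stand-in that only counts calls. */
static int lib_rangelock_calls;

static void lib_rangelock_enter(long off, long len)
{
	(void)off;
	(void)len;
	lib_rangelock_calls++;
}

static void lib_rangelock_exit(long off, long len)
{
	(void)off;
	(void)len;
}

/* Hypothetical per-file-system operations table. */
struct fs_ops {
	const char *name;
	int (*write)(long off, long len);
};

/* VFS layer: dispatches only and imposes no locking policy of its own. */
static int vfs_write(const struct fs_ops *fs, long off, long len)
{
	assert(fs != NULL && fs->write != NULL);
	return fs->write(off, len);
}

/* A FreeBSD-native file system that opts into the shared library. */
static int nativefs_write(long off, long len)
{
	lib_rangelock_enter(off, len);
	/* ... write the data ... */
	lib_rangelock_exit(off, len);
	return 0;
}

/* A ported file system that brings its own locking and skips the library. */
static int portedfs_write(long off, long len)
{
	(void)off;
	(void)len;
	/* ... its own range locking and write path ... */
	return 0;
}

static const struct fs_ops nativefs = { "nativefs", nativefs_write };
static const struct fs_ops portedfs = { "portedfs", portedfs_write };
```

The design choice here is that nothing in vfs_write() forces a locking scheme on portedfs; the shared helper is used only by the file system that asks for it.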

All this might make a good working group subject at BSDCan devsummit.
We could cross swords there:)

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl
Url : http://lists.freebsd.org/pipermail/freebsd-arch/attachments/20120225/be2b84ca/attachment.pgp