posix_fallocate(2)

Gleb Kurtsou gleb.kurtsou at gmail.com
Fri Apr 15 10:54:16 UTC 2011


On (14/04/2011 15:41), mdf at FreeBSD.org wrote:
> On Thu, Apr 14, 2011 at 2:36 PM, Gleb Kurtsou <gleb.kurtsou at gmail.com> wrote:
> > On (14/04/2011 12:35), mdf at FreeBSD.org wrote:
> >> For work we need a functionality in our filesystem that is pretty much
> >> like posix_fallocate(2), so we're using the name and I've added a
> >> default VOP_ALLOCATE definition that does the right, but dumb, thing.
> >>
> >> The most recent mention of this function in FreeBSD was another thread
> >> lamenting its failure to exist:
> >> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
> >>
> >> The attached files are the core of the kernel implementation of the
> >> syscall and a default VOP for any filesystem not supporting
> >> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> >> non-performant manner.  I didn't see this syscall in NetBSD or
> >> OpenBSD, so I plan to add it to the end of our syscall table.
> >>
> >> What I wanted to check with -arch about was:
> >>
> >> 1) is there still a desire for this syscall?
> > It doesn't look like it plays well architecturally with modern COW file
> > systems like ZFS and HAMMER. So potentially it can be implemented only
> > for UFS.
> 
> The syscall, or the dumb implementation?  I don't see why the syscall
> itself would be a problem; presumably ZFS can figure out whether an
> fallocate() block is worth COWing or not...
It is good to have if there is a chance of getting a real implementation
for UFS. Having only a dumb implementation will fool user software into
believing that we support it.
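
To illustrate the concern, here is a minimal userland sketch of how an
application might call posix_fallocate(). The helper name reserve_space()
is hypothetical and not from the patch. The only signal the caller gets is
the returned error number, so a zero-writing fallback and a real
preallocation look the same to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask for the first len bytes of fd to be backed by
 * storage. Returns 0 on success or an error number on failure.
 */
int
reserve_space(int fd, off_t len)
{
	int error;

	/*
	 * posix_fallocate() returns the error number directly; it does
	 * not return -1 and set errno like most syscall wrappers.
	 */
	do {
		error = posix_fallocate(fd, 0, len);
	} while (error == EINTR);

	/*
	 * A zero return says nothing about how the space was obtained,
	 * so the caller cannot tell a native implementation from a
	 * default that writes zeros.
	 */
	return (error);
}
```

A caller would typically treat any non-zero return as "no reservation"
and carry on with ordinary writes.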

As far as I understand, ZFS caches a large chunk of changes and then writes
all of them at once. I doubt blocks can be preallocated. You preallocate
a block, it's marked as used in the file system's metadata, and the
metadata changes are written to disk -- that results in an inconsistency,
because the preallocated block is marked as "used" in metadata and thus
can't be overwritten. I might be absolutely wrong; ZFS experts are better
placed to answer this. Grepping reveals no fallocate support in ZFS.

> >> 2) is this naive implementation useful enough to serve as a default
> >> for all filesystems until someone with more knowledge fills them in?
> > Maillist ate the patch. Only man page attached.
> 
> Whoops!
> 
> http://people.freebsd.org/~mdf/bsd-fallocate.diff
What was the performance impact on copying large files? I had sparse file
support in PEFS implemented in a similar way. Performance was terrible: the
vm and buf caches were saturated first by writing huge chunks of zeros and
then by mmap'ing and writing the actual data. sched_yield() and/or vnode
lock/unlock didn't improve interactive performance either.

Why not just call VOP_SETATTR(newsize) in the dumb implementation?
File systems already expect such behavior for files; cp has been using
mmap for a while already.
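
As a rough userland analogy only (this is not the kernel code from the
patch, and all function names below are hypothetical), the two approaches
can be contrasted like this: one grows the file by writing zeros, the
other just sets the new size, which is what ftruncate() does from
userland.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: grow fd to newsize by writing zeros past the current end. */
int
extend_by_zero_fill(int fd, off_t newsize)
{
	char zeros[4096];
	struct stat st;
	off_t off;

	memset(zeros, 0, sizeof(zeros));
	if (fstat(fd, &st) == -1)
		return (-1);
	for (off = st.st_size; off < newsize; ) {
		size_t want = sizeof(zeros);
		ssize_t n;

		if ((off_t)want > newsize - off)
			want = (size_t)(newsize - off);
		/* Every byte goes through the write path and the caches. */
		n = pwrite(fd, zeros, want, off);
		if (n <= 0)
			return (-1);
		off += n;
	}
	return (0);
}

/* Hypothetical: grow fd to newsize by only changing the file size. */
int
extend_by_setting_size(int fd, off_t newsize)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return (-1);
	if (st.st_size >= newsize)
		return (0);	/* never shrink the file */
	return (ftruncate(fd, newsize));
}

/* Usage sketch: grow a scratch file with each helper, return final size. */
off_t
demo_extend(const char *path, off_t half)
{
	struct stat st;
	off_t size = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd == -1)
		return (-1);
	if (extend_by_setting_size(fd, half) == 0 &&
	    extend_by_zero_fill(fd, 2 * half) == 0 &&
	    fstat(fd, &st) == 0)
		size = st.st_size;
	close(fd);
	unlink(path);
	return (size);
}
```

Note that setting the size usually leaves a hole rather than allocated
blocks on file systems with sparse file support, so it is much cheaper
but does not by itself reserve any space.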


> 
> Cheers,
> matthew


More information about the freebsd-arch mailing list