posix_fallocate(2)

John Baldwin jhb at freebsd.org
Fri Apr 15 16:42:43 UTC 2011


On Friday, April 15, 2011 5:30:57 am Kostik Belousov wrote:
> On Thu, Apr 14, 2011 at 03:41:30PM -0700, mdf at freebsd.org wrote:
> > On Thu, Apr 14, 2011 at 2:36 PM, Gleb Kurtsou <gleb.kurtsou at gmail.com> wrote:
> > > On (14/04/2011 12:35), mdf at FreeBSD.org wrote:
> > >> For work we need a functionality in our filesystem that is pretty much
> > >> like posix_fallocate(2), so we're using the name and I've added a
> > >> default VOP_ALLOCATE definition that does the right, but dumb, thing.
> > >>
> > >> The most recent mention of this function in FreeBSD was another thread
> > >> lamenting its failure to exist:
> > >> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
> > >>
> > >> The attached files are the core of the kernel implementation of the
> > >> syscall and a default VOP for any filesystem not supporting
> > >> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> > >> non-performant manner.  I didn't see this syscall in NetBSD or
> > >> OpenBSD, so I plan to add it to the end of our syscall table.
> > >>
> > >> What I wanted to check with -arch about was:
> > >>
> > >> 1) is there still a desire for this syscall?
> > > It looks not to play well architecturally with modern COW file systems
> > > like ZFS and HAMMER. So potentially it can be implemented only for UFS.
> > 
> > The syscall, or the dumb implementation?  I don't see why the syscall
> > itself would be a problem; presumably ZFS can figure out whether an
> > fallocate() block is worth COWing or not...
> > 
> > >> 2) is this naive implementation useful enough to serve as a default
> > >> for all filesystems until someone with more knowledge fills them in?
> > > Maillist ate the patch. Only man page attached.
> > 
> > Whoops!
> > 
> > http://people.freebsd.org/~mdf/bsd-fallocate.diff
> 
> New syscall symbols for 9.0 should go in under FBSD_1.2 version, not FBSD_1.0.
> 
> You have inconsistent spacing in the kern_posix_fallocate().
> 
> I do not quite understand the locking for vnode you did.
> You marked the vop as taking and returning unlocked vnode. But, you
> do call VOP_GETATTR in the vop std implementation before locking the vnode.
> Did you test with a DEBUG_VFS_LOCKS config?
> 
> Usual (and proper) practice is to have such vop require locked vnode, in
> case of VOP_ALLOCATE, exclusive lock is appropriate. The Giant dance and
> vn_start_write() + vn_lock() go into kern_posix_fallocate() then.
> Also, you should call bwillwrite() before taking any vfs locks.
> 
> Is locking/unlocking the vnode in the loop done to allow other callers
> to perform i/o on the vnode in between? In particular, to truncate it?
> I think this is not needed, and previous suggestion would take care of it.
> 
> Why do you need stdallocate_extend()? VOP_WRITE does the right thing
> with extending the vnode.
> 
> You might find vn_rdwr easier to use than the bare vops. In particular,
> it would not omit the mac calls for read/write.

I agree with pretty much all of this esp. as regards the locking, etc.

-- 
John Baldwin

