close() of an flock'd file is not atomic
Konstantin Belousov
kostikbel at gmail.com
Thu Mar 8 23:19:01 UTC 2012
On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote:
> On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote:
> > So I ran into this problem at work. Suppose you have a process that opens a
> > read-write file descriptor with O_EXLOCK (so it has an flock()). It then
> > writes out a binary into that file. Another process wants to execve() the
> > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and
> > will call execve() once it has locked the file. In theory, what should happen
> > is that the second process should wait until the first process has finished
> > and called close(). In practice what happens is that I occasionally see the
> > second process fail with ETXTBUSY.
> >
> > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file
> > separately from the call to vn_close() which drops the writecount. Thus, the
> > second process can do an open() and flock() of the file and subsequently call
> > execve() after the first process has done the VOP_ADVLOCK(), but before it
> > calls into vn_close(). In fact, since vn_close() requires a write lock on the
> > vnode, this turns out to not be too hard to reproduce at all. Below is a
> > simple test program that reproduces this constantly. To use, copy /bin/test
> > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run
> > ./flock_close_race /tmp/foo.
> >
> > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock
> > until after vn_close() executes. However, even with that fix applied, my test
> > case still fails. Now it is because open() with a given lock flag is
> > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount
> > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child'
> > process has the fd locked, the writecount can still be bumped. One gross hack
> > would be to defer the bump of the writecount to the caller of vn_open() if the
> > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus
> > it doesn't actually work. I ended up moving acquiring the lock into
> > vn_open_cred(). The current patch I'm testing has both of these approaches,
> > but the first one is #if 0'd out, and the second is #if 1'd.
> >
> > http://www.freebsd.org/~jhb/patches/flock_open_close.patch
>
> Based on some feedback from Konstantin, I've fixed some issues in the failure
> path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned
> above, so the patch is now the actual change that I'm testing. So far it
> handles both my workload at work and my test program without any issues.
I think a comment is needed for a reason to call vn_writechk() second time.
Could you, please, point me, where the FHASLOCK is set for O_EXLOCK | O_SHLOCK
case in the patched kernel ?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20120308/d42631db/attachment-0001.pgp
More information about the freebsd-fs
mailing list