close() of an flock'd file is not atomic
jhb at freebsd.org
Thu Mar 8 20:39:10 UTC 2012
On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote:
> So I ran into this problem at work. Suppose you have a process that opens a
> read-write file descriptor with O_EXLOCK (so it has an flock()). It then
> writes out a binary into that file. Another process wants to execve() the
> file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and
> will call execve() once it has locked the file. In theory, what should happen
> is that the second process should wait until the first process has finished
> and called close(). In practice what happens is that I occasionally see the
> second process fail with ETXTBUSY.
> The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file
> separately from the call to vn_close() which drops the writecount. Thus, the
> second process can do an open() and flock() of the file and subsequently call
> execve() after the first process has done the VOP_ADVLOCK(), but before it
> calls into vn_close(). In fact, since vn_close() requires a write lock on the
> vnode, this turns out to not be too hard to reproduce at all. Below is a
> simple test program that reproduces this constantly. To use, copy /bin/test
> to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run
> ./flock_close_race /tmp/foo.
> The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock
> until after vn_close() executes. However, even with that fix applied, my test
> case still fails. Now it is because open() with a given lock flag is
> non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount
> before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child'
> process has the fd locked, the writecount can still be bumped. One gross hack
> would be to defer the bump of the writecount to the caller of vn_open() if the
> caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus
> it doesn't actually work. I ended up moving acquiring the lock into
> vn_open_cred(). The current patch I'm testing has both of these approaches,
> but the first one is #if 0'd out, and the second is #if 1'd.
Based on some feedback from Konstantin, I've fixed some issues in the failure
path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned
above, so the patch is now the actual change that I'm testing. So far it
handles both my workload at work and my test program without any issues.
More information about the freebsd-fs