lockf() vs. flock() -- lockf() not locking?

Thu Apr 9 13:04:34 UTC 2015

> On Apr 7, 2015, at 5:15 PM, Jilles Tjoelker <jilles at stack.nl> wrote:
> 
> On Mon, Apr 06, 2015 at 04:18:09PM -0500, Guy Helmer wrote:
>> Recently an application I use switched from using flock() for advisory
>> file locking to lockf() in the code that protects against concurrent
>> writes to a file that is being shared and updated by multiple
>> processes (not threads in a single process). The code seems reliable —
>> a lock manager class opens the file & obtains the lock, then the
>> read/update method opens the file using a separate file descriptor &
>> reads/writes the file, flushes & closes the second file descriptor,
>> and then destroys the lock manager object which unlocks the file &
>> closes the first file descriptor.
> 
>> Surprisingly this simple change seems to have made the code unreliable
>> by allowing concurrent writers to the file and corrupting its
>> contents:
> 
>> -    if (flock(fd, LOCK_EX) != 0)
>> +    if (lockf(fd, F_LOCK, 0) != 0)
>>         throw std::runtime_error("Failed to get a lock of " + filename);
> 
>> . . .
>>     if (fd != -1) {
>> -        flock(fd, LOCK_UN);
>> +        lockf(fd, F_ULOCK, 0);
>>         close(fd);
>>         fd = -1;
>>     }
> 
>> From my reading of the lockf(3) man page and reviewing the
>> implementation in lib/libc/gen/lockf.c, and corresponding code in
>> sys/kern/kern_descrip.c, it appears the lockf() call should be
>> successfully obtaining an advisory lock over the whole file like a
>> successful flock() did. However, I have a stress test that quickly
>> corrupts the target file using the lockf() implementation, and the
>> test fails to cause corruption using the flock() implementation. I’ve
>> instrumented the code, and it's clear that multiple processes are
>> simultaneously in the block of code after the “lockf(fd, F_LOCK, 0)”
>> line.
> 
>> Am I missing something obvious? Any ideas?
> 
> With lockf/fcntl locks, the close of the second file descriptor actually
> already unlocks the file. If there is another close and open in there,
> it would explain your problem. Both the lockf(3) and the fcntl(2) man
> pages mention these strange semantics, but only fcntl(2) clearly warns
> about them. With flock locks, opening the file another time will not
> cause problems.
> 
> The second thing that will not work with lockf/fcntl locks is having a
> child process inherit them.
> 
> Changing flock() to lockf() seems like a bad idea, particularly in a
> reusable "lock manager" class, since it is then harder to see what
> operations must be avoided to avoid losing the lock.
> 
> There is a proposal in the Austin Group for the next version of POSIX to
> add a form of file lock that allows both range locking and proper
> (flock-style) semantics.
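
The close semantics described in the reply above can be observed with a small experiment. This is only a sketch, not code from the application in question: the names `LockKind`, `held_by_parent`, `Observation`, and `lock_survives_second_close` are made up for illustration, and it assumes a local (non-NFS) filesystem on a POSIX system where both lockf() and flock() are available. A forked child opens the file itself and probes whether the parent still holds the lock, once before and once after the parent opens and closes a second descriptor.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper types for this sketch only.
enum class LockKind { Lockf, Flock };
struct Observation { bool before; bool after; };

// Fork a child that opens `path` on its own descriptor and reports whether
// another process (here: the parent) holds a conflicting lock.
static bool held_by_parent(const char* path, LockKind kind) {
    pid_t pid = fork();
    if (pid < 0) return false;  // fork failed; treated as "not held" in this sketch
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) _exit(2);
        int rc = (kind == LockKind::Lockf) ? lockf(fd, F_TEST, 0)
                                           : flock(fd, LOCK_EX | LOCK_NB);
        bool busy = rc == -1 &&
                    (errno == EACCES || errno == EAGAIN || errno == EWOULDBLOCK);
        _exit(busy ? 0 : 1);  // exit status 0 means "locked by someone else"
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Take a whole-file lock, then open and close a second descriptor for the
// same file, recording whether the lock is held before and after the close.
Observation lock_survives_second_close(const char* path, LockKind kind) {
    Observation obs{false, false};
    int lock_fd = open(path, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0) return obs;
    int rc = (kind == LockKind::Lockf) ? lockf(lock_fd, F_LOCK, 0)
                                       : flock(lock_fd, LOCK_EX);
    if (rc == 0) {
        obs.before = held_by_parent(path, kind);
        int second = open(path, O_RDONLY);  // the "separate file descriptor"
        if (second >= 0) close(second);     // with lockf/fcntl locks this drops the lock
        obs.after = held_by_parent(path, kind);
    }
    close(lock_fd);
    return obs;
}
```

With `LockKind::Lockf` the lock should be reported as held before the second close and gone afterwards; with `LockKind::Flock` it should be held both times, since an flock() lock belongs to the open file rather than to the process.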

Thank you! I had forgotten the text in the fcntl(2) page about fcntl/lockf semantics (I haven’t read it for a while). I have verified that the method in question uses the lock manager to lock the file, opens the file to read the contents, closes the file [thus losing the lockf lock], does some work, and then opens & writes the file to save the updates.
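
For illustration, one way to keep the open/read/close/open/write sequence safe is to have the lock manager stay on flock(), as suggested earlier in the thread. The following is a minimal sketch under that assumption; the class name `FlockGuard` and its interface are hypothetical, since the real lock manager class is not shown here.

```cpp
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical RAII lock manager: holds an exclusive flock() lock on
// `filename` for the lifetime of the object.
class FlockGuard {
public:
    explicit FlockGuard(const std::string& filename)
        : fd_(open(filename.c_str(), O_RDWR | O_CREAT, 0644)) {
        if (fd_ == -1)
            throw std::runtime_error("Failed to open " + filename);
        if (flock(fd_, LOCK_EX) != 0) {
            close(fd_);
            throw std::runtime_error("Failed to get a lock of " + filename);
        }
    }

    ~FlockGuard() {
        // Closing fd_ alone would release the lock as long as the descriptor
        // has not been duplicated or inherited; the explicit unlock states the intent.
        flock(fd_, LOCK_UN);
        close(fd_);
    }

    // The guard owns the descriptor, so copying is disabled.
    FlockGuard(const FlockGuard&) = delete;
    FlockGuard& operator=(const FlockGuard&) = delete;

private:
    int fd_;
};
```

While a `FlockGuard` is alive, the read/update method can open and close the same file through other descriptors as often as it needs to; the lock is released only when the guard is destroyed.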

Regards,
Guy