[Bug 256005] [FUSEFS] Subsequent posix lock operation fails when it should not.

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 19 May 2021 17:30:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256005

            Bug ID: 256005
           Summary: [FUSEFS] Subsequent posix lock operation fails when it
                    should not.
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: chogata@moosefs.pro

Hi,

At MooseFS test lab we run some tests to see if issues mentioned in bug #255945
and bug #253500 were solved in 14-CURRENT and we encountered another issue with
posix locks handled via FUSEFS.

I executed this simple script:

---------
#!/usr/bin/env python

import posix
import time

fd = posix.open("lock.file",posix.O_WRONLY|posix.O_CREAT,0o666)
print(fd)
print(posix.lockf(fd,posix.F_LOCK,0))
time.sleep(2)
print(posix.lockf(fd,posix.F_TLOCK,0))
time.sleep(2)
print(posix.lockf(fd,posix.F_ULOCK,0))
---------

When this is run on a local filesystem, the output is as expected, that is:

---------
3
None
None
None
---------

However, when I run it on MooseFS, I get:

---------
3
None
Traceback (most recent call last):
  File "/mnt/mfs/f13/test.py", line 14, in <module>
    print(posix.lockf(fd,posix.F_TLOCK,0))
BlockingIOError: [Errno 35] Resource temporarily unavailable
---------

and that is not correct, because a process, that owns a posix lock on a file,
is supposed to be able to lock this file again.
So I checked the operation history and this is what I got (I cut out everything
that happened before the open/create operation of file "lock.file" as it is not
relevant):

---------
05.19 16:38:13.441623: uid:0 gid:0 pid:766 cmd:lookup (1,lock.file): ENOENT (No
such file or directory)
05.19 16:38:13.442336: uid:0 gid:0 pid:766 cmd:create
(1,lock.file,O_RDWR|O_CREAT,-rw-r--r--:0100644): OK
(0.0,233781,1.0,[-rw-r--r--:0100644,1,0,0,1621442293,1621442293,1621442293,0])
(direct_io:0,keep_cache:0,append_mode:0) [handle:01000004]
05.19 16:38:13.443110: uid:0 gid:0 pid:766 cmd:setlkw (inode:233781
owner:0000000000018723 start:0 end:18446744073709551615 type:W)
[handle:01000004]: OK
05.19 16:38:15.449501: uid:0 gid:0 pid:766 cmd:setlk (inode:233781
owner:0000000000E3D000 start:0 end:18446744073709551615 type:W)
[handle:01000004]: EAGAIN (Resource temporarily unavailable)
05.19 16:38:15.450339: uid:0 gid:0 pid:766 cmd:access (1,0x1): OK
05.19 16:38:15.451038: uid:0 gid:0 pid:766 cmd:lookup (1,f13): OK
(1.0,8243,1.0,[drwxr-xr-x:0040755,2,0,0,1621442079,1621442071,1621442071,1001824])
05.19 16:38:15.451110: uid:0 gid:0 pid:766 cmd:access (8243,0x1): OK
05.19 16:38:15.451743: uid:0 gid:0 pid:766 cmd:lookup (8243,test.py): OK
(0.0,233775,1.0,[-rwxr-xr-x:0100755,1,0,0,1621442269,1621442071,1621442071,559])
05.19 16:38:15.451801: uid:0 gid:0 pid:766 cmd:access (233775,0x4): OK (forced
- kernel bug workaround)
05.19 16:38:15.451868: uid:0 gid:0 pid:766 cmd:open (233775,O_RDONLY) (using
cached data from lookup): OK (direct_io:0,keep_cache:1,append_mode:0)
[handle:01000005]
05.19 16:38:15.452915: uid:0 gid:0 pid:766 cmd:setlk (inode:233775
owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U)
[handle:01000005]: OK
05.19 16:38:15.453583: uid:0 gid:0 pid:766 cmd:setlk (inode:233775
owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U)
[handle:01000005]: OK
05.19 16:38:15.454143: uid:0 gid:0 pid:766 cmd:flush (233775) - releasing all
POSIX-type locks for 00000000000002FE (received from kernel): OK
05.19 16:38:15.454151: uid:0 gid:0 pid:766 cmd:flush (233775)
[handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.454204: uid:0 gid:0 pid:766 cmd:release (233775)
[handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.456512: uid:0 gid:0 pid:766 cmd:setlk (inode:233781
owner:FFFFFFFFCDEFB9F0 start:0 end:18446744073709551615 type:U)
[handle:01000004]: OK
05.19 16:38:15.457012: uid:0 gid:0 pid:766 cmd:flush (233781) - releasing all
POSIX-type locks for 00000000000002FE (received from kernel): OK
05.19 16:38:15.457022: uid:0 gid:0 pid:766 cmd:flush (233781)
[handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.457478: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all
POSIX-type locks for 0000000000E3D000 (left by kernel): OK
05.19 16:38:15.457850: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all
POSIX-type locks for 0000000000018723 (left by kernel): OK
05.19 16:38:15.457861: uid:0 gid:0 pid:766 cmd:release (233781)
[handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK
---------

Just to be sure, I ran it on Linux on MooseFS, it behaved like FreeBSD on local
filesystem: displayed the file descriptor number and 3 "None"s. Operation log
(from the same moment as the one above, so opening/creating the lockfile) looks
like that:

---------
05.19 19:04:33.608503: uid:0 gid:0 pid:1703 cmd:lookup (233782,lock.file):
ENOENT (No such file or directory)
05.19 19:04:33.609052: uid:0 gid:0 pid:1703 cmd:create
(233782,lock.file,O_WRONLY|O_CREAT,-rw-rw-rw-:0100666): OK
(0.0,233784,1.0,[-rw-r--r--:0100644,1,0,0,1621443873,1621443873,1621443873,0])
(direct_io:0,keep_cache:0,append_mode:0) [handle:0A000001]
05.19 19:04:33.609475: uid:0 gid:0 pid:1703 cmd:setlkw (inode:233784
owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W)
[handle:0A000001]: OK
05.19 19:04:35.124774: uid:0 gid:0 pid:1667 cmd:getattr (233782) [no handle]:
OK (1.0,[drwxr-xr-x:0040755,2,0,0,1621443864,1621443873,1621443873,1])
05.19 19:04:35.610893: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784
owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W)
[handle:0A000001]: OK
05.19 19:04:37.613568: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784
owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:U)
[handle:0A000001]: OK
05.19 19:04:37.616895: uid:0 gid:0 pid:1703 cmd:flush (233784) - releasing all
POSIX-type locks for 53C3FEC6E42D7114 (received from kernel): OK
05.19 19:04:37.616898: uid:0 gid:0 pid:1703 cmd:flush (233784)
[handle:0A000001,uselocks:2,lock_owner:53C3FEC6E42D7114]: OK
---------

So, clearly, on FreeBSD MooseFS gets some weird owner lock values from FUSE,
while on Linux it all looks consistent. As a result, the second lock fails,
because MooseFS thinks some other process is trying to obtain the lock.

-- 
You are receiving this mail because:
You are the assignee for the bug.