[Bug 258022] [FUSES] Inode attributes are cached unnecessarily/for too long

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 24 Aug 2021 11:30:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258022

            Bug ID: 258022
           Summary: [FUSES] Inode attributes are cached unnecessarily/for
                    too long
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: chogata@moosefs.pro

This problem was reported by a MooseFS user. Under some circumstances, creating
a new file system entry (of any type: directory, regular file, special file) on
a MooseFS mount fails with "Resource temporarily unavailable", and any
subsequent operation on this inode (ls -al, rm or rmdir) fails with the same
message. MooseFS then cannot be unmounted either: the system reports "Device
busy". Only a restart of the whole machine helps.

Because this resembled a problem seen in some versions of the Linux kernel,
where a process on one machine deleted the CWD of a process on a different
machine, we at first thought it involved CWDs only and introduced some
safeguards in the MooseFS client for FreeBSD. Recent findings show that the
problem is much more serious and lies on the FreeBSD side.

A simple test: we take two machines and mount MooseFS on both.
On the FreeBSD machine (13.0-RELEASE-p3) we use these mount options:

mfsmount -o mfsattrcacheto=0 -o mfsxattrcacheto=0 -o mfsentrycacheto=0 \
    -o mfsdirentrycacheto=0 -o mfssymlinkcacheto=0 -o mfsgroupscacheto=0 /mnt/mfs

All the -o options are there to disable any attribute caches that may exist
(lookup, access, mkdir etc. operations will all return 0 seconds as the cache
time).
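
What a zero cache timeout is expected to mean can be sketched as follows. This
is a toy model in Python, not FreeBSD or MooseFS code; the names CachedAttr,
fetched_at, valid_for and is_fresh are hypothetical. It assumes a client keeps
each reply together with the validity period that came with it and trusts the
record only until that period has elapsed.

```python
from dataclasses import dataclass


@dataclass
class CachedAttr:
    """Hypothetical client-side cache record (illustration only)."""
    mode: int          # file type and permission bits from the reply
    fetched_at: float  # time (seconds) at which the reply arrived
    valid_for: float   # validity period (seconds) returned with the reply

    def is_fresh(self, now: float) -> bool:
        # With valid_for == 0 the record is already stale when it arrives,
        # so every later operation has to ask the file system again.
        return now < self.fetched_at + self.valid_for
```

Under this assumption, a record stored with valid_for=0 is never fresh, which
is the behaviour the mount options above are meant to request.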

We also have a second machine with the same MooseFS instance mounted. The
operating system on the second machine is irrelevant.

Then we perform these steps, exactly in the order shown below:

***FreeBSD machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 12:41 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD "sees" that "testdir" is empty)

***OTHER machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# mkdir dir
/mnt/mfs/testdir#
******
(Other machine creates a directory named "dir" inside "testdir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2933
drwxr-xr-x   3 root  wheel        1 Aug 23 12:59 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
drwxr-xr-x   2 root  wheel        1 Aug 23 12:59 dir
/mnt/mfs/testdir#
******
(FreeBSD "sees" that there is now "dir" inside "testdir")

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2933
8 drwxr-xr-x   3 root  wheel        1 Aug 23 12:59 .
1 drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
9 drwxr-xr-x   2 root  wheel        1 Aug 23 12:59 dir
/mnt/mfs/testdir# rmdir dir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:00 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(We check the inode number of "dir" on the other machine and delete "dir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:00 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD again "sees" that "testdir" is empty)

Now we wait for at least 5 minutes; the timing is explained below.

***FreeBSD machine***
/mnt/mfs/testdir# echo "foo" > file.txt
-bash: file.txt: Resource temporarily unavailable
/mnt/mfs/testdir# ls -al
ls: file.txt: Resource temporarily unavailable
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:17 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
Ooops?!

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2932
8 drwxr-xr-x   2 root  wheel        1 Aug 23 13:17 .
1 drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
9 -rw-r--r--   1 root  wheel        0 Aug 23 13:17 file.txt
/mnt/mfs/testdir#
******
The newly created file got the same inode number as the recently deleted
directory "dir"...
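
The sequence above can be mimicked with a toy model. This is only a guess at
the kind of mechanism involved, not the FreeBSD kernel code: the class
StaleTypeCache and its lookup method are hypothetical, and the model simply
assumes that the client remembers the file type it last saw for an inode
number and never invalidates it.

```python
import errno
import os
import stat


class StaleTypeCache:
    """Toy model: remembers the file type first seen for each inode number."""

    def __init__(self):
        self._type_by_ino = {}

    def lookup(self, ino: int, mode: int) -> int:
        """Return 0 on success, or an errno value on a type mismatch."""
        new_type = stat.S_IFMT(mode)
        old_type = self._type_by_ino.setdefault(ino, new_type)
        if old_type != new_type:
            # The remembered type was never dropped, so the reused inode
            # number looks inconsistent and the operation is refused.
            return errno.EAGAIN
        return 0


cache = StaleTypeCache()
first = cache.lookup(9, stat.S_IFDIR | 0o755)   # inode 9 seen as "dir"
second = cache.lookup(9, stat.S_IFREG | 0o644)  # inode 9 reused by "file.txt"
print(first, os.strerror(second))
```

In this model the second lookup fails with EAGAIN, whose standard message text
is the "Resource temporarily unavailable" seen in the transcript. Whether the
kernel reaches that error along a similar path is an assumption here.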

Notes:
1) The effect is not exclusive to former directory inode numbers becoming file
inode numbers. It happens whenever the new object is of a different type than
the old one (an ex-directory inode number is reused for a file, an ex-file for
a fifo, an ex-fifo for a device or directory, etc.). The "ls -al" scenario is
not the only trigger: the same happens if objects are created on the FreeBSD
machine and then deleted from another machine, which is of course a normal
occurrence in a network file system.
2) The default inode reuse time in MooseFS is 24 hours. It was set to 5 minutes
for testing purposes only. The person who first reported the problem (others
followed) used the default 24 hours. Only inodes that are truly "free" are
reused, which means that no CWDs (active on any MooseFS client connected to the
instance) and no sustained (deleted but still open) files are reused. The
24-hour delay is counted from the moment an inode is considered free, so if a
file stays in a sustained state for, say, 24 hours after deletion (and then
whatever process had a hold on it finally finishes), its inode number is still
not reused for another 24 hours.
3) Default cache timeouts in MooseFS: file attributes - 1 second, extended
attributes (xattr) - 30 seconds, directory entries - 1 second, negative
entries - 0 seconds (no negative cache by default), symbolic links - 300
seconds, supplementary groups - 300 seconds.
4) Caches in the above experiment were ALL set to 0.
5) The problem was first reported on FreeBSD 12.1.
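
The reuse rule from note 2 can be written down as a small sketch. It is an
illustration of the rule as described in this report, not the MooseFS
implementation; InodeState, may_reuse and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REUSE_DELAY_DEFAULT = 24 * 60 * 60  # seconds; the 24-hour default from note 2
REUSE_DELAY_TEST = 5 * 60           # seconds; the value used in the experiment


@dataclass
class InodeState:
    """Hypothetical bookkeeping record for one inode number."""
    deleted: bool
    is_cwd_somewhere: bool      # CWD of a process on any connected client
    open_handles: int           # > 0: sustained (deleted but still open)
    freed_at: Optional[float]   # time (seconds) at which it became truly free


def may_reuse(st: InodeState, now: float,
              delay: float = REUSE_DELAY_DEFAULT) -> bool:
    # Inodes that are still referenced in any way are never reused.
    if not st.deleted or st.is_cwd_somewhere or st.open_handles > 0:
        return False
    if st.freed_at is None:
        return False
    # The delay runs from the moment the inode became free, not from deletion.
    return now - st.freed_at >= delay
```

For example, a file deleted at time 0 but held open for 24 hours has
freed_at = 86400, so with the default delay its number only becomes reusable
at time 172800.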

So, to sum up: we say "don't cache anything at all" (or, with default settings,
"nothing for longer than 300 seconds"), yet FreeBSD caches inode attributes (we
don't know which ones, but at least the type) for longer than 24 hours. This
causes a serious problem, because a new inode with a reused inode number is
basically unusable in the file system.
