soft updates / background fsck directory link count bug

Tor Egge Tor.Egge at
Sat Sep 24 12:08:14 PDT 2005

> I believe the problem is that handle_workitem_remove() is putting the
> dirrem on the inodedep inowait list, but it is never getting moved to
> the inodedep bufwait list because ffs_update() and
> softdep_update_inodeblock() are not getting called for the leaf
> directory after the dirrem is put on the inowait list if the link count
> is too large.


Running the commands (on an idle system)

	  dirchain=`jot $levels | tr '\n' '/'`
	  mkdir -p $dirchain
	  fsync $dirchain
	  rm -rf 1

and monitoring the number of dirrem structures allocated in the kernel (while
sleep 1; do vmstat -m | grep dirrem; done) shows that the number of dirrem
structures slowly decreases.  In this scenario, the rundown still happens since
the link counts on the inodes are normal.

When the rundown doesn't start due to an elevated link count on the leaf inode,
a situation might occur where there are no dirty blocks and no soft updates
dependencies for the file system on the global work list while some inodedep
and dirrem dependencies for that file system are still lingering.

ffs_sync() doesn't detect these lingering dependencies, and vfs_write_suspend()
returns without any errors, indicating that the file system has been suspended.
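
The following is a toy userland model of the situation described above, not
kernel code.  All names (toy_fs, toy_sync() and so on) are hypothetical, and
the model assumes a single dirrem that moves from the inowait list to the
bufwait list when the inode is updated, and then to the global work list when
the inode block is written.  It only illustrates why a check that looks at
dirty blocks and the global work list can miss a dirrem stuck on inowait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names mirror the discussion, not the real kernel structures. */
enum dep_location { ON_INOWAIT, ON_BUFWAIT, ON_GLOBAL_WORKLIST, DONE };

struct toy_fs {
	enum dep_location dirrem;	/* where the single dirrem currently sits */
	bool inode_dirty;		/* inode flagged for update (think IN_CHANGE) */
	int dirty_blocks;		/* dirty buffers for this file system */
};

/* Stands in for ffs_update(): does nothing unless the inode is flagged. */
static void
toy_update_inode(struct toy_fs *fs)
{
	if (!fs->inode_dirty)
		return;
	if (fs->dirrem == ON_INOWAIT)
		fs->dirrem = ON_BUFWAIT;
	fs->inode_dirty = false;
	fs->dirty_blocks++;		/* the inode block now needs writing */
}

/* Stands in for writing the inode block: bufwait items become work items. */
static void
toy_write_blocks(struct toy_fs *fs)
{
	if (fs->dirty_blocks == 0)
		return;
	fs->dirty_blocks = 0;
	if (fs->dirrem == ON_BUFWAIT)
		fs->dirrem = ON_GLOBAL_WORKLIST;
}

/* Stands in for processing the global work list. */
static void
toy_process_worklist(struct toy_fs *fs)
{
	if (fs->dirrem == ON_GLOBAL_WORKLIST)
		fs->dirrem = DONE;
}

/* One sync pass: update inode, write blocks, process the work list. */
static void
toy_sync(struct toy_fs *fs)
{
	assert(fs != NULL);
	toy_update_inode(fs);
	toy_write_blocks(fs);
	toy_process_worklist(fs);
}

/* The check as described: only dirty blocks and the global work list. */
static bool
toy_looks_quiescent(const struct toy_fs *fs)
{
	return (fs->dirty_blocks == 0 && fs->dirrem != ON_GLOBAL_WORKLIST);
}

/* What the check above does not look at. */
static bool
toy_has_lingering_dep(const struct toy_fs *fs)
{
	return (fs->dirrem == ON_INOWAIT || fs->dirrem == ON_BUFWAIT);
}
```

In this model, a toy_fs that starts with the dirrem on inowait and
inode_dirty false still has the dirrem on inowait after toy_sync(), yet
toy_looks_quiescent() returns true.  Starting with inode_dirty true lets the
same pass run the dirrem down to DONE.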

> In the normal case, it appears that the dirrem migration is triggered
> when the inode is zeroed in ufs_inactive(), which happens when the first
> call to handle_workitem_remove() calls vput().

Intermediate nodes end up waiting for the child inode to be zeroed and then
written to disk.

> Perhaps the dirrem should be put on the inowait list before the call to
> ffs_truncate().

If softdep_slowdown() returns a nonzero value then ffs_truncate() can call
ffs_syncvnode() before di_size has been set to 0.  If the inodeblock is written
due to fsync() operations on other inodes in the same inodeblock then the
dirrem dependency would be moved to the global work list too early.

Enclosed is a patch that forces an ffs_update() call from ufs_inactive() by
setting the IN_CHANGE flag if i_effnlink is larger than 0 right before the call
to vput().  An alternative is checking i_nlink instead of i_effnlink for faster

- Tor Egge
-------------- next part --------------
Index: sys/ufs/ffs/ffs_softdep.c
RCS file: /home/ncvs/src/sys/ufs/ffs/ffs_softdep.c,v
retrieving revision 1.184
diff -u -r1.184 ffs_softdep.c
--- sys/ufs/ffs/ffs_softdep.c	5 Sep 2005 22:14:33 -0000	1.184
+++ sys/ufs/ffs/ffs_softdep.c	24 Sep 2005 18:31:04 -0000
@@ -3477,6 +3477,8 @@
 	WORKLIST_INSERT(&inodedep->id_inowait, &dirrem->dm_list);
+	if (ip->i_effnlink > 0)
+		ip->i_flag |= IN_CHANGE;
