fsync: giving up on dirty on ufs partitions running vfs_write_suspend()

Fri Sep 8 13:42:25 UTC 2017

I will try to describe the cause of the "fsync: giving up on dirty" problem
reported in

https://lists.freebsd.org/pipermail/freebsd-fs/2012-February/013804.html
or
https://lists.freebsd.org/pipermail/freebsd-fs/2013-August/018163.html

I now run FreeBSD 10.3-STABLE r317936 and sometimes see messages like

 <kern.crit> dssbkp4 kernel: fsync: giving up on dirty
 <kern.crit> dssbkp4 kernel: 0xfffff80040d6c938: tag devfs, type VCHR
 <kern.crit> dssbkp4 kernel: usecount 1, writecount 0, refcount 47 mountedhere 0xfffff8004083a200
 <kern.crit> dssbkp4 kernel: flags (VI_ACTIVE)
 <kern.crit> dssbkp4 kernel: v_object 0xfffff800409b3500 ref 0 pages 1138 cleanbuf 42 dirtybuf 4
 <kern.crit> dssbkp4 kernel: lock type devfs: EXCL by thread 0xfffff800403a8a00 (pid 26, g_journal switcher, tid 100181)
 <kern.crit> dssbkp4 kernel: dev mirror/gmbkp4p5.journal
 <kern.crit> dssbkp4 kernel: GEOM_JOURNAL: Cannot suspend file system /home (error=35).

on all of my servers running gjournal. Similar messages appear when
a snapshot is taken (e.g. dump -L) on an arbitrary UFS partition. In all
of these cases vfs_write_suspend() was called and returned EAGAIN (the
error=35 above). vop_stdfsync() sets this error code when it gives up
and prints the messages above.

At first I was confused by the "mountedhere" address, because the given
address does not point to a "struct mount" but (as type VCHR
indicates) to a "struct cdev". Therefore I suggest the following patch,
which improves the output of vn_printf() by using the names from the
defines in /sys/sys/vnode.h:

--- vfs_subr.c.orig     2017-05-08 14:17:38.000000000 +0200
+++ vfs_subr.c  2017-08-30 10:45:47.549740000 +0200
@@ -3003,6 +3003,8 @@
 static char *typename[] =
 {"VNON", "VREG", "VDIR", "VBLK", "VCHR", "VLNK", "VSOCK", "VFIFO", "VBAD",
  "VMARKER"};
+static char *typetext[] =
+{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};

 void
 vn_printf(struct vnode *vp, const char *fmt, ...)
@@ -3016,8 +3018,9 @@
        va_end(ap);
        printf("%p: ", (void *)vp);
        printf("tag %s, type %s\n", vp->v_tag, typename[vp->v_type]);
-       printf("    usecount %d, writecount %d, refcount %d mountedhere %p\n",
-           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, vp->v_mountedhere);
+       printf("    usecount %d, writecount %d, refcount %d %s %p\n",
+           vp->v_usecount, vp->v_writecount, vp->v_holdcnt, typetext[vp->v_type],
+           vp->v_mountedhere);
        buf[0] = '\0';
        buf[1] = '\0';
        if (vp->v_vflag & VV_ROOT)
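
The idea of the patch can be tried outside the kernel. The following is
only a userland sketch, not kernel code: the enum repeats the order of
the typename[] table above, and type_label() and format_counts() are
hypothetical helper names chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same order as the typename[] table in the patch above. */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD, VMARKER };

/* Label for the pointer printed after the counters, indexed by type. */
static const char *const type_text[] =
{"", "", "mountedhere", "", "rdev", "", "socket", "fifoinfo", "", ""};

#define NTYPES (sizeof(type_text) / sizeof(type_text[0]))

/* Hypothetical helper: return the label that fits the vnode type. */
static const char *
type_label(enum vtype type)
{
	assert((size_t)type < NTYPES);
	return (type_text[type]);
}

/*
 * Hypothetical helper: build the counter line the way the patched
 * vn_printf() would, with a type-dependent label instead of a fixed
 * "mountedhere". Returns the snprintf() result.
 */
static int
format_counts(char *buf, size_t len, enum vtype type, int usecount,
    int writecount, int holdcnt, const void *ptr)
{
	return (snprintf(buf, len,
	    "usecount %d, writecount %d, refcount %d %s %p",
	    usecount, writecount, holdcnt, type_label(type), ptr));
}
```

For a VCHR vnode the line is then labeled "rdev", for a VDIR vnode it
stays "mountedhere", and for types without an entry the label is empty.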

Second, I found that the "dirty" situation during vfs_write_suspend()
only occurs when a big file (more than 10G on a partition of 116G) is
removed. If vfs_write_suspend() is called immediately after "rm
bigfile", vop_stdfsync() retries up to 1000 times (maxretry) while
waiting for the "rm bigfile" to complete. Because a lot of bitmap writes
must be done, 1000 retries are not sufficient on my servers. I increased
maxretry, and in the worst case I saw 8650 tries before the vnode was no
longer dirty. In that case the time spent in vop_stdfsync() was about
0.5 seconds. The following patch solves the "dirty" problem for me:

--- vfs_default.c.orig  2016-10-24 12:26:57.000000000 +0200
+++ vfs_default.c       2017-09-08 12:49:18.059970000 +0200
@@ -644,7 +644,7 @@
        struct bufobj *bo;
        struct buf *nbp;
        int error = 0;
-       int maxretry = 1000;     /* large, arbitrarily chosen */
+       int maxretry = 100000;   /* large, arbitrarily chosen */

        bo = &vp->v_bufobj;
        BO_LOCK(bo);
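
The effect of the retry limit can be illustrated with a small userland
model. This is only a sketch under simplifying assumptions, not the code
of vop_stdfsync(): struct fake_bufobj, flush_pass() and bounded_fsync()
are hypothetical names, and the model assumes that the running "rm"
dirties exactly one more buffer after every flush pass until it is done.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vnode's buffer object; not a kernel type. */
struct fake_bufobj {
	int dirty;	/* buffers dirty right now */
	int pending;	/* buffers the running "rm" will still dirty */
};

/*
 * One flush pass: write out everything that is dirty, then let the
 * concurrent removal dirty one more buffer if it has work left.
 */
static void
flush_pass(struct fake_bufobj *bo)
{
	bo->dirty = 0;
	if (bo->pending > 0) {
		bo->pending--;
		bo->dirty = 1;
	}
}

/*
 * Flush until clean or until the retry budget is used up.
 * Returns 0 on success and EAGAIN when giving up; *passes receives
 * the number of flush passes that were made.
 */
static int
bounded_fsync(struct fake_bufobj *bo, int maxretry, int *passes)
{
	assert(maxretry >= 0);
	*passes = 0;
	for (;;) {
		flush_pass(bo);
		(*passes)++;
		if (bo->dirty == 0)
			return (0);
		if (--maxretry < 0)
			return (EAGAIN);	/* "giving up on dirty" */
	}
}
```

With pending set to 8650 (the worst case I observed), a budget of 1000
ends in EAGAIN in this model, while a budget of 100000 runs to completion.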

---
Andreas Longwitz