panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?

Mon Nov 27 06:54:07 PST 2006

On Thu, Nov 16, 2006 at 09:24:07AM +0100, Rink Springer wrote:
> Over the night, we reset the shelf in order to activate its new
> management IP address, causing the /dev/da[12] devices to be temporarily
> unavailable. This resulted in the following panic on the rather busy
> mailstorage server (the other server has minor load and was fine):

> ---
> (da0:isp0:0:1:0): lost device
> (da0:isp0:0:1:0): removing device entry
> (da1:isp0:0:2:0): lost device
> g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
> g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
> g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 78035 (lmtpd)
> g_vfs_done():da1s1[WRITE(offset=240287318016,
> length=1638(da1:isp0:0:2:0): Invalidating pack
> 4)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
> g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
> g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
> nic: vinvalbuf: dirty bufs
> cpuid = 2
> Uptime: 54d15h48m38s

> When looking at the source code of vinvalbuf(), which calls
> bufobj_invalbuf(), it seems that this panic is raised after a bufobj
> still contains dirty data after waiting for it to complete without
> error. The code can be found at /sys/kern/vfs_subr.c

Note that this panic can only occur if vinvalbuf() is called with
V_SAVE (save modified data first).

The exact condition for the panic is better described as: a bufobj still
contains dirty data or still has output in progress after a successful
synchronous BO_SYNC operation.  bufobj_wwait() cannot return an error
unless msleep() fails (e.g. interruptible sleep requested via slpflag
and signal occured).  If the I/O has failed, bufobj_wwait() will return
success.

> The sync routine called eventually translates to bufsync(), as in
> /sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It seems
> as if the return status of vfs_bio_awrite() in ffs_syncvnode() is not
> checked; all the other parts are checked. I believe this could provoke
> this panic.

There does not seem much point in checking an asynchronous write result
anyway, as the I/O is not completed yet.

I don't understand well what the code is doing with async writes.  For
all but the last pass (see further), it will call bawrite() on the
buffer, which sets B_ASYNC then calls bwrite(). For the last pass, it
calls bwrite() directly (has something cleared B_ASYNC?), and returns an
error if it fails. bwrite() itself is an inline function defined in
/sys/sys/buf.h, which calls BO_WRITE after some KASSERTs.

> As the machine is in production use, it was instantly rebooted by a
> collegue and thus I have no vmcore, backtrace or anything. I therefore
> hope the information provided here is adequate.

> Can someone with more FreeBSD-VFS knowledge please look at this?

There is another possible problem, from this comment in
/sys/ufs/ffs/ffs_vnops.c ffs_syncvnode():
/*
 * Block devices associated with filesystems may
 * have new I/O requests posted for them even if
 * the vnode is locked, so no amount of trying will
 * get them clean. Thus we give block devices a
 * good effort, then just give up. For all other file
 * types, go around and try again until it is clean.
 */
Actually it just does NIADDR + 1 (four) passes and then gives up.  If
DIAGNOSTIC is enabled, it will then print the affected vnode, if it is
not a disk.  This failure is not reflected in ffs_syncvnode()'s return
value, so if it occurs when ffs_syncvnode() is called from
bufobj_invalbuf(), a panic will result.

Suppose ffs_syncvnode() would be changed to return some error in this
case.  bufobj_invalbuf()/vinvalbuf() will handle a BO_SYNC/ffs_syncvnode()
error by aborting with an error return.  It seems that in most cases
this will cause the operation invoking the vinvalbuf() to fail.
However, in at least one case (vm_object_terminate()), the error will be
ignored; this may cause old garbage/dangling references?

-- 
Jilles Tjoelker