panic(): vinvalbuf: dirty bufs: perhaps a ffs_syncvnode bug?

Thu Nov 16 00:23:59 PST 2006

Hi people,

At work, some storage is put on a SUN StorEdge 3510 FibreChannel array;
the disks are divided into two volumes, which are mounted using isp(4)
controllers. Two seperate machines each mount a single volume (this is,
box1 mounts the first volume (/dev/da1), and box2 mounts the second
volume (/dev/da2)). Both volumes are formatted using UFS2.

Over the night, we reset the shelf in order to activate its new
management IP address, causing the /dev/da[12] devices to be temporarily
unavailable. This resulted in the following panic on the rather busy
mailstorage server (the other server has minor load and was fine):

---
(da0:isp0:0:1:0): lost device
(da0:isp0:0:1:0): removing device entry
(da1:isp0:0:2:0): lost device
g_vfs_done():da1s1[WRITE(offset=292316823552, length=16384)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=12175362048, length=2048)]error = 6
g_vfs_done():da1s1[WRITE(offset=240287318016, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=18370689024, length=2048)]error = 6
g_vfs_done():da1s1[READ(offset=25829486592, length=512)]error = 6
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 78035 (lmtpd)
g_vfs_done():da1s1[WRITE(offset=240287318016,
length=1638(da1:isp0:0:2:0): Invalidating pack
4)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_done():da1s1[READ(offset=102126977024, length=16384)]error = 6
g_vfs_done():da1s1[READ(offset=13768671232, length=6144)]error = 6
g_vfs_dpone():da1s1[READ(offset=102319669248, length=16384)]error = 6a
nic: vinvalbuf: dirty bufs
cpuid = 2
Uptime: 54d15h48m38s

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x56
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0681303
stack pointer           = 0x28:0xe8d973f0
frame pointer           = 0x28:0xe8d973f8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 78066 (lmtpd)
trap number             = 12
---

When looking at the source code of vinvalbuf(), which calls
bufobj_invalbuf(), it seems that this panic is raised after a bufobj
still contains dirty data after waiting for it to complete without
error. The code can be found at /sys/kern/vfs_subr.c

The sync routine called eventually translates to bufsync(), as in
/sys/kern/vfs_bio.c, which calls the filesystem's sync routine. It seems
as if the return status of vfs_bio_awrite() in ffs_syncvnode() is not
checked; all the other parts are checked. I believe this could provoke
this panic.

As the machine is in production use, it was instantly rebooted by a
collegue and thus I have no vmcore, backtrace or anything. I therefore
hope the information provided here is adequate.

Can someone with more FreeBSD-VFS knowledge please look at this?

Thanks!

-- 
Rink P.W. Springer                                - http://rink.nu
"It's you isn't it? THE BASTARD OPERATOR FROM HELL!"
"In the flesh, on the phone and in your account..."       - BOFH #3
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 2415 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20061116/91575736/smime-0001.bin