[sparc64] [ZFS] panic: mutex vnode interlock not owned

Sat Jul 3 08:55:17 UTC 2010

(Hello freebsd-fs@; I'm cc:ing you since the latest part of my story
involves a ZFS-related panic and I hear you're the right place to go with
those.  It began as an attempt to debug a VM locking panic and has moved on
a little...)

On Thu, Jun 10, 2010 at 12:23:24PM -0500, Alan Cox wrote:
> On Thu, Jun 10, 2010 at 7:16 AM, John Baldwin <jhb at freebsd.org> wrote:
> 
> > On Wednesday 09 June 2010 5:27:47 pm Nathaniel W Filardo wrote:
> > > Attempting to boot on (2-way SMP; SUN Fire V240) sparc64 a 9.0-CURRENT
> > > kernel built on Jun 9 at 14:41, and fully csup'd before building (I don't
> > > have the SVN revision number, sorry) yields, surprisingly late in the
> > > boot process, this panic:
> > >
> > > panic: mutex vm object not owned at /systank/src/sys/vm/vm_object.c:1692
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > panic() at panic+0x1c8
> > > _mtx_assert() at _mtx_assert+0xb0
> > > vm_object_collapse() at vm_object_collapse+0x28
> > > vm_object_deallocate() at vm_object_deallocate+0x538
> > > _vm_map_unlock() at _vm_map_unlock+0x64
> > > vm_map_remove() at vm_map_remove+0x64
> > > vmspace_exit() at vmspace_exit+0x100
> > > exit1() at exit1+0x788
> > > sys_exit() at sys_exit+0x10
> > > syscallenter() at syscallenter+0x268
> > > syscall() at syscall+0x74
> > > -- syscall (1, FreeBSD ELF64, sys_exit) %o7=0x11980c --
> > > userland() at 0x406fe8c8
> > > user trace: trap %o7=0x11980c
> > > pc 0x406fe8c8, sp 0x7fdffff7611
> > > done
> > > Uptime: 4m7s
> > >
> > > The system was, at the time, attempting to bring up its jails.
> > >
> > > Anything else that would be helpful to know?
> >
> > Can you get a crashdump?  If so, it would be good to pull up gdb and
> > check the values of 'object' and 'robject' in the
> > vm_object_deallocate() frame.
> >
> >
> That would be useful.  None of the locking changes of the last few weeks
> have altered the vm object locking, so this assertion failure and stack
> trace come as something of a surprise.
> 
> Alan

Well, I thought that no longer delegating ZFS (with "zfs jail") to the jail
whose startup was causing the above panic might solve the problem, and
indeed the system made it slightly further.  A few minutes after reaching
the login: prompt, though, it produced

panic: mutex vnode interlock not owned at /systank/src/sys/kern/kern_mutex.c:223
cpuid = 0
KDB: stack backtrace:
panic() at panic+0x1c8
_mtx_assert() at _mtx_assert+0xb0
_mtx_unlock_flags() at _mtx_unlock_flags+0x144
vnlru_free() at vnlru_free+0x500
getnewvnode() at getnewvnode+0x7c
zfs_znode_cache_constructor() at zfs_znode_cache_constructor+0x4c
zfs_znode_alloc() at zfs_znode_alloc+0x34
zfs_zget() at zfs_zget+0x2b8
zfs_dirent_lock() at zfs_dirent_lock+0x508
zfs_dirlook() at zfs_dirlook+0x50
zfs_lookup() at zfs_lookup+0x1bc
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6c
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x108
vfs_cache_lookup() at vfs_cache_lookup+0xfc
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x110
lookup() at lookup+0x7d0
namei() at namei+0x69c
kern_statat_vnhook() at kern_statat_vnhook+0x48
kern_statat() at kern_statat+0x1c
kern_lstat() at kern_lstat+0x18
lstat() at lstat+0x14
syscallenter() at syscallenter+0x27c
syscall() at syscall+0x74
-- syscall (190, FreeBSD ELF64, lstat) %o7=0x12b830 --
...

which at least is consistent with my hunch that the original panic had
something to do with ZFS.  The sources are as of svn r209653 (git c65b199...)
with http://people.freebsd.org/~marius/sparc64_pin_ipis.diff applied.  The
old kernel has uname
  FreeBSD hydra.priv.oc.ietfng.org 9.0-CURRENT FreeBSD 9.0-CURRENT #20: Sun
  Apr  4 20:31:58 EDT 2010
  root at hydra.priv.oc.ietfng.org:/systank/obj/systank/src/sys/NWFKERN  sparc64
which is probably too old to be of use to anybody, but just in case, there
it is.  I don't suspect bad hardware, since this old kernel apparently runs
fine on the machine and zpool scrubs haven't found anything yet.

I can't easily get a crash dump on the system (if somebody could tell me how
to get one from a ddb(4) prompt, I could try that, but otherwise the system
just ceases to do anything after the panic; I have swap and a dump device
set, so I'm not sure what's going wrong there...).

Anything more I should do?
--nwf;