panic: mutex Giant owned at /tank/usr/src/sys/kern/kern_thread.c:357

Thu Dec 31 17:36:12 UTC 2009

On Dec 31, 2009, at 5:49 AM, John Baldwin wrote:

> On Wednesday 30 December 2009 4:55:44 pm Marcel Moolenaar wrote:
>> All,
>> 
>> We still have a ZFS-triggerable panic. The conditions under which the panic
>> happens are "simple":
>> 
>> 1.  Create a mount-point /dos, and mount an MS-DOS file system
>>    there.
>> 2.  Create directory /dos/zfs
>> 3.  Make /boot/zfs a symlink to /dos/zfs
>> 4.  Create or import a pool, like "zpool import tank"
>> 
>> ZFS will create/update the zpool cache (/boot/zfs/zpool.cache)
>> and when done exits the zfskern/solthread thread, at which time
>> the panic happens:
>> 
>> panic: mutex Giant owned at /tank/usr/src/sys/kern/kern_thread.c:357
>> cpuid = 0
>> KDB: enter: panic
>> [thread pid 8 tid 100147 ]
>> Stopped at      kdb_enter+0x92: [I2]    addl r14=0xffffffffffe1f3f0,gp ;;
>> db> show alllocks
>> Process 8 (zfskern) thread 0xe000000010df4a20 (100147)
>> exclusive sleep mutex process lock (process lock) r = 0 (0xe000000010407660) locked @ /tank/usr/src/sys/kern/kern_kthread.c:326
>> exclusive sleep mutex Giant (Giant) r = 1 (0xe0000000048f8da8) locked @ /tank/usr/src/sys/kern/vfs_lookup.c:755
>> 
>> It looks to me that this is a bug in vfs_lookup.c, but I'm not
>> savvy enough to know this for sure or fix it fast myself. Help
>> is welcome, because this particular bug hits ia64 hard: /boot
>> is a symlink to /efi/boot, where /efi is a msdosfs mount point.
> 
> Can you get a stack trace?  The bug is probably that ZFS isn't properly 
> honoring NDHASGIANT() someplace.  Hmm, it certainly doesn't honor it
> in lookupnameat().  You could maybe have it unlock Giant there, but I
> believe that will result in ZFS not acquiring Giant for any vnode
> operations on a returned vnode from a !MPSAFE filesystem.
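
To make the convention John describes concrete, here is a user-space
sketch, not kernel code. It assumes a lookup that returns with Giant held
when the file system is not MPSAFE and reports that fact to the caller, in
the spirit of NDHASGIANT(). Every name in it (sim_giant, sim_lookup,
caller_honoring, caller_ignoring, sim_exit_ok) is hypothetical and exists
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a recursive mutex such as Giant. */
struct sim_giant {
	int depth;	/* recursion depth held by the one simulated thread */
};

static void
sim_lock(struct sim_giant *g)
{
	g->depth++;
}

static void
sim_unlock(struct sim_giant *g)
{
	assert(g->depth > 0);	/* an unlock needs a prior lock */
	g->depth--;
}

/*
 * Simulated lookup (assumption): for a file system that is not MPSAFE
 * it returns with Giant held and says so through its return value.
 */
static bool
sim_lookup(struct sim_giant *g, bool fs_is_mpsafe)
{
	if (fs_is_mpsafe)
		return (false);
	sim_lock(g);
	return (true);
}

/* Caller that honors the flag and drops Giant when it was taken. */
static void
caller_honoring(struct sim_giant *g, bool fs_is_mpsafe)
{
	bool giant_held = sim_lookup(g, fs_is_mpsafe);

	/* ... operations on the returned vnode would go here ... */
	if (giant_held)
		sim_unlock(g);
}

/* Caller that ignores the flag, so Giant stays held on return. */
static void
caller_ignoring(struct sim_giant *g, bool fs_is_mpsafe)
{
	(void)sim_lookup(g, fs_is_mpsafe);
}

/* Models the check made at thread exit: Giant must not be owned. */
static bool
sim_exit_ok(const struct sim_giant *g)
{
	return (g->depth == 0);
}
```

In this model a thread that went through caller_ignoring() on a !MPSAFE
file system fails sim_exit_ok(), which corresponds to the assertion that
fires in thread_exit().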

The backtrace is rather useless:

# zpool import tank
panic: mutex Giant owned at /tank/usr/src/sys/kern/kern_thread.c:357
cpuid = 1
KDB: enter: panic
[thread pid 8 tid 100105 ]
Stopped at      kdb_enter+0x92: [I2]    addl r14=0xffffffffffe1fab8,gp ;;
db> bt
Tracing pid 8 tid 100105 td 0xe0000000109e1560
kdb_enter(0xe0000000047984c0, 0xe0000000047984c0, 0xe00000000439bb70, 0x793) at kdb_enter+0x92
panic(0xe000000004796058, 0xe000000004796728, 0xe000000004799b48, 0x165) at panic+0x2f0
_mtx_assert(0xe000000004911828, 0x0, 0xe000000004799b48, 0x165) at _mtx_assert+0x200
thread_exit(0xe000000004799b48, 0x0, 0xe000000004793480, 0xe0000000109e1560) at thread_exit+0x70
kthread_exit(0xe000000004793480, 0xe000000010407568, 0xe000000004c7bb80, 0x58f) at kthread_exit+0xd0
spa_async_thread(0xe000000010e59000, 0x1, 0xe000000004791aa8, 0x343) at spa_async_thread+0x1a0
fork_exit(0xe00000000485f130, 0xe000000010e59000, 0xa000000034b61550) at fork_exit+0x110
enter_userland() at enter_userland

I traced the locks (with a tweak to get Giant included) and it
looks like vfs_lookup.c is fine: there are as many unlocks of
Giant as there are locks.

Unfortunately the default trace buffer size doesn't capture
the problem entirely. Though it may show a little bit already:

	:
882 (0xe0000000109e1560:cpu1): _mtx_unlock_sleep: 0xe000000004911828 unrecurse
881 (0xe0000000109e1560:cpu1): UNLOCK (sleep mutex) Giant 0xe000000004911828 r = 2 at /tank/usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/sys/vnode.h:266
	:
861 (0xe0000000109e1560:cpu1): vfs_ref: mp 0xe0000000108e0ed8
860 (0xe0000000109e1560:cpu1): LOCK (sleep mutex) Giant 0xe000000004911828 r = 2 at /tank/usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/sys/vnode.h:264
859 (0xe0000000109e1560:cpu1): _mtx_lock_sleep: 0xe000000004911828 recursing
858 (0xe0000000109e1560:cpu1): _mtx_unlock_sleep: 0xe000000004911828 unrecurse
857 (0xe0000000109e1560:cpu1): UNLOCK (sleep mutex) Giant 0xe000000004911828 r = 2 at /tank/usr/src/sys/kern/vfs_syscalls.c:3675
856 (0xe0000000109e1560:cpu1): _mtx_unlock_sleep: 0xe000000004911828 unrecurse
855 (0xe0000000109e1560:cpu1): UNLOCK (sleep mutex) Giant 0xe000000004911828 r = 3 at /tank/usr/src/sys/kern/vfs_syscalls.c:3674
	:
(some sleep operations that result in repeated unlocks and
 locks of Giant)
	:

When the trace starts, Giant has been locked 4 times (recursively), and
the only unmatched unlocks are the two in vfs_syscalls.c shown above.
Thus we're still missing 2 unlocks of Giant somewhere, for which the
lock location is not known.
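
The accounting above can be sketched as a small user-space program. It is
only an illustration: the type and function names are hypothetical, the
file paths are abbreviated, and it assumes the elided sleep-related
lock/unlock pairs in the trace are balanced, so only the four entries
quoted above change the depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event type: one LOCK or UNLOCK line from the trace. */
enum giant_op { GIANT_LOCK, GIANT_UNLOCK };

struct giant_event {
	enum giant_op op;
	const char *where;	/* abbreviated file:line from the trace */
};

/*
 * Replay events in chronological order, starting from the recursion
 * depth Giant had before the first captured entry, and return the
 * depth that remains afterwards.
 */
static int
giant_depth_after(int start_depth, const struct giant_event *ev, size_t n)
{
	int depth = start_depth;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].op == GIANT_LOCK) {
			depth++;
		} else {
			assert(depth > 0);	/* an unlock needs a prior lock */
			depth--;
		}
	}
	return (depth);
}

/* The four LOCK/UNLOCK entries quoted above, oldest (855) first. */
static const struct giant_event excerpt[] = {
	{ GIANT_UNLOCK, "vfs_syscalls.c:3674" },
	{ GIANT_UNLOCK, "vfs_syscalls.c:3675" },
	{ GIANT_LOCK,   "vnode.h:264" },
	{ GIANT_UNLOCK, "vnode.h:266" },
};
```

Calling giant_depth_after(4, excerpt, 4) leaves a depth of 2: the
vnode.h pair cancels out, and the two vfs_syscalls.c unlocks account for
only two of the four holds.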

I'll redo the experiment with a 128K entry trace buffer or so and
see what comes up...

BTW: Xin LI gave me a patch with 2 missing unlocks of Giant in zfs_dir.c.
It seems ZFS is rather sloppy WRT Giant :-/

FYI,

-- 
Marcel Moolenaar
xcllnt at mac.com