ZFS hang in zfs_freebsd_rename
Bengt Ahlgren
bengta at sics.se
Tue Dec 15 12:52:39 UTC 2015
We have a server running 9.3-REL which currently has two quite large ZFS
pools:
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
p1    18.1T  10.7T  7.38T  59%  1.00x  ONLINE  -
p2    43.5T  29.1T  14.4T  66%  1.00x  ONLINE  -
It has been running without any issues for some time now. Just now,
however, processes have started getting stuck, and become impossible to
kill, when accessing a particular directory in the p2 pool. That pool is
a 2x6 disk raidz2.
One process is stuck in zfs_freebsd_rename, and other processes
accessing that particular directory also get stuck. The system is now
almost completely idle.
Output from kgdb on the running system for that first process:
Thread 651 (Thread 102157):
#0 sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920, flags=<value optimized out>)
at /usr/src/sys/kern/sched_ule.c:1904
#1 0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
#2 0xffffffff809308e2 in sleepq_wait (wchan=0xfffffe0135b60488, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:618
#3 0xffffffff808cf922 in __lockmgr_args (lk=0xfffffe0135b60488, flags=524544, ilk=0xfffffe0135b604b8,
wmesg=<value optimized out>, pri=<value optimized out>, timo=<value optimized out>,
file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at /usr/src/sys/kern/kern_lock.c:221
#4 0xffffffff80977369 in vop_stdlock (ap=<value optimized out>) at lockmgr.h:97
#5 0xffffffff80dd4a04 in VOP_LOCK1_APV (vop=0xffffffff813e8160, a=0xffffffa07f935520) at vnode_if.c:2052
#6 0xffffffff80998c17 in _vn_lock (vp=0xfffffe0135b603f0, flags=524288,
file=0xffffffff80f0d782 "/usr/src/sys/kern/vfs_subr.c", line=2337) at vnode_if.h:859
#7 0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1) at /usr/src/sys/kern/vfs_subr.c:2337
#8 0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3609
#9 0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4039
#10 0xffffffff80dd4f04 in VOP_RENAME_APV (vop=0xffffffff81b47d40, a=0xffffffa07f9358e0) at vnode_if.c:1522
#11 0xffffffff80996bbd in kern_renameat (td=<value optimized out>, oldfd=<value optimized out>,
old=<value optimized out>, newfd=-100, new=0x1826a9af00 <Error reading address 0x1826a9af00: Bad address>,
pathseg=<value optimized out>) at vnode_if.h:636
#12 0xffffffff80cd228a in amd64_syscall (td=0xfffffe0b14059920, traced=0) at subr_syscall.c:135
#13 0xffffffff80cbc907 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#14 0x0000000800cc1acc in ?? ()
Previous frame inner to this frame (corrupt stack?)
The full output of procstat -kk -a and kgdb "thread apply all bt" can be found here:
https://www.sics.se/~bengta/ZFS-hang/
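For anyone sifting through the full "thread apply all bt" output, here is a minimal sketch of a filter that lists the threads whose stack contains a given function. It assumes the output has been saved as plain text and that the lines look like the excerpt above. The function names stacks_by_thread and threads_in are made up for this illustration, and SAMPLE is just a shortened copy of the backtrace shown in this message.

```python
import re
from collections import defaultdict

# Header lines printed by kgdb, e.g. "Thread 651 (Thread 102157):".
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (\d+)\):")
# Frame lines, e.g. "#9 0xffffffff81ac8c72 in zfs_freebsd_rename (ap=...)".
# Frame #0 has no address, so the "0x... in " part is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")

# Shortened copy of the backtrace quoted in this message.
SAMPLE = """\
Thread 651 (Thread 102157):
#0 sched_switch (td=0xfffffe0b14059920, newtd=0xfffffe001633e920, flags=<value optimized out>)
at /usr/src/sys/kern/sched_ule.c:1904
#1 0xffffffff808f4604 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
#7 0xffffffff8098b621 in vputx (vp=0xfffffe0135b603f0, func=1) at /usr/src/sys/kern/vfs_subr.c:2337
#8 0xffffffff81ac7955 in zfs_rename_unlock (zlpp=0xffffffa07f9356b8)
#9 0xffffffff81ac8c72 in zfs_freebsd_rename (ap=<value optimized out>)
#14 0x0000000800cc1acc in ?? ()
"""


def stacks_by_thread(text):
    """Return {thread id: [function names, innermost frame first]}."""
    stacks = defaultdict(list)
    tid = None
    for line in text.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            tid = int(header.group(2))  # the id inside the parentheses
            continue
        frame = FRAME_RE.match(line)
        if frame and tid is not None:
            stacks[tid].append(frame.group(2))
    return dict(stacks)


def threads_in(text, function):
    """Return the sorted thread ids whose stack contains `function`."""
    return sorted(
        tid for tid, frames in stacks_by_thread(text).items() if function in frames
    )


print(threads_in(SAMPLE, "zfs_freebsd_rename"))  # prints [102157]
```

To run it against the full dump, read the saved file into a string and pass that in place of SAMPLE. Continuation lines ("at /usr/src/...") are skipped because they match neither pattern.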
I don't know how to produce "alltrace in ddb" as the instructions in the
wiki say. The machine runs the GENERIC kernel, so perhaps it isn't possible?
I checked "camcontrol tags" for all the disks in the pool - all have
zeroes for dev_active, devq_queued and held.
Is there anything else I can check while the machine is up? I need to
restart it pretty soon, however.
Bengt