ZFS Deadlock: 'Directory of Death'

Ori Bernstein ori at eigenstate.org
Sun Jul 10 18:36:43 UTC 2016


Looks like it, down to very similar stack traces.
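
A rough sketch of how the traces quoted below could be filtered when comparing them against the PR: it parses `procstat -kk` text and lists the threads asleep in the lockmgr path (`sleeplk` plus `__lockmgr_args`), along with the first few callers above `_vn_lock`. The helper names `parse_procstat_kk` and `vnode_lock_waiters` are made up for illustration, and the sample is three of the thread lines from this report. It assumes each KSTACK lists the innermost frame first, as in the output below.

```python
import re

# Frames that together indicate a thread is asleep waiting on a lockmgr lock.
LOCK_WAIT_FRAMES = ("sleeplk", "__lockmgr_args")

# A stack frame token looks like "function_name+0xoffset".
FRAME_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+$")


def parse_procstat_kk(text):
    """Yield (pid, tid, frames) for each thread line of `procstat -kk` output.

    Hypothetical helper: COMM and TDNAME are skipped simply because they do
    not look like "name+0xoffset" tokens.
    """
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue  # header line or blank line
        frames = [m.group(1) for m in map(FRAME_RE.match, tokens[2:]) if m]
        yield int(tokens[0]), int(tokens[1]), frames


def vnode_lock_waiters(text, depth=3):
    """Return (pid, tid, callers) for threads blocked in the lockmgr path.

    `callers` holds up to `depth` frames just above _vn_lock, i.e. the code
    that asked for the vnode lock.
    """
    waiters = []
    for pid, tid, frames in parse_procstat_kk(text):
        if all(f in frames for f in LOCK_WAIT_FRAMES):
            start = frames.index("_vn_lock") + 1 if "_vn_lock" in frames else 0
            waiters.append((pid, tid, frames[start:start + depth]))
    return waiters


# Three thread lines copied from the report: one idle in kevent, two stuck on vnode locks.
SAMPLE = """\
PID    TID COMM             TDNAME           KSTACK
93931 101977 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent_fp+0x399 kern_kevent+0x9f sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
93931 102285 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vputx+0x21f zfs_rename_unlock+0x3e zfs_freebsd_rename+0xe39 VOP_RENAME_APV+0xab kern_renameat+0x4a6 amd64_syscall+0x40f Xfast_syscall+0xfb
93931 102287 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 zfs_lookup+0x45d zfs_freebsd_lookup+0x6d VOP_CACHEDLOOKUP_APV+0xa1 vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_renameat+0x1b7 amd64_syscall+0x40f Xfast_syscall+0xfb
"""

if __name__ == "__main__":
    for pid, tid, callers in vnode_lock_waiters(SAMPLE):
        print(pid, tid, " <- ".join(callers))
```

On a live system the same functions could be fed the captured output of `procstat -kk <pid>` instead of `SAMPLE`. This only summarizes who is waiting; it does not identify which thread holds the lock.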

On Sun, 10 Jul 2016 14:07:28 -0400, Allan Jude <allanjude at freebsd.org> wrote:

> On 2016-07-10 13:54, Ori Bernstein wrote:
> > I've got a fun issue: There's a directory of death on my desktop running ZFS.
> > It showed up while installing ports: any program that tries to touch a directory
> > in /usr/ports hangs, deadlocked in the VFS cache lookup:
> > 
> > 	$ sudo procstat -kk 94688
> > 	PID    TID COMM             TDNAME           KSTACK                       
> > 	94688 102697 fuser            -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_stat+0x2d amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 
> > 
> > The process that seems to have triggered this was a port install of NPM, which seems to have
> > deadlocked while renaming some files:
> > 
> > 	$ ps a 93931
> > 	93931 15- D      0:11.87 npm (node)
> > 
> > 	$ sudo procstat -kk 93931
> > 	PID    TID COMM             TDNAME           KSTACK                       
> > 	93931 101449 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 101977 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent_fp+0x399 kern_kevent+0x9f sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102252 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102258 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102262 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102266 node             V8 WorkerThread  mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_sem_wait+0x4db __umtx_op_sem_wait+0x73 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102279 node             -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102285 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vputx+0x21f zfs_rename_unlock+0x3e zfs_freebsd_rename+0xe39 VOP_RENAME_APV+0xab kern_renameat+0x4a6 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102287 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 zfs_lookup+0x45d zfs_freebsd_lookup+0x6d VOP_CACHEDLOOKUP_APV+0xa1 vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_renameat+0x1b7 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 	93931 102294 node             -                mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_renameat+0x1b7 amd64_syscall+0x40f Xfast_syscall+0xfb 
> > 
> > The specific directory that is causing this issue is:
> > 
> > 	/usr/ports/www/npm/work/stage/usr/local/lib/node_modules/npm/node_modules/npmlog/node_modules/gauge
> > 
> > Right now I'm running on a stock FreeBSD 10.3 release.
> > 
> > 	FreeBSD oneeye 10.3-RELEASE-p4 FreeBSD 10.3-RELEASE-p4 #0: Sat May 28 12:23:44 UTC 2016     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> > 
> > I'm going to guess that this deadlock will go away if I reboot this box, but
> > I haven't rebooted yet. Since this isn't really impacting anything important
> > that I'm doing, I can keep it up for a bit if someone wants me to gather some
> > extra information to track down the root cause.
> > 
> 
> It looks like your problem is similar to:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209158
> 
> If you read that PR and find your problem is similar, please add your
> information to that post.
> 
> 
> -- 
> Allan Jude
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"


-- 
    Ori Bernstein

