Process enters unkillable state and somewhat wedges zfs

John Baldwin jhb at freebsd.org
Wed Aug 27 20:43:09 UTC 2014


On Monday, August 25, 2014 1:25:01 pm Daniel Andersen wrote:
> On 08/20/2014 11:06 AM, Bryan Drewery wrote:
> > On 8/18/2014 2:29 PM, Daniel Andersen wrote:
> >> We are currently experiencing a strange problem that sort of locks up one of our zfs pools.  This is on a FreeBSD 10
> >> machine.  Let me give a rough layout of our system to better describe what is happening:
> > 
> > When it happens get the output of 'procstat -kka|grep zfs' please.
> > 
> 
> I'd read something about that while attempting to debug the last few times, so I happen to have that output, I believe.
> Here are the last two examples ( the latter having had more wedged processes ):
> 
>     0 100687 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>     0 101055 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>    38 100306 zfskern          arc_reclaim_thre mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe
>    38 100307 zfskern          l2arc_feed_threa mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe
>    38 100672 zfskern          trim tank        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>    38 100688 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>    38 100689 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
>    38 101054 zfskern          trim work        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>    38 101056 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>    38 101057 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
> 16656 101610 ls               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb
> 
> 
> and
> 
>     0 100688 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>     0 101056 kernel           zfs_vn_rele_task mi_switch+0xde sleepq_wait+0x3a _sleep+0x26f taskqueue_thread_loop+0xd5 fork_exit+0x9a fork_trampoline+0xe
>    38 100306 zfskern          arc_reclaim_thre mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d arc_reclaim_thread+0x302 fork_exit+0x9a fork_trampoline+0xe
>    38 100307 zfskern          l2arc_feed_threa mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d l2arc_feed_thread+0xb29 fork_exit+0x9a fork_trampoline+0xe
>    38 100673 zfskern          trim tank        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>    38 100689 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>    38 100690 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d zio_wait+0x5b vdev_uberblock_sync_list+0xad vdev_config_sync+0x118 spa_sync+0x827 txg_sync_thread+0x375 fork_exit+0x9a fork_trampoline+0xe
>    38 101055 zfskern          trim work        mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d trim_thread+0x9b fork_exit+0x9a fork_trampoline+0xe
>    38 101057 zfskern          txg_thread_enter mi_switch+0xde sleepq_wait+0x3a _cv_wait+0x16d txg_quiesce_thread+0x30b fork_exit+0x9a fork_trampoline+0xe
>    38 101058 zfskern          txg_thread_enter mi_switch+0xde sleepq_timedwait+0x3a _cv_timedwait_sbt+0x18d txg_sync_thread+0x1dd fork_exit+0x9a fork_trampoline+0xe
>  2638 101160 nfsd             nfsd: service    mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_vget+0xfe nullfs_vget+0x54 nfsrvd_readdirplus+0x905 nfsrvd_dorpc+0x773 nfssvc_program+0x4f6 svc_run_internal+0x1f9 svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe
>  3891 101641 collectd         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
> 11808 101802 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb
> 13900 101899 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb
> 17961 102536 df               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
> 38964 102279 ls               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_lstat+0x30 amd64_syscall+0x357 Xfast_syscall+0xfb
> 42315 102539 df               -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statfs+0x6f sys_statfs+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb
> 42335 101782 sendsize         -                mi_switch+0xde sleepq_wait+0x3a sleeplk+0x11c __lockmgr_args+0x950 vop_stdlock+0x3c VOP_LOCK1_APV+0x9d _vn_lock+0x43 zfs_root+0x8f lookup+0x7f0 namei+0x504 kern_statat_vnhook+0xa5 sys_stat+0x2d amd64_syscall+0x357 Xfast_syscall+0xfb

These are all blocked in "zfs" then.  (For future reference, the 'mwchan' 
field that you see as 'STATE' in top or via 'ps O mwchan' is more detailed 
than the 'D' state.)
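All of the wedged user processes above are sleeping in __lockmgr_args waiting on the same vnode lock.  As a quick way to see that at a glance, you can filter a 'procstat -kka' dump down to just the lock waiters with something like the following one-liner (a sketch; the inline sample lines are abbreviated stand-ins for real procstat output, assuming its usual PID/TID/COMM column layout):

```shell
# Print the pid and command of every thread whose stack shows
# __lockmgr_args, i.e. the threads blocked waiting for the vnode lock.
# 'dump' holds abbreviated stand-in data; in practice you would pipe
# the live output of 'procstat -kka' into the awk instead.
dump='16656 101610 ls - mi_switch sleepq_wait sleeplk __lockmgr_args vop_stdlock
   38 100689 zfskern txg_thread_enter mi_switch sleepq_wait _cv_wait'
printf '%s\n' "$dump" | awk '/__lockmgr_args/ {print $1, $3}'
```

Any pid this prints is a candidate argument for the 'sleepchain' script below.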

To diagnose this further, you would need to see which thread holds the
ZFS vnode lock these threads need.  I have some gdb scripts you can use to
do that at www.freebsd.org/~jhb/gdb/.  You would want to download 'gdb6*'
files from there and then do this as root:

# cd /path/to/gdb/files
# kgdb
(kgdb) source gdb6
(kgdb) sleepchain 42335

Where '42335' is the pid of some process stuck in "zfs".

-- 
John Baldwin

