ZFS hang in production on 8.2-RELEASE

Luke Marsden luke at hybrid-logic.co.uk
Mon Aug 29 20:04:06 UTC 2011


> On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden
> <luke-lists at hybrid-logic.co.uk> wrote:
> > Hi all,
> >
> > I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE.
> >
> > FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar  2
> > 08:29:52 CET 2011     root at www4:/usr/obj/usr/src/sys/GENERIC  amd64
> >
> > There are 9 'zfs rename' processes and 1 'zfs umount -f' process hung.
> > Here is the procstat for the 'zfs umount -f':
> >
> > 13451 104337 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
> > dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
> > Xfast_syscall+0xe2
> >
> > And the 'zfs rename's all look the same:
> >
> > 20361 101049 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4
> > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
> >
> > An 'ls' on a directory which contains most of the system's ZFS
> > mount-points (/hcfs) also hangs:
> >
> > 30073 101466 gnuls            -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> > +0xe2
> >
> > If I truss the 'ls' it hangs on the stat syscall:
> > stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0
> > (0x0)
> >
> > There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
> > -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
> > -path /var/db/portsnap -prune -or -print' running which is also hung:
> >
> >  2650 101674 find             -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> > +0xe2
> >
> > However I/O to the presently mounted filesystems continues to work (even
> > on parts of filesystems which are unlikely to be cached), and 'zfs list'
> > showing all the filesystems (3,500 filesystems with ~100 snapshots per
> > filesystem) also works.
> >
> > Any activity on the structure of the ZFS hierarchy *under the hcfs
> > filesystem* hangs, such as a 'zfs create hpool/hcfs/test':
> >
> > 70868 101874 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce
> > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
> >
> > BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
> > pool, but not rooted on hpool/hcfs) does not hang, and succeeds
> > normally.
> >
> > procstat -kk on the zfskern process gives:
> >
> >  PID    TID COMM             TDNAME
> > KSTACK
> >    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
> > fork_exit+0x118 fork_trampoline+0xe
> >    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
> > fork_exit+0x118 fork_trampoline+0xe
> >    5 100098 zfskern          txg_thread_enter mi_switch+0x176
> > sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
> > +0xb5 fork_exit+0x118 fork_trampoline+0xe
> >    5 100099 zfskern          txg_thread_enter mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
> > txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe
> >
> > Any ideas on what might be causing this?
> 
> It sounds like the bug Martin Matuska has recently fixed in FreeBSD
> and reported upstream to Illumos:
> https://www.illumos.org/issues/1313
> 
> The fix was MFC'ed to 8-STABLE in r224647 on Aug 4th.

Thank you for such a quick response, but I'm not sure that's the right
explanation.  The uptime on this server is 25 days, which is less than
28.  Also, I would expect the bug you described to cause issues globally
across the zpool, but the hang is localised to a single filesystem.
Finally, that bug report refers to ZFS writes slowing down, whereas this
is a hard hang on reads (as if a lock that ought to have been released
is still held).
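To work out which mountpoints are affected without wedging the shell on
each stat, something like the following could be used.  This is a
hypothetical one-shot diagnostic, not part of this thread; the path list
and the 5-second timeout are arbitrary choices:

```python
# Hypothetical diagnostic (not from this thread): stat() each mountpoint
# in a worker thread so that a wedged vnode lock shows up as a timeout
# instead of hanging the whole script.
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutTimeout

def probe(path, timeout=5.0):
    # Returns 'ok' if stat(path) completes within `timeout` seconds,
    # 'HUNG' otherwise.  A thread stuck in the kernel is leaked, so if
    # any probe reports HUNG, finish with os._exit() -- a blocked worker
    # would otherwise prevent a clean interpreter exit.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        fut = pool.submit(os.stat, path)
        try:
            fut.result(timeout=timeout)
            return "ok"
        except FutTimeout:
            return "HUNG"
    finally:
        pool.shutdown(wait=False)

if __name__ == "__main__":
    # Replace with the ZFS mountpoints under /hcfs on the affected box.
    for mp in ("/", "/tmp"):
        print(mp, probe(mp))
```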

hybrid at ns382210:~$ sudo sysctl -a|grep tick
kern.clockrate: { hz = 1000, tick = 1000, profhz = 2000, stathz = 133 }
kern.timecounter.tick: 1
debug.tickdelay: 2
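For reference, a back-of-the-envelope check of the tick-counter
wraparound (assuming the overflow in question is a signed 32-bit counter
driven at the hz=1000 shown in kern.clockrate above):

```python
# Assumption: a signed 32-bit tick counter incremented hz times per second.
HZ = 1000                 # ticks per second, from kern.clockrate above
INT32_MAX = 2**31 - 1     # largest value of a signed 32-bit counter

seconds_to_wrap = INT32_MAX / float(HZ)
days_to_wrap = seconds_to_wrap / 86400.0
print("wraps after %.1f days" % days_to_wrap)   # ~24.9 days
```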

By the way, the server has 16GB of RAM with 5.8GB free, so I don't
suspect memory pressure is causing the issue.  The zpool is 3.19T,
striped over two disks.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +1-415-449-1165 (US) / +447791750420 (UK)

More information about the freebsd-fs mailing list