ZFS: processes hanging when trying to access filesystems

dteske at freebsd.org
Mon Apr 23 17:33:37 UTC 2012



> -----Original Message-----
> From: owner-freebsd-fs at freebsd.org [mailto:owner-freebsd-fs at freebsd.org]
> On Behalf Of Tim Bishop
> Sent: Monday, April 23, 2012 7:38 AM
> To: freebsd-fs at freebsd.org
> Subject: Re: ZFS: processes hanging when trying to access filesystems
> 
> Here's a comparison of top output. This shows the higher context
> switching. I'm not sure if this is part of the cause of the problems, or
> just an effect:
> 
> "top -Sj -m io"
> 
> last pid: 95277;  load averages:  0.04,  0.11,  0.13   up 20+05:31:54 15:29:52
> 186 processes: 2 running, 182 sleeping, 1 stopped, 1 waiting
> CPU:  4.1% user,  0.0% nice,  3.6% system,  0.0% interrupt, 92.3% idle
> Mem: 412M Active, 488M Inact, 4685M Wired, 52M Cache, 551M Buf, 288M Free
> Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
> 
>   PID JID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
>    12   0 root        617      1      0      0      0      0   0.00% intr
>    11   0 root        584   1212      0      0      0      0   0.00% idle
>     0   0 root        322     46      0      0      0      0   0.00% kernel
>     3   0 root        257      1      0      0      0      0   0.00% g_up
>     4   0 root        175      3      0      0      0      0   0.00% g_down
>    13   0 root         20      0      0      0      0      0   0.00% yarrow
>     5   0 root         17      0      0     16      0     16  88.89% zfskern
>   641   0 _pflogd       4      0      0      0      0      0   0.00% pflogd
> 
> last pid: 92079;  load averages:  0.39,  0.22,  0.18   up 20+05:22:39 15:20:37
> 197 processes: 2 running, 192 sleeping, 1 stopped, 1 zombie, 1 waiting
> CPU:  0.0% user,  0.0% nice,  5.3% system,  1.5% interrupt, 93.2% idle
> Mem: 484M Active, 478M Inact, 4655M Wired, 52M Cache, 551M Buf, 257M Free
> Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
> 
>   PID JID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
>    11   0 root       3945   6837      0      0      0      0   0.00% idle
>    12   0 root       2130      1      0      0      0      0   0.00% intr
>     0   0 root       2008     99      0      0      0      0   0.00% kernel
>     3   0 root       1810      0      0      0      0      0   0.00% g_up
>     4   0 root       1486     12      0      0      0      0   0.00% g_down
>    13   0 root         20      2      0      0      0      0   0.00% yarrow
>     5   0 root         19      0      2     66      0     68  95.77% zfskern
>    20   0 root          9      0      0      0      0      0   0.00% g_mirror
> 
> The latter shows the machine when it's unresponsive and processes are
> starting to hang.
> 
> Tim.
> 
> On Tue, Mar 27, 2012 at 07:14:57PM +0100, Tim Bishop wrote:
> > I have a machine running 8-STABLE amd64 from the end of last week. I
> > have a problem where the machine starts to freeze up. Any process
> > accessing the ZFS filesystems hangs, which eventually causes more and
> > more processes to be spawned (cronjobs, etc, never complete). Although
> > the root filesystem is on UFS (the machine hosts jails on ZFS),
> > eventually I can't log in anymore.
> >
> > The problem occurs when the frequently used part of the ARC gets too
> > large. See this graph:
> >
> > http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
> >
> > At the right of the graph things started to hang.
> >
> > At the same time I see a high amount of context switching.
> >
> > I picked a hanging process and procstat showed the following:
> >
> >   PID    TID COMM             TDNAME           KSTACK
> > 24787 100303 mutt             -                mi_switch+0x176
> >   sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170
> >   zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8
> >   null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f
> >   vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
> >
> > I'm running a reduced amount of jails on the machine at the moment which
> > is limiting the speed at which the machine freezes up completely. I'd
> > like to debug this problem further, so any advice on useful information
> > to collect would be appreciated.
> >
> > I've had this problem on the machine before[1] but adding more RAM
> > alleviated the issue.
> >
> > Tim.
> >
> > [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html
> 

We've seen this happen and were able not only to devise a way of replicating it
quickly, but also to identify the root cause (and we have since fixed the
issue).

When this first started, it was occurring on only one system, and that system
was crashing maybe once per day (by "crash" I mean a hang exactly as described
by the OP).

Then we started doing volume load testing, and that's when things changed for
the worse. By running the following shell script 7 times on local vtys, we
were able to replicate the problem within 15 minutes (even faster if we threw
in some more instances running via SSH -- but in diagnosing the issue we first
wanted to prove to ourselves that it was not network related):

	#!/bin/sh
	# Hammer the filesystem in a tight loop with bonnie (benchmarks/bonnie).
	while :;do
		echo "############################## $(date)"
		bonnie
	done

Eventually (for us, after about 15 minutes -- with 7 concurrent instances
running, each with its CWD inside the ZFS filesystem; YMMV) bonnie stops.
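
To save some typing, here's a rough sketch of how the concurrent instances
could be launched from a single shell instead of by hand on 7 vtys
(bonnie-loop.sh is a hypothetical name for the loop script above; run this
from a directory on the ZFS filesystem under test):

	#!/bin/sh
	# Launch 7 background copies of the bonnie loop, each with its own log.
	# bonnie-loop.sh is assumed to be the while/bonnie script shown above.
	i=1
	while [ $i -le 7 ]; do
		sh bonnie-loop.sh > bonnie-loop.$i.log 2>&1 &
		i=$((i + 1))
	done
	wait	# blocks until something (e.g. the hang) kills the children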

We did lots of things. First we watched the stats using:

	#!/bin/sh
	# Dump every ZFS sysctl once a second so the state leading up to
	# the hang is captured on disk.
	HOSTNAME=${HOSTNAME:-$(hostname)}
	while :;do
		echo "############################## $(date)"
		sysctl vfs.zfs kstat.zfs
		sleep 1
	done | tee zfs-stats.${HOSTNAME%%.*}

And it *looked* like we were hitting some limit (kstat.zfs.misc.arcstats.p
appeared to climb forever and then BOOM! -- hang).
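
If you only care about that one counter, a minimal sketch for watching it
(same sysctl name as on our 8-STABLE box):

	#!/bin/sh
	# Poll the ARC adaptation target once a second; for us it climbed
	# steadily right up until the hang.
	while :;do
		echo "$(date +%T) $(sysctl -n kstat.zfs.misc.arcstats.p)"
		sleep 1
	done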

But, after tuning the following parameters in /boot/loader.conf:

	# ZFS kernel tune
	vm.kmem_size="2560M"
	vfs.zfs.arc_min="1536M"
	vfs.zfs.arc_max="1536M"
	vfs.zfs.prefetch_disable="1"
	vfs.zfs.txg.timeout="5"
	vfs.zfs.vdev.min_pending="3"
	vfs.zfs.vdev.max_pending="6"
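
After rebooting, it's worth sanity-checking that the tunables actually took
effect; a quick way (same sysctl names as the tunables above):

	#!/bin/sh
	# Read back the ZFS tunables after boot (most are loader-time only,
	# so a wrong value here means a typo in /boot/loader.conf).
	sysctl vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max \
	    vfs.zfs.prefetch_disable vfs.zfs.txg.timeout \
	    vfs.zfs.vdev.min_pending vfs.zfs.vdev.max_pending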

Things changed enough that we no longer thought we were hitting some limit
(arcstats.p would grow then drop, grow then drop, ad nauseam, several times
before reaching the hang).

So...

We finally resolved to dismember the array (something we had avoided doing
until we ran out of other ideas) and test each member disk with "SpinRite" --
a FreeDOS-based bootable disc/diskette capable of performing miracles on
drives.

Well, it turned out that the 5th of the 6 drives was overheating under load.
We were able to reliably reproduce the overheating with SpinRite (which
actively monitors drive temperature during all disk operations). We saw the
drive (still in our possession) routinely reach 127 degrees Fahrenheit within
about 10-15 minutes under nominal load.
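
For what it's worth, if you don't have a SpinRite disc handy, drive
temperature can also be watched in place with smartmontools from ports (a
sketch; device names -- ada0 vs. ad0 vs. da0 -- and the exact SMART attribute
label vary by system and drive):

	#!/bin/sh
	# Log each member disk's SMART temperature once a minute while the
	# bonnie load test runs. Requires sysutils/smartmontools; adjust the
	# device list to match your array.
	while :;do
		for d in /dev/ada0 /dev/ada1 /dev/ada2 \
		    /dev/ada3 /dev/ada4 /dev/ada5; do
			echo "$(date +%T) $d: $(smartctl -A $d | grep -i temperature)"
		done
		sleep 60
	done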

We replaced that drive with a new one, re-assembled the array with the new
drive, and started the rebuild operation. Within the hour we were back in
operation AND... the system now survives multi-day heavy-volume loads running
continuously throughout -- no more issues to report.

We've kept the drive!

I brought this up at the last BAFUG meeting and a couple of guys said that we
should use that drive to improve ZFS.

I'm game if anybody else is.
-- 
Devin


