ZFS: processes hanging when trying to access filesystems
Tim Bishop
tim-lists at bishnet.net
Mon Apr 23 14:38:04 UTC 2012
Here's a comparison of top output. This shows the higher context
switching. I'm not sure if this is part of the cause of the problems, or
just an effect:
"top -Sj -m io"
last pid: 95277; load averages: 0.04, 0.11, 0.13 up 20+05:31:54 15:29:52
186 processes: 2 running, 182 sleeping, 1 stopped, 1 waiting
CPU: 4.1% user, 0.0% nice, 3.6% system, 0.0% interrupt, 92.3% idle
Mem: 412M Active, 488M Inact, 4685M Wired, 52M Cache, 551M Buf, 288M Free
Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
  PID JID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
   12   0 root       617     1     0     0     0     0   0.00% intr
   11   0 root       584  1212     0     0     0     0   0.00% idle
    0   0 root       322    46     0     0     0     0   0.00% kernel
    3   0 root       257     1     0     0     0     0   0.00% g_up
    4   0 root       175     3     0     0     0     0   0.00% g_down
   13   0 root        20     0     0     0     0     0   0.00% yarrow
    5   0 root        17     0     0    16     0    16  88.89% zfskern
  641   0 _pflogd      4     0     0     0     0     0   0.00% pflogd

last pid: 92079; load averages: 0.39, 0.22, 0.18 up 20+05:22:39 15:20:37
197 processes: 2 running, 192 sleeping, 1 stopped, 1 zombie, 1 waiting
CPU: 0.0% user, 0.0% nice, 5.3% system, 1.5% interrupt, 93.2% idle
Mem: 484M Active, 478M Inact, 4655M Wired, 52M Cache, 551M Buf, 257M Free
Swap: 6144M Total, 316M Used, 5828M Free, 5% Inuse
  PID JID USERNAME  VCSW IVCSW  READ WRITE FAULT TOTAL PERCENT COMMAND
   11   0 root      3945  6837     0     0     0     0   0.00% idle
   12   0 root      2130     1     0     0     0     0   0.00% intr
    0   0 root      2008    99     0     0     0     0   0.00% kernel
    3   0 root      1810     0     0     0     0     0   0.00% g_up
    4   0 root      1486    12     0     0     0     0   0.00% g_down
   13   0 root        20     2     0     0     0     0   0.00% yarrow
    5   0 root        19     0     2    66     0    68  95.77% zfskern
   20   0 root         9     0     0     0     0     0   0.00% g_mirror r

The latter shows the machine when it's unresponsive and processes are
starting to hang.
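To put a number on the difference between the two snapshots, the VCSW (voluntary) and IVCSW (involuntary) context-switch columns can simply be summed. This is a minimal Python sketch, assuming the rows are copied verbatim from the two listings above; it only totals the processes shown, so it is not a system-wide figure, and the helper name `switch_totals` is hypothetical.

```python
from typing import Tuple

# Per-process rows copied from the two "top -Sj -m io" snapshots.
# Columns: PID JID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
NORMAL = """\
12 0 root 617 1 0 0 0 0 0.00% intr
11 0 root 584 1212 0 0 0 0 0.00% idle
0 0 root 322 46 0 0 0 0 0.00% kernel
3 0 root 257 1 0 0 0 0 0.00% g_up
4 0 root 175 3 0 0 0 0 0.00% g_down
13 0 root 20 0 0 0 0 0 0.00% yarrow
5 0 root 17 0 0 16 0 16 88.89% zfskern
641 0 _pflogd 4 0 0 0 0 0 0.00% pflogd
"""

HUNG = """\
11 0 root 3945 6837 0 0 0 0 0.00% idle
12 0 root 2130 1 0 0 0 0 0.00% intr
0 0 root 2008 99 0 0 0 0 0.00% kernel
3 0 root 1810 0 0 0 0 0 0.00% g_up
4 0 root 1486 12 0 0 0 0 0.00% g_down
13 0 root 20 2 0 0 0 0 0.00% yarrow
5 0 root 19 0 2 66 0 68 95.77% zfskern
20 0 root 9 0 0 0 0 0 0.00% g_mirror r
"""


def switch_totals(rows: str) -> Tuple[int, int]:
    """Sum the VCSW and IVCSW columns over the given top rows."""
    vcsw = ivcsw = 0
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        # COMMAND may contain spaces ("g_mirror r"), but it is the last
        # column, so fields[3] and fields[4] are always VCSW and IVCSW.
        vcsw += int(fields[3])
        ivcsw += int(fields[4])
    return vcsw, ivcsw


if __name__ == "__main__":
    nv, ni = switch_totals(NORMAL)
    hv, hi = switch_totals(HUNG)
    print(f"normal: VCSW={nv} IVCSW={ni}")
    print(f"hung:   VCSW={hv} IVCSW={hi}")
    print(f"ratio:  VCSW x{hv / nv:.1f}, IVCSW x{hi / ni:.1f}")
```

Over just the listed rows this gives 1996 vs 11427 voluntary and 1263 vs 6951 involuntary switches, roughly 5.7 and 5.5 times more in the unresponsive snapshot.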
Tim.
On Tue, Mar 27, 2012 at 07:14:57PM +0100, Tim Bishop wrote:
> I have a machine running 8-STABLE amd64 from the end of last week. I
> have a problem where the machine starts to freeze up. Any process
> accessing the ZFS filesystems hangs, which eventually causes more and
> more processes to be spawned (cronjobs, etc, never complete). Although
> the root filesystem is on UFS (the machine hosts jails on ZFS),
> eventually I can't log in anymore.
>
> The problem occurs when the frequently used part of the ARC gets too
> large. See this graph:
>
> http://dl.dropbox.com/u/318044/zfs_arc_utilization-day.png
>
> At the right of the graph things started to hang.
>
> At the same time I see a high amount of context switching.
>
> I picked a hanging process and procstat showed the following:
>
> PID TID COMM TDNAME KSTACK
> 24787 100303 mutt - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23
>
> I'm running a reduced number of jails on the machine at the moment, which
> is limiting the speed at which the machine freezes up completely. I'd
> like to debug this problem further, so any advice on useful information
> to collect would be appreciated.
>
> I've had this problem on the machine before[1] but adding more RAM
> alleviated the issue.
>
> Tim.
>
> [1] http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058541.html
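Reading the quoted `procstat -kk` stack is easier once the `+0x` offsets are stripped and the ZFS frames are picked out. This is a minimal Python sketch, assuming the KSTACK column lists frames innermost first (as the `mi_switch ... _fdrop` ordering suggests). The helper names and the `ZFS_PREFIXES` list are hypothetical and chosen only for illustration; they are not a complete list of ZFS function prefixes.

```python
import re
from typing import List

# KSTACK column for mutt (PID 24787) from the quoted procstat output,
# innermost frame first.
KSTACK = (
    "mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_wait_open+0x85 "
    "dmu_tx_assign+0x170 zfs_inactive+0xf1 zfs_freebsd_inactive+0x1a "
    "vinactive+0x71 vputx+0x2d8 null_reclaim+0xb3 vgonel+0x119 "
    "vrecycle+0x7b null_inactive+0x1f vinactive+0x71 vputx+0x2d8 "
    "vn_close+0xa1 vn_closefile+0x5a _fdrop+0x23"
)

# Each frame looks like "function+0xOFFSET"; keep only the function name.
FRAME = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-fA-F]+$")

# Hypothetical, non-exhaustive prefixes used to flag ZFS-side frames.
ZFS_PREFIXES = ("zfs_", "dmu_", "txg_", "zio_", "arc_", "spa_")


def frames(kstack: str) -> List[str]:
    """Split a KSTACK string into bare function names, innermost first."""
    names = []
    for token in kstack.split():
        m = FRAME.match(token)
        names.append(m.group(1) if m else token)  # keep odd tokens as-is
    return names


def zfs_frames(kstack: str) -> List[str]:
    """Return only the frames whose names match ZFS_PREFIXES."""
    return [f for f in frames(kstack) if f.startswith(ZFS_PREFIXES)]


if __name__ == "__main__":
    stack = frames(KSTACK)
    print("blocked in:", " <- ".join(stack[:3]))
    print("zfs path:  ", " <- ".join(zfs_frames(KSTACK)))
```

On the mutt stack this isolates `txg_wait_open <- dmu_tx_assign <- zfs_inactive <- zfs_freebsd_inactive`. The thread is sleeping in `txg_wait_open`, called from `dmu_tx_assign` inside `zfs_inactive`. It reached that point while a nullfs vnode was being recycled on close (`vn_close`, `null_inactive`, `vrecycle`, `null_reclaim`).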
--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984