ZFS, compression, system load, pauses (livelocks?)
bsd at w.evip.pl
Tue Dec 15 19:57:08 UTC 2009
2009/12/15 Ivan Voras <ivoras at freebsd.org>:
> The context of this post is file servers running FreeBSD 8 and ZFS with
> compressed file systems on low-end hardware, or actually high-end hardware
> on VMWare ESX 3.5 and 4, which kind of makes it low-end as far as storage is
> concerned. The servers are standby backup mirrors of production servers -
> thus many writes, few reads.
> Running this setup I notice two things:
> 1) load averages get very high, though the only usage these systems get is
> file system usage:
> 2) long pauses, in what looks like vfs.zfs.txg.timeout second intervals,
> which seemingly block everything, or at least the entire userland. These
> pauses are sometimes so long that file transfers fail, which must be
> Looking at the list of processes it looks like a large number of kernel and
> userland processes are woken up at once. From the kernel side there are
> regularly all g_* threads, but also unrelated threads like bufdaemon,
> softdepflush, etc. and from the userland - top, syslog, cron, etc. It is
> like ZFS livelocks everything else.
> Any ideas on the "pauses" issue?
I've trimmed your post a bit. This is something of a "me too" message (more
details here: http://lists.freebsd.org/pipermail/freebsd-geom/2009-December/003810.html).
What I've figured out so far is that lowering the kernel thread
priority (as pjd@ suggested) gives quite promising results (no
livelocks at all), though in my case the bottleneck was caused by the
GELI thread.
The pattern there is like this:
msleep(sc, &sc->sc_queue_mtx, PDROP | PRIBIO, "geli:w", 0);
I'm now running a changed version, where I have:
msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);
So I don't change the thread's initial priority. This doesn't give as
good a result as using the PUSER priority, but I fear that using PUSER
may cause livelocks in some other cases.
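To make the difference between the two msleep() calls explicit, here is a
minimal userland sketch of how I understand the priority argument to be
decoded, based on sleep(9): the low bits carry the priority to sleep at, and
a value of 0 there leaves the thread's priority alone. The helper names and
PRIBIO_EXAMPLE are made up for illustration; the real PRIBIO value comes from
<sys/priority.h> and differs between releases.

```c
#include <stdbool.h>

/* Flag layout of msleep()'s priority argument, as in <sys/param.h>. */
#define PRIMASK 0x0ff   /* low bits: priority to sleep at (0 = unchanged) */
#define PCATCH  0x100   /* let signals interrupt the sleep */
#define PDROP   0x200   /* do not re-acquire the mutex on wakeup */

/* ASSUMPTION: placeholder only; take the real PRIBIO from <sys/priority.h>. */
#define PRIBIO_EXAMPLE 76

/*
 * Hypothetical helper: priority the thread would resume with.
 * A non-zero priority in the low bits replaces the current one;
 * zero means "keep whatever the thread already had".
 */
int
effective_priority(int current_prio, int msleep_arg)
{
	int requested = msleep_arg & PRIMASK;

	return (requested != 0 ? requested : current_prio);
}

/* Hypothetical helper: does this call release the mutex for good? */
bool
drops_mutex(int msleep_arg)
{
	return ((msleep_arg & PDROP) != 0);
}
```

With these helpers, effective_priority(p, PDROP | PRIBIO_EXAMPLE) yields
PRIBIO_EXAMPLE regardless of p, while effective_priority(p, PDROP) yields p.
That is the whole effect of the change above: the mutex handling stays the
same, and only the priority boost goes away.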
This helps in my case (GELI encryption and periodic lockups during ZFS
transaction commits) at some performance penalty, but I have similar
problems in other cases.
When I run:
# zpool scrub tank
the "kernel" system process/thread consumes most of the CPU (>95% in
system) and the load rises to 20+ for the duration of the scrub. During
the scrub my top screen looks like this:
last pid: 87570; load averages: 8.26, 2.84, 1.68
199 processes: 3 running, 179 sleeping, 17 waiting
CPU: 2.4% user, 0.0% nice, 97.0% system, 0.6% interrupt, 0.0% idle
Mem: 66M Active, 6256K Inact, 1027M Wired, 104K Cache, 240K Buf, 839M Free
Swap: 4096M Total, 4096M Free
  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME   WCPU COMMAND
    0 root      69  -8    0    0K  544K -      104:40 67.19% kernel
   24 root       1  -8    -    0K    8K geli:w   9:56  5.66% g_eli ad6
   26 root       1  -8    -    0K    8K geli:w   9:50  5.47% g_eli ad10
   25 root       1  -8    -    0K    8K geli:w   9:53  5.37% g_eli ad8
    8 root      12  -8    -    0K  104K vgeom:  61:35  3.27% zfskern
    3 root       1  -8    -    0K    8K -        3:22  0.68% g_up
   11 root      17 -60    -    0K  136K WAIT    31:21  0.29% intr
Interestingly, top reports 17 processes waiting for CPU, though intr is
the only process shown in the WAIT state (at least among the top 40
processes).
I wonder whether this might be a scheduler-related issue. I'm
thinking about giving SCHED_4BSD a try.