ZFS, compression, system load, pauses (livelocks?)

Wiktor Niesiobedzki bsd at w.evip.pl
Tue Dec 15 19:57:08 UTC 2009


2009/12/15 Ivan Voras <ivoras at freebsd.org>:
> The context of this post is file servers running FreeBSD 8 and ZFS with
> compressed file systems on low-end hardware, or actually high-end hardware
> on VMWare ESX 3.5 and 4, which kind of makes it low-end as far as storage is
> concerned. The servers are standby backup mirrors of production servers -
> thus many writes, few reads.
>
> Running this setup I notice two things:
>
> 1) load averages get very high, though the only usage these systems get is
> file system usage:
> 2) long pauses, in what looks like vfs.zfs.txg.timeout second intervals,
> which seemingly block everything, or at least the entire userland. These
> pauses are sometimes so long that file transfers fail, which must be
> avoided.
>
> Looking at the list of processes it looks like a large number of kernel and
> userland processes are woken up at once. From the kernel side there are
> regularly all g_* threads, but also unrelated threads like bufdaemon,
> softdepflush, etc. and from the userland - top, syslog, cron, etc. It is
> like ZFS livelocks everything else.
>
> Any ideas on the "pauses" issue?
>

Hi,

I've snipped your post a bit. This is kind of a "me too" message (more
details here: http://lists.freebsd.org/pipermail/freebsd-geom/2009-December/003810.html).
What I've figured out so far is that lowering the kernel thread
priority (as pjd@ suggested) gives quite promising results (no
livelocks at all), though my bottleneck was caused by the GELI threads.

The pattern there is like this:

sched_prio(curthread, PRIBIO);
[...]
msleep(sc, &sc->sc_queue_mtx, PDROP | PRIBIO,  "geli:w", 0);

I'm running right now with a changed version, where I have:
msleep(sc, &sc->sc_queue_mtx, PDROP,  "geli:w", 0);

So I don't change the initial thread priority. It doesn't give quite the
same result as using the PUSER priority, but I fear that using PUSER may
cause livelocks in some other cases.
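
For context, the surrounding worker loop is roughly shaped like the
sketch below. This is only a simplified, from-memory approximation of
sys/geom/eli/g_eli.c (field and helper names other than the two lines
quoted above may not match the real source exactly); it just shows
where the thread priority is set and where the msleep() priority bits
come into play:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>
#include <sys/sched.h>

#include <geom/eli/g_eli.h>

/* Simplified sketch of the g_eli worker-thread pattern. */
static void
g_eli_worker_sketch(void *arg)
{
    struct g_eli_softc *sc = arg;
    struct bio *bp;

    /* The stock code raises the worker to a kernel I/O priority once. */
    sched_prio(curthread, PRIBIO);

    for (;;) {
        mtx_lock(&sc->sc_queue_mtx);
        bp = bioq_takefirst(&sc->sc_queue);
        if (bp == NULL) {
            /*
             * A non-zero priority passed to msleep() is re-applied to
             * the thread for each sleep, so with PDROP | PRIBIO the
             * worker keeps getting reset to PRIBIO; with plain PDROP
             * (my change) its priority is left alone.
             */
            msleep(sc, &sc->sc_queue_mtx, PDROP, "geli:w", 0);
            continue;
        }
        mtx_unlock(&sc->sc_queue_mtx);
        /* ... encrypt/decrypt bp and hand it back to GEOM ... */
        /* (the real loop also handles suspend/shutdown, etc.) */
    }
}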

This helps my case (geli encryption and periodic stalls during ZFS
transaction commits) at the cost of some performance, but I have
similar problems in other cases.

When I run:
# zpool scrub tank

Then "kernel" system process/thread consumes most of CPU (>95% in
system) and load rises to 20+ for the period of scrubbing. During
scrub my top screen looks like:
last pid: 87570;  load averages:  8.26,  2.84,  1.68
199 processes: 3 running, 179 sleeping, 17 waiting
CPU:  2.4% user,  0.0% nice, 97.0% system,  0.6% interrupt,  0.0% idle
Mem: 66M Active, 6256K Inact, 1027M Wired, 104K Cache, 240K Buf, 839M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
    0 root       69  -8    0     0K   544K -      104:40 67.19% kernel
   24 root        1  -8    -     0K     8K geli:w   9:56  5.66% g_eli[0] ad6
   26 root        1  -8    -     0K     8K geli:w   9:50  5.47% g_eli[0] ad10
   25 root        1  -8    -     0K     8K geli:w   9:53  5.37% g_eli[0] ad8
    8 root       12  -8    -     0K   104K vgeom:  61:35  3.27% zfskern
    3 root        1  -8    -     0K     8K -        3:22  0.68% g_up
   11 root       17 -60    -     0K   136K WAIT    31:21  0.29% intr

An interesting thing is that 17 processes are reported as waiting for
CPU, though intr is the only one actually shown in the WAIT state (at
least among the top 40 processes).

I just wonder whether this might be a scheduler-related issue. I'm
thinking about giving SCHED_4BSD a try.
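
(If I do, I assume switching is just the standard kernel config change
plus a rebuild, i.e. something like:

options 	SCHED_4BSD		# replacing: options SCHED_ULE

in the kernel config, followed by the usual buildkernel / installkernel
/ reboot cycle.)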

Cheers,

Wiktor Niesiobędzki

