Strange slowdown when cache devices enabled in ZFS

Freddie Cash fjwcash at gmail.com
Thu Apr 25 17:44:28 UTC 2013


On Thu, Apr 25, 2013 at 8:51 AM, Freddie Cash <fjwcash at gmail.com> wrote:

> I haven't had a chance to run any of the DTrace scripts on any of my ZFS
> systems, but I have narrowed down the issue a bit.
>
> If I set primarycache=all and secondarycache=all, then adding an L2ARC
> device to the pool will lead to zfskern{l2arc_feed_thread} taking up 100%
> of one CPU core and stalling I/O to the pool.
>
> If I set primarycache=all and secondarycache=metadata, then adding an
> L2ARC device to the pool speeds things up (zfs send/recv saturates a 1 Gbps
> link, and the nightly rsync backup run finishes 4 hours earlier).
>
> I haven't tested the other two combinations (metadata/metadata;
> metadata/all) as yet.
>
> This is consistent across two ZFS systems so far:
>   - 8-core Opteron 6100-series CPU with 48 GB of RAM; 44 GB ARC, 40 GB
> metadata limit; 3x raidz2
>   - 2x 8-core Opteron 6100-series CPU with 128 GB of RAM; 64 GB ARC, 60 GB
> metadata limit; 5x raidz2
>
> Still reading up on dtrace/hwpmc as time permits.  Just wanted to pass
> along the above to show I haven't forgotten about this yet.  :)  $JOB/$LIFE
> slows things down sometimes.  :)
>

And, I may have narrowed it down even further.  It appears (still testing)
that the following sysctl can be toggled to enable/disable the behaviour
that leads to the 100% CPU usage and I/O stalls to the pool:

vfs.zfs.l2arc_norw

The default setting is 1, which (if I'm reading things right) means no
reads from the L2ARC while writing data to the L2ARC.  Setting this to 0
(which allows reads and writes to occur concurrently to L2ARC?) on a pool
that is "stalled" makes things work again.
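
For anyone who wants to try the same toggle, here is a minimal sketch in
Python, not a tested tool. It only builds (and by default just prints) the
sysctl(8) command lines for reading and flipping vfs.zfs.l2arc_norw. The
helper names (sysctl_get_cmd, sysctl_set_cmd, run) are made up for this
example, and the dry_run default is my own assumption so that nothing
changes on a live pool unless you ask for it.

```python
import subprocess

# Sysctl name taken from the message above.
SYSCTL_NAME = "vfs.zfs.l2arc_norw"


def sysctl_get_cmd(name):
    """Return the argv that prints only the current value of a sysctl."""
    return ["sysctl", "-n", name]


def sysctl_set_cmd(name, value):
    """Return the argv that sets a sysctl to an integer value (needs root)."""
    return ["sysctl", f"{name}={int(value)}"]


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if dry_run:
        return None
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    run(sysctl_get_cmd(SYSCTL_NAME))      # show the current setting
    run(sysctl_set_cmd(SYSCTL_NAME, 0))   # allow L2ARC reads during writes
    run(sysctl_set_cmd(SYSCTL_NAME, 1))   # back to the default
```

Pass dry_run=False to run() to actually execute the commands as root.  A
value set this way does not survive a reboot.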

Since these pools all have dedupe enabled, is it possible that the L2ARC
feed thread, while searching the ARC for data to write to L2, is starving or
blocking reads of the DDT from the L2ARC, and thus preventing the pool from
writing any new data?


I've managed to get the hotkernel DTrace script to work on one system.
It's sampling now with vfs.zfs.l2arc_norw=1 while some zfs sends run.
Hopefully the pool will stall while I'm sampling.  :)

I'm also in the process of upgrading the biggest box to the latest 9-STABLE
from this morning, and enabling DTrace on there.  Will run hotkernel with
l2arc_norw enabled and disabled once the upgrade is complete.

-- 
Freddie Cash
fjwcash at gmail.com


More information about the freebsd-fs mailing list