ZFS FAQ (Was: SSD recommendations for ZFS cache/log)
Adam Nowacki
nowakpl at platinum.linux.pl
Tue Nov 20 07:56:03 UTC 2012
On 2012-11-20 05:59, Charles Sprickman wrote:
> Wonderful to see some work on this.
>
> One of the great remaining zfs mysteries remains all the tunables
> that are under "vfs.zfs.*". Obviously there are plenty of read-only
> items there, but conflicting information gathered from random forum
> posts and commit messages exist about what exactly one can do
> regarding tuning beyond arc sizing.
>
> If you have any opportunity to work with the people who have ported
> and are now maintaining zfs, it would be really wonderful to get
> some feedback from them on what knobs are safe to twiddle and why.
> I suspect many of the tunable items don't really have meaningful
> equivalents in Sun's implementation since the way zfs falls under
> the vfs layer in FreeBSD is so different.
>
> Thanks,
>
> Charles
I'll share my experience tuning a home NAS:

vfs.zfs.write_limit_* is a mess. Six sysctls work together to produce a
single value: the maximum size of a txg commit. If the amount of data not
yet written to disk grows to this size, a txg commit is forced. The catch
is that this size is only an estimate, and an absolute worst-case one at
that: the actual data size is multiplied by 24 (the reason for this
madness is explained below). This means writing a 1MB file results in an
estimated txg commit size of 24MB (plus metadata). Back to the sysctls:
# vfs.zfs.write_limit_override - if non-zero, absolutely overrides the
write limit (the other sysctls are ignored); if 0, an internal
dynamically computed value is used, based on:
# vfs.zfs.txg.synctime_ms - adjusts the write limit based on previous txg
commits so that the time to write equals this many milliseconds
(essentially an estimate of the disks' write bandwidth),
# vfs.zfs.write_limit_shift - sets vfs.zfs.write_limit_max to RAM size /
2^write_limit_shift,
# vfs.zfs.write_limit_max - used to derive vfs.zfs.write_limit_inflated
(multiplied by 24), but only if vfs.zfs.write_limit_shift is not 0,
# vfs.zfs.write_limit_inflated - maximum size of the dynamic write limit,
# vfs.zfs.write_limit_min - minimum size of the dynamic write limit,
and to complete the picture:
# vfs.zfs.txg.timeout - forces a txg commit every this many seconds if
one didn't already happen via the write limit.
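A rough sketch of how these six sysctls combine into one effective limit
(Python; the function and parameter names are mine, not ZFS identifiers,
and the throughput model is simplified - the real code tracks per-txg
write history):

```python
GiB = 1 << 30
MiB = 1 << 20
WRITE_LIMIT_FACTOR = 24  # worst-case inflation: 4 (parity+1) * 3 (DVAs) * 2

def effective_write_limit(ram_bytes, shift, override=0,
                          min_limit=32 * MiB,
                          throughput_estimate=None, synctime_ms=1000):
    """Rough model of how the vfs.zfs.write_limit_* sysctls combine."""
    if override:
        return override                   # write_limit_override wins outright
    write_limit_max = ram_bytes >> shift  # RAM / 2^write_limit_shift
    inflated = write_limit_max * WRITE_LIMIT_FACTOR  # write_limit_inflated
    if throughput_estimate is None:
        dynamic = inflated                # no commit history yet: use ceiling
    else:
        # aim for synctime_ms per commit at the estimated bandwidth (bytes/s)
        dynamic = throughput_estimate * synctime_ms // 1000
    # clamp between write_limit_min and write_limit_inflated
    return max(min_limit, min(dynamic, inflated))
```

For example, with 16GiB of RAM and shift 4 the ceiling comes out at 1GiB
* 24 = 24GiB of (inflated) dirty data, while a 200MiB/s bandwidth
estimate and a 2000ms synctime give a 400MiB dynamic limit.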
For my home NAS (10x 2TB disks encrypted with geli in raidz2, a CPU with
hardware AES, 16GB RAM, 2x 1GbE for Samba and iSCSI with MCS) I ended up with:
/boot/loader.conf:
vfs.zfs.write_limit_shift="4" # 16GB RAM / 2^4 = 1GB limit
vfs.zfs.write_limit_min="2400M" # a 100MB minimum times the 24x factor;
during heavy read-write operations the dynamic write limit would
otherwise enter a positive feedback loop and shrink too much
vfs.zfs.txg.synctime_ms="2000" # try to maintain a 2-second commit time
during large writes
vfs.zfs.txg.timeout="120" # 2 minutes, to reduce fragmentation and wear
from small writes; worst case, 2 minutes of asynchronous writes are
lost - synchronous writes end up in the ZIL anyway
and for completeness:
vfs.zfs.arc_min="10000M"
vfs.zfs.arc_max="10000M"
vfs.zfs.vdev.cache.size="16M" # the vdev cache helps a lot during scrubs
vfs.zfs.vdev.cache.bshift="14" # grow all I/O requests to 16KiB; smaller
requests showed the same latency, so we might as well get more "for free"
vfs.zfs.vdev.cache.max="16384"
vfs.zfs.vdev.write_gap_limit="0"
vfs.zfs.vdev.read_gap_limit="131072"
vfs.zfs.vdev.aggregation_limit="131072" # group smaller reads into one
larger read; benchmarking showed no appreciable latency increase while,
again, getting more bytes per request
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1" # seems to help txg commit bandwidth by
reducing seeking with parallel reads (not fully tested)
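For experimenting before committing values to loader.conf, some of these
knobs can be read and (in many cases) changed at runtime with sysctl -
though a few may only take effect as boot-time loader tunables, so check
each one on your system:

```shell
# Read the current values of the write-limit family
sysctl vfs.zfs.write_limit_override vfs.zfs.txg.timeout

# Temporarily pin the write limit to 1GB for testing (0 restores
# the dynamic behaviour)
sysctl vfs.zfs.write_limit_override=1073741824
sysctl vfs.zfs.write_limit_override=0
```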
and the reason for the factor of 24 (4 * 3 * 2, from the code):
/*
* The worst case is single-sector max-parity RAID-Z blocks, in which
* case the space requirement is exactly (VDEV_RAIDZ_MAXPARITY + 1)
* times the size; so just assume that. Add to this the fact that
* we can have up to 3 DVAs per bp, and one more factor of 2 because
* the block may be dittoed with up to 3 DVAs by ddt_sync().
*/
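The arithmetic in that comment, spelled out (the constant names below
match the ZFS source; the breakdown is the 4 * 3 * 2 mentioned above):

```python
VDEV_RAIDZ_MAXPARITY = 3   # raidz1 through raidz3
SPA_DVAS_PER_BP = 3        # up to three DVAs (copies) per block pointer

raidz_inflation = VDEV_RAIDZ_MAXPARITY + 1  # single-sector worst case: 4x
ddt_ditto = 2                               # extra factor for ddt_sync() dittoing
factor = raidz_inflation * SPA_DVAS_PER_BP * ddt_ditto  # 4 * 3 * 2 = 24
```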