zfs arc - just take it all and be good to me

Marco van Tol marco at tols.org
Tue Aug 10 21:44:24 UTC 2010


Hi there,

What you will find here is a short description of my attempts to find
the optimal settings for ZFS memory, followed by what I ended up with at
this point.  Then I started wondering what I was missing, because I
think what I'm doing might be plain stupid, since no one else seems to
be doing it this way.  I hope it makes a tiny bit of sense.  If it
doesn't, just pretend you didn't read it.  ;)

All the way at the bottom of the mail you will find some details about
the hardware / software.

Like many others, I have been trying to work out what best practice is
regarding ZFS, kmem_size, arc_max and what not, and had trouble finding
that reliable magic post that just sums up the magic formula.  So I
went ahead and mixed a couple of pieces of advice I had seen during my
search, and at some point ended up on a machine with 2 gigabytes of
physical memory with the settings:

vm.kmem_size: 3G
vfs.zfs.arc_max: 750M

The 3G vm.kmem_size on a machine with 2G of physical memory comes from
something I had read about kernel memory fragmentation, which would
warrant this setting.
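
For context, both of these are loader tunables.  A minimal sketch of
how they would go into /boot/loader.conf (the values are just the ones
above, not a recommendation):

# /boot/loader.conf
vm.kmem_size="3G"
vfs.zfs.arc_max="750M"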

Now, these settings annoyed me a lot because it used to be that the
page cache (and with that I mean active/inactive memory) was
auto-tuned: it would (in short) take as much available memory as
possible and just release it when an application needed it.  I'm
describing it simply here because I am by no means a filesystem and/or
kernel expert.

At some point I stumbled upon a blog post by Kip Macy and a reply
from Richard Elling at:
http://daemonflux.blogspot.com/2010/01/zfs-arc-page-cache-and-1970s-buffer.html

Somewhere around this time I started to think that an auto-tuned ARC
might still be possible, given that ZFS releases memory when there is
high demand for it.  So I googled for "freebsd zfs memory pressure".

After reading through a couple of hits I felt like just trying it and
ended up with the settings:

physical memory: 2G
vm.kmem_size: 3G
vfs.zfs.arc_max: 1792M

Then I set up a couple of terminals:
- top -I   # To monitor active/inactive/wired/free
- sysctl kstat.zfs.misc.arcstats.size   # (In a loop; sketched below) To monitor the ARC size
- # And a terminal to run some tests in while watching those values
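
The polling loop can be as simple as something like this (the 5-second
interval is an arbitrary choice of mine):

# Poll the ARC size next to a running 'top -I'
while true; do
    sysctl kstat.zfs.misc.arcstats.size
    sleep 5
done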

# Test 1 - tar the filesystem to grow the ARC from reads
-> tar -cf /dev/null /
Sure enough, the ARC grew rapidly to some maximum value.  After a
little while the values were:
12M Active, 18M Inact, 1789M Wired, 144K Cache, 156M Free
kstat.zfs.misc.arcstats.size: 1697754720

Nothing to worry about so far

# Test 2 - do a bunch of writes to see if that makes things different
-> bash> for i in {1..10} ; do dd if=/dev/zero of=/zfspath/file$i \
         bs=1m count=1024 ; done
And again the ARC would grow and shrink a little, leaving the
values:
8308K Active, 22M Inact, 1710M Wired, 136K Cache, 235M Free
kstat.zfs.misc.arcstats.size: 1596992192

Still nothing to worry about

# Test 3 - let an application demand some memory and see what happens.
-> perl -e '$x="x"x500_000_000'
After perl completed the values were:
5112K Active, 5884K Inact, 932M Wired, 14M Cache, 1019M Free
kstat.zfs.misc.arcstats.size: 817991496
No difference in swap usage worth mentioning, I think somewhere around
5 megabytes; top showed pages going into swap very briefly.
(Side note: I had run the perl step earlier with a value that was too
large, which left me with 35 MB of swap in use.)


# Test 4 - All of tests 1, 2 and 3 at the same time, for a (probably
# lame) attempt at a mixed environment.  For test 1 (tar) I excluded
# the directory where test 2 (dd) was writing its files.  Test 3 ran
# in a sleepless loop with the value 1_000_000_000 instead of the
# 500 MB I used in the original test 3.  (A rough sketch of the
# combined run follows the results below.)
Ending values:
21M Active, 7836K Inact, 1672M Wired, 4140K Cache, 272M Free
kstat.zfs.misc.arcstats.size: 1570260528
Swap usage didn't change in the top session I was watching.
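
For reference, a rough sketch of how the three loads can be combined in
one go.  This assumes the same /zfspath target as in test 2 and bash
for the brace expansion; the --exclude pattern may need adjusting
depending on how tar stores the member names:

# Reads: tar the filesystem, skipping the dd target (test 1)
tar -cf /dev/null --exclude /zfspath / &

# Writes: keep rewriting ten 1 GB files (test 2)
while true; do
    for i in {1..10}; do
        dd if=/dev/zero of=/zfspath/file$i bs=1m count=1024
    done
done &

# Memory pressure: repeatedly allocate ~1 GB in perl (test 3)
while true; do perl -e '$x="x"x1_000_000_000'; done &

wait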

------

All in all this looks like a close approximation of ZFS memory being
auto-tuned while using the maximum amount of memory.  The only problem
is that nobody else seems to be doing it like this, so it's very likely
that this is not the smart thing to do.  What are the first problems
the ZFS people can think of with a setup like this?

Thanks in advance!

Marco van Tol

------

Machine Details:

zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        tank             ONLINE       0     0     0
          raidz1         ONLINE       0     0     0
            gpt/tank_s0  ONLINE       0     0     0
            gpt/tank_s1  ONLINE       0     0     0
            gpt/tank_s2  ONLINE       0     0     0
            gpt/tank_s3  ONLINE       0     0     0

errors: No known data errors

hw.machine: amd64
hw.model: Intel(R) Atom(TM) CPU  330   @ 1.60GHz
hw.ncpu: 2
hw.physmem: 2135396352
hw.pagesize: 4096
vm.kmem_size: 3221225472
vfs.zfs.version.spa: 14
vfs.zfs.version.zpl: 3
vfs.zfs.prefetch_disable: 1
vfs.zfs.zil_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.arc_min: 234881024
vfs.zfs.arc_max: 1879048192


gpart show
=>       34  976773101  ada0  GPT  (466G)
         34        128     1  freebsd-boot  (64K)
        162    4194304     2  freebsd-swap  (2.0G)
    4194466  972578669     3  freebsd-zfs  (464G)

=>       34  976773101  ada1  GPT  (466G)
         34        128     1  freebsd-boot  (64K)
        162    4194304     2  freebsd-swap  (2.0G)
    4194466  972578669     3  freebsd-zfs  (464G)

=>       34  976773101  ada2  GPT  (466G)
         34        128     1  freebsd-boot  (64K)
        162    4194304     2  freebsd-swap  (2.0G)
    4194466  972578669     3  freebsd-zfs  (464G)

=>       34  976773101  ada3  GPT  (466G)
         34        128     1  freebsd-boot  (64K)
        162    4194304     2  freebsd-swap  (2.0G)
    4194466  972578669     3  freebsd-zfs  (464G)


swap:
/dev/gpt/swap0 none swap sw 0 0
/dev/gpt/swap1 none swap sw 0 0
/dev/gpt/swap2 none swap sw 0 0
/dev/gpt/swap3 none swap sw 0 0

zfs list 
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank          83.7G  1.24T  28.4K  legacy
tank/dirvish  21.6G   106G  21.6G  /dirvish
tank/home     21.4G  1.24T  21.4G  /home
tank/mm       30.1G   120G  30.1G  /mm
tank/root      745M   279M   745M  legacy
tank/tmp       126K  4.00G   126K  /tmp
tank/usr      2.95G  13.1G  2.95G  /usr
tank/var       115M  3.89G   115M  /var

-- 
A male gynecologist is like an auto mechanic who never owned a car.
- Carrie Snow

