FreeBSD 8.1-Prerelease Panic amd64 w/ZFS..

'Jeremy Chadwick' freebsd at jdc.parodius.com
Fri May 28 15:51:51 UTC 2010


On Fri, May 28, 2010 at 10:51:59AM -0400, Howard Leadmon wrote:
>  Thanks Jeremy, I will try your recommended settings provided above.
> 
>  To the other poster, as to the settings of kmem, I had nothing specific
> set, just whatever FBSD was using by default.

vfs.zfs.arc_max is calculated on-the-fly during ZFS module/init time, I
believe, unless you explicitly set a value in loader.conf.  vm.kmem_size
is similar in that regard.  I do not know the calculation formulas.

vm.kmem_size_max is more or less static on amd64, because it represents
the maximum amount of kmem usable/addressable.  You can ignore the next
few paragraphs if you don't care about the history of this tunable, but
it will probably help folks reading the list.  This is what I've figured
out mostly on my own.

<history>
Prior to February 2009 this value was significantly smaller due to VM
design/implementation issues.  Alan Cox (not the Linux guy) did the
necessary work to fix this problem in RELENG_7 and committed it at
that time.

Fast forward to October 2009, by which time there were hundreds of posts
from users/SAs talking about ZFS, stability problems, and the dreaded
"kmem map is too small" error.  I sent the following to -stable:

http://lists.freebsd.org/pipermail/freebsd-stable/2009-October/052256.html

The first thing you're going to notice is that I'm talking about
RELENG_7, and specifically amd64.  However, the exact same code/efforts
(see above) were committed to RELENG_8 simultaneously (or within a very
short period of time).  So any RELENG_[78] amd64 system with sources
from 2009/02 or later should have a very large vm.kmem_size_max.  I can
confirm this on the couple of RELENG_7 systems we have in production.

The second thing that you'll notice is one of the links in my mail:
it points to a post from pjd@ stating that on amd64 you need to adjust
vm.kmem_size, not vm.kmem_size_max.  Take note of when this was said:
September 2009.  This was *after* Alan Cox's work, and I'm certain Pawel
had that in mind.

Fast forward to... I'm not sure what date; sometime in mid or late
2009.  The behaviour of vfs.zfs.arc_max was changed so that it became a
*hard limit* rather than the "high-water mark" it was previously.
I'm also not sure whether this behaviour changed in just RELENG_8 or in
RELENG_7 as well.  My brain is full for a lot of different reasons; I
try hard to remember as much as I can but it's too much for one person.
</history>

Starting to see where all the confusion comes from?  :-)

Fast forward to today.  People are still complaining about the problem,
but when they do they usually don't provide enough details.  Why?
Because they don't know what details to provide.  And why is that?
Because people expect ZFS on FreeBSD to mimic Solaris 10 or OpenSolaris,
where it "just works" (I know because we use it at my workplace on
thousands of boxes).  You tell users "well, you have to tune
loader.conf" and they say "WHY?".  You tell them what to tune and they
ask "What values do I pick?", and the answer varies with each system
and its workload.  There's really no "magic number".

Getting FreeBSD to that stage is difficult from what I understand (I
believe John Baldwin and a few others have covered this topic).  There
are efforts underway to eventually solve this problem down the road.

Anyway, until then -- I've offered this in the past and I'll offer it
again: I'm 100% willing to sit down and write a document that could go
into the Handbook that covers ZFS tuning on FreeBSD, why it's necessary
(at this point in time), what values are needed, yadda yadda.  But I
can't write this for the same reason the ZFS section on the FreeBSD Wiki
is outdated -- because to get answers to some of the questions, one
needs the kernel folks working on this code to help provide answers.
Most of us (myself included) are not familiar with the inner workings of
the ZFS port, nor are we fully familiar with those of the VM.  The
documentation dudes need the kernel dudes.  :-)

Back to the rest of your mail:

> In loader.conf all I had was:
> 
> zfs_load="YES"
> vfs.root.mountfrom="zfs:tank/root"
> 
> As to the setting of kmem and arc, I had the following which I will assume
> were defaults or auto-tunes:
> 
> vfs.zfs.arc_max : 862653440
> vm.kmem_size    : 1380245504
> vm.kmem_size_max: 329853485875
>
> {...below taken from your earlier mails...}
>
> panic:kmem_malloc(131072):kmem_map to small: 1296826368 total allocated

I'm not entirely sure, but I think vfs.zfs.arc_max, if not explicitly
set in loader.conf, might still act as a "high-water mark".  Meaning,
it's possible for the ZFS ARC to still exceed vm.kmem_size and cause a
panic.  Setting the arc_max value explicitly in loader.conf probably
forces a hard limit, but I'm not sure.  Can someone validate this?  I'm
basing it on the fact that 1,296,826,368 exceeds 862,653,440, and
*probably* was attempting to exceed 1,380,245,504.
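
To put numbers on that reasoning, here is a small Python sketch that
only does arithmetic on the three values quoted above.  The function
and variable names are mine and purely illustrative; nothing here
queries a live system, and it says nothing about *why* the allocation
failed.

```python
# Values copied from the sysctl output and panic message quoted above.
ARC_MAX = 862_653_440               # vfs.zfs.arc_max
KMEM_SIZE = 1_380_245_504           # vm.kmem_size
ALLOCATED_AT_PANIC = 1_296_826_368  # "total allocated" in the panic


def summarize(arc_max, kmem_size, allocated):
    """Return simple differences (in bytes) between the reported values."""
    return {
        # How far the allocated total sits above the auto-tuned arc_max.
        "allocated_over_arc_max": allocated - arc_max,
        # How much room was nominally left below vm.kmem_size.
        "headroom_below_kmem_size": kmem_size - allocated,
        # Allocated total as a percentage of vm.kmem_size.
        "percent_of_kmem_size": round(100.0 * allocated / kmem_size, 1),
    }


for key, value in summarize(ARC_MAX, KMEM_SIZE, ALLOCATED_AT_PANIC).items():
    print(f"{key}: {value}")
```

The allocated total is 434,172,928 bytes above arc_max and 83,419,136
bytes short of vm.kmem_size, which is why I say it was *probably* on
its way past the latter.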

What I do know is that by setting the two parameters I provided, I
can bang on a RELENG_8 box and watch kstat.zfs.misc.arcstats.size
never exceed vfs.zfs.arc_max, and the box never panics.  All our systems
in production, and my two at home, are tuned this way.
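
For anyone who wants to repeat that observation, a minimal Python
sketch follows.  It assumes a FreeBSD box with ZFS loaded, where
"sysctl -n <name>" prints the bare value; the helper names
(read_sysctl, arc_within_limit, watch) and the sampling interval are
hypothetical choices of mine, not part of any existing tool.

```python
import subprocess
import time


def read_sysctl(name):
    """Return an integer sysctl value by running `sysctl -n <name>`."""
    result = subprocess.run(
        ["sysctl", "-n", name],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def arc_within_limit(reader=read_sysctl):
    """Return (arc_size, arc_max, ok), where ok means size <= arc_max."""
    size = reader("kstat.zfs.misc.arcstats.size")
    limit = reader("vfs.zfs.arc_max")
    return size, limit, size <= limit


def watch(samples=12, interval=5.0, reader=read_sysctl):
    """Sample the ARC size a few times; return how many samples were over."""
    over = 0
    for _ in range(samples):
        size, limit, ok = arc_within_limit(reader)
        print(f"arc size={size} arc_max={limit} {'ok' if ok else 'OVER'}")
        if not ok:
            over += 1
        time.sleep(interval)
    return over
```

Run watch() in one terminal while hammering the filesystem in another.
The reader argument exists only so the logic can be exercised with
canned values on a machine that has no such sysctls.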

> I guess while we are all on the subject, I notice in the dmesg log the
> message:
> 
> ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is
> present;
>             to enable, add "vfs.zfs.prefetch_disable=0" to
> /boot/loader.conf.
> 
> Is this anything I want to enable, that's like a big performance win, or do
> I just not have enough RAM to support it?   Always been kinda curious about
> it, but so far I am liking ZFS, well outside of the machine panic..  LOL

And now you've touched on the *other* thing I've ranted about: how that
message isn't accurate (nor were previous incarnations).  Rather than
explain it here, you can just read my blog entry about this message and
hopefully what I've written will suffice for an explanation (see bottom
half of the post).

http://koitsu.wordpress.com/2009/10/12/testing-out-freebsd-8-0-rc1/

As for "should I actually enable this?" -- I've been in a private
conversation with another FreeBSD user about this, and like me, he isn't
sure either.  Where did this arbitrary limit come from, and why are we
being warned about it?  Where can we read about the decision?

This circles back to what I said earlier -- if documentation can't be
provided, then at bare minimum some explanation in src/UPDATING would
be sufficient.

That's about all I can say on the matter.  I do what I can, but the
ability to accomplish what's needed is mostly out of my control.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-fs mailing list