Fresh 7.0 Install: Fatal Trap 12 panic when put under load

Jeremy Chadwick koitsu at FreeBSD.org
Thu Sep 11 10:56:33 UTC 2008


On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
> >> My box crashed again:
> >>
> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
> >> cpuid = 0
> >> Uptime: 33d11h12m58s
> >> Dumping 3327 MB (2 chunks)
> >>   chunk 0: 1MB (151 pages) ... ok
> >>   chunk 1: 3327MB (851568 pages)  <---hung here
> >>
> >> Still no valid dump.
> >>
> >> There is 4gig of physical memory in the machine.
> >>
> >> In /boot/loader.conf, I currently have the following:
> >>
> >> vm.kmem_size=1G
> >> vm.kmem_size_max=1G
> >> vm.kmem_size_scale=2
> >>
> >> and in my kernel conf file I have:
> >>
> >> options         KVA_PAGES=512
> >>
> >> It stayed up for 33 days this time.  Is there anything else I can do?
> >
> > First and foremost: are you using ZFS on this machine?  If so, there are
> > many tunables you can apply to try and limit this; I'm willing to bet
> > it's ARC which is doing it.  See below.
> >
> > In general, it appears that you need to increase the maximum range of
> > kmem.  The kernel attempted to utilise more than 1GB, and your limit is
> > 1G.  My machines running RELENG_7 on amd64, with only 2GB of RAM
> > installed, use the following tunables in loader.conf:
> >
> > vm.kmem_size="1536M"
> > vm.kmem_size_max="1536M"
> >
> > If ZFS is in use, I recommend these as well:
> >
> > vfs.zfs.arc_min="16M"
> > vfs.zfs.arc_max="64M"
> > vfs.zfs.prefetch_disable="1"
> >
> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
> > have in the machine, with regards to RELENG_7, will not help.  This is a
> > known limitation which has been fixed in HEAD/CURRENT (where the limit
> > has been increased to 512GB).  See the "Kernel" section below; you'll
> > see the applicable item.
> >
> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
> >
> > Your only solution may be to run HEAD/CURRENT.
> 
> I am not running ZFS.  My file systems are ufs.
> 
> This feels like some sort of memory leak in the kernel.  Giving it
> more and more memory just seems to delay the crash.  Are you saying
> the crash is fixed in HEAD/CURRENT?

It's an intentional panic, not a crash of the "the program tried to
access NULL, which crashed the machine" kind.  The kernel needs more
memory to accomplish something, and none is available.  kris@ can
explain this in better terms than I can.

First and foremost, it would be good to find out what you are running
on this machine (process-wise).  A process could be tickling something
in the kernel which requires a large amount of memory.  I can imagine
something like MySQL doing this.

Ideally what needs to happen is to debug the kernel or get a full map
of kmem to find out what's using what.  I believe vmstat -m or vmstat -z
output might help.

Obviously since the machine panics, you won't be able to run those
commands after the fact.  I would recommend you set up a cronjob that
runs every 1-2 minutes and logs the output of both of those commands
to a file.  When the panic happens, restart the system and look at
the logfile to see whether something suddenly starts taking up a large
amount of memory, or whether usage grows gradually (indicating a
memory leak).
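
A minimal sketch of such a logging script follows.  The script name,
the function name and the log path are placeholders I made up, not
anything standard; the only real commands involved are vmstat -m,
vmstat -z and date.

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped snapshot of kernel memory
# usage to a log file.  Function name and paths are assumptions.

log_kmem_snapshot() {
    logfile=$1
    {
        # Timestamp header so snapshots can be told apart later
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "--- vmstat -m ---"
        vmstat -m 2>&1 || echo "(vmstat -m failed)"
        echo "--- vmstat -z ---"
        vmstat -z 2>&1 || echo "(vmstat -z failed)"
    } >> "$logfile"
}

# Take a snapshot only when a log file is given on the command line
if [ $# -ge 1 ]; then
    log_kmem_snapshot "$1"
fi
```

Assuming it is saved as /usr/local/sbin/kmem-snapshot.sh (again, a
made-up path), a crontab entry along these lines would run it every
two minutes:

    */2 * * * * /usr/local/sbin/kmem-snapshot.sh /var/log/kmem-usage.log

The log will grow quickly at that interval, so keep an eye on disk
space.  The last snapshot before the panic may also be missing if it
had not been written to disk yet.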

If you can figure out what might be tickling the problem, you can
ultimately figure out if increasing kmem is the right thing to do, or if
there's a greater problem here.
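
To spot which malloc type is growing, you can compare two saved copies
of vmstat -m output, one taken early and one taken shortly before the
panic.  The sketch below is only that, a sketch: the function name is
made up, and it assumes each data line has the form
"<type> <InUse> <MemUse>K ...", where the type name may contain
spaces.  Check that against what your vmstat actually prints.

```shell
#!/bin/sh
# Hypothetical helper: list malloc types whose MemUse grew between two
# saved `vmstat -m` outputs, largest growth (in KB) first.
# Assumed line format: <type> <InUse> <MemUse>K ...

kmem_growth() {
    awk '
    {
        # MemUse is taken to be the first "<digits>K" field that
        # directly follows a plain number (InUse).
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^[0-9]+K$/ && $(i - 1) ~ /^[0-9]+$/) {
                # Everything before InUse is the type name
                name = $1
                for (j = 2; j < i - 1; j++)
                    name = name " " $j
                kb = $i + 0            # "123K" converts to 123
                if (FNR == NR)
                    before[name] = kb  # first file: remember usage
                else if (kb > before[name])
                    # Types absent from the first file count from 0
                    printf "%d %s\n", kb - before[name], name
                break
            }
        }
    }' "$1" "$2" | sort -rn
}
```

Usage would be something like "kmem_growth early.txt late.txt".  A type
that keeps climbing across many snapshots points at a leak; one that
jumps suddenly points at a specific workload.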

> I'm running 6.3 by the way.
> 
> I have put your changes into my loader.conf, we'll see how long it
> goes this time.  I'm not quite in position to update everything to 7.x
> at the moment.

Our production webservers run RELENG_6 and RELENG_7, and we don't
encounter this kind of problem.  I'm not saying what you're experiencing
is indicative of hardware issues or something like that -- I'm simply
saying I have loaded systems which don't ever hit that condition.  So
figuring out what's causing it in your case would be good.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


