Fresh 7.0 Install: Fatal Trap 12 panic when put under load
Jeremy Chadwick
koitsu at FreeBSD.org
Thu Sep 11 10:56:33 UTC 2008
On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
> >> My box crashed again:
> >>
> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
> >> cpuid = 0
> >> Uptime: 33d11h12m58s
> >> Dumping 3327 MB (2 chunks)
> >> chunk 0: 1MB (151 pages) ... ok
> >> chunk 1: 3327MB (851568 pages) <---hung here
> >>
> >> Still no valid dump.
> >>
> >> There is 4gig of physical memory in the machine.
> >>
> >> In /boot/loader.conf, I currently have the following:
> >>
> >> vm.kmem_size=1G
> >> vm.kmem_size_max=1G
> >> vm.kmem_size_scale=2
> >>
> >> and in my kernel conf file I have:
> >>
> >> options KVA_PAGES=512
> >>
> >> It stayed up for 33 days this time. Is there anything else I can do?
> >
> > First and foremost: are you using ZFS on this machine? If so, there are
> > many tunables you can apply to try and limit this; I'm willing to bet
> > it's ARC which is doing it. See below.
> >
> > In general, it appears that you need to increase the maximum range of
> > kmem. The kernel attempted to utilise more than 1GB, and your limit is
> > 1G. My machines running RELENG_7 on amd64, with only 2GB of RAM
> > installed, use the following tunables in loader.conf:
> >
> > vm.kmem_size="1536M"
> > vm.kmem_size_max="1536M"
> >
> > If ZFS is in use, I recommend these as well:
> >
> > vfs.zfs.arc_min="16M"
> > vfs.zfs.arc_max="64M"
> > vfs.zfs.prefetch_disable="1"
> >
> > Do not increase kmem_size beyond 1.5GB; on RELENG_7, the amount of
> > RAM you have in the machine will not help. This is a known limitation
> > which has been fixed in HEAD/CURRENT (where the limit has been
> > increased to 512GB). See the "Kernel" section of the page below;
> > you'll see the applicable item.
> >
> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
> >
> > Your only solution may be to run HEAD/CURRENT.
>
> I am not running ZFS. My file systems are ufs.
>
> This feels like some sort of memory leak in the kernel. Giving it
> more and more memory just seems to delay the crash. Are you saying
> the crash is fixed in HEAD/CURRENT?

It's an intentional panic, not a "something tried to access NULL, which
crashed the machine" kind of crash. The kernel wants more memory to
accomplish a certain thing, and it's not available. kris@ can explain
this in better terms than I can.

First and foremost, it would be good to find out what you are running
on this machine (process-wise). A process could be tickling something
in the kernel which requires a large amount of kernel memory. I can
imagine something like MySQL doing this.

Ideally, what needs to happen is to debug the kernel or get a full map
of kmem to find out what's using what. I believe the output of
vmstat -m or vmstat -z might help.

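
As a rough sketch of how that output could be narrowed down, the shell
function below ranks the malloc types reported by vmstat -m by their
MemUse column. The function name top_malloc_types is made up for
illustration, and the parsing assumes the usual "Type InUse MemUse ..."
layout in which MemUse is printed as a number followed by K; please
check it against the output on your own system before relying on it.

```sh
#!/bin/sh
# top_malloc_types (hypothetical helper): read `vmstat -m` output on
# stdin and print the N malloc types using the most memory, largest
# first. Output format: <MemUse in KB><TAB><type name>.
top_malloc_types() {
    awk '
        {
            # Type names may contain spaces, so do not assume MemUse is
            # field 3. Assumption: MemUse is the first field of the form
            # "<digits>K", and InUse is the field just before it.
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^[0-9]+K$/) {
                    name = $1
                    for (j = 2; j < i - 1; j++)
                        name = name " " $j
                    kb = $i
                    sub(/K$/, "", kb)
                    printf "%d\t%s\n", kb, name
                    break
                }
            }
        }
    ' | sort -rn | head -n "${1:-10}"
}
```

For example, "vmstat -m | top_malloc_types 5" would list the five
largest types. vmstat -z uses a different layout and is not handled by
this sketch.
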
Obviously, since the machine panics, you won't be able to run those
commands after the fact. I would recommend you set up a cron job that
runs every 1-2 minutes and appends the output of both of those commands
to a file. When the panic happens, restart the system and look at the
log file to see whether something suddenly starts taking up a large
amount of memory, or whether it's a gradual thing (indicating a memory
leak).

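
A minimal sketch of such a cron job follows. The script name, the log
path, and the VMSTAT override variable are all assumptions chosen for
illustration, not anything standard. The sync at the end is my own
addition, on the theory that it improves the odds of the last snapshot
reaching the disk before a panic.

```sh
#!/bin/sh
# kmem-snapshot.sh (hypothetical name): append one timestamped snapshot
# of `vmstat -m` and `vmstat -z` output to a log file.

# Command to run; overridable (an assumption of this sketch) so the
# script can be tried out without touching the real vmstat.
VMSTAT="${VMSTAT:-vmstat}"

snapshot() {
    logfile="$1"
    {
        echo "===== $(date '+%Y-%m-%d %H:%M:%S') ====="
        echo "--- vmstat -m ---"
        $VMSTAT -m
        echo "--- vmstat -z ---"
        $VMSTAT -z
    } >> "$logfile" 2>&1
    # Ask the kernel to write out pending disk buffers.
    sync
}

# Only take a snapshot when a log file path is given as an argument.
if [ "$#" -ge 1 ]; then
    snapshot "$1"
fi
```

With the script saved as, say, /root/kmem-snapshot.sh, a line like this
in root's crontab (crontab -e) would run it every minute:

* * * * * /bin/sh /root/kmem-snapshot.sh /var/log/kmem-usage.log

The log grows without bound, so it will need rotating or trimming if
the machine stays up for weeks.
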
If you can figure out what might be tickling the problem, you can
ultimately figure out whether increasing kmem is the right thing to do,
or whether there's a greater problem here.

> I'm running 6.3 by the way.
>
> I have put your changes into my loader.conf, we'll see how long it
> goes this time. I'm not quite in a position to update everything to
> 7.x at the moment.

Our production webservers run RELENG_6 and RELENG_7, and we don't
encounter this kind of problem. I'm not saying what you're experiencing
is indicative of hardware issues or something like that -- I'm simply
saying I have loaded systems which don't ever hit that condition. So
figuring out what's causing it in your case would be good.

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |