Fresh 7.0 Install: Fatal Trap 12 panic when put under load
Michael Grant
mgrant at grant.org
Mon Nov 24 06:59:40 PST 2008
On Thu, Sep 11, 2008 at 11:56 AM, Jeremy Chadwick <koitsu at freebsd.org> wrote:
> On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
>> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <koitsu at freebsd.org> wrote:
>> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
>> >> My box crashed again:
>> >>
>> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
>> >> cpuid = 0
>> >> Uptime: 33d11h12m58s
>> >> Dumping 3327 MB (2 chunks)
>> >> chunk 0: 1MB (151 pages) ... ok
>> >> chunk 1: 3327MB (851568 pages) <---hung here
>> >>
>> >> Still no valid dump.
>> >>
>> >> There is 4gig of physical memory in the machine.
>> >>
>> >> In /boot/loader.conf, I currently have the following:
>> >>
>> >> vm.kmem_size=1G
>> >> vm.kmem_size_max=1G
>> >> vm.kmem_size_scale=2
>> >>
>> >> and in my kernel conf file I have:
>> >>
>> >> options KVA_PAGES=512
>> >>
>> >> It stayed up for 33 days this time. Is there anything else I can do?
>> >
>> > First and foremost: are you using ZFS on this machine? If so, there are
>> > many tunables you can apply to try and limit this; I'm willing to bet
>> > it's ARC which is doing it. See below.
>> >
>> > In general, it appears that you need to increase the maximum range of
>> > kmem. The kernel attempted to utilise more than 1GB, and your limit is
>> > 1G. My machines running RELENG_7 on amd64, with only 2GB of RAM
>> > installed, use the following tunables in loader.conf:
>> >
>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"
>> >
>> > If ZFS is in use, I recommend these as well:
>> >
>> > vfs.zfs.arc_min="16M"
>> > vfs.zfs.arc_max="64M"
>> > vfs.zfs.prefetch_disable="1"
>> >
>> > Do not increase kmem_size any larger than 1.5GB; the amount of RAM you
>> > have in the machine, with regards to RELENG_7, will not help. This is a
>> > known limitation which has been fixed in HEAD/CURRENT (where the limit
>> > has been increased to 512GB). See the "Kernel" section below; you'll
>> > see the applicable item.
>> >
>> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
>> >
>> > Your only solution may be to run HEAD/CURRENT.
>>
>> I am not running ZFS. My file systems are ufs.
>>
>> This feels like some sort of memory leak in the kernel. Giving it
>> more and more memory just seems to delay the crash. Are you saying
>> the crash is fixed in HEAD/CURRENT?
>
> It's an intentional crash, not "the program tried to access NULL, which
> crashed the machine" crash. The kernel wants more memory to accomplish
> a certain thing, and it's not available. kris@ can explain this in
> better terms than I can.
>
> First and foremost, it would be good to find out what all you are
> running on this machine (process-wise). A process could be tickling
> something in the kernel which requires a large amount of memory.
> I can imagine something like MySQL would require this.
>
> Ideally what needs to happen is to debug the kernel or get a full map
> of kmem to find out what's using what. I believe vmstat -m or vmstat -z
> output might help.
>
> Obviously since the machine panics, you won't be able to run those
> commands after the fact. I would recommend you set up a cronjob that
> runs every 1-2 minutes and logs the output of both of those commands
> to a file. When the panic happens, restart the system and look at
> the logfile to see if you can figure out if anything suddenly starts
> taking up a large amount of memory, or if it's a gradual thing
> (indicating a memory leak).
>
> If you can figure out what might be tickling the problem, you can
> ultimately figure out if increasing kmem is the right thing to do, or if
> there's a greater problem here.
>
>> I'm running 6.3 by the way.
>>
>> I have put your changes into my loader.conf, we'll see how long it
>> goes this time. I'm not quite in position to update everything to 7.x
>> at the moment.
>
> Our production webservers run RELENG_6 and RELENG_7, and we don't
> encounter this kind of problem. I'm not saying what you're experiencing
> is indicative of hardware issues or something like that -- I'm simply
> saying I have loaded systems which don't ever hit that condition. So
> figuring out what's causing it in your case would be good.
>
This appears to be too high as the machine reboots immediately after the fsck:
>> > vm.kmem_size="1536M"
>> > vm.kmem_size_max="1536M"
With both set back to 1G, it panicked again about a month later.

Here is the output of vmstat -m and vmstat -z from roughly 1 minute before
it crashed (I was logging both to a file every minute via cron):
Fri Nov 21 15:15:00 EST 2008
Type InUse MemUse HighUse Requests Size(s)
pfs_vncache 2 1K - 864205 32
GEOM 168 24K - 416279 16,32,64,128,256,512,1024,2048,4096
isadev 17 2K - 17 64
CAM periph 1 1K - 1 128
cdev 26 4K - 26 128
CAM queue 3 1K - 3 16
file desc 739 474K - 284943537 16,32,64,256,512,1024,2048,4096
sigio 3 1K - 4802 32
kenv 116 8K - 118 16,32,64,4096
kqueue 246 154K - 17652506 256,1024
proc-args 153 10K - 107101480 16,32,64,128,256
zombie 0 0K - 99871925 128
ithread 147 15K - 147 16,64,128
KTRACE 100 13K - 265722 16,32,64,128,256,512,1024,2048,4096
linker 178 453K - 475 16,32,256,512,1024,2048,4096
lockf 18 2K - 7774966702 64
devbuf 594 1779K - 598 16,32,64,128,256,512,1024,2048,4096
temp 3170780 795024K - 684086094 16,32,64,128,256,512,1024,2048,4096
ip6opt 1 1K - 1 128
ip6ndp 7 1K - 8 64,128
module 403 26K - 403 64,128
mtx_pool 1 8K - 1
CAM dev queue 1 1K - 1 64
pgrp 90 6K - 785669 64
session 65 9K - 681185 128
proc 2 8K - 2 4096
subproc 1307 1576K - 99873232 256,4096
cred 268 34K - 1054173599 128
ata_generic 9 9K - 9 1024
plimit 44 11K - 5647664 256
uidinfo 29 2K - 384426 32,1024
sysctl 0 0K - 2200402 16,32,64
sysctloid 3411 104K - 3411 16,32,64
sysctltmp 0 0K - 2662228 16,32,128
umtx 1750 110K - 3360 64
SWAP 2 2189K - 2 64
bus 1090 46K - 7017 16,32,64,128,1024
bus-sc 79 28K - 3015 16,32,64,128,256,512,1024,2048,4096
devstat 12 25K - 12 16,4096
eventhandler 51 3K - 51 32,128
CAM SIM 1 1K - 1 64
kobj 257 514K - 315 2048
CAM XPT 10 1K - 17 16,64,512
ad_driver 8 1K - 8 32
ata_dma 10 2K - 10 128
rman 193 13K - 707 16,64
sbuf 0 0K - 5350749 16,32,64,128,256,512,1024,2048,4096
ar_driver 0 0K - 34 512,2048
taskqueue 11 1K - 11 16,128
Unitno 18 1K - 160999938 16,64
ioctlops 0 0K - 31916658 16,32,64,128,256,512,1024
iov 0 0K - 323400897 16,32,64,128,256,4096
msg 4 25K - 4 1024,4096
sem 4 7K - 4 512,1024,4096
shm 124 135K - 65027 1024
ttys 2337 328K - 100279 128,1024
ptys 21 3K - 21 128
accf 35 1K - 12157 16,32
mbextcnt 11 1K - 87164975 16
mbuf_tag 0 0K - 17517357 32
soname 73 9K - 276614136 16,32,128
pcb 106 6K - 18574167 16,32,64,2048
BIO buffer 28 56K - 11612611 1024,2048
vfscache 1 512K - 1
cluster_save buffer 0 0K - 3154212 32,64
VFS hash 1 256K - 1
vnodes 11 1K - 669 16,128
mount 171 5K - 7997 16,32,64,128,2048
vnodemarker 0 0K - 2210275 512
BPF 6 1K - 3103 16,64,128,256
ifnet 7 7K - 8 256,1024
ifaddr 86 19K - 105 16,32,64,128,256,512,2048
ether_multi 22 1K - 26 16,32,64
clone 6 24K - 6 4096
arpcom 3 1K - 3 16
lo 1 1K - 1 16
acd_driver 1 2K - 1 2048
ppbusdev 3 1K - 3 128
routetbl 212 41K - 16997 16,32,64,128,256
in_multi 4 1K - 5 32
IpFw/IpAcct 1 1K - 1 64
ip_moptions 1 1K - 1 128
hostcache 1 24K - 1
syncache 1 8K - 1
in6_multi 16 1K - 16 16,32,64
NFS req 0 0K - 250799856 128
NFSV3 diroff 0 0K - 183024 512
NFS daemon 1 8K - 1
p1003.1b 1 1K - 1 16
pagedep 1 64K - 1
inodedep 1 256K - 1
newblk 1 1K - 1 256
UFS dirhash 770 175K - 10023288 16,32,64,128,256,512,1024,2048,4096
UFS mount 12 245K - 15 256,2048
UMAHash 9 42K - 46 256,512,1024,2048,4096
entropy 1024 64K - 1024 64
USB 49 5K - 49 16,32,64,128,256
USBdev 4 1K - 13 16,128,512
VM pgdata 2 65K - 2 64
DEVFS2 152 3K - 203 16
atkbddev 2 1K - 2 32
DEVFS3 494 62K - 501 128
DEVFS1 152 38K - 154 256
DEVFS_RULE 34 8K - 34 32,256
DEVFS 38 1K - 42 16,128
I/O APIC 4 4K - 4 1024
memdesc 1 4K - 1 4096
nexusdev 3 1K - 3 16
pfs_nodes 20 3K - 20 128
acpica 1207 66K - 26775 16,32,64,128,256,512,1024,2048
acpitask 0 0K - 1 32
PCI Link 16 2K - 16 32,64,128
acpisem 22 2K - 22 64
acpidev 58 2K - 58 32
raid3_data 4 2K - 2597361 16,32,256,512
NULLFS node 182 3K - 1548645220 16
NULLFS hash 1 1K - 1 64
NULLFS mount 5 1K - 5 16
vlan 2 1K - 2 16,64
netgraph_msg 0 0K - 6464 64,128,256,512,1024
netgraph_node 5 2K - 2521 256
netgraph_hook 16 2K - 156 128
netgraph 1 8K - 18 512
netgraph_sock 1 1K - 2453 64
netgraph_path 0 0K - 6464 16,32
netgraph_iface 1 1K - 2 64
netgraph_ppp 1 2K - 2 2048
netgraph_bpf 6 2K - 144 64,128,256,512
netgraph_ksock 0 0K - 16 64
netgraph_mppc 0 0K - 28 1024
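
With this many rows, the quickest way to spot a suspect in the vmstat -m table above is to sort it by MemUse. The following is only a rough sketch, not part of the original logging setup: the function name top_malloc_types and the regular expression are assumptions made for illustration, and the sample rows are copied from the table above.

```python
import re

# One `vmstat -m` row: Type InUse MemUse HighUse Requests [Size(s)].
# Type names can contain spaces ("file desc"), so the name is matched lazily
# and the numeric columns anchor the rest of the row.
ROW = re.compile(
    r"^(?P<type>.+?)\s+(?P<inuse>\d+)\s+(?P<mem_kb>\d+)K\s+-\s+(?P<requests>\d+)"
    r"(?:\s+(?P<sizes>[\d,]+))?$"
)


def top_malloc_types(text, n=3):
    """Return the n malloc types with the largest MemUse as (type, inuse, mem_kb)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and wrapped continuation lines do not match and are skipped
            rows.append((m["type"], int(m["inuse"]), int(m["mem_kb"])))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


# A few rows copied from the table above.
SAMPLE = """\
Type InUse MemUse HighUse Requests Size(s)
file desc 739 474K - 284943537 16,32,64,256,512,1024,2048,4096
devbuf 594 1779K - 598 16,32,64,128,256,512,1024,2048,4096
temp 3170780 795024K - 684086094 16,32,64,128,256,512,1024,2048,4096
subproc 1307 1576K - 99873232 256,4096
SWAP 2 2189K - 2 64
"""

for name, inuse, mem_kb in top_malloc_types(SAMPLE):
    print(f"{name:10s} {inuse:>9d} in use {mem_kb:>8d}K")
```

On these rows the "temp" type comes out far ahead of everything else, with about 3.17 million items in use. Running the same function over successive snapshots from the cron log would show whether that figure grows gradually or jumps shortly before the panic.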
ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
UMA Kegs: 140, 0, 77, 19, 77, 0
UMA Zones: 480, 0, 77, 3, 77, 0
UMA Slabs: 64, 0, 6484, 1304, 30596118, 0
UMA RCntSlabs: 104, 0, 625, 189, 1205420, 0
UMA Hash: 128, 0, 3, 27, 12, 0
16 Bucket: 76, 0, 39, 111, 186, 0
32 Bucket: 140, 0, 66, 74, 208, 0
64 Bucket: 268, 0, 118, 36, 459, 9
128 Bucket: 524, 0, 10974, 261, 984992, 4546474
VM OBJECT: 132, 0, 42296, 60016, 2315027163, 0
MAP: 192, 0, 7, 13, 7, 0
KMAP ENTRY: 68, 90104, 160, 7512, 98287339, 0
MAP ENTRY: 68, 0, 36757, 15379, 4327383373, 0
PV ENTRY: 24, 2067410, 626121, 1238434, 52068959685, 0
DP fakepg: 72, 0, 0, 0, 0, 0
mt_zone: 1024, 0, 219, 237, 219, 0
16: 16, 0, 3875, 1606, 2218944237, 0
32: 32, 0, 2007, 3643, 157755404, 0
64: 64, 0, 5655, 1012, 8091390625, 0
128: 128, 0, 4065, 1245, 1507077079, 0
256: 256, 0, 3169837, 458, 269064785, 0
512: 512, 0, 928, 1288, 12048433, 0
1024: 1024, 0, 2493, 1407, 405766834, 0
2048: 2048, 0, 512, 788, 103888082, 0
4096: 4096, 0, 399, 533, 114531797, 0
Files: 72, 0, 1799, 2070, 2326899098, 0
TURNSTILE: 52, 0, 1751, 373, 3361, 0
PROC: 536, 0, 332, 641, 99872258, 0
THREAD: 384, 0, 1114, 636, 80501077, 0
KSEGRP: 88, 0, 994, 606, 2875793, 0
UPCALL: 44, 0, 72, 630, 3421747, 0
SLEEPQUEUE: 32, 0, 1751, 509, 3361, 0
VMSPACE: 296, 0, 282, 836, 99798265, 0
mbuf_packet: 256, 0, 288, 804, 1623032273, 0
mbuf: 256, 0, 29, 649, 7723849747, 0
mbuf_cluster: 2048, 25600, 1092, 158, 41217209, 0
mbuf_jumbo_pagesize: 4096, 0, 0, 0, 0, 0
mbuf_jumbo_9k: 9216, 0, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 0, 0, 0, 0, 0
ACL UMA zone: 388, 0, 0, 0, 0, 0
g_bio: 132, 0, 0, 1218, 4046376667, 2
ata_request: 204, 0, 0, 798, 1167883416, 2
ata_composite: 196, 0, 0, 0, 0, 0
VNODE: 272, 0, 32878, 67432, 4286504460, 0
VNODEPOLL: 76, 0, 2, 248, 39, 0
S VFS Cache: 68, 0, 32722, 65670, 3034960875, 0
L VFS Cache: 291, 0, 629, 2790, 66550626, 0
NAMEI: 1024, 0, 1, 667, 9997159801, 0
DIRHASH: 1024, 0, 1850, 434, 21697253, 0
NFSMOUNT: 480, 0, 1, 7, 2, 0
NFSNODE: 464, 0, 1, 3943, 221540609, 0
PIPE: 408, 0, 24, 543, 54218876, 0
KNOTE: 68, 0, 4132, 796, 110922846, 0
socket: 356, 12331, 349, 1081, 47659527, 0
ipq: 32, 904, 0, 904, 2259778, 0
udpcb: 180, 12342, 46, 218, 14346034, 0
inpcb: 180, 12342, 260, 1170, 17672886, 0
tcpcb: 464, 12328, 142, 690, 17672886, 0
tcptw: 48, 2496, 118, 1442, 8139533, 0
syncache: 100, 15366, 2, 622, 12257432, 0
hostcache: 76, 15400, 1220, 1130, 1125859, 0
tcpreass: 20, 1690, 0, 845, 564503, 0
sackhole: 20, 0, 1, 675, 2544305, 0
ripcb: 180, 12342, 1, 153, 637466, 0
unpcb: 144, 12339, 158, 787, 15000640, 0
rtentry: 132, 0, 50, 182, 5723, 0
IPFW dynamic rule: 108, 0, 0, 0, 0, 0
SWAPMETA: 276, 121576, 14525, 22393, 55820649, 0
Mountpoints: 664, 0, 15, 21, 17, 0
FFS inode: 132, 0, 32619, 58383, 2515314975, 0
FFS1 dinode: 128, 0, 0, 0, 0, 0
FFS2 dinode: 256, 0, 32619, 56586, 2515314975, 0
gr3:64k: 65536, 0, 0, 292, 10698518, 178524
gr3:16k: 16384, 0, 0, 348, 53407139, 2817786
gr3:4k: 4096, 0, 0, 284, 53870651, 3399
gr3:64k: 65536, 0, 0, 434, 30935972, 267730
gr3:16k: 16384, 0, 0, 722, 253649141, 26383756
gr3:4k: 4096, 0, 0, 659, 86316934, 4074
NetGraph items: 36, 546, 0, 312, 176587, 0
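
As a cross-check between the two tables, the arithmetic below compares the "256" zone from vmstat -z with the "temp" type from vmstat -m and with the 1073741824-byte kmem_map limit from the panic message. It is a sketch for reading the numbers, not a diagnosis. All figures are copied from the output above; the variable and function names are made up for illustration. The 256-byte zone is shared by every malloc type, so a close match suggests, but does not prove, that the two rows describe the same allocations.

```python
KMEM_MAP_BYTES = 1073741824  # from "kmem_map too small: 1073741824 total allocated"
TEMP_INUSE = 3170780         # vmstat -m, "temp" row, InUse
TEMP_MEMUSE_KB = 795024      # vmstat -m, "temp" row, MemUse
ZONE256_SIZE = 256           # vmstat -z, "256" row, SIZE
ZONE256_USED = 3169837       # vmstat -z, "256" row, USED


def share_of_kmem(nbytes, limit=KMEM_MAP_BYTES):
    """Fraction of the kmem_map limit that nbytes represents."""
    return nbytes / limit


zone256_bytes = ZONE256_SIZE * ZONE256_USED
temp_bytes = TEMP_MEMUSE_KB * 1024

print(f"256-byte zone in use:     {zone256_bytes // 1024}K")
print(f"temp MemUse:              {TEMP_MEMUSE_KB}K")
print(f"temp share of kmem_map:   {share_of_kmem(temp_bytes):.1%}")
print(f"temp InUse minus 256 USED: {TEMP_INUSE - ZONE256_USED}")
```

The two memory figures agree to within a fraction of a percent, and "temp" alone accounts for roughly three quarters of the 1G kmem_map. That pattern would be consistent with a leak of 256-byte "temp" allocations, but these numbers do not show which kernel subsystem is making them.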