FreeBSD 4.9 goes boom!
adp
dap99 at i-55.com
Wed Mar 24 13:49:28 PST 2004
Problem: the load average on a FreeBSD 4.9 box climbs quickly to levels as
high as 300. The system becomes unusable and, if we are lucky, reboots on its
own. Usually, though, we have to call a tech to power-cycle it.
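Because the box becomes unusable once the load climbs, it may help to record
what is running before that point. The following is only a minimal sketch,
assuming a reasonably recent Python 3 is installed; the threshold, polling
interval, log path, and function names are all hypothetical and not part of
the current setup.

```python
import os
import subprocess
import time

LOAD_THRESHOLD = 20.0                # hypothetical cutoff, far above the usual ~0.5
POLL_SECONDS = 30                    # hypothetical polling interval
LOG_PATH = "/var/tmp/loadwatch.log"  # hypothetical log location


def load_is_high(loadavg, threshold=LOAD_THRESHOLD):
    """Return True when the 1-minute load average exceeds the threshold."""
    return loadavg[0] > threshold


def snapshot():
    """Return the full process list as text for later inspection."""
    result = subprocess.run(["ps", "-axl"], capture_output=True, text=True)
    return result.stdout


def watch():
    """Poll the load average forever, logging a snapshot whenever it is high."""
    while True:
        load = os.getloadavg()
        if load_is_high(load):
            with open(LOG_PATH, "a") as log:
                log.write(f"{time.ctime()} load={load}\n")
                log.write(snapshot())
        time.sleep(POLL_SECONDS)
```

Calling `watch()` from a small script started at boot would leave a trail in
the log file that survives a hard power-cycle, provided the disk is still
writable when the load spikes.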
Here is the setup:
I have a FreeBSD 4.9 server on a P4 with 256MB of RAM and an IDE drive.
We were using HiTech RAID-1, but it was flaky, so now I'm just using a single
drive on the regular IDE controller.
CPU: Intel(R) Pentium(R) 4 CPU 1500MHz (1494.47-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf07 Stepping = 7
Features=0x3febf9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
real memory = 268369920 (262080K bytes)
avail memory = 257400832 (251368K bytes)
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
atapci0: <Intel ICH2 ATA100 controller> port 0xf000-0xf00f at device 31.1 on pci0
ad0: 38166MB <WDC WD400BB-00GFA0> [77545/16/63] at ata0-master UDMA33
On this server I have several jails:
jail 1 : running apache and serving about 6 hits/s on average.
jails 2 - 7 : running apache, generally with just one child each, for SSL
(several SSL sites, hence several jails; I plan to move to a single SSL jail
with natd later)
jail 8 : an ssh jail for people to manage the sites
Under normal load we are okay on memory. (I am adding more.)
At all times we have about 1GB of swap space free.
Normally, my 5- and 15-minute load averages are around 0.5 (watching column r
in vmstat, I usually see 0 or 1 processes waiting to run). This is normal:
last pid: 7924; load averages: 0.11, 0.25, 0.49 up 0+00:39:40 15:30:01
345 processes: 2 running, 342 sleeping, 1 zombie
Mem: 137M Active, 27M Inact, 52M Wired, 2284K Cache, 35M Buf, 30M Free
Swap: 2048M Total, 31M Used, 2017M Free, 1% Inuse
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
7914 root 30 0 2264K 1320K RUN 0:00 31.00% 1.51% top
7883 root 2 0 6600K 6016K sbwait 0:00 13.84% 1.32% perl
6660 nobody 2 0 17940K 12676K sbwait 0:01 1.07% 1.07% httpd
7930 root 29 0 1852K 924K RUN 0:00 17.00% 0.83% top
763 nobody 18 0 15004K 7144K lockf 0:02 0.15% 0.15% httpd
7828 nobody 2 0 17732K 12424K accept 0:00 0.37% 0.15% httpd
4586 nobody 2 0 17944K 12604K sbwait 0:01 0.10% 0.10% httpd
7868 nobody 2 0 16376K 10944K accept 0:00 1.03% 0.10% httpd
7910 root -6 0 1968K 1356K piperd 0:00 2.00% 0.10% perl
1461 nobody 18 0 14628K 6780K lockf 0:02 0.05% 0.05% httpd
2812 nobody 18 0 14368K 6620K lockf 0:02 0.05% 0.05% httpd
4575 nobody 2 0 17768K 12480K accept 0:01 0.05% 0.05% httpd
4593 nobody 2 0 18080K 12780K sbwait 0:05 0.00% 0.00% httpd
4422 root 2 0 16100K 10264K select 0:03 0.00% 0.00% httpd
4595 nobody 2 0 17984K 12728K sbwait 0:03 0.00% 0.00% httpd
764 nobody 18 0 14992K 7300K lockf 0:02 0.00% 0.00% httpd
4560 nobody 2 0 17944K 12684K sbwait 0:02 0.00% 0.00% httpd
4561 nobody 2 0 17944K 12672K sbwait 0:02 0.00% 0.00% httpd
But when the system crashes, the load just skyrockets:
last pid: 88248; load averages: 238.98, 197.07, 127.85 up 2+17:12:36 14:45:38
709 processes: 257 running, 421 sleeping, 31 zombie
Mem: 143M Active, 21M Inact, 75M Wired, 7908K Cache, 35M Buf, 1844K Free
Swap: 2048M Total, 488M Used, 1560M Free, 23% Inuse
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
88185 root 2 0 6504K 5736K connec 0:00 1.47% 0.93% perl
25298 nobody -18 0 13700K 1596K vmpfw 0:13 0.59% 0.39% httpd
57349 nobody -18 0 14788K 1588K spread 0:10 0.57% 0.39% httpd
18115 nobody -18 0 14224K 1604K vmpfw 0:21 0.39% 0.24% httpd
39876 root 2 0 2716K 0K RUN 10:12 0.00% 0.00% <top>
84557 nobody 2 0 22600K 0K RUN 9:54 0.00% 0.00% <httpd>
84567 nobody 2 0 22360K 0K sbwait 9:47 0.00% 0.00% <httpd>
84568 nobody 2 0 22564K 0K RUN 9:47 0.00% 0.00% <httpd>
84564 nobody 2 0 22680K 0K sbwait 9:41 0.00% 0.00% <httpd>
84556 nobody -22 0 21092K 580K swread 9:39 0.00% 0.00% httpd
84554 nobody 2 0 22592K 0K RUN 9:32 0.00% 0.00% <httpd>
84555 nobody 2 0 22608K 0K RUN 9:31 0.00% 0.00% <httpd>
84558 nobody 2 0 22580K 0K RUN 9:22 0.00% 0.00% <httpd>
84563 nobody 2 0 22692K 0K RUN 9:07 0.00% 0.00% <httpd>
84560 nobody 2 0 22580K 0K RUN 8:56 0.00% 0.00% <httpd>
84398 root 2 0 21052K 1604K select 4:14 0.00% 0.00% httpd
94 root 2 0 360K 0K nfsd 3:03 0.00% 0.00% <nfsd>
3730 nobody 18 0 14888K 0K lockf 1:23 0.00% 0.00% <httpd>
Since I have 75M wired I have SOME memory available to my system.
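To put numbers on the difference between the two top snapshots above, here is
a small sketch that parses the "Mem:" and "Swap:" summary lines and compares
them. The helper name `parse_summary` and the unit table are illustrative
assumptions; the input strings are copied from the top output in this post.

```python
import re

# Summary lines copied from the two top(1) snapshots in this post.
NORMAL_MEM = "Mem: 137M Active, 27M Inact, 52M Wired, 2284K Cache, 35M Buf, 30M Free"
CRASH_MEM = "Mem: 143M Active, 21M Inact, 75M Wired, 7908K Cache, 35M Buf, 1844K Free"
NORMAL_SWAP = "Swap: 2048M Total, 31M Used, 2017M Free, 1% Inuse"
CRASH_SWAP = "Swap: 2048M Total, 488M Used, 1560M Free, 23% Inuse"

# Multipliers to convert each size suffix to kilobytes.
UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_summary(line):
    """Turn a top(1) 'Mem:' or 'Swap:' line into {label: kilobytes}."""
    fields = {}
    # Matches pairs such as "137M Active"; "1% Inuse" has no size suffix
    # and is deliberately skipped.
    for size, unit, label in re.findall(r"(\d+)([KMG]) (\w+)", line):
        fields[label] = int(size) * UNIT_KB[unit]
    return fields


normal_mem, crash_mem = parse_summary(NORMAL_MEM), parse_summary(CRASH_MEM)
normal_swap, crash_swap = parse_summary(NORMAL_SWAP), parse_summary(CRASH_SWAP)

free_drop_kb = normal_mem["Free"] - crash_mem["Free"]      # 30M down to 1844K
swap_growth_kb = crash_swap["Used"] - normal_swap["Used"]  # 31M up to 488M

print(f"free memory fell by {free_drop_kb} KB")
print(f"swap in use grew by {swap_growth_kb} KB")
```

The comparison only restates what the two snapshots already show: free memory
is nearly exhausted at crash time, while swap in use is roughly 457M higher
than in the normal snapshot.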
I am using bsdsar to record system activity. The system crashed around
2:45 PM (14:45) today:
Time    ad0  ad1  ad2  ad3  da0  da1  da2  da3  da4  da5  da6
13:40     0
14:00    33
14:20   146
15:00    40
Time   % User  % Sys  % Nice  % Intrpt  % Idle
13:40       1      2       0         2      96
14:00      11      2       0         0      87
14:20       0     12       0         0      88
15:00      10      6       0         0      84
Time   Free Mem  Active Mem  Inactive Mem  Total Swap  Used Swap  Free Swap
13:40       11M        129M           33M    2097024k    162608k   1934416k
14:00     5936K        149M           14M    2097024k    159464k   1937560k
14:20      904K        144M           24M    2097024k    303504k   1793520k
15:00      656K        163M           19M    2097024k      9544k   2087480k
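The swap figures in the bsdsar memory table are easier to read as changes
between samples. The sketch below does that arithmetic; the function name
`swap_deltas` is an illustrative assumption, and the numbers are the
"Used Swap" values from the table above.

```python
# (time, swap used in KB) rows from the bsdsar memory table in this post.
SWAP_USED_KB = [
    ("13:40", 162608),
    ("14:00", 159464),
    ("14:20", 303504),
    ("15:00", 9544),
]


def swap_deltas(samples):
    """Return (start, end, change_kb) for each pair of consecutive samples."""
    return [
        (t0, t1, used1 - used0)
        for (t0, used0), (t1, used1) in zip(samples, samples[1:])
    ]


deltas = swap_deltas(SWAP_USED_KB)
for start, end, change in deltas:
    print(f"{start} -> {end}: {change:+d} KB")

# The interval with the largest growth in swap use.
worst = max(deltas, key=lambda d: d[2])
```

On these samples the largest jump is 144040 KB (roughly 140MB) between 14:00
and 14:20, shortly before the crash. The sharp drop by 15:00 presumably
reflects the reboot rather than the system recovering on its own.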
I looked in /var/log/messages and saw nothing unusual, except that I do have
a lot of these:
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc650a524
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c
They seem to come in spurts, once or twice an hour.
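To check whether these messages line up with the load spikes, they could be
counted per hour. This is a minimal sketch, assuming the standard syslog
timestamp layout seen in the lines above; the regular expression and the
function name `bad_cookies_per_hour` are illustrative, and the sample input
is just the three lines quoted in this post.

```python
import re
from collections import Counter

# The three log lines quoted in this post, used as sample input.
SAMPLE = """\
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc650a524
Mar 24 13:49:49 europa /kernel: got bad cookie vp 0xd257ca00 bp 0xc651b57c
"""

# Captures the date ("Mar 24") and the hour ("13") of each matching line.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+) (\d{2}):\d{2}:\d{2} \S+ /kernel: got bad cookie"
)


def bad_cookies_per_hour(lines):
    """Count 'got bad cookie' kernel messages per (date, hour)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


counts = bad_cookies_per_hour(SAMPLE.splitlines())
for (date, hour), n in sorted(counts.items()):
    print(f"{date} {hour:02d}:00  {n} messages")
```

Feeding it the lines of /var/log/messages instead of the sample would give an
hourly count that can be set beside the bsdsar samples.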
Any ideas?