More data on 7.2-RELEASE "hangs"
Marc G. Fournier
scrappy at hub.org
Wed May 13 17:44:56 UTC 2009
On Wed, 13 May 2009, John Baldwin wrote:
> Well, you had a whole lot of page faults and other VM activity, plus 500k
> syscalls. The 'w' is a count of swapped processes, so basically your box is
> swapping a whole lot it seems. I think your box is just overloaded.
I knew I was going to regret posting that :(
What I posted was what vmstat 5 shows after the issue *starts*, not what
it normally looks like ... right now, after 10 hours of uptime and with all
the same processes running, it looks like:
io# vmstat 5 (10 hours uptime now)
procs    memory                page               disks       faults         cpu
r b w     avm   fre   flt  re  pi  po    fr   sr da0 pa0   in     sy    cs us sy id
0 1 0  10477M  301M  3503  13   1   2  3620  286   0   0  331  45491  4566 26  8 66
0 1 0  10430M  305M   278   7   0   0   550    0  18   0  186  19243  2917  4  3 93
1 1 0  10474M  295M   511   0   0   0   359    0  91   0  253  11632  3516  7  3 90
0 1 0  10447M  310M   819   3   0   0  1473    0  14   0  143  29575  2486  8  3 89
0 1 0  10558M  295M  5008  18  13   5  4128    0 121   0  345  24212  4215 16  7 77
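For pulling numbers out of pastes like the one above, a minimal Python sketch could turn each data row into a dict keyed by column name. It assumes the FreeBSD 7.x column layout shown here. The helper names are hypothetical, and the da0/pa0 disk columns are specific to these hosts. vmstat prints "sy" twice, once for system calls under "faults" and once for system CPU time under "cpu", so the sketch renames them to keep the keys unique:

```python
# Hypothetical helper for parsing pasted FreeBSD 7.x `vmstat 5` output.

# Column names from the second header line. vmstat prints "sy" twice
# (system calls under "faults", system CPU % under "cpu"), so both are
# renamed here to keep the dict keys unique. da0/pa0 are this host's disks.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "da0", "pa0", "in", "sy_calls", "cs", "us", "sy_cpu", "id"]


def to_number(field: str) -> float:
    """Convert one field to a float; K/M suffixes are converted to megabytes."""
    if field.endswith("M"):
        return float(field[:-1])
    if field.endswith("K"):
        return float(field[:-1]) / 1024
    return float(field)


def parse_vmstat(text: str) -> list[dict[str, float]]:
    """Return one dict per data row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have one value per column and start with a digit.
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            rows.append(dict(zip(COLUMNS, map(to_number, fields))))
    return rows


sample = """\
 r b w avm fre flt re pi po fr sr da0 pa0 in sy cs us sy id
 0 1 0 10477M 301M 3503 13 1 2 3620 286 0 0 331 45491 4566 26 8 66
 0 1 0 10430M 305M 278 7 0 0 550 0 18 0 186 19243 2917 4 3 93
"""
rows = parse_vmstat(sample)
print(rows[1]["fre"], rows[1]["id"])  # 305.0 93.0
```

When comparing rows, keep in mind that on FreeBSD the first row is typically a since-boot average rather than a 5-second sample.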
Right now, io is running ~775 processes ... at the time of the vmstat I
provided earlier, it was up to 1400 processes. Since there are only 5
minutes between script runs, something is causing it to go from zero swap
to high swap within a very short period of time, but because things get
badly locked up when it happens, I can't isolate the cause ...
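Because the 5-minute script misses the onset, one option is to sample more often and record the first interval where the box tips over. The following is a hypothetical sketch, not an existing tool. It assumes the samples have already been collected (for example, from a shorter vmstat interval written to a log). It treats a nonzero "w" or a page-out value above an arbitrary threshold of 50, chosen purely for illustration, as the start of trouble:

```python
# Hypothetical helper: find where a host tips from quiet into paging/swapping.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    t: int   # seconds since sampling started
    w: int   # vmstat "w": swapped-out processes
    pi: int  # vmstat "pi": page-in activity
    po: int  # vmstat "po": page-out activity


def first_tipping_point(samples: list[Sample], po_threshold: int = 50) -> Optional[Sample]:
    """Return the first busy sample that follows a quiet one, or None.

    "Busy" means swapped-out processes (w > 0) or page-outs above
    po_threshold; the default of 50 is illustrative, not tuned.
    """
    seen_quiet = False
    for s in samples:
        busy = s.w > 0 or s.po > po_threshold
        if not busy:
            seen_quiet = True
        elif seen_quiet:
            return s
    return None


log = [Sample(0, 0, 1, 2), Sample(5, 0, 0, 0), Sample(10, 4, 90, 600)]
hit = first_tipping_point(log)
print(hit.t if hit else "no transition")  # 10
```

Requiring a quiet sample first means a log that starts mid-incident reports no transition, rather than misreporting the first row as the onset.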
I captured the following two ps outputs at the time of the high paging:
/bin/ps -aucxHl -O jid > ps-long.out
/bin/ps -aux -O jid > ps-short.out
Is there anything in there that I could look at as far as what is putting
things over the edge?
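One rough way to see what is pushing things over the edge is to total resident memory per jail from the ps-short.out capture. Below is a minimal Python sketch with a hypothetical helper name, rss_by_jail. It assumes the header row contains "JID" and "RSS" columns, with RSS in kilobytes as FreeBSD's ps reports it. It also assumes that only the final COMMAND column contains spaces:

```python
# Hypothetical helper: total RSS per jail from `ps -aux -O jid` output.
from collections import defaultdict


def rss_by_jail(ps_text: str) -> dict[str, int]:
    """Return {jid: total RSS in KB}, ordered from largest to smallest.

    Columns are found by header name, since the layout depends on the ps
    flags used. Assumes only the last column (COMMAND) contains spaces.
    """
    lines = ps_text.strip().splitlines()
    header = lines[0].split()
    jid_col = header.index("JID")
    rss_col = header.index("RSS")
    totals: dict[str, int] = defaultdict(int)
    for line in lines[1:]:
        # Cap the splits so a COMMAND with spaces stays in one field.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip short or malformed lines
        totals[fields[jid_col]] += int(fields[rss_col])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Calling rss_by_jail(open("ps-short.out").read()) would list jails from the largest resident-memory total down. That may point at the jail whose processes grow as the count climbs from ~775 toward 1400.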
====
As to the 'overloaded server', here is another server with more running
on it, but the exact same configuration:
neptune# vmstat 5 (3 days, 18 hours uptime now)
 procs    memory                page               disks       faults         cpu
 r b w     avm   fre   flt  re  pi  po    fr   sr da0 pa0   in     sy    cs us sy id
 0 0 0  12521M  303M  3969  15   5   3  2271 1603   0   0  444   6491  5165 37 19 44
 0 0 0  12464M  309M  3009   1   0  15  2833    0 104   0  296   9378  3689  7  5 88
23 0 0  12476M  297M  3845   3   0   0  2627    0  31   0  279  10545  2986 14  5 81
 0 1 0  12530M  266M  5259   0   1   0  2551    0 145   0  432  18070  4133 45  8 47
 1 0 0  12587M  237M  7049   0   1   0  4484    0 171   0  357  15953  4715 29  7 64
So, normally these servers purr ... and are highly responsive ...
In fact, here is an older 32-bit server with less RAM, running about 50% more
processes than neptune:
mercury# vmstat 5
procs   memory                 page                disks        faults        cpu
r  b w    avm   fre   flt  re  pi  po    fr    sr da0 pa0    in    sy    cs us sy id
3 14 1  6817M  114M   641   7   3   1  1036   386   0   0  1109   464   157  5  5 90
0  8 0  6817M  224M   596  33   0   5  5667  3850  86   0  1303  5768  3885  6  7 87
1 10 0  6824M  220M  4332  32   2   0  3228     0  17   0   755  9689  3057  8  7 85
0  9 0  6798M  219M   430   0   0   0   712     0  12   0  1274  4276  3877  2  2 95
0 11 0  6830M  205M  1026   4   1   3   481     0  84   0  1503  5586  4370  6  4 89
----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email . scrappy at hub.org MSN . scrappy at hub.org
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
More information about the freebsd-stable mailing list