How to obtain which interrupts cause system to hang?
Ian Smith
smithi at nimnet.asn.au
Mon Oct 11 04:27:44 UTC 2010
On Sun, 10 Oct 2010 19:27:05 +0300, kes-kes at yandex.ru wrote:
> Hi, Ian.
Hi Eugen,
> >> >> 23.1%Sys 50.8%Intr 1.3%User 0.0%Nice 24.8%Idle %ozfod 1999 cpu0: time
> >> >> | | | | | | | | | | | daefr
> >> >> ============+++++++++++++++++++++++++> 6 prcfr
> >>
> >> IS> Yes, system and esp. interrupt time is heavy .. 23k context switches!?
[..]
> >> IS> Disable p4tcc if it's a modern CPU; that usually hurts more than helps.
> >> IS> Disable polling if you're using that .. you haven't provided much info,
> >> IS> like is this with any network load, despite nfe0 showing no interrupts?
>
> >> Polling is ON. Traffice is about 60Mbit/s routed from nfe0 to vlan4 on rl0
> >> when interrupts are happen traffic slow down to 25-30Mbit/s.
>
> IS> Out of my depth. If it's a net problem - maybe not - you may do better
> IS> in freebsd-net@ if you provide enough information (dmesg plus ifconfig,
> IS> vmstat -i etc, normally and while this problem is happening).
[..]
> >> >> How to obtain what nasty happen, which process take 36-50% of CPU
> >> >> resource?
> >>
> >> IS> Try 'top -S'. It's almost certainly system process[es], not shown above.
>
> IS> Does that not show anything? Also, something like 'ps auxww | less'
> IS> should show you what's using all that CPU. I'm out of wild clues.
>
> vpn_shadow# top -S
> last pid: 57879; load averages: 0.12, 0.06, 0.05 up 1+18:37:39 19:19:14
Ok, this was taken when things weren't so busy as the earlier 36-50% ..
> 101 processes: 2 running, 83 sleeping, 16 waiting
> CPU: 0.0% user, 0.0% nice, 14.3% system, 17.3% interrupt, 68.4% idle
> Mem: 319M Active, 799M Inact, 354M Wired, 336K Cache, 213M Buf, 503M Free
> Swap: 4063M Total, 4063M Free
>
> PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
> 11 root 1 171 ki31 0K 16K RUN 24.9H 86.47% idle: cpu0
> 14 root 1 -44 - 0K 16K WAIT 689:52 10.25% swi1: net
> 2 root 1 -68 - 0K 16K sleep 207:35 4.69% ng_queue0
> 40 root 1 -68 - 0K 16K - 101:37 1.46% dummynet
.. but still, if you add up the TIMEs above here it comes to about 41.5
hours, all but about an hour of your total uptime. Leaving idle aside,
most of that is consumed by the other three, so swi1 and ng_queue look
like what's using most CPU long-term.
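For anyone wanting to check that sum, here is a rough Python sketch. The
TIME values and the uptime are copied from the top output quoted here; the
helper name to_hours is made up for illustration, and it assumes top's TIME
column reads minutes:seconds, or hours when suffixed with H.

```python
def to_hours(field):
    """Convert a top(1) TIME field like '689:52' or '24.9H' to hours."""
    if field.endswith("H"):
        # Large values are shown by top as decimal hours, e.g. '24.9H'.
        return float(field[:-1])
    # Otherwise assume minutes:seconds.
    minutes, seconds = field.split(":")
    return (int(minutes) + int(seconds) / 60.0) / 60.0


# TIME column for the four busiest entries in the quoted 'top -S' output.
times = {
    "idle: cpu0": "24.9H",
    "swi1: net": "689:52",
    "ng_queue0": "207:35",
    "dummynet": "101:37",
}

hours = {name: to_hours(value) for name, value in times.items()}
total = sum(hours.values())
busy = total - hours["idle: cpu0"]

# Uptime from the top header: 1+18:37:39 (days+hh:mm:ss).
uptime = 24 + 18 + 37 / 60.0 + 39 / 3600.0

for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {h:6.2f} h")
print(f"total        {total:6.2f} h of {uptime:.2f} h uptime")
print(f"non-idle     {busy:6.2f} h ({100 * busy / uptime:.0f}% of uptime)")
```

The non-idle line is the interesting one: it shows how much of the uptime
went to swi1, ng_queue and dummynet together, as opposed to the
instantaneous WCPU figures.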
> 47 root 1 20 - 0K 16K syncer 5:29 0.29% syncer
> 12 root 1 -32 - 0K 16K WAIT 14:48 0.00% swi4: clock sio
> 15 root 1 -16 - 0K 16K - 5:39 0.00% yarrow
> 986 root 1 44 0 5692K 1408K select 1:29 0.00% syslogd
> 1054 bind 4 4 0 138M 113M kqread 1:22 0.00% named
> 1162 clamav 1 4 0 4616K 1468K accept 0:59 0.00% smtp-gated
Smells net-related to me, maybe polling, but like I said, I'm out of my
depth. You should have enough info to take to freebsd-net@ anyway.
cheers, Ian
PS: I still think you should take the time to close PR kern/129103 :)