netisr observations
hiren panchasara
hiren.panchasara at gmail.com
Fri Apr 11 02:17:56 UTC 2014
(Note: This may seem more like a rant than an actual problem report.)
I am on a stable-10ish box with igb0. The workload is mainly inbound NFS
traffic, with about 2K connections at any given time.
device igb # Intel PRO/1000 PCIE Server Gigabit Family
hw.igb.rxd: 4096
hw.igb.txd: 4096
hw.igb.enable_aim: 1
hw.igb.enable_msix: 1
hw.igb.max_interrupt_rate: 32768
hw.igb.buf_ring_size: 4096
hw.igb.header_split: 0
hw.igb.num_queues: 0
hw.igb.rx_process_limit: 100
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x103c subdevice=0x323f class=0x020000
-bash-4.2$ netstat -I igb0 -i 1
            input          igb0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     18332     0      0   19096474      22946     0   18211000     0
     19074     0      0   11408912      28280     0   29741195     0
     15753     0      0   15499238      21234     0   16779695     0
     12914     0      0    9583719      17945     0   14599603     0
     13677     0      0   10818359      19050     0   15069889     0
-bash-4.2$ sysctl net.isr
net.isr.dispatch: direct
net.isr.maxthreads: 8
net.isr.bindthreads: 0
net.isr.maxqlimit: 10240
net.isr.defaultqlimit: 256
net.isr.maxprot: 16
net.isr.numthreads: 8
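As I understand it, net.isr.maxthreads and net.isr.bindthreads are boot-time
tunables set from /boot/loader.conf, while net.isr.dispatch can be flipped at
runtime with sysctl. If I wanted to try bound netisr threads, I believe it
would look like this (example values only, not what this box currently runs):

```
# /boot/loader.conf -- hypothetical experiment, not my current settings
net.isr.maxthreads="8"     # cap the number of netisr worker threads
net.isr.bindthreads="1"    # bind each worker thread to its own CPU
```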
-bash-4.2$ sysctl -a | grep igb.0 | grep rx_bytes
dev.igb.0.queue0.rx_bytes: 65473003127
dev.igb.0.queue1.rx_bytes: 73982776038
dev.igb.0.queue2.rx_bytes: 57669494795
dev.igb.0.queue3.rx_bytes: 57830053867
dev.igb.0.queue4.rx_bytes: 75087429774
dev.igb.0.queue5.rx_bytes: 69252615374
dev.igb.0.queue6.rx_bytes: 70565370833
dev.igb.0.queue7.rx_bytes: 90210083223
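Just to double-check the spread, here is a quick awk pass over those counters
(the numbers are pasted inline so the arithmetic can be checked anywhere):

```shell
# Per-queue share of total rx_bytes, computed from the sysctl output above
awk -F': ' '{ q[NR-1] = $2; total += $2 }
  END { for (i = 0; i < NR; i++) printf "queue%d %.1f%%\n", i, 100*q[i]/total }' <<'EOF'
dev.igb.0.queue0.rx_bytes: 65473003127
dev.igb.0.queue1.rx_bytes: 73982776038
dev.igb.0.queue2.rx_bytes: 57669494795
dev.igb.0.queue3.rx_bytes: 57830053867
dev.igb.0.queue4.rx_bytes: 75087429774
dev.igb.0.queue5.rx_bytes: 69252615374
dev.igb.0.queue6.rx_bytes: 70565370833
dev.igb.0.queue7.rx_bytes: 90210083223
EOF
```

Each queue lands between roughly 10% and 16% of the total, so RSS hashing
looks reasonably balanced on the receive side.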
I am seeing something interesting in "top":
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root       68 -72    -     0K  1088K WAIT    0 279:36 65.77% intr
I see "intr" in on of top 3 slots almost all the time.
turning on -H (thread view) shows me:
   12 root        -72    -     0K  1088K WAIT    2  69:04 20.36% intr{swi1: netisr 3}
(Does this mean netisr worker 3 is a swi (software interrupt) thread, and
the "2" is the CPU it is currently on?)
Also, I see this thread jumping across all the CPUs (so it is not
sticking to one CPU).
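If it helps, I can also paste per-thread CPU placement; my understanding is
that procstat can show which CPU each intr thread last ran on (command
sketch, I have not pasted its output here):

```
# List the kernel intr process's threads with TID, CPU, state, and name;
# the netisr workers should show up as "swi1: netisr N"
procstat -t 12
```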
-bash-4.2$ vmstat -i
interrupt total rate
irq4: uart0 1538 0
cpu0:timer 23865486 1108
irq256: igb0:que 0 46111948 2140
irq257: igb0:que 1 49820986 2313
irq258: igb0:que 2 41914519 1945
irq259: igb0:que 3 40926921 1900
irq260: igb0:que 4 49549124 2300
irq261: igb0:que 5 47066777 2185
irq262: igb0:que 6 50945395 2365
irq263: igb0:que 7 47147662 2188
irq264: igb0:link 2 0
irq274: ahci0:ch0 196869 9
cpu1:timer 23866170 1108
cpu10:timer 23805794 1105
cpu4:timer 23870757 1108
cpu11:timer 23806733 1105
cpu13:timer 23806644 1105
cpu2:timer 23858811 1107
cpu3:timer 23862250 1107
cpu15:timer 23805634 1105
cpu7:timer 23863865 1107
cpu9:timer 23810503 1105
cpu5:timer 23864136 1107
cpu12:timer 23808397 1105
cpu8:timer 23806059 1105
cpu6:timer 23874612 1108
cpu14:timer 23807698 1105
Total 755065290 35055
So, it seems all the queues are being used uniformly.
-bash-4.2$ netstat -Q
Configuration:
Setting Current Limit
Thread count 8 8
Default queue limit 256 10240
Dispatch policy direct n/a
Threads bound to CPUs disabled n/a
Protocols:
Name Proto QLimit Policy Dispatch Flags
ip 1 1024 flow default ---
igmp 2 256 source default ---
rtsock 3 256 source default ---
arp 7 256 source default ---
ether 9 256 source direct ---
ip6 10 256 flow default ---
But here is the *interesting* part:
-bash-4.2$ netstat -Q | grep "ip " (looking at just ip in the workstreams)
Workstreams:
WSID CPU   Name     Len WMark    Disp'd HDisp'd QDrops    Queued   Handled
   0   0   ip         0     0  73815267       0      0         0  73815267
   1   1   ip         0     0  68975084       0      0         0  68975084
   2   2   ip         0     0  48943960       0      0         0  48943960
   3   3   ip         0    67  59306618       0      0 203888563 263168729
   4   4   ip         0     0  77025108       0      0         0  77025108
   5   5   ip         0     0  58537310       0      0         0  58537310
   6   6   ip         0     0  69535857       0      0         0  69535857
So it looks like only cpu3's workstream is doing any queuing (it is the only
one with nonzero WMark and Queued counters).
But it doesn't look like that CPU is getting hammered or anything:
last pid: 75181; load averages: 27.81, 27.08, 26.93
up 0+06:12:37 19:04:33
508 processes: 23 running, 476 sleeping, 1 waiting, 8 lock
CPU 0: 71.8% user, 0.0% nice, 13.7% system, 14.5% interrupt, 0.0% idle
CPU 1: 80.9% user, 0.0% nice, 14.5% system, 4.6% interrupt, 0.0% idle
CPU 2: 77.1% user, 0.0% nice, 17.6% system, 5.3% interrupt, 0.0% idle
CPU 3: 88.5% user, 0.0% nice, 9.2% system, 2.3% interrupt, 0.0% idle
CPU 4: 80.2% user, 0.0% nice, 14.5% system, 5.3% interrupt, 0.0% idle
CPU 5: 79.4% user, 0.0% nice, 16.8% system, 3.1% interrupt, 0.8% idle
CPU 6: 83.2% user, 0.0% nice, 11.5% system, 4.6% interrupt, 0.8% idle
CPU 7: 68.7% user, 0.0% nice, 18.3% system, 13.0% interrupt, 0.0% idle
CPU 8: 88.5% user, 0.0% nice, 11.5% system, 0.0% interrupt, 0.0% idle
CPU 9: 87.8% user, 0.0% nice, 10.7% system, 0.0% interrupt, 1.5% idle
CPU 10: 87.0% user, 0.0% nice, 10.7% system, 2.3% interrupt, 0.0% idle
CPU 11: 80.9% user, 0.0% nice, 16.8% system, 2.3% interrupt, 0.0% idle
CPU 12: 86.3% user, 0.0% nice, 11.5% system, 2.3% interrupt, 0.0% idle
CPU 13: 84.7% user, 0.0% nice, 14.5% system, 0.8% interrupt, 0.0% idle
CPU 14: 87.0% user, 0.0% nice, 12.2% system, 0.8% interrupt, 0.0% idle
CPU 15: 87.8% user, 0.0% nice, 9.9% system, 2.3% interrupt, 0.0% idle
Mem: 17G Active, 47G Inact, 3712M Wired, 674M Cache, 1655M Buf, 1300M Free
Swap: 8192M Total, 638M Used, 7554M Free, 7% Inuse, 4K In
My conclusion, after looking at this a bunch of times, is that all CPUs are
doing roughly equal work (if we believe the top -P stats).
Finally, the questions: why is cpu3 doing all the queuing, and what
does that actually mean?
Can I improve performance or reduce CPU load some other way? Should I
change anything in my netisr settings?
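For what it's worth, the experiment I'm tempted to try (one change at a
time, and I'd appreciate a sanity check first) is switching the dispatch
policy at runtime and, if that helps, binding the workers at boot:

```
# Runtime: queue packets to netisr workers instead of doing all protocol
# processing in the driver's interrupt context ('deferred'), or mix ('hybrid')
sysctl net.isr.dispatch=hybrid

# Boot time (/boot/loader.conf): pin the netisr workers to CPUs
# net.isr.bindthreads="1"
```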
cheers,
Hiren