Sudden mbuf demand increase and shortage under the load (igb issue?)

Maxim Sobolev sobomax at FreeBSD.org
Tue Feb 16 18:11:06 UTC 2010


OK, here is some new data that I think rules out any issues with the 
applications. Following Alfred's suggestion I have made a script to run 
every second and output some system statistics:

date
netstat -m
vmstat -i
ps -axl
pstat -T
vmstat -z
sysctl -a
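
For anyone who wants to collect the same data, the sampling loop could be sketched roughly as follows. This is not the script I actually ran, only a minimal Python sketch assuming the command list above; the function names, the record separators and the log path are made up for illustration. Note that the real period is the interval plus the time the commands take to run, and that the logged delta is what exposes a stalled sampler.

```python
import subprocess
import time

# Commands taken from the list above; everything else here is an assumption.
COMMANDS = [
    ["date"],
    ["netstat", "-m"],
    ["vmstat", "-i"],
    ["ps", "-axl"],
    ["pstat", "-T"],
    ["vmstat", "-z"],
    ["sysctl", "-a"],
]


def run_command(argv):
    """Run one command and return its stdout and stderr as one string."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        return result.stdout
    except OSError as exc:
        # A missing binary should not stop the whole sampling loop.
        return "failed to run: %s\n" % exc


def take_sample(commands, runner=run_command):
    """Collect the output of every command into one text record."""
    parts = []
    for argv in commands:
        parts.append("### " + " ".join(argv) + "\n")
        parts.append(runner(argv))
    return "".join(parts)


def sample_forever(log_path, interval=1.0, commands=COMMANDS):
    """Append one sample per loop and record how late each sample is."""
    last = time.monotonic()
    with open(log_path, "a") as log:
        while True:
            now = time.monotonic()
            # Monotonic time ignores clock adjustments, so a large delta
            # means the script itself was not running during that time.
            log.write("=== sample, %.1f s since previous ===\n" % (now - last))
            last = now
            log.write(take_sample(commands))
            log.flush()
            time.sleep(interval)

# Example (hypothetical path): sample_forever("/var/tmp/sysstat.log")
```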

The problem hit us again several times today, and upon investigating
the log I found that the increase in mbuf usage happened in one step,
going from the normal 10% to 100% between two script runs. What is more
interesting is that these two consecutive runs were about 2 minutes
apart (instead of 1 second, as they should be), and when inspecting the
cron logs I noticed the same time gap there. I ruled out VM starvation
as a cause of the delay because the system has plenty of free memory.
The incoming network traffic was not sufficient to starve VM so quickly
either: it was about 7MB/sec at that time, so even if all receivers had
stopped draining their buffers, it should have taken at least 1-2
seconds to fill up the mbuf cache and create demand for additional
kernel memory. The failure would likely have been more gradual, and I
should have seen it build up in the debug log.
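
The gap between runs can be found mechanically from the date(1) lines in the log. Below is a minimal Python sketch, assuming each sample starts with default date(1) output in a single time zone and an English locale; the function names, the tolerance factor and the timestamps in the example are hypothetical. The last helper gives a crude upper bound on the traffic that arrives during a stall, assuming a constant rate, nothing draining, and 7MB/sec taken as 7 * 10^6 bytes per second.

```python
from datetime import datetime


def parse_date_line(line):
    """Parse default date(1) output, e.g. 'Tue Feb 16 18:11:06 UTC 2010'.

    The weekday and zone fields are dropped, so all samples are assumed
    to share one time zone.
    """
    weekday, month, day, clock, zone, year = line.split()
    return datetime.strptime(
        " ".join([month, day, clock, year]), "%b %d %H:%M:%S %Y"
    )


def find_gaps(timestamps, expected=1.0, tolerance=5.0):
    """Return (earlier, later, seconds) for samples spaced too far apart."""
    gaps = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        delta = (later - earlier).total_seconds()
        if delta > expected * tolerance:
            gaps.append((earlier, later, delta))
    return gaps


def backlog_bytes(gap_seconds, rate_bytes_per_sec):
    """Upper bound on bytes received while nothing was being drained."""
    return gap_seconds * rate_bytes_per_sec


# Hypothetical timestamps reproducing a 2-minute hole between samples.
lines = [
    "Tue Feb 16 10:00:00 UTC 2010",
    "Tue Feb 16 10:00:01 UTC 2010",
    "Tue Feb 16 10:02:01 UTC 2010",
    "Tue Feb 16 10:02:02 UTC 2010",
]
stamps = [parse_date_line(line) for line in lines]
for earlier, later, seconds in find_gaps(stamps):
    print(earlier, later, seconds, backlog_bytes(seconds, 7e6))
```

With these numbers a 120-second stall at 7MB/sec corresponds to at most about 840MB of undrained input, far more than the 1-2 seconds of traffic estimated above.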

So it looks like a kernel issue of some sort, which causes all userland
activity to cease for 2 minutes when the system reaches a certain load.
The mbuf build-up is only a by-product of this, not really a cause.
igb(4) is the primary suspect now, since we have other machines under
more load that do not have this problem and we don't have anybody else
using this driver. The chip is the following:

igb0@pci0:5:0:0:        class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet
igb1@pci0:5:0:1:        class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00
     vendor     = 'Intel Corporation'
     class      = network
     subclass   = ethernet
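
For reference, the chip= and card= values in this pciconf output each pack two 16-bit IDs: the low half is the vendor (or subvendor) and the high half is the device (or subdevice). A small Python sketch that splits them is below; the function names are made up for illustration, and only the key=0x... fields are parsed.

```python
import re


def split_pci_id(value):
    """Split a packed 32-bit ID into (low 16 bits, high 16 bits)."""
    raw = int(value, 16)
    return raw & 0xFFFF, (raw >> 16) & 0xFFFF


def parse_pciconf_line(line):
    """Extract vendor/device and subvendor/subdevice IDs from one line."""
    fields = dict(re.findall(r"(\w+)=(0x[0-9a-fA-F]+)", line))
    vendor, device = split_pci_id(fields["chip"])
    subvendor, subdevice = split_pci_id(fields["card"])
    return {
        "vendor": vendor,
        "device": device,
        "subvendor": subvendor,
        "subdevice": subdevice,
    }


line = "class=0x020000 card=0x323f103c chip=0x10c98086 rev=0x01 hdr=0x00"
ids = parse_pciconf_line(line)
print({name: hex(value) for name, value in ids.items()})
```

Here the vendor is 0x8086 (Intel, matching the vendor line above) with device 0x10c9, and the subvendor is 0x103c (Hewlett-Packard) with subdevice 0x323f.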

The hardware in question is a new HP DL160G6. I have also checked the
IPMI logs and sensors and have not found any issues there either. No
sensor reported out-of-range values, and the chassis temperature is
within normal limits.

I am not sure how to debug this problem further. We are now looking
into installing an external non-igb card in the server to see if that
solves the issue.

I have the whole log if anyone wants to take a closer peek.

Regards,
-- 
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sales at sippysoft.com
Skype: SippySoft

