still mbuf leak in 9.0 / 9.1?

dennis berger db at nipsi.de
Wed May 15 12:40:20 UTC 2013


Hi list,
since we enabled 10GbE on ixgbe cards with jumbo frames (9k), first on 9.0 and now on 9.1, we have noticed that after a random period of time, sometimes a week, sometimes only a day, the
system stops sending any packets. The symptom is that you can't log in via ssh, and NFS and istgt stop working. You can still log in on the console and execute commands.
A clean shutdown isn't possible, though: it hangs after vnode cleaning, at the point where you would normally see USB (or perhaps other) devices being detached.
I've read the other post on this ML about an mbuf leak in the ARP handling code at if_ether.c line 558. We don't see any of those notices in dmesg, so I don't think glebius' fix applies to us.
I'm collecting system and memory information every hour.


The script looks like this:
less /etc/periodic/hourly/100.report-memory.sh 
#!/bin/sh
# Hourly snapshot of process, memory, mbuf and network statistics.

reporttimestamp=$(date +%d-%m-%Y-%H-%M)
reportname="${reporttimestamp}.txt"

# Stop here if the report directory is missing, so nothing is written elsewhere.
cd /root/memory-mon || exit 1

top -b > "$reportname"
echo "" >> "$reportname"
vmstat -m >> "$reportname"
echo "" >> "$reportname"
vmstat -z >> "$reportname"
echo "" >> "$reportname"
netstat -Q >> "$reportname"
echo "" >> "$reportname"
netstat -n -x >> "$reportname"
echo "" >> "$reportname"
netstat -m >> "$reportname"
/usr/bin/perl /usr/local/bin/zfs-stats -a >> "$reportname"
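
A note on the report names: with a day-first timestamp, the order in which `*` expands stops being chronological once the reports span more than one month. A minimal sketch of a year-first name, assuming the reports may be renamed (the function name report_name is only for illustration and is not part of the script above):

```shell
#!/bin/sh
# Hypothetical helper: build a report file name whose lexical order
# matches chronological order (year, month, day, hour, minute).
report_name() {
    echo "$(date +%Y-%m-%d-%H-%M).txt"
}
```

Within a single month, as in the listings in this mail, the existing names already sort correctly.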

When you grep the reports for mbuf_packet or for mbuf usage, you see this, for example:

root@freenas:/root/memory-mon # grep mbuf_packet: *
14-05-2013-14-09.txt:mbuf_packet:            256,      0,    9246,    2786,201700429,   0,   0
14-05-2013-15-09.txt:mbuf_packet:            256,      0,    9256,    2776,201773122,   0,   0
14-05-2013-16-09.txt:mbuf_packet:            256,      0,    9266,    2766,201871553,   0,   0
14-05-2013-17-09.txt:mbuf_packet:            256,      0,    9276,    2756,201915405,   0,   0
14-05-2013-18-09.txt:mbuf_packet:            256,      0,    9286,    2746,201927956,   0,   0
14-05-2013-19-09.txt:mbuf_packet:            256,      0,    9296,    2352,201935681,   0,   0
14-05-2013-20-09.txt:mbuf_packet:            256,      0,    9306,    2342,201943754,   0,   0
14-05-2013-21-09.txt:mbuf_packet:            256,      0,    9316,    2332,201950961,   0,   0
14-05-2013-22-09.txt:mbuf_packet:            256,      0,    9326,    2450,201958150,   0,   0
14-05-2013-23-09.txt:mbuf_packet:            256,      0,    9336,    2440,201967178,   0,   0
15-05-2013-00-09.txt:mbuf_packet:            256,      0,    9346,    2430,201974561,   0,   0
15-05-2013-01-09.txt:mbuf_packet:            256,      0,    9356,    2420,201982105,   0,   0
15-05-2013-02-09.txt:mbuf_packet:            256,      0,    9366,    2410,201989463,   0,   0
15-05-2013-03-09.txt:mbuf_packet:            256,      0,    9378,    1502,203019168,   0,   0
15-05-2013-04-09.txt:mbuf_packet:            256,      0,    9384,    1624,205953601,   0,   0
15-05-2013-05-09.txt:mbuf_packet:            256,      0,    9394,    1870,205959258,   0,   0
15-05-2013-06-09.txt:mbuf_packet:            256,      0,    9404,    2500,205969396,   0,   0
15-05-2013-07-09.txt:mbuf_packet:            256,      0,    9414,    3386,207945161,   0,   0
15-05-2013-08-09.txt:mbuf_packet:            256,      0,    9424,    3376,208094689,   0,   0
15-05-2013-09-09.txt:mbuf_packet:            256,      0,    9434,    2982,208172465,   0,   0
15-05-2013-10-09.txt:mbuf_packet:            256,      0,    9444,    3100,208270369,   0,   0
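
The columns after the zone name are the vmstat -z fields SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP, so the third number is the one that keeps climbing. A small sketch for printing the change in USED from one report to the next, assuming the input is grep output in the "file:mbuf_packet: ..." form shown above and already in chronological order (the function name used_deltas is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: read "file:mbuf_packet: SIZE, LIMIT, USED, ..." lines
# on stdin and print "file USED delta" for every line after the first.
used_deltas() {
    awk -F'[:,]' '
        $2 == "mbuf_packet" {
            used = $5 + 0            # field 5 is USED; "+ 0" drops the padding
            if (seen) printf "%s %d %d\n", $1, used, used - prev
            prev = used
            seen = 1
        }'
}
```

It would be used as: grep mbuf_packet: *.txt | used_deltas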

and

root@freenas:/root/memory-mon # grep "mbufs in use" *
14-05-2013-14-09.txt:58444/5816/64260 mbufs in use (current/cache/total)
14-05-2013-15-09.txt:58455/5805/64260 mbufs in use (current/cache/total)
14-05-2013-16-09.txt:58464/5796/64260 mbufs in use (current/cache/total)
14-05-2013-17-09.txt:58475/5785/64260 mbufs in use (current/cache/total)
14-05-2013-18-09.txt:58484/5776/64260 mbufs in use (current/cache/total)
14-05-2013-19-09.txt:58493/5767/64260 mbufs in use (current/cache/total)
14-05-2013-20-09.txt:58503/5757/64260 mbufs in use (current/cache/total)
14-05-2013-21-09.txt:58513/5747/64260 mbufs in use (current/cache/total)
14-05-2013-22-09.txt:58523/5737/64260 mbufs in use (current/cache/total)
14-05-2013-23-09.txt:58533/5727/64260 mbufs in use (current/cache/total)
15-05-2013-00-09.txt:58543/5717/64260 mbufs in use (current/cache/total)
15-05-2013-01-09.txt:58554/5706/64260 mbufs in use (current/cache/total)
15-05-2013-02-09.txt:58563/5697/64260 mbufs in use (current/cache/total)
15-05-2013-03-09.txt:58639/5621/64260 mbufs in use (current/cache/total)
15-05-2013-04-09.txt:58581/5679/64260 mbufs in use (current/cache/total)
15-05-2013-05-09.txt:58591/5669/64260 mbufs in use (current/cache/total)
15-05-2013-06-09.txt:58602/5658/64260 mbufs in use (current/cache/total)
15-05-2013-07-09.txt:58613/5647/64260 mbufs in use (current/cache/total)
15-05-2013-08-09.txt:58623/6027/64650 mbufs in use (current/cache/total)
15-05-2013-09-09.txt:58634/6016/64650 mbufs in use (current/cache/total)
15-05-2013-10-09.txt:58645/6005/64650 mbufs in use (current/cache/total)
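
To put a single number on this, the average growth of the "current" value per sample can be computed from the same grep output. This is only a sketch: it assumes one report per hour, lines in chronological order, and file names without a "/" in them (the function name avg_growth is invented here):

```shell
#!/bin/sh
# Hypothetical helper: read "file:CUR/CACHE/TOTAL mbufs in use ..." lines
# on stdin and print the average change of CUR between consecutive samples.
avg_growth() {
    awk -F'[:/ ]' '
        /mbufs in use/ {
            cur = $2 + 0             # field 2 is the "current" count
            if (n == 0) first = cur
            last = cur
            n++
        }
        END {
            # Needs at least two samples to have a difference to average.
            if (n > 1) printf "%.1f\n", (last - first) / (n - 1)
        }'
}
```

It would be used as: grep "mbufs in use" *.txt | avg_growth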


The steadily increasing number of used mbuf_packets and mbufs in use (roughly 10 per hour in both listings) makes me nervous.
See the complete reports at http://knownhosts.org:/reports-14-15.tgz

Thanks for help,

-dennis



--------------BEGIN System information---------------
This is a stock FreeBSD 9.1 system; only the hostname is freenas, so don't be confused.


igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:25:90:34:c1:12
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:25:90:34:c1:13
	inet 172.16.1.6 netmask 0xfffff000 broadcast 172.16.15.255
	inet6 fe80::225:90ff:fe34:c113%igb1 prefixlen 64 scopeid 0x2 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:1b:21:cc:12:8b
	inet 10.254.254.242 netmask 0xfffffffc broadcast 10.254.254.243
	inet6 fe80::21b:21ff:fecc:128b%ix0 prefixlen 64 scopeid 0xb 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
	status: active
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:1b:21:cc:12:8a
	inet 10.254.254.254 netmask 0xfffffffc broadcast 10.254.254.255
	inet6 fe80::21b:21ff:fecc:128a%ix1 prefixlen 64 scopeid 0xc 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
	status: active
ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:1b:21:cc:12:b3
	inet 10.254.254.246 netmask 0xfffffffc broadcast 10.254.254.247
	inet6 fe80::21b:21ff:fecc:12b3%ix2 prefixlen 64 scopeid 0xd 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
ix3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
	ether 00:1b:21:cc:12:b2
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128 
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0xf 
	inet 127.0.0.1 netmask 0xff000000 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

#dmesg
…..
mfi0: 21294 (421879975s/0x0008/info) - Battery started charging
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP
ix1: link state changed to DOWN
ix1: link state changed to UP


I should add that the servers directly connected to this FreeBSD server reboot every night, which is why dmesg shows
the repeated link state DOWN/UP messages (ix1 in the excerpt above).
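
To check that the number of link drops really matches one reboot per night, the DOWN transitions can be counted per interface. A rough sketch, assuming dmesg lines of the form "ixN: link state changed to DOWN" as in the excerpt (the function name count_flaps is made up):

```shell
#!/bin/sh
# Hypothetical helper: count "link state changed to DOWN" lines per
# interface on stdin and print "interface count" for each interface seen.
count_flaps() {
    awk '
        /link state changed to DOWN/ {
            sub(/:$/, "", $1)        # "ix1:" -> "ix1"
            n[$1]++
        }
        END { for (i in n) print i, n[i] }'
}
```

It would be used as: dmesg | count_flaps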

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.boot
Type: application/octet-stream
Size: 47197 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130515/97c7011a/attachment.obj>
-------------- next part --------------




------------- END System information------------



