still mbuf leak in 9.0 / 9.1?

dennis berger db at nipsi.de
Thu May 16 07:34:52 UTC 2013


Hi Jack,

so you would say that the steadily increasing number of "mbufs in use" and "mbuf clusters in use" is normal?
The jumbo frames are 9k in size. I know that their clusters come from a different pool, and I checked that pool as well.
The nmb limits are:

#cat loader.conf

#tuning network
hw.intr_storm_threshold=9000
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbo16=32768
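
A minimal sketch for comparing these limits with actual usage. It assumes that `netstat -m` prints each cluster pool as current/cache/total/max counters followed by a legend, as in the "mbuf clusters in use" lines below; that the jumbo pool lines have the same shape is an assumption. `pool_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# Report how full each cluster pool is, as total/max in percent.
# Reads `netstat -m` output on stdin and considers only the lines that carry
# a max value, i.e. those ending in "(current/cache/total/max)".
pool_usage() {
    awk '
        /\(current\/cache\/total\/max\)$/ {
            split($1, n, "/")                 # n[3] = total, n[4] = max
            label = $0
            sub(/^[^ ]+ /, "", label)         # drop the leading counters
            sub(/ in use .*$/, "", label)     # drop the trailing legend
            if (n[4] > 0)
                printf "%-32s %6.2f%% of %d\n", label, 100 * n[3] / n[4], n[4]
        }
    '
}
```

It can be fed live output (`netstat -m | pool_usage`) or one of the hourly report files (`pool_usage < 15-05-2013-21-09.txt`).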


14-05-2013-14-09.txt:9246/4918/14164/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-15-09.txt:9256/4856/14112/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-16-09.txt:9266/4846/14112/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-17-09.txt:9276/4836/14112/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-18-09.txt:9286/4826/14112/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-19-09.txt:9296/4734/14030/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-20-09.txt:9306/4724/14030/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-21-09.txt:9316/4714/14030/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-22-09.txt:9326/4704/14030/262144 mbuf clusters in use (current/cache/total/max)
14-05-2013-23-09.txt:9336/4694/14030/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-00-09.txt:9346/4684/14030/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-01-09.txt:9356/4674/14030/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-02-09.txt:9366/4664/14030/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-03-09.txt:9379/4279/13658/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-04-09.txt:9384/4086/13470/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-05-09.txt:9394/4076/13470/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-06-09.txt:9404/4066/13470/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-07-09.txt:9414/5040/14454/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-08-09.txt:9424/5030/14454/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-09-09.txt:9434/4898/14332/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-10-09.txt:9444/4850/14294/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-11-09.txt:9454/5000/14454/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-12-09.txt:9464/4874/14338/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-13-09.txt:9474/4856/14330/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-14-09.txt:17674/4460/22134/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-15-09.txt:17684/4450/22134/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-16-09.txt:17694/4696/22390/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-17-09.txt:17704/4686/22390/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-18-09.txt:17714/4658/22372/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-19-09.txt:17724/4648/22372/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-20-09.txt:17734/4638/22372/262144 mbuf clusters in use (current/cache/total/max)
15-05-2013-21-09.txt:17744/4628/22372/262144 mbuf clusters in use (current/cache/total/max)
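
A minimal sketch for reading the hourly growth off lines like the ones above. It assumes the input is the output of grep over several report files, so each line starts with the file name and a colon. `cluster_deltas` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the change in the "current" mbuf cluster count between successive
# samples. Input lines look like:
#   <file>:<current>/<cache>/<total>/<max> mbuf clusters in use (...)
cluster_deltas() {
    awk -F'[:/ ]' '
        NR > 1 { printf "%s %+d\n", $1, $2 - prev }   # $2 = current count
        { prev = $2 }
    '
}
```

Usage: `grep "mbuf clusters in use" *.txt | cluster_deltas`. For the data above, most hours show +10, and the 14:09 sample on 15-05 shows +8200. Note that the shell expands the glob in lexical order, so the dd-mm-yyyy file names only sort chronologically within one month.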

Please see http://knownhosts.org/reports-14-15.tgz, which I linked in my original post. It contains the full information, including the data for the 9k jumbo frames.

The driver is version 2.4.8, which should be the one shipped with 9.1-RELEASE.
Yes, Twinax is correct.

I'll replace the driver with the latest one.

best regards and thanks,
dennis


On 15.05.2013 at 19:00, Jack Vogel wrote:

> So, you stop getting 10G transmission and so you are looking at mbuf leaks? I don't see
> anything in your data that makes it look like you've run out of available mbufs.  You said
> you're running jumbos, what size? You do realize that if you do this the clusters are coming
> from different pools and you are not displaying those. What are all your nmb limits set to?
> 
> So, this is 9.1 RELEASE, or stable? If you are using the driver from release I would first off
> suggest you test the code from HEAD.
> 
> What is the 10G device? I see it's using Twinax, and I have been told there is a problem at
> times with those that is corrected in recent shared code, this is why you should try the
> latest code.
> 
> Cheers,
> 
> Jack
> 
> 
> 
> On Wed, May 15, 2013 at 2:00 AM, dennis berger <db at nipsi.de> wrote:
> Hi list,
> since we activated 10GbE on ixgbe cards with jumbo frames (9k) on 9.0, and now on 9.1, we have noticed that after a random period of time, sometimes a week, sometimes only a day, the
> system stops sending any packets. The symptom is that you can't log in via ssh, and NFS and istgt are not operational. You can still log in on the console and execute commands.
> A clean shutdown isn't possible, though. It hangs after vnode cleaning; normally you would see USB devices, or maybe other devices, detaching at this point.
> I've read the other post on this ML about the mbuf leak in the ARP handling code at if_ether.c line 558. We don't see any of those notices in dmesg, so I don't think glebius's fix would apply to us.
> I'm collecting system and memory information every hour.
> 
> 
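> The script looks like this:
> less /etc/periodic/hourly/100.report-memory.sh
> #!/bin/sh
> # Hourly snapshot of memory and network statistics, one file per run.
> 
> reporttimestamp=$(date +%d-%m-%Y-%H-%M)
> reportname="${reporttimestamp}.txt"
> 
> # Abort rather than write reports into the wrong directory.
> cd /root/memory-mon || exit 1
> 
> top -b > "$reportname"               # process and memory overview
> echo "" >> "$reportname"
> vmstat -m >> "$reportname"           # kernel malloc usage by type
> echo "" >> "$reportname"
> vmstat -z >> "$reportname"           # UMA zone usage (mbuf zones included)
> echo "" >> "$reportname"
> netstat -Q >> "$reportname"          # netisr queue statistics
> echo "" >> "$reportname"
> netstat -n -x >> "$reportname"       # sockets with buffer details
> echo "" >> "$reportname"
> netstat -m >> "$reportname"          # mbuf and cluster usage
> /usr/bin/perl /usr/local/bin/zfs-stats -a >> "$reportname"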
> 
> When you grep for mbuf or mbuf usage you will see this for example:
> 
> root at freenas:/root/memory-mon # grep mbuf_packet: *
> 14-05-2013-14-09.txt:mbuf_packet:            256,      0,    9246,    2786,201700429,   0,   0
> 14-05-2013-15-09.txt:mbuf_packet:            256,      0,    9256,    2776,201773122,   0,   0
> 14-05-2013-16-09.txt:mbuf_packet:            256,      0,    9266,    2766,201871553,   0,   0
> 14-05-2013-17-09.txt:mbuf_packet:            256,      0,    9276,    2756,201915405,   0,   0
> 14-05-2013-18-09.txt:mbuf_packet:            256,      0,    9286,    2746,201927956,   0,   0
> 14-05-2013-19-09.txt:mbuf_packet:            256,      0,    9296,    2352,201935681,   0,   0
> 14-05-2013-20-09.txt:mbuf_packet:            256,      0,    9306,    2342,201943754,   0,   0
> 14-05-2013-21-09.txt:mbuf_packet:            256,      0,    9316,    2332,201950961,   0,   0
> 14-05-2013-22-09.txt:mbuf_packet:            256,      0,    9326,    2450,201958150,   0,   0
> 14-05-2013-23-09.txt:mbuf_packet:            256,      0,    9336,    2440,201967178,   0,   0
> 15-05-2013-00-09.txt:mbuf_packet:            256,      0,    9346,    2430,201974561,   0,   0
> 15-05-2013-01-09.txt:mbuf_packet:            256,      0,    9356,    2420,201982105,   0,   0
> 15-05-2013-02-09.txt:mbuf_packet:            256,      0,    9366,    2410,201989463,   0,   0
> 15-05-2013-03-09.txt:mbuf_packet:            256,      0,    9378,    1502,203019168,   0,   0
> 15-05-2013-04-09.txt:mbuf_packet:            256,      0,    9384,    1624,205953601,   0,   0
> 15-05-2013-05-09.txt:mbuf_packet:            256,      0,    9394,    1870,205959258,   0,   0
> 15-05-2013-06-09.txt:mbuf_packet:            256,      0,    9404,    2500,205969396,   0,   0
> 15-05-2013-07-09.txt:mbuf_packet:            256,      0,    9414,    3386,207945161,   0,   0
> 15-05-2013-08-09.txt:mbuf_packet:            256,      0,    9424,    3376,208094689,   0,   0
> 15-05-2013-09-09.txt:mbuf_packet:            256,      0,    9434,    2982,208172465,   0,   0
> 15-05-2013-10-09.txt:mbuf_packet:            256,      0,    9444,    3100,208270369,   0,   0
> 
> and
> 
> root at freenas:/root/memory-mon # grep "mbufs in use" *
> 14-05-2013-14-09.txt:58444/5816/64260 mbufs in use (current/cache/total)
> 14-05-2013-15-09.txt:58455/5805/64260 mbufs in use (current/cache/total)
> 14-05-2013-16-09.txt:58464/5796/64260 mbufs in use (current/cache/total)
> 14-05-2013-17-09.txt:58475/5785/64260 mbufs in use (current/cache/total)
> 14-05-2013-18-09.txt:58484/5776/64260 mbufs in use (current/cache/total)
> 14-05-2013-19-09.txt:58493/5767/64260 mbufs in use (current/cache/total)
> 14-05-2013-20-09.txt:58503/5757/64260 mbufs in use (current/cache/total)
> 14-05-2013-21-09.txt:58513/5747/64260 mbufs in use (current/cache/total)
> 14-05-2013-22-09.txt:58523/5737/64260 mbufs in use (current/cache/total)
> 14-05-2013-23-09.txt:58533/5727/64260 mbufs in use (current/cache/total)
> 15-05-2013-00-09.txt:58543/5717/64260 mbufs in use (current/cache/total)
> 15-05-2013-01-09.txt:58554/5706/64260 mbufs in use (current/cache/total)
> 15-05-2013-02-09.txt:58563/5697/64260 mbufs in use (current/cache/total)
> 15-05-2013-03-09.txt:58639/5621/64260 mbufs in use (current/cache/total)
> 15-05-2013-04-09.txt:58581/5679/64260 mbufs in use (current/cache/total)
> 15-05-2013-05-09.txt:58591/5669/64260 mbufs in use (current/cache/total)
> 15-05-2013-06-09.txt:58602/5658/64260 mbufs in use (current/cache/total)
> 15-05-2013-07-09.txt:58613/5647/64260 mbufs in use (current/cache/total)
> 15-05-2013-08-09.txt:58623/6027/64650 mbufs in use (current/cache/total)
> 15-05-2013-09-09.txt:58634/6016/64650 mbufs in use (current/cache/total)
> 15-05-2013-10-09.txt:58645/6005/64650 mbufs in use (current/cache/total)
> 
> 
> This increasing number of used mbuf_packets and mbufs in use makes me nervous.
> See the complete reports http://knownhosts.org:/reports-14-15.tgz
> 
> Thanks for help,
> 
> -dennis
> 
> 
> 
> --------------BEGIN System information---------------
> It's a stock FreeBSD 9.1, even though the hostname is freenas. Don't be confused.
> 
> 
> igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:25:90:34:c1:12
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
> igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:25:90:34:c1:13
>         inet 172.16.1.6 netmask 0xfffff000 broadcast 172.16.15.255
>         inet6 fe80::225:90ff:fe34:c113%igb1 prefixlen 64 scopeid 0x2
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (1000baseT <full-duplex>)
>         status: active
> ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:1b:21:cc:12:8b
>         inet 10.254.254.242 netmask 0xfffffffc broadcast 10.254.254.243
>         inet6 fe80::21b:21ff:fecc:128b%ix0 prefixlen 64 scopeid 0xb
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>         status: active
> ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:1b:21:cc:12:8a
>         inet 10.254.254.254 netmask 0xfffffffc broadcast 10.254.254.255
>         inet6 fe80::21b:21ff:fecc:128a%ix1 prefixlen 64 scopeid 0xc
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>         status: active
> ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:1b:21:cc:12:b3
>         inet 10.254.254.246 netmask 0xfffffffc broadcast 10.254.254.247
>         inet6 fe80::21b:21ff:fecc:12b3%ix2 prefixlen 64 scopeid 0xd
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect
>         status: no carrier
> ix3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>         ether 00:1b:21:cc:12:b2
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>         media: Ethernet autoselect
>         status: no carrier
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>         options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0xf
>         inet 127.0.0.1 netmask 0xff000000
>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> 
> #dmesg
> …..
> mfi0: 21294 (421879975s/0x0008/info) - Battery started charging
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> ix1: link state changed to DOWN
> ix1: link state changed to UP
> 
> 
> I should add that the servers directly connected to this FreeBSD server reboot every night. This is why you see ix0 UP/DOWN
> messages in dmesg.
> 
> 
> 
> 
> 
> 
> ------------- END System information------------
> 
> 
> 
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
> 

Dipl.-Inform. (FH)
Dennis Berger

email:   db at bsdsystems.de
mobile: +491791231509
fon: +494054001817


