mbuf cluster leaks in -CURRENT
Robert Watson
rwatson at FreeBSD.org
Sat Dec 3 14:25:06 PST 2005
Yesterday I sat down to run some benchmarks on phk's changes to the process
time measurement system for scheduling, and discovered SMP boxes were wedging
in [zonelimit] when running netperf tests. I quickly tracked this down to an
mbuf cluster leak:
/zoo/rwatson/netperf/bin/netserver
while (1)
        echo ""
        netstat -m | grep mbuf
        /zoo/rwatson/netperf/bin/netperf -l 30 >& /dev/null
end
Result of:

769/641/1410 mbufs in use (current/cache/total)
768/204/972/25600 mbuf clusters in use (current/cache/total/max)
769/4991/5760 mbufs in use (current/cache/total)
4341/905/5246/25600 mbuf clusters in use (current/cache/total/max)
769/8456/9225 mbufs in use (current/cache/total)
7901/801/8702/25600 mbuf clusters in use (current/cache/total/max)
769/11786/12555 mbufs in use (current/cache/total)
11242/788/12030/25600 mbuf clusters in use (current/cache/total/max)
769/15236/16005 mbufs in use (current/cache/total)
14570/916/15486/25600 mbuf clusters in use (current/cache/total/max)
769/18566/19335 mbufs in use (current/cache/total)
17948/866/18814/25600 mbuf clusters in use (current/cache/total/max)

The reason for the wedge is that NFS-based systems don't like running out of
mbuf clusters. It turns out that I likely didn't notice this previously
because I was running the test boxes in question without ACPI, and for
whatever reason, the race becomes many times more serious with ACPI turned
on. It was leaking without ACPI, but more slowly, and I wasn't noticing
because I had the machines up for much shorter tests. Here's a sampling of
kernel dates and whether or not the leak was present in a kernel from that
date, as well as the dates of a few changes I was worried were likely causes:

CVS Date                 Description                      Leak?
2005/12/3                sample                           yes
2005/11/28-2005/11/29    rwatson sosend changes           -
2005/11/25               sample                           yes
2005/11/15               sample                           yes
2005/11/02-2005/11/05    andre cluster changes            -
2005/10/25               sample                           no
2005/10/15               sample                           no
2005/10/1                sample                           no
2005/09/27               rwatson removes mbuf counters    -
2005/09/16               sample                           no
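
To put a number on the growth in the netstat -m samples above, here is a minimal sketch that prints the change in "current" mbuf clusters between successive samples. The function name cluster_growth is hypothetical, and it assumes the line format shown in the samples (slash-separated counts followed by "mbuf clusters in use").

```sh
#!/bin/sh
# Sketch only: read netstat -m output on stdin and print the increase in
# "current" mbuf clusters from one sample to the next.
# Assumes lines like "768/204/972/25600 mbuf clusters in use (...)".
cluster_growth() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")              # f[1] = current, f[2] = cache, ...
        if (seen) print f[1] - prev    # skip the first sample: no baseline yet
        prev = f[1]
        seen = 1
    }'
}
```

Fed the samples above, this gives deltas of 3573, 3560, 3341, 3328 and 3378, i.e. roughly 3,300 to 3,600 clusters per 30-second netperf run against a limit of 25600.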
I've not really had a chance to investigate the details of the leak -- the
number of used (allocated) mbufs remains low, but the cache number grows
steadily. However, the dates suggest that it was the mbuf cluster cleanup
work you did that introduced the problem (although they don't guarantee it).
Thanks,
Robert N M Watson