Recovery from mbuf cluster exhaustion

Thu Oct 9 03:29:55 PDT 2003

Peter Bozarov wrote:
[ ... ]
> What I can't seem to figure out is how to flush out the
> "stale" mbufs/clusters. I can close down all network
> interfaces, and kill/restart most of the processes that I
> presume use up the mbufs. At a given point, there can't
> possibly be any processes that are hogging the mbuf
> clusters. Yet, a while later, this is what the pool
> looks like
> 
> [grid] ~ $ netstat -m
> 4305/4944/18240 mbufs in use (current/peak/max):
>          4305 mbufs allocated to data
> 4304/4560/4560 mbuf clusters in use (current/peak/max)
> 10356 Kbytes allocated to network (75% of mb_map in use)
> 8832 requests for memory denied
> 1 requests for memory delayed
> 0 calls to protocol drain routines
> [grid] ~ $
> 
> A few clusters have been freed. But not much. Now, if
> (presumably) no clusters are being used by a process,
> should they not be released by the kernel? Alternatively,
> how can I enforce this (short of rebooting the machine,
> which is *not* the solution I'm looking for)?

Wait 2*MSL for the network connections to go away.  This assumes
the other end is still there, and is not some network load
generating device that effectively SYN-floods and establishes
"real" connections (e.g. a "Web Avalanche" or similar product).
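
To put a number on that wait, here is a minimal sketch.  It
assumes the MSL is given in milliseconds, the way the
net.inet.tcp.msl sysctl reports it, and it assumes a value of
30000; neither the value nor the function name comes from the
original report, so check your own system.

```python
# Sketch: how long a closed TCP connection lingers for 2*MSL.
# ASSUMPTION: msl_ms mirrors the net.inet.tcp.msl sysctl (milliseconds);
# 30000 is an assumed value, not something taken from the report.

def time_wait_seconds(msl_ms: int = 30000) -> float:
    """Return the 2*MSL wait in seconds for an MSL given in milliseconds."""
    return 2 * msl_ms / 1000.0


if __name__ == "__main__":
    print(f"2*MSL = {time_wait_seconds():.0f} s")
```

With the assumed value, that works out to a minute per
connection after it has closed cleanly.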

Doing a "netstat -a" will show you a list of active connections,
of which I'm sure you have more than a few hanging around, even
though you killed the process that opened them.

You will see byte counts in the receive queue (Recv-Q) and
transmit queue (Send-Q) columns.  These indicate how much data
is pending in queues: either data that your application is not
reading, or data that your application has written but which
cannot be sent.  The latter happens when the other side of the
connection has been shut off, lost network connectivity, died,
or intentionally started a transfer with no intention of
actually reading the data you were going to send (e.g. the
Microsoft "WAST" web tool for benchmarking does this, and so
does "httpload").
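
If the list is long, something like the following rough sketch
can total up the queued bytes for you.  It assumes the usual
"Proto Recv-Q Send-Q Local Foreign (state)" column layout, and
the sample text is invented for illustration; it is not output
from the machine in question.  The function name is hypothetical.

```python
# Sketch: total up bytes stuck in TCP socket queues from "netstat -an" text.
# ASSUMPTION: each connection line has the columns
#   Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
# SAMPLE is made up for illustration, not real output from the thread.

SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0  33304  10.0.0.1.80            10.0.0.2.1234          FIN_WAIT_1
tcp4    8192      0  10.0.0.1.80            10.0.0.3.4321          CLOSE_WAIT
tcp4       0      0  *.22                   *.*                    LISTEN
udp4       0      0  *.514                  *.*
"""


def stuck_queues(netstat_text):
    """Return (recv_total, send_total, rows) for TCP lines in the text.

    rows lists only the connections that have bytes queued, as
    (foreign_address, recv_q, send_q, state) tuples.
    """
    recv_total = send_total = 0
    rows = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, non-TCP lines, and lines without numeric queues.
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if recv_q or send_q:
            state = fields[5] if len(fields) > 5 else ""
            rows.append((fields[4], recv_q, send_q, state))
        recv_total += recv_q
        send_total += send_q
    return recv_total, send_total, rows


if __name__ == "__main__":
    recv_total, send_total, rows = stuck_queues(SAMPLE)
    for peer, recv_q, send_q, state in rows:
        print(f"{peer:24} recv={recv_q:6} send={send_q:6} {state}")
    print(f"total: {recv_total} bytes unread, {send_total} bytes unsent")
```

Feed it the saved output of "netstat -an" instead of SAMPLE to
see which peers are holding data in the queues.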

Probably, you need more mbuf clusters, and therefore more mbufs
as well.
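
As a rough check on that, the sketch below reads the cluster
counters out of the "netstat -m" text you posted.  The sample is
copied from your message; the regular expressions assume exactly
that layout, and the function name is hypothetical.

```python
import re

# Sketch: pull the cluster counters out of "netstat -m" text.
# The sample is the output quoted in this thread; the patterns assume
# that exact layout and are not a general-purpose parser.

NETSTAT_M = """\
4305/4944/18240 mbufs in use (current/peak/max):
         4305 mbufs allocated to data
4304/4560/4560 mbuf clusters in use (current/peak/max)
10356 Kbytes allocated to network (75% of mb_map in use)
8832 requests for memory denied
1 requests for memory delayed
0 calls to protocol drain routines
"""


def cluster_report(text):
    """Return current/peak/max cluster counts and the denied-request count."""
    clusters = re.search(r"(\d+)/(\d+)/(\d+) mbuf clusters in use", text)
    denied = re.search(r"(\d+) requests for memory denied", text)
    if clusters is None or denied is None:
        raise ValueError("unrecognized netstat -m layout")
    current, peak, maximum = (int(x) for x in clusters.groups())
    return {
        "current": current,
        "peak": peak,
        "max": maximum,
        "denied": int(denied.group(1)),
        # Peak reaching max means the pool was completely used up.
        "hit_limit": peak >= maximum,
    }


if __name__ == "__main__":
    report = cluster_report(NETSTAT_M)
    print(report)
```

In your output the peak equals the max (4560 of 4560) and 8832
requests were denied, which is what running out of clusters
looks like.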

-- Terry