Limits on jumbo mbuf cluster allocation

Garrett Wollman wollman at freebsd.org
Fri Mar 8 07:10:43 UTC 2013


I have a machine (actually six of them) with an Intel dual-10G NIC on
the motherboard.  Two of them (so far) are connected to a network
using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
allocates 32,000 9k clusters for its receive rings.  I have noticed,
on the machine that is an active NFS server, that it can get into a
state where allocating more 9k clusters fails (as reflected in the
mbuf failure counters) at a utilization far lower than the configured
limits -- in fact, quite close to the number allocated by the driver
for its rx ring.  Eventually, network traffic grinds completely to a
halt, and if one of the interfaces is administratively downed, it
cannot be brought back up again.  There's generally plenty of physical
memory free (at least two or three GB).
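
For reference, the failure counters I mean are the ones "netstat -m"
reports (the same numbers show up in "vmstat -z" under the
mbuf_jumbo_9k zone), and the configured limits can be read via
sysctl, along the lines of this minimal sketch:

    #include <sys/types.h>
    #include <sys/sysctl.h>

    #include <stdio.h>

    /*
     * Print the configured cluster limits.  The allocation-failure
     * counts themselves are easiest to read from "netstat -m" or
     * "vmstat -z | grep mbuf_jumbo_9k".
     */
    int
    main(void)
    {
            int nmbclusters, nmbjumbo9;
            size_t len;

            len = sizeof(nmbclusters);
            if (sysctlbyname("kern.ipc.nmbclusters", &nmbclusters,
                &len, NULL, 0) == -1)
                    return (1);
            len = sizeof(nmbjumbo9);
            if (sysctlbyname("kern.ipc.nmbjumbo9", &nmbjumbo9,
                &len, NULL, 0) == -1)
                    return (1);
            printf("nmbclusters=%d nmbjumbo9=%d\n",
                nmbclusters, nmbjumbo9);
            return (0);
    }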

There are no console messages generated to indicate what is going on,
and overall UMA usage doesn't look extreme.  I'm guessing that this is
a result of kernel memory fragmentation, although I'm a little bit
unclear as to how this actually comes about.  I am assuming that this
hardware has only limited scatter-gather capability and can't receive
a single packet into multiple smaller buffers, which would eliminate
the need for two and a quarter consecutive pages of KVA for each
packet.  In actual usage, most of our clients aren't on a
jumbo network, so most of the time, all the packets will fit into a
normal 2k cluster, and we've never observed this issue when the
*server* is on a non-jumbo network.
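
If the hardware could split a received frame across several smaller
buffers, the driver wouldn't need 9k clusters at all; my
understanding of how the rx buffer size gets chosen is roughly the
sketch below (names made up, not the actual ixgbe code):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    #include <net/ethernet.h>

    /*
     * Sketch of how an rx buffer size follows from the MTU.  With a
     * ~9000-byte MTU and no multi-buffer receive, the driver is stuck
     * with MJUM9BYTES (9k) clusters, each needing 2.25 contiguous
     * pages; if the hardware could split a frame across buffers, a
     * one-page MJUMPAGESIZE cluster would always be enough.
     */
    static int
    rx_cluster_size(int mtu)
    {
            int frame = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;

            if (frame <= MCLBYTES)
                    return (MCLBYTES);      /* normal 2k cluster */
            if (frame <= MJUMPAGESIZE)
                    return (MJUMPAGESIZE);  /* one page, no contiguity issue */
            return (MJUM9BYTES);            /* 9k jumbo cluster */
    }

    /* One rx buffer of the chosen size: */
    static struct mbuf *
    rx_alloc(int size)
    {
            return (m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, size));
    }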

Does anyone have suggestions for dealing with this issue?  Will
increasing the amount of KVA (to, say, twice physical memory) help
things?  It seems to me like a bug that these large packets don't have
their own submap to ensure that allocation is always possible when
sufficient physical pages are available.
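
To be concrete about the submap idea: I'm imagining something like
the way exec_map and pipe_map are carved out of kernel_map with
kmem_suballoc() at boot, purely as a sketch (the names and the size
are made up, sized to cover the driver's 32,000 clusters):

    #include <sys/param.h>
    #include <sys/systm.h>

    #include <vm/vm.h>
    #include <vm/vm_param.h>
    #include <vm/vm_kern.h>
    #include <vm/vm_map.h>

    /*
     * Sketch only: reserve a dedicated submap for jumbo clusters at
     * boot, the way exec_map and pipe_map are created, so that their
     * KVA cannot be fragmented by unrelated kernel allocations.  The
     * size is arbitrary: 32,000 9k clusters, each rounded up to
     * three pages.
     */
    static vm_map_t jumbo_map;
    static vm_offset_t jumbo_minaddr, jumbo_maxaddr;

    static void
    jumbo_map_init(void)
    {
            vm_size_t size = 32000 * round_page(MJUM9BYTES);

            jumbo_map = kmem_suballoc(kernel_map, &jumbo_minaddr,
                &jumbo_maxaddr, size, 0 /* superpage_align */);
    }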

-GAWollman

