Limits on jumbo mbuf cluster allocation

YongHyeon PYUN pyunyh at gmail.com
Fri Mar 8 07:55:13 UTC 2013


On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> I have a machine (actually six of them) with an Intel dual-10G NIC on
> the motherboard.  Two of them (so far) are connected to a network
> using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
> allocates 32,000 9k clusters for its receive rings.  I have noticed,
> on the machine that is an active NFS server, that it can get into a
> state where allocating more 9k clusters fails (as reflected in the
> mbuf failure counters) at a utilization far lower than the configured
> limits -- in fact, quite close to the number allocated by the driver
> for its rx ring.  Eventually, network traffic grinds completely to a
> halt, and if one of the interfaces is administratively downed, it
> cannot be brought back up again.  There's generally plenty of physical
> memory free (at least two or three GB).
> 
> There are no console messages generated to indicate what is going on,
> and overall UMA usage doesn't look extreme.  I'm guessing that this is
> a result of kernel memory fragmentation, although I'm a little bit
> unclear as to how this actually comes about.  I am assuming that this
> hardware has only limited scatter-gather capability and can't receive
> a single packet into multiple buffers of a smaller size, which would
> remove the requirement for two-and-a-quarter consecutive pages of KVA
> for each packet.  In actual usage, most of our clients aren't on a
> jumbo network, so most of the time, all the packets will fit into a
> normal 2k cluster, and we've never observed this issue when the
> *server* is on a non-jumbo network.
> 

AFAIK all Intel controllers build jumbo frames on the RX side by
concatenating multiple mbufs, so no physically contiguous 9KB
allocation should be needed.  My vague guess is that mbufs are being
leaked somewhere when jumbo frames are enabled.  I would check how the
driver handles an mbuf shortage, or frame errors, while mbuf
concatenation for a jumbo frame is in progress.
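
To make the failure mode concrete, here is a minimal sketch of the
pattern I would audit.  This is not the actual ixgbe code and every
identifier in it is made up for illustration; it only shows the shape
of the RX path where a partial chain can be dropped without being
freed:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <sys/mbuf.h>

#include <net/if.h>
#include <net/if_var.h>

/* Partial frame carried across descriptor processing (per RX ring). */
static struct mbuf *rx_head, *rx_tail;

static void
rx_segment_done(struct ifnet *ifp, struct mbuf *seg, int len,
    int eop, int frame_error)
{
	seg->m_len = len;

	if (rx_head == NULL) {
		/* First segment of a (possibly jumbo) frame. */
		seg->m_pkthdr.len = len;
		rx_head = rx_tail = seg;
	} else {
		/*
		 * Continuation segment: concatenate it onto the chain.
		 * The buffer is freshly allocated, so clearing the
		 * packet-header flag is safe here.
		 */
		seg->m_flags &= ~M_PKTHDR;
		rx_head->m_pkthdr.len += len;
		rx_tail->m_next = seg;
		rx_tail = seg;
	}

	if (frame_error) {
		/*
		 * The path to audit: on a frame error (or an mbuf
		 * shortage while refilling the ring), the whole
		 * partial chain must be freed.  Dropping rx_head
		 * without m_freem() leaks every cluster accumulated
		 * so far, and the leak shows up only with jumbo
		 * frames because only then does a frame span
		 * multiple segments.
		 */
		m_freem(rx_head);
		rx_head = rx_tail = NULL;
		return;
	}

	if (eop) {
		/* Complete frame: hand the whole chain up the stack. */
		rx_head->m_pkthdr.rcvif = ifp;
		(*ifp->if_input)(ifp, rx_head);
		rx_head = rx_tail = NULL;
	}
}

If any early return in the real code loses the head pointer without
calling m_freem(), allocation failures would appear near the ring's
own allocation count while plenty of memory is still free, which
matches what you describe.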

> Does anyone have suggestions for dealing with this issue?  Will
> increasing the amount of KVA (to, say, twice physical memory) help
> things?  It seems to me like a bug that these large packets don't have
> their own submap to ensure that allocation is always possible when
> sufficient physical pages are available.
> 
> -GAWollman
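
On the tuning question: before resizing the KVA map, I would first
confirm which zone is actually being exhausted and whether its limit
moves.  These are the stock FreeBSD knobs (assuming your release has
them; the value below is only an example):

# Per-zone usage, limits and failure counts; the 9k jumbo zone is
# "mbuf_jumbo_9k":
vmstat -z | egrep 'mbuf|jumbo'

# Summary view, including "requests for jumbo clusters denied":
netstat -m

# Raise the 9k cluster cap at runtime if its zone limit is what is
# being hit:
sysctl kern.ipc.nmbjumbo9=131072

Note that if mbufs are in fact leaking, raising the limit only delays
the stall; the denied/delayed counters from netstat -m should make
that distinction visible.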

