Limits on jumbo mbuf cluster allocation

YongHyeon PYUN pyunyh at gmail.com
Fri Mar 8 08:39:47 UTC 2013


On Fri, Mar 08, 2013 at 12:27:37AM -0800, Jack Vogel wrote:
> On Thu, Mar 7, 2013 at 11:54 PM, YongHyeon PYUN <pyunyh at gmail.com> wrote:
> 
> > On Fri, Mar 08, 2013 at 02:10:41AM -0500, Garrett Wollman wrote:
> > > I have a machine (actually six of them) with an Intel dual-10G NIC on
> > > the motherboard.  Two of them (so far) are connected to a network
> > > using jumbo frames, with an MTU a little under 9k, so the ixgbe driver
> > > allocates 32,000 9k clusters for its receive rings.  I have noticed,
> > > on the machine that is an active NFS server, that it can get into a
> > > state where allocating more 9k clusters fails (as reflected in the
> > > mbuf failure counters) at a utilization far lower than the configured
> > > limits -- in fact, quite close to the number allocated by the driver
> > > for its rx ring.  Eventually, network traffic grinds completely to a
> > > halt, and if one of the interfaces is administratively downed, it
> > > cannot be brought back up again.  There's generally plenty of physical
> > > memory free (at least two or three GB).
> > >
> > > There are no console messages generated to indicate what is going on,
> > > and overall UMA usage doesn't look extreme.  I'm guessing that this is
> > > a result of kernel memory fragmentation, although I'm a little bit
> > > unclear as to how this actually comes about.  I am assuming that this
> > > hardware has only limited scatter-gather capability and can't receive
> > > a single packet into multiple buffers of a smaller size, which would
> > > reduce the requirement for two-and-a-quarter consecutive pages of KVA
> > > for each packet.  In actual usage, most of our clients aren't on a
> > > jumbo network, so most of the time, all the packets will fit into a
> > > normal 2k cluster, and we've never observed this issue when the
> > > *server* is on a non-jumbo network.
> > >
> >
> > AFAIK all Intel controllers assemble jumbo frames by concatenating
> > multiple mbufs on the RX side, so there is no physically contiguous
> > 9KB allocation. I vaguely suspect there could be an mbuf leak when
> > jumbo frames are enabled. I would check how the driver handles mbuf
> > shortages or frame errors while mbuf concatenation for a jumbo
> > frame is in progress.
> >
> 
> No, this is not true; when using a 9K jumbo MTU it will actually use the
> larger mbuf pool. The code has been this way for a little while now.

Ah, thanks for correcting me. If the H/W is still able to support
old-style chaining like em(4), wouldn't it be better to use that
rather than allocate a 9KB buffer? Allocating a 9KB buffer to hold a
pure TCP ACK segment looks inefficient.
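
To make the trade-off concrete, here is a minimal sketch of the two
strategies using the mbuf(9) KPI (not the actual ixgbe(4) receive
path; the rx_refill_* helper names are hypothetical). With an
em(4)-style chain, a small frame such as a bare TCP ACK consumes a
single 2KB cluster instead of a whole 9KB one.

/*
 * Sketch only: contrasts one contiguous 9KB cluster per frame with
 * an em(4)-style chain of 2KB clusters.  Only the mbuf KPIs
 * (m_getjcl(9), m_getcl(9), m_freem(9)) are real; the helpers are
 * hypothetical.
 */
#include <sys/param.h>
#include <sys/mbuf.h>

/* One physically contiguous 9KB cluster per received frame. */
static struct mbuf *
rx_refill_jumbo9(void)
{
        struct mbuf *m;

        m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM9BYTES);
        if (m == NULL)
                return (NULL);  /* zone_jumbo9 exhausted or fragmented */
        m->m_len = m->m_pkthdr.len = MJUM9BYTES;
        return (m);
}

/* em(4)-style chaining: several 2KB clusters cover one jumbo frame. */
static struct mbuf *
rx_refill_chain(int frame_len)
{
        struct mbuf *top, *prev, *m;
        int left;

        top = prev = NULL;
        for (left = frame_len; left > 0; left -= MCLBYTES) {
                m = m_getcl(M_NOWAIT, MT_DATA,
                    prev == NULL ? M_PKTHDR : 0);
                if (m == NULL) {
                        m_freem(top);   /* m_freem(NULL) is a no-op */
                        return (NULL);
                }
                m->m_len = MIN(left, MCLBYTES);
                if (prev == NULL)
                        top = m;
                else
                        prev->m_next = m;
                prev = m;
        }
        if (top != NULL)
                top->m_pkthdr.len = frame_len;
        return (top);
}

Of course the chained layout requires H/W that can split one frame
across several RX descriptors, which is exactly the capability in
question here.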

> 
> Jack
> 
> 
> >
> > > Does anyone have suggestions for dealing with this issue?  Will
> > > increasing the amount of KVA (to, say, twice physical memory) help
> > > things?  It seems to me like a bug that these large packets don't have
> > > their own submap to ensure that allocation is always possible when
> > > sufficient physical pages are available.
> > >
> > > -GAWollman
> >
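
For reference, the configured limits Garrett mentions can be read
back from userland with sysctlbyname(3). A minimal sketch, assuming
only the standard kern.ipc.* limit sysctls (current zone usage would
still come from netstat -m or vmstat -z):

/*
 * Print the configured mbuf cluster limits.  These are the standard
 * FreeBSD kern.ipc.* knobs; this reads only the limits, not the live
 * UMA zone usage.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
        const char *names[] = {
                "kern.ipc.nmbclusters", /* 2KB clusters */
                "kern.ipc.nmbjumbop",   /* page-size jumbo clusters */
                "kern.ipc.nmbjumbo9",   /* 9KB jumbo clusters */
                "kern.ipc.nmbjumbo16",  /* 16KB jumbo clusters */
        };
        size_t len;
        int i, limit;

        for (i = 0; i < 4; i++) {
                len = sizeof(limit);
                if (sysctlbyname(names[i], &limit, &len, NULL, 0) == 0)
                        printf("%s = %d\n", names[i], limit);
        }
        return (0);
}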

