9.0-RC2 re(4) "no memory for jumbo buffers" issue

Mon Nov 28 23:43:43 UTC 2011

On Mon, Nov 28, 2011 at 05:38:16PM -0500, Mike Andrews wrote:
> On 11/27/11 8:39 PM, YongHyeon PYUN wrote:
> >On Sat, Nov 26, 2011 at 04:05:58PM -0500, Mike Andrews wrote:
> >>I have a Supermicro 5015A-H (Intel Atom 330) server with two Realtek
> >>RTL8111C-GR gigabit NICs on it.  As far as I can tell, these support
> >>jumbo frames up to 7422 bytes.  When running them at an MTU of 5000 on
> >
> >Actually the maximum size is 6KB for RTL8111C, not 7422.
> >RTL8111C and newer PCIe-based gigabit controllers no longer support
> >scattering a jumbo frame into multiple RX buffers, so a single RX
> >buffer has to receive an entire jumbo frame.  This adds more burden
> >to the system because it has to allocate a jumbo buffer even when it
> >receives a pure TCP ACK.
> 
> OK, that makes sense.
> 
> >>FreeBSD 9.0-RC2, after a week or so of uptime, with fairly light network
> >>activity, the interfaces die with "no memory for jumbo buffers" errors
> >>on the console.  Unloading and reloading the driver (via serial console)
> >>doesn't help; only rebooting seems to clear it up.
> >>
> >
> >The jumbo code path is the same as the normal-MTU one, so I think
> >the possibility of the driver leaking mbufs is very low.  And the
> >message "no memory for jumbo RX buffers" can only appear either
> >when you bring the interface up again or when the interface is
> >restarted by the watchdog timeout handler.  I don't think you're
> >seeing watchdog timeouts, though.
> 
> I'm fairly certain the interface isn't changing state when this happens 
> -- it just kinda spontaneously happens after a week or two, with no 
> interface up/down transitions.  I don't see any watchdog messages when 
> this happens.

There is another code path that causes controller reinitialization.
If you change the MTU or the offloading configuration (TSO, VLAN
tagging, checksum offloading, etc.), the driver reinitializes the
controller. Did you happen to trigger one of these code paths during
that week or two?

> 
> >When you saw the "no memory for jumbo RX buffers" message, did you
> >check the available mbuf pool?
> 
> Not yet, that's why I asked for debugging tips -- I'll do that the next 
> time this happens.
> 
> >>What's the best way to go about debugging this...  which sysctls should
> >>I be looking at first?  I have already tried raising kern.ipc.nmbjumbo9
> >>to 16384 and it doesn't seem to help things... maybe prolonging it
> >>slightly, but not by much.  The problem is it takes a week or so to
> >>reproduce the problem each time...
> >>
> >
> >My vague guess is that it could be related to another subsystem
> >that leaks mbufs, so that the driver was no longer able to get
> >jumbo RX buffers from the system.  For instance, r228016 would be
> >worth trying on your box.  I can't clearly explain why em(4) does
> >not suffer from the issue, though.
> 
> I've just this morning built a kernel with that fix, so we'll see how 
> that goes.

Ok.