9.0-RC2 re(4) "no memory for jumbo buffers" issue

YongHyeon PYUN pyunyh at gmail.com
Mon Jan 2 02:34:22 UTC 2012


On Sun, Jan 01, 2012 at 09:03:07PM -0500, Mike Andrews wrote:
> On Fri, 30 Dec 2011, YongHyeon PYUN wrote:
> 
> >On Thu, Dec 29, 2011 at 10:51:25PM -0500, Mike Andrews wrote:
> >>On 11/28/2011 6:42 PM, YongHyeon PYUN wrote:
> >>>On Mon, Nov 28, 2011 at 05:38:16PM -0500, Mike Andrews wrote:
> >>>>On 11/27/11 8:39 PM, YongHyeon PYUN wrote:
> >>>>>On Sat, Nov 26, 2011 at 04:05:58PM -0500, Mike Andrews wrote:
> >>>>>>I have a Supermicro 5015A-H (Intel Atom 330) server with two Realtek
> >>>>>>RTL8111C-GR gigabit NICs on it.  As far as I can tell, these support
> >>>>>>jumbo frames up to 7422 bytes.  When running them at an MTU of 5000 on
> >>>>>Actually the maximum size is 6KB for the RTL8111C, not 7422.
> >>>>>RTL8111C and newer PCIe-based gigabit controllers no longer support
> >>>>>scattering a jumbo frame into multiple RX buffers, so a single RX
> >>>>>buffer has to receive an entire jumbo frame.  This adds more burden
> >>>>>to the system because the driver has to allocate a jumbo RX buffer
> >>>>>even when it receives a pure TCP ACK.
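> >>>>>
> >>>>>As a rough way to see this from userland (a sketch from memory;
> >>>>>the exact zone names may differ on your system): every populated
> >>>>>RX descriptor pins one 9k cluster, so with two re(4) ports you
> >>>>>should see about two RX rings' worth of 9k clusters in use even
> >>>>>on an idle box.
> >>>>>
> >>>>>netstat -m | grep '9k jumbo'
> >>>>>vmstat -z | grep -i jumbo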
> >>>>OK, that makes sense.
> >>>>
> >>>>>>FreeBSD 9.0-RC2, after a week or so of uptime, with fairly light
> >>>>>>network activity, the interfaces die with "no memory for jumbo
> >>>>>>buffers" errors on the console.  Unloading and reloading the driver
> >>>>>>(via serial console) doesn't help; only rebooting seems to clear it
> >>>>>>up.
> >>>>>>
> >>>>>The jumbo code path is the same as the normal-MTU one, so I think
> >>>>>the possibility of leaking mbufs in the driver is very low.  And the
> >>>>>message "no memory for jumbo RX buffers" can only appear either
> >>>>>when you bring the interface up again or when an interface restart
> >>>>>is triggered by the watchdog timeout handler.  I don't think you're
> >>>>>seeing watchdog timeouts though.
> >>>>I'm fairly certain the interface isn't changing state when this happens
> >>>>-- it just kinda spontaneously happens after a week or two, with no
> >>>>interface up/down transitions.  I don't see any watchdog messages when
> >>>>this happens.
> >>>There is another code path that causes controller reinitialization.
> >>>If you change the MTU or the offloading configuration (TSO, VLAN
> >>>tagging, checksum offloading, etc.), the driver will reinitialize the
> >>>controller.  Did you happen to trigger one of these code paths during
> >>>that week or two?
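> >>>
> >>>For instance (untested examples; re0 is just a placeholder), any of
> >>>these goes through that reinitialization path:
> >>>
> >>>ifconfig re0 mtu 5000
> >>>ifconfig re0 -tso
> >>>ifconfig re0 -rxcsum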
> >>>
> >>>>>When you saw the "no memory for jumbo RX buffers" message, did you
> >>>>>check the available mbuf pool?
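> >>>>>
> >>>>>Comparing the in-use count against the configured limit should show
> >>>>>whether the pool is really exhausted; roughly (a sketch):
> >>>>>
> >>>>>sysctl kern.ipc.nmbjumbo9
> >>>>>netstat -m | grep '9k jumbo'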
> >>>>Not yet, that's why I asked for debugging tips -- I'll do that the next
> >>>>time this happens.
> >>>>
> >>>>>>What's the best way to go about debugging this... which sysctls
> >>>>>>should I be looking at first?  I have already tried raising
> >>>>>>kern.ipc.nmbjumbo9 to 16384 and it doesn't seem to help things...
> >>>>>>maybe prolonging it slightly, but not by much.  The problem is it
> >>>>>>takes a week or so to reproduce the problem each time...
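> >>>>>>
> >>>>>>(Roughly what I did, from memory -- at runtime and then made
> >>>>>>persistent across reboots:
> >>>>>>
> >>>>>>sysctl kern.ipc.nmbjumbo9=16384
> >>>>>>echo 'kern.ipc.nmbjumbo9="16384"' >> /boot/loader.conf
> >>>>>>)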
> >>>>>>
> >>>>>My vague guess is that it could be related to another subsystem
> >>>>>leaking mbufs, such that the driver was not able to get more jumbo
> >>>>>RX buffers from the system.  For instance, r228016 would be worth
> >>>>>trying on your box.  I can't clearly explain why em(4) does not
> >>>>>suffer from the issue, though.
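> >>>>>
> >>>>>Something like this should pull that single revision into a
> >>>>>stable/9 source tree (a sketch; adjust it to how your checkout is
> >>>>>set up):
> >>>>>
> >>>>>cd /usr/src && svn merge -c 228016 ^/head .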
> >>>>I've just this morning built a kernel with that fix, so we'll see how
> >>>>that goes.
> >>>Ok.
> >>
> >>OK, this just happened again with a 9.0-RC3 kernel rev r228247.
> >>
> >>
> >>whitedog# ifconfig re0 down;ifconfig re0 up;ifconfig re1 down;ifconfig
> >>re1 up
> >>re0: no memory for jumbo RX buffers
> >>re1: no memory for jumbo RX buffers
> >
> >Ah, sorry. I should have spotted this issue earlier.
> >Try the attached patch and let me know whether it makes any difference.
> >
> >>whitedog# netstat -m
> >>526/1829/2355 mbufs in use (current/cache/total)
> >>0/1278/1278/25600 mbuf clusters in use (current/cache/total/max)
> >>0/356 mbuf+clusters out of packet secondary zone in use (current/cache)
> >>0/336/336/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> >>512/385/897/6400 9k jumbo clusters in use (current/cache/total/max)
> >>0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> >>4739K/7822K/12561K bytes allocated to network (current/cache/total)
> >>0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> >>0/4560/0 requests for jumbo clusters denied (4k/9k/16k)
> >>0/0/0 sfbufs in use (current/peak/max)
> >>0 requests for sfbufs denied
> >>0 requests for sfbufs delayed
> >>0 requests for I/O initiated by sendfile
> >>0 calls to protocol drain routines
> >
> 
> OK, well, the patch changes things... kind of :)
> 
> After putting a lot of stress on the network -- namely about three passes
> of 'make buildworld buildkernel' over NFS/TCP with a 5000-byte MTU -- the
> interface hangs again, but the symptoms are now different.  First, no 

When you think the interface is stuck, can you check which part (TX,
RX, or both) of the MAC is in a stuck condition?
If you can see received packets with tcpdump, the RX MAC is still
working.  If you can see packets sent from the re(4) host arriving on
the destination host, the TX MAC works.
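
For example (a sketch; interface names and hosts are placeholders),
on the re(4) box run:

tcpdump -ni re0

and on the destination host, while pinging it from the re(4) box:

tcpdump -ni em0 icmp

If the first shows inbound packets the RX MAC is alive; if the second
shows the echo requests arriving, the TX MAC is alive.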

> console messages whatsoever, other than NFS timeouts -- even if you 
> ifconfig up/down the interface, which previously would generate the 'no 
> memory for jumbo RX buffers' message.  That message no longer appears, 
> ever.  Even weirder, the interface will revive itself on its own after 
> about 15 minutes or so, and will bounce up and down every few hours for 
> several minutes at a time.  I don't have exact timings on the outages but 
> I can get them if needed.  The netstat -m numbers are not radically out
> of line with the previous numbers, except maybe the denied jumbo cluster
> requests are higher (but that could just be relative to the number of
> jumbo packets the box has seen):
> 
> 515/1495/2010 mbufs in use (current/cache/total)
> 0/1272/1272/25600 mbuf clusters in use (current/cache/total/max)
> 0/640 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/282/282/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 514/682/1196/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 4755K/10183K/14938K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/7888/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> 
> Anything I can pull out of sysctl to debug this further?  Since it revives 

Could you try re(4) from HEAD?  I think it has a couple of stability
enhancements which have not been merged to stable/9 yet.
You may need to download if_re.c and if_rlreg.h from HEAD and apply
re.jumbobuf.diff.
Also try disabling TSO and TX/RX checksum offloading when you use
jumbo frames.
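
Roughly (the file locations and checkout method here are from memory,
so double-check them):

svn cat http://svn.freebsd.org/base/head/sys/dev/re/if_re.c > if_re.c
svn cat http://svn.freebsd.org/base/head/sys/pci/if_rlreg.h > if_rlreg.h

and, to turn the offloading features off (re0 as a placeholder):

ifconfig re0 -tso -txcsum -rxcsum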

> itself eventually, I can live with it long enough to troubleshoot.  It's 
> not a particularly critical machine at the moment.
