nfe0 loses network connectivity (8.0-RELEASE-p2)
Pyun YongHyeon
pyunyh at gmail.com
Thu May 27 17:43:34 UTC 2010
On Thu, May 27, 2010 at 03:13:10PM +0200, Olaf Seibert wrote:
> I have a machine with FreeBSD 8.0-RELEASE-p2 which has a big ZFS file
> system and serves as file server (NFS (newnfs)).
>
> From time to time however it seems to lose all network connectivity. The
> machine isn't down; from the console (an IPMI console) it works fine.
>
> I have tried things like bringing nfe0 down and up again and turning off
> things like checksum offload, but none of them really seems to work
> (sometimes a thing I try appears to help, apparently by accident, but a
> short time later connectivity is lost again).
>
> Carrier status and things like that seem all normal:
>
> nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
> ether 00:30:48:xx:xx:xx
> inet 131.174.xx.xx netmask 0xffffff00 broadcast 131.174.xx.xxx
> media: Ethernet autoselect (1000baseT <full-duplex>)
> status: active
>
> One time when I was doing an "ifconfig nfe0 up" I got the message
> "initialization failed: no memory for rx buffers", so I am currently
> thinking in the direction of mbuf starvation (with something requiring
> too many mbufs to make any progress; I've seen such a thing with inodes
> once).
>
> Here is the output of netstat -m while the problem was going on:
>
> 25751/1774/27525 mbufs in use (current/cache/total)
> 24985/615/25600/25600 mbuf clusters in use (current/cache/total/max)
^^^^^^^^^^^^^^^^^^^^^
As Jeremy said, it seems you're hitting an mbuf shortage. I think
nfe(4) drops received frames in that case. Check how many packets
were dropped due to the mbuf shortage in the output of
"netstat -ndI nfe0". You can also use "sysctl dev.nfe.0.stats" to
see the MAC statistics maintained by nfe(4), if your MCP controller
supports hardware MAC counters.
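
In case it helps, the checks would look roughly like this (the
dev.nfe.0 node assumes your controller is the first nfe unit; adjust
the unit number if needed):

    # per-interface input drops; mbuf-shortage drops show up here
    netstat -ndI nfe0
    # overall mbuf/cluster usage and denied allocation requests
    netstat -m
    # driver-maintained MAC counters, only present if the MCP supports them
    sysctl dev.nfe.0.stats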
> 23254/532 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/95/95/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 56407K/2053K/58461K bytes allocated to network (current/cache/total)
> 0/2084/1031 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 10 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> while here are the figures a short time after a reboot (a reboot always
> "fixes" the problem):
>
> 2133/2352/4485 mbufs in use (current/cache/total)
> 1353/2205/3558/25600 mbuf clusters in use (current/cache/total/max)
> 409/871 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/35/35/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 3239K/5138K/8377K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Is there a way to increase the maximum number of mbufs, or better yet,
> limit the use by whatever is using them too much?
>
You have already hit the mbuf limit, so nfe(4) might have started to
drop incoming frames.
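
For what it's worth, the usual knob for the cluster limit is the
kern.ipc.nmbclusters tunable. This is only a sketch; the right value
depends on RAM and workload, and on some configurations it can only be
set at boot time via /boot/loader.conf rather than at runtime:

    # show the current limit (25600 in your netstat -m output above)
    sysctl kern.ipc.nmbclusters
    # try raising it at runtime, if the kernel allows it
    sysctl kern.ipc.nmbclusters=65536
    # or make it persistent across reboots
    echo 'kern.ipc.nmbclusters=65536' >> /boot/loader.conf

Raising the limit only buys headroom, though; if something is leaking
or hoarding mbufs, it will eventually hit the new ceiling as well.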