NFE adapter 'hangs'

Pyun YongHyeon pyunyh at gmail.com
Thu Sep 2 19:50:21 UTC 2010


On Thu, Sep 02, 2010 at 09:13:46AM +0100, Melissa Jenkins wrote:
> Hiya,
> 
> I've been having trouble with two different machines (FBSD 8.0p3 & FBSD 7.0p5) using the NFE network adapter.  The machines are, respectively, a Sun X2200 (AMD64) and a Sun X2100 M2 (AMD64), and both are running the amd64 kernel.
> 
> Basically what appears to happen is that traffic stops flowing through the interface and 'No buffer space available' error messages are produced when trying to send ICMP packets.  All established connections appear to hang.
> 
> The machines are running as packet routers, and nfe0 is acting as the 'lan' side.  PF is being used for filtering, NAT, BINAT and RDR.  The same PF configuration works correctly on two other servers using different network adapters. One of them is configured with pfsync & CARP, but the other one isn't.
> 
> The problem seems to happen under a fairly light number of sessions (< 100 active states in PF), though the more states there are, the quicker it occurs.  It is possible it's related to packet rates, as putting high-bandwidth clients on seems to produce the problem very quickly (within several minutes).  This is reinforced by the fact that the problem first manifested when we upgraded one of the leased lines.
> 
> Executing ifconfig nfe0 down && ifconfig nfe0 up will restart traffic flow.  
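
(As an aside, that down/up bounce can be scripted as a stop-gap while the
real cause is being chased down.  A rough sketch only -- the ping target
reuses the 172.31.3.129 address from the test further down, and the 30
second interval is arbitrary:

  #!/bin/sh
  # bounce nfe0 whenever a probe sent out of it stops getting replies
  while :; do
      if ! ping -c 1 -t 2 172.31.3.129 > /dev/null 2>&1; then
          logger "nfe0 appears wedged, cycling the interface"
          ifconfig nfe0 down && ifconfig nfe0 up
      fi
      sleep 30
  done
)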
> 
> Neither box is very highly loaded, generally around 1.5 Mb/s.  This doesn't appear to be related to the amount of traffic, as I have tried re-routing 95% of traffic around the server without any improvement.  The traffic profile is fairly random - a mix of TCP and UDP, mostly flowing OUT of nfe0.  It is all L3 and there are fewer than 5 hosts on the segment attached to the nfe interface.
> 
> Both boxes are in different locations and are connected to different types of Cisco switches.  Both appear to autonegotiate correctly and the switch ports show no status changes.
> 
> It appears that pfsync, CARP and a GRE tunnel work correctly over the NFE interface for long periods of time (weeks+), and that it is something to do with adding other traffic to the mix that results in the interface 'hanging'.
> 
> If I move the traffic from NFE to the other BGE interface (the one shared with the LOM), everything is stable and works correctly.  I have not been able to reproduce this using test loads, and the interface worked correctly with iperf testing prior to deployment.  Unfortunately I can't (for legal reasons) provide a traffic trace up to the time it occurs, though everything looks normal to me.
> 
> The FreeBSD 7 X2100 lists the following from pciconf:
> nfe0 at pci0:0:8:0:        class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
>    vendor     = 'Nvidia Corp'
>    device     = 'MCP55 Ethernet'
>    class      = bridge
> nfe1 at pci0:0:9:0:        class=0x068000 card=0x534c108e chip=0x037310de rev=0xa3 hdr=0x00
>    vendor     = 'Nvidia Corp'
>    device     = 'MCP55 Ethernet'
>    class      = bridge
> 
> The FreeBSD 8 X2200 lists the same thing:
> nfe0 at pci0:0:8:0:        class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
>    vendor     = 'Nvidia Corp'
>    device     = 'MCP55 Ethernet'
>    class      = bridge
> nfe1 at pci0:0:9:0:        class=0x068000 card=0x534b108e chip=0x037310de rev=0xa3 hdr=0x00
>    vendor     = 'Nvidia Corp'
>    device     = 'MCP55 Ethernet'
>    class      = bridge
> 
> 
> Here are the two obvious tests (both from the FreeBSD 7 box), but the ICMP response and the mbuf stats are very much the same on both boxes.
> 
> ping 172.31.3.129
> PING 172.31.3.129 (172.31.3.129): 56 data bytes
> ping: sendto: No buffer space available
> ping: sendto: No buffer space available
> ^C
> 
> --- 172.31.3.129 ping statistics ---
> 2 packets transmitted, 0 packets received, 100.0% packet loss
> 
> netstat -m
> 852/678/1530 mbufs in use (current/cache/total)
> 818/448/1266/25600 mbuf clusters in use (current/cache/total/max)
> 817/317 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/362/362/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 1879K/2513K/4392K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> 
> From the other machine, after the problem has occurred and an ifconfig down/up cycle has been done (i.e. when the interface is working):
> vmstat -z 
> mbuf_packet:              256,        0,     1033,     1783, 330792410,        0
> mbuf:                     256,        0,        5,     1664, 395145472,        0
> mbuf_cluster:            2048,    25600,     2818,     1690, 13234653,        0
> mbuf_jumbo_page:         4096,    12800,        0,      336,   297749,        0
> mbuf_jumbo_9k:           9216,     6400,        0,        0,        0,        0
> mbuf_jumbo_16k:         16384,     3200,        0,        0,        0,        0
> mbuf_ext_refcnt:            4,        0,        0,        0,        0,        0
> 
> 
> Although I failed to keep a copy, I don't believe there is a kmem problem.
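
(For next time, the relevant numbers are cheap to capture -- a sketch,
nothing beyond stock sysctl/vmstat:

  sysctl vm.kmem_size vm.kmem_size_max   # configured kernel memory limits
  vmstat -m                              # per-type kernel malloc usage
)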
> 
> I'm at a complete loss as to what to try next :(  
> 
> All suggestions very gratefully received!!!  The 7.0 box is live so it can't really be played with, but I can occasionally run tests on the other box.
> 

First of all, if you have 'route-to' rules in pf, please disable TSO
on nfe(4).  This is a known pf(4) issue with handling TSO frames.
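For instance, something along these lines should do it (a sketch only;
substitute your real LAN address and netmask in the rc.conf line):

  # turn TSO off on the running interface
  ifconfig nfe0 -tso

  # to keep it off across reboots, add -tso to the interface line
  # in /etc/rc.conf, e.g.:
  #   ifconfig_nfe0="inet <lan address> netmask <netmask> -tso"
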
If this is not the case, would you show me the output of
"sysctl dev.nfe.0.stats"?  Can you see whether these statistics are
still being updated at the time of the hang?
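For example, a quick way to check (just a sketch; the interval and the
file names are arbitrary):

  sysctl dev.nfe.0.stats > /tmp/nfe0.stats.before
  sleep 10
  sysctl dev.nfe.0.stats > /tmp/nfe0.stats.after
  diff -u /tmp/nfe0.stats.before /tmp/nfe0.stats.after

If the counters stop moving during the hang, that narrows down where
the controller is getting stuck.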
Also, I'd like to know whether both RX and TX are dead, or whether
only one of the RX/TX paths is hung.  Can you see incoming traffic
with tcpdump when you think the controller is stuck?
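Something like the following, run while the interface is wedged, would
show that (again only a sketch; use whatever packet counts and sample
intervals you prefer):

  # RX: are packets from the wire still being delivered to us?
  tcpdump -n -i nfe0 -c 50

  # RX vs TX packet counters for nfe0, sampled once per second
  netstat -I nfe0 -w 1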
By chance, are you using polling(4) or VLAN on nfe(4)?
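You can check both quickly with ifconfig (a sketch; POLLING appears in
the options= line only when polling is enabled, and each vlan(4)
interface prints the parent it is stacked on):

  ifconfig nfe0                            # look for POLLING in options=...
  ifconfig -a | grep "parent interface"    # any vlan(4) children of nfe0?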

