bce packet loss
    YongHyeon PYUN 
    pyunyh at gmail.com
       
    Thu Jul  7 17:44:07 UTC 2011
    
    
  
On Thu, Jul 07, 2011 at 02:00:26AM -0400, Charles Sprickman wrote:
> More inline, including a bigger picture of what I'm seeing on some other 
> hosts, but I wanted to thank everyone for all the fascinating ethernet BER 
> info and the final explanation of what the "IfHCInBadOctets" counter 
> represents.  Interesting stuff.
> 
> On Wed, 6 Jul 2011, YongHyeon PYUN wrote:
> 
> >On Mon, Jul 04, 2011 at 09:32:11PM -0400, Charles Sprickman wrote:
> >>Hello,
> >>
> >>We're running a few 8.1-R servers with Broadcom bce interfaces (Dell R510)
> >>and I'm seeing occasional packet loss on them (enough that it trips nagios
> >>now and then).  Cabling seems fine as neither the switch nor the sysctl
> >>info for the device show any errors/collisions/etc, however there is one
> >>odd one, which is "dev.bce.1.stat_IfHCInBadOctets: 539369".  See [1] below
> >>for full sysctl output.  The switch shows no errors but for "Dropped
> >>packets 683868".
> >>
> >>pciconf output is also below. [2]
> >>
> >>By default, the switch had flow control set to "on".  I also let it run
> >>with "auto".  In both cases, the drops continued to increment.  I'm now
> >>running with flow control off to see if that changes anything.
> >>
> >>I do see some correlation between cpu usage and drops - I have cpu usage
> >>graphed in nagios and cacti is graphing the drops on the dell switch.
> >>There's no signs of running out of mbufs or similar.
> >>
> >>So given that limited info, is there anything I can look at to track this
> >>down?  Anything stand out in the stats sysctl exposes?  Two things are
> >>standing out for me - the number of changes in bce regarding flow control
> >>that are not in 8.1, and the correlation between cpu load and the drops.
> >>
> >>What other information can I provide?
> >>
> >
> >You had 282 RX buffer shortages and these frames were dropped. This
> >may explain why you see occasional packet loss. 'netstat -m' will
> >show which cluster sizes had failed allocations.
> 
> Nothing of note:
> 
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 
Hmm, that's strange. I can't explain how you have a non-zero
mbuf_alloc_failed_count.
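
For anyone who wants to watch the "denied" lines of 'netstat -m' from a
monitoring script, here is a minimal sketch in Python. It only parses
text in the format quoted above; the function name and the layout of
the returned dictionary are my own choices for illustration, and the
sample input is simply the output pasted earlier in this thread.

```python
import re

# Matches lines such as:
#   0/0/0 requests for jumbo clusters denied (4k/9k/16k)
#   0 requests for sfbufs denied
DENIED = re.compile(r"^(\d+(?:/\d+)*) requests for (.+?) denied(?: \(([^)]*)\))?$")


def denied_counts(netstat_m_output):
    """Return {"<subject>: <label>": count} for every 'denied' line."""
    counts = {}
    for line in netstat_m_output.splitlines():
        m = DENIED.match(line.strip())
        if not m:
            continue  # "in use" and "delayed" lines are ignored
        values = [int(v) for v in m.group(1).split("/")]
        subject = m.group(2)
        # Lines without a (a/b/c) legend carry a single value.
        labels = m.group(3).split("/") if m.group(3) else [subject]
        for label, value in zip(labels, values):
            counts[f"{subject}: {label}"] = value
    return counts


SAMPLE = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
"""

if __name__ == "__main__":
    failed = {k: v for k, v in denied_counts(SAMPLE).items() if v}
    print(failed or "no denied allocations")
```

With the output quoted above every count is zero, which is why the
non-zero driver counter is puzzling.
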
> >However, it seems you have 0 com_no_buffers, which indicates the
> >controller was able to receive all packets destined for this host.
> >Your host may have lost some packets (i.e. non-zero
> >mbuf_alloc_failed_count), but your controller and system were still
> >responsive to the network traffic.
> 
> OK.  I recall seeing a thread in the -net archives where some folks had 
> the "com_no_buffers" incrementing, but I'm not seeing that at all.
> 
> >The data sheet says IfHCInBadOctets indicates the number of octets
> >received on the interface, including framing characters, for packets
> >that were dropped in the MAC for any reason. I'm not sure whether this
> >counter includes packets counted by IfInFramesL2FilterDiscards, which
> >indicates the number of good frames that have been dropped due to the
> >L2 perfect match, broadcast, multicast or MAC control frame filters.
> >If your switch runs STP it would periodically send BPDU packets to
> >the STP multicast address 01:80:C2:00:00:00. Not sure this is the
> >reason though. Probably David can explain more details on the
> >IfHCInBadOctets counter (CCed).
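
To illustrate why BPDUs fall under a multicast filter: an Ethernet
destination is a group (multicast or broadcast) address when the
least significant bit of its first octet is set. A small Python
sketch follows. The helper names are hypothetical, the unicast
address in the usage lines is made up, and this says nothing about
how the bce(4) hardware filter itself is implemented.

```python
STP_BRIDGE_GROUP = "01:80:C2:00:00:00"  # address quoted above


def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}: {text!r}")
    return bytes(int(p, 16) for p in parts)


def is_group_address(mac):
    """True for multicast/broadcast: low bit of the first octet is set."""
    return bool(parse_mac(mac)[0] & 0x01)


def is_stp_bpdu_destination(mac):
    """True if the destination is the STP bridge group address."""
    return parse_mac(mac) == parse_mac(STP_BRIDGE_GROUP)


if __name__ == "__main__":
    for mac in ("01:80:C2:00:00:00", "00:1b:21:00:00:01"):
        print(mac, is_group_address(mac), is_stp_bpdu_destination(mac))
```
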
> 
> Again, thanks for that.
> 
> If I could just ask for a bit more assistance, it would be greatly 
> appreciated.  I collected a fair bit of data and it's done nothing but 
> complicate the issue for me so far.
> 
> -If I'm reading the switch stats correctly, most of my drops are 
> host->switch, although I'm not certain of that, these Dell 2848s have no 
> real cli interface to speak of.

IfHCInBadOctets is a counter for RX (i.e. switch->host), and the TX
hardware MAC counters showed no errors at all.
> 
> -I'm seeing similar drops, but not quite so bad, on other hosts.  They all 
> use the em interface but for one other with bge.  This particular host 
> (with the bce interface) just seems to get bad enough to trigger nagios 
> alerts (simple ping check from a host on the same switch/subnet).  All 
> these hosts are forced to 100/FD as is the switch.  The switch is our 
> external (internet facing) switch with a 100Mb connection to our upstream. 
> At *peak* our aggregate bandwidth on this switch is maybe 45Mb/s, most of 
> it outbound.  We are nowhere near saturating the switching fabric (I 
> hope).
> 
> -There are three reasons I set the ports at 100baseTX - the old Cisco that 
> lost a few ports was a 10/100 switch and the hosts were already hard-coded 
> for 100/FD, I figured if the Dell craps out I can toss the Cisco back 
> without changing the speed/duplex on all the hosts, and lastly our uplink 
> is only 100/FD so why bother.  Also maybe some vague notion that I'd not 
> use up some kind of buffers in the switch by matching the speed on all 
> ports...
> 
> -We have an identical switch (same model, same hardware rev, same 
> firmware) for our internal network (lots of log analysis over nfs mounts, 
> a ton of internal dns (upwards of 10K queries/sec at peak), and occasional 
> large file transfers).  On this host and all others, the dropped packet 
> count on the switch ports is at worst around 5000 packets.  The counters 
> have not been reset on it and it's been up for 460 days.
> 
> -A bunch of legacy servers that have fxp interfaces on the external switch 
> and em on the internal switch show *no* significant drops nor do 
> the switch ports they are connected to.
> 
> -To see if forcing the ports to 100/FD was causing a problem, I set the 
> host and switch to 1000/FD.  Over roughly 24 hours, the switch is 
> reporting 197346 dropped packets of 52166986 packets received.
> 
> -Tonight's change was to turn off spanning tree.  This is a long shot 
> based on some Dell bug I saw discussed on their forums.  Given our simple 
> network layout, I don't really see spanning tree as being at all 
> necessary.
> 
> One of the first replies I got to my original post was private and 
> amounted to "Dell is garbage".  That may be true, but the excellent 
> performance on the more heavily loaded internal network makes me doubt 
> there's a fundamental shortcoming in the switch.  It would have to be real 
> garbage to crap out with a combined load of 45Mb/s.  I am somewhat curious 
> if some weird buffering issue is possible with a mix of 100/FD and 1000/FD 
> ports.
> 
> Any thoughts on that?  It's the only thing that differs between the two 
> switches.
> 
This makes me suspect a duplex mismatch between bce(4) and the link
partner. You should not use forced media configuration on a 1000baseT
link. If you use manual media configuration on bce(4) while the link
partner uses auto-negotiation, the resolved duplex would be
half-duplex. That is standard behavior, and a duplex mismatch can
cause strange problems.
I would check whether the link partner also agrees on the resolved
speed/duplex of bce(4).
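
To make the mismatch case concrete, here is a rough Python model of
how duplex gets resolved on a 10/100 link. It is a simplification
under stated assumptions: both ends are assumed to advertise full
duplex when they negotiate, speed is ignored, and the Port class and
function names are invented for this sketch. It is not how any driver
implements it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Port:
    autoneg: bool
    forced_duplex: Optional[str] = None  # "full" or "half" when autoneg is off


def resolved_duplex(local, partner):
    """Duplex the local port ends up using (simplified 10/100 model)."""
    if not local.autoneg:
        if local.forced_duplex not in ("full", "half"):
            raise ValueError("forced port needs forced_duplex 'full' or 'half'")
        return local.forced_duplex
    if partner.autoneg:
        # Assumption: both ends advertise full duplex.
        return "full"
    # The forced partner does not negotiate, so the local side can only
    # detect the speed and has to fall back to half duplex.
    return "half"


def duplex_mismatch(a, b):
    """True when the two ends of the link disagree on duplex."""
    return resolved_duplex(a, b) != resolved_duplex(b, a)


if __name__ == "__main__":
    forced_full = Port(autoneg=False, forced_duplex="full")
    auto = Port(autoneg=True)
    print("forced full vs auto:", duplex_mismatch(forced_full, auto))
    print("auto vs auto:", duplex_mismatch(auto, auto))
    print("forced full both ends:", duplex_mismatch(forced_full, forced_full))
```

The only mismatching combination among the three is the mixed one,
which is why both ends should be configured the same way.
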
> Before replacing the switch I'm also going to cycle through turning off 
> TSO, rxcsum, and txcsum since it seems that has been a fix for some people 
> with otherwise unexplained network issues.  I assume those features all 
> depend on the firmware of the NIC being bug-free, and I'm not quite ready 
> to accept that.
> 
It's worth trying, but I wonder how it could explain ICMP ECHO request
packet loss.
> Thanks,
> 
> Charles
    
    