FreeBSD 10 network flapping, ix driver unreliable?

Kevin Bowling kevin.bowling at kev009.com
Wed Feb 19 18:29:12 UTC 2014


On 2/18/2014 7:16 AM, George Neville-Neil wrote:
>
> On Feb 17, 2014, at 16:41 , Kevin Bowling <kevin.bowling at kev009.com> wrote:
>
>> On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>>>
>>> On Feb 15, 2014, at 21:32 , Kevin Bowling <kevin.bowling at kev009.com> wrote:
>>>
>>>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>>>
>>>>> On Feb 15, 2014, at 15:14 , Kevin Bowling <kevin.bowling at kev009.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes.  Each node has an Intel X520-DA2 dual port 10gig card.  One port on each card goes to a switch using direct attach coaxial cables.  The other port is directly connected between the two nodes (think crossover in twisted pair terminology), again using direct attach coaxial cables.
>>>>>>
>>>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>>>
>>>>>> I've pasted the output of lspci -vv and dmesg here:
>>>>>> https://gist.github.com/kev009/9024442
>>>>>>
>>>>>> There's nothing outstanding about the setup otherwise.  I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>>>
>>>>>> It seems the ix driver is not very reliable under common conditions; see, e.g., https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list.  Any recommendations or tests?
>>>>>>
>>>>>
>>>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>>>
>>>> Hi George,
>>>>
>>>> sysctl info added to gist link.  ix0 has been up for around 27 days. ix1 for about 24hrs.
>>>>
>>>
>>> I think this has something to do with it.
>>>
>>> dev.ix.0.mac_stats.local_faults: 314
>>> dev.ix.0.mac_stats.remote_faults: 41
>>>
>>> The device is seeing errors at the MAC layer, which I don’t think a driver bug would
>>> cause, though there is always the possibility of a misconfiguration causing flapping.
>>> Can you try different cables?
>>>
>>> When you hook it to the switch, does the switch give better diagnostics?  Reading
>>> over the Intel 82599 chip manual is not, shall we say, illuminating:
>>> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."
>>
>> I appreciate your help.  This led me to some new info, although it doesn't entirely answer what local_faults are for me: http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf
>>
>> I may have spoken too soon: the "crossover" ix1 seems to be holding steady, so the local and remote faults must have occurred during negotiation and while I was bringing up the interfaces.
>>
>> On the other system's ix0, the faults are almost all local and quite a bit more frequent:
>> dev.ix.0.mac_stats.local_faults: 10752
>> dev.ix.0.mac_stats.remote_faults: 2
>>
>> I then noticed the switch had mandatory flow control on both send and receive for 10gig, but the FreeBSD box was only negotiating receive flow control.  I disabled both on the switch and rebooted but am still seeing some increments of local_faults.
>>
>> Could it be a switch STP problem?  Switch is a Cisco 4948-10ge.  Configs look like below, which is working well on some copper gigabit interfaces:
>>
>> spanning-tree mode pvst
>> spanning-tree portfast default
>> spanning-tree extend system-id
>> !
>> interface TenGigabitEthernet1/49
>> switchport trunk encapsulation dot1q
>> switchport mode trunk
>> spanning-tree portfast trunk
>> !
>> interface TenGigabitEthernet1/50
>> switchport trunk encapsulation dot1q
>> switchport mode trunk
>> flowcontrol receive desired
>> flowcontrol send desired
>> spanning-tree portfast trunk
>> !
>>
>> It will be hard for me to source SFPs and fiber, but I can try to see if it's a physical layer problem.  In the meantime I might try imaging one of the systems with a different OS to see if the problem persists.
>>
>
> Another possibility is flow control.
>
> Can you try this setting?
>
> sysctl dev.ix.0.fc=0

No luck with flow control disabled on both the switch and the interface :(.
I'll continue to look into problems on the switch side.
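
To tie the local_faults increments to the actual flaps, polling the counters on an interval may help. What follows is only a minimal sketch in Python, assuming the `name: value` output format of `sysctl dev.ix.0.mac_stats` shown earlier in this thread. The function names (`parse_sysctl`, `fault_deltas`, `read_mac_stats`, `watch`) and the 60-second interval are made up for illustration and are not part of the driver or any FreeBSD tool.

```python
import subprocess
import time


def parse_sysctl(text):
    """Parse 'name: value' lines of sysctl output into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            continue  # skip non-numeric values
    return counters


def fault_deltas(before, after, suffixes=("local_faults", "remote_faults")):
    """Return the fault counters that changed between two snapshots."""
    deltas = {}
    for name, new in after.items():
        if name.endswith(suffixes) and name in before:
            diff = new - before[name]
            if diff:
                deltas[name] = diff
    return deltas


def read_mac_stats(unit=0):
    """Snapshot dev.ix.<unit>.mac_stats by running sysctl (FreeBSD only)."""
    result = subprocess.run(
        ["sysctl", f"dev.ix.{unit}.mac_stats"],
        capture_output=True, text=True, check=True,
    )
    return parse_sysctl(result.stdout)


def watch(unit=0, interval=60):
    """Print a timestamped line whenever a fault counter increments."""
    previous = read_mac_stats(unit)
    while True:
        time.sleep(interval)  # hypothetical polling interval, in seconds
        current = read_mac_stats(unit)
        for name, diff in fault_deltas(previous, current).items():
            print(time.strftime("%Y-%m-%d %H:%M:%S"), name, f"+{diff}")
        previous = current
```

Calling `watch(0)` on each node and comparing the timestamps against the link up/down messages in the system log should show whether the fault increments coincide with the flaps or only with interface bring-up.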




