bce(4) - com_no_buffers (Again)

Tom Judge tom at tomjudge.com
Mon Sep 13 15:04:39 UTC 2010


On 09/09/2010 07:24 PM, Pyun YongHyeon wrote:
> On Thu, Sep 09, 2010 at 03:58:30PM -0500, Tom Judge wrote:
>   
>> Hi,
>> I am just following up on the thread from March (I think) about this issue.
>>
>> We are seeing this issue on a number of systems running 7.1. 
>>
>> The systems in question are all Dell:
>>
>> * R710, R610, R410
>> * PE2950
>>
>> The PE2950s do not show the issue as much as the R-series systems.
>>
>> The cards in one of the R610's that I am testing with are:
>>
>> bce0 at pci0:1:0:0:        class=0x020000 card=0x02361028 chip=0x163914e4
>> rev=0x20 hdr=0x00
>>     vendor     = 'Broadcom Corporation'
>>     device     = 'NetXtreme II BCM5709 Gigabit Ethernet'
>>     class      = network
>>     subclass   = ethernet
>>
>> They are connected to Dell PowerConnect 5424 switches.
>>
>> uname -a:
>> FreeBSD bandor.chi-dc.mintel.ad 7.1-RELEASE-p4 FreeBSD 7.1-RELEASE-p4
>> #3: Wed Sep  8 08:19:03 UTC 2010    
>> tj at dev-tj-7-1-amd64.chicago.mintel.ad:/usr/obj/usr/src/sys/MINTELv10  amd64
>>
>> We are also using 8192-byte jumbo frames, if_lagg and if_vlan in the
>> configuration (the NICs are in promiscuous mode as we are currently
>> capturing NetFlow data on another VLAN for diagnostic purposes):
>>
>>
>>     
<SNIP IFCONFIG/>
>> I have updated the bce driver and the Broadcom MII driver to the
>> version from stable/7 and am still seeing the issue.
>>
>> This morning I tested increasing RX_PAGES to 8, but the system just
>> hung while starting the network.  The route command got stuck waiting
>> on a zone (sorry, I can't remember exactly which one).
>>
>> The real question is: how do we go about increasing the number of RX
>> BDs?  I guess we have to bump more than just RX_PAGES...
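
To put rough numbers on my own question above: my understanding is that
the RX chain is built from fixed-size pages of rx_bd descriptors, with
one descriptor per page lost to the next-page pointer, so the usable BD
count scales linearly with RX_PAGES.  The 4 KiB page and 16-byte
descriptor sizes below are assumptions from memory, not values checked
against the 7.1 headers, so treat this as a back-of-the-envelope sketch
only:

/*
 * Rough estimate of bce(4) RX chain capacity as a function of RX_PAGES,
 * assuming a 4 KiB chain page and a 16-byte rx_bd, with one descriptor
 * per page consumed by the link to the next page.  These sizes are
 * assumptions, not taken from the actual driver headers.
 */
#include <stdio.h>

#define PAGE_SIZE_BYTES		4096
#define RX_BD_SIZE_BYTES	16

static unsigned
usable_rx_bd(unsigned rx_pages)
{
	unsigned bd_per_page = PAGE_SIZE_BYTES / RX_BD_SIZE_BYTES; /* 256 */

	return (rx_pages * (bd_per_page - 1));	/* ~255 usable per page */
}

int
main(void)
{
	unsigned pages[] = { 2, 4, 8 };

	for (size_t i = 0; i < sizeof(pages) / sizeof(pages[0]); i++)
		printf("RX_PAGES=%u -> ~%u usable RX BDs\n",
		    pages[i], usable_rx_bd(pages[i]));
	return (0);
}

Each received frame consumes at least one BD regardless of its size, so
going from 2 to 8 pages only quadruples the headroom; a long enough
back-to-back burst can still run the chain dry.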
>>
>>
>> The cause for us, from what we can see, is the OpenLDAP server sending
>> large group search results back to nss_ldap or pam_ldap.  When it does
>> this it seems to send each of the 600 results in its own TCP segment,
>> creating a small packet storm (600 * ~100-byte PDUs) at the destination
>> host.  The sending kernel then retransmits two blocks of about 100
>> results each once SACK kicks in for the data that was dropped by the NIC.
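
For scale, a quick estimate of that burst (the per-frame overhead and
the default usable BD count are made-up round numbers, used only to
illustrate the order of magnitude):

/*
 * Toy estimate of why ~600 tiny LDAP responses, each in its own TCP
 * segment, can overrun the default bce(4) RX chain.  The overhead and
 * BD numbers are assumptions for illustration only.
 */
#include <stdio.h>

int
main(void)
{
	unsigned results     = 600;	/* LDAP entries, one per segment      */
	unsigned pdu_bytes   = 100;	/* approximate LDAP PDU size          */
	unsigned overhead    = 66;	/* assumed Ethernet+IP+TCP overhead   */
	unsigned default_bds = 510;	/* assumed usable BDs with RX_PAGES=2 */
	unsigned frames      = results;	/* each frame needs at least one BD  */

	printf("burst: %u frames, ~%u bytes on the wire\n",
	    frames, results * (pdu_bytes + overhead));
	printf("RX BDs needed: %u, usable in default chain: %u -> %s\n",
	    frames, default_bds,
	    frames > default_bds ?
	    "can overrun if BDs are not replenished in time" : "fits");
	return (0);
}

In other words, the burst is tiny in bytes (~100 KB) but large in
frames, and it is the frame count, not the byte count, that exhausts
the RX BDs.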
>>
>>
>> Thanks in advance
>>
>> Tom
>>
>>
>>     
<SNIP SYSCTL OUTPUT/>
> FW may drop incoming frames when it does not see available RX
> buffers.  Increasing the number of RX buffers slightly reduces the
> possibility of dropping frames, but it wouldn't completely fix it.
> Alternatively, the driver could advertise available RX buffers in the
> middle of RX ring processing instead of handing back updated buffers
> only at the end of RX processing.  That way the FW may see available
> RX buffers while the driver/upper stack is busy processing received
> frames.  But this may introduce coherency issues because the RX ring
> is shared between the host and the FW.  If FreeBSD had a way to sync
> a partial region of a DMA map, this could be implemented without fear
> of coherency issues.  Another way to improve RX performance would be
> switching to multiple RX queues with RSS, but that would require a
> lot of work and I have had no time to implement it.
>   

Does this mean that these cards are going to perform badly? That was
what I gathered from the previous thread.
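
To check my understanding of the mid-ring replenishment idea quoted
above, I put together a rough userland model: a fixed pool of BDs, a
burst arriving faster than the host can process it, and buffers handed
back either only at the end of the batch or every few frames.  All of
the rates and sizes are invented, so this only illustrates the timing
effect, not the real firmware/driver interaction:

/*
 * Toy model of RX-replenishment timing: a fixed ring of BDs, a burst of
 * back-to-back frames, and a driver that is slower than the wire.
 * "late" returns buffers only once the whole batch is processed;
 * "early" returns them every REPLENISH processed frames.
 */
#include <stdio.h>

#define RING_BDS	510	/* assumed usable BDs in the default chain */
#define BURST		600	/* frames arriving back to back            */
#define ARRIVE_RATE	8	/* frames per tick from the wire           */
#define PROC_RATE	5	/* frames per tick the host keeps up with  */
#define REPLENISH	64	/* "early" mode: return BDs every N frames */

static unsigned
run(int early)
{
	unsigned free_bds = RING_BDS, to_send = BURST;
	unsigned pending = 0, held = 0, drops = 0;

	while (to_send > 0 || pending > 0) {
		/* firmware side: each incoming frame needs a free BD */
		for (unsigned i = 0; i < ARRIVE_RATE && to_send > 0;
		    i++, to_send--) {
			if (free_bds > 0) {
				free_bds--;
				pending++;
			} else
				drops++;	/* the com_no_buffers case */
		}

		/* host side: process some frames, buffers not yet returned */
		unsigned done = pending < PROC_RATE ? pending : PROC_RATE;
		pending -= done;
		held += done;

		/* return buffers to the ring */
		if (early && held >= REPLENISH) {
			free_bds += held;
			held = 0;
		}
		if (pending == 0) {		/* end of RX processing */
			free_bds += held;
			held = 0;
		}
	}
	return (drops);
}

int
main(void)
{
	printf("replenish at end of batch : %u dropped frames\n", run(0));
	printf("replenish mid-batch       : %u dropped frames\n", run(1));
	return (0);
}

With these made-up numbers the end-of-batch strategy drops roughly the
tail of the burst (the part that arrives after the chain runs dry),
while the mid-batch strategy keeps the chain topped up and drops
nothing, which matches the reasoning above about where the DMA-sync
trade-off would actually buy something.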

> BTW, given that you've updated to bce(4)/mii(4) of stable/7, I
> wonder why TX/RX flow control did not kick in.
>   

The working copy I used to grab the upstream source is at r212371.

Last changes for the directories in my working copy:

sys/dev/bce @  211388
sys/dev/mii @ 212020


I discovered that flow control was disabled on the switches, so I set
it to auto and added a pair of BCE_PRINTFs in the code where the driver
enables and disables flow control; flow control now gets enabled.
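
For reference, the instrumentation was roughly of the following shape.
This is a sketch from memory rather than the actual diff: the function,
macro and register-bit names (bce_miibus_statchg, BCE_PRINTF,
BCE_SETBIT, BCE_CLRBIT, BCE_EMAC_RX_MODE_FLOW_EN and friends) are
assumed from the stock driver and may not match the stable/7 sources
exactly, and the TX side would be handled the same way:

/*
 * Sketch only -- assumed names, not a verified patch against stable/7.
 * The idea is simply to log when the link state change handler turns
 * RX pause on or off, so we can see whether negotiation worked.
 */
static void
bce_miibus_statchg(device_t dev)
{
	struct bce_softc *sc = device_get_softc(dev);
	struct mii_data *mii = device_get_softc(sc->bce_miibus);

	/* ... existing link/speed/duplex handling elided ... */

	if ((mii->mii_media_active & IFM_ETH_RXPAUSE) != 0) {
		BCE_PRINTF("%s(): enabling RX flow control\n", __FUNCTION__);
		BCE_SETBIT(sc, BCE_EMAC_RX_MODE, BCE_EMAC_RX_MODE_FLOW_EN);
	} else {
		BCE_PRINTF("%s(): disabling RX flow control\n", __FUNCTION__);
		BCE_CLRBIT(sc, BCE_EMAC_RX_MODE, BCE_EMAC_RX_MODE_FLOW_EN);
	}
}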


Without BCE_JUMBO_HDRSPLIT we see no errors.  With it we see a number
of errors, although the rate seems to be reduced compared to the
previous version of the driver.

Tom




-- 
TJU13-ARIN


