bce(4) - com_no_buffers (Again)
Tom Judge
tom at tomjudge.com
Fri Sep 24 17:44:51 UTC 2010
On 09/23/2010 02:33 PM, Tom Judge wrote:
> The throttle command I am using in the tests is the one from here:
>
> http://klicman.org/throttle/
>
>
> On 09/23/2010 02:26 PM, Tom Judge wrote:
>
>> On 09/23/2010 01:21 PM, David Christensen wrote:
>>
>>
>>>>>> Under testing I have yet to see a memory fragmentation issue with this
>>>>>> driver. I'll follow up if/when I find a problem with this again.
>>>>>>
>>>> So here we are again. The system is locking up again because of 9k
>>>> mbuf allocation failures.
>>>>
>>> Failure to allocate a new buffer should cause the driver to
>>> drop the received frame and reuse the buffer, not lock up the
>>> system. Are you seeing the lockup come from bce(4) or does
>>> it come from somewhere else due to the dropped data?
>>>
>>>
>>>
>>>
>> The lockup is not from the NIC as such. The systems appear to lock up
>> because home directories are on NFS and the user information is stored
>> in a remote LDAP server. When the system starts to drop frames due to a
>> lack of 9k memory regions, the drops tend to last for a few minutes
>> (when it is really bad) and stop all traffic into the system. This
>> appears to the average user as a complete system pause.
>>
>>>>>> Is there a way to fix the RX buffer shortage issues (when header
>>>>>> splitting is turned on) so that they are guarded by flow control? Maybe
>>>>>> change the low watermark for flow control when it's enabled?
>>>>>>
>>>>> I'm not sure how much it would help, but try changing the RX low
>>>>> watermark. The default value is 32, which seems to be a reasonable
>>>>> value. But it's only for 5709/5716 controllers, and Linux seems to
>>>>> use a different default value.
>>>>>
>>>> These are: NetXtreme II BCM5709 Gigabit Ethernet
>>>>
>>>> So my next task is to turn the watermark related defines into sysctls
>>>> and turn on header splitting so that I can try to tune them without
>>>> having to reboot.
>>>>
>>>>
>>>>
>>>>
>>> Do you have flow control enabled? There are arguments both for
>>> and against flow control. For bce(4), I haven't tested flow control
>>> for quite a while and its behavior may have changed since it is
>>> controlled by firmware. Keep an eye on the hardware statistics
>>> to see that it's actively generating pause frames.
>>>
>> 3) With flow control enabled and header splitting on, flood the server
>> with very small frames (200 bytes), using the same test as in case 1.
>> My aim is to tune the watermark here so that there are no frames dropped
>> due to BD shortages.
>>
Card info unhidden:
bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.2);
Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.8)
Having done lots of testing with both flow control and header splitting
turned on, it seems like flow control may be broken with header
splitting?
I have been using the attached patch to experiment with the flow control
watermarks.
I have tried the following data points and am finding it difficult
to get flow control to kick in before the card runs out of descriptors
and starts dropping frames:
low: 16 high: 127
low: 32 high: 127
low: 64 high: 127
low: 96 high: 127
low: 32 high: 196
low: 64 high: 196
low: 128 high: 256
None of these seem to have any noticeable effect on the drop rate or
on the dev.bce.0.stat_FlowControlDone count in the sample period.
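
For reference, the scaling that the patch applies before writing the watermarks can be sketched as a standalone function. This is only an illustration of the steps visible in the diff (zero the low mark if high <= low, divide by the scale factors, clamp high to 0xf, zero the low mark if high scales to 0). The scale factors of 4 and 16 are assumptions, since BCE_L2CTX_RX_LO_WATER_MARK_SCALE and BCE_L2CTX_RX_HI_WATER_MARK_SCALE are not shown in this thread, and all `ex_` names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMED scale factors; the driver's real values are not shown here. */
#define EX_LO_WATER_MARK_SCALE	4
#define EX_HI_WATER_MARK_SCALE	16

struct ex_water_marks {
	unsigned int lo;	/* low watermark after scaling */
	unsigned int hi;	/* high watermark after scaling, clamped to 0xf */
};

/*
 * Mirror the scaling steps visible in the patch. The scale factors are
 * parameters so that the real header values can be substituted.
 */
static struct ex_water_marks
ex_scale_water_marks(unsigned int lo_water, unsigned int hi_water,
    unsigned int lo_scale, unsigned int hi_scale)
{
	struct ex_water_marks wm;

	assert(lo_scale != 0 && hi_scale != 0);

	if (hi_water <= lo_water)
		lo_water = 0;

	lo_water /= lo_scale;
	hi_water /= hi_scale;

	if (hi_water > 0xf)
		hi_water = 0xf;
	else if (hi_water == 0)
		lo_water = 0;

	wm.lo = lo_water;
	wm.hi = hi_water;
	return (wm);
}

/* Print one pre-scaling pair next to its post-scaling result. */
static void
ex_print_water_marks(unsigned int lo_water, unsigned int hi_water)
{
	struct ex_water_marks wm = ex_scale_water_marks(lo_water, hi_water,
	    EX_LO_WATER_MARK_SCALE, EX_HI_WATER_MARK_SCALE);

	printf("pre: low %u high %u -> post: low %u high %u\n",
	    lo_water, hi_water, wm.lo, wm.hi);
}
```

Under these assumed factors, low 16 / high 127 scales to 4 / 7, and a high watermark of 256 clamps to 0xf. Low values of 64 and above scale to 16 or more. Whether that still fits the register field depends on the shift and mask definitions, which would need checking against the driver header.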
Thoughts?
Tom
--
TJU13-ARIN
-------------- next part --------------
Index: if_bce.c
===================================================================
--- if_bce.c (revision 949)
+++ if_bce.c (working copy)
@@ -511,6 +511,21 @@
SYSCTL_UINT(_hw_bce, OID_AUTO, msi_enable, CTLFLAG_RDTUN, &bce_msi_enable, 0,
"MSI-X|MSI|INTx selector");
+
+/* Tunable RX flow control low water mark. */
+/* Without header splitting the default is 32 */
+static int bce_rx_low_water_mark = BCE_L2CTX_RX_LO_WATER_MARK_DEFAULT;
+TUNABLE_INT("hw.bce.rx_low_water_mark", &bce_rx_low_water_mark);
+SYSCTL_UINT(_hw_bce, OID_AUTO, rx_low_water_mark, CTLFLAG_RDTUN, &bce_rx_low_water_mark, 0,
+"Default RX Flow Control Low Water Mark");
+
+/* Tunable RX flow control high water mark. */
+/* The default is USABLE_RX_BD / 4 */
+static int bce_rx_high_water_mark = USABLE_RX_BD / 4;
+TUNABLE_INT("hw.bce.rx_high_water_mark", &bce_rx_high_water_mark);
+SYSCTL_UINT(_hw_bce, OID_AUTO, rx_high_water_mark, CTLFLAG_RDTUN, &bce_rx_high_water_mark, 0,
+"Default RX Flow Control High Water Mark");
+
/* ToDo: Add tunable to enable/disable strict MTU handling. */
/* Currently allows "loose" RX MTU checking (i.e. sets the */
/* H/W RX MTU to the size of the largest receive buffer, or */
@@ -1780,11 +1795,15 @@
}
if (mii->mii_media_active & IFM_FLAG1) {
+ BCE_PRINTF("%s(%d): Enabling TX flow control.\n",
+ __FILE__, __LINE__);
DBPRINT(sc, BCE_INFO_PHY,
"%s(): Enabling TX flow control.\n", __FUNCTION__);
BCE_SETBIT(sc, BCE_EMAC_TX_MODE, BCE_EMAC_TX_MODE_FLOW_EN);
sc->bce_flags |= BCE_USING_TX_FLOW_CONTROL;
} else {
+ BCE_PRINTF("%s(%d): Disabling TX flow control.\n",
+ __FILE__, __LINE__);
DBPRINT(sc, BCE_INFO_PHY,
"%s(): Disabling TX flow control.\n", __FUNCTION__);
BCE_CLRBIT(sc, BCE_EMAC_TX_MODE, BCE_EMAC_TX_MODE_FLOW_EN);
@@ -5414,7 +5433,7 @@
u32 lo_water, hi_water;
if (sc->bce_flags && BCE_USING_TX_FLOW_CONTROL) {
- lo_water = BCE_L2CTX_RX_LO_WATER_MARK_DEFAULT;
+ lo_water = bce_rx_low_water_mark;
} else {
lo_water = 0;
}
@@ -5423,11 +5442,12 @@
lo_water = 0;
}
- hi_water = USABLE_RX_BD / 4;
+ hi_water = bce_rx_high_water_mark;
if (hi_water <= lo_water) {
lo_water = 0;
}
+ BCE_PRINTF("Setting Up Flow Control (Pre Scaling), Low Watermark: %d, High Watermark: %d\n", (int)lo_water, (int)hi_water);
lo_water /= BCE_L2CTX_RX_LO_WATER_MARK_SCALE;
hi_water /= BCE_L2CTX_RX_HI_WATER_MARK_SCALE;
@@ -5436,7 +5456,8 @@
hi_water = 0xf;
else if (hi_water == 0)
lo_water = 0;
-
+
+ BCE_PRINTF("Setting Up Flow Control (Post Scaling), Low Watermark: %d, High Watermark: %d\n", (int)lo_water, (int)hi_water);
val |= (lo_water << BCE_L2CTX_RX_LO_WATER_MARK_SHIFT) |
(hi_water << BCE_L2CTX_RX_HI_WATER_MARK_SHIFT);
}
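
One thing that may be worth checking in the third hunk: the unchanged context line tests `sc->bce_flags && BCE_USING_TX_FLOW_CONTROL` with a logical AND. Assuming BCE_USING_TX_FLOW_CONTROL is a non-zero bit mask (its value is not shown here), that condition holds whenever any flag at all is set. A minimal sketch of the difference, using hypothetical flag values and `ex_` names that are not from the driver:

```c
#include <assert.h>

/*
 * HYPOTHETICAL flag bits for illustration only; the driver's real
 * values are not shown in this thread.
 */
#define EX_FLAG_MSI		0x01u
#define EX_FLAG_TX_FLOW_CONTROL	0x10u

/* Logical AND: non-zero flags and a non-zero mask, so any set flag passes. */
static int
ex_test_logical(unsigned int flags)
{
	return (flags && EX_FLAG_TX_FLOW_CONTROL);
}

/* Bitwise AND: passes only when the flow control bit itself is set. */
static int
ex_test_bitwise(unsigned int flags)
{
	return ((flags & EX_FLAG_TX_FLOW_CONTROL) != 0);
}

/* Self-check showing where the two tests disagree. */
static void
ex_check(void)
{
	/* Only the MSI bit is set: the logical test still passes. */
	assert(ex_test_logical(EX_FLAG_MSI) == 1);
	assert(ex_test_bitwise(EX_FLAG_MSI) == 0);
	/* With the flow control bit set, both tests pass. */
	assert(ex_test_bitwise(EX_FLAG_MSI | EX_FLAG_TX_FLOW_CONTROL) == 1);
}
```

If the driver line is indeed a logical AND, the low watermark would be programmed whether or not TX flow control was negotiated. That alone would not explain the missing pause frames, but it is cheap to rule out.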