Tuning if_bge for high packet rates (receive
brde at optusnet.com.au
Thu Jun 14 00:03:34 UTC 2007
On Tue, 12 Jun 2007, Stephan Koenig wrote:
> We have some servers that have a very high packet rate in a normal
> production mode, and require polling to keep the CPU load reasonable.
> We have kern.hz=4000, and even so, we still have some dropped packets.
> On our servers that use the Intel "em" driver, we have tuned the
> driver as follows. By default, if_em.h has:
> #define EM_MIN_TXD 80
> #define EM_MAX_TXD_82543 256
> #define EM_MAX_TXD 4096
> #define EM_DEFAULT_TXD EM_MAX_TXD_82543
> #define EM_MIN_RXD 80
> #define EM_MAX_RXD_82543 256
> #define EM_MAX_RXD 4096
> #define EM_DEFAULT_RXD EM_MAX_RXD_82543
> We have changed EM_DEFAULT_TXD and EM_DEFAULT_RXD to 4096 -- This
> solved the problem on these servers.
> The question is now what to do on our servers with Broadcom "bge"
> series cards. Does anyone know how to tune this driver in a similar way?
1) Change BGE_SSLOTS from 256 to 512. This corresponds to changing
EM_DEFAULT_RXD from 256 to 4096, except that bge's maximum is much smaller.
2) Don't use polling. Polling "works" to reduce CPU use by dropping packets
on input and by reducing throughput on output. It works particularly
badly for bge.
3) When not using polling, change the interrupt coalescing parameters.
These can be set to give behaviour similar to polling, without most
of polling's losses or features. E.g., setting the interrupt moderation
timeouts to 250 us gives behaviour similar to polling at 4000 Hz.
For input, the main difference is that the interrupts are high priority,
so dropping packets is less likely. For output, the throttling
behaviour of polling is more useful; without it, bge interfaces may
use more CPU because they actually send packets as fast as possible.
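A rough way to see how items 1 and 3 interact: the receive ring must hold every packet that arrives between two services (polls, or moderated interrupts). The sketch below is back-of-envelope arithmetic, not driver code. It assumes worst-case minimum-size frames at gigabit line rate and a ring that starts empty, and the names ring_headroom_us, ring_survives and GIGE_MIN_FRAME_PPS are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed worst-case receive rate on gigabit Ethernet: minimum-size
 * frames. 64-byte frame + 8-byte preamble + 12-byte inter-frame gap
 * = 84 bytes = 672 bits on the wire; 10^9 / 672 ~= 1488095 frames/s.
 */
#define GIGE_MIN_FRAME_PPS 1488095u

/*
 * Microseconds that a receive ring of `slots` free descriptors can
 * absorb packets arriving at `pps` before it overflows, assuming the
 * ring starts empty and nothing drains it in the meantime.
 */
static uint32_t ring_headroom_us(uint32_t slots, uint32_t pps)
{
	assert(pps > 0);
	return (uint32_t)((uint64_t)slots * 1000000u / pps);
}

/*
 * Nonzero if the ring can ride out one service interval (the gap
 * between polls or between moderated interrupts) without dropping.
 */
static int ring_survives(uint32_t slots, uint32_t pps, uint32_t interval_us)
{
	return ring_headroom_us(slots, pps) >= interval_us;
}
```

With 250 us between services (polling at 4000 Hz), ring_headroom_us(256, GIGE_MIN_FRAME_PPS) is only 172 us, while 512 slots give 344 us. The estimate is optimistic: in practice the ring is rarely empty when an interval starts, and servicing it is not instantaneous.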
I use the following simple coalescing tuning in RELENG_6. It
essentially restores the old tuning. The current tuning is essentially
Linux's and is not good for FreeBSD.
% Index: if_bge.c
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 220.127.116.11
% diff -u -2 -r18.104.22.168 if_bge.c
% --- if_bge.c 8 May 2007 16:18:21 -0000 22.214.171.124
% +++ if_bge.c 9 May 2007 10:09:55 -0000
% @@ -2391,7 +2391,7 @@
% sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
% sc->bge_rx_coal_ticks = 150;
% - sc->bge_tx_coal_ticks = 150;
% - sc->bge_rx_max_coal_bds = 10;
% - sc->bge_tx_max_coal_bds = 10;
% + sc->bge_tx_coal_ticks = 1000000;
% + sc->bge_rx_max_coal_bds = 64;
% + sc->bge_tx_max_coal_bds = 384;
% /* Set up ifnet structure */
The parameters here are:
bge_rx_coal_ticks:
Set this to 1000000/N to give essentially the same behaviour as
polling at N Hz. The default of 150 gives polling at 6667 Hz.
This is a good default.
bge_tx_coal_ticks:
Like bge_rx_coal_ticks, but not really needed, since tx can be
controlled better by bge_tx_max_coal_bds. I set it to 1000000
(1 second), since it is only used to free inactive tx descriptors
if the device becomes idle.
bge_rx_max_coal_bds:
Set this to 1 to give minimum latency and maximum CPU use. Set it
to 0 (infinity) or a large value to give bad latency but less CPU use.
The old default of 64 gave a good tradeoff; the current and RELENG_6
value of 10 is too small. For 1500-byte packets, this regression has
little effect when the rx ticks timeout is 150 us, since the timeout
fires first for either value, but for tiny packets a value of 10 asks
for about 150k interrupts per second, which is too many.
bge_tx_max_coal_bds:
Set this to almost the maximum possible to give minimum CPU
use at almost no cost to latency. The maximum possible is 511
(496?), but that is too aggressive. I use 384. The old value
of 128 works OK too (it costs ~10% more CPU than 512). The current
and RELENG_6 value of 10 is not good: it costs about 100%
more CPU than a value of 384 and has no observable benefit.
(Above 384, there are some observable bad effects due to bus
contention with rx. I only tested on 33 MHz PCI buses; the
tradeoffs might be a little different on faster buses.)
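The rx arithmetic in the parameter notes above can be checked with a small model. This is a sketch under stated assumptions, not driver code: the names coal_ticks_for_hz and rx_intr_per_sec are hypothetical, and the steady-state model (an interrupt fires when either the descriptor limit or the tick timeout is reached, whichever comes first) is a simplification of what the hardware does. The 1488095 packets/s figure used below is the theoretical rate of minimum-size frames on gigabit Ethernet.

```c
#include <assert.h>

/*
 * Coalescing ticks are microseconds (1000000 ticks = 1 second), so a
 * timeout of T ticks behaves roughly like polling at 1000000 / T Hz.
 */
static unsigned coal_ticks_for_hz(unsigned hz)
{
	assert(hz > 0);
	return 1000000u / hz;
}

/*
 * Hypothetical steady-state model of rx interrupt moderation: with
 * packets arriving at a constant `pps`, an interrupt fires when either
 * `max_coal_bds` packets are pending or `coal_ticks` microseconds have
 * passed, whichever comes first. max_coal_bds == 0 means no descriptor
 * limit. There can never be more interrupts than packets.
 */
static unsigned long rx_intr_per_sec(unsigned long pps,
    unsigned coal_ticks, unsigned max_coal_bds)
{
	unsigned long rate;

	assert(coal_ticks > 0);
	rate = 1000000ul / coal_ticks;		/* timeout-driven rate */
	if (max_coal_bds > 0 && pps / max_coal_bds > rate)
		rate = pps / max_coal_bds;	/* descriptor limit fires first */
	return rate < pps ? rate : pps;
}
```

With 1488095 packets/s and the 150-tick timeout, rx_intr_per_sec gives 148809 interrupts/s for bge_rx_max_coal_bds = 10 (the ~150k mentioned above) and 23251 for 64, while coal_ticks_for_hz(4000) returns the 250 us timeout that mimics 4000 Hz polling.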
I use dynamic tuning of bge rx coalescing (by rate-limiting interrupts)
in -current. em does this in hardware.
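The -current code is not included in this message. As a purely hypothetical illustration of rate-limiting rx interrupts from software, the sketch below adjusts rx_coal_ticks once per second from the observed interrupt count. The struct and function names are invented for this example and are not part of bge, and it assumes rx_max_coal_bds is 0 so that the tick timeout alone bounds the interrupt rate.

```c
#include <assert.h>

/*
 * Hypothetical once-per-second tuner for rx interrupt moderation.
 * Assumes max_coal_bds is 0 (no descriptor limit), so the timeout
 * alone bounds the interrupt rate at about 1000000 / ticks per second.
 */
struct rx_coal_tuner {
	unsigned long target_hz;	/* highest rx interrupt rate wanted */
	unsigned min_ticks;		/* latency-friendly setting when idle */
	unsigned max_ticks;		/* never delay longer than this */
	unsigned ticks;			/* current rx_coal_ticks value */
};

/*
 * Feed in the number of rx interrupts seen during the last second and
 * get back the rx_coal_ticks value to program next. Doubling on
 * overload and halving only when the rate falls below half the target
 * gives some hysteresis, so the setting does not flap every second.
 */
static unsigned rx_coal_adjust(struct rx_coal_tuner *t, unsigned long intrs)
{
	assert(t->min_ticks > 0 && t->min_ticks <= t->max_ticks);

	if (intrs > t->target_hz) {
		/* Too many interrupts: lengthen the timeout, up to the cap. */
		t->ticks = (t->ticks > t->max_ticks / 2) ?
		    t->max_ticks : t->ticks * 2;
	} else if (intrs < t->target_hz / 2) {
		/* Light load: shorten the timeout to cut latency. */
		t->ticks = (t->ticks / 2 < t->min_ticks) ?
		    t->min_ticks : t->ticks / 2;
	}
	return t->ticks;
}
```

Doubling and halving is only one simple choice; a driver could equally compute the timeout directly as 1000000 / target_hz whenever the load is high.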