Tuning if_bge for high packet rates (receive descriptors/transmit descriptors?)

Bruce Evans brde at optusnet.com.au
Thu Jun 14 00:03:34 UTC 2007


On Tue, 12 Jun 2007, Stephan Koenig wrote:

> We have some servers that have a very high packet rate in a normal
> production mode, and require polling to keep the CPU load reasonable.
>
> We have kern.hz=4000 and, even so, we still see some dropped packets.
>
> On our servers that use the intel "em" driver, we have tuned the
> driver as follows.  By default, if_em.h has:
>
> #define EM_MIN_TXD              80
> #define EM_MAX_TXD_82543        256
> #define EM_MAX_TXD              4096
> #define EM_DEFAULT_TXD          EM_MAX_TXD_82543
>
> #define EM_MIN_RXD              80
> #define EM_MAX_RXD_82543        256
> #define EM_MAX_RXD              4096
> #define EM_DEFAULT_RXD          EM_MAX_RXD_82543
>
>
> We have changed EM_DEFAULT_TXD and EM_DEFAULT_RXD to 4096 -- This
> solved the problem on these servers.
>
> The question is now what to do on our servers with Broadcom "bge"
> series cards.  Does anyone know how to tune this driver in a similar
> manner?

1) Change BGE_SSLOTS from 256 to 512.  This corresponds to changing
    EM_DEFAULT_RXD from 256 to 4096, except that the maximum is much
    smaller.  (A minimal sketch of this change follows the list below.)
2) Don't use polling.  Polling "works" to reduce CPU load by dropping
    packets on input and by reducing throughput on output.  It works
    particularly badly for bge.
3) When not using polling, change the interrupt coalescing parameters.
    These can be set to give behaviour similar to polling, without most
    of polling's losses or features.  E.g., setting the interrupt
    moderation timeouts to 250 us gives behaviour similar to polling at
    4000 Hz.  For input, the main difference is that the interrupts are
    high priority, so dropping packets is less likely.  For output,
    polling's throttling behaviour is more useful; without it, bge
    interfaces may use more CPU, since they actually send packets as
    fast as possible.
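
    The change in 1) is just one define (assuming BGE_SSLOTS is still
    the constant that controls how much of the standard rx ring gets
    filled):

% -#define BGE_SSLOTS	256
% +#define BGE_SSLOTS	512	/* max for the standard rx ring */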

    I use the following simple coalescing tuning in RELENG_6.  This
    essentially restores the old tuning.  The current tuning is
    essentially Linux's and is not good for FreeBSD.

% Index: if_bge.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
% retrieving revision 1.91.2.23
% diff -u -2 -r1.91.2.23 if_bge.c
% --- if_bge.c	8 May 2007 16:18:21 -0000	1.91.2.23
% +++ if_bge.c	9 May 2007 10:09:55 -0000
% @@ -2391,7 +2391,7 @@
%  	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
%  	sc->bge_rx_coal_ticks = 150;
% -	sc->bge_tx_coal_ticks = 150;
% -	sc->bge_rx_max_coal_bds = 10;
% -	sc->bge_tx_max_coal_bds = 10;
% +	sc->bge_tx_coal_ticks = 1000000;
% +	sc->bge_rx_max_coal_bds = 64;
% +	sc->bge_tx_max_coal_bds = 384;
% 
%  	/* Set up ifnet structure */

The parameters here are:

sc->bge_rx_coal_ticks:
 	Set this to 1000000/N (the value is in microseconds) to give
 	essentially the same behaviour as polling at N Hz.  The default
 	of 150 gives the equivalent of polling at 6667 Hz.  This is a
 	good default.
sc->bge_tx_coal_ticks:
 	Like the rx coal_ticks, but not really needed since tx
 	can be controlled better by coal_bds.  I set it to 1000000
 	(1 second) since it is only used to free inactive tx descriptors
 	if the device becomes idle.
sc->bge_rx_max_coal_bds:
 	Set this to 1 to give minimum latency and maximum CPU use.  Set it
 	to 0 (infinity) or to something large to give bad latency but less
 	CPU use.  The old default of 64 gave a good tradeoff.  The current
 	and RELENG_6 value of 10 is too small.  For 1500-byte packets, the
 	regression in this parameter has little effect when the rx ticks
 	timeout is 150 us, since the timeout fires first for either value,
 	but for tiny packets a value of 10 asks for about 150k interrupts
 	per second, which is too many.  (See the arithmetic sketch after
 	these notes.)
sc->bge_tx_max_coal_bds:
 	Set this to almost the maximum possible to give minimum CPU
 	use at almost no cost in latency.  The maximum possible is 511
 	(496?), but that is too aggressive.  I use 384.  The old value
 	of 128 works OK too (it costs ~10% more CPU than 512).  The
 	current and RELENG_6 value of 10 is not good.  It costs about
 	100% more CPU than a value of 384 and has no observable good
 	effects.  (Above 384, there are some observable bad effects due
 	to bus contention with rx.  I only tested on 33 MHz PCI buses.
 	The tradeoffs might be a little different on faster buses.)
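
This is the arithmetic sketch referred to in the rx_max_coal_bds notes:
a small userland program (mine, not part of the driver) that estimates
the interrupt rate implied by a given packet rate and coalescing
setting.  The model (an interrupt fires when either the bd threshold or
the tick timeout is reached, whichever comes first) and the assumed
line rate of ~1.488 Mpps for minimum-size gigabit frames are
assumptions, not measurements.

#include <stdio.h>

/* Approximate interrupts per second for a given packet rate. */
static double
intr_rate(double pkts_per_sec, double coal_bds, double coal_ticks_us)
{
	double bd_rate = pkts_per_sec / coal_bds;	/* bd threshold alone */
	double tick_rate = 1e6 / coal_ticks_us;		/* timeout alone */

	/* Whichever threshold is reached first sets the rate. */
	return (bd_rate > tick_rate ? bd_rate : tick_rate);
}

int
main(void)
{
	/* ~1.488 Mpps is gigabit line rate for minimum-size frames (assumed). */
	printf("tiny packets, bds=10, ticks=150: %.0f intr/s\n",
	    intr_rate(1.488e6, 10, 150));
	printf("tiny packets, bds=64, ticks=150: %.0f intr/s\n",
	    intr_rate(1.488e6, 64, 150));
	/* At a modest 40 kpps the 150 us timeout dominates instead. */
	printf("40 kpps,      bds=64, ticks=150: %.0f intr/s\n",
	    intr_rate(4.0e4, 64, 150));
	return (0);
}

The first case reproduces the ~150k interrupts/s figure mentioned
above; the second shows why 64 is a much better value for tiny packets.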

I use dynamic tuning of bge rx coalescing (by rate-limiting interrupts)
in -current.  em does this in hardware.
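
Purely as an illustration of that rate-limiting idea (this is not the
-current code, and every name and number in it is made up), the control
loop amounts to something like: once per interval, compare the observed
interrupt count against a cap and stretch or shrink the rx tick timeout
accordingly.

#include <stdio.h>

/* Hypothetical per-interface state, for illustration only. */
struct coal_state {
	unsigned rx_coal_ticks;	/* current rx timeout in microseconds */
	unsigned intr_count;	/* interrupts seen in the last interval */
};

#define INTR_RATE_CAP	8000	/* target maximum interrupts/s (made up) */
#define TICKS_MIN	150	/* never go below the static default */
#define TICKS_MAX	1000

/* Assume this runs once per second, e.g. from a callout. */
static void
coal_adjust(struct coal_state *cs)
{
	if (cs->intr_count > INTR_RATE_CAP) {
		/* Too many interrupts: batch harder. */
		cs->rx_coal_ticks *= 2;
		if (cs->rx_coal_ticks > TICKS_MAX)
			cs->rx_coal_ticks = TICKS_MAX;
	} else if (cs->intr_count < INTR_RATE_CAP / 2) {
		/* Plenty of headroom: favour latency again. */
		cs->rx_coal_ticks /= 2;
		if (cs->rx_coal_ticks < TICKS_MIN)
			cs->rx_coal_ticks = TICKS_MIN;
	}
	/* A real driver would reprogram the chip's coalescing timer here. */
	cs->intr_count = 0;
}

int
main(void)
{
	struct coal_state cs = { 150, 0 };
	unsigned rates[] = { 20000, 20000, 3000, 3000 };
	unsigned i;

	for (i = 0; i < 4; i++) {
		cs.intr_count = rates[i];
		coal_adjust(&cs);
		printf("interval %u: rx_coal_ticks is now %u us\n",
		    i, cs.rx_coal_ticks);
	}
	return (0);
}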

Bruce

