bge(4) sysctl tuneables -- a blast from the past. more knobs! MORE!

Bruce Evans brde at optusnet.com.au
Fri Apr 19 19:03:32 UTC 2013


On Thu, 18 Apr 2013, Sean Bruno wrote:

> Version 0.2 of patches to bge(4).  I'm not totally happy with it, but
> comments welcome.  I need better explanations of usage for the man page.
>
> I've dropped bge_rxd completely here as it was suggested to not even
> bother adjusting it.
>
> http://people.freebsd.org/~sbruno/bge_config_update_1.txt

I dislike this.  It is very bloated to have a SYSCTL_PROC() for every
setting.  Yet the SYSCTL_PROC()'s aren't even correct.  They write
directly to the device registers with no locking.  Further bloat is
given by a tunable backing every sysctl.  Style bugs include verbose
descriptions and obfuscation the strings for these.

I prefer my version of course.  It allows setting all the values using
SYSCTL_INT()s.  It is a feature that these settings are to memory
variables only.  This allows building up a complicated set of changes
without having to ensure that all intermediate steps give a valid and/
or usable hardware setting at every stage.  Then there is a SYSCTL_PROC()
for the operation of transferring the values to the hardware.  This
sysctl isn't missing locking, and doesn't just write to the device
registers, but reprograms the coal hardware like a device reset would.
Direct writing to the registers would probably work except possibly
for transient problems, and but the hardware doesn't seem to guarantee
this and there are no benefits from current corners here.

The little SYSCTL_PROC()s could be used to do range checks, but range
checks would not be very useful and none are done now.   Range checks
of individual values as they are set would defeat building up complicated
sets of changes in another way.  Many combinations of values are invalid,
but it is hard to say which, and it would be wrong to disallow intermediate
invalid combinations that won't ever be used.  All that can be done easily
is to range check each parameter from its min to its max.  This is not
very useful, since combinations with everything min or everything max
are probably just as dysfunctional as some slightly out of bounds values.

>> There are only 2 bge tunables now, and they only have half of these bugs:
>> - hw.bge.allow_asf is in the wrong namespace
>> - hw.bge.allow_asf is too global
> Maintainer disagrees with this.

I think he agreed that this were wrong too.  It is at least missing the
verboseness (5 lines for this; 26 lines for each of 4 new sysctls).
Anyway, the old bugs should be fixed separately.

Here are my old coal patches for bge.  They have been edited from a larger
patch and may have editing errors.

@ diff -c2 ./dev/bge/if_bge.c~ ./dev/bge/if_bge.c
@ *** ./dev/bge/if_bge.c~	Fri May 16 16:39:01 2008
@ --- ./dev/bge/if_bge.c	Thu Sep 30 19:51:38 2010
@ ***************
@ *** 1,2 ****
@ --- 1,10 ----
@ + int bge_careful_coal = 1;
@ + int bge_qlen = 1;
@ + int bge_errsrc = 0x17;
@ + int bge_rx_repl = 64;
@ + int bge_coal_writes;
@ + int bge_coal_write_fails;
@ + int bge_polling_trust_statusword = 0;
@ + 
@   /*-
@    * Copyright (c) 2001 Wind River Systems

Debugging variables.

bge_careful_coal selects a way of directly writing rx_max_coal_bds that is
not completely sloppy though it doesn't busy-wait to synchronize.

@ ***************
@ *** 1661,1664 ****
@ --- 1665,1675 ----
@ 
@   	/* Set up host coalescing defaults */
@ + 	if (sc->bge_dyncoal_max_intr_freq != 0) {
@ + 		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
@ + 		    sc->bge_dyncoal_max_intr_freq;
@ + 		sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC /
@ + 		    sc->bge_dyncoal_max_intr_freq;
@ + 	} else
@ + 		sc->bge_rx_coal_ticks = 150;
@   	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
@   	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
@ ***************
@ *** 2351,2354 ****
@ --- 2362,2414 ----
@ 
@   static int
@ + bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS)
@ + {
@ + 	struct bge_softc *sc;
@ + 	int error, i, val;
@ + 
@ + 	val = 0;
@ + 	error = sysctl_handle_int(oidp, &val, 0, req);
@ + 	if (error != 0 || req->newptr == NULL)
@ + 		return (error);
@ + 	sc = arg1;
@ + 	BGE_LOCK(sc);
@ + 
@ + 	/* XXX cut from bge_blockinit(): */
@ + 
@ + 	/* Disable host coalescing until we get it set up */
@ + 	CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000);
@ + 
@ + 	/* Poll to make sure it's shut down. */
@ + 	for (i = 0; i < BGE_TIMEOUT; i++) {
@ + 		if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE))
@ + 			break;
@ + 		DELAY(10);
@ + 	}
@ + 
@ + 	if (i == BGE_TIMEOUT) {
@ + 		device_printf(sc->bge_dev,
@ + 		    "host coalescing engine failed to idle\n");
@ + 		CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
@ + 		BGE_UNLOCK(sc);
@ + 		return (ENXIO);
@ + 	}
@ + 
@ + 	/* Set up host coalescing defaults */
@ + 	if (sc->bge_dyncoal_max_intr_freq != 0)
@ + 		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
@ + 		    sc->bge_dyncoal_max_intr_freq;
@ + 	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
@ + 	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
@ + 	CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds);
@ + 	CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds);
@ + 
@ + 	/* Turn on host coalescing state machine */
@ + 	CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
@ + 
@ + 	BGE_UNLOCK(sc);
@ + 	return (0);
@ + }
@ + 
@ + static int
@   bge_attach(device_t dev)
@   {
@ ***************
@ *** 2584,2591 ****
@   	/* Set default tuneable values. */
@   	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
@ ! 	sc->bge_rx_coal_ticks = 150;
@ ! 	sc->bge_tx_coal_ticks = 150;
@ ! 	sc->bge_rx_max_coal_bds = 10;
@ ! 	sc->bge_tx_max_coal_bds = 10;
@ 
@   	/* Set up ifnet structure */
@ --- 2645,2652 ----
@   	/* Set default tuneable values. */
@   	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
@ ! 	sc->bge_dyncoal_max_intr_freq = 10000;
@ ! 	sc->bge_tx_coal_ticks = 1000000;
@ ! 	sc->bge_rx_max_coal_bds = 128;
@ ! 	sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4;
@ 
@   	/* Set up ifnet structure */
@ ***************
@ *** 3331,3343 ****
@   	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
@   	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
@ ! 	    statusword || sc->bge_link_evt)
@   		bge_link_upd(sc);
@ 
@   	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
@ - 		/* Check RX return ring producer/consumer. */
@   		bge_rxeof(sc);
@ - 
@ - 		/* Check TX ring producer/consumer. */
@   		bge_txeof(sc);
@   	}
@ 
@ --- 3583,3638 ----
@   	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
@   	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
@ ! 	    (macstatus & BGE_MACSTAT_LINK_CHANGED) || sc->bge_link_evt)
@   		bge_link_upd(sc);
@ 
@   	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
@   		bge_rxeof(sc);
@   		bge_txeof(sc);
@ + 		if (sc->bge_dyncoal_max_intr_freq != 0 &&
@ + 		    ++sc->bge_dyncoal_intrcnt == 16) {
@ + 			struct bintime bt;
@ + 			uint32_t dpi, pfrac, tfrac, xtime;
@ + 
@ + 			binuptime(&bt);
@ + 			xtime = (bt.sec << 24) | (bt.frac >> 40);
@ + 			pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
@ + 			    sc->bge_dyncoal_scale;
@ + 			tfrac = xtime - sc->bge_dyncoal_xtime;
@ + 			sc->bge_dyncoal_rx_pps =
@ + 			    (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
@ + 			    ((uint64_t)1 << 24) / tfrac;
@ + 			dpi = pfrac / (tfrac | 2) + 1;
@ + 			if (dpi > sc->bge_rx_max_coal_bds)
@ + 				dpi = sc->bge_rx_max_coal_bds;
@ + 			if (dpi != sc->bge_dyncoal_rx_max_coal_bds) {
@ + 				if (bge_careful_coal) {
@ + 				CSR_WRITE_4(sc, BGE_HCC_MODE, 0);
@ + 				CSR_READ_4(sc, BGE_HCC_MODE);
@ + 				if ((CSR_READ_4(sc, BGE_HCC_MODE) &
@ + 				    BGE_HCCMODE_ENABLE) == 0) {
@ + 					CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS,
@ + 					    dpi);
@ + 					sc->bge_dyncoal_rx_max_coal_bds = dpi;
@ + 					bge_coal_writes++;
@ + 				} else
@ + 					bge_coal_write_fails++;
@ + 				CSR_WRITE_4(sc, BGE_HCC_MODE,
@ + 				    BGE_HCCMODE_ENABLE);
@ + 				} else {
@ + 				/*
@ + 				 * XXX not waiting for the engine is needed
@ + 				 * for efficiency since we reprogram it a
@ + 				 * lot so as to react fast, and this seems
@ + 				 * to work.  However, similar reprogramming
@ + 				 * of RX_COAL_TICKS doesn't work.
@ + 				 */
@ + 				CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi);
@ + 				sc->bge_dyncoal_rx_max_coal_bds = dpi;
@ + 				}
@ + 			}
@ + 			sc->bge_dyncoal_xtime = xtime;
@ + 			sc->bge_dyncoal_intrcnt = 0;
@ + 			sc->bge_dyncoal_ipackets = ifp->if_ipackets;
@ + 		}
@   	}
@ 
@ ***************
@ *** 4450,4453 ****
@ --- 4756,4782 ----
@   	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev));
@ 
@ + 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal",
@ + 	    CTLTYPE_INT | CTLFLAG_RW,
@ + 	    sc, 0, bge_sysctl_program_coal, "I",
@ + 	    "program bge coalescence values");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW,
@ + 	    &sc->bge_rx_coal_ticks, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW,
@ + 	    &sc->bge_tx_coal_ticks, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW,
@ + 	    &sc->bge_rx_max_coal_bds, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW,
@ + 	    &sc->bge_tx_max_coal_bds, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq",
@ + 	    CTLFLAG_RW,
@ + 	    &sc->bge_dyncoal_max_intr_freq, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds",
@ + 	    CTLFLAG_RD,
@ + 	    &sc->bge_dyncoal_rx_max_coal_bds, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD,
@ + 	    &sc->bge_dyncoal_rx_pps, 0, "");
@ + 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD,
@ + 	    &sc->bge_dyncoal_scale, 0, "");
@ + 
@   #ifdef BGE_REGISTER_DEBUG
@   	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info",
@ diff -c2 ./dev/bge/if_bgereg.h~ ./dev/bge/if_bgereg.h
@ *** ./dev/bge/if_bgereg.h~	Fri May 16 16:39:21 2008
@ --- ./dev/bge/if_bgereg.h	Fri May 16 16:39:23 2008
@ ***************
@ *** 2579,2582 ****
@ --- 2573,2583 ----
@   	uint32_t		bge_tx_discards;
@   	uint32_t		bge_tx_collisions;
@ + 	int			bge_dyncoal_intrcnt;
@ + 	u_long			bge_dyncoal_ipackets;
@ + 	uint32_t		bge_dyncoal_max_intr_freq;
@ + 	uint32_t		bge_dyncoal_rx_max_coal_bds;
@ + 	uint32_t		bge_dyncoal_rx_pps;
@ + 	uint32_t		bge_dyncoal_scale;
@ + 	uint32_t		bge_dyncoal_xtime;
@   #ifdef DEVICE_POLLING
@   	int			rxcycles;

The dynamical coal programming hasn't been mentioned recently, but really,
it is the only useful thing in the above.  Once the system does correct
dynamic coal programming, there is no need for static sysctls except to
reprove that they can't do such a good job as dynamic programming.

I don't bother with dynamic coal program for tx.  It would be possible
to switch to more conservative tx settings under heavy rx load, and if
you know too much about the device internals then balance all the
resource uses so that they compete with each other as little as possible,
but the possible gains from that are small.

Bruce


More information about the freebsd-net mailing list