bge(4) sysctl tunables -- a blast from the past.

Tue Apr 16 07:15:00 UTC 2013

On Tue, 16 Apr 2013, YongHyeon PYUN wrote:

> On Mon, Apr 15, 2013 at 03:35:56PM -0700, Sean Bruno wrote:
>>
>>> FreeBSD has too many knobs, but it would be nice if the bge defaults weren't
>>> so broken, so that they don't need overriding.
>>
>> So many knobs ... well here's more.  :-)
>>
>> http://people.freebsd.org/~sbruno/bge_config_update.txt
>>
>> At least this gets a man page update with references to manuals.
>
> You have to change BGE_STD_RX_RING_CNT to change the number of RX
> descriptors. It's hard-coded and it needs much more work to change
> that. And I don't see any reason to modify it anyway (the max # of RX
> descriptors is 512).

I thought that at first too, but a simple change along these lines
must be OK since old versions had it.  There was a BGE_SSLOTS "option"
that was 256.  This was used instead of BGE_STD_RX_RING_CNT in much
the same places that the tunable is now used, since 512 bds used to
be a lot.  From FreeBSD-~5.2:

@ /*
@  * The standard receive ring has 512 entries in it. At 2K per mbuf cluster,
@  * that's 1MB or memory, which is a lot. For now, we fill only the first
@  * 256 ring entries and hope that our CPU is fast enough to keep up with
@  * the NIC.
@  */

"1MB or (sic) memory" wasn't actually a lot when this code was written in
2001.
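For scale, the figure in that comment is just ring size times cluster
size.  A minimal sketch of the arithmetic in C, assuming one 2K cluster
per standard-ring BD as the comment says; the helper name and
CLUSTER_BYTES are hypothetical, not from the driver:

```c
#include <assert.h>
#include <stddef.h>

#define BGE_STD_RX_RING_CNT	512	/* standard ring entries, per the comment */
#define CLUSTER_BYTES		2048	/* assumed: one 2K mbuf cluster per BD */

/* Hypothetical helper: receive-buffer memory pinned by filling `slots` BDs. */
static size_t
std_ring_footprint(size_t slots)
{
	assert(slots <= BGE_STD_RX_RING_CNT);
	return (slots * CLUSTER_BYTES);
}
```

std_ring_footprint(512) is 1048576 bytes (1MB); the old 256-slot fill
pinned half of that.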

@ static int
@ bge_init_rx_ring_std(sc)
@ 	struct bge_softc *sc;
@ {
@ 	int i;
@ 
@ /* XXX dishonour the above comment and the misplaced def of BGE_SSLOTS. */
@ #undef BGE_SSLOTS
@ #define BGE_SSLOTS	BGE_STD_RX_RING_CNT

The above 3 lines are from ~5.2, to blow away the BGE_SSLOTS.  It was easier
to edit here than in the header file.

@ 	for (i = 0; i < BGE_SSLOTS; i++) {
@ 		if (bge_newbuf_std(sc, i, NULL) == ENOBUFS)
@ 			return(ENOBUFS);
@ 	};
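The #undef/#define trick works because a macro is expanded where it is
used, so every later BGE_SSLOTS in the file sees the new value.  A
stand-alone sketch, with the values taken from the thread (256 for the
old BGE_SSLOTS, 512 for BGE_STD_RX_RING_CNT); count_filled_slots() is a
hypothetical stand-in for the bge_newbuf_std() loop:

```c
#include <assert.h>

/* Values from the thread: ring size and the old header default. */
#define BGE_STD_RX_RING_CNT	512
#define BGE_SSLOTS		256	/* the "misplaced def" in the header */

/* Later in the .c file: override the header value, as ~5.2 did. */
#undef BGE_SSLOTS
#define BGE_SSLOTS	BGE_STD_RX_RING_CNT

/* Hypothetical stand-in for the fill loop: count the slots it would fill. */
static int
count_filled_slots(void)
{
	int filled = 0;

	for (int i = 0; i < BGE_SSLOTS; i++)
		filled++;	/* bge_newbuf_std(sc, i, NULL) in the driver */
	assert(filled <= BGE_STD_RX_RING_CNT);
	return (filled);
}
```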

We intentionally removed this "option" and the other SSLOTS "options", and
changed the default to the maximum possible (BGE_STD_RX_RING_CNT here).
No one knew how to tune these options, especially since they weren't
in conf/options.  Not many more than one person knew that these options
existed.

I don't see how this tunable is useful.  The default of the maximum
value optimizes for throughput and for minimizing dropped packets.
Smaller values would only save a little RAM, but everyone has plenty
of RAM.  Reducing RAM footprint may reduce (~L2) cache pressure, but it
takes _very_ delicate tuning to take advantage of that.
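If a ring-size tunable were kept anyway, it should at least default to
the maximum and never exceed the 512 BDs the standard ring holds.  A
minimal sketch of that clamping; the function name and the "0 means
default" convention are assumptions for illustration, not the patch's
actual interface:

```c
#include <assert.h>

#define BGE_STD_RX_RING_CNT	512	/* hardware maximum, per the thread */

/*
 * Hypothetical: map a requested standard-ring size (e.g. from a loader
 * tunable) onto the supported range.  0 or out-of-range requests fall
 * back to the maximum, which is the default argued for above.
 */
static int
std_rx_slots(int requested)
{
	int slots;

	if (requested <= 0 || requested > BGE_STD_RX_RING_CNT)
		slots = BGE_STD_RX_RING_CNT;
	else
		slots = requested;
	assert(slots >= 1 && slots <= BGE_STD_RX_RING_CNT);
	return (slots);
}
```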

> I think bge(4) touches a minimal set of coalescing parameters, but the
> publicly available bge(4) data sheet shows more coalescing
> parameters.

I don't remember any others except a different set for interrupt mode.

> These parameters could be programmed with different
> values (BDs & ticks) during interrupts. And some parameters are not
No, these are worse than useless, except possibly with a different
interrupt handler organization.  I looked at a linux driver using them.
They just gave pessimizations, even in the Linux driver.  IIRC, this is
because using them complicates synchronization, so that an extra PIO read
or two is needed to synchronize.

All this for a change in the settings that is useless for most interrupt
handler organizations.  You could try reducing the interrupt moderation
while in interrupt mode, so as to see all new activity before returning.
That is unlikely to be good.  In other interrupt handlers, it is an
optimization not to check for new activity before returning, because
the (PIO) read of the status register is slower than taking another
interrupt.  bge only needs a read from the status block, so it would not
be so slow (but yongari thought that trusting the status block on entry
to the interrupt handler, as my version does, is too fragile).  Also,
only a small amount of new activity can occur while in interrupt mode
(else the interrupt handler can't keep up), and handling it in small
batches (perhaps 1 bd at a time) would be slow.

You could try increasing the interrupt moderation while in interrupt
mode.  I can't see any point in that.  Perhaps if the hardware is really
good, completely turning off interrupt moderation while in interrupt
mode would be good.  The good hardware would keep DMAing and updating
the status block to indicate how far it got, and we would process as
many descriptors as possible in a single batch before returning, without
caring about missing a few.  The problems then are avoiding races in
this, switching back to normal mode, and getting another interrupt for
the descriptors that we missed.  With good hardware, there would be no
extra i/o for this.
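The single-batch approach amounts to a consumer loop keyed off the
producer index that the NIC DMAs into the status block.  A simplified
sketch, assuming a hypothetical struct status_block with a single
rx_prod field and a 512-entry ring; it re-reads the status block (a
memory read, not a PIO register read) until the consumer has caught up,
and it deliberately does not model the races or the switch back to
normal mode:

```c
#include <assert.h>

#define RING_CNT	512	/* standard RX ring size from the thread */

/* Hypothetical model of the status block: the NIC DMAs its RX producer here. */
struct status_block {
	volatile unsigned rx_prod;
};

/*
 * Process everything produced so far, re-reading the status block after
 * each batch so that BDs arriving meanwhile are handled before returning.
 * Returns the number of descriptors processed.
 */
static unsigned
drain_rx(struct status_block *sb, unsigned *cons)
{
	unsigned done = 0, prod;

	assert(*cons < RING_CNT);
	while ((prod = sb->rx_prod) != *cons) {
		while (*cons != prod) {
			/* Process descriptor *cons here (elided). */
			*cons = (*cons + 1) % RING_CNT;
			done++;
		}
	}
	return (done);
}
```

Starting with the consumer at 510 and rx_prod at 5, one call processes
7 descriptors across the ring wrap and leaves the consumer at 5.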

You could add knobs for the interrupt mode settings, and code to support
them.  The main use for the knobs would be to re-verify that any
non-default use of these knobs gives a pessimization.  Not useful.

> applicable to certain controllers. In addition, the allowed value
> range for certain parameters varies by controller model. So I think
> it's a good idea to mention the allowed value range for each parameter,
> as well as a warning about a possible loss of connectivity caused
> by a wrongly programmed value (e.g. no RX interrupt for
> bge_rx_coal_ticks == 0 && bge_rx_max_coal_bds == 0).

Does anyone know the ranges for all models?  I still haven't found any
documentation that the range is null (nothing works) on 5705-"plus".

bge_rx_coal_ticks == 0 && bge_rx_max_coal_bds == 0 might work accidentally
if there are enough tx interrupts, since the interrupt handler also
services the rx ring when it runs for tx.  There is also the
DEVICE_POLLING mistake.  In polling mode, these parameters of course have
no effect (polling mode disables interrupts, and the coal parameters have
no effect when interrupts are disabled).
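A driver or sysctl handler could at least reject the one combination
that stops rx interrupts outright.  A minimal sketch; the function is
hypothetical, and per-model ranges are left out because nobody seems to
have them all:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: with both the tick timer and the BD-count threshold
 * at 0, nothing triggers an rx interrupt on its own.  Any nonzero value
 * for either parameter passes; per-model range limits are not encoded.
 */
static bool
rx_coal_params_ok(unsigned rx_coal_ticks, unsigned rx_max_coal_bds)
{
	return (rx_coal_ticks != 0 || rx_max_coal_bds != 0);
}
```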

Bruce