svn commit: r275136 - in head/sys: dev/e1000 dev/ixgbe kern sys
John Baldwin
jhb at freebsd.org
Mon Dec 1 14:48:39 UTC 2014
On Wednesday, November 26, 2014 08:19:36 PM Alfred Perlstein wrote:
> Author: alfred
> Date: Wed Nov 26 20:19:36 2014
> New Revision: 275136
> URL: https://svnweb.freebsd.org/changeset/base/275136
>
> Log:
> Make igb and ixgbe check tunables at probe time.
>
> This allows one to make a kernel module to tune the
> number of queues before the driver loads.
>
> This is needed so that a module at SI_SUB_CPU can set
> tunables for these drivers to take. Otherwise getenv
> is called too early by the TUNABLE macros.
>
> Reviewed by: smh
> Phabric: https://reviews.freebsd.org/D1149
The explicit TUNABLE_INT_FETCH strikes me as the wrong approach and hackish.
That is, each time you want to frob a tunable like this you will have to go
add a bunch of code to explicitly re-read the tunable, etc. Instead, you
could have simply changed your helper module to use kernel_sysctlbyname() to
set hw.igb.num_queues. That would have required only a single character
change to make the SYSCTL read/write instead of read/only for each tunable in
question. To be completely future proof (i.e. to handle loading the module in
question post-boot), your module could both do setenv() and
kernel_sysctlbyname(). That seems to be a more extensible approach in terms
of allowing more of these to be changed in the future without having to do a
manual bypass of the existing tunable infrastructure for each tunable. I
would much prefer that you revert this part and change the relevant tunables
to CTLFLAG_RWTUN and update your out-of-tree module to use
kernel_sysctlbyname().
Also, if you didn't run this by the Intel folks (e.g. jfv@) that might have
been nice as a courtesy.
Also, please use the existing resource_int_value() that uses 'hint.igb.0.foo'
instead of inventing a different wheel (device_get_int()). This is what all
the other drivers in the tree that provide per-instance tunables use. You
could perhaps have device_get_int() as a wrapper around resource_int_value(),
but we should use the existing convention, not invent a conflicting one.
> Modified:
> head/sys/dev/e1000/if_igb.c
> head/sys/dev/ixgbe/ixgbe.c
> head/sys/kern/subr_bus.c
> head/sys/sys/bus.h
>
> Modified: head/sys/dev/e1000/if_igb.c
> ============================================================================
> == --- head/sys/dev/e1000/if_igb.c Wed Nov 26 18:03:25 2014 (r275135)
+++
> head/sys/dev/e1000/if_igb.c Wed Nov 26 20:19:36 2014 (r275136) @@
-188,6
> +188,7 @@ static char *igb_strings[] = {
> /*********************************************************************
> * Function prototypes
> *********************************************************************/
> +static int igb_per_unit_num_queues(SYSCTL_HANDLER_ARGS);
> static int igb_probe(device_t);
> static int igb_attach(device_t);
> static int igb_detach(device_t);
> @@ -493,6 +494,11 @@ igb_attach(device_t dev)
> OID_AUTO, "nvm", CTLTYPE_INT|CTLFLAG_RW, adapter, 0,
> igb_sysctl_nvm_info, "I", "NVM Information");
>
> + SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
> + SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
> + OID_AUTO, "num_queues", CTLTYPE_INT | CTLFLAG_RD,
> + adapter, 0, igb_per_unit_num_queues, "I", "Number of Queues");
> +
You don't need igb_per_unit_num_queues().
SYSCTL_ADD_INT(..., &adapter->num_queues) should have been used instead.
> @@ -2831,6 +2837,7 @@ igb_setup_msix(struct adapter *adapter)
> {
> device_t dev = adapter->dev;
> int bar, want, queues, msgs, maxqueues;
> + int n_queues;
>
> /* tuneable override */
> if (igb_enable_msix == 0)
> @@ -2858,8 +2865,18 @@ igb_setup_msix(struct adapter *adapter)
> goto msi;
> }
>
> - /* Figure out a reasonable auto config value */
> - queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus;
> + n_queues = 0;
> + /* try more specific tunable, then global, then finally default to boot
> time tunable if set. */ + if (device_getenv_int(dev, "num_queues",
> &n_queues) != 0) {
> + device_printf(dev, "using specific tunable num_queues=%d", n_queues);
> + } else if (TUNABLE_INT_FETCH("hw.igb.num_queues", &n_queues) != 0) {
> + if (igb_num_queues != n_queues) {
> + device_printf(dev, "using global tunable hw.igb.num_queues=%d",
> n_queues); + igb_num_queues = n_queues;
> + }
> + } else {
> + n_queues = igb_num_queues;
> + }
>
> #ifdef RSS
> /* If we're doing RSS, clamp at the number of RSS buckets */
> @@ -2867,10 +2884,12 @@ igb_setup_msix(struct adapter *adapter)
> queues = rss_getnumbuckets();
> #endif
>
> -
> - /* Manual override */
> - if (igb_num_queues != 0)
> - queues = igb_num_queues;
> + if (n_queues != 0) {
> + queues = n_queues;
> + } else {
> + /* Figure out a reasonable auto config value */
> + queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus;
> + }
This moves the mp_ncpus auto config below the RSS bits, so now you
are ignoring the RSS value. You probably should have left the
mp_ncpus line where it was.
One old bug here is that igb checked the tunable value twice. It
should only check it once at the end of all the auto-tuning.
> /* Sanity check based on HW */
> switch (adapter->hw.mac.type) {
> @@ -2893,12 +2912,17 @@ igb_setup_msix(struct adapter *adapter)
> maxqueues = 1;
> break;
> }
> - if (queues > maxqueues)
> + if (queues > maxqueues) {
> + device_printf(adapter->dev, "requested %d queues, but max for this
> adapter is %d\n", + queues, maxqueues);
> queues = maxqueues;
> -
> - /* Manual override */
> - if (igb_num_queues != 0)
> - queues = igb_num_queues;
> + } else if (queues == 0) {
> + queues = 1;
> + } else if (queues < 0) {
> + device_printf(adapter->dev, "requested %d queues, but min for this
> adapter is %d\n", + queues, 1);
> + queues = 1;
> + }
So here is where I would check the tunable to allow it to override. Assuming
you revert the TUNABLE_INT_FETCH business then this only needs:
/* Manual override */
if (device_get_int(&n_queues) != 0)
n_queues = igb_num_queues;
if (n_queues != 0) {
if (n_queues > maxqueues || n_queues < 0)
/* whine */
else
queues = n_queues;
}
--
John Baldwin
More information about the svn-src-head
mailing list