Exposing full 32bit RSS hash from card for ixgbe(4)

Barney Cordoba barney_cordoba at yahoo.com
Mon Aug 10 13:33:13 UTC 2015




     On Wednesday, August 5, 2015 4:28 PM, Kevin Oberman <rkoberman at gmail.com> wrote:
   

 On Wed, Aug 5, 2015 at 7:10 AM, Barney Cordoba via freebsd-net <
freebsd-net at freebsd.org> wrote:

>
>
>
>      On Wednesday, August 5, 2015 2:19 AM, Olivier Cochard-Labbé <
> olivier at cochard.me> wrote:
>
>
>  On Wed, Aug 5, 2015 at 1:15 AM, Barney Cordoba via freebsd-net <
> freebsd-net at freebsd.org> wrote:
>
> > What's the point of all of this gobbledygook anyway? Seriously, 99% of
> the
> > world needs a driver that passes packets in the most efficient way, and
> > every time I look at igb and ixgbe it has another 2 heads. It's up to 8
> > heads, and none of the things wrong with it have been fixed. This is now
> > even uglier than Kip Macy's cxgb abortion.
> > I'm not trying to be snarky here. I wrote a simple driver 3 years ago
> that
> > runs and runs and uses little cpu; maybe 8% for a full gig load on an E3.
> >
>
> ​Hi,
>
> I will be very happy to bench your simple driver. Where can I download the
> sources ?
>
> Thanks,
>
> Olivier
> _______________________________________________
>
> Another unproductive dick head on the FreeBSD team? Figures.
>

A typical Barney thread. First he calls the developers incompetent and says
he has done better. Then someone who has experience in real-world
benchmarking (not a trivial thing) offers to evaluate Barney's code, and
gets a quick, rude, obscene dismissal. Is it any wonder that, even though
he made some valid arguments (at least for some workloads), almost everyone
just dismisses him as too obnoxious to try to deal with?

Based on my pre-retirement work with high-performance networking, in some
cases it was clear that it would be better to lock things down to a single
CPU with FreeBSD or Linux. I can further state that this was NOT
true for all workloads, so it is quite possible that Barney's code works
for some cases (perhaps his) and would be bad in others. But without good
benchmarking, it's hard to tell.
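
(For anyone wondering what "locking things down" looks like in practice: on
FreeBSD the userland side of it is just cpuset(2). A minimal sketch, with the
CPU number as a placeholder; interrupts can be steered the same way from the
command line with cpuset(1) and its -x flag.)

#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
	cpuset_t mask;

	/* Restrict the current thread to CPU 0 only. */
	CPU_ZERO(&mask);
	CPU_SET(0, &mask);

	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(mask), &mask) != 0)
		err(1, "cpuset_setaffinity");

	printf("pinned to CPU 0\n");
	/* ... run the packet-handling loop here ... */
	return (0);
}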

I will say that for large volume data transfers (very large flows), a
single CPU solution does work best. But if Barney is going at this with his
usual attitude, it's probably not worth it to continue the discussion.
--
The "give us the source and we'll test it" nonsense is kindergarten stuff. As if my code were open source and you could just have it, and as if you knew how to benchmark anything when you can't even benchmark what you already have.
My advice is to ignore guys like Oberman, who spent their careers randomly pounding networks on slow machines with slow buses and bad NICs, on OSes that couldn't do SMP properly. He'll just lead you down the road to dusty death. Multicore design isn't simple math; it's about efficiency, lock minimization, and understanding that shifting memory between CPUs unnecessarily is costly. Today's CPUs and NICs can't be judged using the test methods of the past. You'll just end up playing the Microsoft Windows game: get bigger machines and more memory, and don't worry about the fact that the code is junk.
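To make the memory point concrete, here's a toy userland illustration (nothing from my driver, just the general idea): give each CPU its own counter, padded out to a cache line, instead of one shared counter that every core has to drag back and forth. The 64-byte line size and the thread count are assumptions.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS   4
#define ITERS      10000000UL
#define CACHE_LINE 64          /* assumed cache line size */

/*
 * One counter per thread, each on its own cache line, so cores never
 * fight over ownership of the same line and no lock is needed on the
 * hot path.
 */
struct percpu_ctr {
	uint64_t val;
	char     pad[CACHE_LINE - sizeof(uint64_t)];
};

static struct percpu_ctr counters[NTHREADS];

static void *
worker(void *arg)
{
	struct percpu_ctr *c = arg;
	unsigned long i;

	for (i = 0; i < ITERS; i++)
		c->val++;               /* no lock, no shared line */
	return (NULL);
}

int
main(void)
{
	pthread_t tid[NTHREADS];
	uint64_t total = 0;
	int i;

	for (i = 0; i < NTHREADS; i++)
		if (pthread_create(&tid[i], NULL, worker, &counters[i]) != 0)
			exit(1);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);

	/* Aggregate once at read time, not per packet. */
	for (i = 0; i < NTHREADS; i++)
		total += counters[i].val;
	printf("total = %llu\n", (unsigned long long)total);
	return (0);
}

Per-queue stats and per-queue locks in a driver are the same idea: keep each CPU working on memory that no other CPU touches.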
It's just that the "default" in these drivers is so obviously wrong that it's mind-boggling. The argument over using 1, 2, or 4 queues is one worth having; using "all" of the CPUs, including the hyperthreads, is just plain incompetent.
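If you want to try it yourself rather than argue about it, the queue count is a loader tunable. The exact names have moved around between releases, so treat these as examples and check the driver's man page, but it looks roughly like this in /boot/loader.conf:

hw.igb.num_queues=2     # igb(4)
hw.ix.num_queues=2      # ixgbe(4); older drivers spelled it hw.ixgbe.num_queues

Reboot (the drivers read these at attach) and compare interrupt load and throughput at 1, 2, and 4 queues on your own traffic.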
I will contribute one possibly useful tidbit:
disable_queue() only disables receive interrupts. Both tx and rx interrupts are effectively tied together by moderation, so you'll just get an interrupt at the next slot anyway.
BC
