Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1

Maxim Sobolev sobomax at FreeBSD.org
Tue Aug 11 23:05:00 UTC 2015


Here it is; the distribution looks pretty normal to me.

dev.ix.0.queue0.tx_packets: 846233384
dev.ix.0.queue0.rx_packets: 856092418
dev.ix.0.queue1.tx_packets: 980356163
dev.ix.0.queue1.rx_packets: 922935329
dev.ix.0.queue2.tx_packets: 970700307
dev.ix.0.queue2.rx_packets: 907776311
dev.ix.0.queue3.tx_packets: 951911927
dev.ix.0.queue3.rx_packets: 903933007
dev.ix.0.queue4.tx_packets: 960075438
dev.ix.0.queue4.rx_packets: 909830391
dev.ix.0.queue5.tx_packets: 957304026
dev.ix.0.queue5.rx_packets: 889722162
dev.ix.0.queue6.tx_packets: 946175921
dev.ix.0.queue6.rx_packets: 898922310
dev.ix.0.queue7.tx_packets: 936382885
dev.ix.0.queue7.rx_packets: 890026885
dev.ix.1.queue0.tx_packets: 844847347
dev.ix.1.queue0.rx_packets: 840770906
dev.ix.1.queue1.tx_packets: 978807036
dev.ix.1.queue1.rx_packets: 906148213
dev.ix.1.queue2.tx_packets: 969026390
dev.ix.1.queue2.rx_packets: 906644000
dev.ix.1.queue3.tx_packets: 950384414
dev.ix.1.queue3.rx_packets: 890646445
dev.ix.1.queue4.tx_packets: 958536903
dev.ix.1.queue4.rx_packets: 887900309
dev.ix.1.queue5.tx_packets: 955802045
dev.ix.1.queue5.rx_packets: 884884583
dev.ix.1.queue6.tx_packets: 944802927
dev.ix.1.queue6.rx_packets: 883266179
dev.ix.1.queue7.tx_packets: 934953601
dev.ix.1.queue7.rx_packets: 886399283
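
For reference, here is a rough way to summarize the rx spread from the same
counters; a minimal sketch assuming stock awk and the dev.ix.N.queueM sysctl
names above (adjust dev.ix.0 for the other port):

# sysctl dev.ix.0 | awk -F': ' '/rx_packets/ {
      sum += $2; n++;
      if ($2 > max) max = $2;
      if (min == 0 || $2 < min) min = $2;
  } END { printf "queues=%d avg=%.0f min=%d max=%d spread=%.1f%%\n",
      n, sum / n, min, max, 100 * (max - min) / max }'

Any gross imbalance between the queues should be obvious from the spread
figure.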


On Tue, Aug 11, 2015 at 3:16 PM, hiren panchasara <hiren at strugglingcoder.info> wrote:

> On 08/11/15 at 03:01P, Adrian Chadd wrote:
> > hi,
> >
> > Are you able to graph per-queue interrupt rates?
> >
> > It looks like the traffic is distributed differently (the first two
> > queues are taking interrupts).
>
> Yeah, also check out "# sysctl dev.ix | grep packets"
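>
> (A quick way to sample the per-queue interrupt rates without a full
> graphing setup, assuming the MSI-X vectors show up named after the queues
> and the 10.x ixgbe sysctls are present:
>
> # vmstat -i | grep 'ix0:que'
> # sysctl dev.ix.0 | grep interrupt_rate
>
> The first shows the cumulative count and average rate per queue vector;
> the second should show the per-queue moderation rate, if the driver
> exposes it.)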
> >
> > Does 10.1 have the flow director code disabled? I remember there was
> > some .. interesting behaviour with ixgbe where it'd look at traffic
> > and set up flow director rules to try and "balance" things. It was
> > buggy and programmed the hardware badly, so we disabled it in at least
> > -HEAD.
>
> Looks like we don't build with IXGBE_FDIR by default on 10 so I assume
> it's off.
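>
> (One way to double-check on the box itself, assuming the kernel was built
> with "options INCLUDE_CONFIG_FILE" as GENERIC is, would be:
>
> # sysctl -n kern.conftxt | grep IXGBE_FDIR
>
> No output would mean flow director was not compiled in.)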
>
> There were some lagg/hashing related changes recently so let us know if
> that is hurting you.
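>
> (If you want to rule that out, the current lagg configuration and the
> flowid knobs can be inspected, and the hash layers pinned, with something
> like the following; lagg0 here is just a stand-in for the LACP interface
> in question:
>
> # ifconfig lagg0
> # sysctl net.link.lagg
> # ifconfig lagg0 lagghash l2,l3,l4
>
> On 10.x the use_flowid/flowid_shift settings should show up under
> net.link.lagg.)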
>
> Cheers,
> Hiren
> >
> >
> >
> > -adrian
> >
> >
> > On 11 August 2015 at 14:18, Maxim Sobolev <sobomax at freebsd.org> wrote:
> > > Hi folks,
> > >
> > > We've been trying to migrate some of our high-PPS systems to new hardware
> > > that has four X540-AT2 10G NICs and observed that interrupt time goes
> > > through the roof after we cross around 200K PPS in and 200K out (two
> > > ports in LACP). The previous hardware was stable up to about 350K PPS in
> > > and 350K out. I believe the old one was equipped with the I350 and had an
> > > identical LACP configuration. The new box also has a better CPU with more
> > > cores (i.e. 24 cores vs. 16 cores before); the CPU itself is
> > > 2 x E5-2690 v3.
> > >
> > > After hitting this limit with the default settings, I've tried to tweak
> > > the following settings:
> > >
> > > hw.ix.rx_process_limit="-1"
> > > hw.ix.tx_process_limit="-1"
> > > hw.ix.enable_aim="0"
> > > hw.ix.max_interrupt_rate="-1"
> > > hw.ix.rxd="4096"
> > > hw.ix.txd="4096"
> > >
> > > dev.ix.0.fc=0
> > > dev.ix.1.fc=0
> > > dev.ix.2.fc=0
> > > dev.ix.3.fc=0
> > >
> > > hw.intr_storm_threshold=0
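> > >
> > > (For the record, the hw.ix.* entries above are loader tunables, so they
> > > went into /boot/loader.conf and needed a reboot to take effect; the
> > > dev.ix.N.fc and hw.intr_storm_threshold ones are runtime sysctls, e.g.:
> > >
> > > # sysctl dev.ix.0.fc=0 hw.intr_storm_threshold=0
> > >
> > > or the equivalent lines in /etc/sysctl.conf.)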
> > >
> > > But they have had little or no effect on the performance. The workload is
> > > just a lot of small UDP packets being relayed between a bunch of hosts.
> > > The symptoms are always the same: the box runs nice and cool until it hits
> > > the said PPS threshold, with the kernel spending just a few percent of its
> > > time in interrupts, and then it jumps straight to 100% interrupt time,
> > > thereby scaring some traffic away due to packet loss and such, so that the
> > > load drops and the system goes into the "cool" state again. It looks very
> > > much like some contention in the driver or in the hardware. Linked below
> > > are some monitoring screenshots showing the issue unfolding, as well as
> > > systat -vm screenshots from the "cool" state.
> > >
> > > http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization right
> > > before the "bang event"
> > > http://sobomax.sippysoft.com/ScreenShot382.png <- the issue itself
> > > http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm a few
> > > minutes after the traffic declined somewhat
> > >
> > > We are now trying to get the customer to install a 1Gig NIC so that we
> > > can run it and compare performance, with the rest of the hardware and
> > > software being essentially the same.
> > >
> > > Any ideas on how to improve/resolve this problem are welcome. Thanks!
>

