Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1

hiren panchasara hiren at strugglingcoder.info
Tue Aug 11 22:16:28 UTC 2015


On 08/11/15 at 03:01P, Adrian Chadd wrote:
> hi,
> 
> Are you able to graph per-queue interrupt rates?
> 
> It looks like the traffic is distributed differently (the first two
> queues are taking interrupts).

Yeah, also check out "# sysctl dev.ix | grep packets"
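
A rough sketch of how those counters could be turned into per-queue rates, in case it helps. It assumes the driver exposes counters named like dev.ix.<port>.queue<N>.rx_packets and tx_packets; confirm against your own "sysctl dev.ix" output before relying on it. The helper names and the 5 second interval are illustrative, not part of any FreeBSD tool.

```python
import re
import subprocess
import time

# Assumed counter format: "dev.ix.0.queue3.rx_packets: 12345".
# Verify against real `sysctl dev.ix` output on the box in question.
COUNTER_RE = re.compile(r"^dev\.ix\.(\d+)\.queue(\d+)\.(rx|tx)_packets:\s*(\d+)$")


def parse_counters(text):
    """Return {(port, queue, direction): count} parsed from sysctl output."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            port, queue, direction, count = m.groups()
            counters[(int(port), int(queue), direction)] = int(count)
    return counters


def per_queue_pps(before, after, interval):
    """Packets per second for every counter present in both snapshots."""
    return {
        key: (after[key] - before[key]) / interval
        for key in sorted(after)
        if key in before
    }


def snapshot():
    """Read live counters; only meaningful on a FreeBSD host with ix ports."""
    out = subprocess.run(
        ["sysctl", "dev.ix"], capture_output=True, text=True, check=True
    )
    return parse_counters(out.stdout)


def main(interval=5.0):
    """Take two snapshots `interval` seconds apart and print per-queue rates."""
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    for (port, queue, direction), pps in per_queue_pps(first, second, interval).items():
        print(f"ix{port} queue{queue} {direction}: {pps:.0f} pps")
```

Calling main() on the affected box, once in the "cool" state and once near the threshold, should show whether only the first couple of queues are carrying the traffic.
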
> 
> Does 10.1 have the flow director code disabled? I remember there was
> some .. interesting behaviour with ixgbe where it'd look at traffic
> and set up flow director rules to try and "balance" things. It was
> buggy and programmed the hardware badly, so we disabled it in at least
> -HEAD.

Looks like we don't build with IXGBE_FDIR by default on 10 so I assume
it's off.

There were some lagg/hashing related changes recently so let us know if
that is hurting you.

Cheers,
Hiren
> 
> 
> 
> -adrian
> 
> 
> On 11 August 2015 at 14:18, Maxim Sobolev <sobomax at freebsd.org> wrote:
> > Hi folks,
> >
> > We've been trying to migrate some of our high-PPS systems to new hardware that
> > has four X540-AT2 10G NICs and observed that interrupt time goes through the
> > roof after we cross around 200K PPS in and 200K out (two ports in LACP).
> > The previous hardware was stable up to about 350K PPS in and 350K out. I
> > believe the old one was equipped with the I350 and had the identical LACP
> > configuration. The new box also has a better CPU with more cores (i.e. 24
> > cores vs. 16 cores before). The CPU itself is 2 x E5-2690 v3.
> >
> > After hitting this limit with the default settings, I've tried to tweak the
> > following settings:
> >
> > hw.ix.rx_process_limit="-1"
> > hw.ix.tx_process_limit="-1"
> > hw.ix.enable_aim="0"
> > hw.ix.max_interrupt_rate="-1"
> > hw.ix.rxd="4096"
> > hw.ix.txd="4096"
> >
> > dev.ix.0.fc=0
> > dev.ix.1.fc=0
> > dev.ix.2.fc=0
> > dev.ix.3.fc=0
> >
> > hw.intr_storm_threshold=0
> >
> > But there is little or no effect on the performance. The workload is just
> > a lot of small UDP packets being relayed between a bunch of hosts. The symptoms
> > are always the same - the box runs nice and cool until it hits the said PPS
> > threshold, with the kernel spending just a few percent in interrupts, and then
> > it jumps straight to 100% interrupt time, thereby scaring some traffic away
> > due to packet loss and such, so that the load drops and the system goes
> > into the "cool" state again. It looks very much like some contention in the
> > driver or in the hardware. Linked are some monitoring screenshots
> > displaying the issue unfolding as well as systat -vm screenshots from the
> > "cool" state.
> >
> > http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization right
> > before the "bang event"
> > http://sobomax.sippysoft.com/ScreenShot382.png <- issue itself
> > http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm a few minutes
> > after traffic declined somewhat
> >
> > We are now trying to get the customer to install a 1Gig NIC so that we can run it
> > and compare performance with the rest of the hardware and software being
> > essentially the same.
> >
> > Any ideas on how to improve/resolve this problem are welcome. Thanks!
> > _______________________________________________
> > freebsd-net at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"