Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1

Maxim Sobolev sobomax at FreeBSD.org
Fri Aug 14 17:29:56 UTC 2015


Hi guys, unfortunately no, neither reducing the number of queues from 8
to 6 nor pinning the interrupt rate at 20000 per queue has made any
difference. The card still goes kaboom at about 200Kpps no matter what. In
fact I've gone a bit further and, after the first spike, pushed the
interrupt rate even further down to 10000, but again no difference; it
still blows up at the same mark. It did reduce the interrupt rate from
190K to some 130K according to systat -vm, so the moderation itself seems
to be working fine. We will try disabling IXGBE_FDIR tomorrow and see if
it helps.
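For the record, the tuning described above boils down to roughly the
following sketch. The per-queue interrupt_rate sysctls match the names the
ix(4) driver exposes on this box; the AIM knob name (enable_aim) is an
assumption from memory and may differ between driver versions, so check
"sysctl -a | grep aim" first:

```shell
# Sketch of the interrupt-moderation tuning discussed above.
# Assumption: "enable_aim" is the AIM sysctl name on this driver version.

# Disable adaptive interrupt moderation so the rate stays fixed.
sysctl dev.ix.0.enable_aim=0

# Pin every queue to at most 20000 interrupts/s (~50us added latency).
for q in 0 1 2 3 4 5 6 7; do
    sysctl dev.ix.0.queue${q}.interrupt_rate=20000
done
```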

http://sobomax.sippysoft.com/ScreenShot391.png <- systat -vm with
max_interrupt_rate = 20000 right before overload

http://sobomax.sippysoft.com/ScreenShot392.png <- systat -vm during issue
unfolding (max_interrupt_rate = 10000)

http://sobomax.sippysoft.com/ScreenShot394.png <- cpu/net monitoring; the
first two spikes are with max_interrupt_rate = 20000, the third with
max_interrupt_rate = 10000
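As a quick sanity check of the moderation math Luigi gave (assuming a 10GbE
line rate of ~14.88 Mpps for min-sized frames): the worst-case added latency
is one interrupt interval, and the RX ring has to absorb one interval's
worth of line-rate frames.

```shell
# One interrupt interval bounds the added latency; the ring must hold
# the frames that arrive within that interval at line rate.
rate=20000
echo "latency_us=$((1000000 / rate))"     # interval length in microseconds
echo "burst_frames=$((14880000 / rate))"  # min-sized frames per interval
```

At 20000 interrupts/s that works out to 50us and ~744 frames, comfortably
within the 2k ring slots Luigi mentioned.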

-Max

On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:

> As I was telling Maxim, you should disable AIM because it only matches
> the max interrupt rate to the average packet size, which is the last thing
> you want.
>
> Setting the interrupt rate with sysctl (one per queue) gives you precise
> control on the max rate (and hence the extra latency). 20k interrupts/s give
> you 50us of latency, and the 2k slots in the queue are still enough to
> absorb a burst of min-sized frames hitting a single queue (the os will
> start dropping long before that level, but that's another story).
>
> Cheers
> Luigi
>
> On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi at freebsd.org>
> wrote:
>
>> I ran into the same problem with almost the same hardware (Intel X520)
>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
>> queues, with the same sysctl tunings sobomax@ used. I am not using
>> lagg, and no FLOWTABLE.
>>
>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
>> [2] you can see the results, including pmc output, callchain, flamegraph
>> and gprof output.
>>
>> I am experiencing a huge number of interrupts with 200kpps load:
>>
>> # sysctl dev.ix | grep interrupt_rate
>> dev.ix.1.queue7.interrupt_rate: 125000
>> dev.ix.1.queue6.interrupt_rate: 6329
>> dev.ix.1.queue5.interrupt_rate: 500000
>> dev.ix.1.queue4.interrupt_rate: 100000
>> dev.ix.1.queue3.interrupt_rate: 50000
>> dev.ix.1.queue2.interrupt_rate: 500000
>> dev.ix.1.queue1.interrupt_rate: 500000
>> dev.ix.1.queue0.interrupt_rate: 100000
>> dev.ix.0.queue7.interrupt_rate: 500000
>> dev.ix.0.queue6.interrupt_rate: 6097
>> dev.ix.0.queue5.interrupt_rate: 10204
>> dev.ix.0.queue4.interrupt_rate: 5208
>> dev.ix.0.queue3.interrupt_rate: 5208
>> dev.ix.0.queue2.interrupt_rate: 71428
>> dev.ix.0.queue1.interrupt_rate: 5494
>> dev.ix.0.queue0.interrupt_rate: 6250
>>
>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>>
>> Regards,
>> Babak
>>
>>
>> Alexander V. Chernikov wrote:
>> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax at FreeBSD.org>:
>> >> Olivier, keep in mind that we are not "kernel forwarding" packets, but
>> >> "app forwarding", i.e. the packet goes the full way
>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>> >> much lower PPS limits and which is why I think we are actually
>> >> benefiting from the extra queues. Single-thread sendto() in a loop is
>> >> CPU-bound at about 220K PPS, and while running the test I am observing
>> >> that outbound traffic from one thread is mapped into a specific queue
>> >> (well, a pair of queues on two separate adaptors, due to lagg load
>> >> balancing action). And the peak performance of that test is at 7
>> >> threads, which I believe corresponds to the number of queues. We have
>> >> plenty of CPU cores in the box (24) with HTT/SMT disabled and each CPU
>> >> is mapped to a specific queue. This leaves us with at least 8 CPUs
>> >> fully capable of running our app. If you look at the CPU utilization,
>> >> we are at about 10% when the issue hits.
>> >
>> > In any case, it would be great if you could provide some profiling info,
>> > since there could be plenty of problematic places, starting from TX
>> > rings contention to some locks inside udp or even the (in)famous random
>> > entropy harvester..
>> > e.g. something like pmcstat -TS instructions -w1 might be sufficient to
>> > determine the reason
>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
>> >> irq 40 at device 0.0 on pci3
>> >> ix0: Using MSIX interrupts with 9 vectors
>> >> ix0: Bound queue 0 to cpu 0
>> >> ix0: Bound queue 1 to cpu 1
>> >> ix0: Bound queue 2 to cpu 2
>> >> ix0: Bound queue 3 to cpu 3
>> >> ix0: Bound queue 4 to cpu 4
>> >> ix0: Bound queue 5 to cpu 5
>> >> ix0: Bound queue 6 to cpu 6
>> >> ix0: Bound queue 7 to cpu 7
>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx
>> >> 8/4096 queues/slots
>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
>> >> irq 44 at device 0.1 on pci3
>> >> ix1: Using MSIX interrupts with 9 vectors
>> >> ix1: Bound queue 0 to cpu 8
>> >> ix1: Bound queue 1 to cpu 9
>> >> ix1: Bound queue 2 to cpu 10
>> >> ix1: Bound queue 3 to cpu 11
>> >> ix1: Bound queue 4 to cpu 12
>> >> ix1: Bound queue 5 to cpu 13
>> >> ix1: Bound queue 6 to cpu 14
>> >> ix1: Bound queue 7 to cpu 15
>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx
>> >> 8/4096 queues/slots
>> >>
>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé
>> >> <olivier at cochard.me> wrote:
>> >>
>> >>>  On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev
>> >>>  <sobomax at freebsd.org> wrote:
>> >>>
>> >>>>  Hi folks,
>> >>>
>> >>>  Hi,
>> >>>
>> >>>>  We've been trying to migrate some of our high-PPS systems to new
>> >>>>  hardware that has four X540-AT2 10G NICs and observed that interrupt
>> >>>>  time goes through the roof after we cross around 200K PPS in and
>> >>>>  200K out (two ports in LACP). The previous hardware was stable up to
>> >>>>  about 350K PPS in and 350K out. I believe the old one was equipped
>> >>>>  with the I350 and had an identical LACP configuration. The new box
>> >>>>  also has a better CPU with more cores (i.e. 24 cores vs. 16 cores
>> >>>>  before). The CPU itself is 2 x E5-2690 v3.
>> >>>  200K PPS, and even 350K PPS, are very low values indeed.
>> >>>  On an Intel Xeon L5630 (4 cores only) with one X540-AT2
>> >>>  (i.e. 2 10-Gigabit ports) I've reached about 1.8Mpps (fastforwarding
>> >>>  enabled) [1].
>> >>>  But my setup didn't use lagg(4): can you disable the lagg
>> >>>  configuration and re-measure your performance without lagg?
>> >>>
>> >>>  Do you let the Intel NIC driver use 8 queues per port too?
>> >>>  In my use case (forwarding the smallest UDP packet size), I obtained
>> >>>  better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
>> >>>  hw.ixgbe.num_queues, I don't remember which), even though my system
>> >>>  had 8 cores. This held with Gigabit Intel [2] and Chelsio [3] NICs.
>> >>>
>> >>>  Don't forget to disable TSO and LRO too.
>> >>>
>> >>>  Regards,
>> >>>
>> >>>  Olivier
>> >>>
>> >>>  [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>> >>>  [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>> >>>  [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
>> >> _______________________________________________
>> >> freebsd-net at freebsd.org mailing list
>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>
>
>
> --
> -----------------------------------------+-------------------------------
>  Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>  TEL      +39-050-2217533               . via Diotisalvi 2
>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------
>
>

