FreeBSD 10 network performance problems

Rumen Telbizov telbizov at gmail.com
Sat Sep 20 16:12:08 UTC 2014


Hello everyone,

I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
10.1-BETA1 (r271684), and as part of that process I have been testing the
forwarding capability of FreeBSD 10 (with the pf firewall disabled) to
establish a baseline and find any bottlenecks on a 10GbE network.
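
For the baseline runs I switch pf off and enable the fast-forwarding path;
roughly the following (the exact settings are in the configuration pastebin
linked further down):

# pfctl -d                              # disable pf for the baseline tests
# sysctl net.inet.ip.forwarding=1       # forward packets between interfaces
# sysctl net.inet.ip.fastforwarding=1   # use the ip_fastfwd path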

My tests show that for the first 3-4Gbit/s of traffic things are great, and
then the machine simply "hits the wall" at around 4-5Gbit/s. There is no
gradual degradation, just a hard drop to 0% idle and a roughly 50-50% split
between system and interrupt CPU load. I ran some diagnostics, and I was
hoping someone could point me in the right direction as to what is happening
and what I could do to improve the situation. The details I have collected
so far are below:

I run multiple multithreaded iperf TCP instances to generate traffic that
traverses the firewall. As mentioned above, at the first 3-4Gbit/s of
traffic the machine doesn't even break a sweat.
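
The generators run something along these lines (hosts and addresses here are
just placeholders for illustration):

$ iperf -s                          # on each receiver behind the firewall
$ iperf -c 10.10.1.20 -P 8 -t 300   # on each sender: 8 parallel TCP streams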

*top header* during this load when things are good:
CPU 0:   0.0% user,  0.0% nice,  0.0% system, 27.6% interrupt, 72.4% idle
CPU 1:   0.0% user,  0.0% nice,  0.0% system, 27.6% interrupt, 72.4% idle
CPU 2:   0.0% user,  0.0% nice,  0.0% system,  8.7% interrupt, 91.3% idle
CPU 3:   0.0% user,  0.0% nice,  0.0% system, 17.3% interrupt, 82.7% idle
CPU 4:   0.0% user,  0.0% nice,  0.0% system, 12.6% interrupt, 87.4% idle
CPU 5:   0.0% user,  0.0% nice,  0.0% system,  4.7% interrupt, 95.3% idle
CPU 6:   0.0% user,  0.0% nice,  0.0% system, 13.4% interrupt, 86.6% idle
CPU 7:   0.0% user,  0.0% nice,  0.0% system, 26.0% interrupt, 74.0% idle
CPU 8:   0.0% user,  0.0% nice,  0.0% system, 16.5% interrupt, 83.5% idle
CPU 9:   0.0% user,  0.0% nice,  0.0% system,  1.6% interrupt, 98.4% idle
CPU 10:  0.0% user,  0.0% nice,  0.0% system, 19.7% interrupt, 80.3% idle
CPU 11:  0.0% user,  0.0% nice,  0.0% system,  7.1% interrupt, 92.9% idle
Full output at http://pastebin.com/gaaisXV8

*bmon* at the same time:
Interfaces                     │ RX bps       pps     %│ TX bps       pps     %
  ix0                          │ 240.33MiB 242.20K     │ 221.41MiB 236.68K
  ix1                          │ 246.51MiB 248.80K     │ 261.61MiB 250.95K
 >lagg0                        │ 483.45MiB 492.42K     │ 479.54MiB 488.02K

[bmon graphs elided: RX/TX Bytes/second hold a steady ~500 MiB/s (peaks
499.17 / 496.49 MiB) and RX/TX Packets/second a steady ~505-508 K for the
entire 60-second window]

To profile the system without the profiling overhead itself pushing it over
the edge earlier, I reduced traffic to 2Gbit/s and ran dtrace hotkernel and
lock profiling. Here are the results:

*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`__mtx_lock_flags                                  5812   0.8%
kernel`__mtx_unlock_flags                                7200   1.0%
kernel`acpi_cpu_c1                                       7799   1.1%
kernel`__rw_rlock                                       11196   1.5%
kernel`spinlock_exit                                    14547   2.0%
kernel`cpu_idle                                        166700  22.8%
kernel`sched_idletd                                    461883  63.1%
Full output at http://pastebin.com/w7WfFwPG
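
For reference, the hotkernel runs look roughly like this (the script samples
until interrupted, so I stop it after about 60 seconds):

# kldload dtraceall
# /usr/share/dtrace/toolkit/hotkernel   # Ctrl-C after ~60s prints the summary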

*lock profiling* for 60 seconds:
$ head -n 2 good-locks ; cat good-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
     max  wait_max       total  wait_total       count    avg  wait_avg  cnt_hold  cnt_lock  name
      22        24       94378       18481      264549      0         0         0     18639  /usr/src/sys/kern/sched_ule.c:886 (spin mutex:sched lock 10)
      22        24      112366       20360      219179      0         0         0     17220  /usr/src/sys/kern/sched_ule.c:888 (spin mutex:sched lock 2)
      18       319        3486       22352        4233      0         5         0      1640  /usr/src/sys/kern/subr_taskqueue.c:344 (sleep mutex:taskqueue)
      26        66     3219768      204875    14616220      0         0         0    133154  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
      25        90     1923012     2353820    14615738      0         0         0   1562097  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
      26        91     1398443     2391458    14616137      0         0         0   1604332  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/qiG3ZuAH
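
The lock profiling numbers were collected roughly like this (this needs a
kernel built with options LOCK_PROFILING):

# sysctl debug.lock.prof.reset=1    # clear any previous samples
# sysctl debug.lock.prof.enable=1
# sleep 60
# sysctl debug.lock.prof.enable=0
# sysctl debug.lock.prof.stats > good-locks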

Again, the above measurements show the state of the good/healthy system
under a moderate traffic load of 3-4Gbit/s, with pf disabled and fast
forwarding enabled. Here are the same measurements once I add an additional
1Gbit/s of traffic. The point at which performance tanks varies between
4 and 5Gbit/s, but it is always sudden: no gradual degradation, idle simply
drops to 0. Let's take a look:

*top header* during this load when things are bad:
CPU 0:   0.0% user,  0.0% nice, 44.6% system, 55.4% interrupt,  0.0% idle
CPU 1:   0.0% user,  0.0% nice, 15.1% system, 84.9% interrupt,  0.0% idle
CPU 2:   0.0% user,  0.0% nice, 59.0% system, 40.3% interrupt,  0.7% idle
CPU 3:   0.0% user,  0.0% nice, 57.6% system, 42.4% interrupt,  0.0% idle
CPU 4:   0.0% user,  0.0% nice, 34.5% system, 63.3% interrupt,  2.2% idle
CPU 5:   0.0% user,  0.0% nice, 51.8% system, 48.2% interrupt,  0.0% idle
CPU 6:   0.0% user,  0.0% nice, 33.8% system, 66.2% interrupt,  0.0% idle
CPU 7:   0.7% user,  0.0% nice, 49.6% system, 49.6% interrupt,  0.0% idle
CPU 8:   0.0% user,  0.0% nice, 66.2% system, 33.8% interrupt,  0.0% idle
CPU 9:   0.0% user,  0.0% nice, 35.3% system, 64.7% interrupt,  0.0% idle
CPU 10:  0.0% user,  0.0% nice, 54.7% system, 45.3% interrupt,  0.0% idle
CPU 11:  0.0% user,  0.0% nice, 34.5% system, 65.5% interrupt,  0.0% idle
Full output at http://pastebin.com/9an9ZWv2

*bmon* at the same time shows inconsistent performance with big dips:
Interfaces                     │ RX bps       pps     %│ TX bps       pps     %
  ix0                          │ 153.89MiB 151.53K     │ 179.69MiB 159.91K
  ix1                          │ 176.56MiB 161.29K     │ 145.17MiB 148.13K
 >lagg0                        │ 327.23MiB 333.05K     │ 322.14MiB 328.13K

[bmon graphs elided: RX/TX Bytes/second and Packets/second are erratic over
the 60-second window; apart from a single spike toward the ~650 MiB / ~670 K
scale maximum, rates mostly stay below ~330 MiB/s / ~335 Kpps with frequent
dips]

*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`_rw_runlock_cookie                                7709   1.1%
kernel`__rw_rlock                                       11182   1.6%
kernel`ip_fastforward                                   12231   1.7%
kernel`__mtx_lock_flags                                 22004   3.1%
kernel`__mtx_unlock_flags                               35614   5.0%
kernel`__mtx_lock_sleep                                560768  78.5%
Full output at http://pastebin.com/NurKwkWL
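
Since __mtx_lock_sleep dominates here, a follow-up I could run (just a
sketch, not something I have results for yet) is to aggregate the kernel
stacks that end up in the mutex sleep path, to confirm it really is the
rtentry locks from the lock profile below:

# dtrace -n 'fbt::__mtx_lock_sleep:entry { @[stack()] = count(); } tick-60s { exit(0); }'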

*lock profiling* for 60 seconds:
$ head -n 2 bad-locks ; cat bad-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
     max  wait_max       total  wait_total       count    avg  wait_avg  cnt_hold  cnt_lock  name
  401766    191987     1857397      194162         179  10376      1084         0         2  /usr/src/sys/kern/kern_sysctl.c:1601 (sx:sysctl mem)
   21064    207907       62556      249066         396    157       628         0         3  /usr/src/sys/kern/kern_sysctl.c:1499 (sleep mutex:Giant)
       1    370663          17      372573          86      0      4332         0         2  /usr/src/sys/kern/kern_exit.c:429 (sx:allproc)
      14       648     8856844    46296098    15513956      0         2         0   1467849  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
      15       958    13107581   252445472    15513486      0        16         0   9444644  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
      12       779    12500816   286324556    15513872      0        18         0   9788497  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/d54QmP13

System hardware: 12 x E5-2440 @ 2.40GHz, 24GB RAM, dual-port fiber
Intel(R) PRO/10GbE (ix) in an LACP lagg.
System configuration details available at http://pastebin.com/tPvs1MeD
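
The lagg side of it is the standard LACP setup, roughly like the following
rc.conf snippet (the address below is illustrative; the real values are in
the pastebin):

ifconfig_ix0="up"
ifconfig_ix1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 192.0.2.1/24"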

This looks to me like heavy lock contention, but I don't understand why it
kicks in so abruptly once a given threshold is crossed.
Other things I tried:

   - Upgrading the ix (ixgbe) driver to the latest from Intel (2.5.25) - for
     some reason I then cannot send any packets out
   - Enabling flowtables - no difference

Any help is highly appreciated. I could provide further details and run
additional tests upon request.

Regards,
-- 
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>

