NET.ISR and CPU utilization performance w/ HP DL 585 using FreeBSD 7.1 Beta2
Won De Erick
won.derick at yahoo.com
Mon Nov 17 01:19:36 PST 2008
> ----- Original Message ----
> From: Won De Erick <won.derick at yahoo.com>
> To: Jeremy Chadwick <koitsu at FreeBSD.org>
> Cc: rwatson at freebsd.org; freebsd-hackers at freebsd.org
> Sent: Sunday, November 16, 2008 7:18:46 PM
> Subject: Re: NET.ISR and CPU utilization performance w/ HP DL 585 using FreeBSD 7.1 Beta2
>
>
> ----- Original Message ----
>
> > From: Jeremy Chadwick <koitsu at FreeBSD.org>
> > To: Won De Erick <won.derick at yahoo.com>
> > Cc: rwatson at freebsd.org; freebsd-hackers at freebsd.org
> > Sent: Saturday, November 15, 2008 10:16:31 PM
> > Subject: Re: NET.ISR and CPU utilization performance w/ HP DL 585 using FreeBSD 7.1 Beta2
> >
> > On Sat, Nov 15, 2008 at 04:59:16AM -0800, Won De Erick wrote:
> > > Hello,
> > >
> > > I tested HP DL 585 (16 CPUs, w/ built-in Broadcom NICs) running FreeBSD 7.1 Beta2 under heavy network traffic (TCP).
> > >
> > > SCENARIO A : Bombarded w/ TCP traffic:
> > >
> > > When net.isr.direct=1,
> > >
> > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > > 52 root 1 -68 - 0K 16K CPU11 b 38:43 95.36% irq32: bce1
> > > 51 root 1 -68 - 0K 16K CPU10 a 25:50 85.16% irq31: bce0
> > > 16 root 1 171 ki31 0K 16K RUN a 65:39 15.97% idle: cpu10
> > > 28 root 1 -32 - 0K 16K WAIT 8 12:28 5.18% swi4: clock sio
> > > 15 root 1 171 ki31 0K 16K RUN b 52:46 3.76% idle: cpu11
> > > 45 root 1 -64 - 0K 16K WAIT 7 7:29 1.17% irq17: uhci0
> > > 47 root 1 -64 - 0K 16K WAIT 6 1:11 0.10% irq16: ciss0
> > > 27 root 1 -44 - 0K 16K WAIT 0 28:52 0.00% swi1: net
> > >
> > > When net.isr.direct=0,
> > >
> > > 16 root 1 171 ki31 0K 16K CPU10 a 106:46 92.58% idle: cpu10
> > > 19 root 1 171 ki31 0K 16K CPU7 7 133:37 89.16% idle: cpu7
> > > 27 root 1 -44 - 0K 16K WAIT 0 52:20 76.37% swi1: net
> > > 25 root 1 171 ki31 0K 16K RUN 1 132:30 70.26% idle: cpu1
> > > 26 root 1 171 ki31 0K 16K CPU0 0 111:58 64.36% idle: cpu0
> > > 15 root 1 171 ki31 0K 16K CPU11 b 81:09 57.76% idle: cpu11
> > > 52 root 1 -68 - 0K 16K WAIT b 64:00 42.97% irq32: bce1
> > > 51 root 1 -68 - 0K 16K WAIT a 38:22 12.26% irq31: bce0
> > > 45 root 1 -64 - 0K 16K WAIT 7 11:31 12.06% irq17: uhci0
> > > 47 root 1 -64 - 0K 16K WAIT 6 1:54 3.66% irq16: ciss0
> > > 28 root 1 -32 - 0K 16K WAIT 8 16:01 0.00% swi4: clock sio
> > >
> > > Overall CPU utilization has dropped significantly, but I noticed that swi1: net now runs on CPU0 with high utilization when net.isr.direct=0.
> > > What does this mean?
> > >
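(For reference, the two modes above were switched with the stock net.isr.direct sysctl; a minimal sketch, run as root:
# sysctl net.isr.direct=1   <- direct dispatch: the NIC interrupt/taskq context runs the network stack itself
# sysctl net.isr.direct=0   <- deferred dispatch: packets are queued to the single swi1: net software interrupt thread)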
> > > SCENARIO B : Bombarded w/ more TCP traffic:
> > >
> > > Worse, the box became unresponsive (it cannot be pinged and is inaccessible over SSH) after more traffic was added while keeping net.isr.direct=0.
> > > This is probably due to the 100% utilization of CPU0 by swi1: net (see the first line of the result below). Based on the net.isr.direct=1 result, the bce interrupt threads and swi1 seem to race each other for CPU.
> > > The rest of the CPUs are sitting idle (100% idle). Can you shed some light on this?
> > >
> > > When net.isr.direct=0:
> > > 27 root 1 -44 - 0K 16K CPU0 0 5:45 100.00% swi1: net
> > > 11 root 1 171 ki31 0K 16K CPU15 0 0:00 100.00% idle: cpu15
> > > 13 root 1 171 ki31 0K 16K CPU13 0 0:00 100.00% idle: cpu13
> > > 17 root 1 171 ki31 0K 16K CPU9 0 0:00 100.00% idle: cpu9
> > > 18 root 1 171 ki31 0K 16K CPU8 0 0:00 100.00% idle: cpu8
> > > 21 root 1 171 ki31 0K 16K CPU5 5 146:17 99.17% idle: cpu5
> > > 22 root 1 171 ki31 0K 16K CPU4 4 146:17 99.07% idle: cpu4
> > > 14 root 1 171 ki31 0K 16K CPU12 0 0:00 99.07% idle: cpu12
> > > 16 root 1 171 ki31 0K 16K CPU10 a 109:33 98.88% idle: cpu10
> > > 15 root 1 171 ki31 0K 16K CPU11 b 86:36 93.55% idle: cpu11
> > > 52 root 1 -68 - 0K 16K WAIT b 59:42 13.87% irq32: bce1
> > >
> > > When net.isr.direct=1,
> > > 52 root 1 -68 - 0K 16K CPU11 b 55:04 97.66% irq32: bce1
> > > 51 root 1 -68 - 0K 16K CPU10 a 33:52 73.88% irq31: bce0
> > > 16 root 1 171 ki31 0K 16K RUN a 102:42 26.86% idle: cpu10
> > > 15 root 1 171 ki31 0K 16K RUN b 81:20 3.17% idle: cpu11
> > > 28 root 1 -32 - 0K 16K WAIT e 13:40 0.00% swi4: clock sio
> > >
> > > Regarding bandwidth, the results in all the scenarios above are extremely low (several hundred Mb/s was expected). Why?
>
> The result below applies only to Scenario B above.
>
> > >
> > > - iface Rx Tx Total
> > > ==============================================================================
> > > bce0: 4.69 Mb/s 10.49 Mb/s 15.18 Mb/s
> > > bce1: 20.66 Mb/s 4.68 Mb/s 25.34 Mb/s
> > > lo0: 0.00 b/s 0.00 b/s 0.00 b/s
> > > ------------------------------------------------------------------------------
> > > total: 25.35 Mb/s 15.17 Mb/s 40.52 Mb/s
> > >
> > >
> > > Thanks,
> > >
> > > Won
> >
> > And does this behaviour change if you use some other brand of NIC?
>
> With Intel Pro NIC ( 82571):
>
> When net.isr.direct=1,
>
> 49 root 1 -68 - 0K 16K CPU12 c 6:50 100.00% em0 taskq
> 15 root 1 171 ki31 0K 16K CPU11 b 5:47 100.00% idle: cpu11
> 50 root 1 -68 - 0K 16K CPU13 d 6:15 86.96% em1 taskq
> 25 root 1 171 ki31 0K 16K CPU1 1 9:27 79.79% idle: cpu1
> 28 root 1 -32 - 0K 16K WAIT 1 1:33 22.75% swi4: clock sio
> 13 root 1 171 ki31 0K 16K RUN d 4:14 12.26% idle: cpu13
> 14 root 1 171 ki31 0K 16K RUN c 3:37 0.00% idle: cpu12
>
> The em0 and em1 taskqueues show high CPU utilization, and netstat reports input packet errors.
>
> # netstat -I em0 -w 1 -d
> input (em0) output
> packets errs bytes packets errs bytes colls drops
> 15258 3066 22748316 18468 0 4886567 0 0
> 15461 3096 22783724 18379 0 5350130 0 0
>
>
> When net.isr.direct=0,
> 12 root 1 171 ki31 0K 16K CPU14 e 22:28 100.00% idle: cpu14
> 20 root 1 171 ki31 0K 16K CPU6 6 24:32 97.85% idle: cpu6
> 25 root 1 171 ki31 0K 16K RUN 1 21:51 96.97% idle: cpu1
> 27 root 1 -44 - 0K 16K CPU2 2 5:12 91.55% swi1: net
> 13 root 1 171 ki31 0K 16K CPU13 d 11:04 86.96% idle: cpu13
> 14 root 1 171 ki31 0K 16K CPU12 c 10:51 81.59% idle: cpu12
> 49 root 1 -68 - 0K 16K CPU12 c 13:48 22.17% em0 taskq
> 24 root 1 171 ki31 0K 16K RUN 2 19:16 12.16% idle: cpu2
> 50 root 1 -68 - 0K 16K - d 13:34 11.87% em1 taskq
> 28 root 1 -32 - 0K 16K WAIT 3 3:48 0.00% swi4: clock sio
>
> swi1: net takes high CPU utilization this time, but there are no packet errors:
>
> # netstat -I em0 -w 1 -d
> input (em0) output
> packets errs bytes packets errs bytes colls drops
> 4275 0 5528012 24878 0 24162198 0 0
> 4317 0 5585954 24880 0 24066583 0 0
>
>
> Is this related to context switching in FreeBSD 7.x? I noticed no significant difference between enabling and disabling net.isr.direct on FreeBSD 6.2.
> Also, would enabling device polling make any difference?
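(A rough sketch of how polling could be tried on 7.x, assuming the em driver in use still honours DEVICE_POLLING; I have not tested this on the box yet. The kernel would need
    options DEVICE_POLLING
    options HZ=1000
and then polling is enabled per interface:
# ifconfig em0 polling
# ifconfig em1 polling
# sysctl kern.polling.user_frac=50   <- optional: share of each tick reserved for userland
Context-switch rates under the two net.isr.direct settings can also be compared with "vmstat 1" by watching the cs column.)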
>
> >
> > --
> > | Jeremy Chadwick jdc at parodius.com |
> > | Parodius Networking http://www.parodius.com/ |
> > | UNIX Systems Administrator Mountain View, CA, USA |
> > | Making life hard for others since 1977. PGP: 4BD6C0CB |
>
>
I compiled the following em driver for the Intel Pro (82571) NIC with FreeBSD 7.1 Beta2 on the HP DL 585 machine with 16 CPUs:
http://people.yandex-team.ru/~wawa/
With net.isr.direct=1, I raised the number of RX kernel threads (default = 2) for em0 and em1:
dev.em.0.rx_kthreads: 6
....
dev.em.1.rx_kthreads: 6
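(How these are set depends on this patched driver; whether dev.em.N.rx_kthreads is writable at runtime or only a loader tunable is an assumption on my part. Roughly:
# sysctl dev.em.0.rx_kthreads=6
# sysctl dev.em.1.rx_kthreads=6
or the equivalent dev.em.N.rx_kthreads="6" lines in /boot/loader.conf if it can only be set at boot.)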
With these settings, the result is:
CPU: 0.0% user, 0.0% nice, 57.2% system, 3.6% interrupt, 39.2% idle
Mem: 17M Active, 7228K Inact, 156M Wired, 76K Cache, 21M Buf, 31G Free
Swap: 4096M Total, 4096M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
52 root 1 43 - 0K 16K CPU12 c 41:38 100.00% em0_rx_kthread_1
51 root 1 43 - 0K 16K CPU3 3 41:38 100.00% em0_rx_kthread_0
54 root 1 -68 - 0K 16K CPU2 2 39:39 100.00% em1_txcleaner
1283 root 1 43 - 0K 16K CPU1 1 38:55 100.00% em0_rx_kthread_3
1282 root 1 43 - 0K 16K CPU10 a 38:55 100.00% em0_rx_kthread_2
1344 root 1 43 - 0K 16K CPU9 9 25:51 100.00% em0_rx_kthread_5
1343 root 1 43 - 0K 16K CPU4 4 25:51 100.00% em0_rx_kthread_4
12 root 1 171 ki31 0K 16K CPU14 e 44:28 91.70% idle: cpu14
11 root 1 171 ki31 0K 16K CPU15 f 35:18 76.86% idle: cpu15
19 root 1 171 ki31 0K 16K RUN 7 24:56 70.46% idle: cpu7
20 root 1 171 ki31 0K 16K CPU6 6 35:23 69.38% idle: cpu6
15 root 1 171 ki31 0K 16K CPU11 b 34:33 65.97% idle: cpu11
18 root 1 171 ki31 0K 16K CPU8 8 40:24 64.45% idle: cpu8
13 root 1 171 ki31 0K 16K CPU13 d 42:07 61.96% idle: cpu13
21 root 1 171 ki31 0K 16K CPU5 5 21:35 58.79% idle: cpu5
28 root 1 -32 - 0K 16K WAIT 8 33:23 57.08% swi4: clock sio
25 root 1 171 ki31 0K 16K RUN 1 18:13 50.00% idle: cpu1
1347 root 1 43 - 0K 16K WAIT 5 10:48 44.68% em1_rx_kthread_5
55 root 1 43 - 0K 16K RUN 0 18:46 43.65% em1_rx_kthread_0
56 root 1 43 - 0K 16K WAIT 6 18:50 42.97% em1_rx_kthread_1
1280 root 1 43 - 0K 16K WAIT d 16:59 41.46% em1_rx_kthread_3
1279 root 1 43 - 0K 16K WAIT 7 17:00 41.06% em1_rx_kthread_2
1346 root 1 43 - 0K 16K WAIT b 10:47 40.77% em1_rx_kthread_4
26 root 1 171 ki31 0K 16K RUN 0 19:38 10.79% idle: cpu0
50 root 1 -68 - 0K 16K WAIT f 1:41 3.86% em0_txcleaner
24 root 1 171 ki31 0K 16K RUN 2 30:28 0.00% idle: cpu2
16 root 1 171 ki31 0K 16K RUN a 29:39 0.00% idle: cpu10
17 root 1 171 ki31 0K 16K RUN 9 27:08 0.00% idle: cpu9
14 root 1 171 ki31 0K 16K RUN c 21:58 0.00% idle: cpu12
23 root 1 171 ki31 0K 16K RUN 3 11:36 0.00% idle: cpu3
22 root 1 171 ki31 0K 16K RUN 4 10:24 0.00% idle: cpu4
27 root 1 -44 - 0K 16K WAIT 2 3:04 0.00% swi1: net
I am happy to see that more processors are now being used, but the kthreads are consuming very high CPU.
Is there anything else I can look into and tune to minimize CPU utilization for these threads?
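(Some knobs I intend to look at next; the names are the stock em driver's interrupt-moderation sysctls and loader tunables, and it is an assumption that the patched driver keeps them:
# sysctl dev.em.0.rx_int_delay        <- RX interrupt delay; higher values coalesce more interrupts
# sysctl dev.em.0.rx_abs_int_delay    <- upper bound on how long an RX interrupt can be delayed
# sysctl dev.em.0.rx_processing_limit <- packets handled per RX pass
plus hw.em.rxd / hw.em.txd in /boot/loader.conf for larger descriptor rings.)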
thanks,
won