FreeBSD IP forwarding performance (question, and some info) [7-stable, current, em, smp]

Paul paul at gtcomm.net
Tue Jul 1 22:47:29 UTC 2008


OK, now THIS is absolutely ridiculous..
I set up etherchannel, and I'm evenly distributing packets over em0, em1,
and em2 into lagg0, and I get WORSE performance than with a single
interface.. Can anyone explain this one? This is horrible.
The em0-em2 taskqs are each using 80% CPU and each is only doing 100 kpps.
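
For anyone who wants to reproduce this, the lagg setup itself is nothing
exotic; a minimal version looks like the following (the address is just a
placeholder, and these are the generic commands rather than a paste of my
exact config):

  kldload if_lagg                 # only needed if lagg(4) isn't in the kernel
  ifconfig lagg0 create
  ifconfig lagg0 laggproto loadbalance laggport em0 laggport em1 laggport em2
  ifconfig lagg0 inet 10.0.0.1/24 up

Note the loadbalance protocol only hashes *outgoing* traffic across the
member ports; in this test the incoming packets are being spread across
em0-em2 by the traffic source.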

Here's what it looks like:

           input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    105050 11066    6303000          0     0          0     0
    104952 13969    6297120          0     0          0     0
    104331 12121    6259860          0     0          0     0

           input          (em1)           output
   packets  errs      bytes    packets  errs      bytes colls
    103734 70658    6223998          0     0          0     0
    103483 75703    6209046          0     0          0     0
    103848 76195    6230886          0     0          0     0


           input          (em2)           output
   packets  errs      bytes    packets  errs      bytes colls
    103299 62957    6197940          1     0        226     0
    106388 73071    6383280          1     0        178     0
    104503 70573    6270180          4     0        712     0

last pid:  1378;  load averages:  2.31,  1.28,  0.57    up 0+00:06:27  17:42:32
68 processes:  8 running, 42 sleeping, 18 waiting
CPU:  0.0% user,  0.0% nice, 58.9% system,  0.0% interrupt, 41.1% idle
Mem: 7980K Active, 5932K Inact, 47M Wired, 16K Cache, 8512K Buf, 1920M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root     171 ki31     0K    16K RUN    2   5:18 80.47% idle: cpu2
   38 root     -68    -     0K    16K CPU3   3   2:30 80.18% em2 taskq
   37 root     -68    -     0K    16K CPU1   1   2:28 76.90% em1 taskq
   36 root     -68    -     0K    16K CPU2   2   2:28 72.56% em0 taskq
   13 root     171 ki31     0K    16K RUN    0   3:32 29.20% idle: cpu0
   12 root     171 ki31     0K    16K RUN    1   3:29 27.88% idle: cpu1
   10 root     171 ki31     0K    16K RUN    3   3:21 25.63% idle: cpu3
   39 root     -68    -     0K    16K -      3   0:32 17.68% em3 taskq


See, that's totally wrong.. something is very wrong here. Does anyone
have any ideas? I really need to get this working.
I figured that evenly distributing the packets over three interfaces would
simulate having three RX queues, since each interface gets its own taskq
thread, and the result is WAY more CPU usage and a little over half the pps
throughput I get with a single port..
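
In case it helps anyone reproduce or poke at this: the per-port numbers
above are one-second samples per interface, and the two sysctls at the end
are the knobs usually suggested for pure forwarding loads on 7.x; I'm
listing them for completeness, not claiming they change the picture here:

  netstat -w1 -I em0                   # per-interface pps and input errors
  vmstat -i                            # interrupt rates per device
  top -S                               # shows the em taskq kernel threads
  sysctl net.inet.ip.fastforwarding=1  # optimized IP forwarding path
  sysctl net.isr.direct=1              # handle inbound packets in the receiving
                                       # thread instead of queueing to netisr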

If anyone is interested in tackling some of these issues, please e-mail me.
It would be greatly appreciated.


Paul



Julian Elischer wrote:
> Paul wrote:
>> ULE without PREEMPTION is now yielding better results.
>>         input          (em0)           output
>>   packets  errs      bytes    packets  errs      bytes colls
>>    571595 40639   34564108          1     0        226     0
>>    577892 48865   34941908          1     0        178     0
>>    545240 84744   32966404          1     0        178     0
>>    587661 44691   35534512          1     0        178     0
>>    587839 38073   35544904          1     0        178     0
>>    587787 43556   35540360          1     0        178     0
>>    540786 39492   32712746          1     0        178     0
>>    572071 55797   34595650          1     0        178     0
>>  
>> *OUCH, IPFW HURTS..
>> Loading ipfw and adding one rule ("allow ip from any to any") drops
>> 100 kpps :/ what's up with THAT?
>> Unloading the ipfw module gets the 100 kpps back again; that's not right
>> with ONE rule.. :/
>
> ipfw needs to gain a lock on the firewall before running,
> and is quite complex..  I can believe it..
>
> In FreeBSD 4.8 I was able to use ipfw and filter 1Gb between two
> interfaces (bridged), but I think it has slowed down since then due to
> the SMP locking.
>
>
>>
>> em0 taskq is still jumping between CPUs.. is there any way to lock it to
>> one CPU, or is this just a function of ULE?
>>
>> Running a tar czpvf all.tgz * and seeing if pps changes..
>> negligible.. guess the scheduler is doing its job at least..
>>
>> Hmm, even when it's getting 50-60k errors per second on the interface
>> I can still scp a file through that interface, although it's not
>> fast.. 3-4 MB/s..
>>
>> You know, I wouldn't care if it added 5 ms of latency to the packets when
>> it was doing 1 Mpps, as long as it didn't drop any.. Why can't it do
>> that? Queue them up and process them in big chunks so none are
>> dropped.. hmm?
>>
>> The 32-bit system is compiling now.. it won't do > 400 kpps with a GENERIC
>> kernel, whereas the 64-bit one did 450 kpps with GENERIC, although that could
>> be the difference between an Opteron 270 and an Opteron 2212..
>>
>> Paul
>>
>
>
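
PS: for anyone who wants to repeat the ipfw data point quoted above, the
one-rule test boils down to this (the rule number is arbitrary):

  kldload ipfw                              # default policy is deny, so add
  ipfw add 65000 allow ip from any to any   # the pass-everything rule right away
  # ...watch pps with netstat -w1, then unload...
  kldunload ipfw

With the module loaded and just that one rule, forwarding was roughly
100 kpps slower than with ipfw unloaded.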


