CARP performance tuning question.

pluknet pluknet at gmail.com
Thu Nov 6 04:30:51 PST 2008


2008/11/6 Peter Jeremy <peterjeremy at optushome.com.au>:
> Whilst I don't doubt that you have a problem, your comments don't
> correlate particularly well with the data you have provided and
> this makes it difficult to immediately suggest a solution.
>
> On 2008-Nov-05 16:40:32 +0300, pluknet <pluknet at gmail.com> wrote:
>>At work we use device carp(4) under high load:
>
> carp(4) is solely a failover mechanism.  It either generates or receives
> somewhat under 1pps per carp interface and the state it maintains is
> basically 'master' or 'backup'.  I suspect the 'load' is being caused
> by pf(4), possibly in conjunction with pfsync(4).
>
>>The problem is that the server's interactivity becomes bad (from about
>>70k states, and very bad from 120-150k states),
>>i.e. when the network workload (and the interrupt count) begins to increase.
>>
>>From top(1):
>>CPU states:  0.0% user,  0.0% nice,  0.4% system, 76.3% interrupt, 23.3% idle
>>  PID USERNAME        THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>  13 root              1 -44 -163     0K     8K WAIT   407:43 57.86% swi1: net
>
> I agree that swi1 is using a significant amount of CPU but top is
> still reporting >23% idle so you shouldn't be getting poor interactive
> performance.
>
>>At the moment, 'pfctl -s info' shows these numbers:
>>
>>State Table                          Total             Rate
>>  current entries                   153972
>>  searches                      6052078938         4800.8/s
>>  inserts                        120373545           95.5/s
>>  removals                       120219573           95.4/s
>
> That shows the load on pf(4) but doesn't really reflect what the
> system is doing as a whole.
>
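
As a quick sanity check on the counters quoted above, here is a minimal
sketch of the arithmetic they imply. The numbers are copied from the
excerpt. The reading of the Rate column as a lifetime average (total
divided by the time pf has been running) is an assumption, and the
variable names are only illustrative.

```python
# Counters copied from the 'pfctl -s info' excerpt quoted above.
current_entries = 153972
searches = 6052078938
inserts = 120373545
removals = 120219573
search_rate = 4800.8  # searches per second, as printed in the Rate column

# States still in the table should equal inserts minus removals.
net_entries = inserts - removals

# Assumption: Rate is a lifetime average, so total / rate estimates
# how long the counters have been accumulating.
running_seconds = searches / search_rate

# Average number of state lookups for every state that was created.
searches_per_insert = searches / inserts

print("net entries match current entries:", net_entries == current_entries)
print("approx. days of counting:", round(running_seconds / 86400, 1))
print("searches per inserted state:", round(searches_per_insert, 1))
```

The first line printed confirms that the excerpt is internally
consistent. The other two are averages over the whole counting period,
so they say little about the load at peak time.
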
>>It currently runs a UP kernel, but could be rebuilt for SMP
>>(Xeon 5130) if that helps.
>
> Unfortunately, I don't know if this will help or not because I'm not
> sure what bottleneck you are hitting.
>
>>Can someone give hints on how to decrease the interrupt count and
>>improve the server's stability?
>
> Well, you haven't actually reported what the interrupt count is or
> what instability you are seeing, so this is a bit difficult.
>
> Can you please provide some more information:
> - output from 'uname -a'
> - output from 'vmstat -i; sleep 10; vmstat -i' under load
> - output from 'netstat -i'
> - 10-15 seconds of output from 'netstat -i 1' under load
> - What is the box doing? Is it a straight filtering router?  Does it
>  handle NAT?  Is it running apps itself (e.g. web, ftp, mail)?
> - What speed are the interface(s) running at?
> - What instability problems are you seeing?
> - Please provide more details on what you mean by 'bad interactivity'.
> - How complex is your pf ruleset?  How many rules?  Anything unusual?
> - What scheduler are you using?
> - What is the full output of 'pfctl -s info'?
>
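
For the 'vmstat -i; sleep 10; vmstat -i' item, a minimal sketch of how
the two snapshots could be turned into per-second interrupt rates for
the interval under load. It assumes the usual three-column layout
(source name, total, rate). The sample text below is made up purely for
illustration and is not output from this server, and the function names
are hypothetical.

```python
def parse_vmstat_i(text):
    """Map each interrupt source to its cumulative total."""
    totals = {}
    for line in text.splitlines():
        # Source names may contain spaces, so split from the right.
        parts = line.rsplit(None, 2)
        if len(parts) == 3 and parts[1].isdigit():
            totals[parts[0].strip()] = int(parts[1])
    return totals


def interrupt_rates(before_text, after_text, interval):
    """Interrupts per second for each source between two snapshots."""
    before = parse_vmstat_i(before_text)
    after = parse_vmstat_i(after_text)
    return {
        name: (total - before.get(name, 0)) / interval
        for name, total in after.items()
    }


# Made-up sample snapshots, taken 10 seconds apart.
BEFORE = """\
interrupt                          total       rate
irq16: em0                       1000000        950
cpu0: timer                      2000000       1999
Total                            3000000       2949
"""
AFTER = """\
interrupt                          total       rate
irq16: em0                       1085000        951
cpu0: timer                      2020000       1999
Total                            3105000       2950
"""

rates = interrupt_rates(BEFORE, AFTER, 10)
for name, rate in rates.items():
    print(f"{name:20s} {rate:10.1f}/s")
```

The rate column that vmstat prints is averaged since boot, so the
difference between two snapshots is what reflects the load during the
interval.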

Thanks for your answer. Please ignore my premature mail:
the problem needs a bit more analysis first.

-- 
wbr,
pluknet

