pf performance?

Kajetan Staszkiewicz vegeta at tuxpowered.net
Tue Apr 23 23:34:26 UTC 2013


On Tuesday, 23 April 2013 at 21:49:21, Erich Weiler wrote:
> Hello all,
> 
> I have a question here about how FreeBSD (8.1-RELEASE-p13 specifically)
> behaves when acting as a firewall.  I understand the pf process is
> "giant locked" to a single CPU core when inspecting packets inbound and
> outbound.  I was wondering, how does that manifest when I look at "top
> -P" on the firewall?
> 
> Right now I have a dual port Myricom 10G NIC (packets inbound on one
> interface and outbound on the other), and the mxge driver is
> "multiplexing" interrupt processing across all the CPU cores for speed.
>   So, when the firewall is busy, I see all the cpu cores quite busy
> processing interrupts (like 70% or more CPU utilization).  But, all CPU
> work seems to be in interrupts.  I don't see anything, or *very* little,
> in system or user space for CPU utilization.  Should the pf process be
> using some CPU too?  If so, how could I tell that?  I'm trying to figure
> out if I'm limited by not having enough CPU to process the interrupts or
> not enough CPU to process the packet filtering process.  Right now it
> looks like interrupts but I'm not sure.

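As far as I understand, pf processes packets inside the interrupt handler of
the receiving network card, and that handler can carry the packet all the way
to transmission on the other network card (at least in my case, where I use
route-to targets, which do the routing inside pf). If that is right, pf's work
is accounted as interrupt time in "top -P", not as system or user time.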

> The Myricom folks looked at our debugging info on the mxge driver and
> say that based on what they see, mxge is dropping packets because the
> host cannot pull packets out of the NIC buffer fast enough.  The host is
> using a four core Xeon X5677 3.46GHz CPU.  We're processing 140,000
> packets per second or so, and I see rates up to several gigabits per
> second, but all my research seems to indicate it can do better than
> that, and that we should not be dropping packets.  Or maybe the question
> is: why doesn't the host pull the packets from the NIC fast enough?  Is
> the CPU tied up doing something else?  Interrupts?

As for my own performance issues, at first I noticed that some cores were
always overloaded while others were doing nothing. So I did the following
tuning:
- disabled HT (Hyper-Threading) on the CPUs
- switched netisr to deferred dispatch and assigned no NIC interrupts to the
  cores used by netisr
- gave each core only one interrupt
But this applies to NICs with just a single interrupt each (I have netisr on
cpu0 and cpu1, one NIC on cpu3, one NIC on cpu4). It might not help when you
have NICs that can load all cores.
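
The pinning scheme above can be sketched in Python as a small planner. This is
only an illustration: the IRQ numbers are hypothetical (on a real box they
would come from "vmstat -i"), the function names are made up for this example,
and the printed lines assume the "cpuset -l <cpu> -x <irq>" form described in
cpuset(1). It does not reproduce my exact core layout.

```python
def plan_irq_affinity(irqs, n_cores, netisr_cores):
    """Give every IRQ a core of its own, skipping cores reserved for netisr."""
    free = [c for c in range(n_cores) if c not in netisr_cores]
    if len(irqs) > len(free):
        raise ValueError("more interrupts than free cores")
    # zip() pairs the first IRQ with the first free core, and so on,
    # so no core ever receives more than one interrupt.
    return dict(zip(irqs, free))


def cpuset_commands(plan):
    """Render the plan as cpuset(1) command lines (assumed syntax)."""
    return [f"cpuset -l {cpu} -x {irq}" for irq, cpu in sorted(plan.items())]


# Hypothetical example: 4 cores, netisr threads on cores 0 and 1,
# two single-interrupt NICs on made-up IRQs 256 and 257.
plan = plan_irq_affinity([256, 257], n_cores=4, netisr_cores={0, 1})
for cmd in cpuset_commands(plan):
    print(cmd)
```

The planner refuses to double up interrupts on one core, which is the point of
the third item in the list; with multi-queue NICs such as mxge there may simply
be more interrupts than free cores.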

Some more tips:
- use interrupt coalescing; if you already do, tune it to be more aggressive
- create states on *both* sides of your firewall; for me this lowered loadavg
  2-3 times on a machine with around 400 rules.
- keep the number of states low; I was surprised how many states were hanging
  in the "closing" state, which has quite a long default timeout.

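To see how many states sit in each TCP state, the output of "pfctl -ss" can be
tallied. The following Python sketch assumes that each state line ends with a
"SRCSTATE:DSTSTATE" field; the sample lines are invented for illustration and
are not taken from a real firewall. If one state dominates, the matching
timeout (for example tcp.closing, see pf.conf(5)) is a candidate for tuning.

```python
import re
from collections import Counter

# Assumed shape of the last field on a state line, e.g. FIN_WAIT_2:FIN_WAIT_2
STATE_PAIR = re.compile(r"[A-Z_0-9]+:[A-Z_0-9]+")


def tally_states(pfctl_ss_output):
    """Count state entries by their trailing SRCSTATE:DSTSTATE field."""
    counts = Counter()
    for line in pfctl_ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not end in a state pair.
        if fields and STATE_PAIR.fullmatch(fields[-1]):
            counts[fields[-1]] += 1
    return counts


# Invented sample input using documentation address ranges.
SAMPLE = """\
all tcp 192.0.2.10:80 <- 198.51.100.7:51234       ESTABLISHED:ESTABLISHED
all tcp 192.0.2.10:80 <- 198.51.100.8:40022       FIN_WAIT_2:FIN_WAIT_2
all tcp 192.0.2.10:80 <- 198.51.100.9:40101       FIN_WAIT_2:FIN_WAIT_2
all udp 192.0.2.53:53 <- 198.51.100.7:3345       MULTIPLE:SINGLE
"""

for state, n in tally_states(SAMPLE).most_common():
    print(f"{n:6d}  {state}")
```
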
How do you count the 140 kpps value? On one interface or on both, inbound or
outbound? I'd like to relate this somehow to my own values.
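
To show why the counting method matters, here is a small Python sketch that
turns two snapshots of per-interface packet counters into rates. The interface
names and counter values are made up, and it assumes the counters are the
cumulative input/output packet counts that "netstat -ibn" reports per
interface. Depending on whether one counts a single direction on one
interface or sums everything, the same traffic gives very different "pps"
figures.

```python
def packet_rates(before, after, interval_s):
    """Per-interface in/out packet rates from two counter snapshots.

    before/after map interface name -> (input_packets, output_packets).
    """
    rates = {}
    for ifname, (in0, out0) in before.items():
        in1, out1 = after[ifname]
        rates[ifname] = {
            "in": (in1 - in0) / interval_s,
            "out": (out1 - out0) / interval_s,
        }
    return rates


# Hypothetical snapshots taken 10 seconds apart on a two-port firewall.
before = {"mxge0": (1_000_000, 500_000), "mxge1": (500_000, 1_000_000)}
after = {"mxge0": (2_400_000, 1_200_000), "mxge1": (1_200_000, 2_400_000)}

rates = packet_rates(before, after, interval_s=10)
for ifname, r in rates.items():
    print(f"{ifname}: in {r['in']:.0f} pps, out {r['out']:.0f} pps")

# Three ways of reporting the same traffic:
one_if_in = rates["mxge0"]["in"]                        # one interface, inbound
all_in = sum(r["in"] for r in rates.values())           # every received packet
all_dirs = sum(r["in"] + r["out"] for r in rates.values())  # in + out everywhere
print(one_if_in, all_in, all_dirs)
```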

-- 
| pozdrawiam / greetings | powered by Debian, CentOS and FreeBSD |
|  Kajetan Staszkiewicz  | jabber,email: vegeta()tuxpowered net  |
|        Vegeta          | www: http://vegeta.tuxpowered.net     |
`------------------------^---------------------------------------'

