SMP NAT

Robert Watson rwatson at FreeBSD.org
Thu Mar 2 17:31:08 PST 2006


On Thu, 2 Mar 2006, Iasen Kostov wrote:

> I'm now using a MP system (dual opteron) to do NAT for about 1500 clients at 
> once at speed above 200Mbit/sec full-duplex (e.g about 400Mbit/sec) and I'm 
> using PF to do the NAT. Bad thing is that the second CPU is idle. As I can 
> see from top - about 50% of the cpu is used by irq handler for the ethernet 
> adapter (irq27: bge0 bge1 - I'm using only bge0 to route via VLANs) and 
> about 30% by the network interrupt handler. I guess that the swi1:net is 
> handling the NAT (via PF) and if swi1 and irq27 are in different handlers 
> why they don't get executed on different CPUs (second CPU is 98% idle and 
> top show that both handlers run on same CPU). Aren't both handlers in 
> different kernel threads ? If they are not - is it possible to be in 
> different threads on different CPUs ?

In general, yes -- I frequently look at top -S and see the ithreads running on 
different CPUs from each other.  As you surmise, the hardware ithread is 
handling the hardware interrupts up through link layer processing, and then 
the netisr is doing the IP layer processing, including NAT.  On recent FreeBSD,
if a second CPU is idle, we will generally wake up the netisr on that idle
CPU.  However, that's a property of the scheduler, and the details of how that
happens vary a bit by FreeBSD version; you don't say which version you're
using.  It's also worth keeping in mind that if you have idle CPU time on your
first CPU even with both threads going as fast as the hardware is driving
them, it's not necessarily "better" to be running the two tasks on different
CPUs, for reasons of memory caching -- i.e., when both run on the same CPU,
the netisr finds the mbufs already in that CPU's cache from the interrupt
handler, whereas a second CPU would have to take cache misses and read the
packets in from memory a second time.

So a few questions:

(1) What version of FreeBSD are you running?

(2) Is your network stack running MPSAFE?  "sysctl debug.mpsafenet" will
     return either 0 or 1.  If you're running with certain network features,
     such as the KAME IPSEC stack, you may be running with a single-processor
     network stack.

(3) Are you using SCHED_4BSD, or have you changed to SCHED_ULE?

(4) Are you running with PREEMPTION compiled into the kernel?  When a thread,
     such as the netisr, is preempted by a hardware ithread, it won't
     necessarily be bounced to the other CPU immediately.
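
If it helps to gather the answers to (1)-(3) in one go, something along these
lines might do as a rough sketch.  It assumes sysctl's usual "name: value"
output and that kern.osrelease and kern.sched.name are present on your
release; the function names are made up for illustration.  It does not cover
(4), since PREEMPTION is a kernel configuration option you would check in
your kernel config file.

```python
import subprocess

# debug.mpsafenet is the sysctl named in question (2); kern.osrelease and
# kern.sched.name are assumed to exist on the release being examined.
SYSCTLS = ["kern.osrelease", "debug.mpsafenet", "kern.sched.name"]


def parse_sysctl(text):
    """Turn 'name: value' lines (sysctl's default output) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def summarize(values):
    """Answer questions (1)-(3) from parsed sysctl values."""
    mpsafe = values.get("debug.mpsafenet")
    return {
        "version": values.get("kern.osrelease", "unknown"),
        # None means the sysctl was not reported at all.
        "mpsafe_netstack": None if mpsafe is None else mpsafe == "1",
        "scheduler": values.get("kern.sched.name", "unknown"),
    }


def collect():
    """Run sysctl locally; only meaningful on a FreeBSD machine."""
    result = subprocess.run(
        ["sysctl"] + SYSCTLS, capture_output=True, text=True, check=False
    )
    return summarize(parse_sysctl(result.stdout))
```

Calling collect() on the NAT box and pasting the resulting dictionary into a
reply would cover the first three questions; top -S is still the way to see
which CPU each ithread is actually running on.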

Robert N M Watson

