FreeBSD as PF/Router/Firewall dying on the vine

Michael K. Smith mksmith at adhost.com
Sat Oct 11 21:06:55 UTC 2008


Hello Jeremy:


On 10/6/08 9:30 PM, "Jeremy Chadwick" <koitsu at FreeBSD.org> wrote:

> On Mon, Oct 06, 2008 at 06:08:50PM -0700, Michael K. Smith - Adhost wrote:
>> Hello All:
>> 
>> We have a load-balanced pair of PF boxes sitting in front of a whole bunch of
>> servers doing all manner of things!  It's been working great up until today
>> when it, well, didn't.  Here's what I see in top -S.
>> 
>>   PID USERNAME       THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    14 root             1 -44 -163     0K     8K CPU1   0  44:21 88.18% swi1: net
>>    11 root             1 171   52     0K     8K RUN    0  24:58 53.32% idle: cpu0
>>    10 root             1 171   52     0K     8K RUN    1  17:44 35.50% idle: cpu1
>>    24 root             1 -68 -187     0K     8K *Giant 0   5:30 11.62% irq16: em2 uhci3
>>    23 root             1 -68 -187     0K     8K WAIT   0   1:27  3.08% irq25: em1
>>    25 root             1 -68 -187     0K     8K WAIT   1   1:16  2.64% irq17: em3
>> 
>> This is 6.3 with Intel 1000 Fiber and Copper interfaces, all using the 'em'
>> driver.  Also, there are 15 VLANs configured on one of the NICs for subnet
>> separation.
>> 
>> If anyone has any ideas I'm all ears.  My google-fu is coming up empty with
>> the swi1: net 
> 
> Can you explain what the problem is?

Sorry it took so long to reply.  We actually got the issue resolved, but I
wanted to make sure our fix actually worked.  Here is what the
problem/solution is.

The problem was significant packet loss and connectivity issues to and
through the PF server.  Even pinging the loopback address on the server
itself was returning round-trip times of 4 ms.

The cause was a very busy NFS server with clients on the same VLAN, but on
a different subnet.  We had a VLAN interface on em1 with two address
ranges attached, 10.255.0.0/16 and 10.212.6.0/16.  The NFS server was in the
10.255 range and the clients were in the 10.212 range.

Even though they were on the same VLAN, they couldn't ARP for each other
directly, so all traffic between them (400-600 Mb/sec) had to be routed
through the PF server.  When we moved the clients onto the same subnet as
the NFS server, everything stabilized.
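
To illustrate the forwarding decision each host makes, here is a minimal
Python sketch.  It is not taken from our setup: the host addresses and the
needs_router() helper are made up for the example, and I'm treating both
ranges as /16s as described above.

```python
import ipaddress


def needs_router(src: str, dst: str, prefix_len: int) -> bool:
    """Return True if dst lies outside src's subnet.

    A host only ARPs directly for destinations inside its own subnet;
    anything else is sent to its default gateway (here, the PF box).
    """
    # ip_interface() accepts a host address plus prefix length and
    # derives the containing network from it.
    src_net = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    return ipaddress.ip_address(dst) not in src_net


# Hypothetical host addresses, chosen only to match the two ranges.
nfs_server = "10.255.0.10"
old_client = "10.212.6.20"   # client in the other range
new_client = "10.255.0.20"   # client after moving to the server's subnet

print(needs_router(old_client, nfs_server, 16))  # True: goes via the PF box
print(needs_router(new_client, nfs_server, 16))  # False: delivered directly
```

With the clients in the other range, every NFS packet crossed the firewall
twice on the same interface.  Once they shared the server's subnet, the
switch carried that traffic and the PF box never saw it.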

I think this was an issue of bad design on my part.

Regards,

Mike


