Traffic mysteriously dropping

Mon Apr 3 21:13:05 UTC 2006

Greg Hennessy wrote:

>>All of the block rules also use log.  A lot of the pass rules do
>>not, because it generates such enormous logs.  I can try enabling
>>logging on every rule temporarily in order to troubleshoot this if
>>necessary.
>
>I would; you need to see exactly what is matching a flow.
>
>>Yes, if I tcpdump on em0, pflog0, and em1 simultaneously 
>>during a traffic test, the traffic hits em0, and never shows 
>>up as blocked in pflog0 and never shows up at all on em1.  As 
>>I stated, it's only 1 out of a bunch of connections, so there 
>>is no rule blocking all the traffic.
>
>Hmmm, are you using route-to or the like in the policy? If it's not going
>out the interface you expect, it may be going out through another.
>Time to tcpdump on everything, including localhost, to be sure.
>
>Silly question: are jumbo frames enabled on one of the endpoints, or are you
>running stock-sized Ethernet framing everywhere?
>
>Has the firewall ever done transparent web caching?
>
>Does the traffic route successfully if you disable pf with pfctl -d? That
>should quickly determine whether it's a routing or a firewall issue.
>
I am not using anything advanced like route-to.  The frames are the
standard size, and this firewall is not doing any kind of web caching,
since the network behind it is mostly just mail servers and a few web
servers.  Unfortunately, since this is a production firewall, I can't
really disable pf in this scenario.  I have done a simultaneous tcpdump
on all the network interfaces, plus pflog0 and lo0.  It showed pretty
much what I expected: the traffic hits the outside interface, never
shows up in pflog0, and never passes through.  With pf debugging
enabled, I have also found that it logs state mismatches when this
happens.
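
For anyone retracing this check: `pfctl -s info` prints a cumulative
state-mismatch counter, so comparing it before and after a traffic test
is one way to confirm what the debug log shows. This is only a sketch.
The helper name `state_mismatch_count` is made up for illustration, and
the commented commands are an assumed workflow, not necessarily the one
used in this thread.

```sh
# Print the cumulative state-mismatch counter from `pfctl -s info`
# output supplied on stdin. (Hypothetical helper, not part of pf.)
state_mismatch_count() {
    awk '$1 == "state-mismatch" { print $2 }'
}

# Assumed workflow on the firewall, as root:
#   pfctl -x misc                          # raise pf's debug level
#   pfctl -s info | state_mismatch_count   # note the value before the test
#   ... run the traffic test ...
#   pfctl -s info | state_mismatch_count   # compare the value afterwards
#   pfctl -x urgent                        # lower the debug level again
```

If the counter climbs during the test while pflog0 stays quiet, the
drops are most likely coming from state tracking rather than from a
block rule.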

>>>Using interface groups without directionality means that a single rule
>>>will match the flow on both the ingress and egress interfaces.
>>>
>>>Combined with antispoof, it makes for a simpler policy.
>>>
>>I have coded the rule as explained above, and even as the
>>first rule after the default block rule, it still drops
>>traffic.  If I change it to non-stateful, it doesn't drop the
>>connections.  I can't seem to get away from the thought of a
>>state mismatch; however, I don't know why it would
>>consistently happen on these HTTP connections.
>
>Hmmm, possibly something strange with the stack on the endpoints. 
>
>Are you using scrub in the policy?
>
I am using scrub in the policy; however, as I've detailed above, this
doesn't play a role.

>>>What other blocks are in the policy?
>>>
>>I don't believe I'm doing any specific blocks.  Just the
>>default block and then allow what we need after that.
>
>Time to do a quick grep to be completely sure; it's easy to miss one by just
>reading through a policy that large.
>
I have logging enabled in all the rules, so a block will be logged no
matter which rule causes it.  I think I have actually found the problem.
It is the state mismatch, and it happens because in our test scenario
all the requests are coming from 1 client.  There is a thread about this
at http://www.benzedrine.cx/pf/msg07505.html.  If I lower the tcp.closed
timeout on the rule that passes this traffic, it seems to minimize the
problem.  Thanks for all the help, everyone.

Chris
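
A minimal sketch of the workaround described above, assuming pf's
per-rule state option syntax `keep state (tcp.closed N)`. The function
name `closed_timeout_rule`, the `web_servers` macro, the 192.0.2.x
addresses, port 80, and the 10-second value are all placeholders, and
em0 is taken to be the outside interface, as in the tcpdump test. The
function only prints a candidate fragment; nothing is loaded into pf.

```sh
# Print a candidate pf.conf fragment whose pass rule overrides the
# tcp.closed timeout for the states it creates. (Hypothetical helper;
# the macro, addresses, and port are illustrative only.)
closed_timeout_rule() {
    # $1: tcp.closed timeout in seconds
    cat <<EOF
web_servers = "{ 192.0.2.10, 192.0.2.11 }"
pass in on em0 proto tcp from any to \$web_servers port 80 flags S/SA keep state (tcp.closed $1)
EOF
}

# Syntax-check the fragment without loading it (-n parses only), as root:
#   closed_timeout_rule 10 | pfctl -n -f -
```

Lowering tcp.closed shortens how long pf retains state for a connection
that has already ended, which should narrow the window in which a single
busy client can reuse the same source port against a stale entry. That
reading is an interpretation of the linked thread, not something
confirmed here.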