Lots of weird PF behavior on 7.2-STABLE

Ermal Luçi eri at freebsd.org
Tue Dec 15 01:56:19 PST 2009


On Tue, Dec 15, 2009 at 7:21 AM, Linda Messerschmidt <
linda.messerschmidt at gmail.com> wrote:

> Hi all,
>
> I have a PF machine that is giving me fits.  I see a lot of weird behavior.
>
> 1) TCP connections (mainly port 80) sometimes take 3 seconds to get
> started instead of being virtually instant.
> 2) Sometimes HTTP connections just stop responding.  (Client program
> times out waiting for response.)
> 3) Sometimes connections get weirdly dropped ("Connection reset by peer.")
> 4) Sometimes if I am ssh'd through the firewall, something will happen
> and my inbound packets will start getting dropped, but outbound
> packets still pass.  For example, if I'm at the shell prompt, it is
> non-responsive.  But if I log in alongside a stuck connection and "write"
> to that tty, I will see it no problem.
> 5) States that have no right to still be there continue to pile up
> into the hundreds of thousands.
>
> I kind of get the feeling that all of these are related.  In
> particular, I think 2, 3, and 4.
>
> Of all of these, the only one I can document at the moment is #3.
>
> Here is a packet capture from the public (web client) interface:
>
> 20:00:02.038067 IP 1.2.3.4.61645 > 5.6.7.8.80: S 620577087:620577087(0) win 65535 <mss 1460,nop,wscale 9,sackOK,timestamp 953726452 0>
> 20:00:02.038328 IP 5.6.7.8.80 > 1.2.3.4.61645: S 40565958:40565958(0) ack 620577088 win 0 <mss 1460>
> 20:00:02.065678 IP 1.2.3.4.61645 > 5.6.7.8.80: . ack 1 win 65535
> 20:00:02.095158 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:02.378248 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:02.746163 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:03.282122 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:04.154112 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:05.698002 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:07.913721 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:12.145438 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:12.287038 IP 5.6.7.8.80 > 1.2.3.4.61645: F 1:1(0) ack 1 win 65535
> 20:00:20.408734 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:20.409874 IP 5.6.7.8.80 > 1.2.3.4.61645: R 40565959:40565959(0) win 0
>
> Here is a packet capture of the same session from the private (web
> server) interface:
>
> 20:00:02.038089 IP 1.2.3.4.61645 > 5.6.7.8.80: S 620577087:620577087(0) win 65535 <mss 1460,nop,wscale 9,sackOK,timestamp 953726452 0>
> 20:00:02.038311 IP 5.6.7.8.80 > 1.2.3.4.61645: S 40565958:40565958(0) ack 620577088 win 0 <mss 1460>
> 20:00:02.065694 IP 1.2.3.4.61645 > 5.6.7.8.80: . ack 1 win 65535
> 20:00:12.287026 IP 5.6.7.8.80 > 1.2.3.4.61645: F 1:1(0) ack 1 win 65535
> 20:00:20.408747 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:20.409859 IP 5.6.7.8.80 > 1.2.3.4.61645: R 40565959:40565959(0) win 0
>
> So that client -> server push packet is not making it through the
> firewall despite numerous retransmits, until 18 seconds later when the
> server has already given up on it.
>
> That connection hangs around in the state table for a long time as:
>
> all tcp 5.6.7.8:80 <- 1.2.3.4:61645       CLOSED:CLOSING
>
> This despite:
>
> set timeout tcp.closed 5
> set timeout tcp.closing 30
>
> To test, I stopped connections from 1.2.3.4 to 5.6.7.8.  At present,
> there are *zero* established connections between 1.2.3.4 and 5.6.7.8.
> None.  But:
>
> $ sudo pfctl -s state | fgrep 1.2.3.4: | fgrep :80 | wc
>    2243   13458  160932
>
> A few minutes later I broke this down by connection status:
> 1222 CLOSED:CLOSING
>  556 ESTABLISHED:ESTABLISHED
>  15 FIN_WAIT_2:CLOSING
>  27 SYN_SENT:FIN_WAIT_2
>
> That doesn't add up to 2243, so they *are* slowly dying off.  I did
> some poking around, and the CLOSED:CLOSING ones expire after fifteen
> minutes, which is the timeout for tcp.opening.  Um, OK.
>
> The 556 ESTABLISHED:ESTABLISHED states appear content to persist until
> they age off too, even though those connections are long gone.
>
> As for the "3 second" thing, I noticed somebody here recently had a
> similar problem and made it go away by upping their states and
> dropping their timeouts.  Well, he dropped his timeouts to where ours
> are, and we're at:
>
> set limit states 2000000
>
> We are definitely not out of states; we're seeing these problems right
> now and due to my playing around with the tcp.established timeout,
> we're at 66412 states right now.  (Ordinarily it hovers around
> 350,000.)  The machine is a dual-core Core 2 6320 with 2GB of RAM and
> nothing to do but load balance this traffic.  It shows as 95% idle all
> day.
>
> So sometimes pf loses packets related to connections that are still
> around, and sometimes it thinks connections are still around long
> after the packets are gone.
>
> I would be really, really grateful for any suggestions or help.  I am
> completely lost here and at my wits' end!
>
> I've included my pf.conf below.
>
>
>
> --------------------------------------------------------------------------------------------
>
> set limit states 2000000
> set timeout tcp.established 86400
> set timeout tcp.closed 5
> set timeout tcp.closing 30
>
> ExtIf = "em0"
> IntIf = "em1"
>
> table <NoRouteIPs> { 127.0.0.0/8, 169.254.0.0/16, 192.0.2.0/24,
> 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8 }
> table <OurIPs> { ... }
> table <DNSServers> { ... }
> table <BalanceBlocks> { ... }
>
> scrub
>
> ##  Block Reserved Addresses
> block log quick on $ExtIf from <NoRouteIPs> to any
> block log quick on $ExtIf from any to <NoRouteIPs>
>
> ##  Block our own Addresses
> block in log quick on $ExtIf inet from <OurIPs> to any
>
> ##  Anti-DDOS
> table <AntiDDOS> persist
> block quick from <AntiDDOS> to any
> block quick from any to <AntiDDOS>
>
> ##  Block HTTP traffic to DNS servers
> block quick inet proto tcp from any to <DNSServers> port 80
>
> ##  Weird DNS people added 2009-06-18
> block drop log quick proto 255
> table <GTExperimentDNS> { 61.220.4.0/24 }
> block drop in quick proto { udp, tcp } from <GTExperimentDNS> to any port 53
>
> ## Load Balancing
> pass in on $ExtIf route-to { ($IntIf 3.4.5.6), ($IntIf 3.4.5.7), \
>     ($IntIf 3.4.5.8), ($IntIf 3.4.5.9) } round-robin \
>     proto tcp from any to <BalanceBlocks> port 80
>
Try enabling sticky connections here.
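
As a rough sketch of what that might look like (assuming the
sticky-address pool option is what fits here, and reusing the interface
macros and the placeholder gateway addresses from the rule quoted above),
the load-balancing rule could become:

```
## Load Balancing (sketch only: same rule with sticky-address added)
pass in on $ExtIf route-to { ($IntIf 3.4.5.6), ($IntIf 3.4.5.7), \
    ($IntIf 3.4.5.8), ($IntIf 3.4.5.9) } round-robin sticky-address \
    proto tcp from any to <BalanceBlocks> port 80
```

With sticky-address, pf keeps a source-tracking entry per client address,
so connections from the same source keep going to the same gateway from
the pool instead of rotating with each new state.  This is untested
against the setup described above; parsing the ruleset with
"pfctl -nf /etc/pf.conf" (path assumed) before loading it is a cheap
sanity check.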



-- 
Ermal

