pf and carp, BACKUP host dropping connection

Mon Apr 20 12:21:15 UTC 2009

Hi,

I have 3 hosts set up with virtual IPs using carp. I have not set up 
pfsync yet (that is my next step). However, I am seeing some strange 
behavior that I cannot explain.

The 3 machines are all gateways between two networks and have 2 VIPs 
that are used for routing (actually they have 4 networks and 4 VIPs, 
but only 2 are relevant here). When I ssh from one network to the 
other, connections are sometimes blocked by pf. Oddly, they are 
blocked on the machines that are NOT currently master!

That is, I have machines:

1)
carp1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 10.0.80.74 netmask 0xffffff00
	carp: MASTER vhid 2 advbase 1 advskew 0
carp3: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 10.0.82.74 netmask 0xffffff00
	carp: MASTER vhid 4 advbase 1 advskew 0

2)
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 212.61.136.74 netmask 0xfffffff0
	carp: BACKUP vhid 1 advbase 1 advskew 50
carp2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 10.0.81.74 netmask 0xffffff00
	carp: BACKUP vhid 3 advbase 1 advskew 50

3)
carp1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 10.0.80.74 netmask 0xffffff00
	carp: BACKUP vhid 2 advbase 1 advskew 100
carp3: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
	inet 10.0.82.74 netmask 0xffffff00
	carp: BACKUP vhid 4 advbase 1 advskew 100
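
A minimal sketch for comparing the CARP state across the three hosts, 
assuming ifconfig prints the same layout as shown above. The function 
name carp_states is hypothetical; it only reads ifconfig output on 
stdin and prints one line per carp interface.

```shell
# carp_states: read ifconfig output on stdin and print
# "<interface> <state> vhid <n>" for each CARP interface found.
# Assumes the layout shown above: an unindented "carpN: flags=..." line
# followed by an indented "carp: STATE vhid N advbase N advskew N" line.
carp_states() {
    awk '
        # Remember the interface name from its header line (drop the colon).
        /^carp[0-9]+:/ { iface = $1; sub(/:$/, "", iface) }
        # On the status line, $2 is the state and $4 is the vhid.
        $1 == "carp:"  { print iface, $2, "vhid", $4 }
    '
}

# Usage on each gateway:
#   ifconfig | carp_states
```

Running this on all three machines makes it easy to confirm that each 
vhid has exactly one MASTER at the time a connection is reset.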

Then from the 10.0.80 network I ssh to the 10.0.82 network. The 
router for the 10.0.82 network is 10.0.82.74 and the router for the 
10.0.80 network is 10.0.80.74 (the VIPs):

 > ssh 10.0.82.5
sebster@10.0.82.5's password:
 > Read from remote host 10.0.82.5: Connection reset by peer
Connection to 10.0.82.5 closed.

The pf logs on the backup gateways then show:

machine 2:
# tcpdump -nttteli pflog0 not src or dst port 6155 and not src or dst host 224.0.0.18 and not src or dst port 68
tcpdump: WARNING: pflog0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 96 bytes
000000 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22: [|tcp]
001161 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22: [|tcp]
000018 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22:  tcp 20 [bad hdr length 0 - too short, < 20]

machine 3:
# tcpdump -nttteli pflog0 not src or dst port 6155 and not src or dst host 224.0.0.18 and not src or dst port 68
tcpdump: WARNING: pflog0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pflog0, link-type PFLOG (OpenBSD pflog file), capture size 96 bytes
000000 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22: [|tcp]
001113 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22: [|tcp]
000019 rule 11/0(match): block in on em1: 10.0.80.3.58876 > 10.0.82.5.22:  tcp 20 [bad hdr length 0 - too short, < 20]

I'm wondering why the backup hosts are blocking these packets while 
the master is still up, and why that causes the connection to fail. 
(pf on all 3 hosts has a "block return log on devif all" rule, where 
devif is the interface with the real 10.0.80.x IP. But why does a host 
return a RST packet while it is in BACKUP state?)
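
To confirm that "rule 11" in the logs above really is that block rule, 
the numbered ruleset can be inspected. This is a minimal sketch, 
assuming `pfctl -vvsr` prefixes each rule with "@<number>"; the helper 
name pf_rule is hypothetical.

```shell
# pf_rule: read `pfctl -vvsr` output on stdin and print the text of the
# rule whose number is given as the first argument.
# Assumes each rule line starts with "@<number> " and that the indented
# statistics lines following it do not start with "@".
pf_rule() {
    awk -v n="$1" '
        # Match only the rule line with the requested number,
        # then strip the "@<number> " prefix before printing.
        $1 == ("@" n) { sub(/^@[0-9]+ /, ""); print }
    '
}

# Usage on a backup gateway (rule number taken from the pflog output):
#   pfctl -vvsr | pf_rule 11
```

If the printed rule is the "block return" rule, the RST seen by the 
ssh client would be consistent with a backup host answering packets it 
receives on em1.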

I think once I have pfsync the problem will go away due to the 
synchronized state (the backups won't block anymore), but it still seems 
strange to me that all 3 machines will then be actively filtering the 
packets...

Does anybody know what's going on?

Regards,
Sebastiaan
