Why is backup carp node seeing traffic?

Fri Apr 10 11:03:22 UTC 2015

With a setup containing two carp/pf firewall/load-balancers in front
of six web servers, some packets are popping up on the backup carp
node.

All eight machines are connected to a (fault tolerant) 10Gbit switch
over dual ix connections in an LACP lagg.  The carp/pf nodes use VLANs
to split incoming and outgoing traffic.

With some careful tcpdumping, we have identified that the problem is
that every now and then (about 1 in 1000 connections), the first
SYN-ACK packet from a web server is returned to the backup carp node
instead of the primary.  There, it gets dropped and there's a 3-second
delay until the client host retransmits the SYN packet.

Actually quite a bit more traffic than that winds up on the backup
node, maybe as much as a few % of total traffic, but it seems like the
initial SYN packet is the only one that gets dropped.  This may be
because pfsync hasn't created the state on the backup node yet?

Here's what the tcpdump looks like from each machine, showing all the
packets on the web server, all but one of the packets on the primary
carp node, and the "missing" SYN-ACK packet on the backup carp node.

Web server:

07:02:05.257968 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S],
seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS
val 371382342 ecr 0], length 0
07:02:05.258016 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.],
seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0
07:02:08.254929 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S],
seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS
val 371385342 ecr 0], length 0
07:02:08.254943 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.],
seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0
07:02:08.255332 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [.],
ack 1, win 65535, length 0
07:02:08.255383 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [.],
ack 1, win 65535, length 0
07:02:08.256155 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [P.],
seq 1:28, ack 1, win 65535, length 27

Primary carp node: (client is identical)

07:02:05.258324 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S],
seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS
val 371382342 ecr 0], length 0
07:02:08.255287 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [S],
seq 1377231734, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS
val 371385342 ecr 0], length 0
07:02:08.255331 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.],
seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0
07:02:08.255691 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [.],
ack 1, win 65535, length 0
07:02:08.255766 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [.],
ack 1, win 65535, length 0
07:02:08.256519 IP 192.168.0.17.30892 > 192.168.80.2.80: Flags [P.],
seq 1:28, ack 1, win 65535, length 27

Backup carp node:

07:02:05.263024 IP 192.168.80.2.80 > 192.168.0.17.30892: Flags [S.],
seq 1053809273, ack 1377231735, win 0, options [mss 1460], length 0

There are no carp failovers, interface write errors are <1/million and
the error counters don't increase while the problem is happening.
Switch says no errors, no LACP drops, nothing.  The total traffic
levels are low.  This problem happens in waves, weirdly usually at
low-usage times like the middle of the night.  The primary carp node
is ~98% idle.  All the hardware checks out, and they are all physical
machines, no virtualization or jails or anything going on.

The logs are silent while this is happening.

Six of the eight systems, including both carp nodes, are running 9.3.
Two of the web servers are running 10.1.  The problem does happen with
all six web servers.

The problem is definitely related to carp.  If the default route is
changed from the carp address to the master node's address on that
same interface on some or all of the web servers, the problem will go
away for only those servers.  As soon as it's switched back, the
problem returns.

Similarly, running a flood ping against the carp address from a web server:

$ sudo ping -c 100000 -f 192.168.80.254
PING 192.168.80.254 (192.168.80.254): 56 data bytes
.........................................................
--- 192.168.80.254 ping statistics ---
100000 packets transmitted, 99944 packets received, 0.1% packet loss
round-trip min/avg/max/stddev = 0.017/0.024/0.535/0.013 ms

Produced exactly 56 stray ICMP echo request (100000 - 99944 = 56) on
the backup server:

$ sudo tcpdump -n -i vlan2 host 192.168.80.254 and icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 65535 bytes
10:36:34.432811 IP 192.168.80.6 > 192.168.80.254: ICMP echo request,
id 22737, seq 16720, length 64
10:36:34.443833 IP 192.168.80.6 > 192.168.80.254: ICMP echo request,
id 22737, seq 16721, length 64
10:36:34.454830 IP 192.168.80.6 > 192.168.80.254: ICMP echo request,
id 22737, seq 16722, length 64
10:36:34.465831 IP 192.168.80.6 > 192.168.80.254: ICMP echo request,
id 22737, seq 16723, length 64
[ ... etc ... ]
^C
56 packets captured
1219 packets received by filter
0 packets dropped by kernel

One thing of note is that the stray packets all appear to be about
11ms apart.  That pattern is consistent across all 56 packets and is
repeatable.

Finally, here is the ifconfig on the carp nodes, from the carp
interface all the way down to the wire.

Master:

carp2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 192.168.80.254 netmask 0xfffffff0
inet6 fd00:0:0:80::254 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
carp: MASTER vhid 2 advbase 1 advskew 0

vlan2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
metric 0 mtu 1500
options=303<RXCSUM,TXCSUM,TSO4,TSO6>
ether 90:e2:ba:0d:0a:98
inet 192.168.80.251 netmask 0xfffffff0 broadcast 192.168.80.255
inet6 fe80::92e2:baff:fe0d:a98%vlan2 prefixlen 64 scopeid 0xa
inet6 fd00:0:0:80::251 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
vlan: 2 parent interface: lagg0

lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
metric 0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:98
inet6 fe80::92e2:baff:fe0d:a98%lagg0 prefixlen 64 scopeid 0x8
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
laggproto lacp lagghash l2,l3,l4
laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric
0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:98
inet6 fe80::92e2:baff:fe0d:a98%ix0 prefixlen 64 scopeid 0x1
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
status: active

ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric
0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:98
inet6 fe80::92e2:baff:fe0d:a99%ix1 prefixlen 64 scopeid 0x2
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
status: active

Backup:

carp2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 192.168.80.254 netmask 0xfffffff0
inet6 fd00:0:0:80::254 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
carp: BACKUP vhid 2 advbase 1 advskew 150

vlan2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
metric 0 mtu 1500
options=303<RXCSUM,TXCSUM,TSO4,TSO6>
ether 90:e2:ba:0d:0a:7d
inet 192.168.80.252 netmask 0xfffffff0 broadcast 192.168.80.255
inet6 fe80::92e2:baff:fe0d:a7d%vlan2 prefixlen 64 scopeid 0xa
inet6 fd00:0:0:80::252 prefixlen 64
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
vlan: 2 parent interface: lagg0

lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
metric 0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:7d
inet6 fe80::92e2:baff:fe0d:a7d%lagg0 prefixlen 64 scopeid 0x8
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
laggproto lacp lagghash l2,l3,l4
laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric
0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:7d
inet6 fe80::92e2:baff:fe0d:a7c%ix0 prefixlen 64 scopeid 0x1
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
status: active

ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric
0 mtu 1500
options=407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
ether 90:e2:ba:0d:0a:7d
inet6 fe80::92e2:baff:fe0d:a7d%ix1 prefixlen 64 scopeid 0x2
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
status: active

Why are packets showing up at the backup node?  Why is it
intermittent?  Can anything be done to eliminate this issue?

Thanks for any advice!