CARP interfaces and mastership issue

Damien Fleuriot ml at my.gd
Thu Sep 29 12:57:53 UTC 2011


On 29 September 2011 14:20, Damien Fleuriot <ml at my.gd> wrote:
>
>
> On 9/15/11 11:07 AM, Damien FLEURIOT wrote:
>> Hello list,
>>
>>
>>
>>
>> TL;DR: a carp interface becomes MASTER for a split second after being
>> created, even if another MASTER with faster advertisements already
>> exists on the network. This breaks connections. How can we prevent it?
>>
>>
>>
>>
>> We've been experiencing this double mastership problem with CARP interfaces.
>>
>>
>> Allow me to put some context here:
>>
>> 2 firewalls, PF1, PF2, each with 2 VLANs (for example, some have more)
>> on a lagg device (link aggregation).
>> These firewalls then share virtual IPs through CARP interfaces, let us
>> assume the following:
>>
>> PF1:
>> - vlan13
>> - vlan410
>> - carp13 (advskew 50)
>> - carp410 (advskew 50)
>>
>> PF2:
>> - vlan13
>> - vlan410
>> - carp13 (advskew 100)
>> - carp410 (advskew 100)
>>
>> CARP preemption is turned on, so that if vlan13 should fail on PF1, PF2
>> would assume mastership on both CARP interfaces.
>> Sysctls below:
>> net.inet.carp.allow: 1
>> net.inet.carp.preempt: 1
>> net.inet.carp.log: 1
>> net.inet.carp.arpbalance: 0
>> net.inet.carp.suppress_preempt: 0
>>
>>
>> The problem shows up when, for example, we reboot PF2.
>> When it comes back up, it assumes CARP mastership for its interfaces
>> for a split second, at the same time as PF1.
>>
>> This breaks existing sessions, openvpn tunnels and new client connections.
>>
>> While I acknowledge the home-made daemons should be built to tolerate
>> brief network outages, this doesn't solve our main problem.
>>
>>
>>
>>
>>
>> We have the same issue when destroying/creating said CARP interfaces.
>>
>> Recently we upgraded some switches' IOS version on our backup datacenter
>> (which also has 2 PF boxes, sharing the CARP IPs with the 2 PFs on our
>> production DC).
>> To prevent anything nasty from happening, we blocked production VLANs on
>> the switches' uplink ports and allowed only management traffic while we
>> performed the upgrade.
>>
>> Things went smoothly, but when we brought the production VLANs back up
>> at layer 2 on the switches and spanning tree converged, we again had a
>> double MASTER problem.
>>
>> I understand I could have avoided it by destroying/recreating the CARP
>> interfaces, but even in this case there is a split second during which
>> both firewalls are CARP MASTER.
>>
>>
>>
>>
>> Is there any way to force CARP to assume INIT state for some time when
>> coming up, and only after X seconds either become MASTER or BACKUP ?
>>
>> Any other idea how to solve this, guys ?
>>
>>
>>
>
>
>
>
> Hello List,
>
>
>
> This is a follow-up to my original email quoted above.
>
>
>
>
> It seems that there is a known bug in the CARP implementation of
> OpenBSD 3.8 and earlier which causes CARP interfaces to go straight
> from INIT to MASTER, skipping BACKUP, if preempt is enabled.
>
> Source:
> https://calomel.org/pf_carp.html
>
> Quote:
> INIT : All CARP interfaces start in this state. Also, when a CARP
> interface is admin down, i.e. "ifconfig em0 down", it is put into this
> state. When a CARP interface is admin up, it immediately transitions to
> BACKUP. Note that in OpenBSD 3.8 and earlier, a bug exists which will
> cause the host to transition to MASTER right away if preempt is enabled.
>
>
> I have been able to verify and reproduce this behavior on boxes running
> FreeBSD 8.1 and 8.2.
>
>
>
>
> Does anyone know what version of OpenBSD's CARP implementation we're
> running on FreeBSD 8.x ?
>
> It seems like this is the same bug, to me.
>



Quick follow-up again.

Below is the INIT case of carp_setrun(struct carp_softc *sc,
sa_family_t af) in sys/netinet/ip_carp.c, as found in FreeBSD 8.2,
OpenBSD 3.8 and OpenBSD 3.9.



FREEBSD 8.2-PRERELEASE with init + preempt => auto MASTER bug
Function starts at line 1371.
---
        switch (sc->sc_state) {
        case INIT:
                if (carp_opts[CARPCTL_PREEMPT] && !carp_suppress_preempt) {
                        carp_send_ad_locked(sc);
                        carp_send_arp(sc);
#ifdef INET6
                        carp_send_na(sc);
#endif /* INET6 */
                        CARP_LOG("%s: INIT -> MASTER (preempting)\n",
                            SC2IFP(sc)->if_xname);
                        carp_set_state(sc, MASTER);
                        carp_setroute(sc, RTM_ADD);
                } else {
                        CARP_LOG("%s: INIT -> BACKUP\n", SC2IFP(sc)->if_xname);
                        carp_set_state(sc, BACKUP);
                        carp_setroute(sc, RTM_DELETE);
                        carp_setrun(sc, 0);
                }
                break;
---

OPENBSD 3.8 with init + preempt => auto MASTER bug
Function starts at line 1293.
---
        case INIT:
                if (carp_opts[CARPCTL_PREEMPT] && !carp_suppress_preempt) {
                        carp_set_state(sc, MASTER);
                        carp_setroute(sc, RTM_ADD);
                        carp_send_ad(sc);
                        carp_send_arp(sc);
#ifdef INET6
                        carp_send_na(sc);
#endif /* INET6 */
                } else {
                        carp_set_state(sc, BACKUP);
                        carp_setroute(sc, RTM_DELETE);
                        carp_setrun(sc, 0);
                }
                break;
---



OPENBSD 3.9 with bug fixed
Function starts at line 1348.
---
        switch (sc->sc_state) {
        case INIT:
                carp_set_state(sc, BACKUP);
                carp_setroute(sc, RTM_DELETE);
                carp_setrun(sc, 0);
                break;
---


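It looks like the root cause is there: FreeBSD 8.2 still has the
OpenBSD 3.8 style INIT case, which goes straight to MASTER when preempt
is enabled, while OpenBSD 3.9 always goes to BACKUP first.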

I'll rebuild and test, keep you updated.

