[Bug 235005] r342051 "pfsync: Performance improvement" breaks CARP when used with pfsync

bugzilla-noreply at freebsd.org
Wed Jan 16 19:16:13 UTC 2019


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235005

            Bug ID: 235005
           Summary: r342051 "pfsync: Performance improvement" breaks CARP
                    when used with pfsync
           Product: Base System
           Version: 12.0-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: thomas at gibfest.dk

After quite a few buildworld+kernel cycles and reboots, I've managed to isolate
base r342051 "pfsync: Performance improvement" as the reason why CARP stopped
working for me.

I've been building a couple of carp+pf routers/firewalls, originally with
12-BETA2. They were recently upgraded to 12-STABLE base r342254, which is
when both carp nodes started being MASTER instead of one being MASTER and the
other BACKUP.

The notes from my bisecting are below. All tests are with the same
configuration. As you can see, base r342051 is the commit where it broke.

12-STABLE base r339946 MASTER/BACKUP
12-STABLE base r341100 MASTER/BACKUP
12-STABLE base r341677 MASTER/BACKUP
12-STABLE base r341965 MASTER/BACKUP
12-STABLE base r342037 MASTER/BACKUP
12-STABLE base r342050 MASTER/BACKUP
12-STABLE base r342051 MASTER/MASTER
12-STABLE base r342055 MASTER/MASTER
12-STABLE base r342073 MASTER/MASTER
12-STABLE base r342109 MASTER/MASTER
12-STABLE base r342254 MASTER/MASTER
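
For anyone repeating the bisection, the notes above can be checked mechanically. The following is only a minimal sketch: the revisions and states are copied from the list above, while the helper name `first_bad` is hypothetical and the code assumes the result flips from good to bad exactly once.

```python
# Bisection notes copied from the report: (revision, observed CARP states).
NOTES = [
    (339946, "MASTER/BACKUP"),
    (341100, "MASTER/BACKUP"),
    (341677, "MASTER/BACKUP"),
    (341965, "MASTER/BACKUP"),
    (342037, "MASTER/BACKUP"),
    (342050, "MASTER/BACKUP"),
    (342051, "MASTER/MASTER"),
    (342055, "MASTER/MASTER"),
    (342073, "MASTER/MASTER"),
    (342109, "MASTER/MASTER"),
    (342254, "MASTER/MASTER"),
]


def first_bad(notes, good="MASTER/BACKUP"):
    """Return (last_good, first_bad) revisions.

    Hypothetical helper; assumes the state changes from good to bad
    exactly once when the revisions are sorted in ascending order.
    """
    last_good = None
    for rev, state in sorted(notes):
        if state != good:
            return last_good, rev
        last_good = rev
    return last_good, None


print(first_bad(NOTES))  # (342050, 342051)
```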

I've further confirmed pfsync to be at fault: when pfsync is not enabled, the
two nodes are MASTER and BACKUP as they should be. Immediately after I start
pfsync, the BACKUP node becomes MASTER and logs these messages:

Jan 16 16:34:56 fwclu2b kernel: carp: demoted by -240 to -240 (pfsync bulk done)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg2.52: BACKUP -> MASTER (preempting a slower master)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg2.51: BACKUP -> MASTER (preempting a slower master)
Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg3: BACKUP -> MASTER (preempting a slower master)

...but the MASTER also stays MASTER, and chaos ensues: nothing works on the
network. Stopping pfsync doesn't resolve the situation; only a reboot with
pfsync disabled restores normal carp functionality.
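
To compare logs from both nodes, the state changes in the excerpt above can be pulled out with a short script. This is a rough sketch, not a tool from the report: the pattern is inferred only from the log lines quoted here, the name `transitions` is hypothetical, and because the list archive renders "@" as " at ", both spellings are accepted.

```python
import re

# Matches lines such as "carp: 1@lagg3: BACKUP -> MASTER (reason)".
# Pattern inferred from the quoted log lines only; "@" and " at " both match.
TRANSITION = re.compile(
    r"carp: (?P<vhid>\d+)(?:@| at )(?P<ifname>[^\s:]+): "
    r"(?P<old>\w+) -> (?P<new>\w+)"
)


def transitions(lines):
    """Yield (vhid, interface, old_state, new_state) per CARP state change."""
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            yield int(m["vhid"]), m["ifname"], m["old"], m["new"]


SAMPLE = [
    "Jan 16 16:34:56 fwclu2b kernel: carp: demoted by -240 to -240 (pfsync bulk done)",
    "Jan 16 16:34:56 fwclu2b kernel: carp: 1@lagg2.52: BACKUP -> MASTER (preempting a slower master)",
    "Jan 16 16:34:56 fwclu2b kernel: carp: 1 at lagg3: BACKUP -> MASTER (preempting a slower master)",
]

for change in transitions(SAMPLE):
    print(change)
```

The demotion line is skipped on purpose, since it reports a demotion counter change rather than a per-interface state transition.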

I suggest maybe backing out base r342051 while we investigate the cause, if a
fix can't be found quickly. I suspect it could have something to do with the
pfsync carp demotion code, which the log messages above seem to confirm, but I
don't know.

Let me know if further info is needed about my configuration or anything. See
also this thread on -stable
https://lists.freebsd.org/pipermail/freebsd-stable/2019-January/090421.html
which confirms I am not the only one experiencing this.
