Locking issues in CARP in 10.2

Dag-Erling Smørgrav des at des.no
Wed Jun 8 11:42:28 UTC 2016


I have two routers which have been unstable ever since I upgraded them
from 10.1 to 10.2.  The symptoms were mostly livelocks, where the
machine doesn't freeze completely but is unusable (network is down,
console doesn't refresh; it does seem to react to keyboard input, and it
tries but fails to shut down when I press Ctrl+Alt+Del).

I suspected that it was related to CARP, because it never happened if
the router was taken out of the group (i.e. not just in BACKUP state,
but no CARP addresses configured at all), and although I could not
confirm this 100%, it seemed to be triggered by adding an address to, or
removing one from, an interface.  VLAN interfaces on these routers are created,
destroyed and modified dynamically based on data from the provisioning
system, so it's difficult to pinpoint.  However, earlier today, one of
the routers panicked right as I was taking it offline.  I assume that it
was triggered by one of two things which happened almost simultaneously:
first, the CARP address had been configured on another router, possibly
triggering a state change, and second, I manually deleted the CARP
address from the router that crashed.  Crash dumps were not enabled, but
it looks like the instruction pointer was in __mtx_lock_sleep(), around
line 438 in sys/kern/kern_mutex.c:

435			v = m->mtx_lock;
436			if (v != MTX_UNOWNED) {
437				owner = (struct thread *)(v & ~MTX_FLAGMASK);
438				if (TD_IS_RUNNING(owner)) {
439					if (LOCK_LOG_TEST(&m->lock_object, 0))
440						CTR3(KTR_LOCK,
441						    "%s: spinning on %p held by %p",
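
The excerpt is the adaptive-spin check: the waiter decodes the owning
thread from the lock word and keeps spinning only while that thread is
running.  Below is a minimal userspace sketch of that step, not the
kernel's code.  It assumes, as the `v & ~MTX_FLAGMASK` expression
suggests, that the lock word holds the owner's pointer with flag bits in
the low bits that alignment leaves free.  The SKETCH_* constants and
struct sketch_thread are hypothetical stand-ins for MTX_UNOWNED,
MTX_FLAGMASK and struct thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MTX_UNOWNED and MTX_FLAGMASK. */
#define SKETCH_UNOWNED  ((uintptr_t)0x4)
#define SKETCH_FLAGMASK ((uintptr_t)0x7)

/* Hypothetical stand-in for struct thread; 8-byte alignment keeps the
 * three low bits of its address free for flags. */
struct sketch_thread {
    _Alignas(8) bool running;   /* plays the role of TD_IS_RUNNING() */
};

static_assert(_Alignof(struct sketch_thread) > SKETCH_FLAGMASK,
    "flag bits must fit in the alignment slack");

/* Decode the owner from a lock word, mirroring v & ~MTX_FLAGMASK. */
struct sketch_thread *sketch_owner(uintptr_t v)
{
    if (v == SKETCH_UNOWNED)
        return NULL;
    return (struct sketch_thread *)(v & ~SKETCH_FLAGMASK);
}

/* Return true if a waiter should spin rather than block. */
bool sketch_should_spin(uintptr_t v)
{
    struct sketch_thread *owner = sketch_owner(v);

    /* Like TD_IS_RUNNING(owner) at line 438, this reads the owner's
     * state through a pointer taken straight from the lock word. */
    return owner != NULL && owner->running;
}
```

For example, a lock word built as `(uintptr_t)&t | 0x2` (owner plus one
flag bit) decodes back to `&t`, and the waiter spins only while
`t.running` is true.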

Is this a known issue?  If not, has anyone else had similar problems, or
does anyone know of locking issues in the CARP code which might trigger
a livelock or panic when a CARP address is added or removed?

DES
-- 
Dag-Erling Smørgrav - des at des.no
