kern/120130: carp causes kernel panics in any constellation
Christoph Weber-Fahr
cwf-mlqarcor.de at FreeBSD.org
Tue Jan 29 21:50:02 UTC 2008
>Number: 120130
>Category: kern
>Synopsis: carp causes kernel panics in any constellation
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jan 29 21:50:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Christoph Weber-Fahr
>Release: 6.3-RELEASE
>Organization:
Arcor AG
>Environment:
FreeBSD XXX.tnd.lab.arcor.de 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Fri Jan 25 21:34:42 CET 2008 wefa at XXX.tnd.lab.arcor.de:/usr/obj/usr/src/sys/DL380 i386
>Description:
Carp reliably and reproducibly causes kernel panics.
This is an enhancement of kern/117448 (which itself contains a backreference to kern/92776).
The referenced PR reports this error only for the case of having and destroying 2 carp interfaces. We have tested carp extensively, with both 6.2-RELEASE-p9 and 6.3-RELEASE, and we have additionally encountered a number of spontaneous reboots, spurious lockups and similar problems.
Note that even though the reproduction recipe given below is based on ifconfig destroy commands, we actually saw crashes in the normal course of operation, during and between tests where carp was active, both with only one and with multiple carp interfaces.
>How-To-Repeat:
We have also found 2 ways to reproduce those effects repeatably:
1.) as documented in the referred kern/11744
ifconfig carp0 destroy
ifconfig carp1 destroy
This happens regardless of the constellation those interfaces are in: in some constellations the system crashes immediately, in others after the next ifconfig operation.
2.) It is also possible to cause a crash using only one carp interface. We found the following script to reliably produce a kernel panic within 15-20 minutes:
while [ 1 ]
do
/etc/rc.d/netif restart
sleep 35
ifconfig carp0 destroy
sleep 35
done
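Because the panic takes 15-20 minutes to appear, it can help to know which step ran last before the crash. The following is only a sketch of the same loop with a timestamp printed before each command. The DRY_RUN, ITERATIONS and PAUSE variables and the run/cycle helpers are our own illustrative names, not part of the original test setup. With DRY_RUN=1 (the default here) the commands are only printed, and DRY_RUN=0 runs them for real (as root):

```shell
#!/bin/sh
# Sketch of recipe 2 with per-step timestamps.
# DRY_RUN, ITERATIONS and PAUSE are hypothetical knobs added for illustration.
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, 0 = really run them
ITERATIONS="${ITERATIONS:-3}"  # number of restart/destroy cycles
PAUSE="${PAUSE:-35}"           # seconds between steps, as in the script above

run() {
    # Print the command with a timestamp, then execute it unless dry-running.
    echo "$(date '+%H:%M:%S') $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

cycle() {
    # One iteration of the original loop body.
    run /etc/rc.d/netif restart
    run sleep "$PAUSE"
    run ifconfig carp0 destroy
    run sleep "$PAUSE"
}

i=1
while [ "$i" -le "$ITERATIONS" ]; do
    echo "cycle $i"
    cycle
    i=$((i + 1))
done
```

For a real run, redirecting the output to a serial console or a remote log is advisable, since output buffered on local disk may be lost when the kernel panics.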
>Fix:
We do not have a fix.
It should specifically be noted that using ucarp (from net/ucarp in the ports collection) is no alternative either. In our tests we found ucarp 1.3 to have serious recovery issues after a failover, which reproducibly left the cluster in a dysfunctional state. We also tested the (not yet ported) ucarp4 and found it to be completely broken in our environment (Cisco switch platform): they switched the transport to multicast and apparently completely botched the implementation, so that it doesn't work on either FreeBSD or Linux.
>Release-Note:
>Audit-Trail:
>Unformatted: