rwatson at FreeBSD.org
Sun Sep 25 04:08:29 PDT 2005
On Fri, 23 Sep 2005, Jason Carroll wrote:
> There seem to be 2 types of crashes we see with pretty different stack
> traces. What I'll call a type 1 crash, I believe, is often caused by
> one of the triggers I mention above. A type 2 crash appears to happen
> spontaneously after the machine has been running for a while.
> I poked around using kgdb in a core file from a type 2 crash, and it
> appeared the system hung closing sockets (specifically cleaning up
> multicast state i think) while cleaning up one of our multicast
> applications (note the trace through sys_exit). There's no reason this
> application should have been exiting unless it encountered some kind of
> I'm attaching:
> kernel-conf.txt (kernel config file)
> type1-core.txt (a kgdb bt from a type1/triggered crash)
> type2-core.txt (a kgdb bt from a type2/spontaneous crash)
> I'm happy to dig for more information, recompile with different options,
> apply patches, or do anything else that might help get this problem
> diagnosed and fixed!
Hi there Jason!
Sounds nasty. It's possible the two panics are related, especially if
they involve a race in the multicast code, which could result in treading
on other kernel memory, potentially leading to the thread related panic.
My leaning would be that they are unrelated, but since we may be able to
eliminate the multicast one (see below), that would be a good starting
point.
In the 6.x branch, quite a bit of work has been done to improve locking in
the multicast code, and several important races have been fixed relating
to IP multicast. These races tended to turn up on the following sorts of
workloads:
(1) Multi-threaded applications changing the multicast properties, such
as membership, of a particular socket in parallel.
(2) Changes to multicast membership during high multicast I/O load on the
socket. For example, adding or deleting multicast groups on a socket on
CPU 0 while a packet is delivered to the same socket on CPU 1.
(3) Removal of real or synthetic interfaces involved in active multicast,
such as removal of pccards, vlans, etc during multicast I/O, or with
sockets bound to the interfaces.
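
If it helps to narrow things down, scenario (2) can be approximated from
user space with something like the sketch below. This is only an
illustration, not a known reproducer for your crash: the group address,
the iteration count, and the function names are all made up, and it
assumes the default multicast interface is usable. One thread keeps
sending datagrams to the group while the main thread joins and leaves
that group on the receiving socket.

```python
import socket
import struct
import threading

GROUP = "239.255.0.1"  # assumption: any unused administratively scoped group


def send_load(group, port, stop, counts):
    """Send datagrams to the group until asked to stop (hypothetical helper)."""
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while not stop.is_set():
        try:
            tx.sendto(b"x", (group, port))
            counts["sent"] += 1
        except OSError:
            # e.g. no multicast-capable route on this host
            counts["send_errors"] += 1
    tx.close()


def churn_membership(iterations=500, group=GROUP):
    """Join and leave the group on one socket while traffic is being sent to it."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("", 0))  # let the kernel pick a free port
    port = rx.getsockname()[1]
    # struct ip_mreq: group address followed by the local interface (INADDR_ANY)
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    counts = {"sent": 0, "send_errors": 0,
              "joins": 0, "leaves": 0, "membership_errors": 0}
    stop = threading.Event()
    sender = threading.Thread(target=send_load,
                              args=(group, port, stop, counts))
    sender.start()
    try:
        for _ in range(iterations):
            for option, key in ((socket.IP_ADD_MEMBERSHIP, "joins"),
                                (socket.IP_DROP_MEMBERSHIP, "leaves")):
                try:
                    rx.setsockopt(socket.IPPROTO_IP, option, mreq)
                    counts[key] += 1
                except OSError:
                    counts["membership_errors"] += 1
    finally:
        stop.set()
        sender.join()
        rx.close()
    return counts


if __name__ == "__main__":
    print(churn_membership())
```

The socket never reads what it receives, so its buffer simply fills and
further packets are dropped; the point is only to keep delivery and
membership changes overlapping on the same socket.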
These changes are not currently scheduled for a backport to 5.x, because
they change the kernel network device driver API and ABI, requiring
changes to and recompiling of third party device drivers. A subset could
be backported, subject to some limitations, but it would be good to
confirm whether these changes actually affect the problems you're seeing
before working through that. All the changes should appear in the most
recent snapshot, BETA5. Make sure to turn off extra kernel debugging
features, such as WITNESS, INVARIANTS, and user space malloc debugging, if
you start running into performance problems -- they have a big performance
impact, although they can be quite helpful in testing. Normally we turn these
off during the release candidate portion of the release cycle.
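
As a rough sketch, turning the kernel-side debugging off would mean
commenting out lines like these in your kernel config and rebuilding.
This assumes your config is derived from the GENERIC shipped with the
snapshot; INVARIANT_SUPPORT and WITNESS_SKIPSPIN are the companion
options I would expect to find next to the two named above, so check
your own file rather than trusting this list. User space malloc
debugging is configured separately and is not covered here.

```
# Kernel debugging options: comment out if performance becomes a problem
#options 	INVARIANTS
#options 	INVARIANT_SUPPORT	# assumption: present alongside INVARIANTS
#options 	WITNESS
#options 	WITNESS_SKIPSPIN	# assumption: present alongside WITNESS
```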
There are some other known stability nits in 6.x which are being worked
on, but in general the network stack stability is higher in 6.x than 5.x
when it comes to multicast due to the work I reference above. If you run
into any stability problems relating to the file system, set
debug.mpsafevfs=0 in loader.conf -- there are a few bugs relating to
running out of disk space or hitting quota limits that are fixed in HEAD,
but the fixes are not yet backported to 6.x.
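
For example, assuming the usual /boot/loader.conf location, the entry
would look something like this and takes effect on the next boot:

```
# /boot/loader.conf
debug.mpsafevfs="0"
```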
Robert N M Watson