kern/185043: Kernel panic: Sleeping thread (tid , pid ) owns a non-sleepable lock from netinet/in_multi.c

Michael Bentkofsky mbentkofsky at verisign.com
Fri Dec 20 18:50:04 UTC 2013


>Number:         185043
>Category:       kern
>Synopsis:       Kernel panic: Sleeping thread (tid …, pid …) owns a non-sleepable lock from netinet/in_multi.c
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Dec 20 18:50:04 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Michael Bentkofsky
>Release:        8.4, 10/stable
>Organization:
VeriSign, Inc.
>Environment:
FreeBSD omg022-std1 8.4-RELEASE-p4 FreeBSD 8.4-RELEASE-p4 #0: Mon Nov 18 16:42:01 UTC 2013     root at bsd83-dev:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
Running ospfd from Quagga and restarting the network can trigger a kernel panic with the following stack trace:

root at omg022-std1:~ # Sleeping thread (tid 100188, pid 2047) owns a non-sleepable lock
KDB: stack backtrace of thread 100188:
#0 0xffffffff8064a4d6 at mi_switch+0x186
#1 0xffffffff8067f18c at sleepq_catch_signals+0x31c
#2 0xffffffff8067f9a9 at sleepq_timedwait_sig+0x19
#3 0xffffffff805fcdec at _cv_timedwait_sig+0x13c
#4 0xffffffff806850e8 at seltdwait+0x98
#5 0xffffffff806870df at kern_select+0x6ef
#6 0xffffffff8068737d at select+0x5d
#7 0xffffffff809ca3c4 at amd64_syscall+0x1f4
#8 0xffffffff809b253c at Xfast_syscall+0xfc
panic: sleeping thread
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff806756d6 at kdb_backtrace+0x66
#1 0xffffffff80641d9e at panic+0x1ce
#2 0xffffffff806826bb at propagate_priority+0x1cb
#3 0xffffffff806833ff at turnstile_wait+0x1bf
#4 0xffffffff806322dd at _mtx_lock_sleep+0xbd
#5 0xffffffff80743ff1 at igmp_fasttimo+0x881
#6 0xffffffff8069df8b at pffasttimo+0x2b
#7 0xffffffff80655a63 at softclock+0x343
#8 0xffffffff806195a4 at intr_event_execute_handlers+0x104
#9 0xffffffff8061ac35 at ithread_loop+0x95
#10 0xffffffff806167bf at fork_exit+0x11f
#11 0xffffffff809b278e at fork_trampoline+0xe

The pid 2047 is ospfd from quagga-0.99.22. When running and configured, ospfd uses multicast to listen for OSPF “hello” packets. Restarting the network with this command triggers the kernel panic:

# /etc/rc.d/netif restart && /etc/rc.d/routing restart

Catching the panic in ddb shows this bit of detail about the locks:

db> show allchains
chain 1:
thread 100015 (pid 12, swi4: clock) blocked on lock 0xffffffff80e4eca0 (sleep mutex) "in_multi_mtx"
thread 100233 (pid 2047, ospfd) inhibited

Looking at the places where this lock is acquired in netinet/in_mcast.c, we see several suspicious error paths that can leave the lock held. Here is an example from CURRENT:

static int
inp_block_unblock_source(struct inpcb *inp, struct sockopt *sopt)
{
…
	/*
	 * Begin state merge transaction at IGMP layer.
	 */
	IN_MULTI_LOCK();

	CTR1(KTR_IGMPV3, "%s: merge inm state", __func__);
	error = inm_merge(inm, imf);
	if (error) {
		CTR1(KTR_IGMPV3, "%s: failed to merge inm state", __func__);
		goto out_imf_rollback;	/* NB: jumps past IN_MULTI_UNLOCK() below */
	}

	CTR1(KTR_IGMPV3, "%s: doing igmp downcall", __func__);
	error = igmp_change_state(inm);
	if (error)
		CTR1(KTR_IGMPV3, "%s: failed igmp downcall", __func__);

	IN_MULTI_UNLOCK();

out_imf_rollback:
	if (error)
		imf_rollback(imf);
	else
		imf_commit(imf);

	imf_reap(imf);
…
}

Thus, when inm_merge() or igmp_change_state() returns an error, the lock remains held. There are other similar code segments, such as inp_join_group(), inp_set_source_filters(), and inp_leave_group().

The panic was produced on FreeBSD 8.4, although I have also verified that it occurs with both stable/10 at r259575 and head at the same revision. I have tested a patch that ensures IN_MULTI_UNLOCK() is called on each of these error paths, and I added KTR output to the particular path that triggered this panic.

Even with this patch in place, we have found a subsequent race condition that so far we have only reproduced when compiling with the KTR, INVARIANTS, and DDB options, although I suspect the race can occur without them. I do not have a patch for that race condition, and I am interested in hearing from others how we should handle this case.

The race condition can happen when the userspace application adds a subscription to a multicast address while the interface carrying it is simultaneously being removed. The steps to cause the race condition are exactly the same as cited at the beginning of this PR (running quagga ospfd and doing /etc/rc.d/netif restart); however, it does not seem to trigger without INVARIANTS and DDB in our test environment.

The race condition happens across two threads, as seen in the KTR output below; we have added a little more detail in our version to help debug this case:

509 (0xffffff00473ff490:cpu1): in_getmulti: join 224.0.0.6 on 0xffffff0031a93000(eth0.100))
510 (0xffffff00473ff490:cpu1): in_getmulti: join 224.0.0.6 family 2 len 16 on 0xffffff0031a93000(eth0.100))
520 (0xffffff00473ff490:cpu1): ether_resolvemulti: family 2 len 16 on 0xffffff0031a93000(eth0.100))
521 (0xffffff00473ff490:cpu1): ether_resolvemulti: join 224.0.0.6 family 2 len 16 on 0xffffff0031a93000(eth0.100))
523 (0xffffff00473ff490:cpu1): if_addmulti: ifma allocate (0xffffff00473d6780)

In this thread, “ifma allocate” occurs when inp_join_group() [in sys/netinet/in_mcast.c] adds a new address by calling in_join_group_locked(), which calls in_getmulti() to add the address.

At the same time, the interface is being removed in this path:

537 (0xffffff00474a1920:cpu9): mld_ifdetach: called for ifp 0xffffff0031a93000(eth0.100)
545 (0xffffff00474a1920:cpu9): if_purgemaddrs: will free ifma (0xffffff00473d6780)
565 (0xffffff00474a1920:cpu9): if_freemulti: free ifma (0xffffff00473d6780)

Here the interface ifp is removed via if_purgemaddrs() [in sys/net/if.c], which calls if_delmulti_locked() and then if_freemulti(), triggering this KASSERT because a new ifma is being added at the same time, as above:

	KASSERT(ifma->ifma_protospec == NULL,
	    ("if_freemulti: protospec not NULL"));


Interestingly, this comment seems to hint that this is a known issue:

/*
 * Detach an interface, removing it from the list of "active" interfaces.
 * If vmove flag is set on entry to if_detach_internal(), perform only a
 * limited subset of cleanup tasks, given that we are moving an ifnet from
 * one vnet to another, where it must be fully operational.
 *
 * XXXRW: There are some significant questions about event ordering, and
 * how to prevent things from starting to use the interface during detach.
 */

For this particular race condition, we continue to think about a fix, but would appreciate input from others as well.


>How-To-Repeat:
See description.
>Fix:
See attached patch.


>Release-Note:
>Audit-Trail:
>Unformatted:

