inp_gcmoptions / imo_gc_task refers to free()d memory

Mon Feb 8 23:29:17 UTC 2016

Looking for clues from multicast gurus :-)

I turned on all the memory debug options looking for a ZFS issue,
and promptly stumbled over an IP multicast bug that occurs when
creating and deleting bridge and other virtual interfaces quickly.

The actual crash is here (note, clang optimized out several stack
frames due to tail calls), along with the relevant info:

    in_leavegroup_locked() at in_leavegroup_locked+0x6b/frame 0xfffffe0babd1aae0
    inp_gcmoptions() at inp_gcmoptions+0x1e2/frame 0xfffffe0babd1ab20
    taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfffffe0babd1ab80
    taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfffffe0bab1dabb0
    fork_exit() at ...

    in_leavegroup_locked+0x6b:   movq    0x10(%rax),%rax

    rax     0xdeadc0dedeadc0de

That particular movq instruction is part of the
CURVNET_SET(inm->inm_ifp->if_vnet) macro call.  Here is the
gdb disassembly of that area (gdb points to that line,
netinet/in_mcast.c line 1243 or so):

   0x11c7 <in_leavegroup_locked+103>:   mov    0x18(%r14),%rax
   0x11cb <in_leavegroup_locked+107>:   mov    0x10(%rax),%rax
   0x11cf <in_leavegroup_locked+111>:   test   %rax,%rax

Here %r14 holds "inm", so the first mov loads inm->inm_ifp into %rax
and the second mov, the one that faults, is the load of ifp->if_vnet
through that pointer.  The fault is not in the checks that follow
(CURVNET_SET expands to an assert that the vnet pointer is not NULL
and that it has the right vnet magic number, which all starts with
the "test %rax" instruction).

The conclusion is that inm itself is a valid pointer but inm->inm_ifp
is 0xdeadc0dedeadc0de => we've done a free(inm, M_IPMADDR) already.
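
To make that inference concrete, here is a minimal userland sketch of
how a poison-on-free allocator leaves this signature behind. Everything
in it is an assumption for illustration: the toy_* names and the
two-field structs are made up and do not match the real kernel layouts,
and the "free" is only simulated by overwriting static storage so that
the stale read stays well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real kernel structures or layouts. */
struct toy_ifnet {
	void *if_vnet;
};

struct toy_in_multi {
	struct toy_ifnet *inm_ifp;
	unsigned int inm_refcount;
};

/*
 * Fill a region with repeated 32-bit 0xdeadc0de words, the way a
 * debugging free() poisons released memory.
 */
static void
toy_trash(void *mem, size_t size)
{
	const uint32_t pattern = 0xdeadc0deU;
	unsigned char *p = mem;
	size_t i;

	for (i = 0; i + sizeof(pattern) <= size; i += sizeof(pattern))
		memcpy(p + i, &pattern, sizeof(pattern));
}

/* Return 1 if the pointer-sized field at 'field' holds only poison. */
static int
toy_field_is_poisoned(const void *field)
{
	void *poison;

	toy_trash(&poison, sizeof(poison));
	return (memcmp(field, &poison, sizeof(poison)) == 0);
}

/* Walk through the scenario; return 1 if the stale read sees poison. */
int
toy_poison_demo(void)
{
	static struct toy_ifnet ifn;
	static struct toy_in_multi storage;	/* stays mapped after "free" */
	struct toy_in_multi *inm = &storage;

	inm->inm_ifp = &ifn;
	inm->inm_refcount = 1;
	assert(!toy_field_is_poisoned(&inm->inm_ifp));

	/* Simulated free(inm, M_IPMADDR) while a stale pointer remains. */
	toy_trash(&storage, sizeof(storage));

	/* inm itself is still usable, but the field it reads is poison. */
	return (toy_field_is_poisoned(&inm->inm_ifp));
}
```

With 64-bit pointers the poisoned field reads back as
0xdeadc0dedeadc0de, which is the value seen in %rax above. The sketch
only shows how the signature arises; it says nothing about where the
reference count went wrong.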

But, "struct in_multi" has a ref count.  So I'm a bit at a loss as
to where the reference count failed us...

(I did find a thread in the middle of deleting a bridge interface, but
I think at this point that this is a red herring.)

BTW, this is the wrong place for this fix, but I'll attach it anyway:
turning on memory debugging found this bug too, and with memory
debugging on you might hang if you have a ZFS root :-)

(I need to make a bit of progress with my commit bit, and just
commit the fix below...)

Chris

commit cc2a7f115fe69a5d91e54654ba1f210b7db19df6
Author: Chris Torek <torek at ixsystems.com>
Date:   Sun Feb 7 13:36:34 2016 -0800

    taskqueue_drain_all: reload after sleeping
    
    In taskqueue_drain_all(), after we use TQ_SLEEP to wait
    for pending tasks to run, we must reload the tail of the
    task queue, since it may be done now.
    
    The symptom of this bug was a thread hanging in TQ_SLEEP if a task
    lived in malloc()ed memory that was reused or (more likely) filled
    with 0xdeadc0de on free() due to memory checking being turned on.
    The latter would make the pending value become 49374 (0xc0de) and
    we would wait forever.

diff --git a/sys/kern/subr_taskqueue.c b/sys/kern/subr_taskqueue.c
index f104bb5..55a1ca1 100644
--- a/sys/kern/subr_taskqueue.c
+++ b/sys/kern/subr_taskqueue.c
@@ -440,10 +440,9 @@ taskqueue_drain_all(struct taskqueue *queue)
 		WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, __func__);
 
 	TQ_LOCK(queue);
-	task = STAILQ_LAST(&queue->tq_queue, task, ta_link);
-	if (task != NULL)
-		while (task->ta_pending != 0)
-			TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
+	while ((task = STAILQ_LAST(&queue->tq_queue, task, ta_link)) != NULL &&
+	    task->ta_pending != 0)
+		TQ_SLEEP(queue, task, &queue->tq_mutex, PWAIT, "-", 0);
 	taskqueue_drain_running(queue);
 	KASSERT(STAILQ_EMPTY(&queue->tq_queue),
 	    ("taskqueue queue is not empty after draining"));
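
The difference between the two loops in the diff can be seen in a small
userland model. This is only a sketch under stated assumptions: the
toy_* types and functions are hypothetical, toy_sleep() stands in for
TQ_SLEEP by pretending the task ran and its memory was freed and
poisoned while we slept, and max_sleeps exists only so that the stale
variant terminates here instead of hanging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's task and queue. */
struct toy_task {
	struct toy_task *next;
	uint16_t pending;
};

struct toy_queue {
	struct toy_task *tail;		/* NULL when the queue is empty */
};

/*
 * Stand-in for TQ_SLEEP: while we "sleep", the tail task runs, leaves
 * the queue, and its memory is freed.  We model the poison fill by
 * storing 0xc0de, the value the commit message reports for the field.
 */
static void
toy_sleep(struct toy_queue *q)
{
	struct toy_task *t = q->tail;

	if (t != NULL) {
		q->tail = NULL;
		t->pending = 0xc0de;
	}
}

/* Old logic: look up the tail once, then keep testing that pointer. */
static int
toy_drain_stale(struct toy_queue *q, int max_sleeps)
{
	struct toy_task *task = q->tail;
	int sleeps = 0;

	if (task != NULL)
		while (task->pending != 0 && sleeps < max_sleeps) {
			toy_sleep(q);
			sleeps++;
		}
	return (sleeps);
}

/* New logic: reload the tail after every sleep. */
static int
toy_drain_reload(struct toy_queue *q, int max_sleeps)
{
	struct toy_task *task;
	int sleeps = 0;

	while ((task = q->tail) != NULL && task->pending != 0 &&
	    sleeps < max_sleeps) {
		toy_sleep(q);
		sleeps++;
	}
	return (sleeps);
}

/* Run both variants against the same starting state. */
void
toy_drain_demo(void)
{
	struct toy_task t = { NULL, 1 };
	struct toy_queue q = { &t };

	/* Stale pointer: never sees 0, stops only at the sleep limit. */
	assert(toy_drain_stale(&q, 5) == 5);

	t.pending = 1;
	q.tail = &t;

	/* Reloaded tail: one sleep, then the queue is empty and we exit. */
	assert(toy_drain_reload(&q, 5) == 1);
}
```

In the model the stale variant uses up every sleep it is allowed, which
corresponds to waiting forever in the kernel, while the reloading
variant stops after one sleep because the queue tail is NULL by then.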