kern/144453: bpf(4) can panic due to a race condition on descriptor destruction

Wed Mar 3 21:20:04 UTC 2010

>Number:         144453
>Category:       kern
>Synopsis:       bpf(4) can panic due to a race condition on descriptor destruction
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 03 21:20:03 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Sack
>Release:        CURRENT, 7.2-amd64
>Organization:
Niksun
>Environment:
SA5000PAL Intel board with 8GB of RAM, em(4) network interface card
>Description:
When an application polls on a bpf descriptor, a timeout handler, bpf_timed_out(), is scheduled via callout_reset().  If a buffer is not available within the poll period, bpf_timed_out() fires, changes the bpf_d state, and wakes up any threads waiting for an event.  When bpf_timed_out() runs, it attempts to acquire the descriptor lock.

Now if a process is in the middle of a poll/select when the bpf descriptor is closed (gracefully or otherwise), bpf_dtor() is called.  bpf_dtor() acquires the descriptor lock and calls callout_stop() if the bpf state is BPF_WAITING (i.e. select was called and callout_reset() has finished scheduling the callout).  After calling callout_stop() it releases the descriptor lock, and at that point a race condition can occur.

If callout_stop() can't stop bpf_timed_out() from firing (say it has already fired), then bpf_timed_out() is blocked waiting on the descriptor lock.  When bpf_dtor() drops the lock, bpf_timed_out() is allowed to continue.  But bpf_dtor() is about to free the descriptor that bpf_timed_out() is currently modifying.  This can lead to a panic.

The attached patch addresses this by checking callout_active() and, if the callout is still active, calling callout_drain(), which waits until bpf_timed_out() has finished.  This allows bpf_dtor() to safely free the descriptor during the close operation.
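
To illustrate the stop-then-drain ordering described above, here is a minimal userspace sketch using POSIX threads.  It is an analogy only, not kernel code: struct desc, desc_create(), desc_poll(), desc_destroy() and timed_out() are hypothetical names invented for this example, a pthread stands in for the callout, and pthread_join() stands in for callout_drain().  The sketch assumes the stop attempt has already failed because the handler started.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

enum { ST_IDLE, ST_WAITING, ST_TIMED_OUT };

/* Hypothetical stand-in for a bpf descriptor; NOT the kernel's struct bpf_d. */
struct desc {
	pthread_mutex_t lock;	/* plays the role of the descriptor lock */
	pthread_t timer;	/* plays the role of the callout */
	bool armed;		/* true once the timer thread exists */
	bool fired;		/* set by the handler once it has run */
	int state;
};

/* Timeout handler: takes the lock before touching the descriptor. */
static void *timed_out(void *arg)
{
	struct desc *d = arg;

	pthread_mutex_lock(&d->lock);	/* may block while teardown holds the lock */
	if (d->state == ST_WAITING)
		d->state = ST_TIMED_OUT;
	d->fired = true;
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

static struct desc *desc_create(void)
{
	struct desc *d = calloc(1, sizeof(*d));

	if (d != NULL)
		pthread_mutex_init(&d->lock, NULL);
	return d;
}

/* Model of poll/select arming the timeout. */
static void desc_poll(struct desc *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	d->state = ST_WAITING;
	d->armed = (pthread_create(&d->timer, NULL, timed_out, d) == 0);
	pthread_mutex_unlock(&d->lock);
}

/* Teardown.  Returns true if the handler had finished before the free. */
static bool desc_destroy(struct desc *d)
{
	bool fired;

	pthread_mutex_lock(&d->lock);
	/* Assumption: the stop attempt failed; the handler is already running. */
	d->state = ST_IDLE;
	pthread_mutex_unlock(&d->lock);

	/* Drain with the lock dropped: wait for the handler to finish. */
	if (d->armed)
		pthread_join(d->timer, NULL);

	fired = d->fired;	/* safe: no handler can still be running */
	pthread_mutex_destroy(&d->lock);
	free(d);
	return fired;
}
```

In this model, removing the pthread_join() call would let timed_out() lock and modify memory that desc_destroy() has already freed, which is the userspace equivalent of the panic described above.  Note that the wait happens only after the lock is released; waiting while holding a lock the handler needs would deadlock.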
>How-To-Repeat:
Have many pollers on a descriptor under high load during a shutdown.
>Fix:
See attached patch.  I tested this on my Intel machine by running 200 tcpdump processes with zerocopy disabled and enabled (even though with zerocopy libpcap doesn't poll on the descriptor) capturing GigE traffic at 100% utilization.  No panic occurred during shutdown.  We also saw this with our own custom packet capture application, which is where I discovered and fixed the problem.

Patch attached with submission follows:

? bpf.patch
Index: bpf.c
===================================================================
RCS file: /home/ncvs/src/sys/net/bpf.c,v
retrieving revision 1.219
diff -u -r1.219 bpf.c

--- bpf.c	20 Feb 2010 00:19:21 -0000	1.219
+++ bpf.c	3 Mar 2010 21:04:48 -0000
@@ -614,6 +614,15 @@
 	mac_bpfdesc_destroy(d);
 #endif /* MAC */
 	knlist_destroy(&d->bd_sel.si_note);
+	/*
+	 * If we could not stop the callout above, 
+	 * then when we release the descriptor lock, 
+	 * there is a race between when bpf_timed_out() 
+	 * finishes and descriptor tear down.  Check
+	 * for it and drain.
+	 */
+	if (callout_active(&d->bd_callout))
+		callout_drain(&d->bd_callout);
 	bpf_freed(d);
 	free(d, M_BPF);
 }


>Release-Note:
>Audit-Trail:
>Unformatted: