kern/165863: [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer()

Eric van Gyzen eric_van_gyzen at dell.com
Thu Mar 8 23:00:27 UTC 2012


>Number:         165863
>Category:       kern
>Synopsis:       [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer()
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 08 23:00:27 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Eric van Gyzen
>Release:        8.2-RELEASE
>Organization:
Dell Compellent
>Environment:
FreeBSD 8.2-RELEASE amd64
>Description:
in_lltable_prefix_free() does not acquire the if_afdata_lock, so it can free the llentry while another thread is in lla_lookup(), such as during packet processing.

While working on a fix, it quickly became apparent that in_lltable_prefix_free() does not safely drain the callout, so it races with arptimer().  If arptimer() has already tested callout_active() before in_lltable_prefix_free() calls callout_drain(), arptimer() will have freed the llentry before callout_drain() returns.

I can reliably reproduce the first problem (lla_lookup) on 8.1-RELEASE and 8.2-RELEASE.  I cannot reproduce it on 9.0-RELEASE or head, but since the relevant code has not changed, I suspect it still exists.

I have not earnestly tried to reproduce the second problem (arptimer).

The same problem exists in the IPv6 code, but that code is never called.
>How-To-Repeat:
In a second process, repeatedly and rapidly call SIOCSIFADDR and SIOCDIFADDR on the /only/ interface address by which that neighbor is reached.  This drives in_lltable_prefix_free().

In one process, flood ping an IPv4 neighbor.  This drives lla_lookup()
via the processing of ARP replies, packets queued on la_hold, and
echo request packets.

This reliably panics in 10-20 seconds.
>Fix:
With the attached patch, my systems survive the test for over 30 minutes.

Patch attached with submission follows:

--- sys/netinet/in.c.orig	2012-03-08 12:57:29.000000000 -0600
+++ sys/netinet/in.c	2012-03-08 12:14:26.000000000 -0600
@@ -1351,21 +1351,39 @@
 	struct llentry *lle, *next;
 	register int i;
 
+restart:
+	IF_AFDATA_WLOCK(llt->llt_ifp);
 	for (i=0; i < LLTBL_HASHTBL_SIZE; i++) {
 		LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) {
 
-			if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle), 
+			if (IN_ARE_MASKED_ADDR_EQUAL(satosin(L3_ADDR(lle)),
 						     pfx, msk)) {
-				int canceled;
+				int canceled = 0;
 
-				canceled = callout_drain(&lle->la_timer);
 				LLE_WLOCK(lle);
-				if (canceled)
+				if (!callout_active(&lle->la_timer) ||
+				    (canceled = callout_stop(&lle->la_timer))) {
+					if (canceled)
+						LLE_REMREF(lle);
+					llentry_free(lle);
+				} else {
+					/*
+					 * The callout is in progress.
+					 * Since we deactivated it, it won't
+					 * do anything.  Drop its reference.
+					 * Drop our locks to drain it, which
+					 * requires restarting this function.
+					 */
 					LLE_REMREF(lle);
-				llentry_free(lle);
+					LLE_WUNLOCK(lle);
+					IF_AFDATA_WUNLOCK(llt->llt_ifp);
+					(void) callout_drain(&lle->la_timer);
+					goto restart;
+				}
 			}
 		}
 	}
+	IF_AFDATA_WUNLOCK(llt->llt_ifp);
 }
 
 


>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list