kern/165863: [panic] [patch] in_lltable_prefix_free() races with
lla_lookup() and arptimer()
Eric van Gyzen
eric_van_gyzen at dell.com
Thu Mar 8 23:00:27 UTC 2012
>Number: 165863
>Category: kern
>Synopsis: [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer()
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Mar 08 23:00:27 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator: Eric van Gyzen
>Release: 8.2-RELEASE
>Organization:
Dell Compellent
>Environment:
FreeBSD 8.2-RELEASE amd64
>Description:
in_lltable_prefix_free() does not acquire the if_afdata_lock, so it can free the llentry while another thread is in lla_lookup(), such as during packet processing.
While working on a fix, it quickly became apparent that in_lltable_prefix_free() does not safely drain the callout, so it races with arptimer(). If arptimer() has already tested callout_active() before in_lltable_prefix_free() calls callout_drain(), arptimer() will have freed the llentry before callout_drain() returns.
I can reliably reproduce the first problem (lla_lookup) on 8.1-RELEASE and 8.2-RELEASE. I cannot reproduce it on 9.0-RELEASE or head, but since the relevant code has not changed, I suspect it still exists.
I have not earnestly tried to reproduce the second problem (arptimer).
The same problem exists in the IPv6 code, but that code is never called.
>How-To-Repeat:
In a second process, repeatedly and rapidly call SIOCSIFADDR and SIOCDIFADDR on the /only/ interface address by which that neighbor is reached. This drives in_lltable_prefix_free().
In one process, flood ping an IPv4 neighbor. This drives lla_lookup()
via the processing of ARP replies, packets queued on la_hold, and
echo request packets.
This reliably panics in 10-20 seconds.
>Fix:
With the attached patch, my systems survive the test for over 30 minutes.
Patch attached with submission follows:
--- sys/netinet/in.c.orig 2012-03-08 12:57:29.000000000 -0600
+++ sys/netinet/in.c 2012-03-08 12:14:26.000000000 -0600
@@ -1351,21 +1351,39 @@
struct llentry *lle, *next;
register int i;
+restart:
+ IF_AFDATA_WLOCK(llt->llt_ifp);
for (i=0; i < LLTBL_HASHTBL_SIZE; i++) {
LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) {
- if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle),
+ if (IN_ARE_MASKED_ADDR_EQUAL(satosin(L3_ADDR(lle)),
pfx, msk)) {
- int canceled;
+ int canceled = 0;
- canceled = callout_drain(&lle->la_timer);
LLE_WLOCK(lle);
- if (canceled)
+ if (!callout_active(&lle->la_timer) ||
+ (canceled = callout_stop(&lle->la_timer))) {
+ if (canceled)
+ LLE_REMREF(lle);
+ llentry_free(lle);
+ } else {
+ /*
+ * The callout is in progress.
+ * Since we deactivated it, it won't
+ * do anything. Drop its reference.
+ * Drop our locks to drain it, which
+ * requires restarting this function.
+ */
LLE_REMREF(lle);
- llentry_free(lle);
+ LLE_WUNLOCK(lle);
+ IF_AFDATA_WUNLOCK(llt->llt_ifp);
+ (void) callout_drain(&lle->la_timer);
+ goto restart;
+ }
}
}
}
+ IF_AFDATA_WUNLOCK(llt->llt_ifp);
}
>Release-Note:
>Audit-Trail:
>Unformatted:
More information about the freebsd-bugs
mailing list