kern/189003: Page fault in lacp_req() while the lagg is being destroyed

Alan Somers asomers at freebsd.org
Fri Apr 25 22:40:01 UTC 2014


>Number:         189003
>Category:       kern
>Synopsis:       Page fault in lacp_req() while the lagg is being destroyed
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 25 22:40:01 UTC 2014
>Closed-Date:
>Last-Modified:
>Originator:     Alan Somers
>Release:        11.0 CURRENT
>Organization:
Spectra Logic
>Environment:
FreeBSD alans-fbsd-head 11.0-CURRENT FreeBSD 11.0-CURRENT #53 r264920M: Fri Apr 25 13:52:21 MDT 2014     alans at ns1.eng.sldomain.com:/vmpool/obj/usr/home/alans/freebsd/head/sys/GENERIC  amd64
>Description:
Running "ifconfig -am" in one thread while running "ifconfig lagg0 destroy" in another can cause at least two different panics.  One is a page fault in lacp_req(), caused by lsc being NULL.

What happens is that the "ifconfig lagg0 destroy" thread does this:
1) lagg_clone_destroy() acquires LAGG_WLOCK(sc)
2) lagg_clone_destroy() calls lagg_lacp_detach, which calls lacp_detach, which sets sc->sc_psc = NULL
3) lagg_clone_destroy() calls LAGG_WUNLOCK(sc)

Then the "ifconfig -am" thread does this:
1) calls lagg_ioctl(SIOCGLAGG)
2) lagg_ioctl() acquires LAGG_RLOCK(sc, &tracker)
3) lagg_ioctl() calls sc->sc_req, which points to lacp_req
4) lacp_req sets lsc = LACP_SOFTC(sc), which is now NULL
5) lacp_req dereferences lsc and panics
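
The ordering above can be modeled in userland with a small sketch.  This is only an illustration, not the kernel code: the model_* names and struct layouts are hypothetical stand-ins, and locking is left out because the two steps are already serialized by the lagg lock (the destroy path drops its write lock before the status path takes its read lock).  It shows why the request path sees a NULL protocol softc, and how an early return leaves the reply zeroed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_lacp_softc { int active_prio; };
struct model_lagg_softc { struct model_lacp_softc *sc_psc; };
struct model_opreq { int actor_prio; };

/* Models step 2 of the destroy path: the protocol softc pointer is cleared. */
static void
model_detach(struct model_lagg_softc *sc)
{
	sc->sc_psc = NULL;
}

/*
 * Models the request path.  Returns 0 if the reply was filled in, or -1 if
 * the protocol softc was already gone and the reply was left zeroed.
 */
static int
model_req(struct model_lagg_softc *sc, struct model_opreq *req)
{
	struct model_lacp_softc *lsc = sc->sc_psc;

	memset(req, 0, sizeof(*req));
	if (lsc == NULL)
		return (-1);	/* without this check, the next line faults */
	req->actor_prio = lsc->active_prio;
	return (0);
}
```

Calling model_req() after model_detach() on the same model_lagg_softc corresponds to steps 4 and 5: with the NULL check the caller gets a zeroed reply, and without it the dereference of lsc is the faulting access.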

Backtrace of the resulting page fault:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009781d380
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe009781d430
witness_warn() at witness_warn+0x4b5/frame 0xfffffe009781d4f0
trap_pfault() at trap_pfault+0x59/frame 0xfffffe009781d590
trap() at trap+0x4d5/frame 0xfffffe009781d7a0
calltrap() at calltrap+0x8/frame 0xfffffe009781d7a0
--- trap 0xc, rip = 0xffffffff81eb9b44, rsp = 0xfffffe009781d860, rbp = 0xfffffe009781d890 ---
lacp_req() at lacp_req+0x14/frame 0xfffffe009781d890
lagg_ioctl() at lagg_ioctl+0x270/frame 0xfffffe009781d970
ifioctl() at ifioctl+0xbf7/frame 0xfffffe009781da30
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe009781da90
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe009781dae0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe009781dbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe009781dbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp = 0x7fffffffd808, rbp = 0x7fffffffe290 ---

>How-To-Repeat:
First, back out change 253687.  Doing so increases the likelihood of hitting this panic.

Run this script:

#! /usr/local/bin/bash

ifconfig tap0 create
sleep .2
ifconfig tap1 create
sleep .2
ifconfig tap2 create
sleep .2
ifconfig tap0 up
sleep .2
ifconfig tap1 up
sleep .2
ifconfig tap2 up
sleep .2

# Background loop: repeatedly create, configure, and destroy lagg0.
while true; do
        echo "About to create"
        ifconfig lagg0 create
        #sleep 0.2

        echo "About to up"
        ifconfig lagg0 up laggproto lacp laggport tap0 laggport tap1 laggport tap2 192.0.0.2/24
        sleep 0.2

        echo "About to destroy"
        ifconfig lagg0 destroy
        sleep 0.2
done &

# Foreground loop: query all interfaces continuously to race with the destroy.
while true; do
        ifconfig -am > /dev/null
done

>Fix:
The purpose of lacp_req is to return LACP property information to userland when you run "ifconfig lagg0", so I think it would be OK for it to return a block full of zeros.  This would happen only while the interface is being destroyed, and userland should be able to deal with that.  My proposed fix (attached) is simply to check for NULL == lsc and return early.

Patch attached with submission follows:

Index: sys/net/ieee8023ad_lacp.c
===================================================================
--- sys/net/ieee8023ad_lacp.c	(revision 264920)
+++ sys/net/ieee8023ad_lacp.c	(working copy)
@@ -590,10 +590,20 @@
 {
 	struct lacp_opreq *req = (struct lacp_opreq *)data;
 	struct lacp_softc *lsc = LACP_SOFTC(sc);
-	struct lacp_aggregator *la = lsc->lsc_active_aggregator;
+	struct lacp_aggregator *la;
 
+	bzero(req, sizeof(struct lacp_opreq));
+	
+	/* 
+	 * If the LACP softc is NULL, return with the opreq structure full of
+	 * zeros.  It is normal for the softc to be NULL while the lagg is
+	 * being destroyed.
+	 */
+	if (NULL == lsc)
+		return;
+
+	la = lsc->lsc_active_aggregator;
 	LACP_LOCK(lsc);
-	bzero(req, sizeof(struct lacp_opreq));
 	if (la != NULL) {
 		req->actor_prio = ntohs(la->la_actor.lip_systemid.lsi_prio);
 		memcpy(&req->actor_mac, &la->la_actor.lip_systemid.lsi_mac,


>Release-Note:
>Audit-Trail:
>Unformatted:

