kern/134548: bge(4) panics on shutdown under heavy traffic load

Alexander Sack pisymbol at gmail.com
Thu May 14 21:10:01 UTC 2009


>Number:         134548
>Category:       kern
>Synopsis:       bge(4) panics on shutdown under heavy traffic load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu May 14 21:10:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Sack
>Release:        CURRENT (8.x) and 7.1-RELEASE-amd64
>Organization:
Niksun
>Environment:
>Description:
While shutting down an interface, either via ioctls or ifconfig bgeX down etc., the bge driver panics in bge_rxeof() with a kernel page fault (I believe it was trying to access an mbuf).

The problem is a race between bge_stop() and bge_rxeof() for the softc lock.  What is happening is the following:

- bge_intr()
- bge_rxeof()
- process rings in while loop
- bge_stop() is called while bge_rxeof() is in the middle of processing BDs
- bge_rxeof() releases the softc lock with BGE_UNLOCK() before calling the input routine
- bge_stop() is let through, stops the hardware, and clears the IFF_DRV_RUNNING flag on the ifp
- bge_rxeof() continues to process the RX rings (BDs) and panics, since the memory maps have been unloaded and the resources released
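
The sequence above can be modelled in a small userland sketch. This is not driver code: fake_softc, fake_stop(), fake_input(), fake_rxeof() and run_demo() are hypothetical names made up for illustration, a plain bool stands in for IFF_DRV_RUNNING, and the "stop" is triggered from inside the input callback to imitate bge_stop() getting the lock while bge_rxeof() has dropped it. The loop condition mirrors the running-flag check proposed under Fix; without that check the next iteration would dereference a buffer that the stop path has already released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the driver softc; not taken from if_bge.c. */
struct fake_softc {
	bool	 running;		/* plays the role of IFF_DRV_RUNNING */
	int	*bufs[RING_SIZE];	/* plays the role of the RX mbufs */
	int	 cons, prod;		/* consumer and producer indices */
	int	 stop_at;		/* index at which "stop" sneaks in, -1 = never */
};

/* Models the stop path: release every RX buffer and clear the running flag. */
static void
fake_stop(struct fake_softc *sc)
{
	for (int i = 0; i < RING_SIZE; i++) {
		free(sc->bufs[i]);
		sc->bufs[i] = NULL;
	}
	sc->running = false;
}

/* Models the input routine, which runs while the softc lock is dropped. */
static void
fake_input(struct fake_softc *sc, int idx, int *m)
{
	free(m);			/* the stack consumes the packet */
	if (idx == sc->stop_at)
		fake_stop(sc);		/* stop wins the lock in this window */
}

/* Models the RX loop; returns the number of descriptors handled. */
static int
fake_rxeof(struct fake_softc *sc)
{
	int handled = 0;

	/* The second condition is the extra check; it ends the loop after a stop. */
	while (sc->cons != sc->prod && sc->running) {
		int idx = sc->cons;
		int *m = sc->bufs[idx];

		assert(m != NULL);	/* would fail here if the check were missing */
		sc->bufs[idx] = NULL;
		sc->cons = (sc->cons + 1) % RING_SIZE;
		/* the lock would be dropped here ... */
		fake_input(sc, idx, m);
		/* ... and re-acquired here */
		handled++;
	}
	return (handled);
}

/* Fills the ring, queues six packets, and reports how many were handled. */
static int
run_demo(int stop_at)
{
	struct fake_softc sc = { .running = true, .prod = 6, .stop_at = stop_at };
	int handled;

	for (int i = 0; i < RING_SIZE; i++) {
		sc.bufs[i] = calloc(1, sizeof(int));
		if (sc.bufs[i] == NULL) {
			fake_stop(&sc);
			return (-1);
		}
	}
	handled = fake_rxeof(&sc);
	fake_stop(&sc);			/* release whatever is left */
	return (handled);
}
```

With six packets queued, run_demo(-1) models an undisturbed pass over the ring, while run_demo(2) models a stop arriving during the third packet: the loop finishes that packet and then exits instead of touching the released buffers.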



>How-To-Repeat:
Connect two bge ports on any amd64 system running CURRENT or 7.1-RELEASE+ and push large amounts of traffic through them.  I was sending gigabit Ethernet traffic through two ports at 100% utilization (it was actually SmartBits-generated traffic).
>Fix:
--- if_bge.c.CURRENT	2009-05-14 14:39:39.000000000 -0400
+++ if_bge.c	2009-05-14 16:57:02.000000000 -0400
@@ -3073,8 +3073,9 @@
 		bus_dmamap_sync(sc->bge_cdata.bge_rx_jumbo_ring_tag,
 		    sc->bge_cdata.bge_rx_jumbo_ring_map, BUS_DMASYNC_POSTREAD);
 
-	while(sc->bge_rx_saved_considx !=
-	    sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx) {
+	while (sc->bge_rx_saved_considx !=
+	    sc->bge_ldata.bge_status_block->bge_idx[0].bge_rx_prod_idx && 
+		(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
 		struct bge_rx_bd	*cur_rx;
 		uint32_t		rxidx;
 		struct mbuf		*m = NULL;


The patch above follows the same approach as if_em: before processing each RX ring entry, it checks that the driver is still running.  This prevents all panics at the cost of one extra check per loop iteration.  My testing has not yet shown a performance penalty large enough to cause drops, but I realize this execution path is very performance-sensitive.

>Release-Note:
>Audit-Trail:
>Unformatted:

