kern/178318: [patch] [arge] if_arge/bootp race under some circumstances

Luiz Otavio O Souza loos.br at gmail.com
Fri May 3 13:00:02 UTC 2013


>Number:         178318
>Category:       kern
>Synopsis:       [patch] [arge] if_arge/bootp race under some circumstances
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 03 13:00:01 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     Luiz Otavio O Souza
>Release:        -head r250121
>Organization:
>Environment:
FreeBSD rb433 10.0-CURRENT FreeBSD 10.0-CURRENT #61 r250121M: Fri May  3 09:45:51 BRT 2013     root at devel:/data/rb/rb433/obj/mips.mips/data/rb/rb433/src/sys/RSPRO  mips
>Description:
I discovered (the hard way :) that adding some debug output to arge_init_locked() (like the example below) causes bootp to fail.

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c      (revision 250121)
+++ mips/atheros/if_arge.c      (working copy)
@@ -1006,6 +1006,7 @@
 
        ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
        arge_stop(sc);
 
        /* Init circular RX list. */


Bootp will loop for a while with the timeout message until the kernel panics:


arge0: link state changed to UP
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
panic: EFBIG
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x4c: lui     at,0x8059
db> 


After confirming that it really was the printf() that caused the problem, I started to look at why arge_init() was being called twice between the timeouts, and why that made bootp time out and fail to boot.

A few things contribute to this race. First, arge_init() forces a full stop->start cycle every time it is called. The following debug output shows what happens:

bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0


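Second, bootpc_call() changes the netmask again right after sosend(), which triggers another arge_init(). If arge_init() isn't fast enough while resetting the driver on this second netmask change, the interface is still being reset when the bootp response arrives and the packet is missed.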
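
As a rough illustration of that timing window, here is a minimal user-space sketch in C. It is not driver code: the function name and the tick-based timeline are hypothetical and exist only to show why lengthening the reset (for example with a printf() on a slow console) can turn a working boot into a timeout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical timeline, in arbitrary ticks:
 *   tick 0            request sent (sosend), then the second netmask
 *                     change starts the stop->start cycle
 *   tick reset_ticks  the interface is able to receive again
 *   tick reply_tick   the server's reply arrives
 *
 * Returns true if the reply arrives after the reset has finished.
 */
bool
model_reply_received(int reset_ticks, int reply_tick)
{
	assert(reset_ticks >= 0 && reply_tick >= 0);

	/* Assumption: a frame arriving while the MAC is stopped is lost. */
	return (reply_tick >= reset_ticks);
}
```

With a reply at tick 5, a reset that takes 2 ticks is harmless, while a reset stretched to 8 ticks by extra debug output loses the reply.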


>How-To-Repeat:
Add something like this to arge_init_locked():


Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c      (revision 250121)
+++ mips/atheros/if_arge.c      (working copy)
@@ -1006,6 +1006,7 @@
 
        ARGE_LOCK_ASSERT(sc);
 
+printf("%s: called\n", __func__);
        arge_stop(sc);
 
        /* Init circular RX list. */


Add the following to the RSPRO kernel configuration:


Index: sys/mips/conf/RSPRO
===================================================================
--- sys/mips/conf/RSPRO (revision 250121)
+++ sys/mips/conf/RSPRO (working copy)
@@ -28,3 +28,12 @@
 # Boot off of flash
 options                ROOTDEVNAME=\"ufs:redboot/rootfs.uzip\"
 
+options                NFSCL
+options                NFS_ROOT
+options                BOOTP
+options                BOOTP_NFSROOT
+options                BOOTP_NFSV3
+options                BOOTP_WIRED_TO=arge0
+options                BOOTP_COMPAT
+
+


Then try to boot using bootp.
>Fix:
The fix is simply to refuse to restart the driver if it is already 'up' and 'running'. There is no need to restart the driver every time an IP address or netmask is changed or added.

Since we then only proceed when the driver is stopped, we no longer need to force the stop->start cycle.
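
The effect of such a guard can be sketched in a small user-space model in C. Everything here is hypothetical (the MODEL_* flag values, struct model_softc and both functions); it only mimics the idea of returning early when the interface is already up and running, and is not the driver's real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ifnet flag bits; values are illustrative. */
#define MODEL_IFF_UP           0x1
#define MODEL_IFF_DRV_RUNNING  0x2

struct model_softc {
	int  if_flags;     /* administrative state, like ifp->if_flags */
	int  if_drv_flags; /* driver state, like ifp->if_drv_flags */
	bool locked;       /* stands in for ARGE_LOCK_ASSERT() */
	int  hw_resets;    /* how often the "hardware" was reinitialised */
};

static void
model_init_locked(struct model_softc *sc)
{
	assert(sc->locked);

	/* Already up and running: leave the hardware alone. */
	if ((sc->if_flags & MODEL_IFF_UP) &&
	    (sc->if_drv_flags & MODEL_IFF_DRV_RUNNING))
		return;

	/* Ring setup and MAC programming would happen here. */
	sc->hw_resets++;
	sc->if_drv_flags |= MODEL_IFF_DRV_RUNNING;
}

/* Call init once per simulated address/netmask change; count real resets. */
int
model_count_resets(int address_changes)
{
	struct model_softc sc = {
		.if_flags = MODEL_IFF_UP,
		.if_drv_flags = 0,
		.locked = true,
		.hw_resets = 0,
	};

	for (int i = 0; i < address_changes; i++)
		model_init_locked(&sc);
	return (sc.hw_resets);
}
```

In this model, ten consecutive address changes cause a single hardware initialisation, so there is no window in which a reply can be dropped by a later reset.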

The leakage that leads to the panic will be fixed in a subsequent PR.

Patch attached with submission follows:

Index: sys/mips/atheros/if_arge.c
===================================================================
--- sys/mips/atheros/if_arge.c	(revision 250121)
+++ sys/mips/atheros/if_arge.c	(working copy)
@@ -1006,7 +1006,8 @@
 
 	ARGE_LOCK_ASSERT(sc);
 
-	arge_stop(sc);
+	if ((ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING))
+		return;
 
 	/* Init circular RX list. */
 	if (arge_rx_ring_init(sc) != 0) {


>Release-Note:
>Audit-Trail:
>Unformatted:

