kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause panics on SMP systems

Mark Gooderum mark at verniernetworks.com
Thu Oct 6 22:10:18 PDT 2005


The following reply was made to PR kern/87014; it has been noted by GNATS.

From: Mark Gooderum <mark at verniernetworks.com>
To: bug-followup at FreeBSD.org,  mark at verniernetworks.com
Cc:  
Subject: Re: kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause panics
 on SMP systems
Date: Fri, 07 Oct 2005 00:03:12 -0500

 This is a multi-part message in MIME format.
 --------------050606070600030406070008
 Content-Type: multipart/alternative;
  boundary="------------010506050407060700010906"
 
 
 --------------010506050407060700010906
 Content-Type: text/plain; charset=us-ascii; format=flowed
 Content-Transfer-Encoding: 7bit
 
 FYI - this appears to be a duplicate of PR 73719.  I did search before
 filing but somehow missed it.
 
 Using the attached test program (which spins opening and closing BPF 
 devices) I can make my system crash in a few seconds from this bug.  The 
 test setup is basically:
 
     * FreeBSD system as router
           o 2 GigE interfaces
     * 4 Traffic Generating Systems
           o Two on one interface with netstraind running
           o Two on second interface with netstrain running
                 + Run netstrain bi-dir (i.e.: netstrain <desthost> <port>
                   both)
           o I can generate about 450Mbit/sec each way (900 Mbit/sec
             aggregate) with this setup
     * Start the netstraind servers
     * Start the netstrain clients
     * Things are fine (the router is stable under this load alone)
     * Run the attached test program full spin mode on one of the active
       interfaces
           o bpfspin -f 100000 bge0
     * Without the fix, the system crashes within 1-2 seconds of starting
       bpfspin
 
 The SUT was a Tyan S2882 based Dual Opteron 248 system.  The motherboard 
 has an Intel 8255x based 10/100 port and two Broadcom 5704 based GigE 
 ports onboard.  It also had a pair of PCI-X Intel Dual GigE PRO/1000M 
 cards (Intel 8254x based).  The crash was reproduced with both the bge 
 driver ports and the em driver interfaces.
 
 This test must be done on a true SMP system: the race requires two
 threads running at the same time, because there are no preemption points
 inside the race window.  I am not sure about the timing on HTT systems;
 this testing was done on a true dual Opteron system.
 
 The attached patch fixes the problem and adds two debug sysctls:
 debug.bpf_nullhits counts how many times the fix fired, and
 debug.bpf_nullfix enables (1, the default) or disables (0) the fix.
 With bpfspin running you can watch the hit counter increase every second
 or so; disable the fix and the system panics almost immediately.
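 
 For illustration, here is a minimal user-space sketch of the race and of
 the guard in the patch.  It is not the kernel code.  It assumes that the
 caller tests the interface's BPF pointer and then reads it a second time
 to pass it on, and that another CPU can clear the pointer in between.
 All names (fake_bpf_if, fake_ifnet, fake_mtap, detach, racy_tap,
 run_demo) are hypothetical stand-ins; nullhits and donullfix model the
 patch's bpf_nullhits and bpf_donullfix.  The second CPU is simulated
 with a flag so that the bad interleaving is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct bpf_if and struct ifnet. */
struct fake_bpf_if { int ntaps; };
struct fake_ifnet  { struct fake_bpf_if *if_bpf; };

static int nullhits;		/* models the patch's bpf_nullhits counter */
static int donullfix = 1;	/* models the patch's bpf_donullfix switch */

/* Models the patched bpf_mtap(): count and ignore a NULL bp. */
static int
fake_mtap(struct fake_bpf_if *bp)
{
	if (donullfix && bp == NULL) {
		nullhits++;
		return (0);
	}
	bp->ntaps++;		/* without the guard, a NULL bp faults here */
	return (1);
}

/* Stands in for the second CPU closing the last BPF descriptor. */
static void
detach(struct fake_ifnet *ifp)
{
	ifp->if_bpf = NULL;
}

/*
 * Assumed shape of the racy caller: test the pointer, then read it again.
 * detach_in_window forces the losing interleaving deterministically.
 */
static int
racy_tap(struct fake_ifnet *ifp, int detach_in_window)
{
	if (ifp->if_bpf != NULL) {			/* first read */
		if (detach_in_window)
			detach(ifp);			/* other CPU wins */
		return (fake_mtap(ifp->if_bpf));	/* second read */
	}
	return (0);
}

/* Returns the number of guard hits after one clean tap and one raced tap. */
static int
run_demo(void)
{
	struct fake_bpf_if bif = { 0 };
	struct fake_ifnet ifn = { &bif };

	nullhits = 0;
	assert(racy_tap(&ifn, 0) == 1);	/* no race: the packet is tapped */
	assert(racy_tap(&ifn, 1) == 0);	/* race: the guard absorbs NULL */
	return (nullhits);
}
```

 Calling run_demo() yields one guard hit for the one raced call.  Setting
 donullfix to 0 before the raced call would make fake_mtap() dereference
 NULL, which is the user-space analogue of the panic.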
 -=-
 Mark
 
 
 --------------010506050407060700010906--
 
 --------------050606070600030406070008
 Content-Type: text/plain;
  name="Makefile"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="Makefile"
 
 bpfspin:	bpfspin.o
 	gcc -g -o bpfspin bpfspin.o -lpcap
 
 bpfspin.o: bpfspin.c
 	gcc -g -c -o bpfspin.o bpfspin.c
 --------------050606070600030406070008
 Content-Type: text/x-csrc;
  name="bpfspin.c"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="bpfspin.c"
 
 /*
  * Test program to open and close a BPF a _lot_.
  */
 
 #include <errno.h>
 #include <string.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <signal.h>
 
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <net/bpf.h>
 #include <unistd.h>
 
 #include "pcap.h"
 
 #define CAP_LEN 100
 
 const char *argv0;
 const char *iname;
 
 /* Default to something that won't match anything */
 char *filter = "ip proto 199";
 
 pcap_t *
 open_bpf(const char *ifname);
 
 void
 close_bpf(pcap_t *pct);
 
 int debug_level;
 int freq = 10;
 
 int on_sleep;
 int off_sleep;
 int per_cycle;
 int num_cycles = -1;
 int quit_flag;
 
 void
 usage(int badopt);
 
 void
 catchsig(int signo);
 
 
 int
 main(int argc, char *argv[])
 {
 	u_int64_t	npass = 0;
 	const char *estr;
 	int	eno;
 	pcap_t	*pct;
 	int	ch;
 
 	argv0 = strrchr(argv[0], '/');
 	if (argv0 == NULL) {
 		argv0 = argv[0];
 	} else {
 		argv0++;
 	}
 
 	signal(SIGTERM, catchsig);
 	signal(SIGHUP, catchsig);
 	signal(SIGQUIT, catchsig);
 	signal(SIGINT, catchsig);
 
 	/*
 	 * Args...
 	 */
 	while ((ch = getopt(argc, argv, "df:hn:o:")) != -1) {
 		switch (ch) {
 		case 'd':
 			debug_level++;
 			break;
 
 		case 'f':
 			freq = atoi(optarg);
 			break;
 
 		case 'h':
 			usage(0);
 			exit(0);
 
 		case 'n':
 			num_cycles = atoi(optarg);
 			break;
 
 		case 'o':
 			on_sleep = atoi(optarg);
 			break;
 
 		default:
 			usage(optopt);
 			exit(1);
 		}
 		
 	}
 
 	argc -= optind;
 	argv += optind;
 
 	if (argc < 1) {
 		fprintf(stderr, "Error: <ifname> argument required.\n");
 		usage(-1);
 	}
 	iname = argv[0];
 
 	if (freq) {
 		per_cycle = 1000000 / freq;
 		off_sleep = per_cycle;
 	}
 	if (on_sleep) {
 		off_sleep = per_cycle - on_sleep;
 	}
 	
 	while (num_cycles) {
 		pct = open_bpf(iname);
 		if (pct == NULL) {
 			eno = errno;
 			estr = strerror(eno);
 			if (estr == NULL) {
 				estr = "<Unknown>";
 			}
 			fprintf(stderr, "Error: open_bpf(%s) failed %d/%s\n",
 				iname, eno, estr);
 			exit(3);
 		}
 		if (on_sleep) {
 			usleep(on_sleep);
 		}
 
 		close_bpf(pct);
 		if (on_sleep) {
 			usleep(off_sleep);
 		}
 
 		if (num_cycles > 0) {
 			num_cycles--;
 		}
 		npass++;
 		if (quit_flag) {
 			break;
 		}
 	}
 	printf("Open/Closed bpf on %s %llu times.\n", iname,
 	    (unsigned long long)npass);
 	exit(0);
 }
 
 pcap_t *
 open_bpf(const char *ifname)
 {
 	pcap_t	*pct;
 	int	pfd;
 	u_int	one = 1;
 	char	ebuf[PCAP_ERRBUF_SIZE];
 	struct bpf_program	dfilter;
 	u_int32_t		network = 0, netmask = 0;
 
 	pct = pcap_open_live(ifname, CAP_LEN, 0, 1000, ebuf);
 	if (pct == NULL) {
 		/* libpcap reports the reason in ebuf, not errno */
 		fprintf(stderr, "pcap_open_live failed: %s\n", ebuf);
 		return(NULL);
 	}
 	pfd = pcap_get_selectable_fd(pct);
 	if (ioctl(pfd, BIOCIMMEDIATE, &one) < 0) {
 		perror("BIOCIMMEDIATE failed");
 		pcap_close(pct);
 		return(NULL);
 	}
 #if 0
 	/* Must be needed? */
 	if(pcap_lookupnet(ifname, &network, &netmask, 0) < 0) {
 		perror("pcap_lookupnet failed");
 		pcap_close(pct);
 		return(NULL);
 	}
 #endif
 	/* Compile the Dummy filter pcap program */
 	bzero(&dfilter, sizeof(struct bpf_program));
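 	if (pcap_compile(pct, &dfilter, filter, 0, netmask) < 0) {
 		/* libpcap reports errors via pcap_geterr(), not errno */
 		fprintf(stderr, "pcap_compile failed: %s\n", pcap_geterr(pct));
 		pcap_close(pct);
 		return(NULL);
 	}
 	if (pcap_setfilter(pct, &dfilter) < 0) {
 		fprintf(stderr, "pcap_setfilter failed: %s\n",
 		    pcap_geterr(pct));
 		pcap_freecode(&dfilter);
 		pcap_close(pct);
 		return(NULL);
 	}
 	/* The filter is installed; release the compiled copy. */
 	pcap_freecode(&dfilter);
 	return(pct);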
 }
 
 void
 close_bpf(pcap_t *pct)
 {
 	pcap_close(pct);
 }
 
 
 void
 usage(int badopt)
 {	
 	if (badopt > 0) {
 		fprintf(stderr, "%s: Bad option [-%c]\n", argv0, 
 			(char) badopt);
 	}
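 	fprintf(stderr, "Usage:  %s [-dh] [-f <freq>] [-n <cycles>] "
 	    "[-o <usec>] <ifname>\n", argv0);
 	fprintf(stderr, "\t-d\tIncrease debug level by 1\n");
 	fprintf(stderr, "\t-f\tSet Flap Freq to <freq>\n");
 	fprintf(stderr, "\t-h\tPrint this help\n");
 	fprintf(stderr, "\t-n\tStop after <cycles> open/close cycles\n");
 	fprintf(stderr, "\t-o\tHold each BPF open for <usec> microseconds\n");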
 	exit(badopt != 0);
 }
 
 void
 catchsig(int signo)
 {
 	switch (signo) {
 	case SIGHUP:
 	case SIGTERM:
 	case SIGQUIT:
 	case SIGINT:
 		quit_flag = 1;
 		break;
 	default:
 		abort();
 	}
 }
 
 
 --------------050606070600030406070008
 Content-Type: text/plain;
  name="BPFMTAP.difftxt"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="BPFMTAP.difftxt"
 
 --- /tmp/tmp.44835.0	Fri Oct  7 00:00:08 2005
 +++ freebsd5/sys/net/bpf.c	Thu Oct  6 16:34:36 2005
 @@ -81,20 +81,27 @@
  /*
   * The default read buffer size is patchable.
   */
  static int bpf_bufsize = 4096;
  SYSCTL_INT(_debug, OID_AUTO, bpf_bufsize, CTLFLAG_RW,
  	&bpf_bufsize, 0, "");
  static int bpf_maxbufsize = BPF_MAXBUFSIZE;
  SYSCTL_INT(_debug, OID_AUTO, bpf_maxbufsize, CTLFLAG_RW,
  	&bpf_maxbufsize, 0, "");
  
 +static int bpf_nullhits;
 +static int bpf_donullfix = 1;
 +SYSCTL_INT(_debug, OID_AUTO, bpf_nullfix, CTLFLAG_RW,
 +	   &bpf_donullfix, 0, "Apply the BPF null BP workaround");
 +SYSCTL_INT(_debug, OID_AUTO, bpf_nullhits, CTLFLAG_RW,
 +	   &bpf_nullhits, 0, "# of bpf_mtap/2() workarounds fired");
 +
  /*
   *  bpf_iflist is the list of interfaces; each corresponds to an ifnet
   */
  static LIST_HEAD(, bpf_if)	bpf_iflist;
  static struct mtx	bpf_mtx;		/* bpf global lock */
  
  static int	bpf_allocbufs(struct bpf_d *);
  static void	bpf_attachd(struct bpf_d *d, struct bpf_if *bp);
  static void	bpf_detachd(struct bpf_d *d);
  static void	bpf_freed(struct bpf_d *);
 @@ -1201,20 +1208,31 @@
   */
  void
  bpf_mtap(bp, m)
  	struct bpf_if *bp;
  	struct mbuf *m;
  {
  	struct bpf_d *d;
  	u_int pktlen, slen;
  
  	/*
 +	 * We can sometimes be invoked w/NULL bp due to a small race in 
 +	 * BPF_MTAP(), see PR#xxxxx.
 +	 */
 +	if (bpf_donullfix) {
 +		if (!bp) {
 +			bpf_nullhits++;
 +			return;
 +		}
 +	}
 +
 +	/*
  	 * Lockless read to avoid cost of locking the interface if there are
  	 * no descriptors attached.
  	 */
  	if (LIST_EMPTY(&bp->bif_dlist))
  		return;
  
  	pktlen = m_length(m, NULL);
  	if (pktlen == m->m_len) {
  		bpf_tap(bp, mtod(m, u_char *), pktlen);
  		return;
 @@ -1245,20 +1263,31 @@
  void
  bpf_mtap2(bp, data, dlen, m)
  	struct bpf_if *bp;
  	void *data;
  	u_int dlen;
  	struct mbuf *m;
  {
  	struct mbuf mb;
  	struct bpf_d *d;
  	u_int pktlen, slen;
 +
 +	/*
 +	 * We can sometimes be invoked w/NULL bp due to a small race in 
 +	 * BPF_MTAP2(), see PR#xxxxx.
 +	 */
 +	if (bpf_donullfix) {
 +		if (!bp) {
 +			bpf_nullhits++;
 +			return;
 +		}
 +	}
  
  	/*
  	 * Lockless read to avoid cost of locking the interface if there are
  	 * no descriptors attached.
  	 */
  	if (LIST_EMPTY(&bp->bif_dlist))
  		return;
  
  	pktlen = m_length(m, NULL);
  	/*
 
 --------------050606070600030406070008--
 

