kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause
panics on SMP systems
Mark Gooderum
mark at verniernetworks.com
Thu Oct 6 22:10:18 PDT 2005
The following reply was made to PR kern/87014; it has been noted by GNATS.
From: Mark Gooderum <mark at verniernetworks.com>
To: bug-followup at FreeBSD.org, mark at verniernetworks.com
Cc:
Subject: Re: kern/87014: BPF_MTAP/bpf_mtap are not threadsafe and cause panics
on SMP systems
Date: Fri, 07 Oct 2005 00:03:12 -0500
This is a multi-part message in MIME format.
--------------050606070600030406070008
Content-Type: multipart/alternative;
boundary="------------010506050407060700010906"
--------------010506050407060700010906
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
FYI - this appears to be a duplicate of PR 73719. I did search before
but somehow missed it.
Using the attached test program (which spins opening and closing BPF
devices) I can make my system crash in a few seconds from this bug. The
test setup is basically:
  * FreeBSD system as router
      o 2 GigE interfaces
  * 4 traffic-generating systems
      o Two on one interface, running netstraind
      o Two on the second interface, running netstrain
          + Run netstrain bidirectionally (i.e. netstrain <desthost> <port> both)
      o This setup generates about 450 Mbit/sec each way (900 Mbit/sec
        aggregate)
  * Start the netstraind servers
  * Start the netstrain clients
  * Things are fine
  * Run the attached test program in full-spin mode on one of the active
    interfaces
      o bpfspin -f 100000 bge0
  * Without the fix, the system crashes 1-2 seconds after bpfspin starts
The SUT was a Tyan S2882 based Dual Opteron 248 system. The motherboard
has an Intel 8255x based 10/100 port and two Broadcom 5704 based GigE
ports onboard. It also had a pair of PCI-X Intel Dual GigE PRO/1000M
cards (Intel 8254x based). The crash was reproduced with both the bge
driver ports and the em driver interfaces.
This test must be done on a true SMP system as the race requires two
active threads - there are no other preemption points in the race
window. Not sure about timing on HTT systems - this testing was on a
true Dual Opteron system.
The attached patch fixes the problem and adds two debug sysctls:
debug.bpf_nullhits counts how often the workaround fires, and
debug.bpf_nullfix enables or disables it. With bpfspin running you can
watch the hit counter increase every second or so; disable the fix and
the system panics almost immediately.
-=-
Mark
--------------010506050407060700010906--
--------------050606070600030406070008
Content-Type: text/plain;
name="Makefile"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="Makefile"
bpfspin: bpfspin.o
	gcc -g -o bpfspin bpfspin.o -lpcap
bpfspin.o: bpfspin.c
	gcc -g -c -o bpfspin.o bpfspin.c
--------------050606070600030406070008
Content-Type: text/x-csrc;
name="bpfspin.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="bpfspin.c"
/*
* Test program to open and close a BPF a _lot_.
*/
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
#include <unistd.h>
#include "pcap.h"
#define CAP_LEN 100
const char *argv0;
const char *iname;
/* Default to something that won't match anything */
char *filter = "ip proto 199";
pcap_t *
open_bpf(const char *ifname);
void
close_bpf(pcap_t *pct);
int debug_level;
int freq = 10;
int on_sleep;
int off_sleep;
int per_cycle;
int num_cycles = -1;
volatile sig_atomic_t quit_flag;	/* set from the signal handler */
void
usage(int badopt);
void
catchsig(int signo);
int
main(int argc, char *argv[])
{
	u_int64_t npass = 0;
	const char *estr;
	int eno;
	pcap_t *pct;
	int ch;

	/* Use the basename of argv[0] in messages. */
	argv0 = strrchr(argv[0], '/');
	if (argv0 == NULL) {
		argv0 = argv[0];
	} else {
		argv0++;
	}
	signal(SIGTERM, catchsig);
	signal(SIGHUP, catchsig);
	signal(SIGQUIT, catchsig);
	signal(SIGINT, catchsig);
	/*
	 * Args...
	 */
	while ((ch = getopt(argc, argv, "df:hn:o:")) != -1) {
		switch (ch) {
		case 'd':
			debug_level++;
			break;
		case 'f':
			freq = atoi(optarg);
			break;
		case 'h':
			usage(0);
			exit(0);
		case 'n':
			num_cycles = atoi(optarg);
			break;
		case 'o':
			on_sleep = atoi(optarg);
			break;
		default:
			usage(optopt);
			exit(1);
		}
	}
	argc -= optind;
	argv += optind;
	if (argc < 1) {
		fprintf(stderr, "Error: <ifname> argument required.\n");
		usage(-1);
	}
	iname = argv[0];
	/* Convert the frequency into a per-cycle budget in microseconds. */
	if (freq) {
		per_cycle = 1000000 / freq;
		off_sleep = per_cycle;
	}
	if (on_sleep) {
		off_sleep = per_cycle - on_sleep;
	}
	while (num_cycles) {
		pct = open_bpf(iname);
		if (pct == NULL) {
			eno = errno;
			estr = strerror(eno);
			if (estr == NULL) {
				estr = "<Unknown>";
			}
			fprintf(stderr, "Error: open_bpf(%s) failed %d/%s\n",
			    iname, eno, estr);
			exit(3);
		}
		/* Both sleeps apply only with -o; otherwise spin flat out. */
		if (on_sleep) {
			usleep(on_sleep);
		}
		close_bpf(pct);
		if (on_sleep) {
			usleep(off_sleep);
		}
		/* A negative num_cycles means run until signalled. */
		if (num_cycles > 0) {
			num_cycles--;
		}
		npass++;
		if (quit_flag) {
			break;
		}
	}
	printf("Open/Closed bpf on %s %llu times.\n", iname,
	    (unsigned long long)npass);
	exit(0);
}
pcap_t *
open_bpf(const char *ifname)
{
	pcap_t *pct;
	int pfd;
	u_int one = 1;
	char ebuf[PCAP_ERRBUF_SIZE];
	struct bpf_program dfilter;
	u_int32_t network = 0, netmask = 0;

	pct = pcap_open_live(ifname, CAP_LEN, 0, 1000, ebuf);
	if (pct == NULL) {
		/* libpcap reports this failure in ebuf, not errno. */
		fprintf(stderr, "pcap_open_live failed: %s\n", ebuf);
		return (NULL);
	}
	pfd = pcap_get_selectable_fd(pct);
	if (ioctl(pfd, BIOCIMMEDIATE, &one) < 0) {
		perror("BIOCIMMEDIATE failed");
		pcap_close(pct);
		return (NULL);
	}
#if 0
	/* Must be needed? */
	if (pcap_lookupnet(ifname, &network, &netmask, 0) < 0) {
		perror("pcap_lookupnet failed");
		pcap_close(pct);
		return (NULL);
	}
#endif
	/* Compile the dummy pcap filter program. */
	bzero(&dfilter, sizeof(struct bpf_program));
	if (pcap_compile(pct, &dfilter, filter, 0, netmask) < 0) {
		fprintf(stderr, "pcap_compile failed: %s\n", pcap_geterr(pct));
		pcap_close(pct);
		return (NULL);
	}
	if (pcap_setfilter(pct, &dfilter) < 0) {
		fprintf(stderr, "pcap_setfilter failed: %s\n",
		    pcap_geterr(pct));
		pcap_freecode(&dfilter);
		pcap_close(pct);
		return (NULL);
	}
	/* The compiled program is no longer needed once it is installed. */
	pcap_freecode(&dfilter);
	return (pct);
}
void
close_bpf(pcap_t *pct)
{
	pcap_close(pct);
}
void
usage(int badopt)
{
	if (badopt > 0) {
		fprintf(stderr, "%s: Bad option [-%c]\n", argv0,
		    (char) badopt);
	}
	fprintf(stderr,
	    "Usage: %s [-dh] [-f <freq>] [-n <cycles>] [-o <usec>] <ifname>\n",
	    argv0);
	fprintf(stderr, "\t-d\tIncrease debug level by 1\n");
	fprintf(stderr, "\t-f\tSet Flap Freq to <freq>\n");
	fprintf(stderr, "\t-h\tPrint this help\n");
	fprintf(stderr, "\t-n\tStop after <cycles> open/close cycles\n");
	fprintf(stderr, "\t-o\tHold the device open for <usec> microseconds\n");
	exit(badopt != 0);
}
void
catchsig(int signo)
{
	switch (signo) {
	case SIGHUP:
	case SIGTERM:
	case SIGQUIT:
	case SIGINT:
		quit_flag = 1;
		break;
	default:
		abort();
	}
}
--------------050606070600030406070008
Content-Type: text/plain;
name="BPFMTAP.difftxt"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="BPFMTAP.difftxt"
--- /tmp/tmp.44835.0 Fri Oct 7 00:00:08 2005
+++ freebsd5/sys/net/bpf.c Thu Oct 6 16:34:36 2005
@@ -81,20 +81,27 @@
/*
* The default read buffer size is patchable.
*/
static int bpf_bufsize = 4096;
SYSCTL_INT(_debug, OID_AUTO, bpf_bufsize, CTLFLAG_RW,
&bpf_bufsize, 0, "");
static int bpf_maxbufsize = BPF_MAXBUFSIZE;
SYSCTL_INT(_debug, OID_AUTO, bpf_maxbufsize, CTLFLAG_RW,
&bpf_maxbufsize, 0, "");
+static int bpf_nullhits;
+static int bpf_donullfix = 1;
+SYSCTL_INT(_debug, OID_AUTO, bpf_nullfix, CTLFLAG_RW,
+ &bpf_donullfix, 0, "Apply the BPF null BP workaround");
+SYSCTL_INT(_debug, OID_AUTO, bpf_nullhits, CTLFLAG_RW,
+ &bpf_nullhits, 0, "# of bpf_mtap/2() workarounds fired");
+
/*
* bpf_iflist is the list of interfaces; each corresponds to an ifnet
*/
static LIST_HEAD(, bpf_if) bpf_iflist;
static struct mtx bpf_mtx; /* bpf global lock */
static int bpf_allocbufs(struct bpf_d *);
static void bpf_attachd(struct bpf_d *d, struct bpf_if *bp);
static void bpf_detachd(struct bpf_d *d);
static void bpf_freed(struct bpf_d *);
@@ -1201,20 +1208,31 @@
*/
void
bpf_mtap(bp, m)
struct bpf_if *bp;
struct mbuf *m;
{
struct bpf_d *d;
u_int pktlen, slen;
/*
+ * We can sometimes be invoked w/NULL bp due to a small race in
+ * BPF_MTAP(), see PR#xxxxx.
+ */
+ if (bpf_donullfix) {
+ if (!bp) {
+ bpf_nullhits++;
+ return;
+ }
+ }
+
+ /*
* Lockless read to avoid cost of locking the interface if there are
* no descriptors attached.
*/
if (LIST_EMPTY(&bp->bif_dlist))
return;
pktlen = m_length(m, NULL);
if (pktlen == m->m_len) {
bpf_tap(bp, mtod(m, u_char *), pktlen);
return;
@@ -1245,20 +1263,31 @@
void
bpf_mtap2(bp, data, dlen, m)
struct bpf_if *bp;
void *data;
u_int dlen;
struct mbuf *m;
{
struct mbuf mb;
struct bpf_d *d;
u_int pktlen, slen;
+
+ /*
+ * We can sometimes be invoked w/NULL bp due to a small race in
+ * BPF_MTAP2(), see PR#xxxxx.
+ */
+ if (bpf_donullfix) {
+ if (!bp) {
+ bpf_nullhits++;
+ return;
+ }
+ }
/*
* Lockless read to avoid cost of locking the interface if there are
* no descriptors attached.
*/
if (LIST_EMPTY(&bp->bif_dlist))
return;
pktlen = m_length(m, NULL);
/*
--------------050606070600030406070008--