svn commit: r227614 - in head: share/man/man4 sys/dev/netmap sys/net tools/tools tools/tools/netmap

Luigi Rizzo luigi at FreeBSD.org
Thu Nov 17 12:17:40 UTC 2011


Author: luigi
Date: Thu Nov 17 12:17:39 2011
New Revision: 227614
URL: http://svn.freebsd.org/changeset/base/227614

Log:
  Bring in support for netmap, a framework for very efficient packet
  I/O from userspace, capable of line rate at 10G, see
  
  	http://info.iet.unipi.it/~luigi/netmap/
  
  At this time I am bringing in only the generic code (sys/dev/netmap/
  plus two headers under sys/net/), and some sample applications in
  tools/tools/netmap. There is also a manpage in share/man/man4 [1].
  
  In order to make use of the framework you need to build a kernel
  with "device netmap", and patch individual drivers with the code
  that you can find in
  
  	sys/dev/netmap/head.diff
  
  The file will go away as the relevant pieces are committed to
  the various device drivers, which should happen in a few days
  after talking to the driver maintainers.
  
  Netmap support is available at the moment for Intel 10G and 1G
  cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
  I have partial patches for "bge" and am starting to work on "cxgbe".
  Hopefully the changes are trivial enough that interested third parties
  can submit their own patches. Interested people can contact me
  for advice on how to add netmap support to specific devices.
  
  CREDITS:
      Netmap has been developed by Luigi Rizzo and other collaborators
      at the Universita` di Pisa, and supported by EU project CHANGE
      (http://www.change-project.eu/)
      The code is distributed under a BSD license.
  
  [1] In my opinion it is a bad idea to have all manpages in one directory.
    We should place kernel documentation in the same dir that contains
    the code, which would make it much simpler to keep doc and code
    in sync, reduce the clutter in share/man/, and incidentally is
    the policy used for all userspace code.
    Makefiles and doc tools can be trivially adjusted to find the
    manpages in the relevant subdirs.

Added:
  head/share/man/man4/netmap.4   (contents, props changed)
  head/sys/dev/netmap/
  head/sys/dev/netmap/head.diff   (contents, props changed)
  head/sys/dev/netmap/if_em_netmap.h   (contents, props changed)
  head/sys/dev/netmap/if_igb_netmap.h   (contents, props changed)
  head/sys/dev/netmap/if_lem_netmap.h   (contents, props changed)
  head/sys/dev/netmap/if_re_netmap.h   (contents, props changed)
  head/sys/dev/netmap/ixgbe_netmap.h   (contents, props changed)
  head/sys/dev/netmap/netmap.c   (contents, props changed)
  head/sys/dev/netmap/netmap_kern.h   (contents, props changed)
  head/sys/net/netmap.h   (contents, props changed)
  head/sys/net/netmap_user.h   (contents, props changed)
  head/tools/tools/netmap/
  head/tools/tools/netmap/Makefile   (contents, props changed)
  head/tools/tools/netmap/README   (contents, props changed)
  head/tools/tools/netmap/bridge.c   (contents, props changed)
  head/tools/tools/netmap/click-test.cfg   (contents, props changed)
  head/tools/tools/netmap/pcap.c   (contents, props changed)
  head/tools/tools/netmap/pkt-gen.c   (contents, props changed)
Modified:
  head/share/man/man4/Makefile
  head/tools/tools/README

Modified: head/share/man/man4/Makefile
==============================================================================
--- head/share/man/man4/Makefile	Thu Nov 17 12:08:12 2011	(r227613)
+++ head/share/man/man4/Makefile	Thu Nov 17 12:17:39 2011	(r227614)
@@ -253,6 +253,7 @@ MAN=	aac.4 \
 	net80211.4 \
 	netgraph.4 \
 	netintro.4 \
+	netmap.4 \
 	${_nfe.4} \
 	${_nfsmb.4} \
 	ng_async.4 \

Added: head/share/man/man4/netmap.4
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/share/man/man4/netmap.4	Thu Nov 17 12:17:39 2011	(r227614)
@@ -0,0 +1,300 @@
+.\" Copyright (c) 2011 Matteo Landi, Luigi Rizzo, Universita` di Pisa
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\" 
+.\" This document is derived in part from the enet man page (enet.4)
+.\" distributed with 4.3BSD Unix.
+.\"
+.\" $FreeBSD$
+.\" $Id: netmap.4 9662 2011-11-16 13:18:06Z luigi $: stable/8/share/man/man4/bpf.4 181694 2008-08-13 17:45:06Z ed $
+.\"
+.Dd November 16, 2011
+.Dt NETMAP 4
+.Os
+.Sh NAME
+.Nm netmap
+.Nd a framework for fast packet I/O
+.Sh SYNOPSIS
+.Cd device netmap
+.Sh DESCRIPTION
+.Nm
+is a framework for fast and safe access to network devices
+(reaching 14.88 Mpps at less than 1 GHz).
+.Nm
+uses memory mapped buffers and metadata
+(buffer indexes and lengths) to communicate with the kernel,
+which is in charge of validating information through 
+.Pa ioctl()
+and
+.Pa select()/poll() .
+.Nm
+can exploit the parallelism in multiqueue devices and
+multicore systems.
+.Pp
+.Pp
+.Nm
+requires explicit support in device drivers.
+For a list of supported devices, see the end of this manual page.
+.Sh OPERATION
+.Nm
+clients must first call
+.Pa open("/dev/netmap") ,
+and then issue an
+.Pa ioctl(...,NIOCREGIF,...)
+to bind the file descriptor to a network device.
+.Pp
+When a device is put in
+.Nm
+mode, its data path is disconnected from the host stack.
+The processes owning the file descriptor 
+can exchange packets with the device, or with the host stack,
+through an mmapped memory region that contains pre-allocated
+buffers and metadata.
+.Pp
+Non-blocking I/O is done with special
+.Pa ioctl()'s ,
+whereas the file descriptor can be passed to
+.Pa select()/poll()
+to be notified about incoming packets or available transmit buffers.
+.Ss Data structures
+All data structures for all devices in
+.Nm
+mode are in a memory
+region shared by the kernel and all processes
+that open
+.Pa /dev/netmap
+(NOTE: visibility may be restricted in future implementations).
+All references between the shared data structures
+are relative (offsets or indexes). Some macros help convert
+them into actual pointers.
+.Pp
+The data structures in shared memory are the following:
+.Pp
+.Bl -tag -width XXX
+.It Dv struct netmap_if (one per interface)
+indicates the number of rings supported by an interface, their
+sizes, and the offsets of the
+.Pa netmap_rings
+associated with the interface.
+The offset of a
+.Pa struct netmap_if
+in the shared memory region is indicated by the
+.Pa nr_offset
+field in the structure returned by the
+.Pa NIOCREGIF
+ioctl (see below).
+.Bd -literal
+struct netmap_if {
+    char ni_name[IFNAMSIZ]; /* name of the interface. */
+    const u_int ni_num_queues; /* number of hw ring pairs */
+    const ssize_t   ring_ofs[]; /* offset of tx and rx rings */
+};
+.Ed
+.It Dv struct netmap_ring (one per ring)
+contains the index of the current read or write slot (cur),
+the number of slots available for reception or transmission (avail),
+and an array of
+.Pa slots
+describing the buffers.
+There is one ring pair for each of the N hardware ring pairs
+supported by the card (numbered 0..N-1), plus
+one ring pair (numbered N) for packets from/to the host stack.
+.Bd -literal
+struct netmap_ring {
+    const ssize_t buf_ofs;
+    const uint32_t num_slots; /* number of slots in the ring. */
+    uint32_t avail; /* number of usable slots */
+    uint32_t cur; /* 'current' index for the user side */
+
+    const uint16_t nr_buf_size;
+    uint16_t flags;
+    struct netmap_slot slot[0]; /* array of slots. */
+};
+.Ed
+.It Dv struct netmap_slot (one per packet)
+contains the metadata for a packet: a buffer index (buf_idx),
+a buffer length (len), and some flags.
+.Bd -literal
+struct netmap_slot {
+    uint32_t buf_idx; /* buffer index */
+    uint16_t len;   /* packet length */
+    uint16_t flags; /* buf changed, etc. */
+#define NS_BUF_CHANGED  0x0001  /* must resync, buffer changed */
+#define NS_REPORT       0x0002  /* tell hw to report results
+                                 * e.g. by generating an interrupt
+                                 */
+};
+.Ed
+.It Dv packet buffers
+are fixed size (approximately 2k) buffers allocated by the kernel
+that contain packet data. Buffer addresses are computed through
+macros.
+.El
+.Pp
+Some macros support the access to objects in the shared memory
+region. In particular:
+.Bd -literal
+struct netmap_if *nifp;
+...
+struct netmap_ring *txring = NETMAP_TXRING(nifp, i);
+struct netmap_ring *rxring = NETMAP_RXRING(nifp, i);
+uint32_t idx = txring->slot[txring->cur].buf_idx;
+char *buf = NETMAP_BUF(txring, idx);
+.Ed
+.Ss IOCTLS
+.Pp
+.Nm
+supports some ioctls to synchronize the state of the rings
+between the kernel and the user processes, plus some
+to query and configure the interface.
+The former do not require any argument, whereas the latter
+use a
+.Pa struct nmreq
+defined as follows:
+.Bd -literal
+struct nmreq {
+        char      nr_name[IFNAMSIZ];
+        uint32_t  nr_offset;      /* nifp offset in the shared region */
+        uint32_t  nr_memsize;     /* size of the shared region */
+        uint32_t  nr_numdescs;    /* descriptors per queue */
+        uint16_t  nr_numqueues;
+        uint16_t  nr_ringid;      /* ring(s) we care about */
+#define NETMAP_HW_RING  0x4000    /* low bits indicate one hw ring */
+#define NETMAP_SW_RING  0x2000    /* we process the sw ring */
+#define NETMAP_NO_TX_POLL 0x1000  /* no gratuitous txsync on poll */
+#define NETMAP_RING_MASK 0xfff    /* the actual ring number */
+};
+
+.Ed
+A device descriptor obtained through
+.Pa /dev/netmap
+also supports the ioctls supported by network devices.
+.Pp
+The netmap-specific
+.Xr ioctl 2
+command codes below are defined in
+.In net/netmap.h
+and are:
+.Bl -tag -width XXXX
+.It Dv NIOCGINFO
+returns information about the interface named in nr_name.
+On return, nr_memsize indicates the size of the shared netmap
+memory region (this is device-independent),
+nr_numdescs indicates how many buffers are in a ring,
+nr_numqueues indicates the number of rings supported by the hardware.
+.Pp
+If the device does not support netmap, the ioctl returns EINVAL.
+.It Dv NIOCREGIF
+puts the interface named in nr_name into netmap mode, disconnecting
+it from the host stack, and/or defines which rings are controlled
+through this file descriptor.
+On return, it gives the same info as NIOCGINFO, and nr_ringid
+indicates the identity of the rings controlled through the file
+descriptor.
+.Pp
+Possible values for nr_ringid are
+.Bl -tag -width XXXXX
+.It 0
+default, all hardware rings
+.It NETMAP_SW_RING
+the ``host rings'' connecting to the host stack
+.It NETMAP_HW_RING + i
+the i-th hardware ring
+.El
+By default, a
+.Nm poll
+or
+.Nm select
+call pushes out any pending packets on the transmit ring, even if
+no write events are specified.
+The feature can be disabled by or-ing
+.Dv NETMAP_NO_TX_POLL
+to nr_ringid.
+But normally you should keep this feature unless you are using
+separate file descriptors for the send and receive rings, because
+otherwise packets are pushed out only if NIOCTXSYNC is issued,
+or the send queue is full.
+.Pp
+.Pa NIOCREGIF
+can be used multiple times to change the association of a
+file descriptor to a ring pair, always within the same device.
+.It Dv NIOCUNREGIF
+brings an interface back to normal mode.
+.It Dv NIOCTXSYNC
+tells the hardware of new packets to transmit, and updates the
+number of slots available for transmission.
+.It Dv NIOCRXSYNC
+tells the hardware of consumed packets, and asks for newly available
+packets.
+.El
+.Ss SYSTEM CALLS
+.Nm
+uses
+.Nm select
+and
+.Nm poll
+to wake up processes when significant events occur.
+.Sh EXAMPLES
+The following code implements a traffic generator
+.Pp
+.Bd -literal -compact
+#include <net/netmap.h>
+#include <net/netmap_user.h>
+struct netmap_if *nifp;
+struct netmap_ring *ring;
+struct nmreq nmr;
+
+fd = open("/dev/netmap", O_RDWR);
+bzero(&nmr, sizeof(nmr));
+strcpy(nmr.nr_name, "ix0");
+ioctl(fd, NIOCREGIF, &nmr);
+p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+nifp = NETMAP_IF(p, nmr.nr_offset);
+ring = NETMAP_TXRING(nifp, 0);
+fds.fd = fd;
+fds.events = POLLOUT;
+for (;;) {
+    poll(&fds, 1, -1);
+    for (; ring->avail > 0; ring->avail--) {
+        i = ring->cur;
+        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
+        ... prepare packet in buf ...
+        ring->slot[i].len = ... packet length ...
+        ring->cur = NETMAP_RING_NEXT(ring, i);
+    }
+}
+.Ed
+.Sh SUPPORTED INTERFACES
+.Nm
+supports the following interfaces:
+.Xr em 4 ,
+.Xr ixgbe 4 ,
+.Xr re 4 .
+.Sh AUTHORS
+The
+.Nm
+framework has been designed and implemented by
+.An Luigi Rizzo
+and
+.An Matteo Landi
+in 2011 at the Universita` di Pisa.
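
The ring bookkeeping described in the manual page above (a `cur` index, an `avail` count, and wrap-around at the end of the ring) can be tried out without a netmap-capable kernel. What follows is only a user-space sketch under stated assumptions: `struct demo_ring`, `demo_ring_next()` and `demo_ring_fill()` are hypothetical names invented for illustration. They mirror the field names shown in the manual page but are not part of netmap, and the real `NETMAP_RING_NEXT` macro may be implemented differently.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_SLOTS 8    /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for the fields of struct netmap_ring used below. */
struct demo_ring {
    uint32_t num_slots;             /* slots in use, <= DEMO_MAX_SLOTS */
    uint32_t avail;                 /* slots the user may still fill */
    uint32_t cur;                   /* next slot the user will fill */
    uint16_t len[DEMO_MAX_SLOTS];   /* stand-in for slot[i].len */
};

/* Index following i, wrapping to 0 at the end of the ring. */
static uint32_t
demo_ring_next(const struct demo_ring *r, uint32_t i)
{
    assert(i < r->num_slots);
    return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/*
 * Queue up to `want` packets of `pktlen` bytes, stopping early when no
 * slots are available. Returns the number of slots actually filled.
 */
static uint32_t
demo_ring_fill(struct demo_ring *r, uint32_t want, uint16_t pktlen)
{
    uint32_t done = 0;

    assert(r->num_slots <= DEMO_MAX_SLOTS);
    while (done < want && r->avail > 0) {
        r->len[r->cur] = pktlen;
        r->cur = demo_ring_next(r, r->cur);
        r->avail--;     /* decrement after the test: avail is unsigned */
        done++;
    }
    return done;
}
```

With a 4-slot ring starting at `cur = 2` and `avail = 3`, a request for 5 packets fills slots 2, 3 and 0, returns 3, and leaves `cur` at 1 with `avail` at 0. Testing `avail > 0` before decrementing keeps the unsigned counter from wrapping around when the ring is full.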

Added: head/sys/dev/netmap/head.diff
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/sys/dev/netmap/head.diff	Thu Nov 17 12:17:39 2011	(r227614)
@@ -0,0 +1,654 @@
+Index: conf/NOTES
+===================================================================
+--- conf/NOTES	(revision 227552)
++++ conf/NOTES	(working copy)
+@@ -799,6 +799,12 @@
+ #  option.  DHCP requires bpf.
+ device		bpf
+ 
++#  The `netmap' device implements memory-mapped access to network
++#  devices from userspace, enabling wire-speed packet capture and
++#  generation even at 10Gbit/s. Requires support in the device
++#  driver. Supported drivers are ixgbe, e1000, re.
++device		netmap
++
+ #  The `disc' device implements a minimal network interface,
+ #  which throws away all packets sent and never receives any.  It is
+ #  included for testing and benchmarking purposes.
+Index: conf/files
+===================================================================
+--- conf/files	(revision 227552)
++++ conf/files	(working copy)
+@@ -1507,6 +1507,7 @@
+ dev/my/if_my.c			optional my
+ dev/ncv/ncr53c500.c		optional ncv
+ dev/ncv/ncr53c500_pccard.c	optional ncv pccard
++dev/netmap/netmap.c		optional netmap
+ dev/nge/if_nge.c		optional nge
+ dev/nxge/if_nxge.c		optional nxge
+ dev/nxge/xgehal/xgehal-device.c	optional nxge
+Index: conf/options
+===================================================================
+--- conf/options	(revision 227552)
++++ conf/options	(working copy)
+@@ -689,6 +689,7 @@
+ 
+ # various 'device presence' options.
+ DEV_BPF			opt_bpf.h
++DEV_NETMAP		opt_global.h
+ DEV_MCA			opt_mca.h
+ DEV_CARP		opt_carp.h
+ DEV_SPLASH		opt_splash.h
+Index: dev/e1000/if_igb.c
+===================================================================
+--- dev/e1000/if_igb.c	(revision 227552)
++++ dev/e1000/if_igb.c	(working copy)
+@@ -369,6 +369,9 @@
+     &igb_rx_process_limit, 0,
+     "Maximum number of received packets to process at a time, -1 means unlimited");
+ 
++#ifdef DEV_NETMAP
++#include <dev/netmap/if_igb_netmap.h>
++#endif /* DEV_NETMAP */
+ /*********************************************************************
+  *  Device identification routine
+  *
+@@ -664,6 +667,9 @@
+ 	adapter->led_dev = led_create(igb_led_func, adapter,
+ 	    device_get_nameunit(dev));
+ 
++#ifdef DEV_NETMAP
++	igb_netmap_attach(adapter);
++#endif /* DEV_NETMAP */
+ 	INIT_DEBUGOUT("igb_attach: end");
+ 
+ 	return (0);
+@@ -742,6 +748,9 @@
+ 
+ 	callout_drain(&adapter->timer);
+ 
++#ifdef DEV_NETMAP
++	netmap_detach(adapter->ifp);
++#endif /* DEV_NETMAP */
+ 	igb_free_pci_resources(adapter);
+ 	bus_generic_detach(dev);
+ 	if_free(ifp);
+@@ -3212,6 +3221,10 @@
+ 	struct adapter *adapter = txr->adapter;
+ 	struct igb_tx_buffer *txbuf;
+ 	int i;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
++		NR_TX, txr->me, 0);
++#endif
+ 
+ 	/* Clear the old descriptor contents */
+ 	IGB_TX_LOCK(txr);
+@@ -3231,6 +3244,13 @@
+ 			m_freem(txbuf->m_head);
+ 			txbuf->m_head = NULL;
+ 		}
++#ifdef DEV_NETMAP
++		if (slot) {
++			netmap_load_map(txr->txtag, txbuf->map,
++				NMB(slot), adapter->rx_mbuf_sz);
++			slot++;
++		}
++#endif /* DEV_NETMAP */
+ 		/* clear the watch index */
+ 		txbuf->next_eop = -1;
+         }
+@@ -3626,6 +3646,19 @@
+ 
+ 	IGB_TX_LOCK_ASSERT(txr);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		struct netmap_adapter *na = NA(ifp);
++
++		selwakeuppri(&na->tx_rings[txr->me].si, PI_NET);
++		IGB_TX_UNLOCK(txr);
++		IGB_CORE_LOCK(adapter);
++		selwakeuppri(&na->tx_rings[na->num_queues + 1].si, PI_NET);
++		IGB_CORE_UNLOCK(adapter);
++		IGB_TX_LOCK(txr); // the caller is supposed to own the lock
++		return FALSE;
++	}
++#endif /* DEV_NETMAP */
+         if (txr->tx_avail == adapter->num_tx_desc) {
+ 		txr->queue_status = IGB_QUEUE_IDLE;
+                 return FALSE;
+@@ -3949,6 +3982,10 @@
+ 	bus_dma_segment_t	pseg[1], hseg[1];
+ 	struct lro_ctrl		*lro = &rxr->lro;
+ 	int			rsize, nsegs, error = 0;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(rxr->adapter->ifp),
++				NR_RX, rxr->me, 0);
++#endif
+ 
+ 	adapter = rxr->adapter;
+ 	dev = adapter->dev;
+@@ -3974,6 +4011,18 @@
+ 		struct mbuf	*mh, *mp;
+ 
+ 		rxbuf = &rxr->rx_buffers[j];
++#ifdef DEV_NETMAP
++		if (slot) {
++			netmap_load_map(rxr->ptag,
++					rxbuf->pmap, NMB(slot),
++					adapter->rx_mbuf_sz);
++			/* Update descriptor */
++			rxr->rx_base[j].read.pkt_addr =
++				htole64(vtophys(NMB(slot)));
++			slot++;
++			continue;
++		}
++#endif /* DEV_NETMAP */
+ 		if (rxr->hdr_split == FALSE)
+ 			goto skip_head;
+ 
+@@ -4436,6 +4485,19 @@
+ 	bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
+ 	    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		struct netmap_adapter *na = NA(ifp);
++
++		selwakeuppri(&na->rx_rings[rxr->me].si, PI_NET);
++		IGB_RX_UNLOCK(rxr);
++		IGB_CORE_LOCK(adapter);
++		selwakeuppri(&na->rx_rings[na->num_queues + 1].si, PI_NET);
++		IGB_CORE_UNLOCK(adapter);
++		return (0);
++	}
++#endif /* DEV_NETMAP */
++
+ 	/* Main clean loop */
+ 	for (i = rxr->next_to_check; count != 0;) {
+ 		struct mbuf		*sendmp, *mh, *mp;
+Index: dev/e1000/if_lem.c
+===================================================================
+--- dev/e1000/if_lem.c	(revision 227552)
++++ dev/e1000/if_lem.c	(working copy)
+@@ -316,6 +316,10 @@
+ /* Global used in WOL setup with multiport cards */
+ static int global_quad_port_a = 0;
+ 
++#ifdef DEV_NETMAP
++#include <dev/netmap/if_lem_netmap.h>
++#endif /* DEV_NETMAP */
++
+ /*********************************************************************
+  *  Device identification routine
+  *
+@@ -646,6 +650,9 @@
+ 	adapter->led_dev = led_create(lem_led_func, adapter,
+ 	    device_get_nameunit(dev));
+ 
++#ifdef DEV_NETMAP
++	lem_netmap_attach(adapter);
++#endif /* DEV_NETMAP */
+ 	INIT_DEBUGOUT("lem_attach: end");
+ 
+ 	return (0);
+@@ -724,6 +731,9 @@
+ 	callout_drain(&adapter->timer);
+ 	callout_drain(&adapter->tx_fifo_timer);
+ 
++#ifdef DEV_NETMAP
++	netmap_detach(ifp);
++#endif /* DEV_NETMAP */
+ 	lem_free_pci_resources(adapter);
+ 	bus_generic_detach(dev);
+ 	if_free(ifp);
+@@ -2637,6 +2647,9 @@
+ lem_setup_transmit_structures(struct adapter *adapter)
+ {
+ 	struct em_buffer *tx_buffer;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp), NR_TX, 0, 0);
++#endif
+ 
+ 	/* Clear the old ring contents */
+ 	bzero(adapter->tx_desc_base,
+@@ -2650,6 +2663,15 @@
+ 		bus_dmamap_unload(adapter->txtag, tx_buffer->map);
+ 		m_freem(tx_buffer->m_head);
+ 		tx_buffer->m_head = NULL;
++#ifdef DEV_NETMAP
++		if (slot) {
++			/* reload the map for netmap mode */
++			netmap_load_map(adapter->txtag,
++				tx_buffer->map, NMB(slot),
++				NA(adapter->ifp)->buff_size);
++			slot++;
++		}
++#endif /* DEV_NETMAP */
+ 		tx_buffer->next_eop = -1;
+ 	}
+ 
+@@ -2951,6 +2973,12 @@
+ 
+ 	EM_TX_LOCK_ASSERT(adapter);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->tx_rings[0].si, PI_NET);
++		return;
++	}
++#endif /* DEV_NETMAP */
+         if (adapter->num_tx_desc_avail == adapter->num_tx_desc)
+                 return;
+ 
+@@ -3181,6 +3209,9 @@
+ {
+ 	struct em_buffer *rx_buffer;
+ 	int i, error;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp), NR_RX, 0, 0);
++#endif
+ 
+ 	/* Reset descriptor ring */
+ 	bzero(adapter->rx_desc_base,
+@@ -3200,6 +3231,18 @@
+ 
+ 	/* Allocate new ones. */
+ 	for (i = 0; i < adapter->num_rx_desc; i++) {
++#ifdef DEV_NETMAP
++		if (slot) {
++			netmap_load_map(adapter->rxtag,
++				rx_buffer->map, NMB(slot),
++				NA(adapter->ifp)->buff_size);
++			/* Update descriptor */
++			adapter->rx_desc_base[i].buffer_addr =
++				htole64(vtophys(NMB(slot)));
++			slot++;
++			continue;
++		}
++#endif /* DEV_NETMAP */
+ 		error = lem_get_buf(adapter, i);
+ 		if (error)
+                         return (error);
+@@ -3407,6 +3450,14 @@
+ 	bus_dmamap_sync(adapter->rxdma.dma_tag, adapter->rxdma.dma_map,
+ 	    BUS_DMASYNC_POSTREAD);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->rx_rings[0].si, PI_NET);
++		EM_RX_UNLOCK(adapter);
++		return (0);
++	}
++#endif /* DEV_NETMAP */
++
+ 	if (!((current_desc->status) & E1000_RXD_STAT_DD)) {
+ 		if (done != NULL)
+ 			*done = rx_sent;
+Index: dev/e1000/if_em.c
+===================================================================
+--- dev/e1000/if_em.c	(revision 227552)
++++ dev/e1000/if_em.c	(working copy)
+@@ -399,6 +399,10 @@
+ /* Global used in WOL setup with multiport cards */
+ static int global_quad_port_a = 0;
+ 
++#ifdef DEV_NETMAP
++#include <dev/netmap/if_em_netmap.h>
++#endif /* DEV_NETMAP */
++
+ /*********************************************************************
+  *  Device identification routine
+  *
+@@ -714,6 +718,9 @@
+ 
+ 	adapter->led_dev = led_create(em_led_func, adapter,
+ 	    device_get_nameunit(dev));
++#ifdef DEV_NETMAP
++	em_netmap_attach(adapter);
++#endif /* DEV_NETMAP */
+ 
+ 	INIT_DEBUGOUT("em_attach: end");
+ 
+@@ -785,6 +792,10 @@
+ 	ether_ifdetach(adapter->ifp);
+ 	callout_drain(&adapter->timer);
+ 
++#ifdef DEV_NETMAP
++	netmap_detach(ifp);
++#endif /* DEV_NETMAP */
++
+ 	em_free_pci_resources(adapter);
+ 	bus_generic_detach(dev);
+ 	if_free(ifp);
+@@ -3213,6 +3224,10 @@
+ 	struct adapter *adapter = txr->adapter;
+ 	struct em_buffer *txbuf;
+ 	int i;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
++		NR_TX, txr->me, 0);
++#endif
+ 
+ 	/* Clear the old descriptor contents */
+ 	EM_TX_LOCK(txr);
+@@ -3232,6 +3247,16 @@
+ 			m_freem(txbuf->m_head);
+ 			txbuf->m_head = NULL;
+ 		}
++#ifdef DEV_NETMAP
++		if (slot) {
++			/* reload the map for netmap mode */
++			netmap_load_map(txr->txtag,
++					txbuf->map, NMB(slot),
++					adapter->rx_mbuf_sz);
++			slot++;
++		}
++#endif /* DEV_NETMAP */
++
+ 		/* clear the watch index */
+ 		txbuf->next_eop = -1;
+         }
+@@ -3682,6 +3707,12 @@
+ 	struct ifnet   *ifp = adapter->ifp;
+ 
+ 	EM_TX_LOCK_ASSERT(txr);
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->tx_rings[txr->me].si, PI_NET);
++		return (FALSE);
++	}
++#endif /* DEV_NETMAP */
+ 
+ 	/* No work, make sure watchdog is off */
+         if (txr->tx_avail == adapter->num_tx_desc) {
+@@ -3978,6 +4009,33 @@
+ 		if (++j == adapter->num_rx_desc)
+ 			j = 0;
+ 	}
++#ifdef DEV_NETMAP
++    {
++	/* slot is NULL if we are not in netmap mode */
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
++		NR_RX, rxr->me, rxr->next_to_check);
++	/*
++	 * we need to restore all buffer addresses in the ring as they might
++	 * be in the wrong state if we are exiting from netmap mode.
++	 */
++	for (j = 0; j != adapter->num_rx_desc; ++j) {
++		void *addr;
++		rxbuf = &rxr->rx_buffers[j];
++		if (rxbuf->m_head == NULL && !slot)
++			continue;
++		addr = slot ? NMB(slot) : rxbuf->m_head->m_data;
++		// XXX load or reload ?
++		netmap_load_map(rxr->rxtag, rxbuf->map, addr, adapter->rx_mbuf_sz);
++		/* Update descriptor */
++		rxr->rx_base[j].buffer_addr = htole64(vtophys(addr));
++		bus_dmamap_sync(rxr->rxtag, rxbuf->map, BUS_DMASYNC_PREREAD);
++		if (slot)
++			slot++;
++	}
++	/* Setup our descriptor indices */
++	NA(adapter->ifp)->rx_rings[rxr->me].nr_hwcur = rxr->next_to_check;
++    }
++#endif /* DEV_NETMAP */
+ 
+ fail:
+ 	rxr->next_to_refresh = i;
+@@ -4247,6 +4305,14 @@
+ 
+ 	EM_RX_LOCK(rxr);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->rx_rings[rxr->me].si, PI_NET);
++		EM_RX_UNLOCK(rxr);
++		return (0);
++	}
++#endif /* DEV_NETMAP */
++
+ 	for (i = rxr->next_to_check, processed = 0; count != 0;) {
+ 
+ 		if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
+Index: dev/re/if_re.c
+===================================================================
+--- dev/re/if_re.c	(revision 227552)
++++ dev/re/if_re.c	(working copy)
+@@ -291,6 +291,10 @@
+ static void re_setwol		(struct rl_softc *);
+ static void re_clrwol		(struct rl_softc *);
+ 
++#ifdef DEV_NETMAP
++#include <dev/netmap/if_re_netmap.h>
++#endif /* DEV_NETMAP */
++
+ #ifdef RE_DIAG
+ static int re_diag		(struct rl_softc *);
+ #endif
+@@ -1583,6 +1587,9 @@
+ 	 */
+ 	ifp->if_data.ifi_hdrlen = sizeof(struct ether_vlan_header);
+ 
++#ifdef DEV_NETMAP
++	re_netmap_attach(sc);
++#endif /* DEV_NETMAP */
+ #ifdef RE_DIAG
+ 	/*
+ 	 * Perform hardware diagnostic on the original RTL8169.
+@@ -1778,6 +1785,9 @@
+ 		bus_dma_tag_destroy(sc->rl_ldata.rl_stag);
+ 	}
+ 
++#ifdef DEV_NETMAP
++	netmap_detach(ifp);
++#endif /* DEV_NETMAP */
+ 	if (sc->rl_parent_tag)
+ 		bus_dma_tag_destroy(sc->rl_parent_tag);
+ 
+@@ -1952,6 +1962,9 @@
+ 	    sc->rl_ldata.rl_tx_desc_cnt * sizeof(struct rl_desc));
+ 	for (i = 0; i < sc->rl_ldata.rl_tx_desc_cnt; i++)
+ 		sc->rl_ldata.rl_tx_desc[i].tx_m = NULL;
++#ifdef DEV_NETMAP
++	re_netmap_tx_init(sc);
++#endif /* DEV_NETMAP */
+ 	/* Set EOR. */
+ 	desc = &sc->rl_ldata.rl_tx_list[sc->rl_ldata.rl_tx_desc_cnt - 1];
+ 	desc->rl_cmdstat |= htole32(RL_TDESC_CMD_EOR);
+@@ -1979,6 +1992,9 @@
+ 		if ((error = re_newbuf(sc, i)) != 0)
+ 			return (error);
+ 	}
++#ifdef DEV_NETMAP
++	re_netmap_rx_init(sc);
++#endif /* DEV_NETMAP */
+ 
+ 	/* Flush the RX descriptors */
+ 
+@@ -2035,6 +2051,12 @@
+ 	RL_LOCK_ASSERT(sc);
+ 
+ 	ifp = sc->rl_ifp;
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->rx_rings->si, PI_NET);
++		return 0;
++	}
++#endif /* DEV_NETMAP */
+ 	if (ifp->if_mtu > RL_MTU && (sc->rl_flags & RL_FLAG_JUMBOV2) != 0)
+ 		jumbo = 1;
+ 	else
+@@ -2276,6 +2298,12 @@
+ 		return;
+ 
+ 	ifp = sc->rl_ifp;
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		selwakeuppri(&NA(ifp)->tx_rings[0].si, PI_NET);
++		return;
++	}
++#endif /* DEV_NETMAP */
+ 	/* Invalidate the TX descriptor list */
+ 	bus_dmamap_sync(sc->rl_ldata.rl_tx_list_tag,
+ 	    sc->rl_ldata.rl_tx_list_map,
+@@ -2794,6 +2822,20 @@
+ 
+ 	sc = ifp->if_softc;
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		struct netmap_kring *kring = &NA(ifp)->tx_rings[0];
++		if (sc->rl_ldata.rl_tx_prodidx != kring->nr_hwcur) {
++			/* kick the tx unit */
++			CSR_WRITE_1(sc, sc->rl_txstart, RL_TXSTART_START);
++#ifdef RE_TX_MODERATION
++			CSR_WRITE_4(sc, RL_TIMERCNT, 1);
++#endif
++			sc->rl_watchdog_timer = 5;
++		}
++		return;
++	}
++#endif /* DEV_NETMAP */
+ 	if ((ifp->if_drv_flags & (IFF_DRV_RUNNING | IFF_DRV_OACTIVE)) !=
+ 	    IFF_DRV_RUNNING || (sc->rl_flags & RL_FLAG_LINK) == 0)
+ 		return;
+Index: dev/ixgbe/ixgbe.c
+===================================================================
+--- dev/ixgbe/ixgbe.c	(revision 227552)
++++ dev/ixgbe/ixgbe.c	(working copy)
+@@ -313,6 +313,10 @@
+ static int fdir_pballoc = 1;
+ #endif
+ 
++#ifdef DEV_NETMAP
++#include <dev/netmap/ixgbe_netmap.h>
++#endif /* DEV_NETMAP */
++
+ /*********************************************************************
+  *  Device identification routine
+  *
+@@ -578,6 +582,9 @@
+ 
+ 	ixgbe_add_hw_stats(adapter);
+ 
++#ifdef DEV_NETMAP
++	ixgbe_netmap_attach(adapter);
++#endif /* DEV_NETMAP */
+ 	INIT_DEBUGOUT("ixgbe_attach: end");
+ 	return (0);
+ err_late:
+@@ -652,6 +659,9 @@
+ 
+ 	ether_ifdetach(adapter->ifp);
+ 	callout_drain(&adapter->timer);
++#ifdef DEV_NETMAP
++	netmap_detach(adapter->ifp);
++#endif /* DEV_NETMAP */
+ 	ixgbe_free_pci_resources(adapter);
+ 	bus_generic_detach(dev);
+ 	if_free(adapter->ifp);
+@@ -1719,6 +1729,7 @@
+ 		if (++i == adapter->num_tx_desc)
+ 			i = 0;
+ 
++		// XXX should we sync each buffer ?
+ 		txbuf->m_head = NULL;
+ 		txbuf->eop_index = -1;
+ 	}
+@@ -2813,6 +2824,10 @@
+ 	struct adapter *adapter = txr->adapter;
+ 	struct ixgbe_tx_buf *txbuf;
+ 	int i;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(adapter->ifp),
++		NR_TX, txr->me, 0);
++#endif
+ 
+ 	/* Clear the old ring contents */
+ 	IXGBE_TX_LOCK(txr);
+@@ -2832,6 +2847,13 @@
+ 			m_freem(txbuf->m_head);
+ 			txbuf->m_head = NULL;
+ 		}
++#ifdef DEV_NETMAP
++		if (slot) {
++			netmap_load_map(txr->txtag, txbuf->map,
++				NMB(slot), adapter->rx_mbuf_sz);
++			slot++;
++		}
++#endif /* DEV_NETMAP */
+ 		/* Clear the EOP index */
+ 		txbuf->eop_index = -1;
+         }
+@@ -3310,6 +3332,20 @@
+ 
+ 	mtx_assert(&txr->tx_mtx, MA_OWNED);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		struct netmap_adapter *na = NA(ifp);
++
++		selwakeuppri(&na->tx_rings[txr->me].si, PI_NET);
++		IXGBE_TX_UNLOCK(txr);
++		IXGBE_CORE_LOCK(adapter);
++		selwakeuppri(&na->tx_rings[na->num_queues + 1].si, PI_NET);
++		IXGBE_CORE_UNLOCK(adapter);
++		IXGBE_TX_LOCK(txr); // the caller is supposed to own the lock
++		return (FALSE);
++	}
++#endif /* DEV_NETMAP */
++
+ 	if (txr->tx_avail == adapter->num_tx_desc) {
+ 		txr->queue_status = IXGBE_QUEUE_IDLE;
+ 		return FALSE;
+@@ -3698,6 +3734,10 @@
+ 	bus_dma_segment_t	pseg[1], hseg[1];
+ 	struct lro_ctrl		*lro = &rxr->lro;
+ 	int			rsize, nsegs, error = 0;
++#ifdef DEV_NETMAP
++	struct netmap_slot *slot = netmap_reset(NA(rxr->adapter->ifp),
++				NR_RX, rxr->me, 0);
++#endif /* DEV_NETMAP */
+ 
+ 	adapter = rxr->adapter;
+ 	ifp = adapter->ifp;
+@@ -3721,6 +3761,18 @@
+ 		struct mbuf	*mh, *mp;
+ 
+ 		rxbuf = &rxr->rx_buffers[j];
++#ifdef DEV_NETMAP
++		if (slot) {
++			netmap_load_map(rxr->ptag,
++					rxbuf->pmap, NMB(slot),
++					adapter->rx_mbuf_sz);
++			/* Update descriptor */
++			rxr->rx_base[j].read.pkt_addr =
++				htole64(vtophys(NMB(slot)));
++			slot++;
++			continue;
++		}
++#endif /* DEV_NETMAP */
+ 		/*
+ 		** Don't allocate mbufs if not
+ 		** doing header split, its wasteful
+@@ -4148,6 +4200,18 @@
+ 
+ 	IXGBE_RX_LOCK(rxr);
+ 
++#ifdef DEV_NETMAP
++	if (ifp->if_capenable & IFCAP_NETMAP) {
++		struct netmap_adapter *na = NA(ifp);
++
++		selwakeuppri(&na->rx_rings[rxr->me].si, PI_NET);
++		IXGBE_RX_UNLOCK(rxr);
++		IXGBE_CORE_LOCK(adapter);
++		selwakeuppri(&na->rx_rings[na->num_queues + 1].si, PI_NET);
++		IXGBE_CORE_UNLOCK(adapter);
++		return (0);
++	}
++#endif /* DEV_NETMAP */
+ 	for (i = rxr->next_to_check; count != 0;) {
+ 		struct mbuf	*sendmp, *mh, *mp;
+ 		u32		rsc, ptype;
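
The driver patches above repeat one pattern in each ring-setup routine. `netmap_reset()` yields a slot pointer, which is NULL when the interface is not in netmap mode (according to the comment in the if_em.c hunk). When it is non-NULL, each descriptor is pointed at the buffer of the corresponding netmap slot and the regular mbuf path is skipped. The following user-space sketch only mimics that control flow. All names (`demo_slot`, `demo_desc`, `demo_buf`, `demo_setup_rx`) are hypothetical, a flat array stands in for the netmap buffer pool, and plain pointers replace the `NMB()`, `vtophys()` and DMA-map calls of the real drivers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUF_SIZE 2048  /* assumed size; the man page says "approximately 2k" */

struct demo_slot { uint32_t buf_idx; };   /* stand-in for struct netmap_slot */
struct demo_desc { const char *addr; };   /* stand-in for a NIC rx descriptor */

/* Hypothetical counterpart of NMB(): address of the buffer a slot refers to. */
static const char *
demo_buf(const char *pool, const struct demo_slot *slot)
{
    return pool + (size_t)slot->buf_idx * DEMO_BUF_SIZE;
}

/*
 * Initialise n rx descriptors. With a non-NULL slot array (netmap mode)
 * each descriptor points at its slot's buffer; otherwise it points at the
 * caller-supplied fallback buffer, which stands in for mbuf allocation.
 * Returns how many descriptors were bound to netmap buffers.
 */
static unsigned
demo_setup_rx(struct demo_desc *desc, unsigned n,
    const struct demo_slot *slot, const char *pool, const char *fallback)
{
    unsigned bound = 0;

    assert(desc != NULL);
    for (unsigned j = 0; j < n; j++) {
        if (slot != NULL) {
            desc[j].addr = demo_buf(pool, slot);
            slot++;
            bound++;
            continue;   /* skip the regular allocation path */
        }
        desc[j].addr = fallback;
    }
    return bound;
}
```

Passing NULL for `slot` exercises the ordinary path, so the same routine serves both modes. That mirrors how the patched drivers keep a single setup function guarded by `#ifdef DEV_NETMAP`.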

Added: head/sys/dev/netmap/if_em_netmap.h
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/sys/dev/netmap/if_em_netmap.h	Thu Nov 17 12:17:39 2011	(r227614)
@@ -0,0 +1,383 @@
+/*
+ * Copyright (C) 2011 Matteo Landi, Luigi Rizzo. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+/*
+ * $FreeBSD$

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***

