realtek performance (was Re: good ATI chipset results)

Sean McNeil sean at mcneil.com
Wed Oct 19 14:49:58 PDT 2005


On Thu, 2005-10-13 at 16:17 -0400, John Baldwin wrote:
> On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
> > John Baldwin wrote:
> > > On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
> > >>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
> > >>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
> > >>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
> > >>>>>Haven't really seen anyone else use this board, but I have had good
> > >>>>>luck with it so far
> > >>>>>
> > >>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
> > >>>>>
> > >>>>>It's a micro ATX form factor with built-in video, and the onboard
> > >>>>>NIC is a Realtek.  (Although it's not the fastest NIC, its driver
> > >>>>>is stable and mature -- especially compared to the headaches people
> > >>>>>seem to have with the NVIDIA NICs.)
> > >>>>
> > >>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
> > >>>>
> > >>>>For those interested, here are some changes I always use to
> > >>>>increase the performance of the above NIC.  With these mods, I can
> > >>>>stream over 20 Mbps of multicast video and do other stuff over the
> > >>>>network without issues.  Without the changes, transmit is horrible,
> > >>>>with severe UDP packet loss.
> > >>>
> > >>>So, I see two changes.  One is to up the number of descriptors
> > >>> from 32 rx and 64 tx to 64 rx and 64 tx on some models and 1024
> > >>> rx and 1024 tx on other models.  The other thing is that you seem
> > >>> to pessimize TX performance by always forcing the packets being
> > >>> sent to be coalesced into one mbuf (which requires doing an alloc
> > >>> and then copying all of the data) instead of making use of
> > >>> scatter/gather for sending packets.  Do you need both changes, or
> > >>> do just the higher descriptor counts make the difference?
> > >>
> > >>Actually, I've found that the higher descriptor counts do not make a
> > >>noticeable difference.  The only thing that mattered was to eliminate
> > >>the scatter/gather of sending packets.  I can't remember why I left the
> > >>descriptor increase in there.  I think it was to get the best use out of
> > >>the hardware.
> > >
> > > Hmm, odd.  Scott, do you have any ideas why m_defrag() plus one
> > > descriptor would be faster than s/g dma for re(4)?
> >
> > There are two things that I would consider.  First,
> > bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
> > indirection (and thus latency) in the code.  Second, not all DMA
> > engines are created equal, and I honestly wouldn't expect a whole
> > lot out of Realtek given the price point of this chip.  It might be
> > optimized for operating on only a single S/G element, for example.
> > Maybe it's really slow at pre-fetching S/G elements, or maybe it
> > has some sort of a stall after each DMA segment transfer while it
> > restarts a state machine.  I've seen evidence in other hardware
> > that only one S/G element should be used even though there are
> > slots for 2 (or 3 in the case of 9k jumbo frames).  One thing to
> > keep in mind is the difference in the driver models between Windows
> > and BSD that Bill Paul talked about the other day.  In the Windows
> > world, the driver owns the network packet memory, whereas in BSD
> > the stack owns it (in the form of mbufs).  This means that a
> > Windows driver can pre-allocate a contiguous slab and populate the
> > descriptor rings with it without ever having to worry about S/G
> > fragmentation, while in BSD fragmentation is a fact of life.  So
> > it's likely yet another case of hardware being optimized for
> > certain characteristics of Windows at the expense of other
> > operating systems.
> 
> Ok.  Sean, do you think you can trim the patch down to just the m_defrag() 
> changes and test that to make sure that is all that is needed?

I was working on this when I fried my motherboard.  I have two sets of
patches for the re(4) driver and managed to do some testing.  I didn't,
however, have the opportunity to run my 19 Mbps multicast stream test.
Unfortunately, I couldn't find a comparable motherboard and now have an
sk(4) interface instead of re(4), so I won't be able to assist with
this driver anymore.
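
For anyone who wants to reproduce the test, the load was roughly this
shape.  Here is a minimal hypothetical sender (the group, port, payload
size, and pacing are all made up; the real test used an actual video
stream):

/*
 * Push roughly 19 Mbps of 1316-byte UDP payloads (a typical MPEG-TS
 * chunk size) at a multicast group.  The default multicast TTL of 1
 * keeps this on the local segment.
 */
#include <sys/socket.h>

#include <netinet/in.h>
#include <arpa/inet.h>

#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in dst;
	char payload[1316];
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		return (1);

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(5004);
	dst.sin_addr.s_addr = inet_addr("239.1.1.1");
	memset(payload, 0, sizeof(payload));

	/*
	 * 19 Mbps at 1316 bytes per packet is about 1800 packets/s,
	 * so a ~550us sleep paces it very roughly (usleep granularity
	 * makes this approximate at best).
	 */
	for (;;) {
		if (sendto(s, payload, sizeof(payload), 0,
		    (struct sockaddr *)&dst, sizeof(dst)) < 0)
			return (1);
		usleep(550);
	}
}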

The patches are broken into two:

	Just the m_defrag() patch.
	Using bus_dmamap_load_mbuf_sg() as suggested by Scott.

The first is trivial, the second a little more involved.
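
To make the first change easier to see outside diff form, this is
roughly what re_encap()'s mapping path looks like with it applied
(just the relevant fragment, surrounding code elided):

	/*
	 * Coalesce the chain into one contiguous mbuf up front, every
	 * time, instead of only retrying with m_defrag() on EFBIG.
	 * If m_defrag() fails, keep the original chain and let the
	 * load attempt report any error.
	 */
	m_new = m_defrag(*m_head, M_DONTWAIT);
	if (m_new != NULL)
		*m_head = m_new;

	error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag, map,
	    *m_head, re_dma_map_desc, &arg, BUS_DMA_NOWAIT);
	if (error) {
		if_printf(sc->rl_ifp, "can't map mbuf (error %d)\n", error);
		return (ENOBUFS);
	}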

Since I've had issues with attachments in the past, I'll include the
patches inline as text.

Cheers,
Sean

First patch (just m_defrag() stuff):

--- sys/dev/re/if_re.c.orig	Wed Oct 19 14:39:32 2005
+++ sys/dev/re/if_re.c	Wed Oct 19 14:40:13 2005
@@ -1901,40 +1901,26 @@
 	arg.sc = sc;
 	arg.rl_idx = *idx;
 	arg.rl_maxsegs = sc->rl_ldata.rl_tx_free;
-	if (arg.rl_maxsegs > 4)
-		arg.rl_maxsegs -= 4;
 	arg.rl_ring = sc->rl_ldata.rl_tx_list;
 
 	map = sc->rl_ldata.rl_tx_dmamap[*idx];
+
+	/*
+	 * Network performance is extremely poor in gigE mode without
+	 * defragmenting each time.  It is believed this might be an
+	 * issue with the realtek part not being able to DMA anything
+	 * smaller than a complete segment.
+	 */
+	m_new = m_defrag(*m_head, M_DONTWAIT);
+	if (m_new != NULL)
+		*m_head = m_new;
+
 	error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag, map,
 	    *m_head, re_dma_map_desc, &arg, BUS_DMA_NOWAIT);
 
-	if (error && error != EFBIG) {
+	if (error) {
 		if_printf(sc->rl_ifp, "can't map mbuf (error %d)\n", error);
 		return (ENOBUFS);
-	}
-
-	/* Too many segments to map, coalesce into a single mbuf */
-
-	if (error || arg.rl_maxsegs == 0) {
-		m_new = m_defrag(*m_head, M_DONTWAIT);
-		if (m_new == NULL)
-			return (ENOBUFS);
-		else
-			*m_head = m_new;
-
-		arg.sc = sc;
-		arg.rl_idx = *idx;
-		arg.rl_maxsegs = sc->rl_ldata.rl_tx_free;
-		arg.rl_ring = sc->rl_ldata.rl_tx_list;
-
-		error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag, map,
-		    *m_head, re_dma_map_desc, &arg, BUS_DMA_NOWAIT);
-		if (error) {
-			if_printf(sc->rl_ifp, "can't map mbuf (error %d)\n",
-			    error);
-			return (EFBIG);
-		}
 	}
 
 	/*

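Some context for the second patch: bus_dmamap_load_mbuf() delivers the
segment list through a callback (re_dma_map_desc() here), which is the
only reason the rl_dmaload_arg context structure exists.
bus_dmamap_load_mbuf_sg() instead fills a caller-supplied segment
array, so the callback and the context structure can both be deleted
and the descriptors filled inline in re_encap().  A sketch of the two
call shapes, side by side (error handling elided):

	/* Old interface: segments are delivered to a callback. */
	struct rl_dmaload_arg	arg;

	arg.sc = sc;
	arg.rl_idx = *idx;
	arg.rl_maxsegs = sc->rl_ldata.rl_tx_free;
	arg.rl_ring = sc->rl_ldata.rl_tx_list;
	error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag, map,
	    *m_head, re_dma_map_desc, &arg, BUS_DMA_NOWAIT);

	/* New interface: segments land in a local array. */
	bus_dma_segment_t	segs[RL_NSEG_NEW];
	int			nseg;

	error = bus_dmamap_load_mbuf_sg(sc->rl_ldata.rl_mtag, map,
	    *m_head, segs, &nseg, BUS_DMA_NOWAIT);
	/* segs[0..nseg-1] can now be written straight into descriptors. */
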
Second patch (_sg changes):

--- sys/pci/if_rlreg.h.orig	Thu Oct  6 13:17:17 2005
+++ sys/pci/if_rlreg.h	Wed Oct 19 14:43:52 2005
@@ -645,14 +645,6 @@
 
 struct rl_softc;
 
-struct rl_dmaload_arg {
-	struct rl_softc		*sc;
-	int			rl_idx;
-	int			rl_maxsegs;
-	uint32_t		rl_flags;
-	struct rl_desc		*rl_ring;
-};
-
 struct rl_list_data {
 	struct mbuf		*rl_tx_mbuf[RL_TX_DESC_CNT];
 	struct mbuf		*rl_rx_mbuf[RL_TX_DESC_CNT];

--- sys/dev/re/if_re.c.orig	Wed Oct 19 14:40:13 2005
+++ sys/dev/re/if_re.c	Wed Oct 19 14:42:23 2005
@@ -159,6 +159,8 @@
 
 #define RE_CSUM_FEATURES    (CSUM_IP | CSUM_TCP | CSUM_UDP)
 
+#define RL_NSEG_NEW 32
+
 /*
  * Various supported device vendors/types and their names.
  */
@@ -205,8 +207,6 @@
 static int re_encap		(struct rl_softc *, struct mbuf **, int *);
 
 static void re_dma_map_addr	(void *, bus_dma_segment_t *, int, int);
-static void re_dma_map_desc	(void *, bus_dma_segment_t *, int,
-				    bus_size_t, int);
 static int re_allocmem		(device_t, struct rl_softc *);
 static int re_newbuf		(struct rl_softc *, int, struct mbuf *);
 static int re_rx_list_init	(struct rl_softc *);
@@ -856,82 +856,6 @@
 }
 
 /*
- * This routine takes the segment list provided as the result of
- * a bus_dma_map_load() operation and assigns the addresses/lengths
- * to RealTek DMA descriptors. This can be called either by the RX
- * code or the TX code. In the RX case, we'll probably wind up mapping
- * at most one segment. For the TX case, there could be any number of
- * segments since TX packets may span multiple mbufs. In either case,
- * if the number of segments is larger than the rl_maxsegs limit
- * specified by the caller, we abort the mapping operation. Sadly,
- * whoever designed the buffer mapping API did not provide a way to
- * return an error from here, so we have to fake it a bit.
- */
-
-static void
-re_dma_map_desc(arg, segs, nseg, mapsize, error)
-	void			*arg;
-	bus_dma_segment_t	*segs;
-	int			nseg;
-	bus_size_t		mapsize;
-	int			error;
-{
-	struct rl_dmaload_arg	*ctx;
-	struct rl_desc		*d = NULL;
-	int			i = 0, idx;
-
-	if (error)
-		return;
-
-	ctx = arg;
-
-	/* Signal error to caller if there's too many segments */
-	if (nseg > ctx->rl_maxsegs) {
-		ctx->rl_maxsegs = 0;
-		return;
-	}
-
-	/*
-	 * Map the segment array into descriptors. Note that we set the
-	 * start-of-frame and end-of-frame markers for either TX or RX, but
-	 * they really only have meaning in the TX case. (In the RX case,
-	 * it's the chip that tells us where packets begin and end.)
-	 * We also keep track of the end of the ring and set the
-	 * end-of-ring bits as needed, and we set the ownership bits
-	 * in all except the very first descriptor. (The caller will
-	 * set this descriptor later when it start transmission or
-	 * reception.)
-	 */
-	idx = ctx->rl_idx;
-	for (;;) {
-		u_int32_t		cmdstat;
-		d = &ctx->rl_ring[idx];
-		if (le32toh(d->rl_cmdstat) & RL_RDESC_STAT_OWN) {
-			ctx->rl_maxsegs = 0;
-			return;
-		}
-		cmdstat = segs[i].ds_len;
-		d->rl_bufaddr_lo = htole32(RL_ADDR_LO(segs[i].ds_addr));
-		d->rl_bufaddr_hi = htole32(RL_ADDR_HI(segs[i].ds_addr));
-		if (i == 0)
-			cmdstat |= RL_TDESC_CMD_SOF;
-		else
-			cmdstat |= RL_TDESC_CMD_OWN;
-		if (idx == (RL_RX_DESC_CNT - 1))
-			cmdstat |= RL_TDESC_CMD_EOR;
-		d->rl_cmdstat = htole32(cmdstat | ctx->rl_flags);
-		i++;
-		if (i == nseg)
-			break;
-		RL_DESC_INC(idx);
-	}
-
-	d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
-	ctx->rl_maxsegs = nseg;
-	ctx->rl_idx = idx;
-}
-
-/*
  * Map a single buffer address.
  */
 
@@ -958,17 +882,15 @@
 	struct rl_softc		*sc;
 {
 	int			error;
-	int			nseg;
 	int			i;
 
 	/*
 	 * Allocate map for RX mbufs.
 	 */
-	nseg = 32;
 	error = bus_dma_tag_create(sc->rl_parent_tag, ETHER_ALIGN, 0,
 	    BUS_SPACE_MAXADDR_32BIT, BUS_SPACE_MAXADDR, NULL,
-	    NULL, MCLBYTES * nseg, nseg, MCLBYTES, BUS_DMA_ALLOCNOW,
-	    NULL, NULL, &sc->rl_ldata.rl_mtag);
+	    NULL, MCLBYTES * RL_NSEG_NEW, RL_NSEG_NEW, MCLBYTES,
+	    BUS_DMA_ALLOCNOW, NULL, NULL, &sc->rl_ldata.rl_mtag);
 	if (error) {
 		device_printf(dev, "could not allocate dma tag\n");
 		return (ENOMEM);
@@ -1163,7 +1085,6 @@
 	/*
 	 * Allocate the parent bus DMA tag appropriate for PCI.
 	 */
-#define RL_NSEG_NEW 32
 	error = bus_dma_tag_create(NULL,	/* parent */
 			1, 0,			/* alignment, boundary */
 			BUS_SPACE_MAXADDR_32BIT,/* lowaddr */
@@ -1374,9 +1295,11 @@
 	int			idx;
 	struct mbuf		*m;
 {
-	struct rl_dmaload_arg	arg;
 	struct mbuf		*n = NULL;
-	int			error;
+	struct rl_desc		*d = &sc->rl_ldata.rl_rx_list[idx];
+	bus_dma_segment_t	segs[1];
+	int			error, nseg;
+	u_int32_t		cmdstat;
 
 	if (m == NULL) {
 		n = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
@@ -1400,22 +1323,23 @@
 	 */
 	m_adj(m, RE_ETHER_ALIGN);
 #endif
-	arg.sc = sc;
-	arg.rl_idx = idx;
-	arg.rl_maxsegs = 1;
-	arg.rl_flags = 0;
-	arg.rl_ring = sc->rl_ldata.rl_rx_list;
-
-	error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag,
-	    sc->rl_ldata.rl_rx_dmamap[idx], m, re_dma_map_desc,
-	    &arg, BUS_DMA_NOWAIT);
-	if (error || arg.rl_maxsegs != 1) {
+	error = bus_dmamap_load_mbuf_sg(sc->rl_ldata.rl_mtag,
+	    sc->rl_ldata.rl_rx_dmamap[idx], m, segs, &nseg, BUS_DMA_NOWAIT);
+	if (error || nseg != 1) {
 		if (n != NULL)
 			m_freem(n);
 		return (ENOMEM);
 	}
 
-	sc->rl_ldata.rl_rx_list[idx].rl_cmdstat |= htole32(RL_RDESC_CMD_OWN);
+	d->rl_bufaddr_lo = htole32(RL_ADDR_LO(segs[0].ds_addr));
+	d->rl_bufaddr_hi = htole32(RL_ADDR_HI(segs[0].ds_addr));
+
+	cmdstat = segs[0].ds_len | RL_RDESC_CMD_OWN;
+	if (idx == (RL_RX_DESC_CNT - 1))
+	    cmdstat |= RL_TDESC_CMD_EOR;
+
+	d->rl_cmdstat = htole32(cmdstat);
+
 	sc->rl_ldata.rl_rx_mbuf[idx] = m;
 
 	bus_dmamap_sync(sc->rl_ldata.rl_mtag,
@@ -1872,10 +1796,13 @@
 	int			*idx;
 {
 	struct mbuf		*m_new = NULL;
-	struct rl_dmaload_arg	arg;
-	bus_dmamap_t		map;
-	int			error;
 	struct m_tag		*mtag;
+	bus_dmamap_t		map;
+	bus_dma_segment_t	segs[RL_NSEG_NEW];
+	struct rl_desc		*d;
+	int			error, nseg, i;
+	int			nidx = *idx;
+	uint32_t		flags = 0;
 
 	RL_LOCK_ASSERT(sc);
 
@@ -1889,21 +1816,14 @@
 	 * chip. This is a requirement.
 	 */
 
-	arg.rl_flags = 0;
-
 	if ((*m_head)->m_pkthdr.csum_flags & CSUM_IP)
-		arg.rl_flags |= RL_TDESC_CMD_IPCSUM;
+		flags |= RL_TDESC_CMD_IPCSUM;
 	if ((*m_head)->m_pkthdr.csum_flags & CSUM_TCP)
-		arg.rl_flags |= RL_TDESC_CMD_TCPCSUM;
+		flags |= RL_TDESC_CMD_TCPCSUM;
 	if ((*m_head)->m_pkthdr.csum_flags & CSUM_UDP)
-		arg.rl_flags |= RL_TDESC_CMD_UDPCSUM;
-
-	arg.sc = sc;
-	arg.rl_idx = *idx;
-	arg.rl_maxsegs = sc->rl_ldata.rl_tx_free;
-	arg.rl_ring = sc->rl_ldata.rl_tx_list;
+		flags |= RL_TDESC_CMD_UDPCSUM;
 
-	map = sc->rl_ldata.rl_tx_dmamap[*idx];
+	map = sc->rl_ldata.rl_tx_dmamap[nidx];
 
 	/*
 	 * Network performance is extremely poor in gigE mode without
@@ -1915,25 +1835,52 @@
 	if (m_new != NULL)
 		*m_head = m_new;
 
-	error = bus_dmamap_load_mbuf(sc->rl_ldata.rl_mtag, map,
-	    *m_head, re_dma_map_desc, &arg, BUS_DMA_NOWAIT);
+	error = bus_dmamap_load_mbuf_sg(sc->rl_ldata.rl_mtag, map,
+	    *m_head, segs, &nseg, BUS_DMA_NOWAIT);
 
 	if (error) {
 		if_printf(sc->rl_ifp, "can't map mbuf (error %d)\n", error);
 		return (ENOBUFS);
 	}
 
+	KASSERT(nseg <= sc->rl_ldata.rl_tx_free, ("too many DMA segments"));
+
+	/*
+	 * Map the segment array into descriptors. We keep track of the
+	 * end of the ring and set the end-of-ring bits as needed, and
+	 * we set the ownership bits in all except the very first descriptor.
+	 * (That one is set later, when transmission actually starts.)
+	 */
+	for (i = 0;;) {
+		u_int32_t		cmdstat;
+		d = &sc->rl_ldata.rl_tx_list[nidx];
+		KASSERT((le32toh(d->rl_cmdstat) & RL_RDESC_STAT_OWN)==0,
+		    ("Descriptor still owned"));
+		cmdstat = segs[i].ds_len;
+		d->rl_bufaddr_lo = htole32(RL_ADDR_LO(segs[i].ds_addr));
+		d->rl_bufaddr_hi = htole32(RL_ADDR_HI(segs[i].ds_addr));
+		cmdstat |= (i == 0) ? RL_TDESC_CMD_SOF : RL_TDESC_CMD_OWN;
+		if (nidx == (RL_TX_DESC_CNT - 1))
+			cmdstat |= RL_TDESC_CMD_EOR;
+		d->rl_cmdstat = htole32(cmdstat | flags);
+		i++;
+		if (i == nseg)
+			break;
+		RL_DESC_INC(nidx);
+	}
+
+	d->rl_cmdstat |= htole32(RL_TDESC_CMD_EOF);
+
 	/*
 	 * Insure that the map for this transmission
 	 * is placed at the array index of the last descriptor
 	 * in this chain.  (Swap last and first dmamaps.)
 	 */
-	sc->rl_ldata.rl_tx_dmamap[*idx] =
-	    sc->rl_ldata.rl_tx_dmamap[arg.rl_idx];
-	sc->rl_ldata.rl_tx_dmamap[arg.rl_idx] = map;
+	sc->rl_ldata.rl_tx_dmamap[*idx] = sc->rl_ldata.rl_tx_dmamap[nidx];
+	sc->rl_ldata.rl_tx_dmamap[nidx] = map;
 
-	sc->rl_ldata.rl_tx_mbuf[arg.rl_idx] = *m_head;
-	sc->rl_ldata.rl_tx_free -= arg.rl_maxsegs;
+	sc->rl_ldata.rl_tx_mbuf[nidx] = *m_head;
+	sc->rl_ldata.rl_tx_free -= nseg;
 
 	/*
 	 * Set up hardware VLAN tagging. Note: vlan tag info must
@@ -1941,21 +1888,19 @@
 	 * transmission attempt.
 	 */
 
+	d = &sc->rl_ldata.rl_tx_list[*idx];
+
 	mtag = VLAN_OUTPUT_TAG(sc->rl_ifp, *m_head);
 	if (mtag != NULL)
-		sc->rl_ldata.rl_tx_list[*idx].rl_vlanctl =
+		d->rl_vlanctl =
 		    htole32(htons(VLAN_TAG_VALUE(mtag)) | RL_TDESC_VLANCTL_TAG);
 
 	/* Transfer ownership of packet to the chip. */
 
-	sc->rl_ldata.rl_tx_list[arg.rl_idx].rl_cmdstat |=
-	    htole32(RL_TDESC_CMD_OWN);
-	if (*idx != arg.rl_idx)
-		sc->rl_ldata.rl_tx_list[*idx].rl_cmdstat |=
-		    htole32(RL_TDESC_CMD_OWN);
+	d->rl_cmdstat |= htole32(RL_TDESC_CMD_OWN);
 
-	RL_DESC_INC(arg.rl_idx);
-	*idx = arg.rl_idx;
+	RL_DESC_INC(nidx);
+	*idx = nidx;
 
 	return (0);
 }
