Further mbuf adjustments and changes

Andre Oppermann andre at freebsd.org
Wed Aug 21 13:41:05 UTC 2013


I want to put these mbuf changes/updates/adjustments up for objections, if any,
before committing them.

This is a moderate overhaul of the mbuf headers and fields to take us into the
next 5 years and two releases.  The mbuf headers, in particular the pkthdr, have
seen a number of new uses and abuses over the years.  Other uses have fallen by
the wayside over the same period.

The goal of the changes presented here is to better accommodate additional upcoming
offload features and to allow full backporting from HEAD to the announced 10-stable
branch while preserving API and ABI compatibility.

The individual changes and their rationale are described below.  It is presented
as one big patch to show the big picture; for the actual commits it will be broken
up into functional units as usual.  Except for two limited changes, the API in
current HEAD remains stable and only a recompile is necessary.

Improved alignment and overall size of mbuf headers:

The m_hdr, pkthdr and m_ext structures are adjusted to improve alignment and packing
on 32 and 64 bit architectures.  The mbuf structures have grown/changed considerably
and currently stand at 88/144 bytes (32/64 bit), leaving less space for data and,
more importantly, exceeding two 64 byte cache lines on typical CPUs.  The latter is
relevant whenever m_ext is accessed.
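
Purely as an illustration (mine, not part of the patch), a compile-time guard along
the following lines could keep the header portion from creeping past the sizes
claimed here; the 136 byte limit corresponds to the LP64 figure discussed below.

/*
 * Hypothetical size guard, not part of the posted patch: fail the build if
 * the mbuf header portion (m_hdr + pkthdr + m_ext) grows beyond the
 * post-change LP64 target of 136 bytes described below.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

#ifdef __LP64__
CTASSERT(sizeof(struct m_hdr) + sizeof(struct pkthdr) +
    sizeof(struct m_ext) <= 136);
#endif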

m_hdr is compacted from 24/40 to 20/32 bytes by packing the type and flags fields
into one uint32.  The type is an enum with only a handful of types in use and is
thus reduced from an int to 8 bits, still allowing for 255 types to be specified.
The most we ever had was around a dozen; it has since shrunk to only 5 and has
stayed there for a long time.  The flags field gets the remaining 24 bits, with
12 bits for global persistent flags, of which 9 (possibly 10) are in use, and
12 bits for protocol/layer specific overlays.  Some of the global flags could be
moved to csum/offload bits in the pkthdr.  No further growth in the number of
global flags is foreseen, as new uses are either layer or protocol specific or
belong to offload capabilities, which have their own flags.
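
To illustrate the overlay idea (my sketch, not from the patch): because the
protocol/layer specific bits are confined to M_PROTO[1-12], a layer handoff can
scrub them in one step and the next layer is free to reuse them.

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Sketch of a layer handoff helper (hypothetical name): clear the
 * protocol specific overlay flags so the next layer can reuse the
 * M_PROTO[1-12] bits; the global persistent flags are left untouched.
 */
static inline void
layer_handoff_scrub(struct mbuf *m)
{

	m->m_flags &= ~M_PROTOFLAGS;
}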

The pkthdr size stays the same at 48/56, but a number of fields change to adapt to
predominant current and future uses.  In particular the "header" field sees only
little use and is moved into a 64 bit protocol/layer specific union for local use.
Primary users are IP reassembly, IGMP/MLD and ATM, which store information there
while the packet is being worked on.  "header" was never used across layers.
"csum_flags" is extended to 64 bits to allow additional future offload information
to be carried (for example IPsec offload and others).  The definition of the RSS
hash type is moved from the hackish global m_flags to its own 8 bit field in the
pkthdr.  An addition is cosqos, which stores Class of Service / Quality of Service
information with the packet.  Depending on the transport mechanism it may get
reduced in width during encapsulation (vlan header).  These capabilities are
currently not supported in any drivers but allow us to get on par with
Cisco/Juniper in routing applications (plus MPLS QoS).  Four 8 bit fields,
l[2-5]hlen, are added to store the per-layer header lengths, from which the
relative and cumulative header offsets from the start of the packet follow
directly.  This is important for various offload capabilities and relieves the
drivers from having to parse the packet headers to find or verify the header
locations for checksums.  Parsing in drivers means a lot of copy-paste code and
unhandled corner cases, which we want to avoid.  The surrounding infrastructure in
the stack and drivers is part of a FreeBSD Foundation grant currently in progress.
Another flexible 64 bit union serves to map various additional persistent packet
information, like the ether_vtag, tso_segsz and csum fields.  Depending on the
csum_flags settings some fields may have different usage, making this very
flexible and adaptable to future capabilities.
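
A minimal sketch of the intended driver-side benefit (descriptor layout and names
are invented for illustration): with the stack filling in the l[2-5]hlen fields, a
transmit routine can locate the L3/L4 headers arithmetically instead of re-parsing
the packet.

#include <sys/param.h>
#include <sys/mbuf.h>

/* Hypothetical TX descriptor; the fields stand in for whatever the NIC needs. */
struct xx_txdesc {
	uint8_t		l3_off;		/* byte offset of the IP header */
	uint8_t		l4_off;		/* byte offset of the TCP/UDP header */
	uint16_t	mss;		/* TSO segment size */
};

static void
xx_setup_offload(struct xx_txdesc *d, struct mbuf *m)
{

	/* The offsets follow directly from the cumulative header lengths. */
	d->l3_off = m->m_pkthdr.l2hlen;
	d->l4_off = m->m_pkthdr.l2hlen + m->m_pkthdr.l3hlen;
	if (m->m_pkthdr.csum_flags & CSUM_TSO)
		d->mss = m->m_pkthdr.tso_segsz;
}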

m_ext is compacted from 28/56 to 28/48 simply by rearranging the field ordering
to allow for better packing.  Again the type is an enum with only a few values,
yet it used to waste a full int.  It is split into an 8 bit type and 24 bit flags.
With more special uses in high performance network interfaces and more specialized
external memory attached to mbufs, it makes sense to add a dedicated flags field.
It can, for example, convey information about externally managed reference counts
without having to invent a new ext_type and special-case it each time.
The biggest change is an argument extension to the *ext_free function pointer,
adding a pointer to the mbuf itself.  It was always a bit painful not having direct
access to the mbuf whose external storage we're freeing.  One could use one of the
args for it, but that would be a waste.  All uses in the tree are mechanically adjusted.
- void (*ext_free)(void *, void *);
+ void (*ext_free)(struct mbuf *, void *, void *);
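
As an example of the mechanical adjustment (driver name and buffer handling are
hypothetical, and the buffer is assumed to come from malloc(9) with M_DEVBUF), a
custom free routine simply gains the mbuf pointer as its first argument:

#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>

/*
 * Old form:  static void xx_ext_free(void *buf, void *arg);
 * New form:  the mbuf whose external storage is being released arrives as
 * the first argument, so per-mbuf state is directly reachable if needed.
 */
static void
xx_ext_free(struct mbuf *m, void *buf, void *arg)
{

	free(buf, M_DEVBUF);
}

/*
 * Registration through m_extadd() is unchanged apart from the callback type:
 *   m_extadd(m, buf, size, xx_ext_free, buf, sc, 0, EXT_NET_DRV, M_NOWAIT);
 */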

The header portion of struct mbuf thus changes from 88/144 to 96/136.  The last
8 bytes to push it down to 128 are only reachable with intrusive changes, like
removing the second argument from m_ext.

CSUM flags:

The current CSUM flags are a bit chaotic and rather poorly documented, especially
in that their meaning on the outbound (down the stack) and inbound (up the stack)
paths is rather different.  The inbound flags in particular are handled partially
incorrectly in almost all drivers.  To bring clarity into this mess the CSUM flags
are renamed and arranged more appropriately, with compatibility mappings provided.
The drivers can then be corrected one by one as the work progresses in the new
11-HEAD and MFCd without issue to the then 10-stable.  The l[3-5]hlen fields provide
the means to remove all packet header parsing from the drivers for offload setup.
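
A minimal sketch of how a receive path is meant to report the new inbound flags
(the driver and the hw_*_ok inputs are made up for illustration); the compatibility
mappings keep consumers that still test CSUM_DATA_VALID/CSUM_PSEUDO_HDR working:

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Hypothetical RX completion handling: hw_l3_ok/hw_l4_ok stand in for
 * whatever bits the NIC sets when it has verified the checksums.
 */
static void
xx_rx_csum(struct mbuf *m, int hw_l3_ok, int hw_l4_ok)
{

	if (hw_l3_ok)
		m->m_pkthdr.csum_flags |= CSUM_L3_CALC | CSUM_L3_VALID;
	if (hw_l4_ok) {
		m->m_pkthdr.csum_flags |= CSUM_L4_CALC | CSUM_L4_VALID;
		m->m_pkthdr.csum_data = 0xffff;
	}
}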

Others:

Mbuf initialization is unified through m_init() and m_pkthdr_init() to avoid
duplication.  m_free_fast() is removed for lack of usage.
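
For illustration only (the caller is hypothetical): code that previously open-coded
the field initialization now goes through the single path, e.g.

#include <sys/param.h>
#include <sys/mbuf.h>

/*
 * Sketch: initialize caller-provided mbuf storage through the unified
 * m_init() path, which takes care of m_pkthdr_init() when M_PKTHDR is set.
 */
static int
xx_setup_mbuf(struct mbuf *m, int how)
{

	return (m_init(m, NULL, MSIZE, how, MT_DATA, M_PKTHDR));
}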

Patch is available here:

  http://people.freebsd.org/~andre/mbuf-adjustments-20130821.diff

This work is sponsored by the FreeBSD Foundation.

-- 
Andre
-------------- next part --------------
Index: sys/mbuf.h
===================================================================
--- sys/mbuf.h	(revision 254596)
+++ sys/mbuf.h	(working copy)
@@ -67,8 +67,10 @@
  * type:
  *
  * mtod(m, t)	-- Convert mbuf pointer to data pointer of correct type.
+ * mtodo(m, o)	-- Same as above but with offset 'o' into data.
  */
 #define	mtod(m, t)	((t)((m)->m_data))
+#define	mtodo(m, o)	((void *)(((m)->m_data) + (o)))
 
 /*
  * Argument structure passed to UMA routines during mbuf and packet
@@ -80,74 +82,98 @@
 };
 #endif /* _KERNEL */
 
-#if defined(__LP64__)
-#define M_HDR_PAD    6
-#else
-#define M_HDR_PAD    2
-#endif
-
 /*
  * Header present at the beginning of every mbuf.
+ * Size ILP32: 20
+ *	 LP64: 32
  */
 struct m_hdr {
 	struct mbuf	*mh_next;	/* next buffer in chain */
 	struct mbuf	*mh_nextpkt;	/* next chain in queue/record */
 	caddr_t		 mh_data;	/* location of data */
-	int		 mh_len;	/* amount of data in this mbuf */
-	int		 mh_flags;	/* flags; see below */
-	short		 mh_type;	/* type of data in this mbuf */
-	uint8_t          pad[M_HDR_PAD];/* word align                  */
+	int32_t		 mh_len;	/* amount of data in this mbuf */
+	uint32_t	 mh_type:8,	/* type of data in this mbuf */
+			 mh_flags:24;	/* flags; see below */
 };
 
 /*
  * Packet tag structure (see below for details).
+ * Size ILP32: 16
+ *	 LP64: 24
  */
 struct m_tag {
 	SLIST_ENTRY(m_tag)	m_tag_link;	/* List of packet tags */
-	u_int16_t		m_tag_id;	/* Tag ID */
-	u_int16_t		m_tag_len;	/* Length of data */
-	u_int32_t		m_tag_cookie;	/* ABI/Module ID */
+	uint16_t		m_tag_id;	/* Tag ID */
+	uint16_t		m_tag_len;	/* Length of data */
+	uint32_t		m_tag_cookie;	/* ABI/Module ID */
 	void			(*m_tag_free)(struct m_tag *);
 };
 
 /*
  * Record/packet header in first mbuf of chain; valid only if M_PKTHDR is set.
+ * Size ILP32: 48
+ *	 LP64: 56
  */
 struct pkthdr {
 	struct ifnet	*rcvif;		/* rcv interface */
-	/* variables for ip and tcp reassembly */
-	void		*header;	/* pointer to packet header */
-	int		 len;		/* total packet length */
-	uint32_t	 flowid;	/* packet's 4-tuple system
-					 * flow identifier
-					 */
-	/* variables for hardware checksum */
-	int		 csum_flags;	/* flags regarding checksum */
-	int		 csum_data;	/* data field used by csum routines */
-	u_int16_t	 tso_segsz;	/* TSO segment size */
+	SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
+	int32_t		 len;		/* total packet length */
+
+	/* Layer crossing persistent information. */
+	uint32_t	 flowid;	/* packet's 4-tuple system */
+	uint64_t	 csum_flags;	/* checksum and offload features */
+
+	uint16_t	 fibnum;	/* this packet should use this fib */
+	uint8_t		 cosqos;	/* class/quality of service */
+	uint8_t		 rsstype;	/* hash type */
+
+	uint8_t		 l2hlen;	/* layer 2 header length */
+	uint8_t		 l3hlen;	/* layer 3 header length */
+	uint8_t		 l4hlen;	/* layer 4 header length */
+	uint8_t		 l5hlen;	/* layer 5 header length */
+
 	union {
-		u_int16_t vt_vtag;	/* Ethernet 802.1p+q vlan tag */
-		u_int16_t vt_nrecs;	/* # of IGMPv3 records in this chain */
-	} PH_vt;
-	u_int16_t	 fibnum;	/* this packet should use this fib */
-	u_int16_t	 pad2;		/* align to 32 bits */
-	SLIST_HEAD(packet_tags, m_tag) tags; /* list of packet tags */
+		uint8_t  eigth[8];
+		uint16_t sixteen[4];
+		uint32_t thirtytwo[2];
+		uint64_t sixtyfour[1];
+		uintptr_t unintptr[1];
+		void	*ptr;
+	} PH_per;
+
+	/* Layer specific non-persistent local storage for reassembly, etc. */
+	union {
+		uint8_t  eigth[8];
+		uint16_t sixteen[4];
+		uint32_t thirtytwo[2];
+		uint64_t sixtyfour[1];
+		uintptr_t unintptr[1];
+		void 	*ptr;
+	} PH_loc;
 };
-#define ether_vtag	PH_vt.vt_vtag
+#define ether_vtag	PH_per.sixteen[0]
+#define PH_vt		PH_per
+#define	vt_nrecs	sixteen[0]
+#define	tso_segsz	PH_per.sixteen[1]
+#define	csum_phsum	PH_per.sixteen[2]
+#define	csum_data	PH_per.thirtytwo[1]
 
 /*
  * Description of external storage mapped into mbuf; valid only if M_EXT is
  * set.
+ * Size ILP32: 28
+ *	 LP64: 48
  */
 struct m_ext {
+	volatile u_int	*ref_cnt;	/* pointer to ref count info */
 	caddr_t		 ext_buf;	/* start of buffer */
+	uint32_t	 ext_size;	/* size of buffer, for ext_free */
+	uint32_t	 ext_type:8,	/* type of external storage */
+			 ext_flags:24;	/* external storage mbuf flags */
 	void		(*ext_free)	/* free routine if not the usual */
-			    (void *, void *);
+			    (struct mbuf *, void *, void *);
 	void		*ext_arg1;	/* optional argument pointer */
 	void		*ext_arg2;	/* optional argument pointer */
-	u_int		 ext_size;	/* size of buffer, for ext_free */
-	volatile u_int	*ref_cnt;	/* pointer to ref count info */
-	int		 ext_type;	/* type of external storage */
 };
 
 /*
@@ -180,7 +206,10 @@
 #define	m_dat		M_dat.M_databuf
 
 /*
- * mbuf flags.
+ * mbuf flags of global significance and layer crossing.
+ * Those of only protocol/layer specific significance are to be mapped
+ * to M_PROTO[1-12] and cleared at layer handoff boundaries.
+ * NB: Limited to the lower 24 bits.
  */
 #define	M_EXT		0x00000001 /* has associated external storage */
 #define	M_PKTHDR	0x00000002 /* start of record */
@@ -205,8 +234,6 @@
 #define	M_PROTO11	0x00400000 /* protocol-specific */
 #define	M_PROTO12	0x00800000 /* protocol-specific */
 
-#define	M_HASHTYPEBITS	0x0F000000 /* mask of bits holding flowid hash type */
-
 /*
  * Flags to purge when crossing layers.
  */
@@ -215,6 +242,13 @@
      M_PROTO9|M_PROTO10|M_PROTO11|M_PROTO12)
 
 /*
+ * Flags preserved when copying m_pkthdr.
+ */
+#define M_COPYFLAGS \
+    (M_PKTHDR|M_EOR|M_RDONLY|M_BCAST|M_MCAST|M_VLANTAG|M_PROMISC| \
+     M_PROTOFLAGS)
+
+/*
  * Mbuf flag description for use with printf(9) %b identifier.
  */
 #define	M_FLAG_BITS \
@@ -241,34 +275,29 @@
  * that provide an opaque flow identifier, allowing for ordering and
  * distribution without explicit affinity.
  */
-#define	M_HASHTYPE_SHIFT		24
-#define	M_HASHTYPE_NONE			0x0
-#define	M_HASHTYPE_RSS_IPV4		0x1	/* IPv4 2-tuple */
-#define	M_HASHTYPE_RSS_TCP_IPV4		0x2	/* TCPv4 4-tuple */
-#define	M_HASHTYPE_RSS_IPV6		0x3	/* IPv6 2-tuple */
-#define	M_HASHTYPE_RSS_TCP_IPV6		0x4	/* TCPv6 4-tuple */
-#define	M_HASHTYPE_RSS_IPV6_EX		0x5	/* IPv6 2-tuple + ext hdrs */
-#define	M_HASHTYPE_RSS_TCP_IPV6_EX	0x6	/* TCPv6 4-tiple + ext hdrs */
-#define	M_HASHTYPE_OPAQUE		0xf	/* ordering, not affinity */
+#define	M_HASHTYPE_NONE			0
+#define	M_HASHTYPE_RSS_IPV4		1	/* IPv4 2-tuple */
+#define	M_HASHTYPE_RSS_TCP_IPV4		2	/* TCPv4 4-tuple */
+#define	M_HASHTYPE_RSS_IPV6		3	/* IPv6 2-tuple */
+#define	M_HASHTYPE_RSS_TCP_IPV6		4	/* TCPv6 4-tuple */
+#define	M_HASHTYPE_RSS_IPV6_EX		5	/* IPv6 2-tuple + ext hdrs */
+#define	M_HASHTYPE_RSS_TCP_IPV6_EX	6	/* TCPv6 4-tiple + ext hdrs */
+#define	M_HASHTYPE_OPAQUE		255	/* ordering, not affinity */
 
-#define	M_HASHTYPE_CLEAR(m)	(m)->m_flags &= ~(M_HASHTYPEBITS)
-#define	M_HASHTYPE_GET(m)	(((m)->m_flags & M_HASHTYPEBITS) >> \
-				    M_HASHTYPE_SHIFT)
+#define	M_HASHTYPE_CLEAR(m)	(m)->m_pkthdr.rsstype = 0
+#define	M_HASHTYPE_GET(m)	((m)->m_pkthdr.rsstype)
 #define	M_HASHTYPE_SET(m, v)	do {					\
-	(m)->m_flags &= ~M_HASHTYPEBITS;				\
-	(m)->m_flags |= ((v) << M_HASHTYPE_SHIFT);			\
+	(m)->m_pkthdr.rsstype = (v)					\
 } while (0)
 #define	M_HASHTYPE_TEST(m, v)	(M_HASHTYPE_GET(m) == (v))
 
 /*
- * Flags preserved when copying m_pkthdr.
+ * COS/QOS class and quality of service tags.
  */
-#define	M_COPYFLAGS \
-    (M_PKTHDR|M_EOR|M_RDONLY|M_BCAST|M_MCAST|M_VLANTAG|M_PROMISC| \
-     M_PROTOFLAGS|M_HASHTYPEBITS)
+#define	COSQOS_BE	0x00	/* best effort */
 
 /*
- * External buffer types: identify ext_buf type.
+ * External mbuf storage buffer types.
  */
 #define	EXT_CLUSTER	1	/* mbuf cluster */
 #define	EXT_SFBUF	2	/* sendfile(2)'s sf_bufs */
@@ -277,56 +306,115 @@
 #define	EXT_JUMBO16	5	/* jumbo cluster 16184 bytes */
 #define	EXT_PACKET	6	/* mbuf+cluster from packet zone */
 #define	EXT_MBUF	7	/* external mbuf reference (M_IOVEC) */
-#define	EXT_NET_DRV	100	/* custom ext_buf provided by net driver(s) */
-#define	EXT_MOD_TYPE	200	/* custom module's ext_buf type */
-#define	EXT_DISPOSABLE	300	/* can throw this buffer away w/page flipping */
-#define	EXT_EXTREF	400	/* has externally maintained ref_cnt ptr */
 
+#define	EXT_VENDOR1	224	/* for vendor-internal use */
+#define	EXT_VENDOR2	225	/* for vendor-internal use */
+#define	EXT_VENDOR3	226	/* for vendor-internal use */
+#define	EXT_VENDOR4	227	/* for vendor-internal use */
+#define	EXT_VENDOR5	228	/* for vendor-internal use */
+#define	EXT_VENDOR6	229	/* for vendor-internal use */
+#define	EXT_VENDOR7	230	/* for vendor-internal use */
+#define	EXT_VENDOR8	231	/* for vendor-internal use */
+
+#define	EXT_EXP1	244	/* for experimental use */
+#define	EXT_EXP2	245	/* for experimental use */
+#define	EXT_EXP4	246	/* for experimental use */
+#define	EXT_EXP8	247	/* for experimental use */
+
+#define	EXT_NET_DRV	252	/* custom ext_buf provided by net driver(s) */
+#define	EXT_MOD_TYPE	253	/* custom module's ext_buf type */
+#define	EXT_DISPOSABLE	254	/* can throw this buffer away w/page flipping */
+#define	EXT_EXTREF	255	/* has externally maintained ref_cnt ptr */
+
 /*
- * Flags indicating hw checksum support and sw checksum requirements.  This
- * field can be directly tested against if_data.ifi_hwassist.
+ * Flags for external mbuf buffer types.
+ * NB: limited to the lower 24 bits.
  */
-#define	CSUM_IP			0x0001		/* will csum IP */
-#define	CSUM_TCP		0x0002		/* will csum TCP */
-#define	CSUM_UDP		0x0004		/* will csum UDP */
-#define	CSUM_FRAGMENT		0x0010		/* will do IP fragmentation */
-#define	CSUM_TSO		0x0020		/* will do TSO */
-#define	CSUM_SCTP		0x0040		/* will csum SCTP */
-#define CSUM_SCTP_IPV6		0x0080		/* will csum IPv6/SCTP */
+#define	EXT_FLAG_EXTREF		0x000001	/* external ref_cnt ptr */
 
-#define	CSUM_IP_CHECKED		0x0100		/* did csum IP */
-#define	CSUM_IP_VALID		0x0200		/*   ... the csum is valid */
-#define	CSUM_DATA_VALID		0x0400		/* csum_data field is valid */
-#define	CSUM_PSEUDO_HDR		0x0800		/* csum_data has pseudo hdr */
-#define	CSUM_SCTP_VALID		0x1000		/* SCTP checksum is valid */
-#define	CSUM_UDP_IPV6		0x2000		/* will csum IPv6/UDP */
-#define	CSUM_TCP_IPV6		0x4000		/* will csum IPv6/TCP */
-/*	CSUM_TSO_IPV6		0x8000		will do IPv6/TSO */
+#define	EXT_FLAG_VENDOR1	0x010000	/* for vendor-internal use */
+#define	EXT_FLAG_VENDOR2	0x020000	/* for vendor-internal use */
+#define	EXT_FLAG_VENDOR3	0x040000	/* for vendor-internal use */
+#define	EXT_FLAG_VENDOR4	0x080000	/* for vendor-internal use */
 
-/*	CSUM_FRAGMENT_IPV6	0x10000		will do IPv6 fragementation */
+#define	EXT_FLAG_EXP1		0x100000	/* for experimental use */
+#define	EXT_FLAG_EXP2		0x200000	/* for experimental use */
+#define	EXT_FLAG_EXP4		0x400000	/* for experimental use */
+#define	EXT_FLAG_EXP8		0x800000	/* for experimental use */
 
-#define	CSUM_DELAY_DATA_IPV6	(CSUM_TCP_IPV6 | CSUM_UDP_IPV6)
+/*
+ * Flags indicating checksum, segmentation and other offload work to be
+ * done, or already done, by hardware or lower layers.  It is split into
+ * separate inbound and outbound flags.
+ *
+ * Outbound flags that are set by upper protocol layers requesting lower
+ * layers, or ideally the hardware, to perform these offloading tasks.
+ * For outbound packets this field and its flags can be directly tested
+ * against if_data.ifi_hwassist.
+ */
+#define	CSUM_IP			0x00000001	/* IP header checksum offload */
+#define	CSUM_IP_UDP		0x00000002	/* UDP checksum offload */
+#define	CSUM_IP_TCP		0x00000004	/* TCP checksum offload */
+#define	CSUM_IP_SCTP		0x00000008	/* SCTP checksum offload */
+#define	CSUM_IP_TSO		0x00000010	/* TCP segmentation offload */
+#define	CSUM_IP_ISCSI		0x00000020	/* iSCSI checksum offload */
+
+#define	CSUM_IP6_UDP		0x00000200	/* UDP checksum offload */
+#define	CSUM_IP6_TCP		0x00000400	/* TCP checksum offload */
+#define	CSUM_IP6_SCTP		0x00000800	/* SCTP checksum offload */
+#define	CSUM_IP6_TSO		0x00001000	/* TCP segmentation offload */
+#define	CSUM_IP6_ISCSI		0x00002000	/* iSCSI checksum offload */
+
+/* Inbound checksum support where the checksum was verified by hardware. */
+#define	CSUM_L3_CALC		0x01000000	/* calculated layer 3 csum */
+#define	CSUM_L3_VALID		0x02000000	/* checksum is correct */
+#define	CSUM_L4_CALC		0x04000000	/* calculated layer 4 csum */
+#define	CSUM_L4_VALID		0x08000000	/* checksum is correct */
+#define	CSUM_L5_CALC		0x10000000	/* calculated layer 5 csum */
+#define	CSUM_L5_VALID		0x20000000	/* checksum is correct */
+#define	CSUM_COALESED		0x40000000	/* contains merged segments */
+
+/* CSUM flags compatibility mappings. */
+#define	CSUM_IP_CHECKED		CSUM_L3_CALC
+#define	CSUM_IP_VALID		CSUM_L3_VALID
+#define	CSUM_DATA_VALID		CSUM_L4_VALID
+#define	CSUM_PSEUDO_HDR		CSUM_L4_CALC
+#define	CSUM_SCTP_VALID		CSUM_L3_VALID
+#define	CSUM_DELAY_DATA		(CSUM_TCP|CSUM_UDP)
+#define	CSUM_DELAY_IP		CSUM_IP		/* Only v4, no v6 IP hdr csum */
+#define	CSUM_DELAY_DATA_IPV6	(CSUM_TCP_IPV6|CSUM_UDP_IPV6)
 #define	CSUM_DATA_VALID_IPV6	CSUM_DATA_VALID
+#define	CSUM_TCP		CSUM_IP_TCP
+#define	CSUM_UDP		CSUM_IP_UDP
+#define	CSUM_SCTP		CSUM_IP_SCTP
+#define	CSUM_TSO		(CSUM_IP_TSO|CSUM_IP6_TSO)
+#define	CSUM_UDP_IPV6		CSUM_IP6_UDP
+#define	CSUM_TCP_IPV6		CSUM_IP6_TCP
+#define	CSUM_SCTP_IPV6		CSUM_IP6_SCTP
+#define	CSUM_FRAGMENT		0x0		/* IP fragmentation offload */
 
-#define	CSUM_DELAY_DATA		(CSUM_TCP | CSUM_UDP)
-#define	CSUM_DELAY_IP		(CSUM_IP)	/* Only v4, no v6 IP hdr csum */
-
 /*
- * mbuf types.
+ * mbuf types describing the content of the mbuf (including external storage).
  */
 #define	MT_NOTMBUF	0	/* USED INTERNALLY ONLY! Object is not mbuf */
 #define	MT_DATA		1	/* dynamic (data) allocation */
 #define	MT_HEADER	MT_DATA	/* packet header, use M_PKTHDR instead */
 #define	MT_SONAME	8	/* socket name */
+
+#define	MT_EXP1		9	/* for experimental use */
+#define	MT_EXP2		10	/* for experimental use */
+#define	MT_VENDOR1	11	/* for vendor-internal use */
+#define	MT_VENDOR2	12	/* for vendor-internal use */
+#define	MT_VENDOR3	13	/* for vendor-internal use */
+
 #define	MT_CONTROL	14	/* extra-data protocol message */
 #define	MT_OOBDATA	15	/* expedited data  */
+
 #define	MT_NTYPES	16	/* number of mbuf types for mbtypes[] */
 
 #define	MT_NOINIT	255	/* Not a type but a flag to allocate
 				   a non-initialized mbuf */
 
-#define MB_NOTAGS	0x1UL	/* no tags attached to mbuf */
-
 /*
  * Compatibility with historic mbuf allocator.
  */
@@ -449,7 +537,6 @@
 
 	m->m_next = NULL;
 	m->m_nextpkt = NULL;
-	m->m_data = m->m_dat;
 	m->m_len = 0;
 	m->m_flags = flags;
 	m->m_type = type;
@@ -456,7 +543,8 @@
 	if (flags & M_PKTHDR) {
 		if ((error = m_pkthdr_init(m, how)) != 0)
 			return (error);
-	}
+	} else if ((flags & M_EXT) == 0)
+		m->m_data = m->m_dat;
 
 	return (0);
 }
@@ -508,17 +596,6 @@
 	return (uma_zalloc_arg(zone_pack, &args, how));
 }
 
-static __inline void
-m_free_fast(struct mbuf *m)
-{
-#ifdef INVARIANTS
-	if (m->m_flags & M_PKTHDR)
-		KASSERT(SLIST_EMPTY(&m->m_pkthdr.tags), ("doing fast free of mbuf with tags"));
-#endif
-
-	uma_zfree_arg(zone_mbuf, m, (void *)MB_NOTAGS);
-}
-
 static __inline struct mbuf *
 m_free(struct mbuf *m)
 {
@@ -779,7 +856,8 @@
 int		 m_append(struct mbuf *, int, c_caddr_t);
 void		 m_cat(struct mbuf *, struct mbuf *);
 int		 m_extadd(struct mbuf *, caddr_t, u_int,
-		    void (*)(void *, void *), void *, void *, int, int, int);
+		    void (*)(struct mbuf *, void *, void *), void *, void *,
+		    int, int, int);
 struct mbuf	*m_collapse(struct mbuf *, int, int);
 void		 m_copyback(struct mbuf *, int, int, c_caddr_t);
 void		 m_copydata(const struct mbuf *, int, int, caddr_t);
Index: dev/cas/if_cas.c
===================================================================
--- dev/cas/if_cas.c	(revision 254596)
+++ dev/cas/if_cas.c	(working copy)
@@ -132,7 +132,7 @@
 static int	cas_disable_rx(struct cas_softc *sc);
 static int	cas_disable_tx(struct cas_softc *sc);
 static void	cas_eint(struct cas_softc *sc, u_int status);
-static void	cas_free(void *arg1, void* arg2);
+static void	cas_free(struct mbuf *m, void *arg1, void* arg2);
 static void	cas_init(void *xsc);
 static void	cas_init_locked(struct cas_softc *sc);
 static void	cas_init_regs(struct cas_softc *sc);
@@ -1888,7 +1888,7 @@
 }
 
 static void
-cas_free(void *arg1, void *arg2)
+cas_free(struct mbuf *m, void *arg1, void *arg2)
 {
 	struct cas_rxdsoft *rxds;
 	struct cas_softc *sc;
Index: dev/cxgb/cxgb_sge.c
===================================================================
--- dev/cxgb/cxgb_sge.c	(revision 254596)
+++ dev/cxgb/cxgb_sge.c	(working copy)
@@ -1470,9 +1470,9 @@
 		hdr->len = htonl(mlen | 0x80000000);
 
 		if (__predict_false(mlen < TCPPKTHDRSIZE)) {
-			printf("mbuf=%p,len=%d,tso_segsz=%d,csum_flags=%#x,flags=%#x",
-			    m0, mlen, m0->m_pkthdr.tso_segsz,
-			    m0->m_pkthdr.csum_flags, m0->m_flags);
+//			printf("mbuf=%p,len=%d,tso_segsz=%d,csum_flags=%#x,flags=%#x",
+//			    m0, mlen, m0->m_pkthdr.tso_segsz,
+//			    m0->m_pkthdr.csum_flags, m0->m_flags);
 			panic("tx tso packet too small");
 		}
 
@@ -2634,7 +2634,6 @@
 	} 
 
 	m->m_pkthdr.rcvif = ifp;
-	m->m_pkthdr.header = mtod(m, uint8_t *) + sizeof(*cpl) + ethpad;
 	/*
 	 * adjust after conversion to mbuf chain
 	 */
Index: dev/e1000/if_igb.c
===================================================================
--- dev/e1000/if_igb.c	(revision 254596)
+++ dev/e1000/if_igb.c	(working copy)
@@ -4981,7 +4981,7 @@
 	}
 
 	if (status & (E1000_RXD_STAT_TCPCS | E1000_RXD_STAT_UDPCS)) {
-		u16 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
+		u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 #if __FreeBSD_version >= 800000
 		if (sctp) /* reassign */
 			type = CSUM_SCTP_VALID;
Index: dev/hatm/if_hatm_intr.c
===================================================================
--- dev/hatm/if_hatm_intr.c	(revision 254596)
+++ dev/hatm/if_hatm_intr.c	(working copy)
@@ -261,7 +261,7 @@
  * Free an mbuf and put it onto the free list.
  */
 static void
-hatm_mbuf0_free(void *buf, void *args)
+hatm_mbuf0_free(struct mbuf *m, void *buf, void *args)
 {
 	struct hatm_softc *sc = args;
 	struct mbuf0_chunk *c = buf;
@@ -272,7 +272,7 @@
 	hatm_ext_free(&sc->mbuf_list[0], (struct mbufx_free *)c);
 }
 static void
-hatm_mbuf1_free(void *buf, void *args)
+hatm_mbuf1_free(struct mbuf *m, void *buf, void *args)
 {
 	struct hatm_softc *sc = args;
 	struct mbuf1_chunk *c = buf;
@@ -461,7 +461,7 @@
 			    hatm_mbuf0_free, c0, sc, M_PKTHDR, EXT_EXTREF);
 			m->m_data += MBUF0_OFFSET;
 		} else
-			hatm_mbuf0_free(c0, sc);
+			hatm_mbuf0_free(NULL, c0, sc);
 
 	} else {
 		struct mbuf1_chunk *c1;
@@ -485,7 +485,7 @@
 			    hatm_mbuf1_free, c1, sc, M_PKTHDR, EXT_EXTREF);
 			m->m_data += MBUF1_OFFSET;
 		} else
-			hatm_mbuf1_free(c1, sc);
+			hatm_mbuf1_free(NULL, c1, sc);
 	}
 
 	return (m);
Index: dev/iscsi/initiator/isc_soc.c
===================================================================
--- dev/iscsi/initiator/isc_soc.c	(revision 254596)
+++ dev/iscsi/initiator/isc_soc.c	(working copy)
@@ -69,7 +69,7 @@
  | function for freeing external storage for mbuf
  */
 static void
-ext_free(void *a, void *b)
+ext_free(struct mbuf *m, void *a, void *b)
 {
      pduq_t *pq = b;
 
Index: dev/ixgbe/ixgbe.c
===================================================================
--- dev/ixgbe/ixgbe.c	(revision 254596)
+++ dev/ixgbe/ixgbe.c	(working copy)
@@ -4625,7 +4625,7 @@
 			mp->m_pkthdr.csum_flags = 0;
 	}
 	if (status & IXGBE_RXD_STAT_L4CS) {
-		u16 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
+		u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 #if __FreeBSD_version >= 800000
 		if (sctp)
 			type = CSUM_SCTP_VALID;
Index: dev/ixgbe/ixv.c
===================================================================
--- dev/ixgbe/ixv.c	(revision 254596)
+++ dev/ixgbe/ixv.c	(working copy)
@@ -3544,7 +3544,7 @@
 			mp->m_pkthdr.csum_flags = 0;
 	}
 	if (status & IXGBE_RXD_STAT_L4CS) {
-		u16 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
+		u64 type = (CSUM_DATA_VALID | CSUM_PSEUDO_HDR);
 #if __FreeBSD_version >= 800000
 		if (sctp)
 			type = CSUM_SCTP_VALID;
Index: dev/jme/if_jme.c
===================================================================
--- dev/jme/if_jme.c	(revision 254596)
+++ dev/jme/if_jme.c	(working copy)
@@ -1690,7 +1690,7 @@
 	struct mbuf *m;
 	bus_dma_segment_t txsegs[JME_MAXTXSEGS];
 	int error, i, nsegs, prod;
-	uint32_t cflags, tso_segsz;
+	uint32_t cflags, tso_seg;
 
 	JME_LOCK_ASSERT(sc);
 
@@ -1808,10 +1808,10 @@
 
 	m = *m_head;
 	cflags = 0;
-	tso_segsz = 0;
+	tso_seg = 0;
 	/* Configure checksum offload and TSO. */
 	if ((m->m_pkthdr.csum_flags & CSUM_TSO) != 0) {
-		tso_segsz = (uint32_t)m->m_pkthdr.tso_segsz <<
+		tso_seg = (uint32_t)m->m_pkthdr.tso_segsz <<
 		    JME_TD_MSS_SHIFT;
 		cflags |= JME_TD_TSO;
 	} else {
@@ -1830,7 +1830,7 @@
 
 	desc = &sc->jme_rdata.jme_tx_ring[prod];
 	desc->flags = htole32(cflags);
-	desc->buflen = htole32(tso_segsz);
+	desc->buflen = htole32(tso_seg);
 	desc->addr_hi = htole32(m->m_pkthdr.len);
 	desc->addr_lo = 0;
 	sc->jme_cdata.jme_tx_cnt++;
Index: dev/lge/if_lge.c
===================================================================
--- dev/lge/if_lge.c	(revision 254596)
+++ dev/lge/if_lge.c	(working copy)
@@ -119,11 +119,6 @@
 static int lge_attach(device_t);
 static int lge_detach(device_t);
 
-static int lge_alloc_jumbo_mem(struct lge_softc *);
-static void lge_free_jumbo_mem(struct lge_softc *);
-static void *lge_jalloc(struct lge_softc *);
-static void lge_jfree(void *, void *);
-
 static int lge_newbuf(struct lge_softc *, struct lge_rx_desc *, struct mbuf *);
 static int lge_encap(struct lge_softc *, struct mbuf *, u_int32_t *);
 static void lge_rxeof(struct lge_softc *, int);
@@ -521,13 +516,6 @@
 		goto fail;
 	}
 
-	/* Try to allocate memory for jumbo buffers. */
-	if (lge_alloc_jumbo_mem(sc)) {
-		device_printf(dev, "jumbo buffer allocation failed\n");
-		error = ENXIO;
-		goto fail;
-	}
-
 	ifp = sc->lge_ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		device_printf(dev, "can not if_alloc()\n");
@@ -575,7 +563,6 @@
 	return (0);
 
 fail:
-	lge_free_jumbo_mem(sc);
 	if (sc->lge_ldata)
 		contigfree(sc->lge_ldata,
 		    sizeof(struct lge_list_data), M_DEVBUF);
@@ -615,7 +602,6 @@
 
 	contigfree(sc->lge_ldata, sizeof(struct lge_list_data), M_DEVBUF);
 	if_free(ifp);
-	lge_free_jumbo_mem(sc);
 	mtx_destroy(&sc->lge_mtx);
 
 	return(0);
@@ -688,34 +674,17 @@
 	struct mbuf		*m;
 {
 	struct mbuf		*m_new = NULL;
-	caddr_t			*buf = NULL;
 
 	if (m == NULL) {
-		MGETHDR(m_new, M_NOWAIT, MT_DATA);
+		m_new = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
 		if (m_new == NULL) {
 			device_printf(sc->lge_dev, "no memory for rx list "
 			    "-- packet dropped!\n");
 			return(ENOBUFS);
 		}
-
-		/* Allocate the jumbo buffer */
-		buf = lge_jalloc(sc);
-		if (buf == NULL) {
-#ifdef LGE_VERBOSE
-			device_printf(sc->lge_dev, "jumbo allocation failed "
-			    "-- packet dropped!\n");
-#endif
-			m_freem(m_new);
-			return(ENOBUFS);
-		}
-		/* Attach the buffer to the mbuf */
-		m_new->m_data = (void *)buf;
-		m_new->m_len = m_new->m_pkthdr.len = LGE_JUMBO_FRAMELEN;
-		MEXTADD(m_new, buf, LGE_JUMBO_FRAMELEN, lge_jfree,
-		    buf, (struct lge_softc *)sc, 0, EXT_NET_DRV);
 	} else {
 		m_new = m;
-		m_new->m_len = m_new->m_pkthdr.len = LGE_JUMBO_FRAMELEN;
+		m_new->m_len = m_new->m_pkthdr.len = m_new->m_ext.ext_size;
 		m_new->m_data = m_new->m_ext.ext_buf;
 	}
 
@@ -750,135 +719,7 @@
 	return(0);
 }
 
-static int
-lge_alloc_jumbo_mem(sc)
-	struct lge_softc	*sc;
-{
-	caddr_t			ptr;
-	register int		i;
-	struct lge_jpool_entry   *entry;
-
-	/* Grab a big chunk o' storage. */
-	sc->lge_cdata.lge_jumbo_buf = contigmalloc(LGE_JMEM, M_DEVBUF,
-	    M_NOWAIT, 0, 0xffffffff, PAGE_SIZE, 0);
-
-	if (sc->lge_cdata.lge_jumbo_buf == NULL) {
-		device_printf(sc->lge_dev, "no memory for jumbo buffers!\n");
-		return(ENOBUFS);
-	}
-
-	SLIST_INIT(&sc->lge_jfree_listhead);
-	SLIST_INIT(&sc->lge_jinuse_listhead);
-
-	/*
-	 * Now divide it up into 9K pieces and save the addresses
-	 * in an array.
-	 */
-	ptr = sc->lge_cdata.lge_jumbo_buf;
-	for (i = 0; i < LGE_JSLOTS; i++) {
-		sc->lge_cdata.lge_jslots[i] = ptr;
-		ptr += LGE_JLEN;
-		entry = malloc(sizeof(struct lge_jpool_entry),
-		    M_DEVBUF, M_NOWAIT);
-		if (entry == NULL) {
-			device_printf(sc->lge_dev, "no memory for jumbo "
-			    "buffer queue!\n");
-			return(ENOBUFS);
-		}
-		entry->slot = i;
-		SLIST_INSERT_HEAD(&sc->lge_jfree_listhead,
-		    entry, jpool_entries);
-	}
-
-	return(0);
-}
-
-static void
-lge_free_jumbo_mem(sc)
-	struct lge_softc	*sc;
-{
-	struct lge_jpool_entry	*entry;
-
-	if (sc->lge_cdata.lge_jumbo_buf == NULL)
-		return;
-
-	while ((entry = SLIST_FIRST(&sc->lge_jinuse_listhead))) {
-		device_printf(sc->lge_dev,
-		    "asked to free buffer that is in use!\n");
-		SLIST_REMOVE_HEAD(&sc->lge_jinuse_listhead, jpool_entries);
-		SLIST_INSERT_HEAD(&sc->lge_jfree_listhead, entry,
-		    jpool_entries);
-	}
-	while (!SLIST_EMPTY(&sc->lge_jfree_listhead)) {
-		entry = SLIST_FIRST(&sc->lge_jfree_listhead);
-		SLIST_REMOVE_HEAD(&sc->lge_jfree_listhead, jpool_entries);
-		free(entry, M_DEVBUF);
-	}
-
-	contigfree(sc->lge_cdata.lge_jumbo_buf, LGE_JMEM, M_DEVBUF);
-
-	return;
-}
-
 /*
- * Allocate a jumbo buffer.
- */
-static void *
-lge_jalloc(sc)
-	struct lge_softc	*sc;
-{
-	struct lge_jpool_entry   *entry;
-	
-	entry = SLIST_FIRST(&sc->lge_jfree_listhead);
-	
-	if (entry == NULL) {
-#ifdef LGE_VERBOSE
-		device_printf(sc->lge_dev, "no free jumbo buffers\n");
-#endif
-		return(NULL);
-	}
-
-	SLIST_REMOVE_HEAD(&sc->lge_jfree_listhead, jpool_entries);
-	SLIST_INSERT_HEAD(&sc->lge_jinuse_listhead, entry, jpool_entries);
-	return(sc->lge_cdata.lge_jslots[entry->slot]);
-}
-
-/*
- * Release a jumbo buffer.
- */
-static void
-lge_jfree(buf, args)
-	void			*buf;
-	void			*args;
-{
-	struct lge_softc	*sc;
-	int		        i;
-	struct lge_jpool_entry   *entry;
-
-	/* Extract the softc struct pointer. */
-	sc = args;
-
-	if (sc == NULL)
-		panic("lge_jfree: can't find softc pointer!");
-
-	/* calculate the slot this buffer belongs to */
-	i = ((vm_offset_t)buf
-	     - (vm_offset_t)sc->lge_cdata.lge_jumbo_buf) / LGE_JLEN;
-
-	if ((i < 0) || (i >= LGE_JSLOTS))
-		panic("lge_jfree: asked to free buffer that we don't manage!");
-
-	entry = SLIST_FIRST(&sc->lge_jinuse_listhead);
-	if (entry == NULL)
-		panic("lge_jfree: buffer not in use!");
-	entry->slot = i;
-	SLIST_REMOVE_HEAD(&sc->lge_jinuse_listhead, jpool_entries);
-	SLIST_INSERT_HEAD(&sc->lge_jfree_listhead, entry, jpool_entries);
-
-	return;
-}
-
-/*
  * A frame has been uploaded: pass the resulting mbuf chain up to
  * the higher level protocols.
  */
Index: dev/lge/if_lgereg.h
===================================================================
--- dev/lge/if_lgereg.h	(revision 254596)
+++ dev/lge/if_lgereg.h	(working copy)
@@ -499,9 +499,6 @@
 	int			lge_rx_cons;
 	int			lge_tx_prod;
 	int			lge_tx_cons;
-	/* Stick the jumbo mem management stuff here too. */
-	caddr_t			lge_jslots[LGE_JSLOTS];
-	void			*lge_jumbo_buf;
 };
 
 struct lge_softc {
@@ -522,8 +519,6 @@
 	struct lge_ring_data	lge_cdata;
 	struct callout		lge_stat_callout;
 	struct mtx		lge_mtx;
-	SLIST_HEAD(__lge_jfreehead, lge_jpool_entry)	lge_jfree_listhead;
-	SLIST_HEAD(__lge_jinusehead, lge_jpool_entry)	lge_jinuse_listhead;
 };
 
 /*
Index: dev/mwl/if_mwl.c
===================================================================
--- dev/mwl/if_mwl.c	(revision 254596)
+++ dev/mwl/if_mwl.c	(working copy)
@@ -2622,7 +2622,7 @@
 }
 
 static void
-mwl_ext_free(void *data, void *arg)
+mwl_ext_free(struct mbuf *m, void *data, void *arg)
 {
 	struct mwl_softc *sc = arg;
 
Index: dev/nfe/if_nfe.c
===================================================================
--- dev/nfe/if_nfe.c	(revision 254596)
+++ dev/nfe/if_nfe.c	(working copy)
@@ -2390,7 +2390,7 @@
 	bus_dmamap_t map;
 	bus_dma_segment_t segs[NFE_MAX_SCATTER];
 	int error, i, nsegs, prod, si;
-	uint32_t tso_segsz;
+	uint32_t tso_seg;
 	uint16_t cflags, flags;
 	struct mbuf *m;
 
@@ -2429,9 +2429,9 @@
 
 	m = *m_head;
 	cflags = flags = 0;
-	tso_segsz = 0;
+	tso_seg = 0;
 	if ((m->m_pkthdr.csum_flags & CSUM_TSO) != 0) {
-		tso_segsz = (uint32_t)m->m_pkthdr.tso_segsz <<
+		tso_seg = (uint32_t)m->m_pkthdr.tso_segsz <<
 		    NFE_TX_TSO_SHIFT;
 		cflags &= ~(NFE_TX_IP_CSUM | NFE_TX_TCP_UDP_CSUM);
 		cflags |= NFE_TX_TSO;
@@ -2482,14 +2482,14 @@
 		if ((m->m_flags & M_VLANTAG) != 0)
 			desc64->vtag = htole32(NFE_TX_VTAG |
 			    m->m_pkthdr.ether_vtag);
-		if (tso_segsz != 0) {
+		if (tso_seg != 0) {
 			/*
 			 * XXX
 			 * The following indicates the descriptor element
 			 * is a 32bit quantity.
 			 */
-			desc64->length |= htole16((uint16_t)tso_segsz);
-			desc64->flags |= htole16(tso_segsz >> 16);
+			desc64->length |= htole16((uint16_t)tso_seg);
+			desc64->flags |= htole16(tso_seg >> 16);
 		}
 		/*
 		 * finally, set the valid/checksum/TSO bit in the first
@@ -2502,14 +2502,14 @@
 		else
 			desc32->flags |= htole16(NFE_TX_LASTFRAG_V1);
 		desc32 = &sc->txq.desc32[si];
-		if (tso_segsz != 0) {
+		if (tso_seg != 0) {
 			/*
 			 * XXX
 			 * The following indicates the descriptor element
 			 * is a 32bit quantity.
 			 */
-			desc32->length |= htole16((uint16_t)tso_segsz);
-			desc32->flags |= htole16(tso_segsz >> 16);
+			desc32->length |= htole16((uint16_t)tso_seg);
+			desc32->flags |= htole16(tso_seg >> 16);
 		}
 		/*
 		 * finally, set the valid/checksum/TSO bit in the first
Index: dev/patm/if_patm.c
===================================================================
--- dev/patm/if_patm.c	(revision 254596)
+++ dev/patm/if_patm.c	(working copy)
@@ -319,7 +319,7 @@
 		for (i = 0; i < IDT_TSQE_TAG_SPACE; i++) {
 			if ((m = scd->on_card[i]) != NULL) {
 				scd->on_card[i] = 0;
-				map = m->m_pkthdr.header;
+				map = m->m_pkthdr.PH_loc.ptr;
 
 				bus_dmamap_unload(sc->tx_tag, map->map);
 				SLIST_INSERT_HEAD(&sc->tx_maps_free, map, link);
Index: dev/patm/if_patm_tx.c
===================================================================
--- dev/patm/if_patm_tx.c	(revision 254596)
+++ dev/patm/if_patm_tx.c	(working copy)
@@ -373,7 +373,7 @@
 		}
 
 		/* save data */
-		m->m_pkthdr.header = vcc;
+		m->m_pkthdr.PH_loc.ptr = vcc;
 
 		/* try to put it on the channels queue */
 		if (_IF_QFULL(&vcc->scd->q)) {
@@ -473,7 +473,7 @@
 		if (m == NULL)
 			break;
 
-		a.vcc = m->m_pkthdr.header;
+		a.vcc = m->m_pkthdr.PH_loc.ptr;
 
 		/* we must know the number of segments beforehand - count
 		 * this may actually give a wrong number of segments for
@@ -499,7 +499,7 @@
 		}
 
 		/* load the map */
-		m->m_pkthdr.header = map;
+		m->m_pkthdr.PH_loc.ptr = map;
 		a.mbuf = m;
 
 		/* handle AAL_RAW */
@@ -690,7 +690,7 @@
 		scd->on_card[last] = NULL;
 		patm_debug(sc, TX, "ok tag=%x", last);
 
-		map = m->m_pkthdr.header;
+		map = m->m_pkthdr.PH_loc.ptr;
 		scd->space += m->m_pkthdr.csum_data;
 
 		bus_dmamap_sync(sc->tx_tag, map->map,
Index: dev/qlxgb/qla_hw.c
===================================================================
--- dev/qlxgb/qla_hw.c	(revision 254596)
+++ dev/qlxgb/qla_hw.c	(working copy)
@@ -999,10 +999,10 @@
 		if ((nsegs > Q8_TX_MAX_SEGMENTS) ||
 			(mp->m_pkthdr.len > ha->max_frame_size)){
 			/* TBD: copy into private buffer and send it */
-        		device_printf(dev,
-				"%s: (nsegs[%d, %d, 0x%x] > Q8_TX_MAX_SEGMENTS)\n",
-				__func__, nsegs, mp->m_pkthdr.len,
-				mp->m_pkthdr.csum_flags);
+//        		device_printf(dev,
+//				"%s: (nsegs[%d, %d, 0x%x] > Q8_TX_MAX_SEGMENTS)\n",
+//				__func__, nsegs, mp->m_pkthdr.len,
+//				mp->m_pkthdr.csum_flags);
 			qla_dump_buf8(ha, "qla_hw_send: wrong pkt",
 				mtod(mp, char *), mp->m_len);
 			return (EINVAL);
Index: dev/sfxge/sfxge_rx.c
===================================================================
--- dev/sfxge/sfxge_rx.c	(revision 254596)
+++ dev/sfxge/sfxge_rx.c	(working copy)
@@ -282,7 +282,6 @@
 	struct ifnet *ifp = sc->ifnet;
 
 	m->m_pkthdr.rcvif = ifp;
-	m->m_pkthdr.header = m->m_data;
 	m->m_pkthdr.csum_data = 0xffff;
 	ifp->if_input(ifp, m);
 }
Index: kern/kern_mbuf.c
===================================================================
--- kern/kern_mbuf.c	(revision 254596)
+++ kern/kern_mbuf.c	(working copy)
@@ -410,9 +410,7 @@
 {
 	struct mbuf *m;
 	struct mb_args *args;
-#ifdef MAC
 	int error;
-#endif
 	int flags;
 	short type;
 
@@ -419,9 +417,7 @@
 #ifdef INVARIANTS
 	trash_ctor(mem, size, arg, how);
 #endif
-	m = (struct mbuf *)mem;
 	args = (struct mb_args *)arg;
-	flags = args->flags;
 	type = args->type;
 
 	/*
@@ -431,32 +427,12 @@
 	if (type == MT_NOINIT)
 		return (0);
 
-	m->m_next = NULL;
-	m->m_nextpkt = NULL;
-	m->m_len = 0;
-	m->m_flags = flags;
-	m->m_type = type;
-	if (flags & M_PKTHDR) {
-		m->m_data = m->m_pktdat;
-		m->m_pkthdr.rcvif = NULL;
-		m->m_pkthdr.header = NULL;
-		m->m_pkthdr.len = 0;
-		m->m_pkthdr.csum_flags = 0;
-		m->m_pkthdr.csum_data = 0;
-		m->m_pkthdr.tso_segsz = 0;
-		m->m_pkthdr.ether_vtag = 0;
-		m->m_pkthdr.flowid = 0;
-		m->m_pkthdr.fibnum = 0;
-		SLIST_INIT(&m->m_pkthdr.tags);
-#ifdef MAC
-		/* If the label init fails, fail the alloc */
-		error = mac_mbuf_init(m, how);
-		if (error)
-			return (error);
-#endif
-	} else
-		m->m_data = m->m_dat;
-	return (0);
+	m = (struct mbuf *)mem;
+	flags = args->flags;
+
+	error = m_init(m, NULL, size, how, type, flags);
+
+	return (error);
 }
 
 /*
@@ -466,12 +442,10 @@
 mb_dtor_mbuf(void *mem, int size, void *arg)
 {
 	struct mbuf *m;
-	unsigned long flags;
 
 	m = (struct mbuf *)mem;
-	flags = (unsigned long)arg;
 
-	if ((flags & MB_NOTAGS) == 0 && (m->m_flags & M_PKTHDR) != 0)
+	if ((m->m_flags & M_PKTHDR) != 0 && !SLIST_EMPTY(&m->m_pkthdr.tags))
 		m_tag_delete_chain(m, NULL);
 	KASSERT((m->m_flags & M_EXT) == 0, ("%s: M_EXT set", __func__));
 #ifdef INVARIANTS
@@ -565,12 +539,13 @@
 		m->m_ext.ext_buf = (caddr_t)mem;
 		m->m_data = m->m_ext.ext_buf;
 		m->m_flags |= M_EXT;
+		m->m_ext.ref_cnt = refcnt;
+		m->m_ext.ext_type = type;
+		m->m_ext.ext_flags = 0;
 		m->m_ext.ext_free = NULL;
 		m->m_ext.ext_arg1 = NULL;
 		m->m_ext.ext_arg2 = NULL;
 		m->m_ext.ext_size = size;
-		m->m_ext.ext_type = type;
-		m->m_ext.ref_cnt = refcnt;
 	}
 
 	return (0);
@@ -641,9 +616,7 @@
 {
 	struct mbuf *m;
 	struct mb_args *args;
-#ifdef MAC
 	int error;
-#endif
 	int flags;
 	short type;
 
@@ -655,34 +628,13 @@
 #ifdef INVARIANTS
 	trash_ctor(m->m_ext.ext_buf, MCLBYTES, arg, how);
 #endif
-	m->m_next = NULL;
-	m->m_nextpkt = NULL;
+
+	error = m_init(m, NULL, size, how, type, flags);
+
 	m->m_data = m->m_ext.ext_buf;
-	m->m_len = 0;
 	m->m_flags = (flags | M_EXT);
-	m->m_type = type;
 
-	if (flags & M_PKTHDR) {
-		m->m_pkthdr.rcvif = NULL;
-		m->m_pkthdr.len = 0;
-		m->m_pkthdr.header = NULL;
-		m->m_pkthdr.csum_flags = 0;
-		m->m_pkthdr.csum_data = 0;
-		m->m_pkthdr.tso_segsz = 0;
-		m->m_pkthdr.ether_vtag = 0;
-		m->m_pkthdr.flowid = 0;
-		m->m_pkthdr.fibnum = 0;
-		SLIST_INIT(&m->m_pkthdr.tags);
-#ifdef MAC
-		/* If the label init fails, fail the alloc */
-		error = mac_mbuf_init(m, how);
-		if (error)
-			return (error);
-#endif
-	}
-	/* m_ext is already initialized. */
-
-	return (0);
+	return (error);
 }
 
 int
@@ -691,17 +643,22 @@
 #ifdef MAC
 	int error;
 #endif
-	m->m_data = m->m_pktdat;
-	SLIST_INIT(&m->m_pkthdr.tags);
+	if ((m->m_flags & M_EXT) == 0)
+		m->m_data = m->m_pktdat;
 	m->m_pkthdr.rcvif = NULL;
-	m->m_pkthdr.header = NULL;
 	m->m_pkthdr.len = 0;
-	m->m_pkthdr.flowid = 0;
 	m->m_pkthdr.fibnum = 0;
+	m->m_pkthdr.cosqos = 0;
+	m->m_pkthdr.rsstype = 0;
 	m->m_pkthdr.csum_flags = 0;
-	m->m_pkthdr.csum_data = 0;
-	m->m_pkthdr.tso_segsz = 0;
-	m->m_pkthdr.ether_vtag = 0;
+	m->m_pkthdr.flowid = 0;
+	m->m_pkthdr.l2hlen = 0;
+	m->m_pkthdr.l3hlen = 0;
+	m->m_pkthdr.l4hlen = 0;
+	m->m_pkthdr.l5hlen = 0;
+	m->m_pkthdr.PH_per.sixtyfour[0] = 0;
+	m->m_pkthdr.PH_loc.sixtyfour[0] = 0;
+	SLIST_INIT(&m->m_pkthdr.tags);
 #ifdef MAC
 	/* If the label init fails, fail the alloc */
 	error = mac_mbuf_init(m, how);
Index: kern/subr_mbpool.c
===================================================================
--- kern/subr_mbpool.c	(revision 254596)
+++ kern/subr_mbpool.c	(working copy)
@@ -283,7 +283,7 @@
  * Mbuf system external mbuf free routine
  */
 void
-mbp_ext_free(void *buf, void *arg)
+mbp_ext_free(struct mbuf *m, void *buf, void *arg)
 {
 	mbp_free(arg, buf);
 }
Index: kern/uipc_mbuf.c
===================================================================
--- kern/uipc_mbuf.c	(revision 254596)
+++ kern/uipc_mbuf.c	(working copy)
@@ -247,8 +247,8 @@
  */
 int
 m_extadd(struct mbuf *mb, caddr_t buf, u_int size,
-    void (*freef)(void *, void *), void *arg1, void *arg2, int flags, int type,
-    int wait)
+    void (*freef)(struct mbuf *, void *, void *), void *arg1, void *arg2,
+    int flags, int type, int wait)
 {
 	KASSERT(type != EXT_CLUSTER, ("%s: EXT_CLUSTER not allowed", __func__));
 
@@ -314,7 +314,7 @@
 		case EXT_EXTREF:
 			KASSERT(m->m_ext.ext_free != NULL,
 				("%s: ext_free not set", __func__));
-			(*(m->m_ext.ext_free))(m->m_ext.ext_arg1,
+			(*(m->m_ext.ext_free))(m, m->m_ext.ext_arg1,
 			    m->m_ext.ext_arg2);
 			break;
 		default:
@@ -334,6 +334,7 @@
 	m->m_ext.ref_cnt = NULL;
 	m->m_ext.ext_size = 0;
 	m->m_ext.ext_type = 0;
+	m->m_ext.ext_flags = 0;
 	m->m_flags &= ~M_EXT;
 	uma_zfree(zone_mbuf, m);
 }
@@ -360,6 +361,7 @@
 	n->m_ext.ext_size = m->m_ext.ext_size;
 	n->m_ext.ref_cnt = m->m_ext.ref_cnt;
 	n->m_ext.ext_type = m->m_ext.ext_type;
+	n->m_ext.ext_flags = m->m_ext.ext_flags;
 	n->m_flags |= M_EXT;
 	n->m_flags |= m->m_flags & M_RDONLY;
 }
@@ -427,11 +429,6 @@
 			M_SANITY_ACTION("m_data outside mbuf data range right");
 		if ((caddr_t)m->m_data + m->m_len > b)
 			M_SANITY_ACTION("m_data + m_len exeeds mbuf space");
-		if ((m->m_flags & M_PKTHDR) && m->m_pkthdr.header) {
-			if ((caddr_t)m->m_pkthdr.header < a ||
-			    (caddr_t)m->m_pkthdr.header > b)
-				M_SANITY_ACTION("m_pkthdr.header outside mbuf data range");
-		}
 
 		/* m->m_nextpkt may only be set on first mbuf in chain. */
 		if (m != m0 && m->m_nextpkt != NULL) {
@@ -735,7 +732,6 @@
 			return NULL;
 		bcopy(&buf, mm->m_ext.ext_buf, mm->m_len);
 		mm->m_data = mm->m_ext.ext_buf;
-		mm->m_pkthdr.header = NULL;
 	}
 	if (prep && !(mm->m_flags & M_EXT) && len > M_LEADINGSPACE(mm)) {
 		bcopy(mm->m_data, &buf, mm->m_len);
@@ -746,7 +742,6 @@
 		       mm->m_ext.ext_size - mm->m_len, mm->m_len);
 		mm->m_data = (caddr_t)mm->m_ext.ext_buf +
 			      mm->m_ext.ext_size - mm->m_len;
-		mm->m_pkthdr.header = NULL;
 	}
 
 	/* Append/prepend as many mbuf (clusters) as necessary to fit len. */
Index: kern/uipc_syscalls.c
===================================================================
--- kern/uipc_syscalls.c	(revision 254596)
+++ kern/uipc_syscalls.c	(working copy)
@@ -1855,7 +1855,7 @@
  * Detach mapped page and release resources back to the system.
  */
 void
-sf_buf_mext(void *addr, void *args)
+sf_buf_mext(struct mbuf *mb, void *addr, void *args)
 {
 	vm_page_t m;
 	struct sendfile_sync *sfs;
@@ -2314,7 +2314,7 @@
 			m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA);
 			if (m0 == NULL) {
 				error = (mnw ? EAGAIN : ENOBUFS);
-				sf_buf_mext(NULL, sf);
+				sf_buf_mext(NULL, NULL, sf);
 				break;
 			}
 			if (m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE,
@@ -2321,7 +2321,7 @@
 			    sf_buf_mext, sfs, sf, M_RDONLY, EXT_SFBUF,
 			    (mnw ? M_NOWAIT : M_WAITOK)) != 0) {
 				error = (mnw ? EAGAIN : ENOBUFS);
-				sf_buf_mext(NULL, sf);
+				sf_buf_mext(NULL, NULL, sf);
 				m_freem(m0);
 				break;
 			}
Index: net/if.h
===================================================================
--- net/if.h	(revision 254596)
+++ net/if.h	(working copy)
@@ -103,7 +103,7 @@
 	u_long	ifi_omcasts;		/* packets sent via multicast */
 	u_long	ifi_iqdrops;		/* dropped on input, this interface */
 	u_long	ifi_noproto;		/* destined for unsupported protocol */
-	u_long	ifi_hwassist;		/* HW offload capabilities, see IFCAP */
+	uint64_t ifi_hwassist;		/* HW offload capabilities, see IFCAP */
 	time_t	ifi_epoch;		/* uptime at attach or stat reset */
 	struct	timeval ifi_lastchange;	/* time of last administrative change */
 };
Index: netinet/igmp.c
===================================================================
--- netinet/igmp.c	(revision 254596)
+++ netinet/igmp.c	(working copy)
@@ -289,7 +289,7 @@
 {
 
 #ifdef VIMAGE
-	m->m_pkthdr.header = ifp->if_vnet;
+	m->m_pkthdr.PH_loc.ptr = ifp->if_vnet;
 #endif /* VIMAGE */
 	m->m_pkthdr.flowid = ifp->if_index;
 }
@@ -298,7 +298,6 @@
 igmp_scrub_context(struct mbuf *m)
 {
 
-	m->m_pkthdr.header = NULL;
 	m->m_pkthdr.flowid = 0;
 }
 
@@ -326,7 +325,7 @@
 
 #ifdef notyet
 #if defined(VIMAGE) && defined(INVARIANTS)
-	KASSERT(curvnet == (m->m_pkthdr.header),
+	KASSERT(curvnet == (m->m_pkthdr.PH_loc.ptr),
 	    ("%s: called when curvnet was not restored", __func__));
 #endif
 #endif
@@ -3403,7 +3402,7 @@
 	 * indexes to guard against interface detach, they are
 	 * unique to each VIMAGE and must be retrieved.
 	 */
-	CURVNET_SET((struct vnet *)(m->m_pkthdr.header));
+	CURVNET_SET((struct vnet *)(m->m_pkthdr.PH_loc.ptr));
 	ifindex = igmp_restore_context(m);
 
 	/*
Index: netinet/ip_input.c
===================================================================
--- netinet/ip_input.c	(revision 254596)
+++ netinet/ip_input.c	(working copy)
@@ -921,7 +921,7 @@
 	 * ip_reass() will return a different mbuf.
 	 */
 	IPSTAT_INC(ips_fragments);
-	m->m_pkthdr.header = ip;
+	m->m_pkthdr.PH_loc.ptr = ip;
 
 	/* Previous ip_reass() started here. */
 	/*
@@ -964,7 +964,7 @@
 #endif
 	}
 
-#define GETIP(m)	((struct ip*)((m)->m_pkthdr.header))
+#define GETIP(m)	((struct ip*)((m)->m_pkthdr.PH_loc.ptr))
 
 	/*
 	 * Handle ECN by comparing this segment with the first one;
Index: netinet6/ip6_output.c
===================================================================
--- netinet6/ip6_output.c	(revision 254596)
+++ netinet6/ip6_output.c	(working copy)
@@ -195,9 +195,9 @@
 	offset += m->m_pkthdr.csum_data;	/* checksum offset */
 
 	if (offset + sizeof(u_short) > m->m_len) {
-		printf("%s: delayed m_pullup, m->len: %d plen %u off %u "
-		    "csum_flags=0x%04x\n", __func__, m->m_len, plen, offset,
-		    m->m_pkthdr.csum_flags);
+//		printf("%s: delayed m_pullup, m->len: %d plen %u off %u "
+//		    "csum_flags=0x%04x\n", __func__, m->m_len, plen, offset,
+//		    m->m_pkthdr.csum_flags);
 		/*
 		 * XXX this should not happen, but if it does, the correct
 		 * behavior may be to insert the checksum in the appropriate
Index: netinet6/mld6.c
===================================================================
--- netinet6/mld6.c	(revision 254596)
+++ netinet6/mld6.c	(working copy)
@@ -275,7 +275,7 @@
 {
 
 #ifdef VIMAGE
-	m->m_pkthdr.header = ifp->if_vnet;
+	m->m_pkthdr.PH_loc.ptr = ifp->if_vnet;
 #endif /* VIMAGE */
 	m->m_pkthdr.flowid = ifp->if_index;
 }
@@ -284,7 +284,7 @@
 mld_scrub_context(struct mbuf *m)
 {
 
-	m->m_pkthdr.header = NULL;
+	m->m_pkthdr.PH_loc.ptr = NULL;
 	m->m_pkthdr.flowid = 0;
 }
 
@@ -300,7 +300,7 @@
 {
 
 #if defined(VIMAGE) && defined(INVARIANTS)
-	KASSERT(curvnet == m->m_pkthdr.header,
+	KASSERT(curvnet == m->m_pkthdr.PH_loc.ptr,
 	    ("%s: called when curvnet was not restored", __func__));
 #endif
 	return (m->m_pkthdr.flowid);
Index: sys/mbpool.h
===================================================================
--- sys/mbpool.h	(revision 254596)
+++ sys/mbpool.h	(working copy)
@@ -69,7 +69,7 @@
 void mbp_free(struct mbpool *, void *);
 
 /* free a chunk that is an external mbuf */
-void mbp_ext_free(void *, void *);
+void mbp_ext_free(struct mbuf *, void *, void *);
 
 /* free all buffers that are marked to be on the card */
 void mbp_card_free(struct mbpool *);
Index: sys/sf_buf.h
===================================================================
--- sys/sf_buf.h	(revision 254596)
+++ sys/sf_buf.h	(working copy)
@@ -55,6 +55,7 @@
 #ifdef _KERNEL
 #include <machine/sf_buf.h>
 #include <sys/counter.h>
+#include <sys/mbuf.h>
 
 extern counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)];
 #define	SFSTAT_ADD(name, val)	\
@@ -66,6 +67,6 @@
 struct sf_buf *
 	sf_buf_alloc(struct vm_page *m, int flags);
 void	sf_buf_free(struct sf_buf *sf);
-void	sf_buf_mext(void *addr, void *args);
+void	sf_buf_mext(struct mbuf *mb, void *addr, void *args);
 
 #endif /* !_SYS_SF_BUF_H_ */

