TCP Segmentation Offload

Luigi Rizzo rizzo at icir.org
Sat Sep 6 09:07:11 PDT 2003


On Fri, Sep 05, 2003 at 04:47:22PM -0400, Andrew Gallatin wrote:
> 
> I've been reading a little about TCP Segmentation Offload (aka TSO).
> We don't appear to support it, but at least 2 of our supported nics
> (e1000 and bge) apparently could support it.

I believe there is more commercial hype than actual savings in
TCP Segmentation Offload.

With delayed acks (or rather, "ack every second packet"), the
sender's TCP typically sends out two packets at a time. Without
delayed acks, it is just one at a time. So yes, with TSO you avoid
going through tcp_output() twice, but on the second pass everything
is already in cache and the branch predictors get it right in many
places. In the end I believe the actual savings will be a lot lower
than 50% in a real-life case, i.e. with multiple TCP flows active
and not dumping an entire 64k window onto the net at the first
transmission.
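
To put rough numbers on that argument, here is a back-of-the-envelope
sketch. It only models the tcp_output() passes, not the per-packet work
in ip_output() or the driver, and every name and input value in it is
an assumption made up for illustration, not a measurement.

```python
def tcp_output_savings(first_pass_cost, warm_pass_cost, segments_per_burst):
    """Fraction of tcp_output() cost avoided if a burst is built in one pass.

    first_pass_cost: cost of the first pass (cold cache), arbitrary units.
    warm_pass_cost: cost of each further pass (warm cache, trained predictors).
    segments_per_burst: segments the sender emits per send opportunity.
    All three are hypothetical model inputs.
    """
    if segments_per_burst < 1:
        raise ValueError("segments_per_burst must be >= 1")
    # Without offload: one cold pass, then one warm pass per extra segment.
    without_tso = first_pass_cost + (segments_per_burst - 1) * warm_pass_cost
    # With offload: a single pass hands the whole burst to the nic.
    with_tso = first_pass_cost
    return (without_tso - with_tso) / without_tso


# Illustrative inputs only.
equal_cost = tcp_output_savings(1.0, 1.0, 2)    # both passes cost the same
warm_second = tcp_output_savings(1.0, 0.25, 2)  # second pass 4x cheaper
single_packet = tcp_output_savings(1.0, 0.25, 1)  # one packet at a time
print(equal_cost, warm_second, single_packet)
```

With two segments per burst, the model reaches 50% only when the second
pass costs as much as the first. A cheaper warm pass pulls the figure
down, and with one packet at a time there is nothing to save.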

(btw the e1000 is really cheap!)

	cheers
	luigi

> The gist is that TCP pretends the nic has a large mtu, and passes a
> large (> the mtu on the link layer) packet down to the driver and then
> the nic.  The nic then fragments the large packet into smaller (<= mtu)
> packets.  It uses the initial TCP header as a template to construct
> the headers for the "fragments".  The people who implemented it on
> linux claim a 50% CPU savings for an Intel 1Gb/s adaptor with a 1500
> byte mtu.
> 
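
The splitting step described above can be sketched in a few lines. This
is a simplified model, not what any particular nic does: TcpHeader and
segment() are hypothetical names, the 1460-byte mss in the usage lines
assumes a 1500 byte mtu minus 40 bytes of IPv4 and TCP headers, and
keeping PSH and FIN only on the last segment is an assumption about the
hardware. IP header fixups and checksums are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TcpHeader:
    seq: int           # sequence number of the first payload byte
    psh: bool = False
    fin: bool = False


def segment(template, payload, mss):
    """Split payload into chunks of at most mss bytes.

    Each chunk gets a header copied from template, with the sequence
    number advanced by the chunk's offset (modulo 2**32).
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        last = offset + len(chunk) == len(payload)
        header = replace(
            template,
            seq=(template.seq + offset) % 2**32,
            # Assumption: PSH and FIN survive only on the final segment.
            psh=template.psh and last,
            fin=template.fin and last,
        )
        segments.append((header, chunk))
    return segments


# One 4000-byte "large packet" handed down with a single template header.
parts = segment(TcpHeader(seq=1000, psh=True), bytes(4000), 1460)
for header, chunk in parts:
    print(header.seq, len(chunk), header.psh)
```
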
> It seems like it could be implemented rather easily by adding an
> if_hwassist flag (CSUM_TSO).  If this flag is set on the interface
> found by tcp_mss(), then the mss is set to 56k.  This causes TCP to
> generate huge packets.  We then add a check in ip_output() (near the
> existing CSUM_FRAGMENT check) which checks to see if it's both a TCP
> packet and CSUM_TSO is set in the if_hwassist flags.  If so, the huge
> packet is passed on down to the driver.  Does this sound reasonable?
> The only other thing I can think of is that some nics might not be
> able to handle such a large mss, and we might want to stuff the
> maximum mss value into the ifnet struct.
> 
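
The two checks in this proposal can be modelled as follows. This is a
sketch of the decision logic only, not kernel code: the Ifnet class,
its tso_max_mss field and the CSUM_TSO bit value are all hypothetical,
"56k" is read as 56 * 1024 bytes, and the non-TSO mss assumes 40 bytes
of IPv4 and TCP headers without options.

```python
from dataclasses import dataclass

CSUM_TSO = 0x1           # hypothetical flag bit; value made up for this sketch
TSO_MSS = 56 * 1024      # assumed reading of the "56k" in the proposal
IP_TCP_HEADERS = 40      # IPv4 + TCP headers, no options


@dataclass
class Ifnet:
    mtu: int
    hwassist: int = 0
    tso_max_mss: int = TSO_MSS   # hypothetical per-interface limit


def choose_mss(ifp):
    """Model of the tcp_mss() step: huge mss only if the nic offers TSO."""
    if ifp.hwassist & CSUM_TSO:
        return min(TSO_MSS, ifp.tso_max_mss)
    return ifp.mtu - IP_TCP_HEADERS


def needs_ip_fragmentation(ifp, packet_len, is_tcp):
    """Model of the ip_output() step: skip fragmentation for TSO traffic."""
    if packet_len <= ifp.mtu:
        return False
    if is_tcp and (ifp.hwassist & CSUM_TSO):
        return False     # hand the huge packet down to the driver as is
    return True


plain = Ifnet(mtu=1500)
tso = Ifnet(mtu=1500, hwassist=CSUM_TSO)
limited = Ifnet(mtu=1500, hwassist=CSUM_TSO, tso_max_mss=16 * 1024)
print(choose_mss(plain), choose_mss(tso), choose_mss(limited))
```

The tso_max_mss field stands in for the "maximum mss value in the ifnet
struct" idea, so a nic that cannot take a 56k mss gets a smaller one.
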
> I don't have a bge or an e1000, so I'm not ready to actually implement
> this.  I'm more considering firmware optimizations for our product,
> and would implement it in a few months, after making the firmware
> changes.
> 
> Drew
> 

