Checksum/copy (was: Re: cvs commit: src/sys/netinet ip_output.c)
nate at root.org
Thu Mar 27 09:58:50 PST 2003
On Thu, 27 Mar 2003, Bruce Evans wrote:
> On Wed, 26 Mar 2003, Mike Silbersack wrote:
> > On Wed, 26 Mar 2003, Nate Lawson wrote:
> > > I don't want to hijack the thread too much, but has any thought gone into a
> > > combined checksum and copy function? The first mention I can remember of
> > > this is in RFC 817 p. 19-20.
> Is this RFC old? Combined checksum and copy hasn't been a large
> optimization since L1 caches became large enough, since to a first
> approximation, everything is dominated by memory bandwidth and another
> pass to calculate the checksum is free because copying left all the
> data in the L1 cache.
Yes, the RFC is old. However, there may still be a performance advantage
from instruction-level parallelism (ILP): while the data is being fetched
the first time (for the copy), otherwise idle issue slots can be filled
with an incremental checksum update.
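
To make the idea concrete, here is a minimal sketch of a combined copy and
checksum loop in portable C. It assumes the 16-bit one's complement Internet
checksum; the name copy_and_cksum is hypothetical and is not an existing
kernel interface. Each source byte is loaded once and feeds both the store
and the running sum.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the 16-bit one's complement
 * Internet checksum of the data, reading each source byte only once.
 * Hypothetical helper for illustration, not a FreeBSD interface.
 */
uint16_t
copy_and_cksum(void *dst, const void *src, size_t len)
{
	const uint8_t *s = src;
	uint8_t *d = dst;
	uint64_t sum = 0;

	while (len >= 2) {
		/* Add the byte pair as one big-endian 16-bit word. */
		sum += ((uint32_t)s[0] << 8) | s[1];
		/* Store the same bytes while they are already loaded. */
		d[0] = s[0];
		d[1] = s[1];
		s += 2;
		d += 2;
		len -= 2;
	}
	if (len == 1) {
		/* An odd trailing byte is padded with a zero low byte. */
		sum += (uint32_t)s[0] << 8;
		d[0] = s[0];
	}
	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A real version would work on aligned machine words and unroll the loop;
whether it beats a separate copy and checksum pass is exactly the kind of
thing that has to be measured.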
> > Heh, I don't think anyone has. What actually would make sense is for
> > someone who feels like doing ASM timing to look at our bcopy routines /
> > etc.
> I spent a lot of time on this about 7 years ago. See ~bde/cache on
> freefall for old versions of programs that try lots of different
> copy/read/write checksum methods. Better hardware made the differences
> between various methods relatively small. One can probably do better
> (50%?) for largish (1K+ ?) buffers using SSE instructions on i386's
We definitely should have an SSE version for P3+. The 128 bit
instructions make a big difference. And for checksumming, you can do 8
packed adds at once.
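
As a rough illustration of the packed-add idea, the sketch below keeps 8
independent lane accumulators in plain C instead of using real SIMD
instructions. The name cksum_lanes and the 32-bit lane width are assumptions
made for this example. It works because one's complement addition is
commutative and associative, so the 16-bit words can be summed in any
grouping and folded at the end.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8	/* 16-bit words summed per iteration */

/*
 * Internet checksum computed with LANES independent accumulators.
 * Hypothetical sketch: the 32-bit lanes cannot overflow as long as
 * len is at most 1 MiB.
 */
uint16_t
cksum_lanes(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t lane[LANES] = {0};
	uint64_t sum = 0;
	size_t i;

	/* Main loop: 8 independent additions, one per lane. */
	while (len >= 2 * LANES) {
		for (i = 0; i < LANES; i++)
			lane[i] += ((uint32_t)p[2 * i] << 8) | p[2 * i + 1];
		p += 2 * LANES;
		len -= 2 * LANES;
	}
	/* Combine the lanes into one wide sum. */
	for (i = 0; i < LANES; i++)
		sum += lane[i];
	/* Tail: remaining whole words, then an odd byte if present. */
	while (len >= 2) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)p[0] << 8;
	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

A vectorized version has to deal with carries, since packed 16-bit adds
wrap around, for example by widening each word to a 32-bit lane before
adding.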