Dag-Erling Smørgrav des at
Thu Mar 27 01:17:26 PST 2003

Bruce Evans <bde at> writes:
> I spent a lot of time on this about 7 years ago.  See ~bde/cache on
> freefall for old versions of programs that try lots of different
> copy/read/write checksum methods.  Better hardware made the differences
> between various methods relatively small.  One can probably do better
> (50%?) for largish (1K+ ?) buffers using SSE instructions on i386's
> now.

Might it be a good idea to have separate b{copy,zero} implementations
for special purposes like pmap_{copy,zero}_page?  Since these cases
copy or zero a fixed and relatively large amount of data, they should
lend themselves well to optimization.  Zeroing a 4096-byte page on an
SSE-enabled i386 should take no more than 259 SSE instructions when
fully unrolled (one to save the contents of the register, one to zero
the register, 256 16-byte stores to actually zero the page and one to
restore the previous contents of the register) and a handful of fast
integer instructions for setup.
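
A rough userland sketch of the zeroing loop, to make the instruction count
concrete.  This is not the kernel's pmap_zero_page; the names zero_page_sse
and PAGE_SIZE_BYTES are made up for illustration, and the page is assumed to
be 16-byte aligned.  Each XMM store writes 16 bytes, so a 4096-byte page
takes 256 stores.  In userland the compiler manages the register, so the
explicit save and restore steps do not appear here; kernel code would have
to do them by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical constant for illustration: one i386 page. */
#define PAGE_SIZE_BYTES 4096

/*
 * Zero one page with 16-byte SSE stores.
 * Assumption: `page` is 16-byte aligned (real pages always are).
 * Falls back to memset() when the compiler does not target SSE.
 */
static void
zero_page_sse(void *page)
{
    /* Aligned stores fault on misaligned addresses, so check first. */
    assert(((uintptr_t)page & 15) == 0);
#if defined(__SSE__)
    float *p = (float *)page;
    const __m128 zero = _mm_setzero_ps();   /* xorps: zero the register */
    size_t i;

    /* 4096 bytes / 16 bytes per store = 256 stores (4 floats each). */
    for (i = 0; i < PAGE_SIZE_BYTES / sizeof(float); i += 4)
        _mm_store_ps(p + i, zero);          /* movaps: aligned 16-byte store */
#else
    memset(page, 0, PAGE_SIZE_BYTES);
#endif
}
```

Whether this actually beats the existing bzero would need measuring; a
non-temporal store (_mm_stream_ps) might be worth trying as well, since it
avoids pulling the page into the cache.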

Dag-Erling Smørgrav - des at
