Peter Jeremy peterjeremy at
Thu Mar 27 11:05:17 PST 2003

[I think this is getting somewhat off topic for the CVS lists]

On Thu, Mar 27, 2003 at 09:57:35AM +0100, Dag-Erling Smørgrav wrote:
>Might it be a good idea to have separate b{copy,zero} implementations
>for special purposes like pmap_{copy,zero}_page?  Since these cases
>copy or zero a fixed and relatively large amount of data, they should
>lend themselves well to optimization.

I think it would be useful.  Even ignoring SSE, most of the fast
b{zero,copy} implementations include a fair amount of special code
to handle alignment issues and the odd few bytes at the beginning/end
that don't fit into the main loop's work unit.  Having a known size
and alignment simplifies the code a lot.
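
As a rough illustration of the difference (a sketch in portable C, not
code from the tree; generic_zero() and page_zero_fixed() are names made
up for this example, and PAGE_SIZE is assumed to be 4096 as on i386):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096	/* assumed page size, as on i386 */

/*
 * General-purpose zeroing: must cope with any address and any length,
 * so it needs a head loop, a main loop and a tail loop.
 */
void
generic_zero(void *buf, size_t len)
{
	unsigned char *p = buf;
	uint64_t *w;

	/* Head: byte stores until p reaches a word boundary. */
	while (len > 0 && (uintptr_t)p % sizeof(uint64_t) != 0) {
		*p++ = 0;
		len--;
	}
	/* Main loop: one aligned word per iteration. */
	w = (uint64_t *)p;
	while (len >= sizeof(uint64_t)) {
		*w++ = 0;
		len -= sizeof(uint64_t);
	}
	/* Tail: the odd few bytes that don't fill a whole word. */
	p = (unsigned char *)w;
	while (len > 0) {
		*p++ = 0;
		len--;
	}
}

/*
 * Page zeroing: size and alignment are known in advance, so only the
 * main loop remains and it can be unrolled by a fixed factor.
 */
void
page_zero_fixed(void *page)
{
	uint64_t *w = page;
	size_t i;

	/* The caller is assumed to pass a word-aligned page. */
	assert((uintptr_t)page % sizeof(uint64_t) == 0);

	/* PAGE_SIZE / 8 = 512 words, an exact multiple of 4. */
	for (i = 0; i < PAGE_SIZE / sizeof(uint64_t); i += 4) {
		w[i] = 0;
		w[i + 1] = 0;
		w[i + 2] = 0;
		w[i + 3] = 0;
	}
}
```

The fixed-size version has no head or tail handling at all, which is
where most of the complexity in the general routine sits.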

>  Zeroing a 4096-byte page on an
>SSE-enabled i386 should take no more than 35 SSE instructions

The downside is that we need multiple implementations to take advantage
of features available in different CPUs.
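
One common way to carry several implementations is to pick one through
a function pointer at startup.  The following is only a sketch under
that assumption: all the pagezero_* names are hypothetical, the "wide"
variant is a plain C stand-in for a real SSE routine, and the
cpu_has_wide_stores flag stands in for whatever CPU feature detection
the kernel already does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096	/* assumed page size, as on i386 */

typedef void (*pagezero_fn)(void *);

/* Baseline variant that works on every CPU. */
void
pagezero_generic(void *page)
{
	memset(page, 0, PAGE_SIZE);
}

/*
 * Stand-in for a CPU-specific variant (e.g. one using SSE stores).
 * Written in plain C here so the sketch stays portable.
 */
void
pagezero_wide(void *page)
{
	uint64_t *w = page;
	size_t i;

	assert((uintptr_t)page % sizeof(uint64_t) == 0);
	for (i = 0; i < PAGE_SIZE / sizeof(uint64_t); i++)
		w[i] = 0;
}

/* Currently selected implementation; defaults to the safe one. */
pagezero_fn pagezero_impl = pagezero_generic;

/* Called once after CPU identification (flag is hypothetical). */
void
pagezero_select(int cpu_has_wide_stores)
{
	pagezero_impl = cpu_has_wide_stores ? pagezero_wide : pagezero_generic;
}

/* Entry point used by callers; dispatches to the selected variant. */
void
pagezero(void *page)
{
	pagezero_impl(page);
}
```

Every variant added this way is another routine to maintain and
benchmark, which is the cost referred to above.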

I guess it's a "put up your patches and benchmark results" issue.

