[RFC] Port of NetBSD's optimized amd64 string code

Tue Aug 2 17:20:44 GMT 2005

On Tue, Aug 02, 2005 at 12:02:46PM +0800, Xin LI wrote:
> On Mon, Aug 01, 2005 at 06:39:16PM -0700, David O'Brien wrote:
> > On Tue, Aug 02, 2005 at 02:25:18AM +0800, Xin LI wrote:
> > > Here is a patchset that I have produced to make our libc aware of the
> > > NetBSD assembly implementation of the string related operations.
> > 
> > What performance benchmarks have these been thru?
..
> BTW, would you please give me some hints on the benchmarking?  I am
> not sure whether just looping the test cases on some predetermined
> dataset would be enough?

Try some real-world tests such as 'make buildworld'.  Looking in
src/usr.bin, the following utilities make good use of these libc
functions and would be good real-world tests: uuencode, catman,
compress, last, and makewhatis.

* uuencode a large kernel
* run /etc/periodic/weekly/320.whatis
* compress a large kernel
* last delphij on a large /var/log/wtmp
* copy /usr/src/share/man/man[1-9] to a RAM disk and then run catman over it

Just a few suggestions.  It is easy to "optimize" for the simple input case
and miss the larger case.  I've also seen people "optimize" for all cases
but then wind up with so much overhead that small inputs are slower.
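
As a rough illustration of checking both ends of the size range, a
micro-benchmark along the following lines could be run before and after
swapping in the new routines.  This is only a sketch: bench_strlen() is
a hypothetical helper, not part of the patchset or of libc, and it uses
the standard C clock() call, so it reports CPU time at fairly coarse
resolution.  It complements the real-world tests listed above and does
not replace them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patchset): average CPU nanoseconds
 * per strlen() call on a string of `len` characters, over `iters` calls.
 * Returns -1.0 on bad arguments, allocation failure, or a failed
 * sanity check.
 */
static double
bench_strlen(size_t len, unsigned long iters)
{
	char *buf;
	char *volatile src;		/* reloaded on every iteration so the */
	volatile size_t sink = 0;	/* calls cannot be hoisted or dropped */
	clock_t t0, t1;
	unsigned long i;

	if (iters == 0)
		return -1.0;
	buf = malloc(len + 1);
	if (buf == NULL)
		return -1.0;
	memset(buf, 'x', len);
	buf[len] = '\0';
	src = buf;

	t0 = clock();
	for (i = 0; i < iters; i++)
		sink += strlen(src);
	t1 = clock();

	free(buf);

	/* Every call must have returned exactly `len`. */
	if (sink != len * (size_t)iters)
		return -1.0;

	return ((double)(t1 - t0) / CLOCKS_PER_SEC) * 1e9 / (double)iters;
}
```

Calling bench_strlen() with lengths such as 1, 8, 64, 4096 and 1 << 20
(with an iteration count high enough to get past the clock() resolution)
and comparing the per-call times between the old and new libc would show
whether a gain on large inputs is paid for with a loss on small ones.
The same pattern would apply to memcpy(), strcmp() and the other
routines being replaced.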

I have some very fancy routines from AMD that take cache size and
alignment into account and use the prefetch instructions.  They are a
huge win for large input sizes, but I'm concerned about their
performance on small input sizes.

If these NetBSD routines perform better in the tests I listed above, we
should commit them.  We can continue to refine these libc routines over
time.

-- 
-- David  (obrien at FreeBSD.org)