cvs commit: src/lib/libc/amd64/string Makefile.inc bcopy.S bzero.S memcpy.S memmove.S memset.S
alc at cs.rice.edu
Thu Apr 7 11:42:55 PDT 2005
On Thu, Apr 07, 2005 at 09:29:33AM -0400, David Schultz wrote:
> On Thu, Apr 07, 2005, Alexey Dokuchaev wrote:
> > On Thu, Apr 07, 2005 at 03:56:03AM +0000, Alan Cox wrote:
> > > alc 2005-04-07 03:56:03 UTC
> > >
> > > FreeBSD src repository
> > >
> > > Added files:
> > > lib/libc/amd64/string Makefile.inc bcopy.S bzero.S memcpy.S
> > > memmove.S memset.S
> > > Log:
> > > Add machine-specific, optimized implementations of bcopy, bzero, memcpy,
> > > memmove, and memset.
> > Great! Are we going to see something like this for ia32?
> i386 has had them since the beginning of time, and the code
> Alan committed is a port of the i386 versions.
Yes, exactly. That said, the benefits are profound on microbenchmarks
and measurable on macrobenchmarks, like buildworld.
As for more "exotic" copy routines, these are user space routines.
So, it is already the case that SSE registers can be used if desired.
However, the AMD optimization manual only recommends their use for
very large copies. Among the reasons is the fact that even the
simple methods are moving/zeroing data 64 bits at a time. So,
switching to 128-bit SSE registers has a less dramatic effect than on
i386, where Matt is benchmarking.
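To make the width argument concrete, here is a rough C sketch. It is not
the committed bzero.S; the names sketch_bzero64 and sketch_word_stores are
hypothetical and exist only for illustration. The first function zeroes a
buffer with one 64-bit store per 8 bytes, and the second counts how many
full-width stores a given length needs at a given store width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the committed assembly): zero len bytes at
 * dst, storing 64 bits (8 bytes) per iteration where possible.
 */
void sketch_bzero64(void *dst, size_t len)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    assert(dst != NULL || len == 0);

    /* Bulk: one 8-byte store per iteration. A fixed-size memcpy keeps
     * this well-defined for unaligned destinations. */
    while (len >= sizeof(zero)) {
        memcpy(p, &zero, sizeof(zero));
        p += sizeof(zero);
        len -= sizeof(zero);
    }

    /* Tail: the remaining 0..7 bytes, one at a time. */
    while (len > 0) {
        *p++ = 0;
        len--;
    }
}

/*
 * Hypothetical helper: number of full-width stores needed to cover len
 * bytes when each store writes width bytes (tail bytes not counted).
 */
size_t sketch_word_stores(size_t len, size_t width)
{
    assert(width > 0);
    return len / width;
}
```

For a 4096-byte page, sketch_word_stores gives 1024 stores at 4 bytes
(i386), 512 at 8 bytes (amd64), and 256 at 16 bytes (SSE). So going to
128-bit registers cuts the store count by 4x from the i386 baseline but
only by 2x from the amd64 one. Store count is only a proxy here; it says
nothing about actual throughput on a given CPU.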