[PATCH] Maintaining turnstile aligned to 128 bytes in i386 CPUs
Chuck Swiger
cswiger at mac.com
Thu Jan 18 22:47:57 UTC 2007
On Jan 18, 2007, at 2:28 PM, Maxim Sobolev wrote:
>> Unfortunately, there are simply different tradeoffs between
>> mechanisms for copying depending on whether you want to use or
>> avoid using/thrashing the L1/L2 caches, whether the data is cache-
>> aligned, and so forth; the CPU can't infer what you want to
>> occur-- you have to tell it. I find it interesting that some of
>> the architectures (PA-RISC,
>
> Well, of course there are some special cases, but in general there
> should be some baseline suitable for most of uses. That's why we
> (and most other operating systems) only provide single version for
> the mem*(3) APIs.
Well, a truly generic version is in lib/libc/string/bcopy.c; it's
architecture-neutral (i.e., it's pure C code) and it handles all kinds
of things like overlapping source and destination addresses,
non-aligned access, and so forth. The downside is that it's slower
than using movl/movsl, and slower still than some of the fancier
variants that Bruce and Matt have been discussing (in considerable,
interesting detail) earlier:
http://now.cs.berkeley.edu/Td/bcopy.html
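To make the overlap handling concrete, here is a minimal sketch of
what a portable, pure-C copy has to do. This is not the code in
lib/libc/string/bcopy.c; the name generic_copy is made up for
illustration, and it copies one byte at a time, which sidesteps
alignment entirely at the cost of speed:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not the libc routine.
 * Copies len bytes from src to dst and tolerates overlapping regions.
 */
void *generic_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || len == 0)
        return dst;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below source: copy forward. */
        while (len--)
            *d++ = *s++;
    } else {
        /*
         * Destination starts above source: copy backward so that
         * overlapping bytes are read before they are overwritten.
         */
        d += len;
        s += len;
        while (len--)
            *--d = *--s;
    }
    return dst;
}
```

Byte-wise access never faults on misaligned pointers, which is part of
why a loop like this is portable but slow next to word-sized moves.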
If you're only moving, say, 5 bytes, the overhead of fancy loop
unrolling and prefetching and so forth isn't going to help compared
with a simple movb/movl combination, so it really depends.
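One common way to act on that is to branch on the length up front. The
sketch below is an assumption-laden illustration, not anything from
the tree: the name sized_copy and the 16-byte cutoff are invented, the
regions are assumed not to overlap, and memcpy stands in for whatever
tuned routine would handle the large case.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff for illustration; a real value comes from measurement. */
#define SMALL_COPY_MAX 16

/*
 * Hypothetical dispatcher: tiny copies use a plain byte loop, larger
 * ones go to the library routine. Regions must not overlap.
 */
void *sized_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (len <= SMALL_COPY_MAX) {
        /* Small case: no setup cost, just move the bytes. */
        while (len--)
            *d++ = *s++;
        return dst;
    }

    /* Large case: stand-in for an unrolled or prefetching variant. */
    return memcpy(dst, src, len);
}
```

Where the cutoff belongs depends on the CPU and on the setup cost of
the large-copy path, so it has to be benchmarked rather than guessed.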
--
-Chuck