Optimized copy&move (was: Re: [PATCH] Maintaining turnstile aligned to 128 bytes in i386 CPUs)

Attilio Rao attilio at freebsd.org
Wed Jan 17 21:15:09 UTC 2007


2007/1/17, Ivan Voras <ivoras at fer.hr>:
> Bruce Evans wrote:
>
> > And MMX/XMM registers are not needed to get movnt on machines with SSE2,
> > since movnti is part of SSE2.  This reduces the advantages of using MMX/XMM
> > registers on P4's and A64's in 32-bit mode to the non-nt parts of the
> > above (fully cached case), which I think are less important than the nt
> > parts.
>
> Hmm, I'm looking at i386/i386/support.s and there are several versions
> of bcopy and bmove functions, including some that optimize by using FPU
> registers (large_i586_bcopy_loop), and a version that uses movnti
> (sse2_pagezero), but I can't find the bit of magic which glues them to
> bzero() call.
>
> Also, as far as I can tell by the comments, the FPU version works by
> manually saving context... why is this possible (i.e. won't something
> preempt it?)

They are just broken.
My implementation, which follows DragonFlyBSD patterns, just uses a bts
(which is atomic) to set a "lock", and avoids thread migration with
scheduler pinning. This is enough to solve the concurrency problems.
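
To make the idea concrete, here is a rough userland sketch, not the
actual patch. C11 atomic_fetch_or stands in for the bts, and all names
(fast_copy_state, guarded_copy, wide_copy, plain_copy) are made up for
illustration. Falling back to a plain copy when the bit is already set
is an assumption of this sketch. In the kernel the thread would also be
pinned around the fast path, which userland code cannot show.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-CPU state: bit 0 set means the wide registers are in use. */
struct fast_copy_state {
    atomic_uint flags;
};

#define FAST_COPY_BUSY 0x1u

/*
 * Atomically set the busy bit and report whether we were the one who set it.
 * This plays the role of the bts: test the old value and set the bit in one step.
 */
static bool fast_copy_try_enter(struct fast_copy_state *st)
{
    unsigned old = atomic_fetch_or_explicit(&st->flags, FAST_COPY_BUSY,
                                            memory_order_acquire);
    return (old & FAST_COPY_BUSY) == 0;
}

/* Clear the busy bit once the wide registers are no longer in use. */
static void fast_copy_exit(struct fast_copy_state *st)
{
    atomic_fetch_and_explicit(&st->flags, ~FAST_COPY_BUSY,
                              memory_order_release);
}

/* Stand-in for the FPU/XMM copy loop; a real one would save and restore registers. */
static void wide_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Stand-in for the ordinary integer copy used when the fast path is busy. */
static void plain_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Returns true if the fast path was taken, false if we fell back. */
static bool guarded_copy(struct fast_copy_state *st, void *dst,
                         const void *src, size_t len)
{
    /* Kernel code would pin the thread to its CPU here (assumption of this sketch). */
    if (!fast_copy_try_enter(st)) {
        plain_copy(dst, src, len);   /* somebody else owns the registers */
        return false;
    }
    wide_copy(dst, src, len);
    fast_copy_exit(st);
    /* ... and unpin the thread here. */
    return true;
}
```

A caller would keep one fast_copy_state per CPU, initialize flags to 0,
and call guarded_copy() wherever bcopy() would otherwise be used.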

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


More information about the freebsd-arch mailing list