peter.portante at hp.com
Thu Aug 7 12:02:18 PDT 2003
> From: Daniel Eischen
> Reply To: deischen at freebsd.org
> Sent: Thursday, August 7, 2003 1:44 PM
> To: Portante, Peter
> Cc: alpha at freebsd.org; deischen at freebsd.org
> Subject: RE: Atomic swap
> On Thu, 7 Aug 2003, Portante, Peter wrote:
> > Dan,
> > I don't think you want to do the stq_c if the location already holds the
> > same value. Instead, check the loaded value to see if it is the same as the
> The purpose of the atomic swap is to make a FIFO queueing
> list. The values should never be the same. It's not meant
> to be used as test_and_set.
Reasonable. We had a major performance bug in our code when we assumed a routine behaved a certain way based on its name. You might want to change the name: someone who didn't know better could use an atomic swap long to implement a mutex, and then this code would tube an MP system under contention.
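To illustrate the concern, here is a minimal sketch in portable C11 rather than Alpha asm. The names swap_long(), swap_lock() and swap_unlock() are hypothetical, and atomic_exchange() only stands in for the swap routine under discussion. If someone does build a lock on top of a swap, the waiters should at least spin on an ordinary load and attempt the swap only when the lock looks free, so they are not storing to the line on every iteration:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the swap routine discussed in this thread. */
static long swap_long(atomic_long *dst, long val)
{
    return atomic_exchange(dst, val);
}

/* Lock word convention assumed here: 0 = free, 1 = held. */
static void swap_lock(atomic_long *lock)
{
    for (;;) {
        /* Wait with plain loads; no store is issued while the lock is held. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
        /* The lock looked free, so now pay for the swap. */
        if (swap_long(lock, 1) == 0)
            return;
    }
}

static void swap_unlock(atomic_long *lock)
{
    long prev = swap_long(lock, 0);
    assert(prev == 1);   /* caller must have held the lock */
    (void)prev;          /* keep NDEBUG builds warning-free */
}
```

A swap that always stores, used directly as the spin condition, is the pattern that hurts under contention.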
> > value to be stored, and branch out of the loop returning the result if
> > they are the same. And starting with EV56, the need to do the branch
> > forward/branch back logic has been removed. And EV6 and later CPUs do such
> > a good job predicting the branching that it is not worth the instruction
> > stream space when that space can be used to avoid a stq_c.
> > Additionally, the stq_c destroys the contents of %2, so you need to move the
> > value in %2 into another register for use in the stq_c. I don't know how to
> > do that in the ASM, so I just used raw register names below, highlighted in
> > red.
> How about this?
Not too bad, except that every time you loop you make another memory reference to get the value. If you load it into a register once, you can just move it into place before each store without referencing memory. For performance, don't reference memory unless you absolutely have to. Also, you might want to issue an ldq once, before the actual ldq_l loop, so that the processor gets the cache line using the normal load instruction, avoiding the heavier load-locked logic.
I just read Marcel's note, and his code looks pretty good. Just add an ldq before the "1: ldq_l" and that code will perform quite well. If you don't want to add the ldq to the asm, just read the destination value before calling atomic_swap_long(); it will really help this perform well.
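The caller-side variant could look roughly like the sketch below. It is written in C11 and is not the actual FreeBSD code: swap_long_sketch() is a hypothetical stand-in for atomic_swap_long(), and swap_with_preread() is a made-up wrapper name. Whether the extra load helps depends on the CPU and on what the compiler emits, so treat it as an illustration of the idea only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for atomic_swap_long(); uses the C11 exchange. */
static long swap_long_sketch(atomic_long *dst, long val)
{
    return atomic_exchange(dst, val);
}

/* Read the destination with an ordinary load, then do the swap. */
static long swap_with_preread(atomic_long *dst, long val)
{
    assert(dst != NULL);
    /* Plain load first, so the cache line is fetched before the
     * load-locked/store-conditional sequence starts. The value itself
     * is not used. */
    (void)atomic_load_explicit(dst, memory_order_relaxed);
    return swap_long_sketch(dst, val);
}
```

The return value is still the old contents of the destination, so callers building the FIFO list see no difference other than the extra load.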