Atomic swap

Thu Aug 7 12:15:23 PDT 2003

On Thu, 7 Aug 2003, Portante, Peter wrote:

> Dan,
> 
> > From: 	Daniel Eischen
> > 
> > On Thu, 7 Aug 2003, Portante, Peter wrote:
> > 
> > > Dan,
> > > 
> > > I don't think you want to do the stq_c if the location already holds the
> > > same value.  Instead, check the loaded value to see if it is the same as the
> > 
> > The purpose of the atomic swap is to make a FIFO queueing
> > list.  The values should never be the same.  It's not meant
> > to be used as test_and_set.
> > 
> Reasonable.  We had a major performance bug in our code when we assumed a
> routine performed a certain way based on its name.  You might want to change
> the name, because an atomic swap long could be used to implement a mutex if
> one didn't know better and then this code will tube an MP system under
> contention.

Interesting because I use the atomic_swap to implement FIFO-queueing
mutexes :-)
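
For readers following along, here is a minimal sketch of how a FIFO-queueing lock can be built on an atomic swap. It is not the thread library code under discussion. It is an MCS-style queue lock written with C11 atomics, and every name in it (struct waiter, struct fifo_mutex, fifo_enqueue, fifo_lock, fifo_unlock) is hypothetical. Each waiter swaps a pointer to its own node into the tail, which is why the old and new values are never the same.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct waiter {
    _Atomic(struct waiter *) next;    /* next waiter in line, or NULL */
    atomic_bool              granted; /* set when this waiter owns the lock */
};

struct fifo_mutex {
    _Atomic(struct waiter *) tail;    /* last waiter in line, NULL if free */
};

/* Join the queue. Returns true if the lock was free and is now ours,
 * false if we are queued behind an earlier waiter. */
bool fifo_enqueue(struct fifo_mutex *m, struct waiter *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->granted, false);

    /* The atomic swap: become the new tail and learn the old one. */
    struct waiter *prev = atomic_exchange(&m->tail, me);
    if (prev == NULL) {
        atomic_store(&me->granted, true);
        return true;
    }
    atomic_store(&prev->next, me);    /* link in behind the old tail */
    return false;
}

void fifo_lock(struct fifo_mutex *m, struct waiter *me)
{
    if (!fifo_enqueue(m, me)) {
        /* Spin until the previous owner hands over the lock.
         * A real thread library would block here instead. */
        while (!atomic_load(&me->granted))
            ;
    }
}

void fifo_unlock(struct fifo_mutex *m, struct waiter *me)
{
    struct waiter *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct waiter *expected = me;
        /* No successor visible: if we are still the tail, free the lock. */
        if (atomic_compare_exchange_strong(&m->tail, &expected, NULL))
            return;
        /* Someone swapped in behind us but has not linked yet. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->granted, true); /* hand off in arrival order */
}
```

Waiters are granted the lock in the order their swaps took effect, which is what makes the queue FIFO.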

> > > value to be stored, and branch out of the loop returning the result if
> > > they are the same.  And starting with EV56, the need to do the branch
> > > forward/branch back logic has been removed.  And EV6 and later CPUs do such
> > > a good job predicting the branching that it is not worth the instruction
> > > stream space when that space can be used to avoid a stq_c.
> > > 
> > > Additionally, the stq_c destroys the contents of %2, so you need to move the
> > > value in %2 into another register for use in the stq_c.  I don't know how to
> > > do that in the ASM, so I just used raw register names below, highlighted in
> > > red.
> > 
> > How about this?
> > 
> Not too bad, except every time you loop you make another memory reference to
> get the value.  If you load it into a register once, you can just move it
> into place each time before the store without referencing memory.  For
> performance, don't reference memory unless you absolutely have to.  Also,
> you might want to issue a ldq, once, before the actual loop of ldq_l so that
> the processor gets the cache line using the normal load instruction avoiding
> the heavier load-locked logic.
> 
> I just read Marcel's note, and his code looks pretty good.  Just add a ldq
> before the "1: ldq_l" and that code will perform quite well.  If you don't
> want to add the ldq to the asm, just read the destination value before
> calling atomic_swap_long(); it will really help this perform well.

OK, thanks.
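
A rough sketch of the two suggestions quoted above, assuming C11 atomics in place of Alpha assembly: read the destination once with an ordinary load before entering the retry loop, and return without storing when the location already holds the value to be written. The function name swap_long_sketch is hypothetical, and the compare-exchange loop only stands in for the ldq_l/stq_c sequence. It is not the code from this thread.

```c
#include <stdatomic.h>

/* Hypothetical helper: swap val into *dst and return the previous value. */
long swap_long_sketch(_Atomic long *dst, long val)
{
    /* Ordinary (relaxed) read first, standing in for the plain ldq
     * issued once before the ldq_l loop. */
    long old = atomic_load_explicit(dst, memory_order_relaxed);

    for (;;) {
        /* Same value already there: skip the store entirely.
         * Note this early return carries only relaxed ordering. */
        if (old == val)
            return old;

        /* Try to install val; on failure, old is refreshed with the
         * current contents of *dst and the loop retries. */
        if (atomic_compare_exchange_weak(dst, &old, val))
            return old;
    }
}
```

The new value stays in a local variable for the whole loop, so a compiler is free to keep it in a register rather than reloading it from memory on each retry.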

-- 
Dan Eischen