4.7 vs 5.2.1 SMP/UP bridging performance
Don Bowman
don at sandvine.com
Thu May 6 06:52:30 PDT 2004
From: Bruce Evans [mailto:bde at zeta.org.au]
> On Wed, 5 May 2004, Andrew Gallatin wrote:
>
...
> >
> > Actually, I think his tests are accurate and bus locked instructions
> > take an eternity on P4. See
> > http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
> >
> > For example, with your test above, I see 212 cycles for the UP case on
> > a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
> > simple slock = 0; reduces that count to 18 cycles.
>
> This seems to be right, unfortunately. I wonder if this has anything to
> do with freebsd.org having no P4 machines.
>
> > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > then I think you should do it. Of course, that still leaves mutexes
> > as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> I forgot (again) that there are memory access ordering issues. A lock
> may be needed to get everything synced. See the comment before the i386
> versions in i386/include/atomic.h. A single lock may be enough. The
> best example I could think of easily is:
On the P4, there are mfence, lfence, and sfence instructions to enforce
memory ordering. These are cheaper than "lock; andl" or "cpuid",
which are the traditional 'sync' instructions.
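
As a rough illustration of the two release styles being compared, here is a
minimal userland sketch, not the kernel's atomic.h code. The names
acquire(), release_with_xchg(), release_with_fence() and demo() are
hypothetical; it assumes GCC or Clang, uses the __sync builtins (which
typically compile to xchg on x86), and uses _mm_mfence() only when SSE2 is
available. Whether a fence is needed at all on the UP release path is the
question under discussion above, so treat the fence placement as an
assumption.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>              /* _mm_mfence() */
#define FULL_FENCE() _mm_mfence()   /* explicit mfence instruction */
#else
#define FULL_FENCE() __sync_synchronize()  /* portable full-barrier fallback */
#endif

volatile int slock;                 /* 0 = free, 1 = held */

/* Spin until the lock is taken; the builtin is an atomic exchange. */
void acquire(volatile int *p)
{
    while (__sync_lock_test_and_set(p, 1) != 0)
        ;                           /* busy-wait */
}

/* Release via an atomic exchange (implicitly bus-locked on x86). */
void release_with_xchg(volatile int *p)
{
    (void)__sync_lock_test_and_set(p, 0);
}

/* Release via a fence followed by a plain store (assumed alternative). */
void release_with_fence(volatile int *p)
{
    FULL_FENCE();                   /* order earlier accesses before the store */
    *p = 0;
}

/* Exercise both release paths once; returns the final lock value. */
int demo(void)
{
    slock = 0;
    acquire(&slock);
    assert(slock == 1);
    release_with_xchg(&slock);
    assert(slock == 0);
    acquire(&slock);
    assert(slock == 1);
    release_with_fence(&slock);
    return slock;
}
```

Calling demo() from a main() runs both paths once. Wrapping each release in
rdtsc reads would be one way to compare cycle counts on a given CPU; no
numbers are claimed here.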