4.7 vs 5.2.1 SMP/UP bridging performance
Andrew Gallatin
gallatin at cs.duke.edu
Wed May 5 14:23:44 PDT 2004
Bruce Evans writes:
>
>                          !SMP case   SMP case
> Athlon XP2600 UP system  22 cycles   37 cycles
> Celeron 366 SMP system   35 cycles   48 cycles
>
> The extra cycles for the SMP case are just the extra cost of one lock
> instruction. Note that SMP should cost twice as much extra, but the
> non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl,
> which always locks the bus. After fixing this:
>
>                          !SMP case   SMP case
> Athlon XP2600 UP system  6 cycles    37 cycles
> Celeron 366 SMP system   10 cycles   48 cycles
>
> Mutexes take longer than simple locks, but not much longer unless the
> lock is contested. In particular, they don't lock the bus any more
> and the extra cycles for locking dominate (even in the !SMP case due
> to the pessimization).
>
> So there seems to be something wrong with your benchmark. Locking the
> bus for the SMP case always costs about 20+ cycles, but this hasn't
> changed since RELENG_4 and mutexes can't be made much faster in the
> uncontested case since their overhead is dominated by the bus lock
> time.
>
Actually, I think his tests are accurate, and bus-locked instructions
take an eternity on the P4. See
http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
For example, with your test above, I see 212 cycles for the UP case on
a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
simple slock = 0; reduces that count to 18 cycles.
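A rough user-space analogue of this comparison is a loop that times an uncontested acquire/release pair with each kind of release. This is a sketch under assumptions, not the test used in the thread: the function name time_lock_pair and the use of standard clock() are illustrative choices, only the variable name slock follows the thread, and it reports nanoseconds of processor time per iteration rather than cycles (multiplying by the clock rate in GHz, e.g. 2.53, gives an approximate cycle count).

```c
#include <assert.h>
#include <time.h>

/* Lock word, named after the variable in the thread's test: 0 = free. */
static unsigned int slock;

/*
 * Average nanoseconds of processor time per uncontested acquire/release
 * pair. The acquire is always an atomic exchange (a locked xchg on x86).
 * The release is also an exchange when use_xchg is nonzero, otherwise a
 * plain release store (an ordinary mov on x86), roughly mirroring the
 * change from atomic_store_rel_int(&slock, 0) to slock = 0.
 */
static double time_lock_pair(long iters, int use_xchg)
{
    assert(iters > 0);
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        (void)__atomic_exchange_n(&slock, 1u, __ATOMIC_ACQUIRE);
        if (use_xchg)
            (void)__atomic_exchange_n(&slock, 0u, __ATOMIC_RELEASE);
        else
            __atomic_store_n(&slock, 0u, __ATOMIC_RELEASE);
    }
    clock_t end = clock();
    return (double)(end - start) * 1e9 / (double)CLOCKS_PER_SEC / (double)iters;
}
```

Calling time_lock_pair(100000000L, 1) and time_lock_pair(100000000L, 0) and comparing the two averages shows the cost of the locked release on a given CPU; the absolute numbers will not match the thread's, since this loop keeps a locked acquire in both runs.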
If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
then I think you should do it. Of course, that still leaves mutexes
very expensive on SMP (253 cycles on the 2.53GHz P4 from above).
Drew
More information about the freebsd-current mailing list