4.7 vs 5.2.1 SMP/UP bridging performance
Gerrit Nagelhout
gnagelhout at sandvine.com
Wed May 5 18:16:39 PDT 2004
Andrew Gallatin wrote:
> Bruce Evans writes:
>
> >
> > Athlon XP2600 UP system:  !SMP case: 22 cycles  SMP case: 37 cycles
> > Celeron 366 SMP system:               35                  48
> >
> > The extra cycles for the SMP case are just the extra cost of one lock
> > instruction.  Note that SMP should cost twice as much extra, but the
> > non-SMP atomic_store_rel_int(&slock, 0) is pessimized by using xchgl
> > which always locks the bus.  After fixing this:
> >
> > Athlon XP2600 UP system:  !SMP case:  6 cycles  SMP case: 37 cycles
> > Celeron 366 SMP system:               10                  48
> >
> > Mutexes take longer than simple locks, but not much longer unless the
> > lock is contested.  In particular, they don't lock the bus any more
> > and the extra cycles for locking dominate (even in the !SMP case due
> > to the pessimization).
> >
> > So there seems to be something wrong with your benchmark.  Locking the
> > bus for the SMP case always costs about 20+ cycles, but this hasn't
> > changed since RELENG_4 and mutexes can't be made much faster in the
> > uncontested case since their overhead is dominated by the bus lock
> > time.
> >
>
> Actually, I think his tests are accurate and bus locked instructions
> take an eternity on P4. See
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0109.3/0687.html
>
> For example, with your test above, I see 212 cycles for the UP case on
> a 2.53GHz P4. Replacing the atomic_store_rel_int(&slock, 0) with a
> simple slock = 0; reduces that count to 18 cycles.
>
> If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> then I think you should do it. Of course, that still leaves mutexes
> as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> Drew
>
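For anyone following along, here is a rough user-space sketch of the two
release variants discussed above. It is not the kernel's code: it uses C11
<stdatomic.h> as a stand-in for the atomic_*() primitives, and the function
names are made up for illustration (only slock comes from the quoted test).
On x86 compilers typically turn the exchange into xchg, which is implicitly
locked, and the release store into an ordinary mov.

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = free, 1 = held. The name is borrowed from the quoted test. */
static atomic_int slock;

/* Acquire needs an atomic read-modify-write (a locked instruction on x86).
 * Returns 1 if the lock was taken, 0 if it was already held. */
int simple_trylock(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &slock, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release by exchange: typically emitted as xchg on x86, which locks
 * the bus even without an explicit lock prefix. */
void simple_unlock_xchg(void)
{
    (void)atomic_exchange_explicit(&slock, 0, memory_order_release);
}

/* Release by plain store with release ordering: typically an ordinary
 * mov on x86, so no locked instruction is involved. */
void simple_unlock_store(void)
{
    atomic_store_explicit(&slock, 0, memory_order_release);
}
```

Timing a simple_trylock()/simple_unlock_xchg() pair against a
simple_trylock()/simple_unlock_store() pair should show the kind of gap
quoted above, though the exact cycle counts will depend on the CPU.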
I wonder if there is anything that can be done to make the locking more
efficient for the Xeon. Are there any other locking types that could
be used instead?
This might also explain why we are seeing much worse system call
performance under 4.7 in SMP versus UP. Here is a table of results
for some system call tests I ran. (The numbers are calls/s)

2.8GHz Xeon
                      UP        SMP
write             904427     661312
socket           1327692    1067743
select            554131     434390
gettimeofday     1734963     252479

1.3GHz PIII
                      UP        SMP
write             746705     532223
socket           1179819     977448
select            727811     556537
gettimeofday     1849862     186387

The really interesting one is gettimeofday. On both the Xeon and the PIII,
UP is much faster than SMP (roughly 7x on the Xeon and 10x on the PIII),
and the PIII's UP result is even better than the Xeon's. I may try to get
the results for 5.2.1 later. I can forward the source code of this program
to anyone who wants to try it out.
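
In the meantime, a minimal sketch of how a calls/s figure like the ones in
the tables could be measured. This is not the program I ran. The function
name and the iteration count are hypothetical, and it only covers
gettimeofday.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: calls gettimeofday() `iters` times in a tight loop
 * and returns the observed rate in calls per second. */
double gettimeofday_calls_per_sec(long iters)
{
    struct timeval start, end, tv;
    double secs;
    long i;

    assert(iters > 0);

    gettimeofday(&start, NULL);
    for (i = 0; i < iters; i++)
        gettimeofday(&tv, NULL);
    gettimeofday(&end, NULL);

    /* Elapsed wall-clock time in seconds, at microsecond resolution. */
    secs = (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;

    /* Guard against a zero interval on very short runs. */
    return secs > 0.0 ? (double)iters / secs : 0.0;
}
```

Calling gettimeofday_calls_per_sec(1000000) on a UP and an SMP kernel on the
same box would give a pair of numbers comparable to one row of the tables.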
Thanks,
Gerrit