cvs commit: src/sys/i386/i386 pmap.c

Stephan Uphoff ups at tree.com
Tue Nov 9 11:12:02 PST 2004


On Tue, 2004-11-09 at 08:03, Robert Watson wrote:
> On Tue, 9 Nov 2004, Robert Watson wrote:
> 
> > > I've tried changing the store_rel() to just do a simple store since writes are 
> > > ordered on x86, but benchmarks on SMP showed that it actually hurt.  However, 
> > > it would probably be good to at least do that for UP.  The current patch to 
> > > do it for all kernels is:
> 
> Interestingly, I've now run through some more "macro" benchmarks.  I saw a
> couple of percent improvement on UP from the change, but indeed, I saw a
> slight decrease in performance for the rapid packet send benchmark on SMP. 
> 
> So I guess my recommendation is to get this in the tree for UP, and see if
> we can figure out why it's having the slow-down effect on SMP.

We are probably looking at cache line effects here.
My guess is that we should:

1) Make sure that important spin mutexes sit alone in a cache line
   (see the padding sketch right after this list).
2) Take care not to dirty the cache line unnecessarily.
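As a rough illustration of 1), something like the struct below; this is
only a sketch - the layout, the 64-byte line size, the MTX_UNOWNED value,
and the GCC aligned attribute are assumptions for illustration, not the
real struct mtx from sys/mutex.h:

	#include <stdint.h>

	#define CACHE_LINE_SIZE 64              /* assumed; P4 Xeon lines are 64/128 bytes */
	#define MTX_UNOWNED     ((uintptr_t)0)  /* assumed "free" value for this sketch */

	/*
	 * Hypothetical spin mutex padded to a full cache line so that no
	 * other hot data shares the line; without the padding, a variable
	 * placed next to mtx_lock would bounce the line between CPUs on
	 * every lock operation.
	 */
	struct padded_spin_mtx {
		volatile uintptr_t mtx_lock;    /* MTX_UNOWNED or owning thread */
		char pad[CACHE_LINE_SIZE - sizeof(uintptr_t)];
	} __attribute__((aligned(CACHE_LINE_SIZE)));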

I think for 2) we need to change the spin mutex slightly (for SMP) so
that we never issue LOCK cmpxchgl until a plain load has observed
m->mtx_lock == MTX_UNOWNED, since LOCK cmpxchgl always seems to dirty
the cache line, even when the compare fails and the lock is not taken.
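Something along these lines - a minimal test-and-test-and-set sketch,
where spin_acquire() is a made-up name and the cmpset primitive follows
the atomic(9) style rather than the real mtx(9) internals:

	/*
	 * Test-and-test-and-set: spin with plain loads, which keep the
	 * line in the Shared state, and only issue the LOCKed
	 * compare-and-swap once the lock looks free.  cpu_spinwait()
	 * stands in for the x86 PAUSE hint.
	 */
	static void
	spin_acquire(struct padded_spin_mtx *m, uintptr_t tid)
	{
		for (;;) {
			/* Plain load; does not dirty the cache line. */
			while (m->mtx_lock != MTX_UNOWNED)
				cpu_spinwait();

			/* Looks free; now try the LOCKed RMW exactly once. */
			if (atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, tid))
				return;
		}
	}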

I have a dual Xeon (P4) where I can run some tests. Please let me know
if there are any tests you can recommend - I don't want to reinvent the
wheel here.

Interestingly enough, the Linux spin lock implementation mentions some
PPro errata that seem to require a locked operation rather than a plain
store.
Guess that means we should take a look at the errata of all SMP-capable
processors out there :-(
Intel also recommends a locked operation (or SFENCE) for future
processors.

Guess this means either non-optimal code, lots of compile options, or
self-modifying code.
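For the "lots of compile options" route, the release path could be picked
at build time; a rough sketch only - spin_release() and the
NEED_LOCKED_RELEASE option are made-up names, not the real atomic.h code:

	/*
	 * Build-time selection of the unlock store.  A plain store is
	 * enough on UP, and on SMP x86 as long as stores stay ordered;
	 * parts with PPro-style errata would take the LOCKed path.
	 * Real code would also need a compiler barrier before the
	 * plain store.
	 */
	static void
	spin_release(struct padded_spin_mtx *m)
	{
	#if defined(SMP) && defined(NEED_LOCKED_RELEASE)
		atomic_store_rel_ptr(&m->mtx_lock, MTX_UNOWNED); /* xchg/LOCKed */
	#else
		m->mtx_lock = MTX_UNOWNED;  /* plain store; x86 keeps store order */
	#endif
	}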

	Stephan