cvs commit: src/sys/i386/i386 pmap.c

Jeff Roberson jroberson at chesapeake.net
Sat Oct 25 20:14:51 PDT 2003


On Sat, 25 Oct 2003, Peter Wemm wrote:

> Jeff Roberson wrote:
> > On Sat, 25 Oct 2003, Peter Wemm wrote:
> >
> > > peter       2003/10/25 11:51:41 PDT
> > >
> > >   FreeBSD src repository
> > >
> > >   Modified files:
> > >     sys/i386/i386        pmap.c
> > >   Log:
> > >   For the SMP case, flush the TLB at the beginning of the page zero/copy
> > >   routines.  Otherwise we run into trouble with speculative tlb preloads
> > >   on SMP systems.  This effectively defeats Jeff's revision 1.438
> > >   optimization (for his pentium4-M laptop) in the SMP case.  It breaks
> > >   other systems, particularly athlon-MP's.
> >
> > If the page table entries are NULL, why do speculative tlb preloads cause trouble?
>
> While we're zeroing the page, CMAP2 (or one of its friends) is non-NULL.  If
> another cpu accesses a nearby page and decides to speculatively preload the
> nearby TLB entries, it will cache the CMAP2 translation.  Meanwhile, the
> originating cpu clears the entry again and flushes its own TLB.  But if we
> then do a pmap_zero_page on the other cpu, it can still hold the
> speculatively cached tlb entry and zero the wrong page.
>
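
To make the race concrete, here's the sequence as a minimal sketch (simplified
from pmap.c; CMAP2/CADDR2 are the statics in question and invlcaddr() is the
local-flush helper):

        /* CPU A, inside pmap_zero_page(), simplified: */
        *CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M;
        bzero(CADDR2, PAGE_SIZE);       /* zero the page through CADDR2 */
        *CMAP2 = 0;                     /* unmap */
        invlcaddr(CADDR2);              /* flushes CPU A's TLB entry only */

        /*
         * While *CMAP2 was valid, CPU B may have touched a nearby kernel
         * address and speculatively preloaded the CADDR2 translation.
         * CPU A's local flush never reaches CPU B, so when CPU B later
         * runs pmap_zero_page() itself, it installs a fresh *CMAP2 but
         * can keep using the stale cached translation and zero the
         * wrong physical page.
         */
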
> Poul-Henning was able to reproduce this problem in short order.  The first
> hack we tried was to change invlcaddr() to do a global shootdown.  It solved
> the crashes, presumably by purging all other cpus' copies of CMAP2, including
> any speculatively loaded values.  Obviously this is expensive and defeats
> the point of doing local flushes only.
>
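
That first hack amounts to turning the local flush into a shootdown.  A rough
sketch, assuming the smp_invlpg() IPI helper from mp_machdep.c (the real
invlcaddr() also has an I386_CPU fallback that's omitted here):

        static __inline void
        invlcaddr(void *caddr)
        {
                invlpg((u_int)caddr);           /* this cpu */
        #ifdef SMP
                smp_invlpg((vm_offset_t)caddr); /* IPI every other cpu,
                                                   purging any speculatively
                                                   loaded copies of the entry */
        #endif
        }
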
> So, as a lighter weight solution, we tried flushing after every page table
> modification, as the IA32 system programmers manual says we must, and it
> too solved the problem - without the expense of extra tlb shootdowns.
>
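
Which is essentially what the commit above does: one extra local invlpg right
after the mapping is installed, so a stale speculative entry can never be
used.  A sketch (the lock name here is assumed, not necessarily the one in
pmap.c):

        void
        pmap_zero_page(vm_page_t m)
        {
                mtx_lock(&CMAP2_lock);          /* serializes CMAP2/CADDR2 */
                if (*CMAP2)
                        panic("pmap_zero_page: CMAP2 busy");
                *CMAP2 = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M;
        #ifdef SMP
                invlpg((u_int)CADDR2);  /* flush BEFORE use: kills anything
                                           this cpu speculatively preloaded */
        #endif
                bzero(CADDR2, PAGE_SIZE);
                *CMAP2 = 0;
                invlcaddr(CADDR2);              /* local flush after unmap */
                mtx_unlock(&CMAP2_lock);
        }

The switchin alternative below would keep only the flush-before-use and defer
the cleanup flush to context switch time, trading the second invlpg per call
for a purge at switchin.
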
> Perhaps we should change back to using the switchin purge plus a flush at
> the beginning, as an alternative to doing two flushes.  The expense of
> invlpg seems to be unique to the pentium-4s; athlons run it in about 100
> clock cycles (80 on athlon64s).

Uhm, dumb question: why don't we just allocate a few pages of kva per
processor and avoid the mutex, the switchin/out, etc.?  To save KVA?  At 3
pages per processor and a max of 8 processors on intel, that's 96k, or 192k
once you double it for HT.  We can probably spare it; I just saved that much
with some UMA tuning.

What's the largest intel smp system we've run on?  I thought with more than 4
cpus intel had to use funky cache bridges.  Then we'd just need 2x for HT.
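
Something like this hypothetical layout (all names made up for illustration;
nothing here is existing pmap.c code):

        /* Per-cpu zero/copy windows: private pte slot + kva per cpu. */
        struct percpu_cmap {
                pt_entry_t      *cmap;  /* this cpu's private pte slot */
                caddr_t         caddr;  /* kva that the pte maps */
        };
        static struct percpu_cmap pcmap[MAXCPU];

        void
        pmap_zero_page(vm_page_t m)
        {
                struct percpu_cmap *pc;

                critical_enter();       /* stay on this cpu */
                pc = &pcmap[PCPU_GET(cpuid)];
                *pc->cmap = PG_V | PG_RW | VM_PAGE_TO_PHYS(m) | PG_A | PG_M;
                invlpg((u_int)pc->caddr);       /* local flush only: no other
                                                   cpu ever dereferences this
                                                   kva, so remote speculative
                                                   preloads are harmless */
                bzero(pc->caddr, PAGE_SIZE);
                *pc->cmap = 0;          /* stale local tlb entry is replaced
                                           by the invlpg on the next use */
                critical_exit();
        }

No mutex, no switchin/out hook, and exactly one local invlpg per call.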

Cheers,
Jeff

>
> Cheers,
> -Peter
> --
> Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
> "All of this is for nothing if we don't go to the stars" - JMS/B5
>


