Code review: groundwork for SMP

Fri Jan 29 05:07:53 UTC 2010

On Thu, Jan 28, 2010 at 20:26, Randall Stewart <rrs at lakerest.net> wrote:
> It burns up TLB entries.  Ok that does not sound so bad on the
> surface, but wait, let's think about this... and I am going to
> speak in terms of XLR... but other mips processors may have
> the same issue.
>
> 1) I have 8 cores per cpu pack.
> 2) Each core has 4 "threads" which are kinda hyper threads, their
>   own register set, their own everything except they share a pipeline
>   and get scheduled when one of the others is blocked.
> 3) This means I still need a pcpup per thread.
> 4) Now I have 64 TLB entries for every CPU complex. I can have them
>   16 per thread OR 64 shared amongst all threads.
> 5) This means I dedicate 4 of my 64 TLB entries for your pcpup entries.

So on your systems threads share the TLB?  Wired TLB entries can't be
pulled out (in the case of the kernel stack it's basically
catastrophic for that to happen.)  A compromise if your TLB entries
are really at a premium is to use a single large entry (using, say, a
single 32k page) that contains both PCPU and the kernel stack, or a
page which has pointers to pcpu data, the kernel stack, etc.  I seem
to recall seeing a port of FreeBSD that used the same storage for the
kernel stack and PCPU data, but I could be mistaken.
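
As a rough sketch of the single-large-entry idea, assuming a 32k region
with the PCPU data at the base and the kernel stack growing down from
the top (the type and function names here are hypothetical, not taken
from any FreeBSD port):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the one wired mapping (the "single 32k page"). */
#define PCPU_PAGE_SIZE (32u * 1024u)

/* Hypothetical per-CPU header; the fields are illustrative only. */
struct pcpu_hdr {
	uint32_t cpuid;
	void *curthread;
};

/*
 * One region covered by a single wired TLB entry: the per-CPU header
 * sits at the base, and the rest of the region is kernel stack.
 */
union pcpu_page {
	struct pcpu_hdr pcpu;
	unsigned char bytes[PCPU_PAGE_SIZE];
};

/* The per-CPU data lives at the start of the region. */
static struct pcpu_hdr *
page_pcpu(union pcpu_page *pg)
{
	assert(pg != NULL);
	return (&pg->pcpu);
}

/* The initial stack pointer is the end of the region (stack grows down). */
static unsigned char *
page_kstack_top(union pcpu_page *pg)
{
	assert(pg != NULL);
	return (pg->bytes + PCPU_PAGE_SIZE);
}

/* Stack space left over once the per-CPU header is accounted for. */
static size_t
page_kstack_size(void)
{
	return (PCPU_PAGE_SIZE - sizeof(struct pcpu_hdr));
}
```

The cost of this layout is that a stack overflow runs straight into the
PCPU data, so a real implementation would probably want a guard of some
kind between the two.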

There are other trade-offs available, of course.  If we don't use the
gp for accessing small data, we can keep a pointer to the pcpu data of
a CPU in gp whenever the kernel is running, and then PCPU accesses are
just a matter of loading from offset+gp, which is very quick: faster
than the wired TLB entry mechanism, unless you use a virtual address
for the pcpu in which case it can be painful.  As there are more
things like VIMAGE, the amount of small global data in the kernel is
going to fall and making gp a pcpu pointer makes more sense.  My old
port used -G0 and I still disable use of the gp in my non-FreeBSD MIPS
work. I think NetBSD used to, but I haven't noticed what FreeBSD does.
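
To illustrate the gp approach, here is a minimal sketch in portable C.
It is only a model: sim_gp is a plain variable standing in for the gp
register, and struct pcpu, pcpu_install() and PCPU_GET() are
hypothetical names with made-up fields, not the actual FreeBSD
definitions.  In a real kernel built with -G0 the base would be the
hardware register itself and each access would compile to a single
load at a fixed offset from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU structure; field names are illustrative only. */
struct pcpu {
	uint32_t pc_cpuid;
	void *pc_curthread;
	uintptr_t pc_kstack_top;
};

/*
 * Stand-in for the gp register.  This is an assumption made so the
 * sketch can run in userland; a kernel would reserve the register.
 */
static struct pcpu *sim_gp;

/* Done once per CPU on kernel entry: point "gp" at this CPU's data. */
static void
pcpu_install(struct pcpu *p)
{
	assert(p != NULL);
	sim_gp = p;
}

/*
 * A PCPU access is a load from offsetof(struct pcpu, field) + gp.
 * No TLB entry is involved if gp holds an unmapped (e.g. KSEG0) address.
 */
#define PCPU_GET(field)		(sim_gp->field)
#define PCPU_SET(field, v)	(sim_gp->field = (v))
```

Switching CPUs in this model is just a matter of installing a different
pointer, after which PCPU_GET(pc_cpuid) reads the other CPU's copy.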

More curiosity than anything (since I don't seem to be able to get an
RMI system to develop on): if the threads are sharing the TLB, how are
updates to TLB-related fields synchronized?  How do you atomically
increase the wired count of the TLB?  How does 'tlbwr' work?  Do you
have to use special instructions when you're sharing the TLB that are
XLR-specific?

Juli.