Code review: groundwork for SMP

Neel Natu neelnatu at gmail.com
Fri Jan 29 15:25:32 UTC 2010


Thanks Juli, Randall and JC for the comments.

I think it is fair to ask that we don't burn another TLB entry to
store the pcpu data. It may help if I go through the options I
considered before settling on this one:

- The first thing I investigated was using per-cpu scratch registers,
but the Sibyte did not have any and they are not part of the MIPS
architecture.

- The second thing I considered was using a platform-specific
getcpuid() to index into the struct pcpu pcpu[MAXCPU] array to compute
the KSEG0 address of pcpu at runtime. However, this turned out to be a
bit messy because there are consumers of getcpuid() in exception
context, where we are restricted to using only k0 and k1 (and sometimes
only one of them). Also, as Juli pointed out, getcpuid() is slow on
some CPUs, and I did not want to assume that getcpuid() could be
written using a single k0/k1 register.
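
For reference, a rough sketch of what the second option amounts to in C.
The MAXCPU value, the struct pcpu fields, fake_cpuid and pcpu_lookup()
are illustrative assumptions, and getcpuid() here is a stand-in for the
platform-specific routine, not its real implementation:

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPU 4			/* illustrative value only */

/* Cut-down stand-in for the kernel's struct pcpu. */
struct pcpu {
	uint32_t pc_cpuid;
	void *pc_curthread;
};

static struct pcpu pcpu[MAXCPU];

/*
 * Hypothetical stand-in for the platform-specific getcpuid().
 * On real hardware this would read a CPU-specific register or
 * device, which is where the cost and the register pressure lie.
 */
static unsigned int fake_cpuid;

static unsigned int
getcpuid(void)
{
	return (fake_cpuid);
}

/* Compute the address of the current CPU's pcpu at run time. */
static struct pcpu *
pcpu_lookup(void)
{
	unsigned int id = getcpuid();

	assert(id < MAXCPU);
	return (&pcpu[id]);
}
```

In C this looks trivial. The trouble is the equivalent assembly in
exception context, where both the getcpuid() sequence and the array
index computation have to fit in k0 and k1.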

So, having the pcpu pointer in a TLB entry divorces us from any
assumptions about the CPU we are running on.

I think that there is a legitimate concern about this on the XLR, but
given that you are sharing the TLB among 4 threads, I think the bigger
issue is the wired kstack entries, which you need to solve before even
thinking about pcpu mapping.

I did not consider the approach suggested by Juli where the pcpu and
kstack pointers can be stashed in a single wired TLB entry. I need
some time to chew on it and prototype it.
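
As a starting point, here is a hosted C model of one reading of that
suggestion: a small structure at a fixed virtual address holding both
pointers, with each CPU's wired TLB entry backing that address with its
own page. All names here (struct wired_page, wp_kstack, wp_pcpu, the
sim_* helpers, KSTACK_SIZE) are hypothetical, and the wired_va pointer
only simulates what the TLB would do on real hardware:

```c
#include <assert.h>
#include <stddef.h>

#define MAXCPU 2			/* illustrative value only */
#define KSTACK_SIZE 8192		/* illustrative, not KSTACK_PAGES */

/* Cut-down stand-in for the kernel's struct pcpu. */
struct pcpu {
	unsigned int pc_cpuid;
};

/* Hypothetical contents of the single per-CPU wired page. */
struct wired_page {
	void *wp_kstack;		/* first slot: kstack address */
	struct pcpu *wp_pcpu;		/* second slot: pcpu base */
};

static struct pcpu pcpus[MAXCPU];
static char kstacks[MAXCPU][KSTACK_SIZE];
static struct wired_page backing[MAXCPU];

/*
 * Models the fixed virtual address. Which backing page it reaches
 * depends on the wired TLB entry of the CPU doing the access.
 */
static struct wired_page *wired_va;

/* Fill in each CPU's backing page once at startup. */
static void
sim_init(void)
{
	for (unsigned int i = 0; i < MAXCPU; i++) {
		pcpus[i].pc_cpuid = i;
		backing[i].wp_kstack = kstacks[i];
		backing[i].wp_pcpu = &pcpus[i];
	}
}

/* Simulate "now executing on this CPU" by switching the mapping. */
static void
sim_run_on(unsigned int cpu)
{
	assert(cpu < MAXCPU);
	wired_va = &backing[cpu];
}

/* Both lookups are one load at a compile-time constant offset. */
static void *
cur_kstack(void)
{
	return (wired_va->wp_kstack);
}

static struct pcpu *
cur_pcpu(void)
{
	return (wired_va->wp_pcpu);
}
```

Neither lookup calls getcpuid() or indexes an array, which is the
property being discussed; the cost is one extra load compared with
having pcpu itself at a fixed address.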

I would still like to commit this so as to keep making progress on the
SMP support. This is a small piece of the bigger goal of getting SMP
functional and can be replaced in the future if need be.

best
Neel

On Thu, Jan 28, 2010 at 10:42 PM, Juli Mallett <jmallett at freebsd.org> wrote:
> On Thu, Jan 28, 2010 at 21:28, Randall Stewart <rrs at lakerest.net> wrote:
>>> [ Using a single wired TLB entry for kstack and pcpu ]
>>
>> Which means you have a big array that you are offsetting.
>
> Not really — you can have a structure at 0xc000000000000000u (or the
> same >> 32) with two pointers in it, even, one to pcpu and one to
> KSTACK_PAGES direct-mapped, contiguous pages.  Then you can load the
> kstack address or the pcpu base very quickly.  Of course, you can even
> have a single wired entry consisting of the pcpu data and then put a
> pointer to the top of the kstack in it.  I don't think you can get by
> with no wired TLB entries, but you also don't have to index a big
> array.  The nice thing about setting up a per-CPU TLB entry (you have
> to set up at least one, the kstack, in order to be able to handle
> exceptions) is that then you need only access offsets into it that are
> known at compile time and constant no matter what CPU you're running
> on.  Load the kstack by doing "ld sp, 0(0xc...)" and load the pcpu
> address by doing "ld t0, 8(0xc....)".  Two wired entries lets you get
> rid of the indirection, but you can get by with one and still not have
> to do (1) run-time computation of the index into some array (2)
> possibly very expensive getting of the cpuid.
>
>> I was even thinking get a LARGE entry.. one that is say 8 Meg
>> for the kernel.. covering all text/data etc... with this
>> new super page stuff. of course I have never looked into how
>> it's implemented..
>
> That would be easy to do, but what would be the benefits of accessing
> that data through a wired TLB entry instead of the direct map?
>
>> Yes, you pay an index reference for every access .. or at
>> least one to set up a pointer.. but I think that it is much cheaper
>> than a TLB miss... (words for imp to think about)...
>
> Yes, TLB misses are very slow.  Your desire to avoid adding another
> wired entry seems pretty reasonable.  I think that using a single
> wired TLB entry for a mux or for both the kstack and pcpu is easy and
> usable.  I feel like just wiring the kstack and putting a
> direct-mapped, sometimes-recomputed pointer to the pcpu into gp is the
> best combination in the long run — even just loading an immediate
> 64-bit address is pretty slow wrt how often things in the PCPU are
> accessed in SMP kernels.
>
> Juli.
>

