Code review: groundwork for SMP
rrs at lakerest.net
Fri Jan 29 04:27:08 UTC 2010
Overall, my first reaction to this was: hey, this is
a cool way to do this.
But I have been thinking about this a bit more and I am now
having second thoughts.
So let's see: the old way was to have an array,
pcpup[MAXCPU]. The downside to this is that neighboring entries may
share cache lines between CPUs, and you are limited by MAXCPU, of course.
Now, we can get around the cache line sharing by simply padding
out the structure to the cache line size, so that's probably not a real concern.
MAXCPU is not extendable without a recompile, but that's also
true for your method. Though sexy, your method has a very big downside that
I think calls for us to think real hard before going this route.
It burns up TLB entries. OK, that does not sound so bad on the
surface, but wait, let's think about this. I am going to
speak in terms of the XLR, but other MIPS processors may have
the same issue.
1) I have 8 cores per cpu pack.
2) Each core has 4 "threads", which are kind of like hyper-threads: they have
their own register set, their own everything, except that they share a pipeline
and get scheduled when one of the others is blocked.
3) This means I still need a pcpup per thread.
4) Now, I have 64 TLB entries for every CPU complex. I can split them
16 per thread OR have all 64 shared amongst all threads.
5) This means I dedicate 4 of my 64 TLB entries to your pcpup entries.
Now if I am busy, which is when I need TLB entries most, I am pretty sure
all 4 pcpup entries will need to be active and can't be pulled out. So I
lose over 6% (4 of 64) of my TLB entries to this method.
a) I am still bound to MAXCPU, so there is nothing dynamic here (not
that there needs to be).
b) I lose one of the most precious and scarce resources in a MIPS CPU.
c) And I really don't gain much over just having an array of pcpu
structures indexed by my CPU ID, each padded out to a cache line.
I really would vote that we DO NOT do this. Instead, let's stick with an
array and pad it up to a cache line size if it's not already.
Otherwise you burn up 4 entries for me (and maybe on other platforms)
for very little gain. I would rather find a way to get a superpage that maps
the entire kernel (plus some) and hardwire it in...
I am not trying to be negative here, because I do think it's a real sexy
way to access the pcpus; it makes it real transparent. And on the XLP, which
would have a lot more TLB entries, that's cool. But I have to think of the
existing hardware and its very, very small TLBs, which are one of the
main things limiting performance.
So overall, I say we do NOT do this.
On Jan 28, 2010, at 2:01 PM, Neel Natu wrote:
> Forwarding to freebsd-mips as suggested by Warner.
> This is a patch to provide access to pcpu structures for SMP kernels.
> The basic idea is to use the same virtual address as a window onto
> distinct physical memory locations - one per processor. The physical
> address that you access through this mapping depends on which cpu you
> are currently executing on. We can now use the virtual address on any
> processor to access its per-cpu area.
> The details are:
> 1. The virtual address for 'struct pcpu *pcpup' is obtained by
> stealing 2 pages worth of KVA in pmap_bootstrap().
> 2. The mapping from the constant virtual address to a distinct
> physical page is done in cpu_pcpu_init() through a wired TLB entry.
> 3. A side-effect of this is that we reserve 2 pages worth of memory
> for the pcpu but in reality it needs much less than that. The unused
> memory is now used as the boot stack for the BSP and APs.
> I also cleaned up locore.S to remove the SMP-specific bits from it. I
> plan to use a separate mpboot.S for the AP bootstrap.
> Please review.
> The patch is available here: