cvs commit: src/sys/ia64/ia64 exception.S interrupt.c machdep.c mp_machdep.c pmap.c trap.c vm_machdep.c src/sys/ia64/include proc.h smp.h

Doug Rabson dfr at nlsystems.com
Sun Aug 7 09:02:49 GMT 2005


Excellent! When trying to think about per-cpu VHPT in the past, I could 
never quite see how to handle the collision chains sanely. The solution 
described below seems ideal.

On Saturday 06 August 2005 21:28, Marcel Moolenaar wrote:
> marcel      2005-08-06 20:28:19 UTC
>
>   FreeBSD src repository
>
>   Modified files:
>     sys/ia64/ia64        exception.S interrupt.c machdep.c
>                          mp_machdep.c pmap.c trap.c vm_machdep.c
>     sys/ia64/include     proc.h smp.h
>   Log:
>   Improve SMP support:
>   o  Allocate a VHPT per CPU. The VHPT is a hash table that the CPU
>      uses to look up translations it can't find in the TLB. As such,
>      the VHPT serves as a level 1 cache (the TLB being a level 0
>      cache) and best results are obtained when it's not shared between
>      CPUs. The collision chain (i.e. the hash bucket) is shared between
>      CPUs, as all buckets together constitute our collection of PTEs. To
>      achieve this, the collision chain does not point to the first PTE in
>      the list anymore, but to a hash bucket head structure. The head
>      structure contains the pointer to the first PTE in the list, as well
>      as a mutex to lock the bucket. Thus, each bucket is locked
>      independently of the others. With at least 1024 buckets in the VHPT,
>      this provides sufficiently fine-grained locking to make the
>      solution scalable to large SMP machines.
>   o  Add synchronisation to the lazy FP context switching. We do this
>      with a separate per-thread lock. On SMP machines, lazy high FP
>      context switching without synchronisation caused inconsistent
>      state, which resulted in a panic. Since use of the high FP
>      registers is not common, it's possible that races still exist. The
>      ia64 package build has proven to be a good stress test, so this will
>      get plenty of exercise in the near future.
>   o  Don't use the local ID of the processor we want to send the IPI
>      to as the argument to ipi_send(); use the struct pcpu pointer
>      instead. The reason for this is that IPI delivery is unreliable. It
>      has been observed that sending an IPI to a CPU causes it to receive a
>      stray external interrupt. As such, we need a way to make the delivery
>      reliable. The intended solution is to queue requests in the target
>      CPU's per-CPU structure and use a single IPI to inform the CPU that
>      there's a new entry in the queue. If that IPI gets lost, the CPU can
>      check its queue at any convenient time (such as on each clock
>      interrupt). This also allows us to send requests to a CPU without
>      interrupting it, if that would be beneficial.
>
>   With these changes SMP is almost working. There are still some
>   random process crashes, and the machine can hang when the IPI that
>   handles the high FP context switch is lost.
>
>   The overhead of introducing the hash bucket head structure results
>   in a performance degradation of about 1% for UP (extra pointer
>   indirection). This is surprisingly small and is offset by gaining
>   reasonably good, scalable SMP support.
>
>   Revision  Changes    Path
>   1.57      +8 -0      src/sys/ia64/ia64/exception.S
>   1.50      +5 -0      src/sys/ia64/ia64/interrupt.c
>   1.201     +30 -13    src/sys/ia64/ia64/machdep.c
>   1.56      +29 -25    src/sys/ia64/ia64/mp_machdep.c
>   1.161     +227 -272  src/sys/ia64/ia64/pmap.c
>   1.114     +12 -7     src/sys/ia64/ia64/trap.c
>   1.91      +1 -0      src/sys/ia64/ia64/vm_machdep.c
>   1.15      +2 -1      src/sys/ia64/include/proc.h
>   1.10      +4 -2      src/sys/ia64/include/smp.h

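To make the first item concrete, here is a minimal, userspace C sketch of the arrangement it describes: each CPU has its own VHPT array, but entry i on every CPU points at the same shared bucket head, which holds the first PTE and its own lock. Everything here is a hypothetical illustration, not the pmap.c code: the names (struct bucket_head, pte_insert, pte_lookup), the page size, the CPU count and the hash function are assumptions, and a C11 atomic_flag spinlock stands in for the kernel mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters; the log only says "at least 1024 buckets". */
#define PAGE_SHIFT 13
#define PAGE_MASK  ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define NBUCKETS   1024
#define NCPUS      4

/* Spinlock standing in for the per-bucket kernel mutex. */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin until the holder clears the flag */
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A PTE on a collision chain (hypothetical layout). */
struct pte {
    uint64_t va;        /* page-aligned virtual address */
    uint64_t pa;        /* physical address */
    struct pte *next;   /* next PTE in the same bucket */
};

/* Bucket head: shared by all CPUs, each bucket with its own lock. */
struct bucket_head {
    spinlock_t lock;
    struct pte *first;
};

/* Per-CPU VHPT entry. A real entry also caches a translation; this
 * sketch keeps only the chain field, which points at a bucket head. */
struct vhpt_entry {
    struct bucket_head *chain;
};

static struct bucket_head buckets[NBUCKETS];      /* shared by all CPUs */
static struct vhpt_entry vhpt[NCPUS][NBUCKETS];   /* one table per CPU */

/* Hypothetical hash; the real one is defined by the architecture. */
static size_t vhpt_hash(uint64_t va) {
    return (size_t)((va >> PAGE_SHIFT) % NBUCKETS);
}

static void vhpt_init(void) {
    for (size_t i = 0; i < NBUCKETS; i++) {
        atomic_flag_clear(&buckets[i].lock.held);
        buckets[i].first = NULL;
        for (int cpu = 0; cpu < NCPUS; cpu++)
            vhpt[cpu][i].chain = &buckets[i];   /* entry i on every CPU -> bucket i */
    }
}

static void pte_insert(struct pte *pte) {
    struct bucket_head *b = &buckets[vhpt_hash(pte->va)];
    spin_lock(&b->lock);        /* only this one bucket is locked */
    pte->next = b->first;
    b->first = pte;
    spin_unlock(&b->lock);
}

static struct pte *pte_lookup(int cpu, uint64_t va) {
    assert(cpu >= 0 && cpu < NCPUS);
    /* The extra indirection: VHPT entry -> bucket head -> first PTE. */
    struct bucket_head *b = vhpt[cpu][vhpt_hash(va)].chain;
    struct pte *p;
    spin_lock(&b->lock);
    for (p = b->first; p != NULL; p = p->next)
        if (p->va == (va & ~PAGE_MASK))
            break;
    spin_unlock(&b->lock);
    return p;
}
```

A PTE inserted through one CPU's view is found through any other CPU's table, because both tables lead to the same bucket head. The pointer hop in pte_lookup is the extra indirection the log blames for the roughly 1% UP slowdown.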
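The second item adds a per-thread lock around lazy switching of the high FP registers. Below is a sequential, userspace C sketch of one way such a protocol can look: a thread records which CPU holds its live high FP state, and the lock guards that ownership while the state is saved or loaded. The struct fields and function names are hypothetical; double stands in for the 82-bit FP registers; and where the kernel would need an IPI to make another CPU save its registers, this simulation simply copies them directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define NHIGHFP 96   /* f32..f127 */

typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct cpu;

struct thread {
    spinlock_t fp_lock;        /* hypothetical per-thread FP lock */
    struct cpu *fp_owner;      /* CPU holding our live high FP state, or NULL */
    double saved[NHIGHFP];     /* in-memory copy of the high FP registers */
};

struct cpu {
    struct thread *fp_thread;  /* thread whose state is live here, or NULL */
    double regs[NHIGHFP];      /* stand-in for the high FP register file */
};

static void thread_init(struct thread *td) {
    atomic_flag_clear(&td->fp_lock.held);
    td->fp_owner = NULL;
    memset(td->saved, 0, sizeof td->saved);
}

/* Write td's live high FP state (if any) back to memory. */
static void high_fp_save(struct thread *td) {
    spin_lock(&td->fp_lock);
    struct cpu *owner = td->fp_owner;
    if (owner != NULL) {
        /* In a kernel, if owner is another CPU this step needs an IPI;
         * this sequential simulation just copies the registers. */
        memcpy(td->saved, owner->regs, sizeof td->saved);
        owner->fp_thread = NULL;
        td->fp_owner = NULL;
    }
    spin_unlock(&td->fp_lock);
}

/* Called when td touches the (disabled) high FP registers on CPU c. */
static void high_fp_fault(struct cpu *c, struct thread *td) {
    assert(c != NULL && td != NULL);
    if (c->fp_thread == td)
        return;                        /* state already live on this CPU */
    if (c->fp_thread != NULL)
        high_fp_save(c->fp_thread);    /* evict the previous owner */
    high_fp_save(td);                  /* pull td's state off any other CPU */
    spin_lock(&td->fp_lock);
    memcpy(c->regs, td->saved, sizeof c->regs);
    c->fp_thread = td;
    td->fp_owner = c;
    spin_unlock(&td->fp_lock);
}
```

The case the lock is there for is migration: a thread whose state is still live on CPU 0 faults on CPU 1, and its state must be written back before CPU 1 loads it. That cross-CPU save is also the step that depends on the IPI the log mentions as occasionally lost.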
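The third item describes the intended fix for unreliable IPI delivery: queue requests in the target CPU's per-CPU structure, send a single IPI to announce them, and let the target drain the queue from either the IPI handler or a periodic point such as the clock interrupt. A minimal C sketch of that idea follows. struct pcpu_sketch is a hypothetical stand-in, not FreeBSD's struct pcpu. The names cpu_request, cpu_drain and req_count, and the ipis_sent counter that replaces a real hardware IPI, are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag held; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct request {
    void (*fn)(void *);        /* work to run on the target CPU */
    void *arg;
    struct request *next;
};

/* Hypothetical per-CPU request queue. */
struct pcpu_sketch {
    spinlock_t rq_lock;
    struct request *rq_head, *rq_tail;   /* FIFO of pending requests */
    atomic_bool ipi_pending;             /* IPI sent but not yet handled */
    atomic_uint ipis_sent;               /* simulation: counts "hardware" IPIs */
};

static void pcpu_init(struct pcpu_sketch *pc) {
    atomic_flag_clear(&pc->rq_lock.held);
    pc->rq_head = pc->rq_tail = NULL;
    atomic_init(&pc->ipi_pending, false);
    atomic_init(&pc->ipis_sent, 0);
}

/* Sender: enqueue, then raise at most one IPI per batch (if wanted). */
static void cpu_request(struct pcpu_sketch *pc, struct request *r, bool interrupt) {
    assert(pc != NULL && r != NULL && r->fn != NULL);
    r->next = NULL;
    spin_lock(&pc->rq_lock);
    if (pc->rq_tail != NULL)
        pc->rq_tail->next = r;
    else
        pc->rq_head = r;
    pc->rq_tail = r;
    spin_unlock(&pc->rq_lock);
    if (interrupt && !atomic_exchange(&pc->ipi_pending, true))
        atomic_fetch_add(&pc->ipis_sent, 1);   /* stand-in for sending the IPI */
}

/* Target: run everything queued. Safe to call from the IPI handler and
 * from the clock interrupt, so a lost IPI only delays the work. */
static int cpu_drain(struct pcpu_sketch *pc) {
    atomic_store(&pc->ipi_pending, false);     /* later requests raise a new IPI */
    spin_lock(&pc->rq_lock);
    struct request *r = pc->rq_head;
    pc->rq_head = pc->rq_tail = NULL;
    spin_unlock(&pc->rq_lock);
    int n = 0;
    while (r != NULL) {
        struct request *next = r->next;        /* read before fn may reuse r */
        r->fn(r->arg);
        r = next;
        n++;
    }
    return n;
}

/* Example request handler: increments an int counter. */
static void req_count(void *arg) {
    ++*(int *)arg;
}
```

Two requests sent back to back cost only one IPI, and a request queued with interrupt set to false is still picked up on the next drain, matching the "send requests without interrupting it" remark. Clearing ipi_pending before taking the queue means a request that races with the drain either gets run now or triggers a fresh IPI, never neither.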

More information about the cvs-all mailing list