Performance of SheevaPlug on 8-stable

Bernd Walter ticso at cicely7.cicely.de
Mon Mar 8 19:55:04 UTC 2010


On Mon, Mar 08, 2010 at 01:37:23PM -0600, Mark Tinguely wrote:
> 
> 	<deleted>
> >
> >  This puzzled me as well.
> >  What is the requirement for such a handling with shared pages?
> >  I thought handing over shared data is done by cache flushes, barriers, or
> >  whatever an architecture has for this.
> >  Most systems we talk about are single CPU, so it is just DMA and
> >  handing over dcache writes to icache, but we don't support self
> >  modifying code, so it is always done in a controlled way.
> >  And even for SMP systems handing over data requires using
> >  cache coherence mechanisms - e.g. those embedded in mutexes.
> >  So what is wrong in my picture and requires us to do special handling
> >  for shared pages on ARM?
> >
> >  > And if there's only one copy of 'test' running, why does it hit the
> >  > 'shared' case for this code?
> >  > 
> >  > Warner
> >
> >  -- 
> >  B.Walter <bernd at bwct.de> http://www.bwct.de
> >  Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
> 
> ARMv4/ARMv5 use virtually indexed / virtually tagged level one caches.
> They may or may not have level two caches. These are the ARM chips
> that we currently support, and I will explain the rules below.
> 
> Newer processors: the ARMv6 can have virtually indexed / physically tagged
> or physically indexed / physically tagged level one caches; the ARMv7 must
> have physically indexed / physically tagged level one caches. The ARMv6
> and ARMv7 have more pde/pte bits describing the cache status of the
> "inner" and "outer" caches. The ARMv7 has the more mature cache
> management; it defines the "point of unification" and "point of coherency"
> for the caches. There is also a snoop level for the ARMv7 multi-core,
> that I will just dance around. A PIPT cache must be cleaned to the
> "point of coherency" before DMA and when modified from another process -
> think of a debugger in another address space modifying instruction code.
> ARMv6/ARMv7 have ASIDs (address space identifiers) to avoid TLB flushes.
> If they are not used, then TLBs have to be flushed on context switch.
> This is close to i386/amd64, with the exception of DMA: the i386/amd64
> have self-snooping cache buses.
> 
> VIVT cache rules:
> 
>  1) flush cache and tlb on context change. 
> 
>  2) USER cache must be disabled if a physical page has AT LEAST one writable
>     user mapping AND is also mapped more than once in the same user
>     address space. (Multiple read mappings and no writes are fine; they just
>     take up multiple cache entries. Obviously, a single read or a single
>     write mapping is fine. If the mappings are in different user address
>     spaces, we are okay because the flush on context change will sync
>     things up.)
> 
>  3) KERNEL spaces are global.
> 	a) If the page is mapped writable AT LEAST ONCE to a kernel space
> 	   AND the page is mapped more than once, no matter if the second
> 	   mapping is in the user or kernel space, all mappings must not
> 	   be cached. 

I never assumed we could be happy without a direct map.

> 	b) If the page has only readable kernel mappings but at least one
> 	   writable user mapping, the cache must be disabled for the mappings
> 	   of the page in this address space. This is slightly different from
> 	   rule 2. Kernel mappings are typically writable, so this is a
> 	   case that really does not happen.
> 
> It gets a little tricky to implement, because we have to catch the transition
> from cache -> non-cache (change pte and wbinv/inv data or instruction caches)
> and from non-cache -> cache (change the pte).

Thanks for the detailed explanation.
It took a while, but now I get it.
My picture didn't include caches indexed and tagged by virtual address.

-- 
B.Walter <bernd at bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

