Performance of SheevaPlug on 8-stable
tinguely at casselton.net
Sun Mar 7 21:25:43 UTC 2010
FreeBSD-current has kernel and user WITNESS turned on. WITNESS checks lock
ordering, so it should not change the performance of a tight arithmetic loop.
I don't know the Marvell internals, and from what I can tell, their technical
docs require an NDA. That said, many ARM processors also have an internal
instruction prefetch buffer in addition to the instruction cache. I don't
think the prefetch has an enable/disable bit.
It looks from the CPU identification like branch prediction is turned on.
Branch prediction compensates for longer pipelines. I can't see how that
could go astray in a tight loop.
Thus says the ARM ARM:
ARM implementations are free to choose how far ahead of the
current point of execution they prefetch instructions; either
a fixed or a dynamically varying number of instructions. As well
as being free to choose how many instructions to prefetch, an ARM
implementation can choose which possible future execution path to
prefetch along. For example, after a branch instruction, it can
choose to prefetch either the instruction following the branch
or the instruction at the branch target. This is known as branch
prediction.
There are a few dangling data allocations that I would like to see
closed following the multiple kernel allocation fix. *IN THEORY, IF* a page
is allocated via arm_nocache (DMA COHERENT) or sendfile, then it is never
marked as unallocated. *IN THEORY*, if that page is used again, then we
could falsely believe the page is being shared, and we turn off the cache
even though it is not shared.
* Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
the Sheeva implementation. This is a theoretical observation of a side
effect of the multiple kernel mapping patch that we did just before