Routing benchmarks
Mark Tinguely
tinguely at casselton.net
Tue Sep 9 16:27:37 UTC 2008
(trimmed the thread text)
> >> Running netserver on the gumstix shows a throughput of 2.4Mbit/s. At
> >> the moment I can't get if_bridge to work - will try to figure out what
> >> is going on. A bridging benchmark may be more informative.
> > My experience in working with architectures like this is that cache handling
> > can be a significant cost that doesn't always show up on a profile.
> >
>
> Thanks for the nice idea - will try something similar. At the moment
> I'm also suspecting that cache handling has got a lot to do with the
> performance figures that I'm seeing. The PXA255 has a 32KB data and
> 32KB instruction cache.
>
> Jacques
Which version of FreeBSD are you using? We changed some cache flushing
routines between FreeBSD 7.x and current. Unless errors were introduced
or removed, there should not be that large a change.

As mentioned, the ARM caches are pretty small. The ARM processors before
version 6 (anything before ARM10) use virtually indexed / virtually tagged
caches, so they need to be flushed on context changes.
The version 6 and version 7 ARM processors (ARM10/ARM11) are either virtually
indexed / physically tagged or physically indexed / physically tagged.
The PIPT caches don't need to be flushed on context changes and were needed
for multiple processor support. The pmap code will have to be re-written to
take advantage of the PIPT caches (put a process value into the MMU and remove
most flushes).
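
To make the difference concrete, here is a minimal sketch of a context switch
under the two cache types. It is a toy model, not FreeBSD pmap code: the names
cache_kind, cpu_model, context_id and context_switch are all hypothetical, and
the counter only stands in for a real cache write-back/invalidate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache classification; not a FreeBSD identifier. */
enum cache_kind { CACHE_VIVT, CACHE_PIPT };

/* Toy CPU state: counts how many full cache flushes a workload would cost. */
struct cpu_model {
    enum cache_kind kind;
    unsigned context_id;    /* stands in for the "process value" in the MMU */
    unsigned full_flushes;  /* stands in for write-back + invalidate calls */
};

/* A VIVT cache is tagged by virtual address, so lines left behind by the
 * previous address space could be hit by the next one. */
static int flush_needed_on_switch(enum cache_kind kind)
{
    return kind == CACHE_VIVT;
}

static void context_switch(struct cpu_model *cpu, unsigned next_context_id)
{
    assert(cpu != NULL);
    if (flush_needed_on_switch(cpu->kind))
        cpu->full_flushes++;
    /* With physical tags, loading the new process value is enough. */
    cpu->context_id = next_context_id;
}
```

In this model every switch on the VIVT CPU costs one full flush, while the PIPT
CPU only updates the process value, which is the saving a rewritten pmap would
be after.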
Also, the pre-version 6 ARM processors didn't allow for any spare bits in the
PTE for OS use. The newer processors have a bit or two, still not enough
for FreeBSD's needs, so we need to shadow these bits.
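
Shadowing can be pictured roughly as below: a software-only array sits next to
the hardware table and carries the bits the PTE has no room for. This is only a
sketch under assumptions. The flag names, the l2_table layout and the helper
functions are hypothetical and do not describe the actual FreeBSD structures;
the one fixed fact is that an ARM coarse second-level table has 256 entries.

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRIES 256u  /* entries in one ARM coarse second-level table */

/* Hypothetical OS-only flags that have no home in the hardware PTE. */
#define SHADOW_REFERENCED 0x01u
#define SHADOW_MODIFIED   0x02u
#define SHADOW_WIRED      0x04u

struct l2_table {
    uint32_t hw[L2_ENTRIES];     /* entries the MMU actually walks */
    uint8_t  shadow[L2_ENTRIES]; /* software bits, same index as hw[] */
};

/* Set OS-only flags for one page without touching the hardware entry. */
static void shadow_set(struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(idx < L2_ENTRIES);
    t->shadow[idx] |= flags;
}

/* Clear OS-only flags for one page. */
static void shadow_clear(struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(idx < L2_ENTRIES);
    t->shadow[idx] &= (uint8_t)~flags;
}

/* Return nonzero if all of the given flags are set for the page. */
static int shadow_test(const struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(idx < L2_ENTRIES);
    return (t->shadow[idx] & flags) == flags;
}
```

The cost is the extra memory and the second lookup on every access to those
bits, which is part of why spare PTE bits would be preferable.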
Thirdly, we dynamically allocate a separate structure that mirrors the
page table. I think I have all the "paper scratching" required to move from
this structure to the FreeBSD i386/amd64 recursive page table approach.
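
For readers unfamiliar with the recursive approach, the address arithmetic
looks roughly like this. The sketch assumes the classic non-PAE i386 geometry
(4 KB pages, 1024-entry tables, 4-byte entries). The slot number passed in is a
hypothetical choice, and the function names are illustrative rather than
FreeBSD's. ARM's table layout differs, and how the scheme would be adapted is
not described in this message.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u    /* 4 KB pages */
#define PDR_SHIFT  22u    /* each directory entry maps 4 MB */
#define NPDE       1024u  /* entries in the page directory */
#define PTE_SIZE   4u     /* bytes per entry */

/* If directory slot `slot` points back at the directory itself, every page
 * table appears in one 4 MB virtual window starting here. */
static uint32_t ptmap_base(uint32_t slot)
{
    assert(slot < NPDE);
    return slot << PDR_SHIFT;
}

/* Virtual address at which the PTE for `va` can be read or written. */
static uint32_t pte_vaddr(uint32_t slot, uint32_t va)
{
    return ptmap_base(slot) + (va >> PAGE_SHIFT) * PTE_SIZE;
}

/* The directory is the "page table" that maps the window itself, so the
 * directory entry for `va` is also reachable through the window. */
static uint32_t pde_vaddr(uint32_t slot, uint32_t va)
{
    return pte_vaddr(slot, ptmap_base(slot)) + (va >> PDR_SHIFT) * PTE_SIZE;
}
```

With the last slot (0x3FF) chosen, the window starts at 0xFFC00000 and the
directory itself shows up at 0xFFFFF000, so no separately allocated mirror of
the page table is needed to find an entry.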
--Mark Tinguely.