Routing benchmarks

Tue Sep 9 16:27:37 UTC 2008

(trimmed the thread text)

>  >> Running netserver on the gumstix shows a throughput of 2.4Mbit/s. At
>  >> the moment I can't get if_bridge to work - will try to figure out what
>  >> is going on. A bridging benchmark may be more informative.

>  > My experience in working with architectures like this is that cache handling
>  > can be a significant cost that doesn't always show up on a profile.
>  >
>
>  Thanks for the nice idea - will try something similar. At the moment
>  I'm also suspecting that cache handling has got a lot to do with the
>  performance figures that I'm seeing. The PXA255 has a 32KB data and
>  32KB instruction cache.
>
>  Jacques

Which version of FreeBSD are you using? We changed some cache-flushing
routines between FreeBSD 7.x and -CURRENT. Unless errors were introduced
or removed, there should not be that large a change in performance.

As mentioned, the ARM caches are pretty small. ARM processors before
architecture version 6 (anything before the ARM11) use virtually indexed /
virtually tagged (VIVT) caches, so the caches need to be flushed on context
switches.
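To make the flush requirement concrete, here is a toy model. It is not FreeBSD or ARM code; the direct-mapped 64-line, 32-byte-line geometry and all names are made up for illustration. The same lookup is keyed either by virtual address (VIVT) or by physical address (PIPT). When two processes map the same virtual address to different physical pages, the VIVT version returns a stale hit after a context switch unless it is flushed; the PIPT version cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy geometry: direct-mapped, 64 lines of 32 bytes. */
#define LINE_SHIFT 5
#define NLINES     64

struct cache_line {
    bool     valid;
    uint32_t tag;   /* full line address, kept simple for the example */
};

struct cache {
    struct cache_line lines[NLINES];
};

/* The key is the virtual address for a VIVT cache and the physical
 * address for a PIPT cache; the lookup logic is otherwise identical. */
static bool cache_lookup(const struct cache *c, uint32_t key)
{
    assert(c != NULL);
    const struct cache_line *l = &c->lines[(key >> LINE_SHIFT) % NLINES];
    return l->valid && l->tag == (key >> LINE_SHIFT);
}

static void cache_fill(struct cache *c, uint32_t key)
{
    assert(c != NULL);
    struct cache_line *l = &c->lines[(key >> LINE_SHIFT) % NLINES];
    l->valid = true;
    l->tag = key >> LINE_SHIFT;
}

/* What a VIVT cache needs on every context switch: drop everything. */
static void cache_flush(struct cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}
```

For example, process A fills VA 0x8000 (PA 0x100000); then process B runs with VA 0x8000 mapped to PA 0x200000. Without a flush, the VIVT lookup of 0x8000 still hits A's line, while the PIPT lookup of 0x200000 correctly misses.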

The version 6 and version 7 ARM processors (ARM11 and Cortex-A) have caches
that are either virtually indexed / physically tagged (VIPT) or physically
indexed / physically tagged (PIPT). PIPT caches don't need to be flushed on
context switches, and they are needed for multiprocessor support. The pmap
code will have to be rewritten to take advantage of them: load a per-process
value (the address space identifier) into the MMU and remove most of the
flushes.
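The "process value" is presumably the address space identifier (ASID) that ARMv6 and later let the OS load into the MMU. A toy TLB model (names and sizes are hypothetical, not the FreeBSD pmap) shows why it removes most flushes: each entry is tagged with the ASID that created it, so a context switch only changes the current ASID, and another process's entries simply fail to match instead of having to be thrown away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* hypothetical, tiny for illustration */

struct tlb_entry {
    bool     valid;
    uint8_t  asid;  /* which address space this translation belongs to */
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

struct mmu {
    struct tlb_entry tlb[TLB_ENTRIES];
    uint8_t cur_asid;  /* the per-process value loaded on a context switch */
};

/* Context switch: load the new process's ASID; no TLB or cache flush. */
static void mmu_switch(struct mmu *m, uint8_t asid)
{
    assert(m != NULL);
    m->cur_asid = asid;
}

/* Install a translation tagged with the current ASID (direct-mapped by vpn). */
static void tlb_insert(struct mmu *m, uint32_t vpn, uint32_t pfn)
{
    assert(m != NULL);
    struct tlb_entry *e = &m->tlb[vpn % TLB_ENTRIES];
    e->valid = true;
    e->asid = m->cur_asid;
    e->vpn = vpn;
    e->pfn = pfn;
}

/* Hit only if the entry belongs to the running address space. */
static bool tlb_lookup(const struct mmu *m, uint32_t vpn, uint32_t *pfn)
{
    assert(m != NULL && pfn != NULL);
    const struct tlb_entry *e = &m->tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->asid == m->cur_asid && e->vpn == vpn) {
        *pfn = e->pfn;
        return true;
    }
    return false;
}
```

For instance, process 1 installs VPN 8; after switching to process 2 a lookup of VPN 8 misses, and after switching back to process 1 it hits again with no flush in between. Real hardware also has global (non-ASID) entries for kernel mappings and only a limited number of ASIDs to recycle, which this sketch ignores.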

Secondly, pre-version-6 ARM processors leave no spare bits in the PTE for OS
use. The newer processors have a bit or two, but that is still not enough for
FreeBSD's needs, so we need to shadow these bits in software.
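One way to shadow the bits, sketched under assumptions: keep a parallel, software-only array next to each hardware L2 table, one byte per PTE, so the MMU walks only the hardware entries while the OS keeps its extra flags in the shadow array. The flag names are hypothetical; the 256-entry size matches an ARM coarse second-level table (1 KB, mapping 1 MB in 4 KB pages).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPTE 256  /* entries in one ARM coarse L2 table */

typedef uint32_t pt_entry_t;  /* hardware PTE: no bits to spare for the OS */

/* Hypothetical software-only flags the OS wants per mapping. */
#define PTE_SW_WIRED 0x01u
#define PTE_SW_REF   0x02u
#define PTE_SW_MOD   0x04u

struct l2_table {
    pt_entry_t hw[NPTE];      /* walked by the MMU */
    uint8_t    shadow[NPTE];  /* OS bookkeeping, never seen by hardware */
};

static void pte_sw_set(struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(t != NULL && idx < NPTE);
    t->shadow[idx] |= flags;
}

static void pte_sw_clear(struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(t != NULL && idx < NPTE);
    t->shadow[idx] &= (uint8_t)~flags;
}

static bool pte_sw_test(const struct l2_table *t, unsigned idx, uint8_t flags)
{
    assert(t != NULL && idx < NPTE);
    return (t->shadow[idx] & flags) != 0;
}
```

The cost is one extra byte per mapping, plus keeping the two arrays in step whenever a PTE is created or destroyed.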

Thirdly, we dynamically allocate a separate structure that mirrors the
page table. I think I have done all the "paper scratching" required to move
from this structure to the recursive page table approach that FreeBSD uses
on i386/amd64.
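For readers unfamiliar with the recursive approach, here is the address arithmetic for the two-level, non-PAE i386 case, as a sketch. One page-directory slot (called `recursive_slot` here, a hypothetical name) points back at the page directory itself, so all page tables appear as one contiguous 4 MB window of virtual memory and the PTE for any address can be found by arithmetic alone, with no separate mirror structure. ARM's tables have different sizes (4096-entry L1, 256-entry coarse L2), so this exact arithmetic would not carry over unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KB pages */
#define PDR_SHIFT  22   /* each non-PAE i386 PDE maps 4 MB */
#define NPDE       1024 /* entries in the page directory */

typedef uint32_t pt_entry_t;

/* Virtual address of the PTE mapping va, when PD slot recursive_slot
 * points at the page directory itself. */
static uint32_t pte_vaddr(uint32_t recursive_slot, uint32_t va)
{
    assert(recursive_slot < NPDE);
    uint32_t ptmap = recursive_slot << PDR_SHIFT;  /* base of the 4 MB window */
    return ptmap + (va >> PAGE_SHIFT) * (uint32_t)sizeof(pt_entry_t);
}

/* Virtual address of the PDE mapping va: the page directory itself shows
 * up as the page at index recursive_slot inside that window. */
static uint32_t pde_vaddr(uint32_t recursive_slot, uint32_t va)
{
    assert(recursive_slot < NPDE);
    uint32_t ptd = (recursive_slot << PDR_SHIFT) | (recursive_slot << PAGE_SHIFT);
    return ptd + (va >> PDR_SHIFT) * (uint32_t)sizeof(pt_entry_t);
}
```

For example, with the recursive slot at 0x300 (window at 0xC0000000), the PTE for 0x08048000 sits at 0xC0020120 and its PDE at 0xC0300080.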

--Mark Tinguely.