Some performance measurements on the FreeBSD network stack

Mon Apr 23 13:55:15 UTC 2012

Thus spake Bruce Evans <brde at optusnet.com.au>:

> On Fri, 20 Apr 2012, K. Macy wrote:
>
>> On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
>
>>> The small penalty when flowtable is disabled but compiled in is
>>> probably because the net.flowtable.enable flag is checked
>>> a bit deep in the code.
>>>
>>> The advantage with non-connect()ed sockets is huge. I don't
>>> quite understand why disabling the flowtable still helps there.
>>
>> Do you mean having it compiled in but disabled still helps
>> performance? Yes, that is extremely strange.
>
> This reminds me that when I worked on this, I saw very large throughput
> differences (in the 20-50% range) as a result of minor changes in
> unrelated code.  I could get these changes intentionally by adding or
> removing padding in unrelated unused text space, so the differences were
> apparently related to text alignment.  I thought I had some significant
> micro-optimizations, but it turned out that they were acting mainly by
> changing the layout in related used text space where it is harder to
> control.

For short code paths, code layout can significantly influence
performance. In a project unrelated to FreeBSD, we were puzzled by a
10% performance drop in a microbenchmark. It was ultimately caused by
all our code hotspots being linked at 8K-aligned addresses, so that
they mapped to the same sets of the L1 instruction cache and evicted
each other, because the cache's associativity was too low.
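
To illustrate the eviction effect described above, here is a minimal
sketch that computes which cache set an address falls into. The cache
geometry (32 KiB, 4-way, 64-byte lines) and the hotspot addresses are
assumptions chosen for illustration; the post does not say which CPU
was involved. With this geometry, addresses 8 KiB apart share a set,
so a fifth hotspot in that set must evict one of the other four.

```python
# Hypothetical L1 instruction cache geometry (an assumption, not from the post).
CACHE_SIZE = 32 * 1024   # total bytes
LINE_SIZE = 64           # bytes per cache line
WAYS = 4                 # associativity

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 128 sets
WAY_SIZE = NUM_SETS * LINE_SIZE               # 8192 bytes: addresses this far apart share a set


def cache_set(addr):
    """Return the index of the cache set that holds the line containing addr."""
    return (addr // LINE_SIZE) % NUM_SETS


def count_conflicts(addrs):
    """Count the lines that do not fit because their set already holds WAYS lines."""
    per_set = {}
    for a in addrs:
        per_set.setdefault(cache_set(a), []).append(a)
    return sum(max(0, len(v) - WAYS) for v in per_set.values())


# Six hypothetical hotspots, all linked at 8K-aligned addresses.
aligned = [0x400000 + i * 8192 for i in range(6)]
# The same hotspots, each shifted by a different number of cache lines.
staggered = [a + i * LINE_SIZE for i, a in enumerate(aligned)]

print(sorted({cache_set(a) for a in aligned}))                # [0]
print(count_conflicts(aligned), count_conflicts(staggered))   # 2 0
```

The sketch only looks at the first cache line of each hotspot; real
hotspots span several lines, but the same arithmetic applies to each.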

A simple way to check for this would be an option to build the kernel
with a randomized link order. I don't know how difficult that would be
to implement in the current FreeBSD toolchain.
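
The random link order idea could be sketched roughly as follows. This
is not how the FreeBSD build system works; the function name and the
object file list are hypothetical, and the sketch only produces an
ordering that a build script could hand to the linker. A seed makes
each ordering reproducible, so a slow or fast layout can be rebuilt.

```python
import random


def randomized_link_order(objects, seed):
    """Return the object files in a pseudo-random but reproducible order."""
    order = sorted(objects)              # canonical start, independent of input order
    random.Random(seed).shuffle(order)   # private generator, seeded per build
    return order


# Hypothetical list of object files, for illustration only.
objs = ["ip_input.o", "ip_output.o", "udp_usrreq.o", "uipc_socket.o", "flowtable.o"]

for seed in (1, 2, 3):
    print(seed, " ".join(randomized_link_order(objs, seed)))
```

If throughput varies between seeds by about as much as it does between
code changes, layout is a likely explanation for the difference.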

Julian