HEADS UP: zerocopy bpf commits impending
Robert Watson
rwatson at FreeBSD.org
Tue Apr 8 12:28:19 UTC 2008
On Tue, 8 Apr 2008, Darren Reed wrote:
> Is there a performance analysis of the copy vs zerocopy available? (I don't
> see one in the paper, just a "to do" item.)
>
> The numbers I'm interested in seeing are how many Mb/s you can capture
> before you start suffering packet loss. This needs to be done with
> sequenced packets so that you can observe gaps in the sequence captured.
We've done some analysis, and a couple of companies have the zero-copy BPF
code deployed. I hope to generate a more detailed analysis before the
developer summit so we can review it at BSDCan. The basic observation is that
for quite a few types of network links, the win isn't in packet loss per se,
but in reduced CPU use, freeing up CPU for other activities. There are a
number of sources of win:
- Reduced system call overhead -- as load increases, the number of system calls
  goes down, especially if you get a two-CPU pipeline going.
- Reduced memory access, especially for larger buffer sizes, avoids filling
the cache twice (first in copyout, then again in using the buffer in
userspace).
- Reduced lock contention, as only a single thread, the device driver ithread,
is acquiring the bpf descriptor's lock, and it's no longer contending with
the user thread.
One interesting, and in retrospect reasonable, side effect is that user CPU
time goes up in the SMP scenario, as cache misses on the BPF buffer move from
the read() system call to userspace. And, as you observe, you have to use
somewhat larger buffer sizes, as in the previous scenario there were three
buffers: two kernel buffers and a user buffer, and now there are simply two
kernel buffers shared directly with user space.
The original committed version has a problem in that it allows only one kernel
buffer to be "owned" by userspace at a time, which can lead to excess calls to
select(); this has now been corrected, so if people have run performance
benchmarks, they should update to the new code and re-run them.
I don't have numbers off-hand, but improvements in the 5%-25% range appeared in
some of the measurements, and I'd like to think that the recent fix will
further improve that.
For 10gbps, something we need to think about is how to modify the structure of
BPF to allow different BPF devices for different input queues...
Robert N M Watson
Computer Laboratory
University of Cambridge