Multiqueue support for bpf

Vlad Galu dudu at dudu.ro
Tue Aug 16 09:50:29 UTC 2011


On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:
> Hi all,
> 
> I implemented multiqueue support for bpf, and I'd like to present it
> for review.
> This is a Google Summer of Code project; its goal is to support
> multiqueue network interfaces in BPF and to provide interfaces for
> multithreaded packet processing using BPF.
> Modern high-performance NICs have multiple receive/send queues and
> RSS, which allow packets to be processed concurrently on multiple
> processors.
> The main purpose of the project is to support such hardware and to
> benefit from that parallelism.
> 
> This provides the following new APIs (usage sketches follow the list):
> - queue filter for each bpf descriptor (bpf ioctl)
>    - BIOCENAQMASK    Enables multiqueue filter on the descriptor
>    - BIOCDISQMASK    Disables multiqueue filter on the descriptor
>    - BIOCSTRXQMASK    Set mask bit on specified RX queue
>    - BIOCCRRXQMASK    Clear mask bit on specified RX queue
>    - BIOCGTRXQMASK    Get mask bit on specified RX queue
>    - BIOCSTTXQMASK    Set mask bit on specified TX queue
>    - BIOCCRTXQMASK    Clear mask bit on specified TX queue
>    - BIOCGTTXQMASK    Get mask bit on specified TX queue
>    - BIOCSTOTHERMASK    Set mask bit for packets not tied to any queue
>    - BIOCCROTHERMASK    Clear mask bit for packets not tied to any queue
>    - BIOCGTOTHERMASK    Get mask bit for packets not tied to any queue
> 
> - generic interfaces for getting hardware queue information from the
> NIC driver (socket ioctl)
>    - SIOCGIFQLEN    Get interface RX/TX queue length
>    - SIOCGIFRXQAFFINITY    Get interface RX queue affinity
>    - SIOCGIFTXQAFFINITY    Get interface TX queue affinity
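
A minimal usage sketch of the queue filter ioctls above, in C.  The
argument types are not shown in this mail, so treating the mask ioctls
as taking a uint32_t queue index (and BIOCENAQMASK as taking no
argument) is an assumption; the patch has the authoritative definitions.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <net/if.h>
#include <net/bpf.h>

#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>

/* Attach a bpf descriptor to ifname and accept only RX queue rxq. */
int
open_bpf_for_queue(const char *ifname, uint32_t rxq)
{
    struct ifreq ifr;
    int fd;

    if ((fd = open("/dev/bpf", O_RDONLY)) == -1)
        err(1, "open(/dev/bpf)");

    memset(&ifr, 0, sizeof(ifr));
    strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
    if (ioctl(fd, BIOCSETIF, &ifr) == -1)       /* bind to the interface */
        err(1, "BIOCSETIF");

    if (ioctl(fd, BIOCENAQMASK, NULL) == -1)    /* enable queue filtering */
        err(1, "BIOCENAQMASK");
    if (ioctl(fd, BIOCSTRXQMASK, &rxq) == -1)   /* accept RX queue rxq only */
        err(1, "BIOCSTRXQMASK");

    return (fd);
}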
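
Similarly, a sketch of the queue information ioctls.  struct ifqinfo
below is hypothetical, invented for illustration (the mail does not
show the request structure, and these ioctls may well reuse struct
ifreq); again, the patch is authoritative.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#include <net/if.h>

#include <err.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request layout; see the patch for the real one. */
struct ifqinfo {
    char ifqi_name[IFNAMSIZ];   /* in: interface name, e.g. "ix0" */
    int  ifqi_index;            /* in: queue index, for affinity calls */
    int  ifqi_value;            /* out: queue length or CPU number */
};

int
main(void)
{
    struct ifqinfo qi;
    int s;

    if ((s = socket(AF_LOCAL, SOCK_DGRAM, 0)) == -1)
        err(1, "socket");

    memset(&qi, 0, sizeof(qi));
    strlcpy(qi.ifqi_name, "ix0", sizeof(qi.ifqi_name));

    if (ioctl(s, SIOCGIFQLEN, &qi) == -1)       /* RX/TX queue length */
        err(1, "SIOCGIFQLEN");
    printf("queue length: %d\n", qi.ifqi_value);

    qi.ifqi_index = 0;
    if (ioctl(s, SIOCGIFRXQAFFINITY, &qi) == -1)  /* CPU for RX queue 0 */
        err(1, "SIOCGIFRXQAFFINITY");
    printf("rxq 0 affinity: cpu %d\n", qi.ifqi_value);

    return (0);
}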
> 
> The patch for -CURRENT is here; right now it only supports igb(4),
> ixgbe(4), and mxge(4):
> http://www.dokukino.com/mq_bpf_20110813.diff
> 
> And below are the performance benchmark results:
> 
> ====
> I implemented two benchmark programs based on bpfnull
> (//depot/projects/zcopybpf/utils/bpfnull/):
> 
> test_sqbpf measures bpf throughput in a single thread, without using
> the multiqueue APIs.
> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c
> 
> test_mqbpf is a multithreaded version of test_sqbpf that uses the
> multiqueue APIs.
> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
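
For context, the per-queue reader pattern test_mqbpf presumably
follows, sketched under the same assumptions as above
(open_bpf_for_queue() is the helper from the earlier sketch, not part
of the patch; the real benchmark code is at the URL above):

#include <sys/types.h>

#include <err.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define NQUEUES 12      /* per-interface count; would come from SIOCGIFQLEN */
#define BUFLEN  1048576 /* matches the net.bpf.maxbufsize setting below */

int open_bpf_for_queue(const char *, uint32_t);   /* earlier sketch */

/* One reader per RX queue: read from its own descriptor and discard. */
static void *
reader(void *arg)
{
    uint32_t q = (uint32_t)(uintptr_t)arg;
    int fd = open_bpf_for_queue("ix0", q);
    char *buf = malloc(BUFLEN);

    if (buf == NULL)
        err(1, "malloc");
    while (read(fd, buf, BUFLEN) > 0)   /* benchmark1 style: no output */
        continue;
    return (NULL);
}

int
main(void)
{
    pthread_t tid[NQUEUES];
    uintptr_t i;

    for (i = 0; i < NQUEUES; i++)
        pthread_create(&tid[i], NULL, reader, (void *)i);
    for (i = 0; i < NQUEUES; i++)
        pthread_join(tid[i], NULL);
    return (0);
}

A real consumer would likely also pin each thread to the CPU reported
by SIOCGIFRXQAFFINITY for its queue, e.g. with cpuset_setaffinity(2).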
> 
> I benchmarked under six conditions:
> - benchmark1 only reads from bpf, doesn't write packets anywhere
> - benchmark2 writes packets to memory (mfs)
> - benchmark3 writes packets to HDD (zfs)
> - benchmark4 only reads from bpf, doesn't write packets anywhere, with zerocopy
> - benchmark5 writes packets to memory (mfs), with zerocopy
> - benchmark6 writes packets to HDD (zfs), with zerocopy
> 
> From the benchmark results, I can say that performance increases with
> mq_bpf on 10GbE, but not on GbE.
> 
> * Throughput benchmark
> - Test environment
> - FreeBSD node
>   CPU: Core i7 X980 (12 threads)
>   MB: ASUS P6X58D Premium (Intel X58)
>   NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>   NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
> - Linux node
>   CPU: Core 2 Quad (4 threads)
>   MB: GIGABYTE GA-G33-DS3R (Intel G33)
>   NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>   NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
> 
> iperf was used to generate network traffic, with the following options:
>   - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
>   - FreeBSD node: iperf -s
>   # 12 threads, TCP
> 
> the following sysctl parameter was changed:
>   sysctl -w net.bpf.maxbufsize=1048576


Thank you for your work! You may want to increase that value (4x/8x) and rerun the test, though.
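
For reference, 4x and 8x of the value above would be:

  sysctl -w net.bpf.maxbufsize=4194304   # 4x
  sysctl -w net.bpf.maxbufsize=8388608   # 8x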

