Multiqueue support for bpf

Takuya ASADA syuu at dokukino.com
Tue Aug 16 09:39:34 UTC 2011


Hi all,

I have implemented multiqueue support for bpf and would like to present
it for review.
This is a Google Summer of Code project; its goal is to support
multiqueue network interfaces in BPF and to provide interfaces for
multithreaded packet processing using BPF.
Modern high-performance NICs have multiple receive/send queues and an
RSS feature, which allows packets to be processed concurrently on
multiple processors.
The main purpose of the project is to support such hardware and take
advantage of this parallelism.

This provides the following new APIs:
- a queue filter for each bpf descriptor (bpf ioctl); a usage sketch
follows this list
    - BIOCENAQMASK    Enable the multiqueue filter on the descriptor
    - BIOCDISQMASK    Disable the multiqueue filter on the descriptor
    - BIOCSTRXQMASK    Set the mask bit for the specified RX queue
    - BIOCCRRXQMASK    Clear the mask bit for the specified RX queue
    - BIOCGTRXQMASK    Get the mask bit for the specified RX queue
    - BIOCSTTXQMASK    Set the mask bit for the specified TX queue
    - BIOCCRTXQMASK    Clear the mask bit for the specified TX queue
    - BIOCGTTXQMASK    Get the mask bit for the specified TX queue
    - BIOCSTOTHERMASK    Set the mask bit for packets not tied to any queue
    - BIOCCROTHERMASK    Clear the mask bit for packets not tied to any queue
    - BIOCGTOTHERMASK    Get the mask bit for packets not tied to any queue

- a generic interface for getting hardware queue information from the
NIC driver (socket ioctl); a usage sketch follows this list
    - SIOCGIFQLEN    Get the interface RX/TX queue length
    - SIOCGIFRXQAFFINITY    Get an RX queue's CPU affinity
    - SIOCGIFTXQAFFINITY    Get a TX queue's CPU affinity
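
For illustration, a sketch of querying the queue information over a
socket ioctl is below.  The request structure and its field names
(if_qlenreq, ifql_rxqlen, ifql_txqlen) are placeholders invented for
this example; the real structure is defined in the patch.

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/sockio.h>
#include <net/if.h>
#include <err.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Placeholder request structure -- the real definition is in the patch. */
struct if_qlenreq {
	char	ifql_name[IFNAMSIZ];
	int	ifql_rxqlen;		/* number of RX queues */
	int	ifql_txqlen;		/* number of TX queues */
};

int
main(void)
{
	struct if_qlenreq qlr;
	int s;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		err(1, "socket");

	memset(&qlr, 0, sizeof(qlr));
	strlcpy(qlr.ifql_name, "ix0", sizeof(qlr.ifql_name));
	if (ioctl(s, SIOCGIFQLEN, &qlr) < 0)	/* queue counts for ix0 */
		err(1, "SIOCGIFQLEN");
	printf("ix0: %d RX queues, %d TX queues\n",
	    qlr.ifql_rxqlen, qlr.ifql_txqlen);

	close(s);
	return (0);
}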

A patch against -CURRENT is here; right now it only supports igb(4),
ixgbe(4), and mxge(4):
http://www.dokukino.com/mq_bpf_20110813.diff

And below are the performance benchmark results:

====
I implemented benchmark programs based on
bpfnull (//depot/projects/zcopybpf/utils/bpfnull/).

test_sqbpf measures bpf throughput on one thread, without using the
multiqueue APIs:
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c

test_mqbpf is a multithreaded version of test_sqbpf that uses the
multiqueue APIs (a sketch of one way to structure such a program
follows):
http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
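
For reference, a minimal sketch of a per-queue worker pattern that a
multithreaded reader could use is below.  This is not the actual
test_mqbpf code (which is linked above): the queue count is hard-coded
instead of being queried with SIOCGIFQLEN, and the BIOCSTRXQMASK
argument type is assumed to be a uint32_t.

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>
#include <err.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	NQUEUES	4		/* placeholder; query SIOCGIFQLEN in practice */
#define	BUFSIZE	1048576		/* matches net.bpf.maxbufsize used below */

/*
 * One worker per RX queue: each worker opens its own descriptor and
 * only sees packets delivered to its queue.
 */
static void *
rxq_worker(void *arg)
{
	uint32_t qidx = (uint32_t)(uintptr_t)arg;
	struct ifreq ifr;
	u_int blen = BUFSIZE;
	char *buf;
	int fd;

	if ((buf = malloc(BUFSIZE)) == NULL)
		err(1, "malloc");
	if ((fd = open("/dev/bpf", O_RDONLY)) < 0)
		err(1, "open(/dev/bpf)");
	if (ioctl(fd, BIOCSBLEN, &blen) < 0)	/* enlarge the bpf buffer */
		err(1, "BIOCSBLEN");

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, "ix0", sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) < 0)
		err(1, "BIOCSETIF");
	if (ioctl(fd, BIOCENAQMASK) < 0)	/* enable per-queue filtering */
		err(1, "BIOCENAQMASK");
	if (ioctl(fd, BIOCSTRXQMASK, &qidx) < 0) /* this thread's queue only */
		err(1, "BIOCSTRXQMASK");

	for (;;)				/* drain packets */
		if (read(fd, buf, blen) < 0)
			err(1, "read");
	/* NOTREACHED */
}

int
main(void)
{
	pthread_t tid[NQUEUES];
	uintptr_t i;

	for (i = 0; i < NQUEUES; i++)
		if (pthread_create(&tid[i], NULL, rxq_worker, (void *)i) != 0)
			errx(1, "pthread_create");
	for (i = 0; i < NQUEUES; i++)
		pthread_join(tid[i], NULL);
	return (0);
}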

I benchmarked six conditions:
 - benchmark1 only reads from bpf and doesn't write packets anywhere
 - benchmark2 writes packets to memory (mfs)
 - benchmark3 writes packets to hdd (zfs)
 - benchmark4 only reads from bpf and doesn't write packets anywhere, with zerocopy
 - benchmark5 writes packets to memory (mfs), with zerocopy
 - benchmark6 writes packets to hdd (zfs), with zerocopy

From the benchmark results, I can say that performance increases with
mq_bpf on 10GbE, but not on GbE.

* Throughput benchmark
- Test environment
 - FreeBSD node
   CPU: Core i7 X980 (12 threads)
   MB: ASUS P6X58D Premium(Intel X58)
   NIC1: Intel Gigabit ET Dual Port Server Adapter(82576)
   NIC2: Intel Ethernet X520-DA2 Server Adapter(82599)
 - Linux node
   CPU: Core 2 Quad (4 threads)
   MB: GIGABYTE GA-G33-DS3R(Intel G33)
   NIC1: Intel Gigabit ET Dual Port Server Adapter(82576)
   NIC2: Intel Ethernet X520-DA2 Server Adapter(82599)

iperf was used to generate network traffic, with the following options:
   - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
   - FreeBSD node: iperf -s
   # 12 threads, TCP

The following sysctl parameter was changed:
   sysctl -w net.bpf.maxbufsize=1048576

- Benchmark1
Benchmark1 doesn't write packets anywhere; it uses the following commands:
./test_sqbpf -i [interface] -b 1048576
./test_mqbpf -i [interface] -b 1048576
   - ixgbe
       test_mqbpf: 5303.09007533333 Mbps
       test_sqbpf: 3959.83021733333 Mbps
   - igb
       test_mqbpf: 916.752133333333 Mbps
       test_sqbpf: 917.597079 Mbps

- Benchmark2
Benchmark2 writes packets to mfs using the following commands:
mdmfs -s 10G md /mnt
./test_sqbpf -i [interface] -b 1048576 -w -f /mnt/test
./test_mqbpf -i [interface] -b 1048576 -w -f /mnt/test
   - ixgbe
       test_mqbpf: 1061.24890333333 Mbps
       test_sqbpf: 204.779881 Mbps
   - igb
       test_mqbpf: 916.656664666667 Mbps
       test_sqbpf: 914.378636 Mbps

- Benchmark3
Benchmark3 writes packets to zfs (on HDD) using the following commands:
./test_sqbpf -i [interface] -b 1048576 -w -f test
./test_mqbpf -i [interface] -b 1048576 -w -f test
   - ixgbe
       test_mqbpf: 119.912253333333 Mbps
       test_sqbpf: 101.195918 Mbps
   - igb
       test_mqbpf: 228.910355333333 Mbps
       test_sqbpf: 199.639093666667 Mbps

- Benchmark4
Benchmark4 doesn't write packets anywhere and uses zerocopy; it uses the
following commands:
./test_sqbpf -i [interface] -b 1048576
./test_mqbpf -i [interface] -b 1048576
   - ixgbe
       test_mqbpf: 4772.924974 Mbps
       test_sqbpf: 3173.19967133333 Mbps
   - igb
       test_mqbpf: 931.217345 Mbps
       test_sqbpf: 925.965270666667 Mbps

- Benchmark5
Benchmark5 writes packets to mfs with zerocopy, using the following commands:
mdmfs -s 10G md /mnt
./test_sqbpf -i [interface] -b 1048576 -w -f /mnt/test
./test_mqbpf -i [interface] -b 1048576 -w -f /mnt/test
   - ixgbe
       test_mqbpf: 306.902822333333 Mbps
       test_sqbpf: 317.605016666667 Mbps
   - igb
       test_mqbpf: 729.075349666667 Mbps
       test_sqbpf: 708.987822666667 Mbps

- Benchmark6
Benchmark6 writes packets to zfs (on HDD) with zerocopy, using the
following commands:
./test_sqbpf -i [interface] -b 1048576 -w -f test
./test_mqbpf -i [interface] -b 1048576 -w -f test
   - ixgbe
       test_mqbpf: 174.016136666667 Mbps
       test_sqbpf: 138.068732666667 Mbps
   - igb
       test_mqbpf: 228.794880333333 Mbps
       test_sqbpf: 229.367386333333 Mbps
