Multiqueue support for bpf
Takuya ASADA
syuu at dokukino.com
Wed Aug 17 16:11:54 UTC 2011
2011/8/16 Vlad Galu <dudu at dudu.ro>:
> On Aug 16, 2011, at 11:50 AM, Vlad Galu wrote:
>> On Aug 16, 2011, at 11:13 AM, Takuya ASADA wrote:
>>> Hi all,
>>>
>>> I have implemented multiqueue support for bpf and I'd like to present it for review.
>>> This is a Google Summer of Code project; the goal is to support
>>> multiqueue network interfaces in BPF and to provide interfaces
>>> for multithreaded packet processing using BPF.
>>> Modern high-performance NICs have multiple receive/send queues and an RSS
>>> feature, which allows packets to be processed concurrently on multiple
>>> processors.
>>> The main purpose of the project is to support this hardware and benefit
>>> from the parallelism.
>>>
>>> This provides the following new APIs:
>>> - queue filter for each bpf descriptor (bpf ioctl)
>>> - BIOCENAQMASK Enables multiqueue filter on the descriptor
>>> - BIOCDISQMASK Disables multiqueue filter on the descriptor
>>> - BIOCSTRXQMASK Set mask bit on specified RX queue
>>> - BIOCCRRXQMASK Clear mask bit on specified RX queue
>>> - BIOCGTRXQMASK Get mask bit on specified RX queue
>>> - BIOCSTTXQMASK Set mask bit on specified TX queue
>>> - BIOCCRTXQMASK Clear mask bit on specified TX queue
>>> - BIOCGTTXQMASK Get mask bit on specified TX queue
>>> - BIOCSTOTHERMASK Set mask bit for packets which are not tied to any queue
>>> - BIOCCROTHERMASK Clear mask bit for packets which are not tied to any queue
>>> - BIOCGTOTHERMASK Get mask bit for packets which are not tied to any queue
>>>
>>> - generic interface for getting hardware queue information from the NIC
>>> driver (socket ioctl); a usage sketch follows this list
>>> - SIOCGIFQLEN Get interface RX/TX queue length
>>> - SIOCGIFRXQAFFINITY Get interface RX queue affinity
>>> - SIOCGIFTXQAFFINITY Get interface TX queue affinity
>>>
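
To give an idea of how the queue information ioctls would be used, here is a
minimal sketch that asks the ix0 interface for its RX/TX queue counts, for
example before spawning one capture thread per RX queue.  SIOCGIFQLEN is the
ioctl added by the patch (so this needs the patched headers); the request
structure and field names below are made up purely for illustration, the real
layout is defined in the patch.

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical request layout; the real one is defined by the patch. */
struct ifqlenreq {
        char    ifqlr_name[IFNAMSIZ];
        int     ifqlr_rxqlen;           /* number of RX queues */
        int     ifqlr_txqlen;           /* number of TX queues */
};

int
main(void)
{
        struct ifqlenreq req;
        int s = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&req, 0, sizeof(req));
        strlcpy(req.ifqlr_name, "ix0", sizeof(req.ifqlr_name));
        if (ioctl(s, SIOCGIFQLEN, &req) == 0)
                printf("%s: %d RX queues, %d TX queues\n", req.ifqlr_name,
                    req.ifqlr_rxqlen, req.ifqlr_txqlen);
        close(s);
        return (0);
}
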
>>> The patch for -CURRENT is here; right now it only supports igb(4),
>>> ixgbe(4), and mxge(4):
>>> http://www.dokukino.com/mq_bpf_20110813.diff
>>>
>>> And below are the performance benchmarks:
>>>
>>> ====
>>> I implemented benchmark programs based on
>>> bpfnull (//depot/projects/zcopybpf/utils/bpfnull/):
>>>
>>> test_sqbpf measures bpf throughput on one thread, without using multiqueue APIs.
>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_sqbpf/test_sqbpf.c
>>>
>>> test_mqbpf is a multithreaded version of test_sqbpf, using the multiqueue
>>> APIs; a per-queue reader sketch follows below.
>>> http://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/soc2011/mq_bpf/src/tools/regression/bpf/mq_bpf/test_mqbpf/test_mqbpf.c
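
For reference, a rough, simplified sketch of what one per-queue reader thread
might look like (this is not the actual test_mqbpf code): it opens its own bpf
descriptor, attaches it to the interface, restricts it to a single RX queue
with the new ioctls, and then runs the usual read loop.  The argument forms of
BIOCENAQMASK (no argument) and BIOCSTRXQMASK (a pointer to the queue index)
are assumptions made for this sketch; error handling is omitted.

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/bpf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static uint64_t
capture_queue(const char *ifname, uint32_t rxq)
{
        struct ifreq ifr;
        u_int buflen;
        uint64_t total = 0;
        ssize_t n;
        int fd;

        fd = open("/dev/bpf", O_RDONLY);      /* one descriptor per thread */

        memset(&ifr, 0, sizeof(ifr));
        strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
        ioctl(fd, BIOCSETIF, &ifr);           /* attach to the interface */
        ioctl(fd, BIOCGBLEN, &buflen);        /* size of the capture buffer */

        ioctl(fd, BIOCENAQMASK, NULL);        /* assumed: takes no argument */
        ioctl(fd, BIOCSTRXQMASK, &rxq);       /* assumed: takes a queue index */

        char buf[buflen];
        while ((n = read(fd, buf, sizeof(buf))) > 0) {
                char *p = buf;
                /* Walk the bpf_hdr records packed into the read buffer. */
                while (p < buf + n) {
                        struct bpf_hdr *bh = (struct bpf_hdr *)p;
                        total += bh->bh_datalen;
                        p += BPF_WORDALIGN(bh->bh_hdrlen + bh->bh_caplen);
                }
        }
        close(fd);
        return (total);
}
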
>>>
>>> I benchmarked under six conditions (a zero-copy setup sketch follows this list):
>>> - benchmark1 only reads from bpf, doesn't write packets anywhere
>>> - benchmark2 writes packets to memory (mfs)
>>> - benchmark3 writes packets to hdd (zfs)
>>> - benchmark4 only reads from bpf, doesn't write packets anywhere, with zerocopy
>>> - benchmark5 writes packets to memory (mfs), with zerocopy
>>> - benchmark6 writes packets to hdd (zfs), with zerocopy
>>>
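
For the "with zerocopy" conditions the descriptors run in BPF's zero-copy
buffer mode.  As a minimal sketch of how a descriptor is switched into that
mode with the stock bpf(4) zero-copy interface (simplified, not the actual
benchmark code; the buffer consumption protocol with the bpf_zbuf_header
generation counters and all error recovery are omitted):

#include <sys/types.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
#include <stdlib.h>
#include <unistd.h>

/* Switch an already-open bpf descriptor into zero-copy buffer mode. */
static int
enable_zerocopy(int fd)
{
        u_int mode = BPF_BUFMODE_ZBUF;
        size_t zmax;
        struct bpf_zbuf zb;

        if (ioctl(fd, BIOCSETBUFMODE, &mode) < 0)
                return (-1);
        if (ioctl(fd, BIOCGETZMAX, &zmax) < 0)  /* largest allowed buffer size */
                return (-1);

        /* Two page-aligned buffers shared between the kernel and userland. */
        zb.bz_buflen = zmax;
        if (posix_memalign(&zb.bz_bufa, (size_t)getpagesize(), zmax) != 0 ||
            posix_memalign(&zb.bz_bufb, (size_t)getpagesize(), zmax) != 0)
                return (-1);

        /* Register the buffers; typically done before BIOCSETIF. */
        return (ioctl(fd, BIOCSETZBUF, &zb));
}
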
>>> From the benchmark results, I can say that performance increases with
>>> mq_bpf on 10GbE, but not on GbE.
>>>
>>> * Throughput benchmark
>>> - Test environment
>>> - FreeBSD node
>>> CPU: Core i7 X980 (12 threads)
>>> MB: ASUS P6X58D Premium (Intel X58)
>>> NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>> NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>> - Linux node
>>> CPU: Core 2 Quad (4 threads)
>>> MB: GIGABYTE GA-G33-DS3R (Intel G33)
>>> NIC1: Intel Gigabit ET Dual Port Server Adapter (82576)
>>> NIC2: Intel Ethernet X520-DA2 Server Adapter (82599)
>>>
>>> iperf was used to generate network traffic, with the following options:
>>> - Linux node: iperf -c [IP] -i 10 -t 100000 -P12
>>> - FreeBSD node: iperf -s
>>> # 12 threads, TCP
>>>
>>> the following sysctl parameter was changed:
>>> sysctl -w net.bpf.maxbufsize=1048576
>>
>>
>> Thank you for your work! You may want to increase that (4x/8x) and rerun the test, though.
>
> More, actually. Your current buffer is easily filled.
Hi,
I measured performance again with maxbufsize = 268435456 and multiple
CPU configurations; here are the results.
The performance on 10GbE seems a bit unstable and does not scale
linearly as CPUs/queues are added.
Maybe it depends on some system parameter, but I haven't figured out
the cause.
In any case, multithreaded BPF throughput is higher than single-threaded
BPF in every case.
* Test environment
- FreeBSD node
CPU: Core i7 X980 (12 threads)
# Tested in 1-core, 2-core, 4-core, and 6-core configurations (each
core has 2 threads using HT)
MB: ASUS P6X58D Premium (Intel X58)
NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
- Linux node
CPU: Core 2 Quad (4 threads)
MB: GIGABYTE GA-G33-DS3R (Intel G33)
NIC: Intel Ethernet X520-DA2 Server Adapter (82599)
- iperf
Linux node: iperf -c [IP] -i 10 -t 100000 -P16
FreeBSD node: iperf -s
# 16 threads, TCP
- system parameters
net.bpf.maxbufsize=268435456
hw.ixgbe.num_queues=[n queues]
* 2 threads, 2 queues
- iperf throughput
iperf only: 8.845Gbps
test_mqbpf: 5.78Gbps
test_sqbpf: 6.89Gbps
- test program throughput
test_mqbpf: 4526.863414 Mbps
test_sqbpf: 762.452475 Mbps
- received/dropped
test_mqbpf:
45315011 packets received (BPF)
9646958 packets dropped (BPF)
test_sqbpf:
56216145 packets received (BPF)
49765127 packets dropped (BPF)
* 4 threads, 4 queues
- iperf throughput
iperf only: 3.03Gbps
test_mqbpf: 2.49Gbps
test_sqbpf: 2.57Gbps
- test program throughput
test_mqbpf: 2420.195051 Mbps
test_sqbpf: 430.774870 Mbps
- received/dropped
test_mqbpf:
19601503 packets received (BPF)
0 packets dropped (BPF)
test_sqbpf:
22803778 packets received (BPF)
18869653 packets dropped (BPF)
* 8 threads, 8 queues
- iperf throughput
iperf only: 5.80Gbps
test_mqbpf: 4.42Gbps
test_sqbpf: 4.30Gbps
- test program throughput
test_mqbpf: 4242.314913 Mbps
test_sqbpf: 1291.719866 Mbps
- received/dropped
test_mqbpf:
34996953 packets received (BPF)
361947 packets dropped (BPF)
test_sqbpf:
35738058 packets received (BPF)
24749546 packets dropped (BPF)
* 12 threads, 12 queues
- iperf throughput
iperf only: 9.31Gbps
test_mqbpf: 8.06Gbps
test_sqbpf: 5.67Gbps
- test program throughput
test_mqbpf: 8089.242472 Mbps
test_sqbpf: 5754.910665 Mbps
- received/dropped
test_mqbpf:
73783957 packets received (BPF)
9938 packets dropped (BPF)
test_sqbpf:
49434479 packets received (BPF)
0 packets dropped (BPF)