xmit_more / packet batching
Andrew Gallatin
gallatin at netflix.com
Thu Jan 28 20:24:32 UTC 2016
I brought up packet batching in today's meeting, and mentioned
Mellanox asking about a feature like xmit_more in Linux.
To recap, "xmit_more" is a flag on an skb in Linux that the stack can
use to tell the driver that more packets are coming immediately,
so the driver is allowed to delay writing any doorbells to notify the
NIC about the transmission. This offers fairly large savings, since
it avoids MMIO accesses to the NIC doorbells and (in Mellanox's
case) can be used to reduce transmit completions.
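As a rough illustration of the doorbell deferral described above, here is a minimal C sketch. It is a toy model, not code from any real driver: struct tx_ring, ring_doorbell(), tx_enqueue() and send_burst() are all hypothetical names, and the "MMIO write" is just a counter.

```c
#include <stdbool.h>

/* Hypothetical transmit ring; names are illustrative, not from any driver. */
struct tx_ring {
	unsigned prod;            /* descriptors posted by software */
	unsigned doorbell;        /* last producer index shown to the NIC */
	unsigned doorbell_writes; /* number of simulated MMIO writes */
};

/* Stand-in for the MMIO write that tells the NIC about new descriptors. */
static void
ring_doorbell(struct tx_ring *r)
{
	r->doorbell = r->prod;
	r->doorbell_writes++;
}

/* Post one packet; skip the doorbell when the caller promises more. */
static void
tx_enqueue(struct tx_ring *r, bool more)
{
	r->prod++;                /* descriptor fill omitted */
	if (!more)
		ring_doorbell(r);
}

/* Send npkts back to back and return the number of doorbell writes. */
static unsigned
send_burst(unsigned npkts, bool batching)
{
	struct tx_ring r = { 0, 0, 0 };

	for (unsigned i = 0; i < npkts; i++) {
		bool last = (i + 1 == npkts);

		/* Only the final packet of the burst rings the doorbell. */
		tx_enqueue(&r, batching && !last);
	}
	return (r.doorbell_writes);
}
```

In this model, send_burst(8, false) performs 8 doorbell writes and send_burst(8, true) performs 1. A real driver would also have to ring the doorbell when the ring fills or the queue is stopped, which the sketch leaves out.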
There is a more detailed description of how it works in Linux at
http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
(as I said .. benchmarks .. )
It looks like they are bulking up things as they come out of their
qdisc layer, above the drivers. Note that they have a fairly
sophisticated set of qdiscs that are usable with modern, multi-queue
drivers, as well as centralized software queuing that all drivers use.
Without properly adding all those layers, the simplest thing to
do would be to just have tcp_output (and other proto output
routines) set a similar flag when they are looping, and
sending down mbufs to ip_output. The problem with that is simply
that (at least for our Internet-facing workloads at Netflix) sending
more data on a socket than the TSO max segment size is quite rare.
(see appended dtrace output from a machine serving ~90K connections
at ~85Gb/s)
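To make the point about the TSO limit concrete, here is a small C sketch of the protocol-side loop. It is only a model under stated assumptions: segments_with_more() is a hypothetical helper, not anything in tcp_output, and the TSO limit is passed in as a parameter rather than taken from a real interface.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split one send of send_len bytes into chunks of
 * at most tso_max bytes, the way an output routine might loop, and
 * count how many chunks would be flagged "more packets follow".
 */
static unsigned
segments_with_more(size_t send_len, size_t tso_max)
{
	unsigned hinted = 0;

	if (tso_max == 0)
		return (0);       /* avoid looping forever on a bad limit */

	while (send_len > 0) {
		size_t seg = send_len < tso_max ? send_len : tso_max;

		send_len -= seg;
		if (send_len > 0) /* another chunk follows immediately */
			hinted++;
	}
	return (hinted);
}
```

With an assumed limit of 65535 bytes, any send of 65535 bytes or less yields 0 flagged chunks, so the driver never sees the hint; a 200000 byte send would yield 3.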
What is unclear to me is whether or not Linux would see any benefit
from xmit_more in our sort of workload. My guess is that it would not,
as the qdiscs will not be delaying things, and they'll still be mostly
limited to client ack packing, so will also rarely be sending more than
a TSO max seg size.
Drew
c094.ord001.dev# dtrace -s ./xmit_len.d
dtrace: script './xmit_len.d' matched 1 probe
^C
                0
           value  ------------- Distribution ------------- count
          < 1514 |@@@                                      1454289
            1514 |@@@@@@@@                                 4601582
            2514 |@@@@@@                                   3529106
            3514 |@@@@@                                    2902881
            4514 |                                         150322
            5514 |@@@                                      1433990
            6514 |@                                        722206
            7514 |                                         121788
            8514 |@                                        855718
            9514 |@                                        441330
           10514 |                                         86424
           11514 |@                                        742827
           12514 |@                                        405603
           13514 |                                         33753
           14514 |@                                        432492
           15514 |@                                        451240
           16514 |@                                        309429
           17514 |                                         67458
           18514 |                                         243994
           19514 |                                         275286
           20514 |                                         27206
           21514 |                                         198855
           22514 |@                                        291526
           23514 |                                         20636
           24514 |                                         210524
           25514 |                                         175945
           26514 |                                         13526
           27514 |                                         156973
           28514 |                                         174066
           29514 |                                         115377
           30514 |                                         24194
           31514 |                                         138833
           32514 |                                         104053
           33514 |                                         23385
           34514 |                                         131241
           35514 |                                         92529
           36514 |                                         27125
           37514 |                                         88839
           38514 |                                         76915
           39514 |                                         6554
           40514 |                                         86519
           41514 |                                         70078
           42514 |                                         55289
           43514 |                                         16175
           44514 |                                         74943
           45514 |                                         66529
           46514 |                                         12766
           47514 |                                         66808
           48514 |                                         51397
           49514 |                                         14020
           50514 |                                         46155
           51514 |                                         37579
           52514 |                                         9177
           53514 |                                         39342
           54514 |                                         31170
           55514 |                                         8080
           56514 |                                         34876
           57514 |                                         40574
           58514 |                                         32921
           59514 |                                         6768
           60514 |                                         30252
           61514 |                                         25051
           62514 |                                         5609
           63514 |                                         174059
        >= 64514 |@                                        445810

                0            65516
c094.ord001.dev# cat xmit_len.d
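fbt::mlx5e_xmit:entry
{
	/* arg1 is cast to the mbuf being handed to the driver */
	this->m = (struct mbuf *)arg1;
	/* @len[0] = quantize(this->m->M_dat.MH.MH_pkthdr.len); */
	/* linear histogram of packet length, 1514..65000 in steps of 1000 */
	@len[0] = lquantize(this->m->M_dat.MH.MH_pkthdr.len, 1514, 65000, 1000);
	/* largest packet length seen */
	@m[0] = max(this->m->M_dat.MH.MH_pkthdr.len);
}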