Much improved sosend_*() functions

Andre Oppermann andre at freebsd.org
Thu Sep 28 15:10:27 PDT 2006


The recent addition of TSO (TCP Segmentation Offload) has highlighted some
shortcomings in our sosend_*() kernel implementation.  The current code
uses a sosend_copyin() function that loops over the supplied struct uio
and does interleaved mbuf allocations and uiomove() calls.

I have rewritten m_getm() to be simpler and to allocate PAGE_SIZE sized
jumbo mbuf clusters (4k on most architectures).  I have also rewritten
m_uiotombuf() to use the new m_getm() to obtain all mbuf space in one go;
it then loops over the chain and copies the data into the mbufs using
uiomove().  sosend_dgram() and sosend_generic() are changed to use
m_uiotombuf() instead of sosend_copyin().
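The "allocate everything first, then copy" idea can be sketched in user
space as follows.  This is not the code from the patch: struct sketch_buf,
sketch_getm(), sketch_uiotombuf() and SKETCH_BUF_SIZE are hypothetical
stand-ins for mbufs with page-sized clusters, m_getm(), m_uiotombuf() and
PAGE_SIZE, a struct iovec array stands in for struct uio, and memcpy()
stands in for uiomove().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>	/* struct iovec */

#define SKETCH_BUF_SIZE	4096	/* assumption: stands in for PAGE_SIZE */

/* Hypothetical stand-in for an mbuf carrying a page-sized cluster. */
struct sketch_buf {
	struct sketch_buf *next;
	size_t len;			/* bytes of valid data */
	unsigned char data[SKETCH_BUF_SIZE];
};

/* Free a whole chain. */
static void
sketch_freem(struct sketch_buf *b)
{
	while (b != NULL) {
		struct sketch_buf *n = b->next;
		free(b);
		b = n;
	}
}

/* Phase 1: obtain buffer space for `total` bytes in one go. */
static struct sketch_buf *
sketch_getm(size_t total)
{
	struct sketch_buf *head = NULL, **tailp = &head;

	while (total > 0) {
		struct sketch_buf *b = calloc(1, sizeof(*b));
		if (b == NULL) {
			sketch_freem(head);
			return (NULL);
		}
		*tailp = b;
		tailp = &b->next;
		total -= total < SKETCH_BUF_SIZE ? total : SKETCH_BUF_SIZE;
	}
	return (head);		/* NULL also when total was 0 */
}

/* Phase 2: walk the preallocated chain and copy the user data in. */
static struct sketch_buf *
sketch_uiotombuf(const struct iovec *iov, int iovcnt)
{
	struct sketch_buf *head, *b;
	size_t total = 0;

	assert(iovcnt >= 0);
	for (int i = 0; i < iovcnt; i++)
		total += iov[i].iov_len;

	head = b = sketch_getm(total);
	if (head == NULL)
		return (NULL);

	for (int i = 0; i < iovcnt; i++) {
		const unsigned char *src = iov[i].iov_base;
		size_t left = iov[i].iov_len;

		while (left > 0) {
			if (b->len == SKETCH_BUF_SIZE)
				b = b->next;	/* current buffer is full */
			size_t n = SKETCH_BUF_SIZE - b->len;
			if (n > left)
				n = left;
			memcpy(b->data + b->len, src, n);
			b->len += n;
			src += n;
			left -= n;
		}
	}
	return (head);
}
```

The point of the split is that allocation no longer alternates with
copying: the size is known up front, so the chain is built once and the
copy loop only fills it.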

Looking at the benchmarks we see some very nice improvements (95% confidence):
  66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
  65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)
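
For reference, the two ways of stating the saving ("N% less" and "N times
better") relate as sketched below.  The helper names are hypothetical.
Which runs the summary figures were taken from is not stated; feeding in
the system times reported by time(1) for the 64K runs (3a/3b and 4a/4b in
the raw numbers) gives values close to the ones quoted, so that pairing is
an assumption.

```c
#include <assert.h>

/* Percentage of CPU time saved when going from old_s to new_s seconds. */
static double
cpu_reduction_pct(double old_s, double new_s)
{
	assert(old_s > 0.0);
	return ((1.0 - new_s / old_s) * 100.0);
}

/* How many times less CPU time the new code needs. */
static double
cpu_ratio(double old_s, double new_s)
{
	assert(new_s > 0.0);
	return (old_s / new_s);
}
```

With 1.827s vs. 0.617s (3a/3b) this gives roughly 66% and 2.96; with
1.467s vs. 0.511s (4a/4b) roughly 65% and 2.87.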

The sender is an AMD Opteron 852 (2.6GHz) with an em(4) PCI-X-133 interface
and the receiver is a Dell PowerEdge SC1425 P-IV Xeon 3.2GHz with an em(4)
LOM, connected back to back at 1000Base-TX full duplex.

The patch is available here:
  http://people.freebsd.org/~andre/sosend+m_uiotombuf-20060928.diff

Any testing, heavy (code) beating and reviews are welcome.

-- 
Andre

Here are the raw numbers.  netperf 2.4.2 was used at 95% confidence with a
+-2.5% error margin.  The cpu load reported by netperf is different from the
one reported by time(1); all performance references are based on the time(1)
output:

  a) is old sosend kernel implementation
  b) is new sosend kernel implementation

  1) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s32K -S32K  [non-TSO]
  2) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s32K -S32K  [TSO]
  3) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s64K -S64K  [non-TSO]
  4) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s64K -S64K  [TSO]
  5) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s128K -S128K  [non-TSO]
  6) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso
     -- -s128K -S128K  [TSO]

     Recv   Send   Send                          Utilization       Service Demand
     Socket Socket Message  Elapsed              Send     Recv     Send    Recv
     Size   Size   Size     Time     Throughput  local    remote   local   remote
     bytes  bytes  bytes    secs.    10^6bits/s  % C      % C      us/KB   us/KB

1a) 32768  32768  32768    10.00       921.28   28.42    32.48    2.527   2.888
     0.007u 1.747s 0:10.00 17.4%     99+5252k 0+0io 0pf+0w

1b) 32768  32768  32768    10.00       921.39   24.51    31.50    2.179   2.801
     0.028u 0.768s 0:10.00 7.8%      78+4210k 0+0io 0pf+0w

2a) 32768  32768  32768    10.00       897.63   24.29    37.74    2.216   3.445
     0.000u 1.359s 0:10.02 13.4%     96+5152k 5+0io 3pf+0w

2b) 32768  32768  32768    10.00       919.71   15.64    33.01    1.393   2.940
     0.008u 0.528s 0:10.00 5.2%      90+4830k 0+0io 0pf+0w

3a) 65536  65536  65536    10.00       941.60   30.90    32.01    2.689   2.785
     0.000u 1.827s 0:10.00 18.2%     96+5180k 0+0io 0pf+0w

3b) 65536  65536  65536    10.00       941.59   26.39    32.03    2.296   2.787
     0.014u 0.617s 0:10.00 6.2%      101+5362k 0+0io 0pf+0w

4a) 65536  65536  65536    10.00       921.98   26.09    39.47    2.318   3.507
     0.000u 1.467s 0:10.02 14.5%     93+5028k 3+0io 0pf+0w

4b) 65536  65536  65536    10.00       938.44   16.24    34.29    1.418   2.993
     0.000u 0.511s 0:10.00 5.1%      91+4851k 0+0io 0pf+0w

5a) 131072 131072 131072    10.00       941.62   33.81    33.68    2.941   2.930
     0.000u 2.158s 0:10.00 21.5%     97+5247k 0+0io 0pf+0w

5b) 131072 131072 131072    10.00       941.60   28.55    31.65    2.484   2.754
     0.000u 0.676s 0:10.00 6.7%      95+5132k 0+0io 0pf+0w

6a) 131072 131072 131072    10.00       922.92   28.72    40.80    2.549   3.621
     0.000u 1.713s 0:10.00 17.1%     93+5016k 1+0io 0pf+0w

6b) 131072 131072 131072    10.00       939.14   18.20    34.44    1.587   3.004
     0.000u 0.587s 0:10.00 5.8%      78+4197k 1+0io 0pf+0w
