tcp_output starving -- is due to mbuf get delay?

Borje Josefsson bj at dc.luth.se
Fri Apr 11 09:35:06 PDT 2003


On Fri, 11 Apr 2003 09:22:47 PDT Terry Lambert wrote:

> Borje Josefsson wrote:
> > I should add that I have tried with MTU 1500 also. Using NetBSD as sender
> > works fine (just a little bit higher CPU load). When we tried MTU 1500 with
> > FreeBSD as sender, we got even lower performance.
> > 
> > Somebody else in this thread said that he had got full GE speed between
> > two FreeBSD boxes connected back-to-back. I don't question that, but that
> > doesn't prove anything. The problem arises when you are trying to do this
> > long-distance and have to handle a large mbuf queue.
> 
> The boxes were not connected "back to back", they were connected
> through three Gigabit switches and a VLAN trunk.  But they were in
> a lab, yes.
> 
> I'd be happy to try long distance for you, and even go so far as
> to fix the problem for you, if you are willing to drop 10GBit fiber
> to my house.  8-) 8-). 
> 
> As far as a large mbuf queue, one thing that's an obvious difference
> is SACK support; however, this cannot be the problem, since the
> NetBSD->FreeBSD speed is unaffected (supposedly).
> 
> What is the FreeBSD->NetBSD speed?
> 

With FreeBSD as sender:

ttcp-t: buflen=61440, nbuf=20345, align=16384/0, port=5001
ttcp-t: socket
ttcp-t: connect
ttcp-t: 1249996800 bytes in 16.82 real seconds = 567.09 Mbit/sec +++
ttcp-t: 20345 I/O calls, msec/call = 0.85, calls/sec = 1209.79
ttcp-t: 0.0user 15.5sys 0:16real 92% 16i+380d 326maxrss 0+15pf 16+232csw

During that time "top" shows (on the sender):

CPU states:  0.4% user,  0.0% nice, 93.0% system,  6.6% interrupt,  0.0% idle

Just for comparison, running in the other direction (with NetBSD as sender):

CPU states:  0.0% user,  0.0% nice, 19.9% system, 13.9% interrupt, 66.2% idle

Netstat output during that test:

 bge0 in       bge0 out              total in      total out            
 packets  errs  packets  errs colls   packets  errs  packets  errs colls
   14709     0    22742     0     0     14709     0    22742     0     0
   18602     0    28002     0     0     18602     0    28002     0     0
   18603     0    28006     0     0     18603     0    28006     0     0
   18599     0    28009     0     0     18600     0    28009     0     0
   18605     0    28006     0     0     18605     0    28006     0     0
   18607     0    28006     0     0     18608     0    28006     0     0
   18608     0    28006     0     0     18608     0    28006     0     0
 bge0 in       bge0 out              total in      total out            
 packets  errs  packets  errs colls   packets  errs  packets  errs colls
14389511   908 14772089     0     0  18404167   908 18231823     0     0
   18607     0    28003     0     0     18607     0    28003     0     0
   18598     0    28006     0     0     18599     0    28006     0     0
    5823     0     8161     0     0      5823     0     8161     0     0

Note how stable it is!


> Some knobs to try on FreeBSD:
> 
> 	net.inet.ip.intr_queue_maxlen		-> 300
> 	net.inet.ip.check_interface		-> 0
> 	net.inet.tcp.rfc1323			-> 0
> 	net.inet.tcp.inflight_enable		-> 1
> 	net.inet.tcp.inflight_debug		-> 0
> 	net.inet.tcp.delayed_ack		-> 0
> 	net.inet.tcp.newreno			-> 0
> 	net.inet.tcp.slowstart_flightsize	-> 4
> 	net.inet.tcp.msl			-> 1000
> 	net.inet.tcp.always_keepalive		-> 0
> 	net.inet.tcp.sendspace			-> 65536 (on sender)

Hmm. Isn't 65536 an order of magnitude too low? I used 
tcp.sendspace=3223281 for the test above. RTTmax = 20.63 ms. Buffer size 
needed = 2578625 bytes, and then add a few %.
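
That figure is just the bandwidth*delay product for the path. A rough 
sketch of the arithmetic, assuming a 1 Gbit/s path rate and the RTTmax 
above (the exact byte count depends on how the rate and RTT are rounded):

#include <stdio.h>

/*
 * Rough send buffer sizing from the bandwidth-delay product (BDP).
 * The 1 Gbit/s rate and 20.63 ms RTT are just the figures from the
 * test above; adjust for your own path.
 */
int
main(void)
{
	double rate_bps = 1e9;		/* assumed path rate		*/
	double rtt_s = 0.02063;		/* RTTmax measured above	*/
	double bdp = rate_bps / 8.0 * rtt_s;	/* bytes "in flight"	*/

	/* The sendspace used above (3223281) is roughly this plus 25%. */
	printf("BDP   = %.0f bytes\n", bdp);
	printf("+ 25%% = %.0f bytes\n", bdp * 1.25);
	return (0);
}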

> Don't try them all at once and expect magic; you will probably need
> some combination.
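
For reference, a minimal sketch of bumping one of these knobs from a 
program with sysctlbyname(3) instead of sysctl(8); the knob name and 
value are just the ones from my test above, and I'm assuming the knob 
takes an int-sized value (setting it needs root):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	/* Value used for the test above; assumed to be int-sized.	*/
	int sendspace = 3223281;

	/* NULL oldp/oldlenp: we only set the new value here.		*/
	if (sysctlbyname("net.inet.tcp.sendspace", NULL, NULL,
	    &sendspace, sizeof(sendspace)) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	return (0);
}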
> 
> Also, try recompiling your kernel *without* IPSEC support.

I'll do that.

--Börje
