tcp_output starving -- is due to mbuf get delay?

Thu Apr 10 06:12:05 PDT 2003

On Thu, 10 Apr 2003 07:31:25 CDT Eric Anderson wrote:

> David Gilbert wrote:
> >>>>>>"Jin" == Jin Guojun [DSD] <j_guojun at lbl.gov> writes:
> > 
> > Jin> Some details were left out -- the machine is a 2 GHz Intel P4
> > Jin> with 1 GB memory, so the delay is not from either CPU or lack of
> > Jin> memory.
> > 
> > I just want to quickly jump in with the comment that our GigE tests of
> > routing through FreeBSD have exposed several-order-of-magnitude
> > differences by changing ram/motherboard/which-slot-the-card-is-in.
> > 
> > Do not assume that a fast CPU is the key.  We went through 10
> > motherboards to commission the current routers ... and sometimes
> > faster cpus would route slower (in various combinations of
> > motherboards and RAM).
> 
> Wow - this is very interesting.  I have two machines with GigE in them; 
> one has 6 GigE NICs (Intel PRO/1000 T Server) in a dual 1.5 GHz Xeon 
> (P4) with 2 GB of RAM.  I haven't put it into production yet, so if 
> there are any tests I can do, let me know.  It would be good to know the 
> pitfalls before I bring it up live.. :)
> 

I am also *very* interested in participating in this. I did apply [some 
version of] Sean Chittenden's patch, but that didn't help. With that patch 
applied my system deadlocked(?) when stressing it with ttcp. To me (but 
bear in mind that I'm not at all a kernel hacker) one of the if-statements 
in the patch seems to cover too little - i.e. when the fastscan OID is 
unset, code that has to do with fastscan is still executed, at least in 
the version I tried.

I get

Mar 27 11:42:41 stinky kernel: sbappend: bad tail 0x0xc0ef8200 instead of 0x0xc0ef4b00

when "fastscan" is unset.

I have two test systems with Xeon CPUs that can handle a 1000 km (21 ms) 
full-speed GE connection quite well with NetBSD (I get approx. 970 Mbit/sec 
with ttcp), but with FreeBSD I run out of CPU long before I can fill 
the network. I could stay with NetBSD, but since I am used to FreeBSD I'd 
rather have that instead. We have tested with Linux too over the same 
distance, and it seems to handle this as well.
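
For anyone wanting to reproduce this, a rough back-of-the-envelope sketch of
what such a path demands from TCP. It assumes the 21 ms figure is the
round-trip time (if it is one-way, double the results); the constant and
function names are made up for the illustration.

```python
# Bandwidth-delay product for a 1 Gbit/s path with a 21 ms delay.
# Assumption: 21 ms is the round-trip time, not the one-way delay.
LINK_BPS = 1_000_000_000   # Gigabit Ethernet line rate, bits per second
RTT_MS = 21                # assumed round-trip time in milliseconds
MAX_UNSCALED_WINDOW = 65535  # largest TCP window without window scaling


def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the pipe full."""
    return link_bps * rtt_ms // (8 * 1000)


def window_limited_mbit(window_bytes: int, rtt_ms: int) -> float:
    """Throughput ceiling (Mbit/s) imposed by a fixed window size."""
    return window_bytes * 8 * 1000 / rtt_ms / 1e6


window_bytes = bdp_bytes(LINK_BPS, RTT_MS)
unscaled_mbit = window_limited_mbit(MAX_UNSCALED_WINDOW, RTT_MS)

print(f"socket buffer needed to fill the link: {window_bytes} bytes")
print(f"ceiling with a 64 KB window: {unscaled_mbit:.2f} Mbit/s")
```

In other words, under that assumption the socket buffers on both ends need
to be a few megabytes, and window scaling has to be enabled, before the
link can be filled at all.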

What we did in NetBSD (-current) was to increase IFQ_MAXLEN in (their) 
sys/net/if.h; apart from that it's only "traditional" TCP tuning.
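
A hedged sketch of why IFQ_MAXLEN may matter on this path: it compares the
number of segments a full window represents with the interface send queue
length. The queue length of 50 is my assumption of the traditional BSD
default (check your own sys/net/if.h), the 21 ms is assumed to be the
round-trip time, and the 1448-byte payload assumes a 1500-byte MTU with TCP
timestamps. All names are invented for the illustration.

```python
# Segments in one full window versus the interface send queue length.
LINK_BPS = 1_000_000_000     # Gigabit Ethernet line rate, bits per second
RTT_MS = 21                  # assumed round-trip time in milliseconds
SEGMENT_PAYLOAD = 1448       # assumed: 1500 MTU - 20 IP - 20 TCP - 12 options
ASSUMED_IFQ_MAXLEN = 50      # assumed traditional default, verify locally

# Bytes in flight needed to keep the link busy for one round trip.
window_bytes = LINK_BPS * RTT_MS // (8 * 1000)

# Ceiling division: number of segments making up one full window.
segments_per_window = -(-window_bytes // SEGMENT_PAYLOAD)

# How many times larger a full window is than the send queue.
ratio = segments_per_window / ASSUMED_IFQ_MAXLEN

print(f"segments per window: {segments_per_window}")
print(f"window / queue length: {ratio:.1f}x")
```

If the stack ever hands the driver a burst larger than the queue, the excess
is dropped locally before it reaches the wire, and TCP treats that as loss.
That is presumably why a longer queue helped, though I have not verified
this in the code.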

My hosts are connected directly to core routers in a 10 Gbps nationwide 
network, so if anybody is interested in some testing I am more than 
willing to participate. If anybody produces a patch, I have a third system 
that I can use to pilot it as well.

--Börje