tcp_output starving -- is due to mbuf get delay?

Jin Guojun [DSD] j_guojun at lbl.gov
Wed Apr 9 15:44:04 PDT 2003


When testing  GigE path that has 67 ms RTT, the maximum TCP throughput is
limited at 250 Mb/s. By tracing the problem, I found that tcp_output() is
starving
where snd_wnd and snd_cwnd are fully open. The snd_cc is never filled beyond
the 4.05MB even though the snd_hiwat is 10MB and snd_sbmax is 8MB. That is,
sosend never stopped at sbwait. So only place can slow down is the mbuf
allocation
in sosend(). The attached trace file shows that each MGET and MCLGET takes
significant time -- around 8 us at slow start time, and gradually increasing
after that
in an range 18 to 648.
Each packet Tx on GigE takes 12 us. It average mbuf allocation takes 18 us, then

the performance will be reduced to 40%, in fact it is down to 25%, which means
higher average delay.

I have change NMBCLUSTER from 2446 to 6566 to 10240, and nothing is improved.

Any one can tell what factors would cause MGET / MCLGET to wait?
Is there any way to make MGET/MCLGET not to wait?

    -Jin

----------- system info -------------

kern.ipc.maxsockbuf: 10485760
net.inet.tcp.sendspace: 8388608
kern.ipc.nmbclusters: 10240
kern.ipc.mbuf_wait: 32
kern.ipc.mbtypes: 2606 322 0 0 0 0 0 0 0 0 0 0 0 0 0 0
kern.ipc.nmbufs: 40960

-------------- code trace and explanation ----------

sosend()
{
...
                if (space < resid + clen &&
                    (atomic || space < so->so_snd.sb_lowat || space < clen)) {
                        if (so->so_state & SS_NBIO)
                                snderr(EWOULDBLOCK);
                        sbunlock(&so->so_snd);
                        error = sbwait(&so->so_snd);        /***** never come
down to here ****/
                        splx(s);
                        if (error)
                                goto out;
                        goto restart;
                }
                splx(s);
                mp = &top;
                space -= clen;
                do {
                    if (uio == NULL) {
                        /*
                         * Data is prepackaged in "top".
                         */
                        resid = 0;
                        if (flags & MSG_EOR)
                                top->m_flags |= M_EOR;
                    } else do {
                        if (top == 0) {
microtime(&t1);
                                MGETHDR(m, M_WAIT, MT_DATA);
                                if (m == NULL) {
                                        error = ENOBUFS;
                                        goto release;
                                }
                                mlen = MHLEN;
                                m->m_pkthdr.len = 0;
                                m->m_pkthdr.rcvif = (struct ifnet *)0;
                        } else {
                                MGET(m, M_WAIT, MT_DATA);
                                if (m == NULL) {
                                        error = ENOBUFS;
                                        goto release;
                                }
                                mlen = MLEN;
                        }
                        if (resid >= MINCLSIZE) {
                                MCLGET(m, M_WAIT);
                                if ((m->m_flags & M_EXT) == 0)
                                        goto nopages;
                                mlen = MCLBYTES;
                                len = min(min(mlen, resid), space);
                        } else {
nopages:
                                len = min(min(mlen, resid), space);
                                /*
                                 * For datagram protocols, leave room
                                 * for protocol headers in first mbuf.
                                 */
                                if (atomic && top == 0 && len < mlen)
                                        MH_ALIGN(m, len);
                        }
microtime(&t2);
td = time_diff(&t2, &t1);
if ((td > 5 && (++tcnt & 31) == 0) || td > 50)
    log( ... "td %d %d\n", td, tcnt);

...

} /* end of sosend */
-------------- next part --------------
/kernel: l-timer 1049825316.1469006 1049825316.787656 68135 seq# 1608354634
/kernel: lion init rtt 68135 hiwat 8388608 mbmax 10485760
/kernel: init6 maxbss 8516875 (1460) co_tick 1 nbs 6 NIC 1000000
	#       delay count
/kernel: sosend: td 6 32
/kernel: sosend: td 7 64
/kernel: sosend: td 6 96
/kernel: sosend: td 9 128
/kernel: sosend: td 6 160
/kernel: sosend: td 6 192
/kernel: sosend: td 7 224
/kernel: sosend: td 6 256
/kernel: sosend: td 6 288
/kernel: sosend: td 7 320
/kernel: sosend: td 6 352
/kernel: sosend: td 7 384
/kernel: sosend: td 8 416
/kernel: sosend: td 6 448
/kernel: sosend: td 6 480
/kernel: sosend: td 6 512
/kernel: sosend: td 6 544
/kernel: sosend: td 6 576
/kernel: sosend: td 6 608
/kernel: sosend: td 6 640
/kernel: sosend: td 6 672
/kernel: sosend: td 6 704
/kernel: sosend: td 7 736
/kernel: sosend: td 8 768
/kernel: sosend: td 7 800
/kernel: sosend: td 7 832
/kernel: sosend: td 7 864
/kernel: sosend: td 8 896
/kernel: sosend: td 8 928
/kernel: sosend: td 7 960
/kernel: tcp_lion SO_SNDBUF 8517136	// end of slow start
/kernel: #(MBS 25550625) swnd 202496 cwnd 6744744 mbcnt 4133376 sp 4843184
/kernel: sosend: td 58 965
/kernel: sosend: td 370 966
/kernel: sosend: td 57 970
/kernel: sosend: td 77 972
/kernel: sosend: td 52 974
/kernel: sosend: td 55 975
/kernel: sosend: td 58 976
/kernel: sosend: td 53 982
/kernel: sosend: td 25 992
/kernel: sosend: td 55 1017
/kernel: sosend: td 19 1024
/kernel: sosend: td 27 1056
/kernel: sosend: td 26 1088
/kernel: sosend: td 43 1120
/kernel: sosend: td 30 1152
/kernel: sosend: td 17 1184
/kernel: sosend: td 25 1216
/kernel: sosend: td 25 1248
/kernel: sosend: td 29 1280
/kernel: sosend: td 18 1312
/kernel: sosend: td 28 1344
/kernel: sosend: td 26 1376
/kernel: sosend: td 26 1408
/kernel: sosend: td 27 1440
/kernel: sosend: td 32 1472
/kernel: sosend: td 27 1504
/kernel: sosend: td 19 1536
/kernel: sosend: td 56 1538
/kernel: sosend: td 21 1568
/kernel: sosend: td 648 1578
/kernel: sosend: td 27 1600
/kernel: sosend: td 27 1632
/kernel: sosend: td 29 1664
/kernel: sosend: td 25 1696
/kernel: sosend: td 70 1717
/kernel: sosend: td 28 1728
/kernel: sosend: td 53 1746
/kernel: sosend: td 51 1750
/kernel: sosend: td 84 1751
/kernel: sosend: td 63 1760
/kernel: sosend: td 293 1766
/kernel: sosend: td 166 1768
/kernel: sosend: td 127 1770
/kernel: sosend: td 76 1771
/kernel: sosend: td 78 1773
/kernel: sosend: td 79 1774
/kernel: sosend: td 308 1776
/kernel: sosend: td 78 1777
/kernel: sosend: td 80 1778
/kernel: sosend: td 79 1779
/kernel: sosend: td 150 1781
/kernel: sosend: td 107 1782
/kernel: sosend: td 106 1784
/kernel: sosend: td 102 1785
/kernel: sosend: td 18 1792
/kernel: sosend: td 97 1793
/kernel: sosend: td 113 1794
/kernel: sosend: td 108 1795
/kernel: sosend: td 100 1796
/kernel: sosend: td 188 1799
/kernel: sosend: td 25 1824
/kernel: sosend: td 26 1856
/kernel: sosend: td 26 1888
/kernel: sosend: td 53 1897
/kernel: sosend: td 28 1920
/kernel: FIN len 96 nxt 1691626219 max 1691626219 seq 1691626219

total counted delay = 39596
per packet (1448 bytes) delay is .7132564500 us


More information about the freebsd-hackers mailing list