tcp_output starving -- is it due to mbuf get delay?

Jin Guojun [DSD] j_guojun at lbl.gov
Wed Apr 9 16:12:08 PDT 2003


Some details were left out:

    The machine is 2 GHz Intel P4 with 1 GB memory, so the delay is not from
either CPU or lack of memory.

    -Jin

"Jin Guojun [DSD]" wrote:

> When testing a GigE path that has a 67 ms RTT, the maximum TCP throughput is
> limited to 250 Mb/s. Tracing the problem, I found that tcp_output() is
> starving while snd_wnd and snd_cwnd are fully open. The snd_cc is never
> filled beyond 4.05 MB even though the snd_hiwat is 10 MB and snd_sbmax is
> 8 MB. That is, sosend() never stops at sbwait(). So the only place that can
> slow things down is the mbuf allocation in sosend(). The attached trace file
> shows that each MGET and MCLGET takes significant time: around 8 us during
> slow start, gradually increasing after that to a range of 18 to 648 us.
> Each packet Tx on GigE takes 12 us. If the average mbuf allocation takes
> 18 us, then the performance will be reduced to 40%; in fact it is down to
> 25%, which means a higher average delay.
>
> I have changed NMBCLUSTERS from 2446 to 6566 to 10240, and nothing improved.
>
> Can anyone tell me what factors would cause MGET / MCLGET to wait?
> Is there any way to make MGET / MCLGET not wait?
>
>     -Jin
>
> ----------- system info -------------
>
> kern.ipc.maxsockbuf: 10485760
> net.inet.tcp.sendspace: 8388608
> kern.ipc.nmbclusters: 10240
> kern.ipc.mbuf_wait: 32
> kern.ipc.mbtypes: 2606 322 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> kern.ipc.nmbufs: 40960
>
> -------------- code trace and explanation ----------
>
> sosend()
> {
> ...
>                 if (space < resid + clen &&
>                     (atomic || space < so->so_snd.sb_lowat || space < clen)) {
>                         if (so->so_state & SS_NBIO)
>                                 snderr(EWOULDBLOCK);
>                         sbunlock(&so->so_snd);
>                         error = sbwait(&so->so_snd);  /***** never comes down to here *****/
>                         splx(s);
>                         if (error)
>                                 goto out;
>                         goto restart;
>                 }
>                 splx(s);
>                 mp = &top;
>                 space -= clen;
>                 do {
>                     if (uio == NULL) {
>                         /*
>                          * Data is prepackaged in "top".
>                          */
>                         resid = 0;
>                         if (flags & MSG_EOR)
>                                 top->m_flags |= M_EOR;
>                     } else do {
>                         if (top == 0) {
> microtime(&t1);
>                                 MGETHDR(m, M_WAIT, MT_DATA);
>                                 if (m == NULL) {
>                                         error = ENOBUFS;
>                                         goto release;
>                                 }
>                                 mlen = MHLEN;
>                                 m->m_pkthdr.len = 0;
>                                 m->m_pkthdr.rcvif = (struct ifnet *)0;
>                         } else {
>                                 MGET(m, M_WAIT, MT_DATA);
>                                 if (m == NULL) {
>                                         error = ENOBUFS;
>                                         goto release;
>                                 }
>                                 mlen = MLEN;
>                         }
>                         if (resid >= MINCLSIZE) {
>                                 MCLGET(m, M_WAIT);
>                                 if ((m->m_flags & M_EXT) == 0)
>                                         goto nopages;
>                                 mlen = MCLBYTES;
>                                 len = min(min(mlen, resid), space);
>                         } else {
> nopages:
>                                 len = min(min(mlen, resid), space);
>                                 /*
>                                  * For datagram protocols, leave room
>                                  * for protocol headers in first mbuf.
>                                  */
>                                 if (atomic && top == 0 && len < mlen)
>                                         MH_ALIGN(m, len);
>                         }
> microtime(&t2);
> td = time_diff(&t2, &t1);
> if ((td > 5 && (++tcnt & 31) == 0) || td > 50)
>     log( ... "td %d %d\n", td, tcnt);
>
> ...
>
> } /* end of sosend */



More information about the freebsd-performance mailing list