FreeBSD Network Implementation Question

Andre Oppermann andre at freebsd.org
Thu Apr 21 15:43:04 PDT 2005


Rob wrote:
> 
> > You figured it out correctly.  However, at that moment TCP flow control
> > would kick in and save you from local packet loss, so to speak.
> 
> Hi,
> 
> Thanks for the response, but you have actually confused me more.  It is
> my understanding that TCP doesn't have flow control (i.e., local to the
> node); it has congestion control, which is end-to-end across the network.

tcp_output() notices interface queue overflows when it tries to send
packets and immediately backs off until ACKs come in.  This avoids real
local packet drops and lets it retry the same packet a few microseconds
later without having to wait for the other side to signal an actual
packet loss.

> So it is entirely possible to drop packets locally in this method
> with a high-bandwidth, high-latency (so-called "long fat") connection.

Packets are not exactly dropped.  tcp_output() generates one packet
after another and adds them to the interface queue.  The moment this no
longer works it breaks out of that loop, even if the window would allow
it to send more.  The failing packet is not dropped but retried later,
and further packets are not produced for the time being.  Delivery is
simply deferred.
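
To illustrate the mechanism described above, here is a minimal userspace
sketch.  It is not the actual kernel code; IFQ_MAXLEN, ifq_enqueue() and
the other names are made up for the example.  The send loop breaks out
as soon as the queue refuses a segment and resumes with that very
segment on the next attempt, so nothing is dropped locally.

/*
 * Userspace sketch of the behaviour described above: a send loop that
 * stops (rather than drops) when the interface queue is full and retries
 * the same segment later.  Not the FreeBSD kernel code; the queue, its
 * length and all names here are invented for illustration.
 */
#include <stdio.h>
#include <stdbool.h>

#define IFQ_MAXLEN 4            /* stand-in for the interface queue limit */

static int ifq_len;             /* packets currently sitting in the queue */

/* Enqueue one segment; fail (like ENOBUFS) when the queue is full. */
static bool ifq_enqueue(int seg)
{
    if (ifq_len >= IFQ_MAXLEN)
        return false;           /* queue full: caller defers, does not drop */
    ifq_len++;
    printf("queued segment %d (qlen=%d)\n", seg, ifq_len);
    return true;
}

/* The driver "transmitting" packets frees queue slots again. */
static void ifq_drain(int n)
{
    ifq_len = ifq_len > n ? ifq_len - n : 0;
}

/*
 * tcp_output()-style loop: send as much as the window allows, but break
 * out as soon as the queue refuses a segment.  Returns the segment that
 * could not be queued, so it can be retried later.
 */
static int send_window(int next_seg, int window_end)
{
    for (; next_seg < window_end; next_seg++) {
        if (!ifq_enqueue(next_seg)) {
            printf("queue full at segment %d: deferring, not dropping\n",
                   next_seg);
            break;      /* give up for now, although the window allows more */
        }
    }
    return next_seg;            /* where to resume on the next attempt */
}

int main(void)
{
    int next = send_window(0, 10);  /* window allows 10 segments, queue holds 4 */
    ifq_drain(2);                   /* pretend the NIC transmitted two packets */
    next = send_window(next, 10);   /* the same segment is retried and succeeds */
    printf("resumed at segment %d\n", next);
    return 0;
}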

A local loss would be different.  There the stack would fail with one
packet, drop it, and try again with the next one, which might go through
if the network card happened to have finished an earlier packet at that
moment.  However, this is not what happens.  Because everything takes
place inside the kernel, TCP knows how to handle the situation correctly
and avoids any costly recovery in these cases.

> For example, if there were a gigabit/second link, with a latency of
> 100 milliseconds rtt, and window scaling set to 14 (the max), tcp could
> in theory open its congestion window up to 2^16*2^14 or 2^30 bytes,
> which could be ACK'd more quickly than the net.inet.ip.intr_queue_max
> queue would allow for, causing packets to be dropped locally.  Basically,
> the bandwidth-delay product dictates the size the buffer/queue should be,
> and in the above (extreme) example, it should be 0.1s*1Gb/s=12.5MB which
> is larger than the 50 packets at 1500 bytes each that you get
> with net.inet.ip.intr_queue_max=50.

You are mixing up the socket buffer size with the interface queue size.
Those are not the same.  TCP will converge to the maximal link speed and
is then bound by the output rate of the interface, even if the window
gets larger.
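
To put the numbers from the quoted mail side by side, here is a small
program that works through the same arithmetic.  All figures are taken
straight from the message above; the point is only that the socket
buffer has to cover the bandwidth-delay product while the interface
queue does not.

/* Worked arithmetic for the numbers quoted above: maximum scaled window,
 * bandwidth-delay product, and the capacity of a 50-packet interface queue. */
#include <stdio.h>

int main(void)
{
    double max_window = 65536.0 * 16384.0;      /* 2^16 * 2^14 = 2^30 bytes */
    double bdp        = (1e9 / 8.0) * 0.1;      /* 1 Gbit/s * 100 ms RTT, in bytes */
    double queue_cap  = 50.0 * 1500.0;          /* 50 packets of 1500 bytes each */

    printf("max scaled window:        %.0f bytes (~1 GiB)\n", max_window);
    printf("bandwidth-delay product:  %.0f bytes (12.5 MB)\n", bdp);
    printf("interface queue capacity: %.0f bytes (75 kB)\n", queue_cap);
    /* The socket buffer must cover the BDP; the interface queue need not,
     * because tcp_output() stops as soon as the queue is full. */
    return 0;
}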

> In other words, this is the reason for the net.inet.ip.intr_queue_drops
> counter, right?  I'm surprised that more of the tuning guides don't

No, not directly.  That counter comes into play when the box is doing
routing.  Drops happen when you go from a high-bandwidth link to a
low(er)-bandwidth link and overload the slower link.  On a box that is
not routing you shouldn't see any drops.
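
As a back-of-the-envelope illustration, assume (purely as an example)
a router forwarding from a 1 Gbit/s link onto a 100 Mbit/s link:

/* Back-of-the-envelope example (the link speeds are assumed only for
 * illustration): how fast a 50-packet queue fills when routing from a
 * fast link to a slow one. */
#include <stdio.h>

int main(void)
{
    double in_rate   = 1e9 / 8.0;       /* 1 Gbit/s ingress, in bytes/s */
    double out_rate  = 100e6 / 8.0;     /* 100 Mbit/s egress, in bytes/s */
    double queue_cap = 50.0 * 1500.0;   /* 50 packets of 1500 bytes */

    double fill_rate = in_rate - out_rate;      /* net queue growth per second */
    double fill_time = queue_cap / fill_rate;   /* seconds until drops start */

    printf("queue fills in about %.0f microseconds; after that, drops\n",
           fill_time * 1e6);
    return 0;
}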

> suggest increasing net.inet.ip.intr_queue_max to a higher value - am I
> missing something?  The equivalent setting in Linux is 1000, and Windows
> 2k appears to be 1500 (not that this should necessarily be taken as any
> sort of endorsement).

This doesn't really help.  With larger queues you just fill the larger
queues, and it takes longer until TCP's self-limiting kicks in.  The
queues have to have some depth to smooth out burstiness, but they should
not be too long, so that some direct feedback stays in place.
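
A rough calculation shows why.  Assuming, only for illustration, a
100 Mbit/s output link, a full queue of the lengths mentioned in this
thread adds the following worst-case delay before TCP gets any feedback:

/* Why longer queues mainly add delay before TCP notices anything: the
 * worst-case time a full queue takes to drain.  The 100 Mbit/s link speed
 * is an assumption chosen only for illustration. */
#include <stdio.h>

int main(void)
{
    double link_rate = 100e6 / 8.0;         /* 100 Mbit/s, in bytes/s */
    int    qlens[]   = { 50, 1000, 1500 };  /* queue lengths from this thread */

    for (int i = 0; i < 3; i++) {
        double drain = qlens[i] * 1500.0 / link_rate;   /* full-queue drain time */
        printf("queue of %4d packets: up to %.0f ms of extra delay\n",
               qlens[i], drain * 1e3);
    }
    return 0;
}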

> If my understanding is incorrect, please let me know.  In any case,
> thanks for the help (and thanks to those that have replied off list).

-- 
Andre

