In case you haven't noticed this: John Nagle about fixing a problem in TCP
Alexander at Leidinger.net
Fri Jan 20 01:22:52 PST 2006
I just found this in the comments on Slashdot:
The trouble with the Nagle algorithm
I really should fix the bad interaction between the "Nagle algorithm" and
"delayed ACKs". Both ideas went into TCP around the same time, and the
interaction is terrible. That fixed timer for ACKs is all wrong.
Here's the real problem, and its solution.
The concept behind delayed ACKs is to bet, when receiving some data from the
net, that the local application will send a reply very soon. So there's no
need to send an ACK immediately; the ACK can be piggybacked on the next data
going the other way. If that doesn't happen, after a 500ms delay, an ACK is
sent anyway.
The concept behind the Nagle algorithm is that if the sender is doing very
tiny writes (like single bytes, from Telnet), there's no reason to have more
than one packet outstanding on the connection. This prevents slow links from
choking with huge numbers of outstanding tinygrams.
Both are reasonable. But they interact badly in the case where an application
does two or more small writes to a socket, then waits for a reply. (X-Windows
is notorious for this.) When an application does that, the first write
results in an immediate packet send. The second write is held up until the
first is acknowledged. But because of the delayed ACK strategy, that
acknowledgement is held up for 500ms. This adds 500ms of latency to the
transaction, even on a LAN.
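The arithmetic behind that stall can be sketched with a toy timeline model. This is only an illustration under simplifying assumptions: all small writes are issued at time zero, processing time is ignored, and the receiver's application replies only once it has the whole request. The function name and parameters are hypothetical, and the 500ms default is the figure quoted above.

```python
def transaction_latency_ms(small_writes, rtt_ms, nagle=True, delayed_ack_ms=500.0):
    """Toy model: time from the first write until the reply arrives.

    Assumes every small write is issued at t=0 and that the peer's
    application only answers after it has received the full request.
    """
    if small_writes < 1:
        raise ValueError("need at least one write")
    one_way = rtt_ms / 2.0

    if small_writes == 1 or not nagle:
        # The whole request is in flight immediately, so the reply
        # comes back one round trip later.
        return 2 * one_way

    # With Nagle on, the first write is sent at once and the remaining
    # writes are held until the first packet is acknowledged.
    # The receiver has only part of the request, so its application does
    # not reply and the ACK waits for the delayed-ACK timer to expire.
    ack_arrives = one_way + delayed_ack_ms + one_way

    # Only now does the rest of the request go out; the reply takes one
    # more round trip.
    return ack_arrives + 2 * one_way


if __name__ == "__main__":
    # A LAN with a 1 ms round trip, as in the write-write-read case.
    print(transaction_latency_ms(1, rtt_ms=1.0))               # single write
    print(transaction_latency_ms(2, rtt_ms=1.0))               # write-write-read
    print(transaction_latency_ms(2, rtt_ms=1.0, nagle=False))  # Nagle disabled
```

In this model the round-trip time barely matters for the two-write case: the delayed-ACK timer dominates the total.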
The real problem is that 500ms unconditional delay. (Why 500ms? That was a
reasonable response time for a time-sharing system of the 1980s.) As
mentioned above, delaying an ACK is a bet that the local application will
reply to the data just received. Some apps, like character echo in Telnet
servers, do respond every time. Others, like X-Windows "clients" (really
servers, but X is backwards about this), only reply some of the time.
TCP has no strategy to decide whether it's winning or losing those bets.
That's the real problem.
The right answer is that TCP should keep track of whether delayed ACKs are
"winning" or "losing". A "win" is when, before the 500ms timer runs out, the
application replies. Any needed ACK is then coalesced with the next outgoing
data packet. A "lose" is when the 500ms timer runs out and the delayed ACK
has to be sent anyway. There should be a counter in TCP, incremented on
"wins", and reset to 0 on "loses". Only when the counter exceeds some number
(5 or so), should ACKs be delayed. That would eliminate the problem
automatically, and the need to turn the "Nagle algorithm" on and off.
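The counter described above could look roughly like the following. This is a minimal sketch, not code from any real TCP stack: the class and method names are made up for illustration, the threshold of 5 is the "5 or so" from the text, and the sketch assumes the caller can tell whether the application replied within the timer window even while ACKs are not being delayed.

```python
class DelayedAckPolicy:
    """Hypothetical per-connection bookkeeping for the win/lose counter."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # "5 or so" in the description above
        self.wins = 0               # consecutive wins since the last loss

    def record_win(self):
        # The application replied before the timer would have expired,
        # so an ACK could ride along with the outgoing data.
        self.wins += 1

    def record_loss(self):
        # The timer ran out with nothing to piggyback on: reset to 0.
        self.wins = 0

    def should_delay_ack(self):
        # Delay ACKs only once the counter exceeds the threshold.
        return self.wins > self.threshold


if __name__ == "__main__":
    policy = DelayedAckPolicy()
    for _ in range(6):
        policy.record_win()
    print(policy.should_delay_ack())  # six wins in a row
    policy.record_loss()
    print(policy.should_delay_ack())  # a single loss resets the counter
```

A connection that never answers promptly, such as the write-write-read case, would never build up enough wins, so its ACKs would go out immediately.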
So that's the proper fix, at the TCP internals level. But I haven't done TCP
internals in years, and really don't want to get back into that. If anyone
is working on TCP internals for Linux today, I can be reached at the e-mail
address above. This really should be fixed, since it's been annoying people
for 20 years and it's not a tough thing to fix.
The user-level solution is to avoid write-write-read sequences on sockets.
write-read-write-read is fine. write-write-write is fine. But
write-write-read is a killer. So, if you can, buffer up your little writes
to TCP and send them all at once. Using the standard UNIX I/O package and
flushing write before each read usually works.
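As an illustration of the buffer-then-flush advice, here is a small Python sketch using the buffered file object returned by `socket.makefile`. The helper names `exchange` and `demo` are hypothetical, and the demo uses `socket.socketpair()` purely as a stand-in so it runs without a network; the same pattern would be applied to a connected TCP socket.

```python
import socket
import threading


def exchange(stream, parts):
    """Buffer several small writes, flush once, then read one reply line."""
    for part in parts:
        stream.write(part)  # accumulates in the user-space buffer
    stream.flush()          # hand the whole request to the socket before reading
    return stream.readline()


def demo():
    client, server = socket.socketpair()

    def serve():
        # Stand-in peer: read one request line and answer with one line.
        with server, server.makefile("rwb") as peer:
            request = peer.readline()
            peer.write(b"got " + request)
            peer.flush()

    worker = threading.Thread(target=serve)
    worker.start()
    with client, client.makefile("rwb") as stream:
        # Three tiny writes that would otherwise be a write-write-write-read.
        reply = exchange(stream, [b"GET ", b"thing", b"\n"])
    worker.join()
    return reply


if __name__ == "__main__":
    print(demo())
```

Because the small pieces are collected in user space and flushed together, the kernel typically sees one write per request, which turns the exchange into the write-read-write-read pattern described above.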
I've looked at the webpage which is connected to this Slashdot user and they
have a "Patents" page. There a "John Nagle" is listed as the inventor of
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
"Oh no, not again."
-- A bowl of petunias on its way to certain death.
More information about the freebsd-net mailing list