read() returns ETIMEDOUT
rwatson at FreeBSD.org
Mon Feb 2 10:44:21 PST 2009
On Sun, 1 Feb 2009, Mitar wrote:
> Is there any progress on this error reported:
> I have the same or very similar issue. On my server large HTTP uploads are
> failing because there are only one direction data transmissions (when
> reading/receiving a request) and kernel drops connections after some time
> with ETIMEDOUT returning from read() even if transmissions are doing just
> fine with steady speed, tested at different speeds.
> Is there any workaround currently known?
Given that some time has passed since the previous reports, it's probably best
to do a diagnosis from scratch rather than assume it's necessarily the same.
Could you send us the output of the following commands:
sysctl net.inet.tcp | grep keep
There are a number of situations in which ETIMEDOUT may be set when a
connection is dropped, so we should figure out which one(s) it may be:
(1) TCP keepalive timer fires and finds one of the following cases: the
connection isn't yet established or the keepalive timer has expired.
(2) TCP persist timer fires because the window is closed and the full
exponential backoff has occurred. (tcp_timer_persist)
(3) TCP retransmit timer reaches its full exponential backoff without being
acknowledged.
There are a few ways to go about this -- probably the easiest is to drop a
kernel printf just before each call to tcp_drop(tp, ETIMEDOUT) in tcp_timer.c.
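
To find the places where a printf would go, something like the following may help. It is only a sketch: the helper name find_timeout_drops is made up for illustration, and the default path assumes the kernel sources are installed in the usual /usr/src location.

```shell
# Hypothetical helper: list every tcp_drop(..., ETIMEDOUT) call site in a
# source file, with line numbers. The default path assumes the standard
# FreeBSD source layout; pass another path as the first argument if needed.
find_timeout_drops() {
    src="${1:-/usr/src/sys/netinet/tcp_timer.c}"
    grep -n 'tcp_drop(.*ETIMEDOUT)' "$src"
}
```

Running find_timeout_drops with no arguments prints one line per call site; each one is a candidate spot for a printf that names the timer that fired.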
It would also be useful, if possible, to look at the tcpdump for the last
portion of the connection, perhaps ideally from the second-to-last ACK from
the remote host to the connection reset from the local end. It might be worth
running tcpdump on both sides to see if they see the same thing -- for
example, does one side think it's sending ACKs and the other not receive them?
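
A capture along these lines could be started on each host before the upload begins. This is a sketch under assumptions: the function name capture_upload, the interface, peer address, port and output file are all placeholders to be replaced with local values, and the TCPDUMP variable exists only so the command line can be checked with a dry run.

```shell
# Sketch only: interface, peer address, port and output path are placeholders.
# TCPDUMP may be overridden for a dry run; it defaults to the real tcpdump.
capture_upload() {
    iface="$1"; peer="$2"; port="$3"; out="$4"
    # -i: interface to listen on; -n: no name resolution;
    # -s 0: capture full packets; -w: write raw packets to a file
    ${TCPDUMP:-tcpdump} -i "$iface" -n -s 0 -w "$out" \
        host "$peer" and tcp port "$port"
}
```

For example, "capture_upload em0 192.0.2.10 80 /var/tmp/upload.pcap" run as root on the server, with the equivalent on the client, then interrupted once the connection has been dropped. The tail of each file can be read back with "tcpdump -n -r /var/tmp/upload.pcap" and the two sides compared.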
In the previous thread, it looked a bit like the outcome was that there was a
memory exhaustion issue under load, and that bumping nmbclusters helped at
least defer that problem. So it would be useful to see the output of netstat
-m before and after (for as small an epsilon as you can make it) the
connection is timed out. I realize capturing the above sorts of data can be
an issue on high-load boxes but if we can, it would be quite helpful.
Regardless of that, knowing if you're seeing allocation errors in the netstat
-m output would be helpful.
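
One way to get netstat -m output close to either side of the drop is to sample it in a loop while the upload runs. The following is a rough sketch, not a polished tool: the function names, the output directory, the interval and the sample count are all assumptions, and the NETSTAT variable is there only to allow a dry run.

```shell
# Sketch: take timestamped `netstat -m` snapshots so that one lands shortly
# before and one shortly after the connection is dropped. The directory,
# interval (seconds) and count are placeholders chosen by the caller.
snapshot_mbufs() {
    outdir="$1"; interval="$2"; count="$3"
    mkdir -p "$outdir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # One file per sample, named by epoch seconds and sample index.
        ${NETSTAT:-netstat} -m > "$outdir/netstat-m.$(date +%s).$i"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Print the lines mentioning denied requests from every snapshot, prefixed
# with the file name (/dev/null forces grep to always show file names).
show_denied() {
    grep denied "$1"/netstat-m.* /dev/null
}
```

For example, "snapshot_mbufs /var/tmp/mbufs 1 600" samples once a second for ten minutes, after which "show_denied /var/tmp/mbufs" shows whether the denied counters changed around the time of the timeout.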
Robert N M Watson
University of Cambridge