Abrupt reset sent instead of retransmitting a lost packet
hiren panchasara
hiren at strugglingcoder.info
Tue May 17 23:36:26 UTC 2016
On 05/13/16 at 10:36P, hiren panchasara wrote:
> https://people.freebsd.org/~hiren/pcaps/tcp_weird_reset.txt
> Something we saw in the wild on 10.2ish systems (server and client both).
>
> The most interesting thing can be seen at the end of the file.
>
> 3298737767:3298739215 gets lost, and the client tells us about it via a
> bunch of dupacks with SACK info. It SACKs all the outstanding data except
> this one missing packet. We (the server) never retransmit the missing
> packet but instead decide to send a Reset after 0.312582ms, which somehow
> causes the client to pause for 75 secs. (That might be another issue and
> not particularly important for this discussion.)
>
> What could cause this behavior of sending a reset instead of
> retransmitting a lost packet?
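To make the loss pattern above concrete, here is a small sketch of how a sender can locate the hole from the cumulative ACK and the SACK blocks carried in the dupacks. The helper first_sack_hole() and struct sack_block are hypothetical, SEQ_GT() is defined locally in the style of the BSD sequence-number macros, and the blocks are assumed to be sorted and non-overlapping. It is an illustration, not the FreeBSD SACK code.

```c
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit TCP sequence numbers,
 * written in the style of the BSD SEQ_GT() macro (defined locally). */
#define SEQ_GT(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical representation of one SACK block: [start, end). */
struct sack_block {
    uint32_t start; /* first SACKed sequence number */
    uint32_t end;   /* one past the last SACKed sequence number */
};

/*
 * Hypothetical helper: given the cumulative ACK (snd_una) and the SACK
 * blocks from a dupack, report the first un-SACKed range
 * [*hole_start, *hole_end) that the sender still has to retransmit.
 * Assumes the blocks are sorted by start and do not overlap.
 * Returns 1 if such a hole exists, 0 otherwise.
 */
static int first_sack_hole(uint32_t snd_una, const struct sack_block *blk,
                           int nblk, uint32_t *hole_start, uint32_t *hole_end)
{
    uint32_t cur = snd_una; /* everything before cur is ACKed or SACKed */

    for (int i = 0; i < nblk; i++) {
        if (SEQ_GT(blk[i].start, cur)) {
            /* Gap between what we know was received and this block. */
            *hole_start = cur;
            *hole_end = blk[i].start;
            return 1;
        }
        if (SEQ_GT(blk[i].end, cur))
            cur = blk[i].end;
    }
    return 0;
}
```

With the cumulative ACK stuck at 3298737767 and a SACK block starting at 3298739215, the hole reported is 3298737767:3298739215, i.e. exactly the 1448-byte segment that is never retransmitted in the trace.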
It turns out I am seeing a lot of "discarded due to memory problems" in
'netstat -sp tcp', and net.inet.tcp.reass.overflows is also increasing
rapidly.
This is happening in a very low-RTT environment (RTT around 0.20ms) with
about 1G of bandwidth.
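As a rough sanity check on those numbers, assuming "1G" means 1 Gbit/s and a full segment carries 1448 bytes (the length of the lost segment 3298737767:3298739215), the bandwidth-delay product works out to:

```latex
\[
\mathrm{BDP} = 10^{9}\,\tfrac{\mathrm{bit}}{\mathrm{s}} \times 0.20 \times 10^{-3}\,\mathrm{s}
             = 2 \times 10^{5}\,\mathrm{bit}
             = 25\,000\,\mathrm{bytes}
             \approx \frac{25\,000}{1448} \approx 17.3\ \text{segments}
\]
```

So on these assumptions a window of fewer than twenty full segments would cover the path.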
So it seems the following code in tcp_reass() is where the reassembly
queue overflows. (I've also confirmed with TCP debug logging that I am
seeing its "queue limit reached" message.)
if ((th->th_seq != tp->rcv_nxt || !TCPS_HAVEESTABLISHED(tp->t_state)) &&
    tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1) {
        V_tcp_reass_overflows++;
        TCPSTAT_INC(tcps_rcvmemdrop);
        m_freem(m);
        *tlenp = 0;
        if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
                log(LOG_DEBUG, "%s; %s: queue limit reached, "
                    "segment dropped\n", s, __func__);
                free(s, M_TCPLOG);
        }
        return (0);
}
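To make the condition easier to reason about outside the kernel, here is a user-space model of the same test. The function reass_would_drop() and its parameter names are hypothetical stand-ins for the kernel fields noted in the comments; it only mirrors the quoted stable/10 check and is not a replacement for tcp_reass().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the queue-limit test in tcp_reass().
 * Each parameter stands in for the kernel field named in its comment.
 * Returns true if the segment would be dropped.
 */
static bool reass_would_drop(uint32_t th_seq,        /* th->th_seq */
                             uint32_t rcv_nxt,       /* tp->rcv_nxt */
                             bool established,       /* TCPS_HAVEESTABLISHED(tp->t_state) */
                             unsigned int segqlen,   /* tp->t_segqlen */
                             unsigned int sb_hiwat,  /* so->so_rcv.sb_hiwat */
                             unsigned int maxseg)    /* tp->t_maxseg */
{
    /* Only segments that cannot be delivered right away are subject to
     * the limit: out-of-order ones, or any segment before ESTABLISHED. */
    bool must_queue = (th_seq != rcv_nxt) || !established;

    /* Roughly one receive buffer's worth of full-sized segments, plus one. */
    unsigned int limit = sb_hiwat / maxseg + 1;

    return must_queue && segqlen >= limit;
}
```

In this model an in-order segment on an established connection is never rejected by the check; only segments that would have to wait in the reassembly queue count against the limit.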
I know this is somewhat older (stable/10) code, but I think the problem
still remains.
The gist of the issue is that the condition

    tp->t_segqlen >= (so->so_rcv.sb_hiwat / tp->t_maxseg) + 1

evaluates to true, which makes us drop packets on the floor.
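Plugging in numbers, and assuming t_maxseg is 1448 bytes (the size of the lost segment in the trace) with sb_hiwat at either the 65536-byte recvspace or the 131072-byte recvbuf_max shown below, the per-connection limit is:

```latex
\[
\left\lfloor \frac{65536}{1448} \right\rfloor + 1 = 45 + 1 = 46,
\qquad
\left\lfloor \frac{131072}{1448} \right\rfloor + 1 = 90 + 1 = 91
\]
```

So on these assumptions a single connection can hold at most about 46 to 91 out-of-order segments before further ones are dropped.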
I've tried to restore the default behavior with the settings below (the
reass.* lines show the current reassembly-queue counters and limit):
net.inet.tcp.recvbuf_max: 131072
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.sendbuf_max: 131072
net.inet.tcp.sendbuf_inc: 16384
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.sendspace: 65536
net.inet.tcp.recvspace: 65536
net.inet.tcp.reass.overflows: 156440623
net.inet.tcp.reass.cursegments: 91
net.inet.tcp.reass.maxsegments: 557900
And the app is *not* setting SO_SNDBUF or SO_RCVBUF, in order to keep
SB_AUTOSIZE in effect.
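For reference, a minimal sketch of inspecting the receive buffer size from an application without changing it. The helper default_rcvbuf() is hypothetical. As far as I know, on FreeBSD it is a setsockopt() call for SO_RCVBUF or SO_SNDBUF that clears SB_AUTOSIZE on that buffer, so reading the value with getsockopt() should leave autosizing alone; on a fresh FreeBSD socket the value should reflect net.inet.tcp.recvspace, while other systems may report different defaults.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP socket and read SO_RCVBUF without ever
 * calling setsockopt(). Returns the reported size in bytes, or -1 on error.
 */
static int default_rcvbuf(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Deliberately no setsockopt(SO_RCVBUF) here: on FreeBSD that call
     * would clear SB_AUTOSIZE (assumption based on sosetopt() behavior). */
    int size = 0;
    socklen_t len = sizeof(size);
    int rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);

    close(fd);
    return rc == 0 ? size : -1;
}
```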
I was assuming the usual auto-sizing would kick in and do the right
thing so that we don't run into this issue, but something is amiss.
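A simplified model of one receive-side autosizing step, to illustrate why sb_hiwat (and with it the reassembly limit) stops growing at recvbuf_max. It is my approximation of the SB_AUTOSIZE logic in tcp_input() on stable/10 and should be treated as an assumption: the real code also requires TCP timestamps, measures over roughly one RTT, and resets its byte counter each interval. The function autorcvbuf_next() is hypothetical.

```c
/*
 * Hypothetical model of one autosizing decision. If more than 7/8 of the
 * current buffer arrived within one measurement interval and the buffer
 * is still below the maximum, grow it by 'inc' (recvbuf_inc), capped at
 * 'max' (recvbuf_max). Otherwise keep the current size.
 */
static unsigned int autorcvbuf_next(unsigned int hiwat,
                                    unsigned int bytes_in_interval,
                                    unsigned int inc, unsigned int max)
{
    if (bytes_in_interval > hiwat / 8 * 7 && hiwat < max) {
        unsigned int grown = hiwat + inc;
        return grown < max ? grown : max;
    }
    return hiwat;
}
```

With the values above, a buffer at 65536 bytes that receives 60000 bytes in one interval grows to 81920 (60000 exceeds 7/8 of 65536, which is 57344), and once it reaches 131072 it stops growing regardless of load.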
From 'netstat -an', I am seeing a bunch of connections to inter-colo
hosts with a high Recv-Q (close to recvbuf_max).
I found an old issue which seems similar:
https://lists.freebsd.org/pipermail/freebsd-net/2011-August/029491.html
I am cc'ing a few folks who've touched this code or may have some idea.
Any help is appreciated.
Cheers,
Hiren
More information about the freebsd-transport mailing list