RE: git: 66605ff791b1 - main - tcp: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error.

From: Scheffenegger, Richard <Richard.Scheffenegger_at_netapp.com>
Date: Tue, 19 Jul 2022 10:28:09 UTC
Hi John,

No, we don't think so. When we use error injection to exercise these code paths, the symptoms fixed by this commit are markedly different from what we have observed with additional logging on the systems reported in bug 264257.

Here, we fix an issue where, when a data segment + FIN is retransmitted multiple times, the left edge of the segment moves right. This leaves a gap, which the receiver would have to request again; in the absence of SACK, no further progress is made until a full timeout occurs. Certainly a nuisance and incorrect behavior, but unlikely to be the actual root cause of bug 264257.
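
To illustrate the arithmetic only, here is a minimal sketch. It is a toy model with hypothetical names (toy_tcb, toy_advance, toy_backout_buggy, toy_backout_fixed, toy_drift_after_failures), not the actual FreeBSD code. It assumes the sender advances its next-sequence variable before the send result is known and backs the advance out on a transient error. If the back-out subtracts only the payload length and forgets the one sequence number consumed by the FIN, every failed attempt shifts the left edge right by one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state; the real TCPCB has many more fields. */
struct toy_tcb {
	uint32_t snd_nxt;	/* next sequence number to send */
};

/* Sequence space consumed by a segment: payload bytes, plus one for a FIN. */
static uint32_t
seg_seqlen(uint32_t len, bool fin)
{
	return (len + (fin ? 1 : 0));
}

/* Advance snd_nxt optimistically, before the outcome of the send is known. */
static void
toy_advance(struct toy_tcb *tp, uint32_t len, bool fin)
{
	tp->snd_nxt += seg_seqlen(len, fin);
}

/* Back-out that ignores the FIN: snd_nxt ends up one too high. */
static void
toy_backout_buggy(struct toy_tcb *tp, uint32_t len, bool fin)
{
	(void)fin;
	tp->snd_nxt -= len;
}

/* Back-out that also undoes the increase by 1 due to the FIN. */
static void
toy_backout_fixed(struct toy_tcb *tp, uint32_t len, bool fin)
{
	tp->snd_nxt -= seg_seqlen(len, fin);
}

/*
 * Simulate `failures` consecutive transient errors for a data + FIN segment
 * and return how far snd_nxt has drifted from its starting value.
 */
static uint32_t
toy_drift_after_failures(unsigned failures, uint32_t len, bool use_fix)
{
	const uint32_t start = 1000;	/* arbitrary starting sequence number */
	struct toy_tcb tp = { .snd_nxt = start };

	for (unsigned i = 0; i < failures; i++) {
		toy_advance(&tp, len, true);
		if (use_fix)
			toy_backout_fixed(&tp, len, true);
		else
			toy_backout_buggy(&tp, len, true);
	}
	return (tp.snd_nxt - start);
}
```

In this model, three failed attempts with a 100-byte payload leave snd_nxt three sequence numbers too far right without the fix, and unchanged with it.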

Michael is currently improving TCP blackbox logging in the base stack, and providing this to the people affected, to find out why the TCPCB variables become erroneous.

This is because, even though we have extracted effectively full packet captures (lacking only proper timing information) in 3 instances, the problem cannot be recreated yet.

From prior logging we know that on (very) busy servers these state variables become incorrect much more frequently than expected, but typically without any ill effects.

Conceptually, the base TCP stack can end up in a state where multiple FIN bits, each with a distinct sequence number, get sent after the conclusion of sending actual data in the session: <SYN>[data]<FIN><FIN><FIN>

(I've seen one logged instance where 6 consecutive <FIN>s appear to have been transmitted.)

Bug 264257 is really due to the new "SACK rescue retransmission" feature, which was made active in 13.1, exposing this preexisting, unexpected behavior.

Note that other stacks (e.g. the RACK stack) are not affected by this at all, as there it is ensured that all outstanding data is ACKed by the receiver prior to sending out the <FIN>. Also, data + FIN segments are not sent by the RACK stack.


Best regards,
   Richard


-----Original Message-----
From: John Baldwin <jhb@FreeBSD.org> 
Sent: Friday, 15 July 2022 19:51
To: Richard Scheffenegger <rscheff@FreeBSD.org>; src-committers@FreeBSD.org; dev-commits-src-all@FreeBSD.org; dev-commits-src-main@FreeBSD.org
Subject: Re: git: 66605ff791b1 - main - tcp: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error.

On 7/15/22 9:36 AM, Richard Scheffenegger wrote:
> The branch main has been updated by rscheff:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=66605ff791b12a2c3bb4570379db0e14d29fca4c
>
> commit 66605ff791b12a2c3bb4570379db0e14d29fca4c
> Author:     Richard Scheffenegger <rscheff@FreeBSD.org>
> AuthorDate: 2022-07-14 00:49:10 +0000
> Commit:     Richard Scheffenegger <rscheff@FreeBSD.org>
> CommitDate: 2022-07-14 01:18:19 +0000
>
>      tcp: Undo the increase in sequence number by 1 due to the FIN flag in case of a transient error.
>
>      If an error occurs while processing a TCP segment with some data and the FIN
>      flag, the back out of the sequence number advance does not take into account the
>      increase by 1 due to the FIN flag.
>
>      Reviewed By: jch, gnn, #transport, tuexen
>      Sponsored by: NetApp, Inc.
>      Differential Revision: https://reviews.freebsd.org/D2970

Is this the source of bug 264257?

--
John Baldwin