[Bug 253848] panic: sackhint bytes rtx >= 0

Thu Feb 25 15:33:14 UTC 2021

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253848

--- Comment #1 from Richard Scheffenegger <rscheff at freebsd.org> ---

Hi Andriy,

I guess I am currently the person who has the most recent knowledge about that
part of the base stack...

Do you happen to have more (preceding) information about this, or a way to
reproduce this?

Are you running any special stack (RACK, BBR) which may have switched back to
the base stack in the middle of a loss recovery? (At one point I suspected
that this could cause issues.)

Or was something done with ipfw that may have temporarily affected a TCP
session?

The sack_bytes_rexmit accounting is rather old and has not been touched
recently (but the sackhint struct was changed recently, and additional
scoreboard accounting was added).

(kgdb) p *cur
$1 = {start = 3846347980, end = 3846352300, rxmit = 3846352300, scblink =
{tqe_next = 0xfffff8013da5a220, tqe_prev = 0xfffff80754818930}}

This indicates that the current hole in the SACK scoreboard (3 segments of
1440 bytes each, end - start = 4320) had been fully retransmitted
(rxmit == end) before the current acknowledgement came back.
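
The arithmetic behind that reading can be checked with a small user-space C
sketch. The struct is a simplified, hypothetical mirror of the three fields
printed by `p *cur` (the scblink TAILQ entry is omitted), not the kernel's
definition, and 1440 bytes is taken as the segment size, as stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;	/* 32-bit TCP sequence number, wraps mod 2^32 */

/* Simplified mirror of the fields shown by `p *cur` (scblink omitted). */
struct hole {
	tcp_seq start;	/* first byte missing at the receiver */
	tcp_seq end;	/* one past the last missing byte */
	tcp_seq rxmit;	/* next byte of this hole to retransmit */
};

/* Size of the hole in bytes; unsigned subtraction handles wraparound. */
static uint32_t
hole_len(const struct hole *h)
{
	/* A valid hole never has end before start in sequence space. */
	assert((int32_t)(h->end - h->start) >= 0);
	return h->end - h->start;
}

/* Bytes of the hole that have already been retransmitted. */
static uint32_t
hole_rxmitted(const struct hole *h)
{
	return h->rxmit - h->start;
}

/* rxmit == end means the whole hole has been sent again. */
static bool
hole_fully_rxmitted(const struct hole *h)
{
	return h->rxmit == h->end;
}
```

With the values from the dump, hole_len() gives 4320 (3 x 1440) and
hole_fully_rxmitted() returns true.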

Thus the expectation is that sackhint.sack_bytes_rexmit also has a value of
at least that number of bytes (4320). It is increased in tcp_output() for each
segment sent as a retransmission.
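
A minimal model of that accounting, assuming one retransmitted segment per call
and a 1440-byte segment size. The struct and function names are hypothetical
and the real tcp_output() does far more; the sketch only shows that rxmit and
sack_bytes_rexmit advance together, so once the hole above has been resent the
counter should be at least 4320.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical, simplified stand-ins; not the kernel structures. */
struct rx_hole {
	tcp_seq start, end, rxmit;
};
struct rx_hint {
	int sack_bytes_rexmit;	/* bytes retransmitted from scoreboard holes */
};

/*
 * Model of one SACK retransmission step: "send" at most maxseg bytes
 * starting at the hole's rxmit point, advance rxmit, and add the same
 * amount to the counter. Returns the number of bytes sent.
 */
static uint32_t
sack_rexmit_one(struct rx_hole *h, struct rx_hint *sh, uint32_t maxseg)
{
	uint32_t left = h->end - h->rxmit;
	uint32_t len = left < maxseg ? left : maxseg;

	h->rxmit += len;
	sh->sack_bytes_rexmit += (int)len;
	assert(sh->sack_bytes_rexmit >= 0);
	return len;
}
```

Three calls with maxseg = 1440 on a fresh copy of the dumped hole move rxmit to
end and leave the counter at 4320; a fourth call sends nothing.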

But this is the peculiar part:
(kgdb) p tp@entry->sackhint.sack_bytes_rexmit
$3 = -1440

This indicates that minus one segment (-1440 bytes) had been retransmitted
before, so subtracting the previously retransmitted hole violates the
invariant. And the only piece of code decrementing it appears to be in
tcp_output(), during non-permanent error handling...
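
The violated invariant can be sketched as follows. This is a hypothetical,
simplified model (not the kernel source) which assumes that removing a hole
subtracts its retransmitted bytes (rxmit - start) from the counter and then
requires the result to be non-negative; it returns false where the kernel
would panic with "sackhint bytes rtx >= 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Simplified hole; only start and rxmit matter for this check. */
struct iv_hole {
	tcp_seq start, end, rxmit;
};

/*
 * Subtract the bytes retransmitted from the hole being removed and
 * report whether the counter stayed non-negative.
 */
static bool
remove_hole_accounting(int *sack_bytes_rexmit, const struct iv_hole *h)
{
	*sack_bytes_rexmit -= (int)(h->rxmit - h->start);
	return *sack_bytes_rexmit >= 0;	/* "sackhint bytes rtx >= 0" */
}
```

Starting from the expected value 4320, the check passes and the counter drops
to 0. Starting from the observed -1440, the counter ends at -5760 and the
check fails.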

All updates to sackhint should be protected by the INP lock, so even if the
rx and tx paths are running on different cores, sack_bytes_rexmit should never
become negative.
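
The locking argument can be illustrated in user space. In this sketch a
pthread mutex stands in for the INP lock, and the tx_path/rx_path threads are
hypothetical simplifications: tx adds one segment per retransmission, and rx
takes back only what has already been accounted. Because every
read-modify-write happens under the same lock, no update is lost and the
counter is never observed below zero.

```c
#include <assert.h>
#include <pthread.h>

#define MSS	1440
#define NSEG	10000

/* Hypothetical connection state; the mutex stands in for the INP lock. */
struct conn {
	pthread_mutex_t lock;
	int sack_bytes_rexmit;
	int went_negative;	/* set if the counter is ever seen below zero */
};

/* "tx path": account NSEG retransmitted segments, each under the lock. */
static void *
tx_path(void *arg)
{
	struct conn *c = arg;

	for (int i = 0; i < NSEG; i++) {
		pthread_mutex_lock(&c->lock);
		c->sack_bytes_rexmit += MSS;
		pthread_mutex_unlock(&c->lock);
	}
	return NULL;
}

/* "rx path": subtract one segment only once it has been accounted. */
static void *
rx_path(void *arg)
{
	struct conn *c = arg;
	int acked = 0;

	while (acked < NSEG) {
		pthread_mutex_lock(&c->lock);
		if (c->sack_bytes_rexmit >= MSS) {
			c->sack_bytes_rexmit -= MSS;
			acked++;
		}
		if (c->sack_bytes_rexmit < 0)
			c->went_negative = 1;
		pthread_mutex_unlock(&c->lock);
	}
	return NULL;
}

/* Run both paths concurrently; returns the final counter value. */
static int
run_rx_tx(struct conn *c)
{
	pthread_t tx, rx;

	pthread_mutex_init(&c->lock, NULL);
	c->sack_bytes_rexmit = 0;
	c->went_negative = 0;
	pthread_create(&tx, NULL, tx_path, c);
	pthread_create(&rx, NULL, rx_path, c);
	pthread_join(tx, NULL);
	pthread_join(rx, NULL);
	pthread_mutex_destroy(&c->lock);
	return c->sack_bytes_rexmit;
}
```

After both threads finish, the counter is back at 0 and went_negative is
still 0.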

The SACK blocks returned indicate that (taking snd.una as the zero baseline,
counting in segments) the client knows about segments 2..34 and 35..47.
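
The conversion from a SACK block to segment numbers can be sketched like this,
assuming full-sized 1440-byte segments. The actual SACK block values are not
included in the comment, so the snd.una value in the usage below is
hypothetical (chosen near the 2^32 wrap to exercise sequence arithmetic).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Segment index of seq, counting from snd_una as segment 0. */
static uint32_t
seg_index(tcp_seq seq, tcp_seq snd_una, uint32_t mss)
{
	return (seq - snd_una) / mss;	/* unsigned math copes with wraparound */
}

/* A SACK block [start, end) covers segments first..last (inclusive). */
static void
sack_block_to_segs(tcp_seq start, tcp_seq end, tcp_seq snd_una,
    uint32_t mss, uint32_t *first, uint32_t *last)
{
	*first = seg_index(start, snd_una, mss);
	*last = seg_index(end, snd_una, mss) - 1;	/* end is exclusive */
}
```

For a block from snd.una + 2 * 1440 to snd.una + 35 * 1440, this yields
segments 2..34.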

The first hole has shrunk from the right. This is unusual; it is possible when
two retransmissions were lost again, or when the third segment, as originally
sent, was delayed by ~50 segments (unlikely).
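
What "shrunk from the right" means for the hole can be shown with a simplified
model: a SACK block that covers the hole's right edge but not its left edge
moves end back to the block's start, and rxmit is clamped so it never points
past the new end. The SEQ_LT/SEQ_MIN macros are local definitions in the style
of the kernel's tcp_seq.h, and the block boundaries in the usage are
hypothetical (they assume cur->start corresponds to snd.una).

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Local sequence-space comparisons, modeled on tcp_seq.h. */
#define SEQ_LT(a, b)	((int32_t)((a) - (b)) < 0)
#define SEQ_MIN(a, b)	(SEQ_LT(a, b) ? (a) : (b))

struct tr_hole {
	tcp_seq start, end, rxmit;
};

/*
 * If [sack_start, sack_end) starts inside the hole and reaches at least
 * its end, trim the hole's right edge back to sack_start.
 */
static void
trim_hole_right(struct tr_hole *h, tcp_seq sack_start, tcp_seq sack_end)
{
	if (SEQ_LT(h->start, sack_start) && SEQ_LT(sack_start, h->end) &&
	    !SEQ_LT(sack_end, h->end)) {
		h->end = sack_start;
		h->rxmit = SEQ_MIN(h->rxmit, h->end);
	}
}
```

Applied to the dumped hole with a block starting at segment 2, the hole shrinks
from 3 segments to 2 (2880 bytes), and rxmit is pulled back to the new end.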

Sorry for not being able to spot anything obvious right away...
