[Bug 263445] [tcp] Fatal trap 12: page fault while in kernel mode // supervisor read data, page not present // 13.1-RC3

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 10 Jun 2022 22:18:37 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263445

--- Comment #24 from Richard Scheffenegger <rscheff@freebsd.org> ---
The current thinking is that SACK rescue retransmissions (in FreeBSD 13 this is
gated by net.inet.tcp.rfc6675_pipe=1) very rarely create an entry that is
apparently beyond the valid data range.

Under most common circumstances the final FIN bit in the sequence space is
accounted for correctly, but it seems that the FIN bit is sometimes
double-counted.

In most of the inspected cores, we found:

- TCP state: LAST_ACK (FIN received and also FIN sent)
- SACK loss recovery triggered
- A cumulative ACK before all outstanding data was received
- The remote client "disappears" for a significant amount of time (7 to 12
  retransmission timeouts), but may reappear just prior.
- snd_max consistently 2 counts above the last data, instead of the expected 1
  (for the FIN bit).
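
The last observation can be expressed as a small consistency check. The
following is only a sketch under stated assumptions: struct conn_view and its
data_end and fin_sent fields are hypothetical, simplified stand-ins for the
state inspected in the cores, not the kernel's actual tcpcb layout. Only
snd_max is a name taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the connection state (not the real tcpcb). */
struct conn_view {
    uint32_t snd_max;   /* highest sequence number sent so far, plus one */
    uint32_t data_end;  /* assumed: sequence number just past the last data byte */
    bool     fin_sent;  /* assumed: true once our FIN has been sent */
};

/*
 * Return how many sequence counts snd_max lies above the expected value.
 * The expected value is data_end, plus one if a FIN was sent.
 * 0 means consistent; 1 with fin_sent set matches the suspected
 * double-counted FIN. Unsigned subtraction followed by a signed cast
 * keeps the result correct across sequence number wraparound.
 */
static int32_t
snd_max_excess(const struct conn_view *cv)
{
    assert(cv != NULL);
    uint32_t expected = cv->data_end + (cv->fin_sent ? 1u : 0u);
    return (int32_t)(cv->snd_max - expected);
}
```

With data ending at sequence 1000 and a FIN sent, snd_max == 1001 gives an
excess of 0, while snd_max == 1002 (the pattern seen in the cores) gives 1.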

However, it is still unclear under what circumstances this double-counting
happens: possibly when the persist timer triggers and a few other conditions
are also fulfilled, perhaps a race condition between normal packet processing
and a timer firing.

In short: disabling the RFC 6675 enhanced SACK features (more correct pipe
accounting, rescue retransmissions) by setting net.inet.tcp.rfc6675_pipe=0
should avoid the panic, but it does not address the root cause of when and why
the FIN bit is double-counted.

Would you be willing to run an instrumented kernel which, when inconsistencies
are detected in this area, either panics (full core dump) or logs various
state while ignoring or correcting them on the fly without panicking?
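
To illustrate the two modes, here is a minimal userland sketch, not the
actual instrumentation. All names in it (check_mode, rexmit_range,
check_rexmit_range, SEQ_AFTER) are hypothetical; SEQ_AFTER only imitates the
wraparound-safe style of the kernel's sequence comparison macros, and abort()
stands in for a kernel panic.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Wraparound-safe "a is after b" comparison on 32-bit sequence numbers. */
#define SEQ_AFTER(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/* Hypothetical switch between the two behaviours described above. */
enum check_mode { CHECK_PANIC, CHECK_LOG_AND_FIX };

/* Hypothetical retransmission range: [start, end) in sequence space. */
struct rexmit_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Check that a retransmission range does not extend beyond data_end,
 * the sequence number just past the last valid byte.
 * Returns true if the range was already consistent. Otherwise the state
 * is logged and, depending on mode, the program aborts or the range is
 * clamped to the valid data range and false is returned.
 */
static bool
check_rexmit_range(struct rexmit_range *r, uint32_t data_end,
    enum check_mode mode)
{
    assert(r != NULL);
    if (!SEQ_AFTER(r->end, data_end))
        return true;

    fprintf(stderr,
        "inconsistent range: start=%" PRIu32 " end=%" PRIu32
        " data_end=%" PRIu32 "\n", r->start, r->end, data_end);

    if (mode == CHECK_PANIC)
        abort();            /* stands in for panic() and a full core dump */

    /* Correct on the fly: clamp the range to the valid data. */
    r->end = data_end;
    if (SEQ_AFTER(r->start, r->end))
        r->start = r->end;
    return false;
}
```

In CHECK_LOG_AND_FIX mode, a range of [990, 1002) checked against a data_end
of 1000 is logged and clamped to [990, 1000), so processing can continue.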
