Siftr inflight byte question

Mon May 6 10:37:34 UTC 2013

[ccing freebsd-net@ so my problem description enters the collective
subconscious in case I forget about this again]

For everyone tuning in, Aris asked me the apt question of why siftr(4)'s
"# inflight bytes" field doesn't take into account sacked bytes.

On 05/06/13 18:45, Angelogiannopoulos, Aris wrote:
> On 05/06/13 18:31, Lawrence Stewart wrote:
>> On 05/06/13 18:18, Angelogiannopoulos, Aris wrote:
>>> I noticed that the sent_inflight_bytes variable doesn’t take
>>> into account the SACKd bytes in case SACK is used.
>>> 
>>> Is there a specific reason for this?
>> 
>> There is a very good reason - I was too lazy at the time ;)
>> 
>> More specifically, calculating the number of bytes SACKed by the
>> receiver is more difficult than pulling simple variables out of the
>> TCP control block and I was new to the TCP stack when I wrote
>> SIFTR. I've simply never bothered to go back and improve things in
>> this regard.
> 
> isn't it as simple as (tp->snd_nxt - tp->snd_fack) +
> tp->sackhint.sack_bytes_rexmit ? This is the way the stack is doing
> it in case of more than three duplicate acks.

You would hope so given the comment accompanying that calculation in
tcp_do_segment(), but I'm pretty sure that when I looked into this many
moons ago, I found that calculation to be incorrect. I should obviously
have gotten to the bottom of the problem back then, but from memory it
was not crucial to what I was working on at the time and I subsequently
forgot about it. It's good to come back to this now as it should be fixed.

>>> Should we change it to better reflect the conditions on the
>>> channel?
>> 
>> Ideally, yes. Feel like having a go and submitting a patch? I can
>> give you some hints on how to do it if you want a concrete starting
>> point.
> 
> Sure why not?

My write up below is partially from memory and from a 2 min look at the
code just now i.e. parts may be incorrect - please use it as a starting
point only...

So snd_fack effectively acts as a proxy for snd_una while SACK recovery
is happening i.e. it tracks the highest sequence number we've been
cumulatively SACKed for (left edge of the window).

sack_bytes_rexmit counts the number of bytes we've retransmitted from
the known holes during the current recovery episode.

snd_max is the highest sequence number we've sent (right edge of the
window).

Another important bit of info is that the sender-side sack scoreboard
stores holes (bytes missing at the receiver), whereas the sack blocks
that come in on the wire refer to received byte ranges (bytes
successfully received at the receiver).

Therefore from the senders perspective, I believe the "pseudo"
calculation for bytes in flight during a SACK recovery episode should
look something like:

(snd_max - snd_fack) - total_hole_bytes + sack_bytes_rexmit

3 of those variables are already around and (hopefully) usable, but I
don't believe we currently track a variable equivalent to total_hole_bytes.

Hopefully someone will chime in and correct me if any of the above is
misguided. Otherwise, your challenge, should you choose to accept it, is
to patch SIFTR to perform the above calculation, and to verify that the
reported result is correct - perhaps by running some controlled lab
experiments where you can review the value reported by SIFTR against the
known SACK blocks and sequence numbers on the wire during the recovery
episode.

You could choose to make total_hole_bytes a sackhint like
sack_bytes_rexmit, but I suggest you create a first version of the patch
which manually walks the sack scoreboard each time through
siftr_siftdata() to sum the # bytes in each hole to ensure you don't
inadvertently introduce accounting bugs while developing the patch.
Patch v2 can include total_hole_bytes as a sackhint.

Cheers,
Lawrence