[tcpm] ECN++

Bob Briscoe in at bobbriscoe.net
Thu Feb 13 18:58:30 UTC 2020


See inline tagged [BB]...

On 25/01/2020 10:41, Scheffenegger, Richard wrote:
> Hi Group, Marcelo, Bob,
> Another update in this context, which IMHO may be discussed as an actual
> change of mechanism with ECN++.
> I was looking into the very poor interaction of ECN between a Linux
> client and a BSD server, with request-response workload. That is, where
> each side sends out less than MSS data, before the application waits for
> the other side to respond to this.
> Neal pointed out this statement in RFC3168:
>    ...the TCP sender sets the CWR flag in
>    the TCP header of the first new data packet sent after the window
>    reduction.  ...
>    When the TCP data sender is ready to set the CWR bit after reducing
>    the congestion window, it SHOULD set the CWR bit only on the first
>    new data packet that it transmits.
> However, BSD is sending out the CWR as soon as possible - while Linux
> interprets the SHOULD overly strictly (IMHO) and ignores CWR unless it
> is received with (new) data.
[BB] Assuming the word '(new)' is optional, I think you're implying that 
BSD would set CWR on a pure ACK if that were its first packet after it 
received ECE feedback. Does BSD also set CWR on the first data packet 
after that (if any)?

I think RFC3168 expects CWR only on data packets - so that the sender of 
the CWR can distinguish between ECE that the receiver sends before vs 
after the receiver got the CWR (by whether the ackno of the ECE covers 
the CWR data packet or not).

Consider four packet exchanges (A>B, B>A, B>A, A>B) with CWR on a pure ACK:

     A>>>B Data#1 <CE in transit>
     A<<<B ACK#1 ECE
                             ...potentially quiet for a time...
     A<<<B Data#101 ECE ACK#1
     A>>>B ACK#101 *CWR*
                             ...potentially quiet for a time...
     A<<<B Data#102 ECE ACK#1
     A>>>B ACK#102
                             ...potentially quiet for a time...
     A>>>B Data#2 *CWR* ACK#102
     A<<<B ACK#2

'A' doesn't know whether the ECE on Data#102 was sent before or after B 
received the CWR on A's pure ACK, so 'A' doesn't know whether to reduce 
its window again or not.
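
The ackno test described above can be sketched in a few lines of Python. This is only an illustration: the helper name and its parameters are hypothetical, sequence-number wraparound is ignored, and ackno is taken to mean "next sequence number expected", as in TCP.

```python
from typing import Optional


def ece_sent_after_cwr(ece_ackno: int, cwr_seq: int, cwr_len: int) -> Optional[bool]:
    """Did the peer send this ECE after it had received our CWR?

    Hypothetical helper, not taken from any stack.
    ece_ackno: ackno of the incoming segment that carries ECE
    cwr_seq:   sequence number of the segment on which we set CWR
    cwr_len:   payload length of that segment (0 for a pure ACK)
    Returns None when the question cannot be answered from the ackno.
    """
    if cwr_len == 0:
        # A pure ACK occupies no sequence space, so no ackno can cover it.
        return None
    # The ackno covers the CWR segment once it reaches the segment's end.
    return ece_ackno >= cwr_seq + cwr_len


# CWR on a data segment: the ackno separates "before" from "after".
print(ece_sent_after_cwr(ece_ackno=1000, cwr_seq=1000, cwr_len=100))
print(ece_sent_after_cwr(ece_ackno=1100, cwr_seq=1000, cwr_len=100))
# CWR on a pure ACK (the ACK#101 case above): undecidable.
print(ece_sent_after_cwr(ece_ackno=1000, cwr_seq=1000, cwr_len=0))
```

With CWR on data, the first call reports an ECE sent before the CWR arrived and the second one sent after it. With CWR on a pure ACK the helper can only return None, which is the ambiguity 'A' faces for Data#102.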

I can't find anything in RFC3168 that explicitly spells out when the 
sender considers ECE to be in a new round. It jumps straight to 
describing the exceptional case of a CWR packet being dropped, and omits 
the 'normal' case of it being delivered.

This seems to be missing - perhaps it ought to be added by an erratum.

> But binding the CWR flag to a new data segment delays the ECN signaling
> loop artificially (for long runs of unidirectional transmitted data),
> and it is not clear what the benefit there would be, as the CWR flag is
> not retransmitted anyway (thus not bound to a point in the sequence
> number space).
[BB] Surely long runs of unidirectional transmitted data don't exhibit 
this problem, 'cos there's plenty of new data to carry the CWR. Or have 
I misunderstood?

In fact, the problem I see with RFC3168 is the opposite case. It seems 
there was an assumption that a data sender would be continually sending 
data, so that, once ECE feedback appeared at the sender, it would 
conveniently always have some data to send, on which CWR could be carried.

For instance, in the sequence above, host A might not send Data#2 for 
ages or perhaps never (a typical case if 'A' is a client requesting a 
large object). In the intervening time, B might send far more than the 
two packets shown. If 'A' does not set CWR on pure ACKs, all B's data 
packets would have to carry ECE, perhaps for many hours, until A has 
some more data to send (if ever).

Nonetheless, I think ECE continuing for hours is fine within the logic of 
RFC3168. While A isn't sending anything, it only reduces cwnd once, and 
it's not measuring any round trips, so it's not increasing cwnd either.
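
To make that concrete, here is a toy model of an idle sender that reacts to ECE at most once per window of data. It is a sketch under assumptions, not any stack's code: the class, the 'recover' marker, the halving factor and the one-MSS floor are all illustrative choices.

```python
from typing import Optional


class EcnSender:
    """Toy sender that reacts to ECE at most once per window (assumed model)."""

    def __init__(self, cwnd: int, mss: int, snd_nxt: int) -> None:
        self.cwnd = cwnd
        self.mss = mss
        self.snd_una = snd_nxt   # everything sent so far is assumed acked
        self.snd_nxt = snd_nxt   # the sender is idle, so this never advances
        self.recover: Optional[int] = None  # ackno must pass this to react again
        self.reductions = 0

    def on_segment(self, ackno: int, ece: bool) -> None:
        self.snd_una = max(self.snd_una, ackno)
        if self.recover is not None and self.snd_una > self.recover:
            self.recover = None  # new data was acked: a new window has started
        if ece and self.recover is None:
            # Halving and the one-MSS floor are assumptions of this sketch.
            self.cwnd = max(self.cwnd // 2, self.mss)
            self.recover = self.snd_nxt
            self.reductions += 1


a = EcnSender(cwnd=10 * 1448, mss=1448, snd_nxt=1001)
for _ in range(1000):            # B keeps sending data with ECE still set
    a.on_segment(ackno=1001, ece=True)
print(a.reductions, a.cwnd)
```

Because 'A' sends no new data, the ackno from B never moves past the recover marker, so however many ECE-marked packets arrive, only the first one reduces cwnd.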

Can you describe your case more precisely, so I can understand what 
caused the performance hit?

> I therefore propose a change in the Generalized ECN draft, to lift the
> above restriction (while it is "only" a SHOULD, this is one more example
> of an overly strict receiving-side implementation), and no longer
> artificially delay the CWR signal - to become also more useful for
> passive measurements.
[BB] I'm not yet convinced that this CWR behaviour is anything to do 
with the ECN++ draft. But that might be because I've misunderstood your 
description. As I said above, it might be possible to rectify omissions 
with an erratum to RFC3168.


> Richard
> For those interested: The effect of ignoring the CWR on non-new-data
> segments by Linux is, that the ECE flag is left latched. Thus BSD
> continues window-after-window with cwnd reductions, 
[BB] If it's not sending new data, how does the BSD host consider that 
windows are starting or completing?


> and due to another
> bug where the ECN-induced reduction has no lower bound, eventually cwnd
> ends up at 0 Byte and is only increased to 1 Byte by a Timer - until by
> pure chance, the CWR is sent together with 1 new byte of data. But in
> the preceding minutes, the session only saw progress by less than 1
> byte / RTT...
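
The missing lower bound mentioned above can be illustrated with a short sketch. It is not FreeBSD's code: the function name, the halving factor and the choice of one MSS as the floor are assumptions made for illustration only.

```python
def reduce_cwnd(cwnd: int, floor: int = 0) -> int:
    """Apply one multiplicative decrease (halving assumed), clamped at floor."""
    return max(cwnd // 2, floor)


MSS = 1448                       # illustrative segment size in bytes
unbounded = bounded = 10 * MSS

for _ in range(20):              # ECE stays latched, one reduction per window
    unbounded = reduce_cwnd(unbounded)            # no lower bound
    bounded = reduce_cwnd(bounded, floor=MSS)     # clamped at one MSS

print(unbounded, bounded)
```

Without the clamp, repeated integer halving drives cwnd to 0 bytes; with it, cwnd settles at one MSS.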
> On 15.01.2020 at 21:42, Scheffenegger, Richard wrote:
>> Hi,
>> Yet another interesting observation: as fbsd currently doesn't mark
>> SACK retransmissions as not-ECT, you can actually end up getting a CE
>> mark on a retransmission across an ECN-enabled congestion point.
>> Obviously this is better than loss…
>> What happens next is, that fbsd "honors" that ECE mark, since it is in
>> loss-recovery, not congestion-recovery. It adjusts the recovery_point to
>> the current snd_max (rightmost sent segment), and adjusts ssthresh and
>> cwnd by multiplicative decrease factor...
>> Furthermore, it appears that it also resets the traversal of the SACK
>> scoreboard (incidentally a "good" thing, as a few earlier retransmissions
>> also got dropped, not marked, and are being resent without an RTO).
>> But in the context of ECN++, what would be the expected response here?
>> I assume, that with the exception of the fresh traversal of the SACK
>> scoreboard, the above steps seem sensible.
>> Any thoughts on this interesting interaction between ECE (during SACK
>> loss recovery)?
>> Best regards,
>>     Richard
>> On 10.01.2020 at 01:08, Scheffenegger, Richard wrote:
>>> Marcelo, Bob,
>>> I just noted that there is a slight oversight in FreeBSD currently,
>>> which results in all sessions that are simultaneously ECN-enabled and
>>> SACK-permitted to effectively send out retransmissions with the ECT0
>>> codepoint.
>>> Strictly speaking, this is in violation of RFC3168, but might also be a
>>> good (nearly a decade long) validation of the performance of ECN++ for
>>> all types of data segments (new and retransmitted ones), although at a
>>> low and implicit exposure...
>>> On that note, since I think ECN++ is quite valuable (with a number of
>>> published research papers finding this change to be crucial), perhaps you can
>>> summarize the outstanding issues (other than more reviewers required; I
>>> admit I still haven't gone through all the delta between -04 and -05).
>>> Best regards,
>>>    Richard
>>> _______________________________________________
>>> tcpm mailing list
>>> tcpm at ietf.org
>>> https://www.ietf.org/mailman/listinfo/tcpm

Bob Briscoe                               http://bobbriscoe.net/

More information about the freebsd-transport mailing list