[SCTP] ICMP unreachable message reenables data transmit

Michael Tüxen Michael.Tuexen at lurchi.franken.de
Mon May 2 06:14:43 UTC 2011


On May 1, 2011, at 7:00 PM, Schoch Christian wrote:

> Quoting Michael Tüxen:
> 
>> On May 1, 2011, at 1:10 PM, Schoch Christian wrote:
>> 
>>>> On Apr 30, 2011, at 12:15 PM, Schoch Christian wrote:
>>>>> 
>>>>>> On Apr 30, 2011, at 9:11 AM, Schoch Christian wrote:
>>>>>> 
>>>>>>> During a measurement with CMT-SCTP and PF, I found that an ICMP Destination Unreachable message sometimes triggers data transmission on an inactive path that was previously the primary path.
>>>>>>> 
>>>>>>> It looks as if the ICMP message resets the inactive state back to active without resetting the RTO.
>>>>>>> 
>>>>>>> This behavior is triggered by an ICMP message returned for a heartbeat when no ICMP unreachable caused by data was sent shortly before.
>>>>>>> 
>>>>>>> The test setup consists of two multi-homed hosts running FreeBSD 8.1 with a WANem host in between.
>>>>>>> 
>>>>>>> A Wireshark log can be provided on request (it is quite large).
>>>>>> Hi Christian,
>>>>>> 
>>>>>> any chance to upgrade the FreeBSD machines to head or to use newer
>>>>>> SCTP sources, which I could provide? It would require a recompilation
>>>>>> of the kernel...
>>>>> 
>>>>> It is possible, but I could not provide results until next week
>>>>> if a reboot is necessary.
>>>>> I can use any sources you provide, since nothing else runs on these systems.
>>>> OK, but maybe I can try to understand what is going on.
>>>> 
>>>> How many paths do you have? One is inactive, but was primary, so it
>>>> is confirmed. On another one, you get an ICMP (which one? Port unreachable,
>>>> host unreachable, ...). Do you have more than two paths?
>>> 
>>> Setup looks like this:
>>> 
>>>     --------     ----cut---
>>> Host A        WANem          Host B
>>>     --------     ----------
>>> 
>>> Transfer is running on both paths from A to B until the primary link is cut between WANem and the receiver, and the whole transfer switches to the second path. The ICMP message (Host unreachable, with a heartbeat as payload) is received on the primary interface from the WANem host.
>> OK, understood.
>>> 
>>> As I tested this morning, the primary path switches to unreachable because of the ICMP message, but it should have reached this state well before by exceeding path.max_retrans.
>>> So this ICMP message does two things:
>>> - Sets the primary path to unreachable
>>> - Triggers something that retries data transfer on the primary path.
>> After looking at the tracefile, I somewhat agree.
>> * Do you see something like
>>  ICMP (thresh ??/??) takes interface ?? down
>>  on the console? This would be printed if the ICMP takes the
>>  path to unreachable. (It should also be in /var/log/messages)
>> * If the path is already unreachable, nothing should happen
>>  in response to the ICMP message.
>> 
> 08:04:18  kernel: ICMP (thresh 2/3) takes interface 0xc4e20510 down
> Same timestamp as the faulty start in the tracefile.
> 
>> So the question is: Is the path unreachable before the ICMP message
>> is received?
> Given the time difference between the first retransmission and the ICMP message, the path should already be in the unavailable state. But it seems that too many retransmissions occur, and it is the ICMP message that moves the path to the unavailable state.
> 
> I took a close look at the RTO of the primary path and found the following:
> 
> initial state: rto.min = 100ms
> RTO = 100ms
> 
> after cutting the link:
> RTO rises to 200ms and 400ms as expected but not higher (rto.max=60000)
> 
> Another test with path_rxt_max = 1 worked as expected.
> 
> So I assume there is a problem with the retransmission counter when the limit is larger than 1 (something like count = 1 instead of count >= 1)
> 
>> Is your application monitoring the SCTP notification?
>> What about the above printout from the kernel?
> 
> Yes, the notifications are monitored and logged (sctp_menu) - the notification for SCTP_PEER_ADDR_CHANGE comes right after ICMP.
OK, with this information it all makes sense after looking at the 8.1
source code. The problem should also be in HEAD.

What is going on:

1. You unplug the cable.
2. After the first RTO, the timer fires (the RTO gets doubled) and some,
   but not all, outstanding chunks are retransmitted. Now the path is
   potentially failed and no DATA chunks are transmitted on it anymore.
3. Another timeout for the rest of the outstanding chunks. Now the
   error counter goes to two and the RTO is doubled.
4. An ICMP message is received.
5. The error counter is set to 4, the path becomes unreachable and
   the potentially failed flag is cleared. An indication to the upper
   layer is given. This is the processing of the received ICMP message.
6. Here comes the problem: The send routine, when using CMT and PF,
   only skips paths which are potentially failed, not unreachable ones.
   So the path is used again. You are sending out 6 packets, then
   the CWND is used up and eventually the path becomes potentially
   failed again.
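
The sequence above can be replayed in a small user-space model. This is
only a sketch, not the kernel code: the struct, the function names and
the flag values are made up for illustration (only the flag names mirror
SCTP_ADDR_PF and SCTP_ADDR_NOT_REACHABLE), and the threshold of 3 is
taken from the "thresh 2/3" console message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; only the names mirror the kernel's bits. */
#define ADDR_PF            0x01u
#define ADDR_NOT_REACHABLE 0x02u

/* Simplified per-path state (assumed layout, not struct sctp_nets). */
struct path {
	unsigned int dest_state;     /* bitmask of the flags above */
	unsigned int error_count;    /* errors counted on this path */
	unsigned int failure_thresh; /* assumed path.max_retrans */
	unsigned int rto_ms;         /* current RTO in milliseconds */
	unsigned int rto_max_ms;     /* upper bound for the RTO */
};

/*
 * Retransmission timeout (steps 2 and 3): count the error, double the
 * RTO up to rto.max and mark the path as potentially failed. Crossing
 * the threshold by timeouts alone is left out of this model.
 */
static void
on_data_timeout(struct path *p)
{
	unsigned int doubled;

	assert(p != NULL);
	p->error_count++;
	doubled = p->rto_ms * 2;
	p->rto_ms = (doubled > p->rto_max_ms) ? p->rto_max_ms : doubled;
	p->dest_state |= ADDR_PF;
}

/*
 * ICMP unreachable (steps 4 and 5): the error counter jumps above the
 * threshold, PF is cleared and the path is marked unreachable.
 */
static void
on_icmp_unreachable(struct path *p)
{
	assert(p != NULL);
	p->error_count = p->failure_thresh + 1;
	p->dest_state &= ~ADDR_PF;
	p->dest_state |= ADDR_NOT_REACHABLE;
}

/* Send check as described in step 6: only PF paths are skipped. */
static bool
may_send_data_pf_only(const struct path *p)
{
	assert(p != NULL);
	return ((p->dest_state & ADDR_PF) == 0);
}
```

Starting with an RTO of 100 ms and calling on_data_timeout() twice and
then on_icmp_unreachable(), the model ends up with the PF flag cleared
and the unreachable flag set, so may_send_data_pf_only() allows DATA on
the path again. That matches what the tracefile shows.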

What is the problem:
1. The error counter is not increased when a HB times out. Therefore
   the path does not become unreachable, even though it is.
   I'm not sure why this is a good thing, but comments in the code
   indicate that it was chosen on purpose. 
2. The CMT code should also skip paths which are unreachable. I think
   this is just a bug.

Since the potentially failed stuff will change anyway (when integrating
support for
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-02),
I will put the fix for 1 on my ToDo list.

A fix for 2 is to change the file /usr/src/sys/netinet/sctp_output.c.
Around line 7990, change

		/* JRI: if dest is in PF state, do not send data to it */
		if (SCTP_BASE_SYSCTL(sctp_cmt_on_off) &&
		    SCTP_BASE_SYSCTL(sctp_cmt_pf) &&
		    (net->dest_state & SCTP_ADDR_PF)) {
			goto no_data_fill;
		}
		if (net->flight_size >= net->cwnd) {
			goto no_data_fill;
		}

to

		/* JRI: if dest is in PF state, do not send data to it */
		if (SCTP_BASE_SYSCTL(sctp_cmt_on_off) &&
		    SCTP_BASE_SYSCTL(sctp_cmt_pf) &&
		    (net->dest_state & SCTP_ADDR_PF)) {
			goto no_data_fill;
		}
		/* With CMT, also skip unreachable or unconfirmed destinations */
		if (SCTP_BASE_SYSCTL(sctp_cmt_on_off) &&
		    ((net->dest_state & SCTP_ADDR_NOT_REACHABLE) ||
		     (net->dest_state & SCTP_ADDR_UNCONFIRMED))) {
			goto no_data_fill;
		}
		if (net->flight_size >= net->cwnd) {
			goto no_data_fill;
		}

Then you need to recompile the kernel, install it and reboot.
Please report if this fixes your issue. Then I will commit a
corresponding fix to HEAD.
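
If you want to check the intended effect of the change before
rebooting, the combined condition can be written as a stand-alone
predicate. This is only a sketch under assumptions: struct cmt_config,
skip_data_fill() and the flag values are invented for this example and
stand in for the sysctl lookups and net->dest_state in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the kernel defines its own. */
#define ADDR_NOT_REACHABLE 0x01u
#define ADDR_UNCONFIRMED   0x02u
#define ADDR_PF            0x04u

/* Stand-in for the sctp_cmt_on_off and sctp_cmt_pf sysctls. */
struct cmt_config {
	bool cmt_on;
	bool cmt_pf;
};

/*
 * Returns true when no DATA should be put on this destination,
 * mirroring the three checks that lead to no_data_fill in the patch.
 */
static bool
skip_data_fill(const struct cmt_config *cfg, unsigned int dest_state,
    unsigned int flight_size, unsigned int cwnd)
{
	assert(cfg != NULL);
	/* Existing check: potentially failed destination under CMT-PF. */
	if (cfg->cmt_on && cfg->cmt_pf && (dest_state & ADDR_PF))
		return (true);
	/* Added check: unreachable or unconfirmed destination under CMT. */
	if (cfg->cmt_on &&
	    (dest_state & (ADDR_NOT_REACHABLE | ADDR_UNCONFIRMED)))
		return (true);
	/* Existing check: congestion window is already full. */
	return (flight_size >= cwnd);
}
```

With CMT and PF enabled, an unreachable destination with an empty
flight is now skipped, while a destination without any of these flags
is still used until flight_size reaches cwnd.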

Best regards
Michael
> 
> Best regards,
> Christian
> 
>> Best regards
>> Michael
>>> 
>>>> The ICMP message would not reset the RTO, since you need an ACKed TSN
>>>> or a HB-ACK to do that. Since it is inactive, it is missing these.
>>>> 
>>>> Sending on an inactive path is OK, as soon as you enter the dormant
>>>> state, which means all your paths are inactive.
>>>> 
>>> Transfer is still running on second link which is active.
>> That sounds good.
>>> 
>>>> Are you using the PF support for CMT?
>>> 
>>> Yes, but without NR-SACK and DAC.
>> OK.
>>> 
>>> I uploaded the pcap file to:
>>> http://37116.vs.webtropia.com/cmt_2.pcap
>> That was helpful!
>>> 
>>> Best regards,
>>> Christian
>>> 
>>>> 
>>>> Best regards
>>>> Michael
>>>>> 
>>>>>> 
>>>>>> Are you using IPv4 or IPv6?
>>>>>> 
>>>>> 
>>>>> IPv4
>>>>> 
>>>>> 
>>>>>> Best regards
>>>>>> Michael
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Schoch Christian
>>>>>>> _______________________________________________
>>>>>>> freebsd-net at freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>>>>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>>>>>>> 
>>>>>> 