SCTP huge connect delays (at amd64) and yet another question

Thu Dec 5 13:39:05 UTC 2013

On Dec 5, 2013, at 1:30 PM, Valentin Nechayev <netch at netch.kiev.ua> wrote:

> Hi,
> 
> Thu, Dec 05, 2013 at 11:32:03, Michael.Tuexen wrote about "Re: SCTP huge connect delays (at amd64) and yet another question": 
> 
>>> The first discrepancy found is specific to FreeBSD on amd64 and does not
>>> occur on i386: connection setup takes 2-4 seconds (!!).
>>> The tcpdump output suggests that a message was lost:
>> Hi Valentin,
>> 
>> could you send me the .pcap file instead of the tcpdump output.
>> I would like to see the addresses listed in the INIT and INIT-ACK.
> 
> I've sent them, thanks.
I answered...
> 
>>> tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 bytes
>>> 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCTP (132), length 188, bad cksum 0 (->f274)!)
>>>   10.0.0.2.50025 > 127.0.0.1.2500: sctp
>> I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1
> 
> I've shown the code; it doesn't do any explicit binding or address
> suggestion. For this host (9.1/i386), 10.0.0.2 resides on xl0. There
> are no routing specifics that would force it to select 10.0.0.2:
> 
> $ route -n get 127.0.0.1
>   route to: 127.0.0.1
> destination: 127.0.0.1
>  interface: lo0
>      flags: <UP,HOST,DONE,LOCAL>
> recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
>       0         0         0         0     16384         1         0
> $ telnet 127.0.0.1 25
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> 220 iv.local ESMTP Sendmail 8.14.5/8.14.5; Thu, 5 Dec 2013 13:48:31 +0200 (EET)
> ehlo zzz
> 250-iv.local Hello netch at localhost [127.0.0.1], pleased to meet you
> [...]
> 
> At least for TCP and UDP, it's quite straightforward.
There might be an issue in the SCTP stack. It does handle addresses differently
than UDP. However, I wasn't able to reproduce your problem. I need to test a
setup similar to yours, which I haven't done yet.
> 
>>> At 08:18:34.639467, the COOKIE ECHO was sent but likely ignored. One
>>> second later it was resent. Then there was yet another unexplained
>>> timeout before the HB REQ.
>>> 
>>> A series of tests shows this can take more than 4 seconds; the average
>>> is about 3 seconds. Two series of 20 runs took 58 and 63 seconds in
>>> total, which gives an average connect time of 2.9...3.15 seconds.
>>> 
>>> Neither Linux nor 32-bit FreeBSD shows this.
>> FreeBSD should not either... Do you see this on FreeBSD 9.2 amd64?
> 
> Yes. A fresh dump has reproduced this.
OK. Fine. This might be an issue in the address handling... I'll try
to reproduce this.
> 
>>> It's definitely better than delay each run, as on other platforms
>>> (but the initial delay annoys roughly).
>> Without SCTP_NODELAY bundling can happen or not, it depends on timing.
>> It would be great, if you can provide a .pcap file for a transfer you
>> think shows some buggy behaviour. Then we can figure out what is going on.
> 
>> MSG_EOR is not something you pass to a send() call. The flag is only
>> returned by the recvmsg() call.
> 
> Yes, I know. This has remained from the code which exposes
> SOCK_SEQPACKET specifics over different transport families (e.g.
> FreeBSD keeps this flag over AF_UNIX but Linux doesn't). I didn't take
> it into account, but if it is needed for clarity, I'll remove it :)
> 
>>> }
>> OK. Here is what I would expect on the wire:
>> 
>> Without SCTP_NODELAY:
>> 
>>> INIT
>> < INIT_ACK
>>> COOKIE_ECHO
>> < COOKIE_ACK
>> < DATA(abc)
>>> SACK
>> < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr)
>>> SACK
>>> SHUTDOWN
>> < SHUTDOWN_ACK
>>> SHUTDOWN_COMPLETE
>> 
>> There should be no substantial delay between any messages above.
>> 
>> With SCTP_NODELAY
>>> INIT
>> < INIT_ACK
>>> COOKIE_ECHO
>> < COOKIE_ACK
>> < DATA(abc)
>> < DATA(def)
>> < DATA(ghi)
>> < DATA(jkl)
>> < DATA(mno)
>> < DATA(pqr)
>>> SHUTDOWN
>> < SHUTDOWN_ACK
>>> SHUTDOWN_COMPLETE
>> 
>> There will be three SACKs somewhere between the DATA chunks, depending on
>> the timing.
>> 
>> There should be no substantial delay between any messages above.
>> 
>> I think if you see anything else, there is a bug. So do you see a different
>> behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap file?
> 
> Sorry, I don't have 9.2/i386 yet. The dump from 9.1 is attached. It
I actually don't expect a difference between 32-bit and 64-bit. I guess
it might be more related to a different address setup or timing.
> has no address mess but the event sequence is as follows:
> 
>> INIT
> < INIT_ACK
>> COOKIE_ECHO
> < COOKIE_ACK
> < DATA(abc)
>> SACK
> < DATA(def)
> ... delay 200ms...
>> SACK
> < DATA(ghi); DATA(jkl); DATA(mno); DATA(pqr)
> 
> Comparing to your description, it has unexplained waiting after
> DATA(def) from the server side, and SACK delay from the client side.
It is timing related, as described in my other mail: it depends on whether
the SACK is received before the send() calls finish or vice versa.
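
To illustrate the receiver side of this timing dependence, here is a minimal
Python sketch of the SACK rule as summarized from RFC 4960: the first DATA
packet of an association is acknowledged immediately, later packets are
acknowledged on every second packet or when a 200 ms timer expires. The
function name sack_times and the integer millisecond timeline are
hypothetical, chosen only for this example; it is a simplified model and
not the FreeBSD implementation.

```python
SACK_DELAY_MS = 200  # delayed-SACK timer assumed for this sketch


def sack_times(arrivals_ms, delay_ms=SACK_DELAY_MS):
    """Return the times (in ms) at which a simplified receiver sends SACKs.

    arrivals_ms: sorted arrival times of packets carrying DATA chunks.
    """
    sacks = []
    pending_since = None  # arrival time of a packet not yet acknowledged
    first_packet = True
    for t in arrivals_ms:
        # The delayed-SACK timer fired before this packet arrived.
        if pending_since is not None and t >= pending_since + delay_ms:
            sacks.append(pending_since + delay_ms)
            pending_since = None
        if first_packet:
            sacks.append(t)           # first DATA: acknowledge at once
            first_packet = False
        elif pending_since is not None:
            sacks.append(t)           # second unacknowledged packet
            pending_since = None
        else:
            pending_since = t         # start the delayed-SACK timer
    if pending_since is not None:
        sacks.append(pending_since + delay_ms)
    return sacks


# DATA(abc) at 0 ms, DATA(def) at 1 ms, then nothing more from the sender:
print(sack_times([0, 1]))
```

In this model the second SACK only goes out when the timer expires, which
corresponds to the 200 ms gap after DATA(def) in the trace above. A real
stack may acknowledge earlier for other reasons, for example when a
SHUTDOWN is sent.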
> 
> If you think it's fixed in 9.2, we can postpone this part of
> discussion until my upgrade to 9.2.
> 
>> Do you have any special routing setup?
> 
> This box (9.1/i386) is trivial and has no routing specifics.
> For the amd64 boxes, I've sent routing details privately. But it seems
> there is nothing really "special" there either, except multiple
> addresses on loopback.
> 
>> Please note, that the first SACK is returned without the 200ms delay. This is
>> required by the RFC and the above trace seems to show that.
>>> But, if server shuts its writing side down ("s" in argv[]), this
>>> laziness disappears. Again, the logic is too opaque and confusing.
>> What do you mean by this?
> 
> At least, removing this delay by shutdown(,SHUT_WR) is unexpected.
When you call shutdown(,SHUT_WR), we send out pending data without waiting
for a SACK, since there will be no more data from the user. This is
shown by your attached traces and is intended.
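
The sender behaviour discussed in this thread can be sketched with a small
Python model of a Nagle-like rule: queued user data goes out at once if
SCTP_NODELAY is set, if a shutdown is pending, if nothing is in flight, or
if enough data is queued to fill a packet; otherwise it waits for a SACK.
The function packets_on_wire, the event tuples, and the 1400-byte payload
limit are assumptions made for this illustration only. It is a deliberately
simplified model, not the code of the FreeBSD stack.

```python
def packets_on_wire(events, nodelay=False, mtu_payload=1400):
    """Return the packets a simplified sender emits for a list of events.

    events: tuples ("send", message), ("sack",) or ("shutdown",).
    Each returned packet is its bundled messages joined with ';'.
    """
    wire = []
    queue = []                # user messages not yet transmitted
    in_flight = 0             # packets sent but not yet acknowledged
    shutdown_pending = False

    def flush():
        nonlocal in_flight
        if not queue:
            return
        queued_bytes = sum(len(m) for m in queue)
        if (nodelay or shutdown_pending or in_flight == 0
                or queued_bytes >= mtu_payload):
            # Bundle everything queued (assumed to fit into one packet).
            wire.append(";".join(queue))
            queue.clear()
            in_flight += 1

    for event in events:
        kind = event[0]
        if kind == "send":
            queue.append(event[1])
        elif kind == "sack":
            in_flight = 0     # simplification: a SACK covers everything
        elif kind == "shutdown":
            shutdown_pending = True
        flush()
    return wire


sends = [("send", m) for m in ["abc", "def", "ghi", "jkl", "mno", "pqr"]]

# All send() calls finish before the first SACK arrives.
print(packets_on_wire(sends + [("sack",)]))

# The first SACK arrives between the first and the second send() call.
print(packets_on_wire(sends[:1] + [("sack",)] + sends[1:] + [("sack",)]))

# Shutting down the writing side flushes pending data without a SACK.
print(packets_on_wire(sends[:3] + [("shutdown",)]))
```

Under these assumptions the first call bundles DATA(def) through DATA(pqr)
into one packet, the second reproduces the abc / def / ghi;jkl;mno;pqr
pattern of the 9.1 trace, and the third shows why the wait disappears once
shutdown(,SHUT_WR) is called.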

So it seems that
* the timing is as expected for the data transmission phase
* there is an issue with setting up associations when there
  are specific addresses on loopback.

Do you agree?

Best regards
Michael
> 
> 
> -netch-
> <dump.blocking.91.i386><dump.blocking.91.i386.with_shutdown>