SCTP huge connect delays (at amd64) and yet another question

Thu Dec 5 12:30:27 UTC 2013

Hi,

 Thu, Dec 05, 2013 at 11:32:03, Michael.Tuexen wrote about "Re: SCTP huge connect delays (at amd64) and yet another question": 

> > The first discrepancy found is specific to FreeBSD on amd64 and does not
> > occur on i386: connection setup takes 2-4 seconds (!!).
> > Tcpdump output suggests that a message is being lost:
> Hi Valentin,
> 
> could you send me the .pcap file instead of the tcpdump output.
> I would like to see the addresses listed in the INIT and INIT-ACK.

I've sent them, thanks.

> > tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 bytes
> > 08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCTP (132), length 188, bad cksum 0 (->f274)!)
> >    10.0.0.2.50025 > 127.0.0.1.2500: sctp
> I'm wondering why 10.0.0.2 is the source address and not 127.0.0.1

I've shown the code; it doesn't do any explicit binding or address
suggestion. On this host (9.1/i386), 10.0.0.2 resides on xl0. There
are no routing specifics that would force it to select 10.0.0.2:

$ route -n get 127.0.0.1
   route to: 127.0.0.1
destination: 127.0.0.1
  interface: lo0
      flags: <UP,HOST,DONE,LOCAL>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0     16384         1         0
$ telnet 127.0.0.1 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 iv.local ESMTP Sendmail 8.14.5/8.14.5; Thu, 5 Dec 2013 13:48:31 +0200 (EET)
ehlo zzz
250-iv.local Hello netch at localhost [127.0.0.1], pleased to meet you
[...]

At least for TCP and UDP, it's quite straightforward: as the session
above shows, a connection to 127.0.0.1 also originates from 127.0.0.1.
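
For anyone who wants to check the UDP case without a packet capture,
here is a minimal sketch. It is not part of the test program discussed
in this thread; the helper name selected_source_addr() is made up for
illustration. It relies on connect() on a UDP socket only fixing the
peer, so getsockname() afterwards reports the source address the kernel
selected.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): ask the kernel
 * which local IPv4 address it would use for traffic to dst_ip, without
 * sending anything. Returns 0 on success and writes the dotted-quad
 * address into out; returns -1 on any failure.
 */
int selected_source_addr(const char *dst_ip, uint16_t port,
                         char *out, socklen_t outlen)
{
    struct sockaddr_in dst, local;
    socklen_t len = sizeof(local);
    int rc = -1;
    int fd;

    assert(dst_ip != NULL && out != NULL);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);

    /* connect() on UDP sends no packet; it only triggers address selection. */
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) == 1 &&
        connect(fd, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        rc = 0;

    close(fd);
    return rc;
}
```

Called with "127.0.0.1" and port 2500, I would expect it to report
127.0.0.1, matching the TCP session above.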

> > At 08:18:34.639467, the cookie echo was sent but apparently ignored. One
> > second later it was resent. Then yet another strange timeout occurred
> > before the HB REQ.
> > 
> > A series of tests shows this can take more than 4 seconds; the average
> > is about 3 seconds. Two runs of 20 connections each took 58 and 63
> > seconds in total, so the average connect time is 2.9...3.15 seconds.
> > 
> > Neither Linux nor 32-bit FreeBSD shows this.
> FreeBSD shouldn't either... Do you see this on FreeBSD 9.2 amd64?

Yes. A fresh dump has reproduced this.

> > It's definitely better than a delay on each run, as on other platforms
> > (though the initial delay is rather annoying).
> Without SCTP_NODELAY, bundling may or may not happen; it depends on timing.
> It would be great, if you can provide a .pcap file for a transfer you
> think shows some buggy behaviour. Then we can figure out what is going on.

> MSG_EOR is nothing you provide at a send() call. The flag is only
> returned by the recvmsg() call.

Yes, I know. This is left over from code that exposes
SOCK_SEQPACKET specifics over different transport families (e.g.
FreeBSD keeps this flag over AF_UNIX but Linux doesn't). I hadn't taken
it into account, but if it's needed for clarity, I'll remove it :)
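
To illustrate the kind of probe I mean, here is a minimal sketch (not
the actual code; seqpacket_roundtrip() is a name made up for this
example). It sends one record over an AF_UNIX SOCK_SEQPACKET socketpair
and returns the flags that recvmsg() reports, so one can see on a given
platform whether MSG_EOR comes back.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical probe: send msg as one record over an AF_UNIX
 * SOCK_SEQPACKET socketpair and receive it with recvmsg().
 * Returns the number of bytes received (or -1 on failure) and stores
 * the flags reported by recvmsg() in *flags_out. Whether MSG_EOR is
 * set there is platform dependent.
 */
ssize_t seqpacket_roundtrip(const char *msg, char *buf, size_t buflen,
                            int *flags_out)
{
    int sv[2];
    ssize_t n = -1;
    size_t len;

    assert(msg != NULL && buf != NULL && flags_out != NULL);
    len = strlen(msg);
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    /* No flags on send: MSG_EOR is only inspected on the receive side. */
    if (send(sv[0], msg, len, 0) == (ssize_t)len) {
        struct iovec iov;
        struct msghdr mh;

        iov.iov_base = buf;
        iov.iov_len = buflen;
        memset(&mh, 0, sizeof(mh));
        mh.msg_iov = &iov;
        mh.msg_iovlen = 1;
        n = recvmsg(sv[1], &mh, 0);
        if (n >= 0)
            *flags_out = mh.msg_flags;
    }

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

After a successful call, (flags & MSG_EOR) != 0 tells whether the
platform reports the record boundary for this transport family.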

> > }
> OK. Here is what I would expect on the wire:
> 
> Without SCTP_NODELAY:
> 
> > INIT
> < INIT_ACK
> > COOKIE_ECHO
> < COOKIE_ACK
> < DATA(abc)
> > SACK
> < DATA(def);DATA(ghi);DATA(jkl);DATA(mno);DATA(pqr)
> > SACK
> > SHUTDOWN
> < SHUTDOWN_ACK
> > SHUTDOWN_COMPLETE
> 
> There should be no substantial delay between any messages above.
> 
> With SCTP_NODELAY
> > INIT
> < INIT_ACK
> > COOKIE_ECHO
> < COOKIE_ACK
> < DATA(abc)
> < DATA(def)
> < DATA(ghi)
> < DATA(jkl)
> < DATA(mno)
> < DATA(pqr)
> > SHUTDOWN
> < SHUTDOWN_ACK
> > SHUTDOWN_COMPLETE
> 
> There will be three SACK somewhere between the DATA chunks depending on
> the timing.
> 
> There should be no substantial delay between any messages above.
> 
> I think if you see anything else, there is a bug. So do you see a different
> behavior on FreeBSD 9.2 (i386/amd64)? If yes, can you provide a .pcap file?

Sorry, I don't have 9.2/i386 yet. The dump from 9.1 is attached. It
shows no address confusion, but the event sequence is as follows:

> INIT
< INIT_ACK
> COOKIE_ECHO
< COOKIE_ACK
< DATA(abc)
> SACK
< DATA(def)
... delay 200ms...
> SACK
< DATA(ghi); DATA(jkl); DATA(mno); DATA(pqr)

Compared to your description, there is an unexplained wait after
DATA(def) on the server side, and a delayed SACK on the client side.

If you think it's fixed in 9.2, we can postpone this part of the
discussion until I upgrade to 9.2.

> Do you have any special routing setup?

This particular box (9.1/i386) is trivial, with no routing specifics
at all. For the amd64 boxes, I've sent the routing details privately,
but there seems to be nothing fundamentally "special" there either,
except multiple addresses on the loopback interface.

> Please note, that the first SACK is returned without the 200ms delay. This is
> required by the RFC and the above trace seems to show that.
> > But, if server shuts its writing side down ("s" in argv[]), this
> > laziness disappears. Again, the logic is too opaque and confusing.
> What do you mean by this?

At the very least, it is unexpected that shutdown(,SHUT_WR) on the
server side removes this delay.
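
To be clear about what the "s" mode does, here is a minimal sketch of
the half-close itself. It uses an AF_UNIX stream socketpair as a
stand-in rather than SCTP, and half_close_demo() is a hypothetical
name, so it only shows the socket-level semantics and says nothing
about why the SCTP delay goes away.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo on an AF_UNIX stream socketpair (a stand-in, not
 * SCTP): the "server" end writes msg and shuts down its writing side.
 * The "client" end reads the data, then sees end-of-file, and can still
 * send in the other direction.
 * Returns 0 if all of these observations hold, -1 otherwise.
 */
int half_close_demo(const char *msg)
{
    int sv[2];
    char buf[64];
    size_t len;
    int rc = -1;

    assert(msg != NULL);
    len = strlen(msg);
    assert(len > 0 && len <= sizeof(buf));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (write(sv[0], msg, len) == (ssize_t)len &&
        shutdown(sv[0], SHUT_WR) == 0 &&
        read(sv[1], buf, sizeof(buf)) == (ssize_t)len &&
        memcmp(buf, msg, len) == 0 &&
        read(sv[1], buf, sizeof(buf)) == 0 &&   /* EOF after the data */
        write(sv[1], "ok", 2) == 2 &&           /* reverse path still open */
        read(sv[0], buf, sizeof(buf)) == 2)
        rc = 0;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```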

-netch-
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.blocking.91.i386
Type: application/octet-stream
Size: 1772 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20131205/d0ebc062/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.blocking.91.i386.with_shutdown
Type: application/octet-stream
Size: 1772 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20131205/d0ebc062/attachment-0001.obj>