SCTP huge connect delays (at amd64) and yet another question

Valentin Nechayev netch at netch.kiev.ua
Thu Dec 5 08:41:56 UTC 2013


Hi,

I've got some test results which are surprising and I would get
a clarification.

A simple connection is created between two one-to-one SCTP sockets
(AF_INET, SOCK_STREAM, IPPROTO_SCTP) at loopback (127.0.0.1). The
server side sends 6 3-byte messages to client side and optionally
designates writing shutdown. Client receives all them and measures
a time before each receiving.
Code is showed at the end of this message.
Tested systems are:
* FreeBSD 9.2-release/amd64
* FreeBSD 9.1-release/amd64
* FreeBSD 9.1-release/i386
* Linux OpenSuSE 12.2, kernel 3.4.63-2.44-default, x86_64
* Linux RHEL 6.3, kernel 2.6.32-279.22.1.38.0.el6.x86_64

The first discrepancy found is specific for FreeBSD on amd64 and not
for i386 version; it's that connection setup lasts 2-4 seconds (!!)
  Tcpdump shows indication that could be parsed as message miss:

tcpdump: listening on lo0, link-type NULL (BSD loopback), capture size 65535 byt
es
08:18:34.639422 IP (tos 0x0, ttl 64, id 65094, offset 0, flags [none], proto SCT
P (132), length 188, bad cksum 0 (->f274)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [INIT] [init tag: 3943463987] [rwnd: 1864135] [OS: 10] [MIS: 2048] [i
nit TSN: 3475830004]
08:18:34.639450 IP (tos 0x0, ttl 64, id 42621, offset 0, flags [none], proto SCT
P (132), length 524, bad cksum 0 (->48ee)!)
    127.0.0.1.2500 > 10.0.0.2.50025: sctp
        1) [INIT ACK] [init tag: 59811639] [rwnd: 1864135] [OS: 10] [MIS: 2048]
[init TSN: 466863335]
08:18:34.639467 IP (tos 0x0, ttl 64, id 52783, offset 0, flags [none], proto SCT
P (132), length 424, bad cksum 0 (->21a0)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [COOKIE ECHO]
08:18:35.639618 IP (tos 0x0, ttl 64, id 12109, offset 0, flags [DF], proto SCTP
(132), length 424, bad cksum 0 (->8082)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [COOKIE ECHO]
08:18:36.692628 IP (tos 0x0, ttl 64, id 48682, offset 0, flags [DF], proto SCTP
(132), length 76, bad cksum 0 (->7e01)!)
    127.0.0.1.2500 > 127.0.0.1.50025: sctp
        1) [HB REQ]
08:18:36.692668 IP (tos 0x0, ttl 64, id 10809, offset 0, flags [DF], proto SCTP (132), length 76, bad cksum 0 (->86f2)!)
    10.0.0.2.50025 > 127.0.0.1.2500: sctp
        1) [HB ACK] 
08:18:36.692707 IP (tos 0x2,ECT(0), ttl 64, id 16588, offset 0, flags [DF], proto SCTP (132), length 52, bad cksum 0 (->fb75)!)
    127.0.0.1.2500 > 127.0.0.1.50025: sctp
        1) [DATA] (B)(E) [TSN: 466863335] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
        0x0000:  6162 63                                  abc
[...]

At 08:18:34.639467, cookie echo was sent but likely ignored. One
second later it was resent. Then, yet another strange timeout was
invented before HB REQ.

Test series show this can spend more than 4 seconds, average value
is about 3 seconds. Two 20-times run summary times are 58 to 63
seconds, so, I've got 2.9...3.15 average connect time.

Neither Linux nor 32-bit FreeBSD shows this.

The second discrepancy is well known case of so-called "Nagle"
algorithm adapted for SCTP but details are confusing. If
SCTP_NODELAY isn't turned on on server side, tcpdump shows that the
second packet is sent from sender side without delay, but receiver's
SACK is delayed for 200 ms by default. These results are identical for
FreeBSD (32 bit) and Linux, but not amd64 FreeBSD (see below). But
why? A common sense suggests that, if client receives all immediately,
and server has already prepared its data, no additional delay shall be
invented. In analogue to TCP, I would expect that, until acknoledge
for "abc" is got, "def" isn't sent, but then the latter is sent
immediately.

09:28:11.374335 IP (tos 0x2,ECT(0), ttl 64, id 24204, offset 0, flags [DF], prot
o SCTP (132), length 52, bad cksum 0 (->ddb5)!)
    127.0.0.1.2500 > 127.0.0.1.41007: sctp
        1) [DATA] (B)(E) [TSN: 183313025] [SID: 0] [SSEQ 0] [PPID 0x0] [Payload:
        0x0000:  6162 63                                  abc
09:28:11.374349 IP (tos 0x0, ttl 64, id 522, offset 0, flags [none], proto SCTP 
(132), length 48, bad cksum 0 (->7a3e)!)
    127.0.0.1.41007 > 127.0.0.1.2500: sctp
        1) [SACK] [cum ack 183313025] [a_rwnd 1863876] [#gap acks 0] [#dup tsns 
0] 
09:28:11.374368 IP (tos 0x2,ECT(0), ttl 64, id 64629, offset 0, flags [DF], prot
o SCTP (132), length 52, bad cksum 0 (->3fcc)!)
    127.0.0.1.2500 > 127.0.0.1.41007: sctp
        1) [DATA] (B)(E) [TSN: 183313026] [SID: 0] [SSEQ 1] [PPID 0x0] [Payload:
        0x0000:  6465 66                                  def
09:28:11.573780 IP (tos 0x0, ttl 64, id 12179, offset 0, flags [none], proto SCT
P (132), length 48, bad cksum 0 (->4cb5)!)
    127.0.0.1.41007 > 127.0.0.1.2500: sctp
        1) [SACK] [cum ack 183313026] [a_rwnd 1864135] [#gap acks 0] [#dup tsns 
0] 

But, if server shuts its writing side down ("s" in argv[]), this
laziness disappears. Again, the logic is too opaque and confusing.

64-bit (amd64) FreeBSD shows another behavior (both 9.1 and 9.2): in
addition to setup delay (see above), the delay between 2nd and 3rd
received packet (case SCTP_NODELAY isn't activated) could be longer
than minimally needed one and spreads between a few hundreds of
microseconds up to full 0.2 second delay shown on other platforms.
In average, 1/8 of runs show this delay:

$ fgrep ghi ll | sort -rn -k2,2 -t= | uniq -c
   1 got: ghi (with MSG_EOR) tdiff=200835
   1 got: ghi (with MSG_EOR) tdiff=200829
   1 got: ghi (with MSG_EOR) tdiff=200826
   1 got: ghi (with MSG_EOR) tdiff=200822
   1 got: ghi (with MSG_EOR) tdiff=200819
   1 got: ghi (with MSG_EOR) tdiff=200800
   1 got: ghi (with MSG_EOR) tdiff=200792
   1 got: ghi (with MSG_EOR) tdiff=199885
   1 got: ghi (with MSG_EOR) tdiff=163816
   1 got: ghi (with MSG_EOR) tdiff=55849
   1 got: ghi (with MSG_EOR) tdiff=1825
  21 got: ghi (with MSG_EOR) tdiff=2
  38 got: ghi (with MSG_EOR) tdiff=1

It's definitely better than delay each run, as on other platforms
(but the initial delay annoys roughly).

The testing code:
===
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/poll.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <err.h>

#define PORT 2500

int main(int argc, char *argv[])
{
  int s_li, s_ac, s_cl;
  struct sockaddr_in sia;
  struct iovec iov[1];
  struct msghdr msg;
  socklen_t slen;
  struct timeval tv0, tv1;
  int tdiff;
  int i;

  s_li = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
  if (s_li < 0)
    err(1, "socket");
  memset(&sia, 0, sizeof(sia));
  sia.sin_family = AF_INET;
  sia.sin_addr.s_addr = htonl(0x7F000001);
  sia.sin_port = htons(PORT);
  if (bind(s_li, (struct sockaddr*)&sia, sizeof(sia)) < 0)
    err(1, "bind");
  if (listen(s_li, 1) < 0)
    err(1, "listen");
  s_cl = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
  if (s_cl < 0)
    err(1, "socket");
  if (connect(s_cl, (struct sockaddr*)&sia, sizeof(sia)) < 0)
    err(1, "connect");
  slen = sizeof(sia);
  s_ac = accept(s_li, (struct sockaddr*) &sia, &slen);
  if (s_ac < 0)
    err(1, "accept");
  for (i = 1; i < argc; ++i) {
    if (!strcmp(argv[i], "nn")) {
      const int one = 1;
      if (setsockopt(s_ac, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof(one)) < 0)
        warn("setsockopt(SCTP_NODELAY)");
    }
  }
  if (send(s_ac, "abc", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "def", 3, MSG_EOR) != 3)
    err(1, "send");
  if (send(s_ac, "ghi", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "jkl", 3, MSG_EOR) != 3)
    err(1, "send");
  if (send(s_ac, "mno", 3, 0) != 3)
    err(1, "send");
  if (send(s_ac, "pqr", 3, MSG_EOR) != 3)
    err(1, "send");
  for (i = 1; i < argc; ++i) {
    if (!strcmp(argv[i], "s"))
      shutdown(s_ac, SHUT_WR);
  }
  for(;;) {
    char buf[1024];
    memset(&msg, 0, sizeof(msg));
    iov[0].iov_base = buf; iov[0].iov_len = sizeof(buf) - 1;
    msg.msg_iov = iov; msg.msg_iovlen = 1;
    gettimeofday(&tv0, NULL);
    ssize_t got = recvmsg(s_cl, &msg, 0);
    gettimeofday(&tv1, NULL);
    tdiff = (int)tv1.tv_usec - (int)tv0.tv_usec;
    if (tdiff < 0)
      tdiff += 1000000;
    if (got == 0)
      break;
    if (got == -1) {
      perror("recvmsg");
      break;
    }
    buf[got] = 0;
    printf("got: %s (%s MSG_EOR) tdiff=%d\n",
        buf,
        (msg.msg_flags & MSG_EOR) ? "with" : "without",
        tdiff);
    if (!strncmp(buf, "pqr", 3))
      break;
  }
  return 0;
}
// vim:ts=2:sts=2:sw=2:et:si:
===


-netch-


More information about the freebsd-net mailing list