[Bug 225535] Delays in TCP connection over Gigabit Ethernet connections; Regression from 6.3-RELEASE

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Jan 29 14:36:09 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225535

            Bug ID: 225535
           Summary: Delays in TCP connection over Gigabit Ethernet
                    connections; Regression from 6.3-RELEASE
           Product: Base System
           Version: 10.3-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: aeder at list.ru

Created attachment 190162
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=190162&action=edit
cycle_clock.c

Delays in TCP connection over Gigabit Ethernet connections; Regression from
6.3-RELEASE.

Long description of the use case:

The company I work for develops soft-realtime applications with an internal
cycle of about 600 ms (0.6 s). For functional safety, the core of the
application runs simultaneously on two computers with different architectures
(A - FreeBSD/Intel, B - Linux/PowerPC).
Computers A and B are connected via a copper Gigabit switch, with no other
devices connected to the same network.

A normal application cycle looks like this:

1. Process input from object controllers.
2. Cross-compare the input between computers A and B.
3. Make evaluations (simulate complex automata).
4. Produce output to send to object controllers.
5. Cross-compare the output between A and B.
6. Send output to object controllers.
7. Sleep until the end of the cycle.
8. Go to step 1.

The cross-compare steps are done over a TCP connection between computers A and
B, over Gigabit Ethernet.
If A or B is not able to complete all operations within the allotted time
(600 ms +/- 150 ms), the whole system halts due to internal time checks.

Yes, using UDP packets might be better - but even with this design, the last
released hardware version works just fine: uptime is about 1 year per
installation, with approximately 100 installations worldwide. Moreover, no
halts caused by the internal time checks have been observed - the rare halts
that did occur were caused by other software or hardware defects.

So, the old release uses this industrial computer (computer A):

CPU: Intel(R) Core(TM)2 Duo CPU     L7400  @ 1.50GHz (1500.12-MHz 686-class
CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
 
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
....
em0: <Intel(R) PRO/1000 Network Connection Version - 6.7.2> port 0xdf00-0xdf1f
mem 0xff880000-0xff89ffff,0xff860000-0xff87ffff irq 16 at device 0.0 on pci4
em0: Ethernet address: 00:30:64:09:6f:ee
....

FreeBSD fspa2 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 04:45:45 UTC 2008 
   root at dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP  i386

and this configuration works just fine.
I have written a test application that simulates the main cycle of the complex
system; the maximum delay observed is 24 ms over roughly 1 million cycles.

The test application (running on two different computers) works like this:

1. Establish a TCP connection (one instance listens, the other connects).

2. send() a small fixed amount of data from both sides.
3. recv() the small fixed amount of data.
4. send() 40 K of data from both sides.
5. recv() 40 K of data on both sides.
6. Perform a complex but useless computation (simulating the actual application load).
7. Calculate how long to nanosleep() until the end of the cycle, then call nanosleep().

8. Go to step 2.

------------------
Every operation is guarded with clock_gettime() calls, and the durations of
send(), recv(), evaluate() and nanosleep() are printed out.

I will attach the test application sources to this ticket.
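For illustration only, a stripped-down sketch of one measured cycle might look
roughly like this (hypothetical names, not the attached cycle_clock.c; it
assumes sock is an already connected TCP socket to the peer, and the printed
line format only approximates the real output):

/*
 * One measured cycle: small sync exchange, 40 K exchange, dummy load,
 * then nanosleep() until the end of the 600 ms cycle.  Every step is
 * timed with clock_gettime().
 */
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CYCLE_NS  600000000L            /* 600 ms cycle budget */
#define BULK_SIZE 40960                 /* ~40 K payload */

static long ms_between(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000 +
           (b->tv_nsec - a->tv_nsec) / 1000000;
}

static void one_cycle(int sock)
{
    static char buf[BULK_SIZE];
    char sync[16] = "SYNC";
    struct timespec t0, t1, t2, t3, t4;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* steps 2-3: small fixed-size sync exchange */
    send(sock, sync, sizeof(sync), 0);
    recv(sock, sync, sizeof(sync), MSG_WAITALL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* steps 4-5: 40 K in both directions */
    send(sock, buf, sizeof(buf), 0);
    recv(sock, buf, sizeof(buf), MSG_WAITALL);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    /* step 6: useless computation simulating the real load */
    volatile double x = 0;
    for (long i = 0; i < 1000000; i++)
        x += i * 0.5;
    clock_gettime(CLOCK_MONOTONIC, &t3);

    /* step 7: sleep for whatever is left of the 600 ms budget */
    long used_ns = (t3.tv_sec - t0.tv_sec) * 1000000000L +
                   (t3.tv_nsec - t0.tv_nsec);
    struct timespec rest = { 0, used_ns < CYCLE_NS ? CYCLE_NS - used_ns : 0 };
    nanosleep(&rest, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t4);

    printf("times send_sync %ld recv_bulk %ld evaluate %ld nanosleep %ld\n",
           ms_between(&t0, &t1), ms_between(&t1, &t2),
           ms_between(&t2, &t3), ms_between(&t3, &t4));
}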

For FreeBSD 6.3-RELEASE, the output looks like this, running between two
identical computers with FreeBSD 6.3 installed (only selected columns are shown):

fspa2# grep times fspa2_fspa2_vpu.txt | awk '{print $3 " " $4 " " $6 " " $8 " "
$10}' | sort | uniq -c
1115322 send_sync 0  0 0 0
7425    send_sync 0  0 0 1
73629   send_sync 0  1 0 0
   1    send_sync 0 13 0 0
  66    send_sync 0  2 0 0
   1    send_sync 0 24 0 0
  27    send_sync 0  3 0 0
  17    send_sync 0  4 0 0

As you can see, the maximum delay is 24 milliseconds (0.024 s), and it happens
only once.

So now the real problem: using FreeBSD 10.3-RELEASE and much more powerful
hardware (a Moxa DA-820 industrial computer):

CPU: Intel(R) Core(TM) i7-3555LE CPU @ 2.50GHz (2494.39-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
 
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 
Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100000<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics

em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6600-0x661f mem
0xc0a00000-0xc0a1ffff,0xc0a27000-0xc0a27fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:90:e8:69:ea:3c


root at fspa2:~/clock/new_res # uname -a
FreeBSD fspa2 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 03:51:29
UTC 2016     root at releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  i386

and with the same test application we get the following (running between two
identical devices, both Moxa DA-820 with FreeBSD 10.3):

root at fspa2:~/clock/new_res # grep times direct_1.txt | awk '{print $3 " " $4 " " $6 " " $8 " " $10 ;}' | sort | uniq -c
155042 send_sync 0 0 0 0
122890 send_sync 0 0 0 1
   1 send_sync 0 0 0 100
   1 send_sync 0 0 0 102
   1 send_sync 0 0 0 111
   1 send_sync 0 0 0 12
   1 send_sync 0 0 0 125
   2 send_sync 0 0 0 13
   1 send_sync 0 0 0 130
   1 send_sync 0 0 0 131
   1 send_sync 0 0 0 133
   2 send_sync 0 0 0 136
   1 send_sync 0 0 0 148
   2 send_sync 0 0 0 149
   3 send_sync 0 0 0 150
   1 send_sync 0 0 0 156
   1 send_sync 0 0 0 159
   1 send_sync 0 0 0 16
   1 send_sync 0 0 0 161
   1 send_sync 0 0 0 17
   1 send_sync 0 0 0 176
   1 send_sync 0 0 0 19
  18 send_sync 0 0 0 2
   1 send_sync 0 0 0 229
   1 send_sync 0 0 0 23
   1 send_sync 0 0 0 24
   2 send_sync 0 0 0 25
   1 send_sync 0 0 0 26
   1 send_sync 0 0 0 28
   1 send_sync 0 0 0 282
   1 send_sync 0 0 0 29
  14 send_sync 0 0 0 3
   1 send_sync 0 0 0 30
   1 send_sync 0 0 0 31
   1 send_sync 0 0 0 32
   1 send_sync 0 0 0 37
   1 send_sync 0 0 0 38
  14 send_sync 0 0 0 4
   1 send_sync 0 0 0 40
   1 send_sync 0 0 0 41
   1 send_sync 0 0 0 43
   1 send_sync 0 0 0 45
   2 send_sync 0 0 0 46
   4 send_sync 0 0 0 49
  16 send_sync 0 0 0 5
   2 send_sync 0 0 0 53
   2 send_sync 0 0 0 57
   4 send_sync 0 0 0 58
  14 send_sync 0 0 0 59
  17 send_sync 0 0 0 6
  20 send_sync 0 0 0 60
  16 send_sync 0 0 0 61
   8 send_sync 0 0 0 62
   4 send_sync 0 0 0 63
   1 send_sync 0 0 0 64
   1 send_sync 0 0 0 67
   1 send_sync 0 0 0 68
   9 send_sync 0 0 0 7
   1 send_sync 0 0 0 70
   1 send_sync 0 0 0 72
   1 send_sync 0 0 0 79
   5 send_sync 0 0 0 8
   3 send_sync 0 0 0 80
   1 send_sync 0 0 0 81
   1 send_sync 0 0 0 82
   1 send_sync 0 0 0 84
   1 send_sync 0 0 0 89
   3 send_sync 0 0 0 9
   1 send_sync 0 0 0 90
   1 send_sync 0 0 0 93
   1 send_sync 0 0 0 95
   1 send_sync 0 0 0 97
 147 send_sync 0 1 0 0
   1 send_sync 0 33 0 0

As you can see, with only ~300,000 cycles (vs. ~1,100,000 cycles in the old
configuration) there are multiple cases of long and very long delays, including
one delay of 229 ms.

The only difference from the standard OS configuration is

sysctl kern.timecounter.alloweddeviation=0

because without it, nanosleep() returns with a random error of up to 4%.
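
For reference, that error can be observed with a small standalone program like
the one below (my own illustration, not part of the attached test): it requests
a 100 ms sleep in a loop and prints how far each nanosleep() deviates from the
requested interval.

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec req = { 0, 100000000L };    /* request 100 ms */
    struct timespec t0, t1;

    for (int i = 0; i < 10; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        nanosleep(&req, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* actual sleep time in microseconds */
        long actual_us = (t1.tv_sec - t0.tv_sec) * 1000000L +
                         (t1.tv_nsec - t0.tv_nsec) / 1000;
        printf("requested 100000 us, slept %ld us (%+.2f%% deviation)\n",
               actual_us, (actual_us - 100000) / 1000.0);
    }
    return 0;
}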

============================================
I have tried everything I can think of:

1. Tweaking the em0 sysctls - does not help. In polling mode, it works much worse.
2. Replacing the Intel em0 devices with a Realtek re0 device - slightly better,
but it still produces significant delays.
3. Tweaking kernel sysctls - disabling newer RFC compatibility options - does not help.
4. Replacing the cables and swapping the switch for a different model straight
out of the box - does not help.


If anybody has any ideas on how to fix this, please comment here.

-- 
You are receiving this mail because:
You are the assignee for the bug.

