Network anomalies after update from 11.2 STABLE to 12.1 STABLE
Paul
devgs at ukr.net
Fri Oct 18 12:57:22 UTC 2019
Our current version is:
FreeBSD 11.2-STABLE #0 r340725
New version that we have problems with:
FreeBSD 12.1-STABLE #5 r352893
After updating to the new version, we started to observe a very large number of
errors in HTTP requests between various services in our system. The problem
appeared on every server that was upgraded and does not seem to be specific to a
particular network card: we use several different models, and all of them are affected.
During testing, we observed many spontaneous TCP connection aborts, including
during connection establishment (SYN), in scenarios that were completely problem-free
on 11.2-STABLE. Concrete test cases are shown below.
We also want to highlight that, on numerous occasions, we observed random, huge
acknowledgment numbers in the first response to a SYN packet (the SYN|ACK), instead
of the expected relative ACK of 1. This forces the client to abort the connection with an RST.
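For reference, the check the client performs on that SYN|ACK can be sketched as follows. This is a minimal Python illustration of the sequence arithmetic described in RFC 793 (SYN-SENT state processing), not FreeBSD code, and the function names are hypothetical. Because the SYN consumes one sequence number, the only acceptable acknowledgment is the client's ISN plus one (modulo 2^32), which packet analyzers such as Wireshark display as a relative ACK of 1; any other value, such as the huge ACKs we see, makes the client answer with RST.

```python
# Sequence numbers wrap around at 2**32 (RFC 793 modular arithmetic).
SEQ_MOD = 2 ** 32


def expected_synack_ack(client_isn: int) -> int:
    """The SYN consumes one sequence number, so the SYN|ACK must ack ISN + 1."""
    return (client_isn + 1) % SEQ_MOD


def relative_ack(client_isn: int, ack: int) -> int:
    """Ack number relative to the client's ISN, as packet analyzers display it."""
    return (ack - client_isn) % SEQ_MOD


def synack_is_acceptable(client_isn: int, ack: int) -> bool:
    """In SYN-SENT, SND.NXT == ISS + 1, so only that exact ACK is acceptable;
    anything else is answered with RST."""
    return ack == expected_synack_ack(client_isn)


print(relative_ack(1000, 1001))                # 1
print(synack_is_acceptable(1000, 1001))        # True
print(synack_is_acceptable(1000, 0xDEADBEEF))  # False: client sends RST
print(expected_synack_ack(2 ** 32 - 1))        # 0 (wrap-around)
```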
At first glance this looks like a race in the kernel, because the problem disappears when:
* we set `dev.ixl.0.iflib.override_nrxqs=1` and `dev.ixl.0.iflib.override_ntxqs=1` (one RX and one TX queue), or
* we use `dev.ixl.0.iflib.override_nrxqs=0` and `dev.ixl.0.iflib.override_ntxqs=0`, but do not issue concurrent TCP streams.
Here are some of the debug log messages emitted by 12.1-STABLE:
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16304 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16326 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16402 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16652 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:16686 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18562 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:18918 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19331 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19340 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19489 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; tcp_do_segment: Timestamp missing, no action
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:01 test kernel: TCP: [10.10.10.39]:19580 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Spurious RST without matching syncache entry (possibly syncookie only), segment ignored
Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80; syncache_timer: Response timeout, retransmitting (1) SYN|ACK
Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80; syncache_timer: Response timeout, retransmitting (1) SYN|ACK
Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:18066 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted by remote endpoint
Oct 18 14:59:02 test kernel: TCP: [10.10.10.39]:17705 to [10.10.10.92]:80 tcpflags 0x4<RST>; syncache_chkrst: Our SYN|ACK was rejected, connection attempt aborted by remote endpoint
Here, 10.10.10.92 runs 12.1-STABLE, while 10.10.10.39 is a client that runs 11.2-STABLE.
Our test case uses nginx and wrk with a minimal configuration in which nginx always returns
a 404 error page. nginx runs on the 12.1-STABLE machine, and wrk runs on the 11.2-STABLE machine.
We run wrk like so:
wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing
and often see errors like these:
Socket errors: connect 12, read 4, write 4, timeout 0
If we reverse the test by swapping the two servers, i.e. the 12.1-STABLE machine becomes the
client and issues requests via wrk, we see no problems at all. The same is true between two
11.2-STABLE machines.
It seems that the issue appears only when the same local port is used for multiple connections
on 12.1-STABLE. In our setup this happens only when 12.1-STABLE is the server and accepts
connections on a single port, 80 in our case. To confirm this, we ran another test. We
configured nginx to listen on 10 different ports, 80 through 89, and then launched 10
separate wrk processes, each using only one concurrent connection. This gives only 10
concurrent TCP streams, each with its own unique port on the 12.1-STABLE side:
for I in {0..9}; do wrk -c 1 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:8${I}/missing & done; wait
Socket errors stopped appearing. We ran this test many times, and the errors never appeared.
However, whenever we repeat the previous test using a single port:
wrk -c 10 --header "Connection: close" -d 10 -t 1 --latency http://10.10.10.92:80/missing
the errors consistently reappear:
Socket errors: connect 8, read 14, write 9, timeout 0
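The difference between the two tests can be modeled in a few lines. TCP demultiplexes connections by the full 4-tuple, so in the single-port test ten concurrent connections share local port 80 on 12.1-STABLE, while in the multi-port test each local port carries exactly one connection. The Python sketch below is a simplified model for illustration only; the client ports are made up and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class FourTuple(NamedTuple):
    """A TCP connection is identified by its full 4-tuple, not by one port."""
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int


def max_conns_per_server_port(conns: Iterable[FourTuple]) -> int:
    """Largest number of connections that share a single server-side port."""
    counts = Counter(c.server_port for c in conns)
    return max(counts.values(), default=0)


CLIENT, SERVER = "10.10.10.39", "10.10.10.92"

# Single-port test (wrk -c 10 against port 80): 10 concurrent connections,
# all sharing local port 80 on the server. Client ports are hypothetical.
single_port = [FourTuple(CLIENT, 40000 + i, SERVER, 80) for i in range(10)]

# Multi-port test (10 x wrk -c 1 against ports 80..89): one connection per port.
multi_port = [FourTuple(CLIENT, 41000 + i, SERVER, 80 + i) for i in range(10)]

print(max_conns_per_server_port(single_port))  # 10
print(max_conns_per_server_port(multi_port))   # 1
print(len(set(single_port)))                   # 10: all 4-tuples are distinct
```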
We've tested different drivers with the same outcome:
em driver:
em0 at pci0:10:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
ixl driver:
ixl0 at pci0:4:0:0: class=0x020000 card=0x00078086 chip=0x15728086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
device = 'Ethernet Controller X710 for 10GbE SFP+'
We also tried the driver from ports (/usr/ports/net/intel-ixl-kmod), ixl-1.11.9, with the same result.
Any help with this matter would be greatly appreciated.
Best regards,
-Paul