TCP options order changed in FreeBSD 7, incompatible with some routers

d.s. al coda coda.trigger at gmail.com
Tue Mar 11 18:23:32 PDT 2008


Hi,
We recently upgraded one of our webservers to FreeBSD 7, and we started
receiving complaints from some users not able to connect to that server
anymore. On top of that, users were saying that the problem only occurred on
Windows (at least, the ones who had more than one OS to try it out).

After managing to get a user who had the problem running windump, running
tcpdump on the new server, and comparing that to the windump & tcpdump
output for a "control" user (me) that could connect, we managed to figure
out the following:
- For the user with this problem, ping works fine, but all TCP connections
to the server fail.
- The user, trying to connect, sends out a SYN packet, receives no response,
and retries a few times until timing out.
- The server sees a bunch of SYN packets and responds with SYN-ACK each
time.
- The issue only seems to arise if the sender has RFC1323 disabled.

So, the SYN-ACK is getting lost somewhere.

- For the control user (who can connect via TCP just fine), we set the TCP
window size and RFC1323 options the same as the user with the problem.
- The control user sees the SYN-ACK packet.
- We send a connection attempt to one of our other servers, running FreeBSD
5.5, and one to the server running FreeBSD 7.
- There is only one notable difference between the responses: the order of
the options.
- FreeBSD 5.5 has <mss 1412, nop, nop, sackOK>
- FreeBSD 7 has <mss 1412, sackOK, eol> (there is of course an aligning nop
after the eol, which tcpdump skips)
- These options don't appear in this exact configuration when using RFC1323
options.
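
For reference, here is a rough sketch of how the two option blocks look on
the wire. It only follows the tcpdump output described above (including the
trailing nop after the eol on FreeBSD 7); the function names are made up for
illustration and are not taken from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP option kinds and lengths (RFC 793, RFC 2018). */
enum {
    OPT_EOL = 0,
    OPT_NOP = 1,
    OPT_MSS = 2,
    OPT_SACK_PERMITTED = 4,
    OPT_MSS_LEN = 4,
    OPT_SACK_PERMITTED_LEN = 2
};

/* Write an MSS option (kind, length, 16-bit value in network order). */
static size_t put_mss(uint8_t *buf, uint16_t mss)
{
    buf[0] = OPT_MSS;
    buf[1] = OPT_MSS_LEN;
    buf[2] = (uint8_t)(mss >> 8);
    buf[3] = (uint8_t)(mss & 0xff);
    return OPT_MSS_LEN;
}

/* Hypothetical helper: <mss, nop, nop, sackOK>, as seen from FreeBSD 5.5. */
size_t build_options_55(uint8_t buf[8], uint16_t mss)
{
    size_t n = put_mss(buf, mss);
    buf[n++] = OPT_NOP;
    buf[n++] = OPT_NOP;
    buf[n++] = OPT_SACK_PERMITTED;
    buf[n++] = OPT_SACK_PERMITTED_LEN;
    return n;
}

/* Hypothetical helper: <mss, sackOK, eol> plus one padding nop, as seen
 * from FreeBSD 7. */
size_t build_options_7(uint8_t buf[8], uint16_t mss)
{
    size_t n = put_mss(buf, mss);
    buf[n++] = OPT_SACK_PERMITTED;
    buf[n++] = OPT_SACK_PERMITTED_LEN;
    buf[n++] = OPT_EOL;
    buf[n++] = OPT_NOP;
    return n;
}
```

With an MSS of 1412 (0x0584), the first helper fills the buffer with
0204 0584 0101 0402 and the second with 0204 0584 0402 0001. Both are 8
bytes long; only the placement of the SACK permitted option and the padding
differ.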

My hunch is that the users with the problem have a router that erroneously
thinks that these options are invalid, or thinks that some part of the byte
sequence (e.g. 0204 05b4 0101 0402) is an attack.

Just to try it out, I patched tcp_output.c so that the SACK permitted option
always ends on a 4-byte boundary, preventing the "sackOK, eol" pattern from
ever occurring.  Looking through previous versions, I found where the TCP
option code had changed, and there used to be a comment about putting SACK
permitted last, but I can't tell if it's relevant.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.125;r2=1.126

The one-line patch to tcp_output.c is attached.
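
To show what the changed loop condition does, here is a small standalone
sketch that counts how many nops each version of the loop emits before the
SACK permitted option, given the number of option bytes already written. The
two loop conditions are copied from the patch; the function names and the
idea of testing them outside the kernel are mine, so treat it as an
illustration only.

```c
#define TCPOLEN_NOP 1

/* Nops emitted before SACK permitted by the stock loop:
 *     while (optlen % 2)
 * This only guarantees that the option starts at an even offset. */
unsigned nops_stock(unsigned optlen)
{
    unsigned nops = 0;
    while (optlen % 2) {
        optlen += TCPOLEN_NOP;
        nops++;
    }
    return nops;
}

/* Nops emitted before SACK permitted by the patched loop:
 *     while (!optlen || optlen % 4 != 2)
 * The 2-byte option then starts at an offset of 2 mod 4, so it ends on a
 * 4-byte boundary. */
unsigned nops_patched(unsigned optlen)
{
    unsigned nops = 0;
    while (!optlen || optlen % 4 != 2) {
        optlen += TCPOLEN_NOP;
        nops++;
    }
    return nops;
}
```

With only an MSS option written so far (optlen of 4), the stock loop adds no
padding, which gives <mss, sackOK> followed by the eol, while the patched
loop adds two nops, which gives <mss, nop, nop, sackOK> as on FreeBSD 5.5.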

Sure enough, it fixed the problem. Afterwards, we collected some information
about the routers the users who had the problem were using, and while they
didn't all have the same manufacturer, several mentioned that their router
had a built-in firewall, which, when they disabled it, allowed them to
access the server.

Does all of this sound reasonable? And if so, would it be worth submitting
this patch? I don't know if this particular change in options order was
intentional, or just a side-effect of the new code, but it certainly works
around an extremely hard-to-diagnose problem.

-coda
-------------- next part --------------
--- tcp_output_orig.c   2008-03-03 00:13:06.000000000 -0500
+++ tcp_output_new.c    2008-03-03 00:16:06.000000000 -0500
@@ -1304,7 +1304,7 @@
                        *optp++ = to->to_wscale;
                        break;
                case TOF_SACKPERM:
-                       while (optlen % 2) {
+                       while (!optlen || optlen % 4 != 2) {
                                optlen += TCPOLEN_NOP;
                                *optp++ = TCPOPT_NOP;
                        }

