TCP options order changed in FreeBSD 7, incompatible with some routers

Andre Oppermann andre at freebsd.org
Wed Mar 12 16:00:58 PDT 2008


d.s. al coda wrote:
> Hi,
> We recently upgraded one of our webservers to FreeBSD 7, and we started
> receiving complaints from some users not able to connect to that server
> anymore. On top of that, users were saying that the problem only occurred on
> Windows (at least, the ones who had more than one OS to try it out).
> 
> After managing to get a user who had the problem running windump, running
> tcpdump on the new server, and comparing that to the windump & tcpdump
> output for a "control" user (me) that could connect, we managed to figure
> out the following:
> - For the user with this problem, ping works fine, but all TCP connections
> to the server fail.
> - The user, trying to connect, sends out a SYN packet, receives no response,
> and retries a few times until timing out.
> - The server sees a bunch of SYN packets and responds with SYN-ACK each
> time.
> - The issue only seems to arise if the sender has RFC1323 disabled.
> 
> So, the SYN-ACK is getting lost somewhere.
> 
> - For the control user (who can connect via TCP just fine), we set the TCP
> window size and RFC1323 options the same as the user with the problem.
> - The control user sees the SYN-ACK packet.
> - We send a connection attempt to one of our other servers, running FreeBSD
> 5.5, and one to the server running FreeBSD 7.
> - There is only one notable difference between the responses: the order of
> the options.
> - FreeBSD 5.5 has <mss 1412, nop, nop, sackOK>
> - FreeBSD 7 has <mss 1412, sackOK, eol> (there is of course an aligning nop
> after the eol, which tcpdump skips)
> - These options don't appear in this exact configuration when using RFC1323
> options.
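
For reference, the two layouts listed above can be spelled out byte by byte. This is only a minimal Python sketch, assuming the standard option kinds (EOL=0, NOP=1, MSS=2, SACK-permitted=4) and taking the nop after the eol from the description above. The helper name encode_options is made up for illustration and is not FreeBSD code.

```python
# TCP option kinds (RFC 793 and RFC 2018).
EOL, NOP, MSS, SACK_PERMITTED = 0, 1, 2, 4

def encode_options(opts):
    """Encode a list of option tuples into raw TCP option bytes.

    Each entry is ("mss", value), ("sackOK",), ("nop",) or ("eol",).
    Hypothetical helper for illustration only.
    """
    out = bytearray()
    for opt in opts:
        name = opt[0]
        if name == "mss":
            # kind, length 4, then the 16-bit MSS in network byte order
            out += bytes([MSS, 4]) + opt[1].to_bytes(2, "big")
        elif name == "sackOK":
            # kind, length 2, no payload
            out += bytes([SACK_PERMITTED, 2])
        elif name == "nop":
            out.append(NOP)
        elif name == "eol":
            out.append(EOL)
        else:
            raise ValueError(f"unknown option {name!r}")
    return bytes(out)

# Layouts as reported by tcpdump in the message above.
freebsd55 = encode_options([("mss", 1412), ("nop",), ("nop",), ("sackOK",)])
freebsd7 = encode_options([("mss", 1412), ("sackOK",), ("eol",), ("nop",)])

print(freebsd55.hex(" ", 2))
print(freebsd7.hex(" ", 2))
```

Both encodings are 8 bytes long, so the difference a middlebox sees is purely the order of the options and the trailing eol/nop pair.
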
> 
> I have a hunch that the users with the problem have a router that erroneously
> thinks that these options are invalid, or thinks that some part of the byte
> sequence (e.g. 0204 05b4 0101 0402) is an attack.
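
To make byte sequences like the one quoted above easier to compare against tcpdump output, here is a small Python sketch of an option decoder. It assumes only the standard kind/length encoding and knows just the MSS and SACK-permitted options; decode_options is a hypothetical helper, not part of any existing tool.

```python
def decode_options(raw):
    """Walk raw TCP option bytes and return readable option names."""
    out, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:            # end of option list: stop parsing
            out.append("eol")
            break
        if kind == 1:            # single-byte no-operation
            out.append("nop")
            i += 1
            continue
        # Every other option carries a length byte covering kind+length+data.
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        body = raw[i + 2:i + length]
        if kind == 2 and length == 4:
            out.append(f"mss {int.from_bytes(body, 'big')}")
        elif kind == 4 and length == 2:
            out.append("sackOK")
        else:
            out.append(f"kind {kind}")
        i += length
    return out

print(decode_options(bytes.fromhex("0204 05b4 0101 0402")))
```

Note that the example sequence decodes to mss 1460, nop, nop, sackOK, i.e. the nop-padded layout with an MSS of 1460.
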
> 
> Just to try it out, I patched tcp_output.c so that the SACK permitted option
> was aligned on a 4-byte boundary, preventing the "sackOK, eol" pattern from
> ever occurring.  Looking through previous versions, I found where the tcp
> option code had changed, and there used to be a comment about putting SACK
> permitted last, but I can't tell if it's relevant.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.125;r2=1.126
> 
> The one-line patch to tcp_output.c is attached.
> 
> Sure enough, it fixed the problem. Afterwards, we collected some information
> about the routers the users who had the problem were using, and while they
> didn't all have the same manufacturer, several mentioned that their router
> had a built-in firewall, which, when they disabled it, allowed them to
> access the server.

I'd be very interested to know the exact models and firmware versions of the
affected routers.  If one is available locally I'd like to obtain a similar
model myself for future regression tests.

> Does all of this sound reasonable? And if so, would it be worth submitting
> this patch? I don't know if this particular change in options order was
> intentional, or just a side-effect of the new code, but it certainly works
> around an extremely hard-to-diagnose problem.

We've already fixed two issues.  The first changes the order of the TCP options
and is in this change:

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_var.h.diff?r1=1.160;r2=1.161

It solves a problem observed by ISC that sounds very much like what you
describe.  The change fixed the issue in their case.

The second changes the alignment padding from NOP to 0x00.  Whether this was
a contributing factor to the reported problem is not clear.  There hasn't (yet)
been any specific test case for it.  It was fixed because the RFC specifies 0x00
to be used for padding and nothing else.

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_output.c.diff?r1=1.145;r2=1.146
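
The effect of the padding change can be illustrated with a short Python sketch. It is not the tcp_output.c code: pad_options is a hypothetical helper, and the "old" behaviour (an EOL followed by NOP filler) is assumed from the tcpdump description quoted earlier in this thread.

```python
EOL, NOP = 0, 1

def pad_options(raw, filler):
    """Terminate the option list with EOL and pad to a 32-bit boundary.

    `filler` is the byte used after the EOL: NOP (0x01) for the assumed
    old behaviour, 0x00 for the behaviour the RFC asks for.
    """
    if len(raw) % 4 == 0:
        return bytes(raw)        # already aligned, nothing to add
    padded = bytearray(raw)
    padded.append(EOL)
    while len(padded) % 4:
        padded.append(filler)
    return bytes(padded)

opts = bytes.fromhex("0204 0584 0402")   # mss 1412, sackOK (6 bytes)
old = pad_options(opts, NOP)
new = pad_options(opts, 0x00)

print(old.hex(" ", 2))
print(new.hex(" ", 2))
```

Only the last byte differs between the two results, which is why a separate test of this patch is needed to tell whether a middlebox reacts to the filler byte or to the option order.
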

It would be very helpful if you could apply these two patches one after the
other to your 7.0 test server and find out, together with the affected user(s),
which of them fixes the issue.  If you can, please test each one with and
without the router's firewall enabled.  It would be interesting to know whether
the NAT or the firewalling part of the router chokes on it.

Your help is very much appreciated.  I try to document all strange TCP
occurrences so we can incorporate them into a regression test suite later on.
The more information we have, the better it'll become.

-- 
Andre



More information about the freebsd-net mailing list