Issues with TCP Timestamps allocation

Tue Jul 9 15:30:05 UTC 2019


> On 9. Jul 2019, at 14:58, Paul <devgs at ukr.net> wrote:
> 
> Hi Michael,
> 
> 9 July 2019, 15:34:29, by "Michael Tuexen" <tuexen at freebsd.org>:
> 
>> 
>> 
>>> On 8. Jul 2019, at 17:22, Paul <devgs at ukr.net> wrote:
>>> 
>>> 
>>> 
>>> 8 July 2019, 17:12:21, by "Michael Tuexen" <tuexen at freebsd.org>:
>>> 
>>>>> On 8. Jul 2019, at 15:24, Paul <devgs at ukr.net> wrote:
>>>>> 
>>>>> Hi Michael,
>>>>> 
>>>>> 8 July 2019, 15:53:15, by "Michael Tuexen" <tuexen at freebsd.org>:
>>>>> 
>>>>>>> On 8. Jul 2019, at 12:37, Paul <devgs at ukr.net> wrote:
>>>>>>> 
>>>>>>> Hi team,
>>>>>>> 
>>>>>>> We recently upgraded to 12 Stable. Immediately afterwards, we started
>>>>>>> seeing strange connection establishment timeouts to a fixed set
>>>>>>> of external (world) hosts. The issue was persistent and easy to reproduce.
>>>>>>> Thanks to the patience and dedication of our system engineer, we have
>>>>>>> tracked this issue down to a specific commit:
>>>>>>> 
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=338053
>>>>>>> 
>>>>>>> This patch was also back-ported into 11 Stable:
>>>>>>> 
>>>>>>> https://svnweb.freebsd.org/base?view=revision&revision=348435
>>>>>>> 
>>>>>>> Among other things, this patch changes the timestamp allocation strategy
>>>>>>> by introducing deterministic randomness via a hash function that takes
>>>>>>> into account a random key as well as source address, source port, dest
>>>>>>> address and dest port. As a result, timestamp offsets of different
>>>>>>> tuples (SA,SP,DA,DP) will be wildly different and will jump from small
>>>>>>> to large numbers and back whenever something in the tuple changes.
>>>>>> Hi Paul,
>>>>>> 
>>>>>> this is correct.
>>>>>> 
>>>>>> Please note that the same happens with the old method, if two hosts with
>>>>>> different uptimes are behind a consumer-grade NAT.
>>>>> 
>>>>> If NAT does not replace timestamps then yes, it should be the case.
>>>>> 
>>>>>>> 
>>>>>>> After performing various tests against hosts that produce the above-mentioned
>>>>>>> issue, we came to the conclusion that there are some interesting implementations
>>>>>>> that drop SYN packets with timestamps smaller than the largest timestamp
>>>>>>> value from streams of all recent or current connections from a specific
>>>>>>> address. This looks like some kind of SYN flood protection.
>>>>>> This also breaks multiple hosts with different uptimes behind a consumer
>>>>>> level NAT talking to such a server.
>>>>>>> 
>>>>>>> To ensure that each external host does not see wild jumps of
>>>>>>> timestamp values, I propose a patch that removes ports from the equation
>>>>>>> altogether when calculating the timestamp offset:
>>>>>>> 
>>>>>>> Index: sys/netinet/tcp_subr.c
>>>>>>> ===================================================================
>>>>>>> --- sys/netinet/tcp_subr.c	(revision 348435)
>>>>>>> +++ sys/netinet/tcp_subr.c	(working copy)
>>>>>>> @@ -2224,7 +2224,22 @@
>>>>>>> uint32_t
>>>>>>> tcp_new_ts_offset(struct in_conninfo *inc)
>>>>>>> {
>>>>>>> -	return (tcp_keyed_hash(inc, V_ts_offset_secret));
>>>>>>> +        /* 
>>>>>>> +         * Some implementations show strange behaviour when wildly random
>>>>>>> +         * timestamps are allocated for different streams. It seems that only the
>>>>>>> +         * SYN packets are affected. Observed implementations drop SYN packets
>>>>>>> +         * with timestamps smaller than the largest timestamp value of all
>>>>>>> +         * recent or current connections from a specific address. To mitigate
>>>>>>> +         * this we are going to ensure that each host will always observe 
>>>>>>> +         * timestamps as increasing no matter the stream: by dropping ports
>>>>>>> +         * from the equation.
>>>>>>> +         */ 
>>>>>>> +        struct in_conninfo inc_copy = *inc;
>>>>>>> +
>>>>>>> +        inc_copy.inc_fport = 0;
>>>>>>> +        inc_copy.inc_lport = 0;
>>>>>>> +
>>>>>>> +	return (tcp_keyed_hash(&inc_copy, V_ts_offset_secret));
>>>>>>> }
>>>>>>> 
>>>>>>> /*
>>>>>>> 
>>>>>>> In any case, the fix for the uptime leak implemented in rev338053 is
>>>>>>> not going to suffer, because a supposed attacker is already able to use
>>>>>>> any fixed values of SP and DP (albeit not 0) to take them out
>>>>>>> of the equation.
>>>>>> Can you describe how a peer can compute the uptime from two observed timestamps?
>>>>>> I don't see how you can do that...
>>>>> 
>>>>> A supposed attacker could run a script that continuously monitors timestamps,
>>>>> for example via a periodic TCP connection from a fixed local port (e.g. 12345)
>>>>> and a fixed local address to the victim's fixed address and port (e.g. 80).
>>>>> Whenever a large discrepancy is observed, the attacker can assume that a reboot
>>>>> has happened (due to V_ts_offset_secret re-generation), hence the received
>>>>> timestamp is considered an approximate point of reboot from which the uptime
>>>>> can be calculated, until the next reboot and so on.
>>>> Ahh, I see. The patch we are talking about is not intended to protect against
>>>> continuous monitoring, which is something you can always do. You could even
>>>> watch for service availability and detect reboots. A change of the local key
>>>> would also look similar to a reboot without a temporary loss of connectivity.
>>>> 
>>>> Thanks for the clarification.
>>>>> 
>>>>>>> 
>>>>>>> Here is the list of example hosts with which we were able to reproduce
>>>>>>> the issue:
>>>>>>> 
>>>>>>> curl -v http://88.99.60.171:80
>>>>>>> curl -v http://163.172.71.252:80
>>>>>>> curl -v http://5.9.242.150:80
>>>>>>> curl -v https://185.134.205.105:443
>>>>>>> curl -v https://136.243.1.231:443
>>>>>>> curl -v https://144.76.196.4:443
>>>>>>> curl -v http://94.127.191.194:80
>>>>>>> 
>>>>>>> To reproduce, call curl repeatedly with the same URL a number of times.
>>>>>>> You are going to see some of the requests stuck in
>>>>>>> `*    Trying XXX.XXX.XXX.XXX...`
>>>>>>> 
>>>>>>> For some reason, the easiest way to reproduce the issue is with nc:
>>>>>>> 
>>>>>>> $ echo "foooooo" | nc -v 88.99.60.171 80
>>>>>>> 
>>>>>>> Only a few such calls are required until one of them is stuck on connect():
>>>>>>> issuing SYN packets with an exponential backoff.
>>>>>> Thanks for providing an end-point to test with. I'll take a look.
>>>>>> Just to be clear: You are running a FreeBSD client against one of the above
>>>>>> servers and experience the problem with the new timestamp computations.
>>>>>> 
>>>>>> You are not running arbitrary clients against a FreeBSD server...
>>>>> 
>>>>> We are talking about FreeBSD being the client. The peers that yield this
>>>>> unwanted behaviour are unknown. A little bit of tinkering showed that some
>>>>> of them run Debian:
>>>>> 
>>>>> telnet 88.99.60.171 22
>>>>> Trying 88.99.60.171...
>>>>> Connected to 88.99.60.171.
>>>>> Escape character is '^]'.
>>>>> SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3
>>>> Also, some are hosted by Hetzner, but not all. I'll look into
>>>> this tomorrow, since I'm on a deadline today (well, it is 2am tomorrow
>>>> morning, to be precise)...
>>> 
>>> Thanks a lot, I would appreciate that.
>> Hi Paul,
>> 
>> I have looked into this.
>> 
>> * The FreeBSD behaviour is the one which is specified in the last bullet item
>>  in https://tools.ietf.org/html/rfc7323#section-5.4
>>  It is also the one, which is RECOMMENDED in
>>  https://tools.ietf.org/html/rfc7323#section-7.1 
>> 
>> * My NAT box (a popular one in Germany) does NOT rewrite TCP timestamps.
>> 
>> This means that the hosts you are referring to have some sort of protection
>> which makes incorrect assumptions. It will also break multiple hosts behind
>> a NAT.
>> 
>> I can run
>> curl -v http://88.99.60.171:80
>> in a loop without any problems from a FreeBSD head system. I tested 1000
>> iterations or so. The TS.val is jumping up and down as expected.
>> I'm wondering why you are observing errors in this case, too.
>> 
>> However, doing something like
>> echo "foooooo" | nc -v 88.99.60.171 80
>> triggers the problem.
>> 
>> So I think there is some functionality (in a middlebox or running on the host),
>> which incorrectly assumes monotonic timestamps between multiple TCP connections
>> coming from the same IP address, but only in case of errors at the application layer.
> 
> Yeah, exactly, some hosts seem to enable this only in case of an error in HTTP
> communication (some smart proxy?). However, there are some that behave this way
> regardless of errors, for example these:
> 
> curl -v https://185.134.205.105:443
> curl -v https://136.243.1.231:443
Wireshark sees an Encrypted Alert in both cases. So I guess this is another indication
of "error at the application layer".
> 
>> 
>> Do you have any insight into whether the hosts you listed share something in
>> common? Some of them are hosted by Hetzner, but not all.
> 
> Nope. The whole set of endpoints that we have detected so far is pretty diverse,
> spanning many different geographic locations as well as different
> hosters.
OK. Thanks for the clarification.
> 
>> 
>> I think in general, it is the correct thing to include the port numbers in
>> the offset computation. We might add a sysctl variable to control the inclusion.
>> This would allow interworking with broken middleboxes.
> 
> Yeah, I completely agree that these rare cases should not dictate the implementation.
> But the ability to enable a work-around via sysctl would be greatly appreciated.
> Currently we are unable to roll out the upgrade across all servers because of this
> issue: even though it does not happen often, a lot of requests from our users
> get stuck or fail altogether. For example, the host 185.134.205.105 is a kind of
> social network that our proxy servers connect to in order to securely access
> content, such as images, on behalf of our users.
> 
>> 
>> Please note, this does not fix the case of multiple clients behind a NAT.
> 
> Yeah, that's true. Fortunately we don't use NAT.
> 
>> 
>> I'm also trying to figure out how and why Linux and Windows are handling this.
> 
> Thanks for bothering!
Will let you know what I figure out.

Best regards
Michael
> 
>> 
>> Best regards
>> Michael
>> 
>>> 
>>>> 
>>>> Best regards
>>>> Michael 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Best regards
>>>>>> Michael
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
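
For illustration, the difference between the two offset strategies discussed
in this thread can be sketched in C roughly as follows. This is only a minimal
sketch under stated assumptions, not the kernel code: struct conn_tuple, mix(),
ts_offset(), ts_val(), struct syn_filter and syn_filter_accept() are
hypothetical names, the FNV-1a mixing merely stands in for the real keyed hash
behind tcp_keyed_hash(), and the filter only models the peer behaviour guessed
at above (dropping a SYN whose TS.val is smaller than the largest one seen so
far from the same address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection 4-tuple (not struct in_conninfo). */
struct conn_tuple {
	uint32_t laddr;	/* local (source) IPv4 address */
	uint32_t faddr;	/* foreign (destination) IPv4 address */
	uint16_t lport;	/* local (source) port */
	uint16_t fport;	/* foreign (destination) port */
};

/* Toy FNV-1a mixing of the low nbytes of v; NOT the kernel's keyed hash. */
static uint32_t
mix(uint32_t h, uint32_t v, size_t nbytes)
{
	assert(nbytes <= 4);
	for (size_t i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return (h);
}

/*
 * Per-connection timestamp offset derived from a secret and the tuple.
 * With include_ports == 0 the ports are hashed as zero, which mirrors the
 * idea of the proposed patch: one offset per address pair.
 */
static uint32_t
ts_offset(uint32_t secret, const struct conn_tuple *t, int include_ports)
{
	uint32_t h = mix(2166136261u, secret, 4);

	h = mix(h, t->laddr, 4);
	h = mix(h, t->faddr, 4);
	h = mix(h, include_ports ? t->lport : 0, 2);
	h = mix(h, include_ports ? t->fport : 0, 2);
	return (h);
}

/* TS.val sent on the wire: local clock plus offset, modulo 2^32. */
static uint32_t
ts_val(uint32_t clock_ticks, uint32_t offset)
{
	return (clock_ticks + offset);
}

/* Assumed peer-side state: largest TS.val seen from one source address. */
struct syn_filter {
	int seen;
	uint32_t max_ts;
};

/* Returns 1 if the SYN is accepted, 0 if the modelled filter drops it. */
static int
syn_filter_accept(struct syn_filter *f, uint32_t tsval)
{
	/* Wrap-around-aware "less than" comparison of 32-bit timestamps. */
	if (f->seen && (int32_t)(tsval - f->max_ts) < 0)
		return (0);
	f->seen = 1;
	f->max_ts = tsval;
	return (1);
}
```

With include_ports set to 0, two connections between the same pair of
addresses share one offset, so their TS.val values simply follow the clock and
the modelled filter accepts both SYNs. With include_ports set to 1, the offsets
generally differ per connection, so a later SYN may carry a smaller TS.val and
be dropped by such a filter.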