TCP and syncache question
Andre Oppermann
andre at freebsd.org
Mon Nov 17 14:40:12 PST 2008
Andre Oppermann wrote:
> Hartmut Brandt wrote:
>> Rui Paulo wrote:
>>>
>>> On 15 Nov 2008, at 20:08, Hartmut Brandt wrote:
>>>
>>>> Hi,
>>>>
>>>> in tcp_syncache.c:syncache_expand() there is a test that the
>>>> acknowledgement number and the sequence number of an incoming ACK
>>>> segment are in the expected range. If they are not,
>>>> syncache_expand() returns 0 and tcp_input drops the segment and sets
>>>> a reset. So far so good. But syncache_expand() also deletes the
>>>> syncache entry, and so destroys the connection. I cannot see why it
>>>> does it. It seems to me that such a wrong segment should be
>>>> interpreted as to be from another connection and as such the segment
>>>> should be ignored (but a reset sent). When the correct ACK comes,
>>>> the connection could still be established. As it is now, the
>>>> establishment of incoming connections can seriously be disturbed by
>>>> someone sending fake ACK packets.
>>>>
>>>> The same test (for the ack number, not for the sequence number) is
>>>> also further down in tcp_input.c:tcp_do_segment() (just after the
>>>> header prediction stuff) and here the handling is correct: the goto
>>>> dropwithreset just sends a reset and drops the segment but leaves
>>>> the connection in the SYN-RECEIVED state. This test is probably
>>>> never reached now, because of syncache_expand(), though.
>
> Correct.
>
>>>> Maybe I fail to see something obvious, though...
>>>
>>>
>>> Well, if the RST is sent, why should we keep the syncache entry?
>>
>> Because this effectively destroys the connection in SYN-RECEIVED which
>> is wrong according to RFC793. On page 69 the handling of incoming
>> segments for connections in SYN-RECEIVED is described: first you check
>> the sequence number and, if it is wrong, you send an RST (unless the
>> RST bit is set in the incoming segment), but otherwise ignore the
>> segment.
> >
>> A segment with a bad sequence number in SYN-RECEIVED is either forged
>> or from an old connection. In both cases you don't want to destroy the
>> embryonic connection, because the correct ACK from the correct peer
>> may still arrive.
>
> I see your problem. Syncookies mitigate this problem (if not disabled) as
> the correct ACK will pass that test later even if the syncache entry went
> away before (which can also happen due to a generally high SYN load).
>
> RFC793 wants us to do the following:
>
> Page 69: Send back a challenge ACK with the correct parameters to help
> to re-synchronize the connection when
> !(RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND).
>
> Page 72: Send back a RST when !(SND.UNA =< SEG.ACK =< SND.NXT).
>
> At the moment we send the RST and delete the syncache entry for both cases.
> However we should send the ACK in the former, the RST in the latter, and
> and keep the syncache entry in either case.
>
> Fixing this requires some re-shuffling of the syncache_expand(). I'll
> post a version later today.
This is a bit more complicated because of interactions with tcp_input()
where syncache_expand() is called from.
The old code (as of December 2002) behaved slightly different. It would
not remove the syncache entry when (SND.UNA == SEG.ACK) but send a RST.
The (RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND) test wasn't done at
all. Instead a socket was opened whenever (SND.UNA == SEG.ACK) succeeded.
This gave way to the "LAND" DoS attack which was mostly fixed with a test
for (RCV.IRS < SEG.SEQ).
See the attached patch for fixed version of syncache_expand(). This patch
is untested though. My development machine is currently down. Harti, Rui
and Bjoern, please have a look at the patch and review it.
--
Andre
-------------- next part --------------
--- tcp_input.c.1.390 Mon Nov 17 21:33:25 2008
+++ tcp_input.c.1.390.mod Mon Nov 17 21:35:22 2008
@@ -642,10 +642,13 @@ findpcb:
if (!syncache_expand(&inc, &to, th, &so, m)) {
/*
* No syncache entry or ACK was not
- * for our SYN/ACK. Send a RST.
+ * for our SYN/ACK. Send a RST or
+ * an ACK for re-synchronization.
* NB: syncache did its own logging
* of the failure cause.
*/
+ if (so == 1)
+ goto dropunlock;
rstreason = BANDLIM_RST_OPENPORT;
goto dropwithreset;
}
--- tcp_syncache.c.1.160 Mon Nov 17 16:49:01 2008
+++ tcp_syncache.c.1.160.mod Mon Nov 17 23:35:39 2008
@@ -817,59 +817,47 @@ syncache_expand(struct in_conninfo *inc,
INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
KASSERT((th->th_flags & (TH_RST|TH_ACK|TH_SYN)) == TH_ACK,
("%s: can handle only ACK", __func__));
+ *lsop = NULL;
sc = syncache_lookup(inc, &sch); /* returns locked sch */
SCH_LOCK_ASSERT(sch);
if (sc == NULL) {
/*
* There is no syncache entry, so see if this ACK is
- * a returning syncookie. To do this, first:
- * A. See if this socket has had a syncache entry dropped in
- * the past. We don't want to accept a bogus syncookie
- * if we've never received a SYN.
- * B. check that the syncookie is valid. If it is, then
- * cobble up a fake syncache entry, and return.
+ * a returning syncookie. If the syncookie is valid,
+ * cobble up a fake syncache entry and create a socket.
+ *
+ * NB: Syncache head is locked for the syncookie access.
*/
if (!tcp_syncookies) {
- SCH_UNLOCK(sch);
if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
log(LOG_DEBUG, "%s; %s: Spurious ACK, "
"segment rejected (syncookies disabled)\n",
s, __func__);
- goto failed;
+ goto sendrst;
}
bzero(&scs, sizeof(scs));
sc = syncookie_lookup(inc, sch, &scs, to, th, *lsop);
- SCH_UNLOCK(sch);
if (sc == NULL) {
if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
log(LOG_DEBUG, "%s; %s: Segment failed "
"SYNCOOKIE authentication, segment rejected "
"(probably spoofed)\n", s, __func__);
- goto failed;
+ goto sendrst;
}
- } else {
- /* Pull out the entry to unlock the bucket row. */
- TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
- sch->sch_length--;
- V_tcp_syncache.cache_count--;
- SCH_UNLOCK(sch);
+ goto expand; /* fully validated through syncookie */
}
+ SCH_LOCK_ASSERT(sch);
/*
* Segment validation:
- * ACK must match our initial sequence number + 1 (the SYN|ACK).
- */
- if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
- if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
- log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
- "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
- goto failed;
- }
-
- /*
+ *
* The SEQ must fall in the window starting at the received
* initial receive sequence number + 1 (the SYN).
+ * If not the segment may be from an earlier connection. We send
+ * an ACK to re-synchronize the connection and keep the syncache
+ * entry without ajusting its timers.
+ * See RFC793 page 69, first check sequence number [SYN_RECEIVED].
*/
if ((SEQ_LEQ(th->th_seq, sc->sc_irs) ||
SEQ_GT(th->th_seq, sc->sc_irs + sc->sc_wnd)) &&
@@ -877,14 +865,41 @@ syncache_expand(struct in_conninfo *inc,
if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
log(LOG_DEBUG, "%s; %s: SEQ %u != IRS+1 %u, segment "
"rejected\n", s, __func__, th->th_seq, sc->sc_irs);
- goto failed;
+ (void) syncache_respond(sc);
+ *lsop = 1; /* prevent RST */
+ goto sendrstkeep;
}
+ /*
+ * ACK must match our initial sequence number + 1 (the SYN|ACK).
+ * If not the segment may be from an earlier connection. We send
+ * a RST but keep the syncache entry without ajusting its timers.
+ * See RFC793 page 72, fifth check the ACK field, [SYN_RECEIVED].
+ */
+ if (th->th_ack != sc->sc_iss + 1 && !TOEPCB_ISSET(sc)) {
+ if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
+ log(LOG_DEBUG, "%s; %s: ACK %u != ISS+1 %u, segment "
+ "rejected\n", s, __func__, th->th_ack, sc->sc_iss);
+ goto sendrstkeep;
+ }
+
+ /*
+ * Remove the entry to unlock the bucket row.
+ * Tests from now on are fatal and remove the syncache entry.
+ */
+ TAILQ_REMOVE(&sch->sch_bucket, sc, sc_hash);
+ sch->sch_length--;
+ V_tcp_syncache.cache_count--;
+
+ /*
+ * If timestamps were not negotiated they must not show up later.
+ * See RFC1312bis, section 1.3, second paragraph
+ */
if (!(sc->sc_flags & SCF_TIMESTAMP) && (to->to_flags & TOF_TS)) {
if ((s = tcp_log_addrs(inc, th, NULL, NULL)))
log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
"segment rejected\n", s, __func__);
- goto failed;
+ goto sendrst;
}
/*
* If timestamps were negotiated the reflected timestamp
@@ -896,9 +911,11 @@ syncache_expand(struct in_conninfo *inc,
log(LOG_DEBUG, "%s; %s: TSECR %u != TS %u, "
"segment rejected\n",
s, __func__, to->to_tsecr, sc->sc_ts);
- goto failed;
+ goto sendrst;
}
+expand:
+ SCH_UNLOCK(sch);
*lsop = syncache_socket(sc, *lsop, m);
if (*lsop == NULL)
@@ -906,16 +923,18 @@ syncache_expand(struct in_conninfo *inc,
else
V_tcpstat.tcps_sc_completed++;
-/* how do we find the inp for the new socket? */
if (sc != &scs)
syncache_free(sc);
return (1);
-failed:
- if (sc != NULL && sc != &scs)
+
+sendrst:
+ if (sc != &scs)
syncache_free(sc);
+sendrstkeep:
+ SCH_LOCK_ASSERT(sch);
+ SCH_UNLOCK(sch);
if (s != NULL)
free(s, M_TCPLOG);
- *lsop = NULL;
return (0);
}
@@ -1322,6 +1341,8 @@ syncache_respond(struct syncache *sc)
*
* 1) path_mtu_discovery is disabled
* 2) the SCF_UNREACH flag has been set
+ *
+ * XXXAO: The route lookup comment doesn't make sense.
*/
if (V_path_mtu_discovery && ((sc->sc_flags & SCF_UNREACH) == 0))
ip->ip_off |= IP_DF;
More information about the freebsd-net
mailing list