Handling 100.000 packets/sec or more

Tom Pavel pavel at NetworkPhysics.COM
Wed Jan 14 14:04:42 PST 2004


>>>>> On Wed, 14 Jan 2004, Richard Wendland <richard at starburst.demon.co.uk> writes:

> > device polling(8) really does help _a lot_ for packet floods/storms.
> > For device polling to work properly (imho) you would need to set HZ 
> > to 1000.
> > I don't recommend any higher HZ on a PIII.
> 
> Incidentally, setting HZ > 1000 would cause FreeBSD TCP to not comply
> with RFC1323, as it would make the TCP timestamp option clock tick faster
> than 1ms.  RFC1323 4.2.2 specifies the clock rate to be in the range
> 1 ms to 1 sec per tick.
> 
> Really the TCP timestamp option clock should be divorced from HZ before
> too long, as a time will come when people will want HZ > 1000.
> 
> Actually, a slightly faster tick rate is unlikely to run into much trouble in
> practice, but it will cause the PAWS algorithm to stop a long-running
> TCP connection; see 4.2.3 of RFC1323.
> 
> 	Richard


The PAWS problem is real.  Idle SSH or telnet connections can easily get
hosed by timestamp wraparound if you crank up HZ too much.  We ran into
this at Network Physics.

I had been meaning to submit a PR about this (and probably several
others as well) for quite a while now, but I always got distracted by
some other urgent matter...  However, given the prod, I was able to
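dig up the fix we used for this particular problem.  I'm pretty sure these
diffs will not apply cleanly, even to -stable, but the gist of the idea
should be clear enough: derive the RFC1323 timestamp clock from
getmicrouptime() in fixed 10 ms units rather than from ticks, so it no
longer depends on HZ.  Hopefully, this can save someone some work on
getting a fix into the tree.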

Tom Pavel

Network Physics
pavel at networkphysics.com / pavel at alum.mit.edu 



Index: tcp_input.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_input.c,v
retrieving revision 1.41
retrieving revision 1.42
diff -u -r1.41 -r1.42
--- tcp_input.c	2 Apr 2002 23:27:33 -0000	1.41
+++ tcp_input.c	3 Apr 2002 22:24:24 -0000	1.42
@@ -1185,7 +1185,7 @@
 		 */
 		if ((to.to_flag & TOF_TS) != 0 &&
 		   SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-			tp->ts_recent_age = ticks;
+			GETCURTS(tp->ts_recent_age);
 			tp->ts_recent = to.to_tsval;
 		}
 
@@ -1228,9 +1228,12 @@
                          && ((!(sack_check(tp))) ||
 			     to.to_tsecr)
 #endif
-			    )
-			    tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-			else {
+			    ) {
+			    u_long cur_ts, rtt_ticks;
+			    GETCURTS(cur_ts);
+			    rtt_ticks = TSTMPTOTICK(cur_ts - to.to_tsecr);
+			    tcp_xmit_timer(tp, rtt_ticks + 1);
+			} else {
 #ifdef LTSTMP
 			    tcp_xmit_timer(tp, tp->t_rtttime);
 #else
@@ -1941,9 +1944,11 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 && tp->ts_recent &&
 	    TSTMP_LT(to.to_tsval, tp->ts_recent)) {
+	    	u_long cur_ts;
 
 		/* Check to see if ts_recent is over 24 days old.  */
-		if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
+		GETCURTS(cur_ts);
+		if ((int)(cur_ts - tp->ts_recent_age) > TCP_PAWS_IDLE) {
 			/*
 			 * Invalidate ts_recent.  If this segment updates
 			 * ts_recent, the age will be reset later and ts_recent
@@ -2120,7 +2125,7 @@
 	 */
 	if ((to.to_flag & TOF_TS) != 0 &&
 	    SEQ_LEQ(th->th_seq, tp->last_ack_sent)) {
-		tp->ts_recent_age = ticks;
+	    	GETCURTS(tp->ts_recent_age);
 		tp->ts_recent = to.to_tsval;
 	}
 
@@ -2754,9 +2759,12 @@
               /* bug fix from Mark Allman  */
 		&& ((!sack_check(tp)) || to.to_tsecr)
 #endif
-		    )
-			tcp_xmit_timer(tp, ticks - to.to_tsecr + 1);
-		else {
+		    ) {
+		    	u_long cur_ts, rtt_ticks;
+			GETCURTS(cur_ts);
+			rtt_ticks = TSTMPTOTICK(cur_ts - to.to_tsecr);
+			tcp_xmit_timer(tp, rtt_ticks + 1);
+		} else {
 
 #ifdef LTSTMP    /* use local timestamp */
 		tcp_xmit_timer(tp, tp->t_rtttime);
@@ -3293,7 +3301,7 @@
 			if (th->th_flags & TH_SYN) {
 				tp->t_flags |= TF_RCVD_TSTMP;
 				tp->ts_recent = to->to_tsval;
-				tp->ts_recent_age = ticks;
+				GETCURTS(tp->ts_recent_age);
 			}
 			break;
 
Index: tcp_output.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_output.c,v
retrieving revision 1.32
retrieving revision 1.33
diff -u -r1.32 -r1.33
--- tcp_output.c	3 Apr 2002 01:55:20 -0000	1.32
+++ tcp_output.c	3 Apr 2002 22:24:24 -0000	1.33
@@ -616,7 +616,11 @@
 
  		/* Form timestamp option as shown in appendix A of RFC 1323. */
  		*lp++ = htonl(TCPOPT_TSTAMP_HDR);
- 		*lp++ = htonl(ticks);
+		{
+			u_int32_t cur_ts;
+			GETCURTS(cur_ts);
+			*lp++ = htonl(cur_ts);
+		}
  		*lp   = htonl(tp->ts_recent);
  		optlen += TCPOLEN_TSTAMP_APPA;
  	}
Index: tcp_seq.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/netinet/tcp_seq.h,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- tcp_seq.h	16 Jul 2001 18:18:44 -0000	1.2
+++ tcp_seq.h	3 Apr 2002 22:24:24 -0000	1.3
@@ -88,8 +88,19 @@
 	    (tp)->iss
 #endif
 
-#define TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * hz)
-					/* timestamp wrap-around time */
+/* clock macros for RFC1323 timestamps */
+#define TSTMP_UNITS	(10)	/* in ms (RFC1323 says 1-1000 ms) */
+#define GETCURTS(ts)							\
+	do {								\
+		struct timeval tv;					\
+		getmicrouptime(&tv);					\
+		(ts) = (u_int32_t)(((u_int64_t)tv.tv_sec * 1000 +	\
+		    tv.tv_usec / 1000) / TSTMP_UNITS);			\
+	} while (0)
+#define TSTMPTOTICK(ts) ((int64_t)(u_int32_t)(ts) * hz * TSTMP_UNITS / 1000)
+
+#define TCP_PAWS_IDLE	(24 * 24 * 60 * 60 * 1000/TSTMP_UNITS)
+                       /* timestamp wrap-around time (24 days in 10ms units) */
 
 #ifdef _KERNEL
 extern tcp_cc	tcp_ccgen;		/* global connection count */


More information about the freebsd-net mailing list