svn commit: r193941 - head/sys/netinet

Bruce Evans brde at optusnet.com.au
Thu Jun 11 03:53:26 UTC 2009


On Wed, 10 Jun 2009, John Baldwin wrote:

> Log:
>  Change a few members of tcpcb that store cached copies of ticks to be ints
>  instead of unsigned longs.  This fixes a few overflow edge cases on 64-bit
>  platforms.  Specifically, if an idle connection receives a packet shortly

I think the variables should still be unsigned (ints now).  Otherwise there
is at best benign undefined behaviour when the variables overflow at
INT_MAX, while the code is apparently designed to work with unsigned
values (it casts to int to look at differences between the unsigned values).

>  before 2^31 clock ticks of uptime (about 25 days with hz=1000) and the keep
>  alive timer fires after 2^31 clock ticks, the keep alive timer will think
>  that the connection has been idle for a very long time and will immediately
>  drop the connection instead of sending a keep alive probe.
>
>  Reviewed by:	silby, gnn, lstewart
>  MFC after:	1 week
>
> Modified:
>  head/sys/netinet/tcp_input.c
>  head/sys/netinet/tcp_usrreq.c
>  head/sys/netinet/tcp_var.h
>
> Modified: head/sys/netinet/tcp_input.c
> ==============================================================================
> --- head/sys/netinet/tcp_input.c	Wed Jun 10 18:26:02 2009	(r193940)
> +++ head/sys/netinet/tcp_input.c	Wed Jun 10 18:27:15 2009	(r193941)
> @@ -1778,7 +1778,7 @@ tcp_do_segment(struct mbuf *m, struct tc
> 	    TSTMP_LT(to.to_tsval, tp->ts_recent)) {
>
> 		/* Check to see if ts_recent is over 24 days old.  */
> -		if ((int)(ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {
> +		if ((ticks - tp->ts_recent_age) > TCP_PAWS_IDLE) {

The variables are now ints, and there is now also overflow in the
subtraction.  E.g., INT_MAX - INT_MIN now overflows.  On 2's complement
machines it normally overflows to -2, which is probably the wrong value
(the large unsigned value 0xFFF...FEU is probably correct), but it is
the same value as is given by casting the difference of unsigned values
to int ((int)0xFFF...FEU = -2).  This is because, when representing times
mod UINT_MAX+1, all time differences up to UINT_MAX can be represented,
but only if we keep track of which operand is older (here we probably do
know that ts_recent_age is older); if we don't keep track then we normally
assume that (unsigned) differences smaller than INT_MAX mean that the
difference is nonnegative while differences larger than INT_MAX mean
that the difference is negative.  The signed interpreatation works well
if we know that the nonnegative differences never exceed INT_MAX.

Style bug: without the cast, the above has excessive parentheses.

> 			/*
> 			 * Invalidate ts_recent.  If this segment updates
> 			 * ts_recent, the age will be reset later and ts_recent
>
> Modified: head/sys/netinet/tcp_usrreq.c
> ==============================================================================
> --- head/sys/netinet/tcp_usrreq.c	Wed Jun 10 18:26:02 2009	(r193940)
> +++ head/sys/netinet/tcp_usrreq.c	Wed Jun 10 18:27:15 2009	(r193941)
> @@ -1823,7 +1823,7 @@ db_print_tcpcb(struct tcpcb *tp, const c
> 	    tp->snd_recover);
>
> 	db_print_indent(indent);
> -	db_printf("t_maxopd: %u   t_rcvtime: %lu   t_startime: %lu\n",
> +	db_printf("t_maxopd: %u   t_rcvtime: %u   t_startime: %u\n",
> 	    tp->t_maxopd, tp->t_rcvtime, tp->t_starttime);
>
> 	db_print_indent(indent);

You still print all the times as unsigned.  Since the variables are now
signed, this is now a printf format error, which gcc still doesn't detect
but I usually do.

> Modified: head/sys/netinet/tcp_var.h
> ==============================================================================
> --- head/sys/netinet/tcp_var.h	Wed Jun 10 18:26:02 2009	(r193940)
> +++ head/sys/netinet/tcp_var.h	Wed Jun 10 18:27:15 2009	(r193941)
> @@ -139,8 +139,8 @@ struct tcpcb {
>
> 	u_int	t_maxopd;		/* mss plus options */
>
> -	u_long	t_rcvtime;		/* inactivity time */
> -	u_long	t_starttime;		/* time connection was established */
> +	int	t_rcvtime;		/* inactivity time */
> +	int	t_starttime;		/* time connection was established */
> 	int	t_rtttime;		/* round trip time */
> 	tcp_seq	t_rtseq;		/* sequence number being timed */
>
> @@ -167,7 +167,7 @@ struct tcpcb {
> 	u_char	rcv_scale;		/* window scaling for recv window */
> 	u_char	request_r_scale;	/* pending window scaling */
> 	u_int32_t  ts_recent;		/* timestamp echo data */
> -	u_long	ts_recent_age;		/* when last updated */
> +	int	ts_recent_age;		/* when last updated */
> 	u_int32_t  ts_offset;		/* our timestamp offset */
>
> 	tcp_seq	last_ack_sent;
> @@ -175,7 +175,7 @@ struct tcpcb {
> 	u_long	snd_cwnd_prev;		/* cwnd prior to retransmit */
> 	u_long	snd_ssthresh_prev;	/* ssthresh prior to retransmit */
> 	tcp_seq	snd_recover_prev;	/* snd_recover prior to retransmit */
> -	u_long	t_badrxtwin;		/* window for retransmit recovery */
> +	int	t_badrxtwin;		/* window for retransmit recovery */
> 	u_char	snd_limited;		/* segments limited transmitted */
> /* SACK related state */
> 	int	snd_numholes;		/* number of holes seen by sender */
>

Should all be changed to u_int?

There still seem to be some very nice sign extension/overflow bugs.
E.g., `ticks' is (bogusly) signed.  After changing the above variables
to int to be bug for bug compatible with `ticks', the semantics of
expressions not directly touched in this commit is changed too.  E.g.,
there is the expression `ticks < tp->t_badrxtwin'.  Changing the signedness
of the type of tp_t_badrxtwin stops promotion of `ticks' to unsigned in
this expression.  However, this expression seems to have been just broken
before, and the change only moves the bug slightly (from times near where
`ticks' wraps around at to to times near where `ticks' overflows at
INT_MAX).  To handle wraparound and avoid overflow, such expressions should
be written as (int)(ticks - tp->t_badrxtwin) < 0 where the variables have
unsigned type.  This seems to have already been done for all the other
variables changed in this commit, except the cast to int is missing in
some cases.

Wrapararound used to occur after 497 days when HZ was 100 and variables
were unsigned 32 bits.  Now overflow occurs after 24 days when HZ is
1000 and variables are signed 32 bits.  Larger misconfigurations of
HZ for polling can make the bugs occur after only a week of uptime.

It might be a practical exercise to change the type of the time variables
to uint16_t so that the wraparound bugs are seen after only a few
seconds of uptime and type size mismatches with ints through u_longs
affect all machines.  The type size mismatches shouldn't matter if times
are always clamped compatibly.  E.g., expressions like `tp->t_badrxtwin
= ticks + adj' depend on t_badrxtwin wrapping around in the same way
as `ticks'.  This could be written as something like `tp->t_badrxtwin
= (ticks + adj) & 0xFFFF' to get compatible wrapping with 16-bit values.

Bruce


More information about the svn-src-all mailing list