kern/147647: select wakes after 24 hours even if timeout is longer

Mon Jun 7 19:50:04 UTC 2010

The following reply was made to PR kern/147647; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Martin Simmons <martin at lispworks.com>
Cc: freebsd-gnats-submit at FreeBSD.org, freebsd-bugs at FreeBSD.org
Subject: Re: kern/147647: select wakes after 24 hours even if timeout is
 longer
Date: Tue, 8 Jun 2010 05:41:39 +1000 (EST)

 On Mon, 7 Jun 2010, Martin Simmons wrote:

 >> Description:
 > The select system call wakes indicating timeout after 24 hours even if the timeout specifies a longer delay.
 >
 >> Fix:
 > I think the problem is that seltdwait is implemented using cv_timedwait_sig, which returns EWOULDBLOCK on timeout.  The calls to seltdwait in kern_select etc limit the timeout to a maximum of 24 * 60 * 60 * hz but interpret EWOULDBLOCK as a reason to exit rather than checking the actual timeout.

 The bug may be mainly that the limiting code is garbage.  It defeats the
 careful scaling of tvtohz(), and invokes undefined behaviour in the check
 if hz is preposterously large.

 I use the following fix in FreeBSD-~5.

 % Index: sys_generic.c
 % ===================================================================
 % RCS file: /home/ncvs/src/sys/kern/sys_generic.c,v
 % retrieving revision 1.131
 % diff -u -2 -r1.131 sys_generic.c
 % --- sys_generic.c	5 Apr 2004 21:03:35 -0000	1.131
 % +++ sys_generic.c	13 Aug 2009 11:21:29 -0000
 % @@ -830,13 +819,10 @@
 %  		ttv = atv;
 %  		timevalsub(&ttv, &rtv);
 % -		timo = ttv.tv_sec > 24 * 60 * 60 ?
 % -		    24 * 60 * 60 * hz : tvtohz(&ttv);
 % +		timo = tvtohz(&ttv);
 %  	}
 % 
 %  	/*
 % -	 * An event of interest may occur while we do not hold
 % -	 * sellock, so check TDF_SELECT and the number of
 % -	 * collisions and rescan the file descriptors if
 % -	 * necessary.
 % +	 * An event of interest may have occurred while we did not hold
 % +	 * sellock.  Check for this and rescan if necessary.
 %  	 */
 %  	mtx_lock_spin(&sched_lock);
 % @@ -1016,12 +1008,8 @@
 %  		ttv = atv;
 %  		timevalsub(&ttv, &rtv);
 % -		timo = ttv.tv_sec > 24 * 60 * 60 ?
 % -		    24 * 60 * 60 * hz : tvtohz(&ttv);
 % +		timo = tvtohz(&ttv);
 %  	}
 % -	/*
 % -	 * An event of interest may occur while we do not hold
 % -	 * sellock, so check TDF_SELECT and the number of collisions
 % -	 * and rescan the file descriptors if necessary.
 % -	 */
 % +
 % +	/* Rescan if necessary, as above. */
 %  	mtx_lock_spin(&sched_lock);
 %  	if ((td->td_flags & TDF_SELECT) == 0 || nselcoll != ncoll) {

 This also fixes style bugs in an unrelated comment, and has the same fixes
 for poll().

 When hz is preposterously large, the 24 * 60 * 60 * hz invokes undefined
 behaviour by overflowing.  This bug is missing in tvtohz().

 When hz is merely large, the timeout in seconds is somewhat limited and
 the bug probably still occurs.  tvtohz() returns an int, so the timeout
 is limited to INT_MAX / hz seconds (actually 1 or 2 ticks less).  tvtohz()
 clamps to (INT_MAX - 1) in this case (1 less than the max so that 1 can
 be added without checking).  The too-large default hz of 1000 allows
 the timeout to be up to nearly 25 days.

 seltdwait() (like any function that can time out) is supposed to fail
 with EWOULDBLOCK if the timeout expires, since this is the API for
 timeout expiry and seltdwait() has no knowledge of any longer timeout
 at a higher level.  select() and poll() should check if the timeout
 expired and restart with a reduced timeout, but apparently don't.

 These bugs are both missing in kern_nanosleep().  Smaller bugs involving
 sleeping unnecessarily long are present there too.  Clock drift is
 normally only a few seconds per day, but innacuracies in the timeout
 calibration can be much larger than that (I think up to about 40 seconds
 per day is possible and unavoidable at the hardware level for an i8254
 clk0 with the too-large default hz of 1000).  If these innacuracies
 result in a timeout being late then there is no way to recover.

 I just noticed another bug in nanosleep().  POSIX.1-2001 requires it
 to use CLOCK_REALTIME, but in FreeBSD it uses CLOCK_MONOTONIC.
 POSIX.1-2001 provides clock_nanosleep() to support specifying the
 clock, but FreeBSD doesn't support this yet.  The POSIX rationale
 says that nanosleep() was intentionally not modified to use
 CLOCK_MONOTONIC.  I couldn't find anything in POSIX about the clock
 used for select() and poll() timeouts.  Only CLOCK_MONOTONIC makes
 sense, but that is true for nanosleep() too.

 Bruce