kern/181416: socket timeout rounding issue

Tue Aug 20 10:20:01 UTC 2013

The following reply was made to PR kern/181416; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Vitja Makarov <vitja.makarov at gmail.com>
Cc: freebsd-gnats-submit at FreeBSD.org, freebsd-bugs at FreeBSD.org
Subject: Re: kern/181416: socket timeout rounding issue
Date: Tue, 20 Aug 2013 20:11:37 +1000 (EST)

 On Tue, 20 Aug 2013, Vitja Makarov wrote:

 >> Description:
 > Recently I was playing with small socket timeouts (setsockopt(2)
 > SO_RCVTIMEO) and found a problem with them: if the timeout is small
 > enough, read(2) may return before the timeout has actually expired.
 >
 > I was unable to reproduce this on a Linux box.
 >
 > I found that the kernel uses a timer with 1/HZ precision, so it
 > converts the time in microseconds to ticks.  That's OK; Linux does it
 > as well.  The problem is in the details: FreeBSD uses a floor()
 > approach while Linux uses ceil():
 >
 > from FreeBSD's sys/kern/uipc_socket.c:
 > val = (u_long)(tv.tv_sec * hz) + tv.tv_usec / tick;

 This is actually an off-by-2 error in most cases.  ceil() isn't high
 enough either.  For example, with hz = 100 and tv = 25 msec, ceil() gives
 3 ticks.  But a 3-tick timeout is only 2 full ticks plus a fraction of the
 current tick, and that fraction may be as short as 1 nsec.  At least,
 that is so with the old timeout code.

 > if (val == 0 && tv.tv_usec != 0)
 >     val = 1; /* at least one tick if tv > 0 */

 This does the ceil() in the special case where tv < 1 tick.  That special
 case is redundant, at least with the old timeout code, since
 callout_reset() used to add 1.  The extra tick seems to have been lost,
 breaking old callers that depended on it.  The current timeout code tries
 to be more accurate, but that means it is less accurate if the caller is
 broken and rounds down.  Maybe your bug can only be seen with the
 increased accuracy.

 tvtohz() should always be used to convert timevals to ticks.  It rounds
 up and adds 1, and handles overflow.  The conversion in uipc_socket.c
 isn't even short.  It takes 15 lines for its own overflow handling.  It
 seems to check the SHRT_MAX limit twice.

 If uipc_socket.c called tvtohz(), then it would still have to check
 that the result fits in a short.  Its error handling when it doesn't
 fit seems wrong.  EDOM is documented as a domain error for math
 software.  setsockopt() isn't math software, and EDOM isn't a documented
 errno for it.  EINVAL and EOVERFLOW are more usual kernel errors for
 unrepresentable values.

 Grepping for ' / tick' in /sys shows no other home-made tvtohz()s.

 > from Linux's net/core/sock.c:
 > *timeo_p = tv.tv_sec*HZ + (tv.tv_usec+(1000000/HZ-1))/(1000000/HZ);

 The conversion is much simpler when HZ is hard-coded.  Linux has some
 bounds checking before this, but the error handling in at least
 Linux-2.6.10 is to ignore invalid tv's and return success without
 changing the timeout.

 > So, for instance, on a FreeBSD system running with kern.hz set to
 > 100, a receive timeout of 25ms is converted to 2 ticks, which is
 > 20ms.  In my test program, read(2) returns with EAGAIN after 0.019 s.

 Bruce
