svn commit: r280674 - user/dchagin/lemul/sys/compat/linux

Thu Mar 26 07:43:22 UTC 2015

On Thu, 26 Mar 2015, Dmitry Chagin wrote:

> Log:
>  Linux nanosleep() and clock_nanosleep() system calls always
>  writes the remaining time into the structure pointed to by rmtp
>  unless rmtp is NULL. The value of *rmtp can then be used to call
>  nanosleep() again and complete the specified pause if the previous
>  call was interrupted.
>
>  Note. clock_nanosleep() with an absolute time value does not write
>  the remaining time.
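
For reference, the retry pattern that the log describes looks roughly
like this in userland.  This is only a sketch: sleep_fully() is a
made-up name, and it assumes that nanosleep() fills in the remaining
time whenever it fails with EINTR.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleep for the whole of *req, restarting with the
 * reported remaining time whenever a signal interrupts the sleep.
 * Returns 0 on success or an errno value on failure.
 */
static int
sleep_fully(const struct timespec *req)
{
	struct timespec want = *req, left;

	while (nanosleep(&want, &left) != 0) {
		if (errno != EINTR)
			return (errno);
		/* Interrupted: continue with whatever time is left. */
		want = left;
	}
	return (0);
}
```

Each restart adds a little drift, since the remaining time is relative
and is rounded by the kernel.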

FreeBSD doesn't even have clock_nanosleep().  It also sleeps on the wrong
clock id (CLOCK_MONOTONIC instead of CLOCK_REALTIME) in nanosleep().
clock_nanosleep() exists mainly because CLOCK_REALTIME is usually the
wrong clock to sleep on, but is the one specified for nanosleep() for
historical reasons.

It is stupid for the emulator to have clock_nanosleep() before the host
system.  The emulator doesn't seem to have it either.  It seems to
just use the native nanosleep(), so it sleeps on the wrong clock id for
all except CLOCK_MONOTONIC.

> Modified: user/dchagin/lemul/sys/compat/linux/linux_time.c
> ==============================================================================
> --- user/dchagin/lemul/sys/compat/linux/linux_time.c	Thu Mar 26 06:00:42 2015	(r280673)
> +++ user/dchagin/lemul/sys/compat/linux/linux_time.c	Thu Mar 26 06:36:34 2015	(r280674)
> ...
> @@ -490,25 +488,19 @@ linux_nanosleep(struct thread *td, struc
> 		return (error);
> 	}
> 	error = kern_nanosleep(td, &rqts, rmtp);

The emulator could fix up cases where the sleep on the wrong clock is too
short, by calculating the error and sleeping again.  It doesn't do this,
but if it just used the correct clock id for calculating the remaining
time for the purpose of returning it, then it would know the time to
sleep again (when there is no error but the remaining time is > 0).

> @@ -558,24 +550,19 @@ linux_clock_nanosleep(struct thread *td,
> 		return (error);
> 	}
> 	error = kern_nanosleep(td, &rqts, rmtp);

Linux apparently uses the correct clock id for nanosleep(), since this
function only tries to support that clock id (LINUX_CLOCK_REALTIME).
The function has already returned EINVAL for other clock ids.  But since
FreeBSD nanosleep() is broken, the unique clock id that can be supported
here is actually LINUX_CLOCK_MONOTONIC.
   Strictly, not even that.  FreeBSD used to use getnanouptime() in
   nanosleep(), so nanosleep() only gave 1/HZ resolution.  There is a
   clock id CLOCK_MONOTONIC_FAST for this, so nanosleep() strictly only
   used that clock id, and if clock_nanosleep() were implemented then
   it should use getnanouptime() precisely when the caller uses this
   clock id.

   Presumably Linux doesn't have LINUX_CLOCK_MONOTONIC_FAST to map to
   the FreeBSD mistake CLOCK_MONOTONIC_FAST, so the best this function
   can do is map to CLOCK_MONOTONIC.  Changing to this would probably
   break more than it fixes.  Apparently, Linux software does use
   clock_nanosleep(), but with CLOCK_REALTIME, else the error for using
   it with CLOCK_MONOTONIC would be noticed.  clock_nanosleep() is not
   very useful without TIMER_ABSTIME since it is only a verbose spelling
   of nanosleep() then.  It is useful with TIMER_ABSTIME, but that case
   is not supported.  Perhaps the software uses clock_nanosleep() because
   it wants to use TIMER_ABSTIME someday when that is supported.

   Now, nanosleep() uses sbintimes so it isn't clear what clock id it
   sleeps on.  The clock is still monotonic and not affected by leap
   seconds or suspensions.  Its resolution seems to be much lower than
   1/HZ even when HZ is 100.  "sleep 1" often sleeps for 65 milliseconds
   extra on freefall, but shorter sleeps rarely sleep by more than 3
   milliseconds extra.

> -	if (error != 0) {
> -		LIN_SDT_PROBE1(time, linux_clock_nanosleep, nanosleep_error,
> -		    error);
> -		LIN_SDT_PROBE1(time, linux_clock_nanosleep, return, error);
> -		return (error);
> -	}
> -
> 	if (args->rmtp != NULL) {
> +		/* XXX. Not for TIMER_ABSTIME */

TIMER_ABSTIME has already been handled (by XXXing and returning an error).

TIMER_ABSTIME is only very useful with CLOCK_REALTIME.  It allows sleeping
until a specified real time without being messed up by leap seconds and
clock steps.  For CLOCK_MONOTONIC, it is not even clear what an absolute
time is.  I think POSIX specifies CLOCK_MONOTONIC to increment as close
to physical seconds as it can, but in FreeBSD it stops incrementing
during suspend.

Bruce