From nobody Wed Apr 30 16:49:20 2025
Date: Wed, 30 Apr 2025 16:49:20 GMT
Message-Id: <202504301649.53UGnKKT022361@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Gleb Smirnoff
Subject: git: 626ea75ed2e9 - main - time: use precise callout for clock_nanosleep(2) and nanosleep(2)
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
Sender: owner-dev-commits-src-main@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: glebius
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 626ea75ed2e9e9365ef8d7a4fa8ef219020c98c6
Auto-Submitted: auto-generated

The branch main has been updated by glebius:

URL: https://cgit.FreeBSD.org/src/commit/?id=626ea75ed2e9e9365ef8d7a4fa8ef219020c98c6

commit 626ea75ed2e9e9365ef8d7a4fa8ef219020c98c6
Author:     Gleb Smirnoff
AuthorDate: 2025-04-30 16:47:57 +0000
Commit:     Gleb Smirnoff
CommitDate: 2025-04-30 16:47:57 +0000

    time: use precise callout for clock_nanosleep(2) and nanosleep(2)

    Don't apply tc_precexp and TIMESEL() that uses sbt_timethreshold (both
    derivatives of kern.timecounter.alloweddeviation) to sleep callout when
    processing the default and precise clocks. The default timer deviation
    of 5% is our internal optimization in the kernel, and we shouldn't leak
    that into the POSIX APIs.

    Note that application doesn't have any control to cancel the deviation,
    only a superuser can change the global tunable [with side effects].

    Leave the deviation for CLOCK_*_FAST and CLOCK_SECOND that are
    documented as imprecise.

    Provide a sysctl kern.timecounter.nanosleep_precise that allows to
    restore the previous behavior.

    Improve documentation.

    Reviewed by:            ziaee, vangyzen, imp, kib
    Differential Revision:  https://reviews.freebsd.org/D50075
---
 lib/libsys/nanosleep.2 | 52 +++++++++++++++++++++++++++++++++++++++++---------
 sys/kern/kern_time.c   | 36 +++++++++++++++++++++++++++-------
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/lib/libsys/nanosleep.2 b/lib/libsys/nanosleep.2
index 8a4931e51413..ba9aae1edf57 100644
--- a/lib/libsys/nanosleep.2
+++ b/lib/libsys/nanosleep.2
@@ -27,7 +27,7 @@
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
-.Dd April 3, 2022
+.Dd April 29, 2025
 .Dt NANOSLEEP 2
 .Os
 .Sh NAME
@@ -87,14 +87,6 @@ If, at the time of the call, the time value specified by
 is less than or equal to the time value of the specified clock, then
 .Fn clock_nanosleep
 returns immediately and the calling thread is not suspended.
-.Pp
-The suspension time may be longer than requested due to the
-scheduling of other activity by the system.
-It is also subject to the allowed time interval deviation
-specified by the
-.Va kern.timecounter.alloweddeviation
-.Xr sysctl 8
-variable.
 An unmasked signal will terminate the sleep early, regardless of the
 .Dv SA_RESTART
 value on the interrupting signal.
@@ -131,6 +123,32 @@ CLOCK_UPTIME_FAST
 CLOCK_UPTIME_PRECISE
 .El
 .Pp
+The suspension time may be longer than requested due to the
+scheduling of other activity by the system.
+The clocks with the
+.Dv _FAST
+suffix and the
+.Dv CLOCK_SECOND
+are subject to the allowed time interval deviation specified by the
+.Va kern.timecounter.alloweddeviation
+.Xr sysctl 8
+variable.
+The clocks with the
+.Dv _PRECISE
+suffix are always as precise as possible.
+The
+.Dv CLOCK_MONOTONIC ,
+.Dv CLOCK_REALTIME
+and
+.Dv CLOCK_UPTIME
+are precise by default.
+Setting the
+.Va kern.timecounter.nanosleep_precise
+.Xr sysctl 8
+to a false value would make those clocks to behave like the
+.Dv _FAST
+clocks.
+.Pp
 The
 .Fn nanosleep
 function behaves like
@@ -217,3 +235,19 @@ and was ported to
 .Ox 2.1
 and
 .Fx 3.0 .
+The
+.Fn clock_nanosleep
+system call has been available since
+.Fx 11.1 .
+.Pp
+In
+.Fx 15.0
+the default behavior of
+.Fn clock_nanosleep
+with
+.Dv CLOCK_MONOTONIC ,
+.Dv CLOCK_REALTIME ,
+.Dv CLOCK_UPTIME
+clocks and
+.Fn nanosleep
+has been switched to use precise clock.
diff --git a/sys/kern/kern_time.c b/sys/kern/kern_time.c
index d7dc78366292..0c31c1563d99 100644
--- a/sys/kern/kern_time.c
+++ b/sys/kern/kern_time.c
@@ -494,6 +494,10 @@ kern_nanosleep(struct thread *td, struct timespec *rqt, struct timespec *rmt)
 	    rmt));
 }
 
+static __read_mostly bool nanosleep_precise = true;
+SYSCTL_BOOL(_kern_timecounter, OID_AUTO, nanosleep_precise, CTLFLAG_RW,
+    &nanosleep_precise, 0, "clock_nanosleep() with CLOCK_REALTIME, "
+    "CLOCK_MONOTONIC, CLOCK_UPTIME and nanosleep(2) use precise clock");
 static uint8_t nanowait[MAXCPU];
 
 int
@@ -504,7 +508,7 @@ kern_clock_nanosleep(struct thread *td, clockid_t clock_id, int flags,
 	sbintime_t sbt, sbtt, prec, tmp;
 	time_t over;
 	int error;
-	bool is_abs_real;
+	bool is_abs_real, precise;
 
 	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= NS_PER_SEC)
 		return (EINVAL);
@@ -512,17 +516,31 @@ kern_clock_nanosleep(struct thread *td, clockid_t clock_id, int flags,
 		return (EINVAL);
 	switch (clock_id) {
 	case CLOCK_REALTIME:
+		precise = nanosleep_precise;
+		is_abs_real = (flags & TIMER_ABSTIME) != 0;
+		break;
 	case CLOCK_REALTIME_PRECISE:
+		precise = true;
+		is_abs_real = (flags & TIMER_ABSTIME) != 0;
+		break;
 	case CLOCK_REALTIME_FAST:
 	case CLOCK_SECOND:
+		precise = false;
 		is_abs_real = (flags & TIMER_ABSTIME) != 0;
 		break;
 	case CLOCK_MONOTONIC:
-	case CLOCK_MONOTONIC_PRECISE:
-	case CLOCK_MONOTONIC_FAST:
 	case CLOCK_UPTIME:
+		precise = nanosleep_precise;
+		is_abs_real = false;
+		break;
+	case CLOCK_MONOTONIC_PRECISE:
 	case CLOCK_UPTIME_PRECISE:
+		precise = true;
+		is_abs_real = false;
+		break;
+	case CLOCK_MONOTONIC_FAST:
 	case CLOCK_UPTIME_FAST:
+		precise = false;
 		is_abs_real = false;
 		break;
 	case CLOCK_VIRTUAL:
@@ -553,10 +571,14 @@ kern_clock_nanosleep(struct thread *td, clockid_t clock_id, int flags,
 		} else
 			over = 0;
 		tmp = tstosbt(ts);
-		prec = tmp;
-		prec >>= tc_precexp;
-		if (TIMESEL(&sbt, tmp))
-			sbt += tc_tick_sbt;
+		if (precise) {
+			prec = 0;
+			sbt = sbinuptime();
+		} else {
+			prec = tmp >> tc_precexp;
+			if (TIMESEL(&sbt, tmp))
+				sbt += tc_tick_sbt;
+		}
 		sbt += tmp;
 		error = tsleep_sbt(&nanowait[curcpu], PWAIT | PCATCH, "nanslp",
 		    sbt, prec, C_ABSOLUTE);
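
For anyone who wants to observe the effect of this change from userland, the
following is a minimal sketch, not part of the commit. It sleeps until an
absolute deadline with clock_nanosleep(2) and reports how late the wakeup
was. The helper names ts_to_ns() and sleep_overshoot_ns() and the
DEMO_NS_PER_SEC constant are made up for illustration; only clock_gettime(2)
and clock_nanosleep(2) are real interfaces. It makes no claim about what
numbers a given machine will produce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NS_PER_SEC 1000000000L	/* hypothetical name, demo only */

/* Convert a timespec to a single nanosecond count. */
static int64_t
ts_to_ns(const struct timespec *ts)
{
	return ((int64_t)ts->tv_sec * DEMO_NS_PER_SEC + ts->tv_nsec);
}

/*
 * Sleep for request_ns nanoseconds on clock_id and return how many
 * nanoseconds past the deadline the thread woke up, or -1 on error.
 */
static int64_t
sleep_overshoot_ns(clockid_t clock_id, long request_ns)
{
	struct timespec deadline, now;
	int error;

	assert(request_ns >= 0);
	if (clock_gettime(clock_id, &deadline) != 0)
		return (-1);
	deadline.tv_sec += request_ns / DEMO_NS_PER_SEC;
	deadline.tv_nsec += request_ns % DEMO_NS_PER_SEC;
	if (deadline.tv_nsec >= DEMO_NS_PER_SEC) {
		deadline.tv_nsec -= DEMO_NS_PER_SEC;
		deadline.tv_sec++;
	}
	/*
	 * clock_nanosleep() returns the error number directly rather than
	 * setting errno.  With an absolute deadline, retrying after a
	 * signal does not lengthen the total sleep.
	 */
	do {
		error = clock_nanosleep(clock_id, TIMER_ABSTIME, &deadline,
		    NULL);
	} while (error == EINTR);
	if (error != 0)
		return (-1);
	if (clock_gettime(clock_id, &now) != 0)
		return (-1);
	return (ts_to_ns(&now) - ts_to_ns(&deadline));
}
```

Calling sleep_overshoot_ns(CLOCK_MONOTONIC, 10000000L) repeatedly, once with
kern.timecounter.nanosleep_precise at its default and once with it set to a
false value, should make the difference described in the commit message
visible, assuming the system is otherwise idle enough that scheduling noise
does not dominate.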