From nobody Tue Nov 30 13:34:19 2021 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id A2E8918A893C; Tue, 30 Nov 2021 13:34:19 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4J3NVM2yXxz3qkY; Tue, 30 Nov 2021 13:34:19 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 459A8172AE; Tue, 30 Nov 2021 13:34:19 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 1AUDYJqV014079; Tue, 30 Nov 2021 13:34:19 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 1AUDYJEU014078; Tue, 30 Nov 2021 13:34:19 GMT (envelope-from git) Date: Tue, 30 Nov 2021 13:34:19 GMT Message-Id: <202111301334.1AUDYJEU014078@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Andriy Gapon Subject: git: 3d9d64aa1846 - main - kern_tc: unify timecounter to bintime delta conversion List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: avg X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: 3d9d64aa1846217eac9229f8cba5cb6646a688b7 Auto-Submitted: auto-generated ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1638279259; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=yOSP+9nsFJrMRNvRA/rt5Nt7HGwZFkdAUkw23zBCQgA=; b=UaBTbroxs6SzLaxhnsPX+3phwgX+e3v46t/S0Q4CMIv7A95edQXyJrTPdHy1xEw4AlQBK8 VN03Ca31QiBO6CXMICU4tC0ZlN2+sL1m6Nk21afiJyKBcbW3GKij2RKrTTqzpv+sl/IBIr +/V1OLervA1ZiEM0fORMOYWiWwHU2YMNRVpa6rYXEP7rpxVzWJbf+Huf7GNlVJysNHZSt8 eRkuldqNSU7t6E9AzC5NT5JNGIN3UNIOfRmcUfH9DFUkA1IsX1L9Vr6cWrbETySrPA/ZHG LsRJf6ALCc+DkH4NiT3kVmqzW5q5ct69i7LCw2HuhFbwFm3t6gmrVRiFfpo0aA== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1638279259; a=rsa-sha256; cv=none; b=kNbtzcKUvTtS6+gsC+O1352LtEc02apMwtDEl4wdqSDFcHfZnNE+Kk5p47Xd8drAOiRIg6 GxDM0dXH4UiD2eRs/EPIHS21mpTGYBo7VjYHWGyGrCV+dywVvtcy4Nyg5qSqwi0YWAKnke IC0CItUKJ/SmiN6RjZvnVPqkOz9w9908Yv9MVuw7/Qt5fknNlGHecoM72548QpgmlmjB4c +sPOhENNvsIUZXKv2slKZRAiXQ87OCemloZaA0E2jjf+KXbZaOaGwZBLTVAI7sHupxfOAd rSG6Ba05rz4ne/W1UtdFd3jlq4kStP/o780ruspZmJHD0t7WaVby8lE86Pt0rg== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N The branch main has been updated by avg: URL: https://cgit.FreeBSD.org/src/commit/?id=3d9d64aa1846217eac9229f8cba5cb6646a688b7 commit 3d9d64aa1846217eac9229f8cba5cb6646a688b7 Author: Andriy Gapon AuthorDate: 2021-11-30 13:23:23 +0000 Commit: Andriy Gapon CommitDate: 2021-11-30 13:23:23 +0000 kern_tc: unify timecounter to bintime delta conversion There are two places where we convert from a timecounter delta to a bintime delta: tc_windup and bintime_off. Both functions use the same calculations when the timecounter delta is small. But for a large delta (greater than approximately an equivalent of 1 second) the calculations were different. Both functions use approximate calculations based on th_scale that avoid division. Both produce values slightly greater than a true value, calculated with division by tc_frequency, would be. tc_windup is slightly more accurate, so its result is closer to the true value and, thus, smaller than bintime_off result. As a consequence there can be a jump back in time when time hands are switched after a long period of time (a large delta). Just before the switch the time would be calculated with a large delta from th_offset_count in bintime_off. tc_windup does the switch using its own calculations of a new th_offset using the large delta. As explained earlier, the new th_offset may end up being less than the previously produced binuptime. So, for a period of time new binuptime values may be "back in time" comparing to values just before the switch. Such a jump must never happen. All the code assumes that the uptime is monotonically nondecreasing and some code works incorrectly when that assumption is broken. For example, we have observed sleepq_timeout() ignoring a timeout when the sbinuptime value obtained by the callout code was greater than the expiration value, but the sbinuptime obtained in sleepq_timeout() was less than it. In that case the target thread would never get woken up. The unified calculations should ensure the monotonic property of the uptime. The problem is quite rare as normally tc_windup should be called HZ times per second (typically 1000 or 100). But it may happen in VMs on very busy hypervisors where a VM's virtual CPU may not get an execution time slot for a second or more. Reviewed by: kib MFC after: 2 weeks Sponsored by: Panzura LLC --- sys/kern/kern_tc.c | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index f041c5f547e0..01b4e8656090 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -213,6 +213,23 @@ tc_delta(struct timehands *th) tc->tc_counter_mask); } +static __inline void +bintime_add_tc_delta(struct bintime *bt, uint64_t scale, + uint64_t large_delta, uint64_t delta) +{ + uint64_t x; + + if (__predict_false(delta >= large_delta)) { + /* Avoid overflow for scale * delta. */ + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +} + /* * Functions for reading the time. We have to loop until we are sure that * the timehands that we operated on was not updated under our feet. See @@ -224,7 +241,7 @@ bintime_off(struct bintime *bt, u_int off) { struct timehands *th; struct bintime *btp; - uint64_t scale, x; + uint64_t scale; u_int delta, gen, large_delta; do { @@ -238,15 +255,7 @@ bintime_off(struct bintime *bt, u_int off) atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); - if (__predict_false(delta >= large_delta)) { - /* Avoid overflow for scale * delta. */ - x = (scale >> 32) * delta; - bt->sec += x >> 32; - bintime_addx(bt, x << 32); - bintime_addx(bt, (scale & 0xffffffff) * delta); - } else { - bintime_addx(bt, scale * delta); - } + bintime_add_tc_delta(bt, scale, large_delta, delta); } #define GETTHBINTIME(dst, member) \ do { \ @@ -1392,17 +1401,8 @@ tc_windup(struct bintime *new_boottimebin) #endif th->th_offset_count += delta; th->th_offset_count &= th->th_counter->tc_counter_mask; - while (delta > th->th_counter->tc_frequency) { - /* Eat complete unadjusted seconds. */ - delta -= th->th_counter->tc_frequency; - th->th_offset.sec++; - } - if ((delta > th->th_counter->tc_frequency / 2) && - (th->th_scale * delta < ((uint64_t)1 << 63))) { - /* The product th_scale * delta just barely overflows. */ - th->th_offset.sec++; - } - bintime_addx(&th->th_offset, th->th_scale * delta); + bintime_add_tc_delta(&th->th_offset, th->th_scale, + th->th_large_delta, delta); /* * Hardware latching timecounters may not generate interrupts on