git: 3d9d64aa1846 - main - kern_tc: unify timecounter to bintime delta conversion
Date: Tue, 30 Nov 2021 13:34:19 UTC
The branch main has been updated by avg:

URL: https://cgit.FreeBSD.org/src/commit/?id=3d9d64aa1846217eac9229f8cba5cb6646a688b7

commit 3d9d64aa1846217eac9229f8cba5cb6646a688b7
Author:     Andriy Gapon <avg@FreeBSD.org>
AuthorDate: 2021-11-30 13:23:23 +0000
Commit:     Andriy Gapon <avg@FreeBSD.org>
CommitDate: 2021-11-30 13:23:23 +0000

    kern_tc: unify timecounter to bintime delta conversion

    There are two places where we convert from a timecounter delta to a
    bintime delta: tc_windup and bintime_off.  Both functions use the same
    calculation when the timecounter delta is small, but for a large delta
    (greater than roughly the equivalent of one second) the calculations
    were different.  Both functions use approximate calculations based on
    th_scale that avoid division.  Both produce values slightly greater
    than the true value that division by tc_frequency would give.
    tc_windup is slightly more accurate, so its result is closer to the
    true value and, thus, smaller than the bintime_off result.

    As a consequence, there can be a jump back in time when the time hands
    are switched after a long period of time (a large delta).  Just before
    the switch, the time would be calculated in bintime_off with a large
    delta from th_offset_count.  tc_windup does the switch using its own
    calculation of the new th_offset from the same large delta.  As
    explained above, the new th_offset may end up being less than the
    previously produced binuptime.  So, for a period of time, new
    binuptime values may be "back in time" compared to values obtained
    just before the switch.

    Such a jump must never happen.  All the code assumes that the uptime
    is monotonically nondecreasing, and some code works incorrectly when
    that assumption is broken.  For example, we have observed
    sleepq_timeout() ignoring a timeout when the sbinuptime value obtained
    by the callout code was greater than the expiration value, but the
    sbinuptime obtained in sleepq_timeout() was less than it.  In that
    case the target thread would never get woken up.

    The unified calculation should ensure the monotonic property of the
    uptime.

    The problem is quite rare, as normally tc_windup is called HZ times
    per second (typically 1000 or 100).  But it can happen in VMs on very
    busy hypervisors, where a VM's virtual CPU may not get an execution
    time slot for a second or more.

    Reviewed by:	kib
    MFC after:	2 weeks
    Sponsored by:	Panzura LLC
---
 sys/kern/kern_tc.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)
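As a quick illustration of the overflow-avoiding calculation described above, the standalone sketch below (not part of the commit; the 1 MHz frequency, the simplified struct bintime, and the test harness are invented for the example) checks that the split product used by the new bintime_add_tc_delta() in the diff below matches the exact 128-bit product scale * delta, where the bintime fraction counts units of 2^-64 seconds: (scale >> 32) * delta is expressed in 2^-32 second units, so its upper 32 bits are whole seconds and its lower 32 bits shift up into the fraction, while (scale & 0xffffffff) * delta already fits in 64 bits because both factors are below 2^32.

/*
 * Standalone sketch (hypothetical harness, not kernel code): verify that
 * the split multiplication used for large deltas equals the exact 128-bit
 * product scale * delta.  The fraction counts units of 2^-64 seconds, as
 * in the kernel's struct bintime.
 */
#include <stdint.h>
#include <stdio.h>

struct bintime {
	int64_t  sec;
	uint64_t frac;
};

/* Add a 2^-64 fraction, carrying into the seconds field on wraparound. */
static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

int
main(void)
{
	/* Roughly 2^64 / tc_frequency for an illustrative 1 MHz counter. */
	uint64_t scale = (uint64_t)-1 / 1000000;
	uint64_t delta = 3000000;		/* ~3 seconds of counter ticks */
	struct bintime bt = { 0, 0 };
	uint64_t x;

	/* The large-delta branch of the new helper. */
	x = (scale >> 32) * delta;		/* units of 2^-32 seconds */
	bt.sec += x >> 32;			/* whole seconds */
	bintime_addx(&bt, x << 32);		/* remaining fraction */
	bintime_addx(&bt, (scale & 0xffffffff) * delta);

	/* Reference: exact product via the GCC/Clang 128-bit extension. */
	__uint128_t p = (__uint128_t)scale * delta;

	printf("split: %jd sec, frac %#jx\n", (intmax_t)bt.sec, (uintmax_t)bt.frac);
	printf("exact: %jd sec, frac %#jx\n", (intmax_t)(p >> 64),
	    (uintmax_t)(uint64_t)p);
	return (0);
}

Both lines print the same seconds and fraction.  The plain scale * delta path would have wrapped here, since the product exceeds 2^64 once delta reaches roughly tc_frequency counts, i.e. about one second, which is why the helper branches on large_delta.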
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index f041c5f547e0..01b4e8656090 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -213,6 +213,23 @@ tc_delta(struct timehands *th)
 	    tc->tc_counter_mask);
 }
 
+static __inline void
+bintime_add_tc_delta(struct bintime *bt, uint64_t scale,
+    uint64_t large_delta, uint64_t delta)
+{
+	uint64_t x;
+
+	if (__predict_false(delta >= large_delta)) {
+		/* Avoid overflow for scale * delta. */
+		x = (scale >> 32) * delta;
+		bt->sec += x >> 32;
+		bintime_addx(bt, x << 32);
+		bintime_addx(bt, (scale & 0xffffffff) * delta);
+	} else {
+		bintime_addx(bt, scale * delta);
+	}
+}
+
 /*
  * Functions for reading the time.  We have to loop until we are sure that
  * the timehands that we operated on was not updated under our feet.
@@ -224,7 +241,7 @@ bintime_off(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
 	struct bintime *btp;
-	uint64_t scale, x;
+	uint64_t scale;
 	u_int delta, gen, large_delta;
 
 	do {
@@ -238,15 +255,7 @@ bintime_off(struct bintime *bt, u_int off)
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 
-	if (__predict_false(delta >= large_delta)) {
-		/* Avoid overflow for scale * delta. */
-		x = (scale >> 32) * delta;
-		bt->sec += x >> 32;
-		bintime_addx(bt, x << 32);
-		bintime_addx(bt, (scale & 0xffffffff) * delta);
-	} else {
-		bintime_addx(bt, scale * delta);
-	}
+	bintime_add_tc_delta(bt, scale, large_delta, delta);
 }
 
 #define	GETTHBINTIME(dst, member)					\
 do {									\
@@ -1392,17 +1401,8 @@ tc_windup(struct bintime *new_boottimebin)
 #endif
 	th->th_offset_count += delta;
 	th->th_offset_count &= th->th_counter->tc_counter_mask;
-	while (delta > th->th_counter->tc_frequency) {
-		/* Eat complete unadjusted seconds. */
-		delta -= th->th_counter->tc_frequency;
-		th->th_offset.sec++;
-	}
-	if ((delta > th->th_counter->tc_frequency / 2) &&
-	    (th->th_scale * delta < ((uint64_t)1 << 63))) {
-		/* The product th_scale * delta just barely overflows. */
-		th->th_offset.sec++;
-	}
-	bintime_addx(&th->th_offset, th->th_scale * delta);
+	bintime_add_tc_delta(&th->th_offset, th->th_scale,
+	    th->th_large_delta, delta);
 
 	/*
 	 * Hardware latching timecounters may not generate interrupts on
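The effect on monotonicity can be sketched the same way (again a hypothetical, standalone harness, not kernel code; the large_delta expression below is an assumption chosen only to force the large-delta branch): because the reader path and tc_windup() now share one conversion, the uptime sampled just before a timehands switch and the new th_offset installed by the switch are computed from the same (scale, delta) pair and come out bit-for-bit identical, so the switch can no longer step the uptime backwards.

/*
 * Hypothetical sketch: with one shared helper, a reader's last pre-switch
 * uptime and the th_offset built by tc_windup() for the same large delta
 * are identical, which is the monotonicity argument in the message above.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct bintime {
	int64_t  sec;
	uint64_t frac;
};

static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t u = bt->frac;

	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/* Same shape as bintime_add_tc_delta() in the diff above. */
static void
bintime_add_tc_delta(struct bintime *bt, uint64_t scale,
    uint64_t large_delta, uint64_t delta)
{
	uint64_t x;

	if (delta >= large_delta) {
		/* Avoid overflow for scale * delta. */
		x = (scale >> 32) * delta;
		bt->sec += x >> 32;
		bintime_addx(bt, x << 32);
		bintime_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		bintime_addx(bt, scale * delta);
	}
}

int
main(void)
{
	uint64_t freq = 10000000;		/* illustrative 10 MHz counter */
	uint64_t scale = (uint64_t)-1 / freq;
	/* Assumed overflow threshold, only to exercise the large-delta path. */
	uint64_t large_delta = ((uint64_t)1 << 63) / scale;
	uint64_t delta = 5 * freq + 12345;	/* ~5 s without a windup */
	struct bintime th_offset = { 100, 0 };	/* current hands' offset */

	/* Reader just before the switch: th_offset plus the converted delta. */
	struct bintime before = th_offset;
	bintime_add_tc_delta(&before, scale, large_delta, delta);

	/* tc_windup() building the next hands' offset from the same delta. */
	struct bintime after = th_offset;
	bintime_add_tc_delta(&after, scale, large_delta, delta);

	/* Identical inputs through identical code: no step backwards. */
	assert(after.sec == before.sec && after.frac == before.frac);
	printf("uptime across the switch: %jd -> %jd seconds (fractions equal)\n",
	    (intmax_t)before.sec, (intmax_t)after.sec);
	return (0);
}

Before this change, the reader side used the split product for a large delta while tc_windup() first ate whole seconds and then multiplied only the remainder, and those two approximations could disagree slightly for the same delta, which is exactly the backwards step described in the commit message.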