powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Michael Tuexen
tuexen at fh-muenster.de
Tue Apr 2 22:27:45 UTC 2019
> On 24. Mar 2019, at 12:01, Konstantin Belousov <kostikbel at gmail.com> wrote:
>
> On Sat, Mar 09, 2019 at 06:00:14PM +1100, Bruce Evans wrote:
>> I more strongly disclike (sic) the more complete merge. The central APIs
>> have even more parameters and reduced type safety to describe objects as
>> (offset, size) pairs.
> I changed the patch to be type-safe. Now I like it even more. It provides
> 1. internal
> 2. concise
> 3. type-safe
> API to fetch data from timehands. The implementation needs to be read
> only once.
Hi,
I'm a bit lost... I think this started as an effort to fix a problem on G5 PowerMacs.
Do you think this patch solves the problem? Should it be tested?
Or is this still work in progress, or a general improvement not necessarily
fixing the problem on G5 PowerMacs?
Best regards
Michael
>
>> Small delicate loops are ideal for duplicating. They are easier to
>> understand individually and short enough to compare without using diff
>> to see gratuitous and substantive differences. Multiple instances are
>> only hard to write and maintain. Since these multiple instances are
>> already written, they are only harder to maintain.
> This is a good argument for keeping bintime_off and getthmember unmerged
> (there are two small but delicate loops). The API is internal, so it
> only matters for the maintainer, which means that the absence of duplication
> is important. Moreover, all these arguments clearly explain why there should
> not be twenty similar loops scattered over the source.
>
>>
>>>> XX void
>>>> XX binuptime(struct bintime *bt)
>>>> XX {
>>>> XX @@ -361,7 +383,7 @@
>>>> XX th = timehands;
>>>> XX gen = atomic_load_acq_int(&th->th_generation);
>>>> XX *bt = th->th_offset;
>>>> XX - bintime_addx(bt, th->th_scale * tc_delta(th));
>>>> XX + bintime_adddelta(bt, th);
>>>> XX atomic_thread_fence_acq();
>>>> XX } while (gen == 0 || gen != th->th_generation);
>>>> XX }
>>>>
>>>> This is the kind of non-churning change that I like.
>>> Ok. I made all cases where timehands are read more uniform by
>>> moving calculations after the generation loop. This makes the
>>> atomic part of the functions easier to see, and the loop body has a lower
>>> chance of hitting a generation reset.
>>
>> I think this change is slightly worse:
>> - it increases register pressure. 'scale' and 'delta' must be read in
>> almost program order before the loop exit test. The above order uses
>> them and stores the results to memory, so more registers are free for
>> the exit test. i386 certainly runs out of registers. IIRC, i386 now
>> spills 'gen'. It would have to spill something to load 'gen' or 'th'
>> for the test.
> Which does not matter on any modern architecture anyway.
>
>> - it enlarges the window between reading 'scale' and 'delta' and the
>> caller seeing the results. Preemption in this window gives results
>> that may be far in the past.
> My opinion is that quickly exiting the loop and avoiding a retry is more
> important (for performance) than creating an impression that we protect
> against preemption. If preemption matters to the caller, then
> the calling site must take measures such as disabling interrupts and
> re-checking the time after the operation. Preemption can occur after the
> loop exit with the same consequences.
>
>> The 'get' name is another problem. I would like to remove all the get*time
>> functions and not add new names starting with 'get'. The library
>> implementation already doesn't bother optimizing the get*time functions,
>> but always uses the hardware timecounter.
>>
>> getfoo() is a more natural name than foo_get() for the action of getting
>> foo, but the latter is better for consistency, especially in code that
>> puts the subsystem name first in nearby code.
>>
>> The get*time functions would be better if they were more like
>> time_second. Note that time_second is racy if time_t is too large
>> for the arch, so that accesses to it are not atomic, as happens on
>> 32-bit arches with premature 64-bit time_t. However, in this 32/64
>> case, the race is only run every 136 years, with the next event
>> scheduled in 2038, so this race is even less important now than other
>> events scheduled in 2038. Bintimes are 96 or 128 bits, so directly
>> copying a global like time_second for them would race every 1/2**32
>> second on 32-bit arches or every 1 second on 64-bit arches. Most of
>> the loops on the generation count are for fixing these races, but
>> perhaps a simpler method would work. On 64-bit arches with atomic
>> 64 accesses on 32-bit boundaries, the following would work:
>> - set the lower 32 bits of the fraction to 0, or ignore them
>> - load the higher 32 bits of the fraction and the lower 32 bits of the
>> seconds
>> - race once every 136 years starting in 2038 reading the higher 32 bits
>> of the seconds non-atomically.
>> - alternatively, break instead of racing in 2038 by setting the higher
>> 32 bits to 0. This is the same as using sbintimes instead of bintimes.
>> - drop a few more lower bits by storing a right-shifted value. Right
>> shifting by just 1 gives a race frequency of once per 272 years, with
>> the next one in 2006.
> It would make sense if the functions were written from scratch, but since
> we already have the generation counts, it is not obvious that such a change
> is useful.
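Bruce's scheme above, publishing the high 32 bits of the fraction together
with the low 32 bits of the seconds so that one atomic 64-bit load replaces
the generation loop, could be sketched as follows. This is a toy illustration,
not kernel code; the names coarse_pack/coarse_unpack are invented here:

```c
#include <stdint.h>

/*
 * Pack the low 32 bits of the seconds and the high 32 bits of the
 * fraction into one 64-bit word.  On a 64-bit arch a plain aligned
 * load of this word is atomic, so a reader needs no retry loop; the
 * cost is losing the low fraction bits and the high seconds bits.
 */
static inline uint64_t
coarse_pack(uint64_t sec, uint64_t frac)
{
	return ((sec & 0xffffffffu) << 32) | (frac >> 32);
}

static inline void
coarse_unpack(uint64_t w, uint64_t *sec, uint64_t *frac)
{
	*sec = w >> 32;			/* only 32 bits of seconds survive */
	*frac = (w & 0xffffffffu) << 32; /* low 32 fraction bits dropped */
}
```

The truncation of the seconds to 32 bits is exactly the once-per-136-years
race (next event 2038) described above.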
>
> But if we decide to go that route (later), my current patch requires
> exactly one location, getthmember(), to experiment with and to
> change afterward. So you effectively made yet another, and perhaps the most
> convincing, argument for me.
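For readers unfamiliar with the pattern under discussion, the
generation-count (seqlock-style) read loop that all of these functions share
can be sketched in user space as follows. This is a single-threaded toy with
invented names; the kernel uses atomic_load_acq_int() and an acquire fence
where this sketch uses plain loads:

```c
#include <stdint.h>

/* Toy timehands with only the fields needed for the sketch. */
struct toy_timehands {
	uint64_t	th_scale;
	uint64_t	th_offset_sec;
	unsigned	th_generation;	/* 0 means "being updated" */
};

static struct toy_timehands toy_th = {
	.th_scale = 42, .th_offset_sec = 100, .th_generation = 1,
};

/*
 * Seqlock-style reader: snapshot the fields, then retry if the writer
 * zeroed or bumped the generation while we were reading.
 */
static void
toy_read(uint64_t *scale, uint64_t *sec)
{
	unsigned gen;

	do {
		gen = toy_th.th_generation;
		*scale = toy_th.th_scale;
		*sec = toy_th.th_offset_sec;
	} while (gen == 0 || gen != toy_th.th_generation);
}
```

Deduplicating this loop into one place (getthmember) is precisely what makes
later experiments, such as replacing it with an atomic-load scheme, cheap.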
>
>>
>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..8d12847f2cd 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> @@ -200,20 +202,56 @@ tc_delta(struct timehands *th)
>>> * the comment in <sys/time.h> for a description of these 12 functions.
>>> */
>>>
>>> -#ifdef FFCLOCK
>>> -void
>>> -fbclock_binuptime(struct bintime *bt)
>>> +static __inline void
>>> +bintime_off(struct bintime *bt, u_int off)
>>> {
>>> struct timehands *th;
>>> - unsigned int gen;
>>> + struct bintime *btp;
>>> + uint64_t scale, x;
>>> + u_int delta, gen, large_delta;
>>>
>>> do {
>>> th = timehands;
>>> gen = atomic_load_acq_int(&th->th_generation);
>>> - *bt = th->th_offset;
>>> - bintime_addx(bt, th->th_scale * tc_delta(th));
>>
>> You didn't fully obfuscate this by combining this function with
>> getthmember() so as to deduplicate the loop.
>>
>>> + btp = (struct bintime *)((vm_offset_t)th + off);
>>
>> Ugly conversion to share code. This is technically incorrect. Improving
>> the casts gives:
>>
>> btp = (void *)(uintptr_t)((uintptr_t)(void *)th + off);
>>
>> but this assumes that arithmetic on the intermediate integer does what
>> is expected. uintptr_t is only guaranteed to work when the intermediate
>> representation held in it is not adjusted.
> vm_offset_t has the semantics needed for the arithmetic. It is
> better than uintptr_t for the kernel, where we know that all object pointers
> are compatible with vm_offset_t (AKA the flat tag-less memory model).
>
>>
>> Fixing the API gives
>>
>> static __inline void
>> bintime_off(struct bintime *btp, struct bintime *base_btp)
>>
>> where base_btp is &th->th_bintime or &th->th_offset.
>>
>> (th_offset and th_bintime are badly named. th_offset is really a base
>> time and the offset is tc_delta(). th_bintime is also a base time.
>> It is the same as th_offset with another actual offset (the difference
>> between UTC and local time) already added to it as an optimization. In
>> old versions, th_bintime didn't exist, but the related struct members
>> th_nanotime and th_microtime existed, since these benefit more from
>> not converting on every call.)
> How could it be &th->th_offset, when th is calculated inside the call?
> But I did modify the API in this spirit, indeed. It takes the member
> name directly as an argument.
>
>>
>> My old version even documents the struct members, while -current still
>> has no comments. The comments were lost to staticization. My version
>> mostly adds "duh" to the banal comments after recovering them:
>>
>> XX /*
>> XX * XXX rotted comment cloned from <sys/timetc.h>.
>> XX *
>> XX * th_counter is undocumented (duh).
>> XX *
>> XX * th_adjustment [PPM << 16] which means that the smallest unit of correction
>> XX * you can apply amounts to 481.5 usec/year.
>> XX *
>> XX * th_scale is undocumented (duh).
>> XX *
>> XX * th_offset_count is the contents of the counter which corresponds to the
>> XX * rest of the offset_* values.
>> XX *
>> XX * th_offset is undocumented (duh).
>> XX *
>> XX * th_microtime is undocumented (duh).
>> XX *
>> XX * th_nanotime is undocumented (duh).
>> XX *
>> XX * XXX especially massive bitrot here. "three" is now "many"...
>> XX * Each timecounter must supply an array of three timecounters. This is needed
>> XX * to guarantee atomicity in the code. Index zero is used to transport
>> XX * modifications, for instance done with sysctl, into the timecounter being
>> XX * used in a safe way. Such changes may be adopted with a delay of up to 1/HZ.
>> XX * Index one and two are used alternately for the actual timekeeping.
>> XX *
>> XX * th_generation is undocumented (duh).
>> XX *
>> XX * th_next is undocumented (duh).
>> XX */
>>
>>> + *bt = *btp;
>>> + scale = th->th_scale;
>>> + delta = tc_delta(th);
>>> + large_delta = th->th_large_delta;
>>
>> I had forgotten that th_scale is so volatile (it may be adjusted on
>> every windup). th_large_delta is equally volatile. So moving the
>> calculation outside of the loop gives even more register pressure
>> than I noticed above.
>>
>>> atomic_thread_fence_acq();
>>> } while (gen == 0 || gen != th->th_generation);
>>> +
>>> + if (__predict_false(delta >= large_delta)) {
>>> + /* Avoid overflow for scale * delta. */
>>> + x = (scale >> 32) * delta;
>>> + bt->sec += x >> 32;
>>> + bintime_addx(bt, x << 32);
>>> + bintime_addx(bt, (scale & 0xffffffff) * delta);
>>> + } else {
>>> + bintime_addx(bt, scale * delta);
>>> + }
>>> +}
>>> +
>>> +static __inline void
>>> +getthmember(void *out, size_t out_size, u_int off)
>>> +{
>>> + struct timehands *th;
>>> + u_int gen;
>>> +
>>> + do {
>>> + th = timehands;
>>> + gen = atomic_load_acq_int(&th->th_generation);
>>> + memcpy(out, (char *)th + off, out_size);
>>
>> This isn't so ugly or technically incorrect. Now the object is generic,
>> so the reference to it should be passed as (void *objp, size_t objsize)
>> instead of the type-safe (struct bintime *base_btp).
> _Generic is what gave me a hint how to make the implementation type-safe.
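The _Generic trick can be seen in isolation in the following standalone
sketch, modeled on the GETTHBINTIME macro below but using invented toy type
and member names rather than the real kernel structures:

```c
#include <stddef.h>

/* Toy stand-ins for struct bintime and struct timehands. */
struct bintime_s { long long sec; unsigned long long frac; };
struct th_s {
	struct bintime_s	th_offset;
	unsigned		th_offset_count;
};

/*
 * Compile-time check in the style of GETTHBINTIME: the named member
 * must have type struct bintime_s, or compilation fails.  _Generic
 * selects 1 only when the member's type matches, and _Static_assert
 * turns a mismatch into a build error rather than a runtime bug.
 */
#define CHECK_BINTIME_MEMBER(member)					\
	_Static_assert(_Generic(((struct th_s *)NULL)->member,		\
	    struct bintime_s: 1, default: 0) == 1,			\
	    "member is not a struct bintime_s")

CHECK_BINTIME_MEMBER(th_offset);	/* compiles */
/* CHECK_BINTIME_MEMBER(th_offset_count); would fail to compile */
```

This is how the patch keeps the generic (offset, size) plumbing internal
while callers stay fully type-checked.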
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..4e94f762026 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -72,6 +72,7 @@ struct timehands {
> struct timecounter *th_counter;
> int64_t th_adjustment;
> uint64_t th_scale;
> + u_int th_large_delta;
> u_int th_offset_count;
> struct bintime th_offset;
> struct bintime th_bintime;
> @@ -90,6 +91,7 @@ static struct timehands th1 = {
> static struct timehands th0 = {
> .th_counter = &dummy_timecounter,
> .th_scale = (uint64_t)-1 / 1000000,
> + .th_large_delta = 1000000,
> .th_offset = { .sec = 1 },
> .th_generation = 1,
> .th_next = &th1
> @@ -200,20 +202,72 @@ tc_delta(struct timehands *th)
> * the comment in <sys/time.h> for a description of these 12 functions.
> */
>
> -#ifdef FFCLOCK
> -void
> -fbclock_binuptime(struct bintime *bt)
> +static __inline void
> +bintime_off(struct bintime *bt, u_int off)
> {
> struct timehands *th;
> - unsigned int gen;
> + struct bintime *btp;
> + uint64_t scale, x;
> + u_int delta, gen, large_delta;
>
> do {
> th = timehands;
> gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_offset;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> + btp = (struct bintime *)((vm_offset_t)th + off);
> + *bt = *btp;
> + scale = th->th_scale;
> + delta = tc_delta(th);
> + large_delta = th->th_large_delta;
> atomic_thread_fence_acq();
> } while (gen == 0 || gen != th->th_generation);
> +
> + if (__predict_false(delta >= large_delta)) {
> + /* Avoid overflow for scale * delta. */
> + x = (scale >> 32) * delta;
> + bt->sec += x >> 32;
> + bintime_addx(bt, x << 32);
> + bintime_addx(bt, (scale & 0xffffffff) * delta);
> + } else {
> + bintime_addx(bt, scale * delta);
> + }
> +}
> +#define GETTHBINTIME(dst, member) \
> +do { \
> + _Static_assert(_Generic(((struct timehands *)NULL)->member, \
> + struct bintime: 1, default: 0) == 1, \
> + "struct timehands member is not of struct bintime type"); \
> + bintime_off(dst, __offsetof(struct timehands, member)); \
> +} while (0)
> +
> +static __inline void
> +getthmember(void *out, size_t out_size, u_int off)
> +{
> + struct timehands *th;
> + u_int gen;
> +
> + do {
> + th = timehands;
> + gen = atomic_load_acq_int(&th->th_generation);
> + memcpy(out, (char *)th + off, out_size);
> + atomic_thread_fence_acq();
> + } while (gen == 0 || gen != th->th_generation);
> +}
> +#define GETTHMEMBER(dst, member) \
> +do { \
> + _Static_assert(_Generic(*dst, \
> + __typeof(((struct timehands *)NULL)->member): 1, \
> + default: 0) == 1, \
> + "*dst and struct timehands member have different types"); \
> + getthmember(dst, sizeof(*dst), __offsetof(struct timehands, \
> + member)); \
> +} while (0)
> +
> +#ifdef FFCLOCK
> +void
> +fbclock_binuptime(struct bintime *bt)
> +{
> +
> + GETTHBINTIME(bt, th_offset);
> }
>
> void
> @@ -237,16 +291,8 @@ fbclock_microuptime(struct timeval *tvp)
> void
> fbclock_bintime(struct bintime *bt)
> {
> - struct timehands *th;
> - unsigned int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_bintime;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHBINTIME(bt, th_bintime);
> }
>
> void
> @@ -270,100 +316,55 @@ fbclock_microtime(struct timeval *tvp)
> void
> fbclock_getbinuptime(struct bintime *bt)
> {
> - struct timehands *th;
> - unsigned int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_offset;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(bt, th_offset);
> }
>
> void
> fbclock_getnanouptime(struct timespec *tsp)
> {
> - struct timehands *th;
> - unsigned int gen;
> + struct bintime bt;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - bintime2timespec(&th->th_offset, tsp);
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(&bt, th_offset);
> + bintime2timespec(&bt, tsp);
> }
>
> void
> fbclock_getmicrouptime(struct timeval *tvp)
> {
> - struct timehands *th;
> - unsigned int gen;
> + struct bintime bt;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - bintime2timeval(&th->th_offset, tvp);
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(&bt, th_offset);
> + bintime2timeval(&bt, tvp);
> }
>
> void
> fbclock_getbintime(struct bintime *bt)
> {
> - struct timehands *th;
> - unsigned int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_bintime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(bt, th_bintime);
> }
>
> void
> fbclock_getnanotime(struct timespec *tsp)
> {
> - struct timehands *th;
> - unsigned int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *tsp = th->th_nanotime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(tsp, th_nanotime);
> }
>
> void
> fbclock_getmicrotime(struct timeval *tvp)
> {
> - struct timehands *th;
> - unsigned int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *tvp = th->th_microtime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(tvp, th_microtime);
> }
> #else /* !FFCLOCK */
> +
> void
> binuptime(struct bintime *bt)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_offset;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHBINTIME(bt, th_offset);
> }
>
> void
> @@ -387,16 +388,8 @@ microuptime(struct timeval *tvp)
> void
> bintime(struct bintime *bt)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_bintime;
> - bintime_addx(bt, th->th_scale * tc_delta(th));
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHBINTIME(bt, th_bintime);
> }
>
> void
> @@ -420,85 +413,47 @@ microtime(struct timeval *tvp)
> void
> getbinuptime(struct bintime *bt)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_offset;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(bt, th_offset);
> }
>
> void
> getnanouptime(struct timespec *tsp)
> {
> - struct timehands *th;
> - u_int gen;
> + struct bintime bt;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - bintime2timespec(&th->th_offset, tsp);
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(&bt, th_offset);
> + bintime2timespec(&bt, tsp);
> }
>
> void
> getmicrouptime(struct timeval *tvp)
> {
> - struct timehands *th;
> - u_int gen;
> + struct bintime bt;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - bintime2timeval(&th->th_offset, tvp);
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(&bt, th_offset);
> + bintime2timeval(&bt, tvp);
> }
>
> void
> getbintime(struct bintime *bt)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *bt = th->th_bintime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(bt, th_bintime);
> }
>
> void
> getnanotime(struct timespec *tsp)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *tsp = th->th_nanotime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(tsp, th_nanotime);
> }
>
> void
> getmicrotime(struct timeval *tvp)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *tvp = th->th_microtime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(tvp, th_microtime);
> }
> #endif /* FFCLOCK */
>
> @@ -514,15 +469,8 @@ getboottime(struct timeval *boottime)
> void
> getboottimebin(struct bintime *boottimebin)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *boottimebin = th->th_boottime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(boottimebin, th_boottime);
> }
>
> #ifdef FFCLOCK
> @@ -1038,15 +986,8 @@ getmicrotime(struct timeval *tvp)
> void
> dtrace_getnanotime(struct timespec *tsp)
> {
> - struct timehands *th;
> - u_int gen;
>
> - do {
> - th = timehands;
> - gen = atomic_load_acq_int(&th->th_generation);
> - *tsp = th->th_nanotime;
> - atomic_thread_fence_acq();
> - } while (gen == 0 || gen != th->th_generation);
> + GETTHMEMBER(tsp, th_nanotime);
> }
>
> /*
> @@ -1464,6 +1405,7 @@ tc_windup(struct bintime *new_boottimebin)
> scale += (th->th_adjustment / 1024) * 2199;
> scale /= th->th_counter->tc_frequency;
> th->th_scale = scale * 2;
> + th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX);
>
> /*
> * Now that the struct timehands is again consistent, set the new
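The overflow-avoiding multiply at the heart of the patch can be exercised in
isolation. The sketch below extracts the slow path of bintime_off() as a
standalone function (scale_delta is an invented name); the explicit carry
check stands in for what bintime_addx() does in the kernel:

```c
#include <stdint.h>

/*
 * Compute the 128-bit product scale * delta as (seconds, 64-bit
 * fraction) using 32-bit halves, as bintime_off() does when
 * delta >= th_large_delta and the plain 64-bit product would overflow.
 */
static void
scale_delta(uint64_t scale, uint32_t delta, uint64_t *sec, uint64_t *frac)
{
	uint64_t x, lo;

	x = (scale >> 32) * delta;		/* high half: cannot overflow */
	*sec = x >> 32;				/* carries into whole seconds */
	*frac = x << 32;
	lo = (scale & 0xffffffffu) * delta;	/* low half: fits in 64 bits */
	*frac += lo;
	if (*frac < lo)				/* wrapped: carry into seconds */
		(*sec)++;
}
```

The fast path stays a single 64-bit multiply because tc_windup() sets
th_large_delta = MIN(2^63 / scale, UINT_MAX), so any delta below that
threshold guarantees scale * delta fits in 64 bits.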
> _______________________________________________
> freebsd-ppc at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-ppc
> To unsubscribe, send any mail to "freebsd-ppc-unsubscribe at freebsd.org"