powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]

Mark Millard marklmi at yahoo.com
Fri Apr 5 08:40:19 UTC 2019



On 2019-Apr-4, at 21:38, Bruce Evans <brde at optusnet.com.au> wrote:

> On Thu, 4 Apr 2019, Mark Millard wrote:
> 
>> On 2019-Apr-3, at 08:47, Bruce Evans <brde at optusnet.com.au> wrote:
>>> . . .
>>> 
>>> I noticed (or better realized) a general problem with multiple
>>> timehands.  ntpd can slew the clock at up to 500 ppm, and at least an
>>> old version of it uses a rate of 50 ppm to fix up fairly small drifts
>>> in the milliseconds range.  500 ppm is enormous in CPU cycles -- it is
>>> 500 thousand nsec or 2 million cycles at 4GHz.  Winding up the timecounter
>>> every 1 msec reduces this to only 2000 cycles.
>>> 
>>> More details of ordering and timing for 1 thread:
>>> ...
>> Thanks for the description of an example way that sbinuptime and
>> the like might not give weakly increasing results.
>> 
>> Unfortunately, all the multi-socket contexts that I sometimes have
>> access to are old PowerMacs. And, currently, the only such context
>> is the G5 with 2 sockets, 2 cores per socket (powerpc64). So I've
>> not been able to set up other types of examples to see if problems
>> repeat.
>> 
>> I do not have access to a single-socket powerpc64 for contrast in
>> that direction.
> 
> Testing 1 socket is time-consuming enough.  Do these old systems
> use the equivalent of an x86 TSC for the timecounter?  With multiple
> sockets, it isn't clear how even a hardware timer independent of the
> CPUs can be distributed so as to appear to be monotonic on all cores.

"The DEC frequency is based on the same implementation-dependent
frequency that drives the time base." The frequency may well be
fixed by the PowerMac G5 model implementation but is not fixed
by the powerpc64 architecture.

The Time Base Register (TBR) in a powerpc64 core (a cpu in FreeBSD terms)
increments at 33,333,333 Hz (nominal) and is 64 bits wide. Its value
can be set (mttb instruction), and the boot sequence in FreeBSD does
attempt to adjust it as each FreeBSD CPU is brought up/started. mftb is
used to read the 64-bit value. FreeBSD masks it down to 32 bits for
use by the timecounter.

(Is that description sufficient for what you were after? I've never
seen documentation of how the 33,333,333 Hz is produced.)

Since FreeBSD supports multi-socket systems, what are its requirements
on the hardware context for sbinuptime and the like to work correctly?
Is FreeBSD then supposed to make sbinuptime and the like appear to be
weakly increasing, even as threads migrate between CPUs (cores,
hw-threads)?

>> One oddity is that the eventtimer's decrementer and timecounter
>> may change (nearly) together: both change at 33,333,333 Hz, as if
>> they are tied to the same clock (at least on one socket).
> 
> I think this is from a normal hardware implementation.  On all of
> my x86 systems with a TSC, the TSC frequency is an exact fractional
> multiple of the i8254, the ACPI timer (if present) and the HPET (if
> present).  Only the RTC has an independent frequency.  The fraction is
> changed by changing the nominal TSC frequency in the BIOS, but is not
> changed by temperature variations.  This must be because most clocks are
> derived from a common clock using a PLL.  I use this to calibrate all
> clocks (except the RTC) by calibrating only 1.

I'm not aware of OpenFirmware having any control over the
TBR's update frequency. I have no evidence of any
temperature-dependent variability.

>> In case it helps with knowing how analogous your investigations
>> are to the original problem context, I report the following.
>> 
>> If you do not care for such information, stop reading here.
>> 
>> # grep ntpd /etc/rc.conf
>> ntpd_enable="YES"
>> ntpd_sync_on_start="YES"
>> 
>> # sysctl kern.eventtimer
>> kern.eventtimer.periodic: 0
>> kern.eventtimer.timer: decrementer
>> kern.eventtimer.idletick: 0
>> kern.eventtimer.singlemul: 2
>> kern.eventtimer.choice: decrementer(1000)
>> kern.eventtimer.et.decrementer.quality: 1000
>> kern.eventtimer.et.decrementer.frequency: 33333333
>> kern.eventtimer.et.decrementer.flags: 7
>> 
>> # vmstat -ai | grep decrementer
>> cpu0:decrementer                 4451007         35
>> cpu3:decrementer                 1466010         11
>> cpu2:decrementer                 1481722         12
>> cpu1:decrementer                 1478618         12
> 
> Powerpc seems to have a PLL in software too.  Event timers don't need to
> be very precise or accurate.

On powerpc64, the value to count down from is written to the 32-bit
DEC register via the mtdec instruction. I'm not aware of any way
to change the frequency at which the countdown occurs
on the old PowerMac G5s. (mtdec is a form of mtspr SPRNUM,rs.)

[Reading the DEC's value uses a privileged instruction
on powerpc64 vs. a non-privileged one on POWER: different
instruction encodings but the same mnemonic, unless mfspr is
used directly as the mnemonic. The encodings differ in just one
bit position of the SPRNUM field.]

>> (That last is from a basically-idle timeframe.)
>> 
>> # sysctl -a | grep hz
>> kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
>> kern.hz: 1000
> 
> x86 is similar.  I think synchronization from using PLLs still gives
> unfair scheduling, but with multiple CPUs and often more cycles than can
> be used, no one cares about accidental synchronization or bothers to steal
> cycles using intentional synchronization.

Good to know.

>> # sysctl kern.timecounter
>> kern.timecounter.fast_gettime: 1
>> kern.timecounter.tick: 1
>> kern.timecounter.choice: timebase(0) dummy(-1000000)
>> kern.timecounter.hardware: timebase
>> kern.timecounter.alloweddeviation: 5
>> kern.timecounter.stepwarnings: 0
>> kern.timecounter.tc.timebase.quality: 0
>> kern.timecounter.tc.timebase.frequency: 33333333
>> kern.timecounter.tc.timebase.counter: 1144662532
>> kern.timecounter.tc.timebase.mask: 4294967295
>> 
>> (The actual Time Base Register (tbr) is 64 bits
>> and FreeBSD truncates it down.)
>> 
>> # sysctl -a | grep 'cpu.*freq'
>> device	cpufreq
>> debug.cpufreq.verbose: 0
>> debug.cpufreq.lowest: 0
>> dev.cpufreq.0.%parent: cpu3
>> dev.cpufreq.0.%pnpinfo:
>> dev.cpufreq.0.%location:
>> dev.cpufreq.0.%driver: cpufreq
>> dev.cpufreq.0.%desc:
>> dev.cpufreq.%parent:
>> dev.cpu.3.freq_levels: 2500/-1 1250/-1
>> dev.cpu.3.freq: 2500
>> 
>> So 2500 MHz / 33333333 Hz is very near 75 clock periods per
>> timebase counter value.
> 
> Looks like it is exactly 75.  Fractions are especially easy to guess and
> verify when they are integral.

I'm not sure what happens to the DEC and TBR update frequencies
if the 2500 MHz CPU frequency is dropped to 1250 MHz.
But as I understand my context, 1250 MHz is never used,
only 2500 MHz.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



More information about the freebsd-ppc mailing list