Semi-working patch for amd64 suspend/resume

Bruce Evans brde at
Wed Dec 3 10:33:34 PST 2008

On Tue, 2 Dec 2008, Alexander Motin wrote:

> Nate Lawson wrote:
>>> The only strange effect I have noticed was incorrect CPU time some
>>> processes got:
>>> %ps ax
>>>    12  ??  WL   280503:38,05 [intr]
>>>  1430  ??  Ss   280503:38,34 icewm
>>> But I think it is more timer driver related than resume itself.
>> If you are using the LAPIC timer (default), it won't be running properly
>> during resume.  However, this wide discrepancy seems to indicate that
>> the timer state is not being resumed properly.  What if you use the ACPI
>> timer (hw.timecounter.* I think are the sysctls)?
> As I understand, I am now using LAPIC timer for HZ generation, ACPI-fast as 
> time source and TSC as kernel DELAY() source.

CPU times are computed mainly from the cpu_ticker (the TSC on i386), and
the cpu_ticker code has always been broken if the frequency changes a lot.
I know about are:
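(1) the cpu time is computed as (total cpu_ticks) / cputick_frequency(now)
     but should be the integral over previous thread history of
         d(cpu_ticks) / cputick_frequency(t),
     i.e., the sum of (delta cpu_ticks) / cputick_frequency(t) over the
     intervals in which the thread ran.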
     The former gives a wrong value if cputick_frequency(t) is not
     constant over previous thread history, and the wrongness is very
     obvious for long-running threads like intr and idle ones if the
     TSC frequency changes significantly (e.g., by cpufreq).  cpufreq
     has a callback to reinitialize the frequency calibration, but this
     doesn't help much.  I can't find any resume method for the TSC.
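A minimal sketch of the difference described in bug (1), written as
standalone C rather than kernel code.  The struct and both function names
are hypothetical, time is kept in double-precision seconds for readability,
and the integral is approximated by a sum over stretches during which the
frequency was constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one stretch of a thread's run time during which
 * the cputick frequency was constant. */
struct tick_interval {
	uint64_t ticks;    /* cpu_ticks accumulated during the stretch */
	uint64_t freq_hz;  /* cputick frequency in effect during the stretch */
};

/* Bug (1) model: divide the total tick count by the frequency in effect
 * now, regardless of the frequency at which the ticks were counted. */
static double
cputime_now_freq(const struct tick_interval *iv, size_t n,
    uint64_t freq_now_hz)
{
	uint64_t total = 0;

	assert(freq_now_hz > 0);
	for (size_t i = 0; i < n; i++)
		total += iv[i].ticks;
	return (double)total / (double)freq_now_hz;
}

/* Discrete form of the integral: convert each stretch's ticks using the
 * frequency that applied to that stretch, then sum the times. */
static double
cputime_integrated(const struct tick_interval *iv, size_t n)
{
	double sec = 0.0;

	for (size_t i = 0; i < n; i++) {
		assert(iv[i].freq_hz > 0);
		sec += (double)iv[i].ticks / (double)iv[i].freq_hz;
	}
	return sec;
}
```

With 10 s at 2 GHz followed by 10 s at 1 GHz, the per-stretch sum gives
20 s, while dividing all 30e9 ticks by the current 1 GHz gives 30 s, or
15 s if the frequency has since gone back up to 2 GHz.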

(2) frequency _re_calibration is broken.  It never decreases the frequency.
     Thus if the frequency is transiently high, the transiently high
     calibration persists until the next reinitialization of the frequency
     calibration (or until a transiently higher frequency is seen).
     Small variations due to temperature changes thus make the frequency
     persistently slightly higher than it should be, and large variations
     due to stopping a timer or stopping or throttling the TSC can make
     the frequency persistently very wrong.  This wrongness is very
     obvious using ddb.  While in ddb, interrupt timers are stopped but
     the TSC advances.  Recalibration then gives an enormously high
     frequency (nearly 1/0 = infinity) that is sticky due to the bug.
     Dividing by this then gives all cpu times of nearly 0, modulo
     monotonicity enforcement by calcru() (which helps here -- old
     nonzero times for intr and idle remain nonzero).  The sanity checking
     in the recalibration detects remarkably few cases of insanity for
     some reason, perhaps because the timers are too in sync.
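A toy model of bug (2), assuming that recalibration counts TSC ticks
against a reference timer over a window and then keeps the larger of the
old and new estimates.  All names here are hypothetical, and the real
recalibration code and its sanity checks are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency implied by one calibration window: TSC ticks counted, divided
 * by the elapsed time (in microseconds) according to the reference timer.
 * Assumes tsc_delta < 2^64 / 10^6 so the multiplication cannot overflow. */
static uint64_t
measured_freq_hz(uint64_t tsc_delta, uint64_t ref_usec)
{
	assert(ref_usec > 0);
	return tsc_delta * 1000000 / ref_usec;
}

/* Model of the buggy rule: the estimate can only ever go up. */
static uint64_t
recalibrate_max_only(uint64_t cur_hz, uint64_t tsc_delta, uint64_t ref_usec)
{
	uint64_t m = measured_freq_hz(tsc_delta, ref_usec);

	return m > cur_hz ? m : cur_hz;
}

/* For contrast only (not a proposed fix): adopt each new measurement, so
 * one bad window causes only a transient error. */
static uint64_t
recalibrate_tracking(uint64_t cur_hz, uint64_t tsc_delta, uint64_t ref_usec)
{
	(void)cur_hz;
	return measured_freq_hz(tsc_delta, ref_usec);
}
```

Starting from 2 GHz, a window in which the TSC advanced 60 s worth of ticks
while the stopped reference timer saw only 1 ms yields 1.2e14 Hz.  Later
normal windows measuring 2 GHz never bring the max-only estimate back down,
whereas the tracking rule recovers on the next good window.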

You seem to have the opposite problem: times that are enormously high.
This would be caused by the frequency being calibrated as nearly 0,
but I can't see how this could happen.  The TSC is presumably stopped
while the system is suspended, so if the resume method is indeed
missing, the recalibration code would tend to give a frequency far too
low, but bug (2) prevents this low value from being used.  OTOH, the
resume method should recalibrate only after restarting all clocks, so
it shouldn't suffer from bug (2).  There are possible races: the resume
method has to get the calibration done before the main timer interrupt
handler does it based on bogus data.  But the handler doesn't
recalibrate on every timer interrupt, so you would have to be unlucky
to lose these races.
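A worked version of the suspend case, assuming the reference timer kept
counting through a suspend of length S inside a calibration window of
length \Delta t_ref, while the TSC (true frequency f) was stopped for that
time.  The symbols are introduced here for illustration only.

```latex
\Delta\mathrm{tsc} = f\,(\Delta t_{\mathrm{ref}} - S), \qquad
f_{\mathrm{meas}} = \frac{\Delta\mathrm{tsc}}{\Delta t_{\mathrm{ref}}}
                  = f\left(1 - \frac{S}{\Delta t_{\mathrm{ref}}}\right) < f,
\qquad
f_{\mathrm{new}} = \max(f_{\mathrm{cur}}, f_{\mathrm{meas}}) = f_{\mathrm{cur}}
\quad \text{when } f_{\mathrm{cur}} \ge f_{\mathrm{meas}}.
```

So a long suspend inside the window pulls f_meas toward 0, and the
max-only rule of bug (2) discards that value rather than installing it.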

If the frequency is transiently miscalibrated as nearly 0 and you look
at the cpu time using calcru() during this time, then bug (1) gives
enormously high times like the above (nearly 1/0) for long-running
processes.  calcru()'s monotonicity enforcement then preserves these
enormously high times almost forever.  Recalibration should eventually
fix the frequency, and since nothing much has happened to the tick
counts, bug (1) would then give normal raw times again.  However,
monotonicity enforcement keeps returning the transiently high times:
the returned times won't even increase until the normal times reach
the enormous value, or until another transient miscalibration messes
up the calculation of the raw times again.
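A simplified model of how bug (1) interacts with calcru()'s monotonicity
enforcement.  This is not calcru()'s actual code: the state struct and
function are hypothetical, and only the "never return less than last time"
rule is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process state: the last CPU time returned. */
struct cputime_state {
	double prev_sec;
};

/* Convert the raw tick count with the current frequency estimate (bug (1)
 * style), but never return a smaller value than was returned before. */
static double
cputime_monotonic(struct cputime_state *st, uint64_t total_ticks,
    uint64_t freq_hz)
{
	double raw;

	assert(freq_hz > 0);
	raw = (double)total_ticks / (double)freq_hz;
	if (raw < st->prev_sec)
		raw = st->prev_sec;	/* time "went backwards": keep old value */
	st->prev_sec = raw;
	return raw;
}
```

For 2e12 ticks at 2 GHz the result is 1000 s.  One query made while the
frequency is miscalibrated as 2 kHz returns 1e9 s, and every later query
returns that same 1e9 s, even after the tick count doubles (raw time
2000 s) under a correct 2 GHz calibration.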


More information about the freebsd-amd64 mailing list