calcru: runtime went backwards, RELENG_6, SMP
Matthew Dillon
dillon at apollo.backplane.com
Wed Jun 6 21:27:39 UTC 2007
:IV> > Upd: on GENERIC/amd64 kernel I got the same errors.
:IV>
:IV> Do you perhaps run with TSC timecounter? (that's the only cause I've noticed
:IV> that can generate this message).
:
:Nope:
:
:marck at ct-new:~> sysctl kern.timecounter
:kern.timecounter.tick: 1
:kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-1000000)
:kern.timecounter.hardware: ACPI-fast
:...
kgdb your live kernel and 'print cpu_ticks' to see what the CPU ticker
is actually pointing at. It might not be the timecounter; it could
still be the TSC.
The TSC isn't synchronized between the cores on an SMP box, not even
on multi-core parts. It can't be used to calculate delta times
for any thread that might migrate between CPUs.
Not only will the absolute offsets differ between CPUs, but the
frequencies will also differ slightly (at least on SMP multi-core parts),
so you get frequency drift too.
There is also possibly an issue with tc_cpu_ticks(), which seems to
use a single static 64-bit variable to handle rollover instead of
a per-CPU variable. I don't see how that could possibly be MP safe,
especially if the timecounter is not synchronized between CPUs and
causes multiple rollover events.
In fact, I can *barely* use the TSC on DragonFly for KTR logging, and
even then I have to have some kernel threads sitting there doing nothing
but measuring the drift between the CPUs so that the TSC values can be
corrected when information is logged... and even with all of that I
can't get them synchronized any closer than around 500ns from each
other.
I'd recommend that FreeBSD do what we did years ago with calcru: stop
trying to calculate the time down to the nanosecond and just do it
statistically. It works just fine and takes the whole mess out of
the critical path.
-Matt
More information about the freebsd-stable mailing list