TSC Timecounter and multi-core/SMP
Matthew Dillon
dillon at apollo.backplane.com
Fri Apr 18 19:40:59 UTC 2008
:How hard can it be?
:
:An instruction that gives a 64 bit counter, in some reasonable
:granularity that is run at the same speed for all CPUS in a system
:regardless of the speed each cpu is running..
:While nsecs would be nice even usecs might do.
:They don't even have to be in sync as long as the offset
:between them is constant (though that would be nice).
:Bonus points for being able to read it from user space. The
:hardware people don't seem to realise the importance
:of this. and keep throwing it out to gain/save a pin or to save
:some transistors for some other feature.
I think it's harder than it sounds. The technology isn't difficult,
the problem is the three requirements people seem to have for a solid
time base these days:
* Fast access time (in-instruction-stream)
* High resolution (~1ns)
* Not eat up a bunch of die area or current
What it comes down to, really, is simply the fact that you can't just
generate an independent time source at a fixed frequency, use it to
drive a counter, and then latch it into the cpu without synchronizing
it to the cpu's internal clock. Latches are highly sensitive to input
changes that occur simultaneously with the latching clock. I'd have
to research the actual gate configuration AMD and Intel use but
basically you can wind up with either a full-blown latch-up condition,
where the latch tries to drive both a 1 and a 0 (resulting in a short),
or you can create an oscillation or other indeterminate state for a
short while which can propagate onto the cpu's internal busses and
would be really bad news (or at least result in occasional garbage
when trying to read the counter). The very last thing you want to have
to do is resynchronize 64 bits in parallel, which means the actual
counter would have to be implemented in the cpu's core logic and be
synchronized to the cpu's core frequency.
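To make the "garbage when trying to read the counter" point concrete,
here is a rough sketch of what an unsynchronized multi-bit capture can
return. It is a purely logical model, not a description of any real AMD
or Intel circuit: it assumes each bit that is changing when the latch
fires is captured independently as either its old or its new level, and
it ignores the electrical side (metastability) entirely. The function
name possible_reads is made up for this illustration.

```python
def possible_reads(old, new):
    """Return every value a sampler could capture mid-transition.

    Assumption (simplified model): each bit that differs between
    `old` and `new` is seen independently as its old or new level.
    """
    changed = old ^ new                      # bits in flight
    stable = old & ~changed                  # bits that do not move
    changing_bits = [1 << i for i in range(changed.bit_length())
                     if (changed >> i) & 1]

    reads = {stable}
    for bit in changing_bits:
        # Each in-flight bit may be captured as 0 or as 1.
        reads |= {r | bit for r in reads}
    return reads


if __name__ == "__main__":
    # 7 -> 8 flips four bits at once, so any 4-bit value can show up.
    print(sorted(possible_reads(7, 8)))
    # 8 -> 9 flips only one bit, so only the old or new value is seen.
    print(sorted(possible_reads(8, 9)))
```

Under this model a carry that ripples across many bits of a 64 bit
counter can be read back as a value that is nowhere near either the old
or the new count, which is why you don't want to capture all the bits
from a foreign clock domain in parallel.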
One solution is to place the counter on a bus which is able to
resynchronize the data flow, such as a HyperTransport bus.
But of course if you do that your 'RDTSC' equivalent is going to take
more than a few cycles to run.
If one didn't mind forgoing the high resolution requirement then
the problem is greatly simplified... an external time base, such as
a 1-30 MHz crystal, can be fed into just one bit's worth of
resynchronization logic to generate counter pulses at the cpu's operating
frequency and the counter can then be implemented inside the cpu,
synchronized to its operating frequency. THAT could be done
very easily, and at virtually no cost in die area or current. The
timer would have to run at no more than 1/2 the frequency of the cpu's
lowest frequency operating state, which could be very low indeed.
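The 1/2 frequency limit can be seen with a small simulation. This is
only a sketch under stated assumptions: the external time base is an
ideal square wave, the cpu samples it once per cpu clock, and a counter
pulse is generated whenever the sampled level goes from 0 to 1. Real
resynchronization logic (and its metastability behaviour) is not
modelled, and the name count_sampled_edges is hypothetical. Periods are
integers in arbitrary time units so the arithmetic stays exact.

```python
def count_sampled_edges(cpu_period, ext_period, n_samples):
    """Return (seen, actual) rising-edge counts.

    seen   -- rising edges detected by sampling once per cpu clock
    actual -- rising edges the external square wave really produced
    """
    def level(t):
        # Square wave: low for the first half period, high for the second.
        return 1 if 2 * (t % ext_period) >= ext_period else 0

    seen = 0
    prev = level(0)
    for i in range(1, n_samples):
        cur = level(i * cpu_period)
        if prev == 0 and cur == 1:
            seen += 1            # one counter pulse per detected edge
        prev = cur

    # Rising edges fall at ext_period/2 + k*ext_period, up to the last sample.
    t_end = (n_samples - 1) * cpu_period
    actual = ((2 * t_end) // ext_period + 1) // 2
    return seen, actual


if __name__ == "__main__":
    # External clock at 1/10 of the cpu clock: no ticks are lost.
    print(count_sampled_edges(cpu_period=1, ext_period=10, n_samples=1000))
    # External clock at 2/3 of the cpu clock: ticks are lost.
    print(count_sampled_edges(cpu_period=10, ext_period=15, n_samples=301))
```

As long as the external period is longer than two cpu clocks, every
high and every low phase is sampled at least once and the counter keeps
up. Past that point edges start to go missing, so the time base has to
be chosen against the slowest clock the cpu can drop to.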
It kind of turns into a mess no matter how you twist it, as long as
the 'fast access time' requirement is left in place.
-Matt
Matthew Dillon
<dillon at backplane.com>