svn commit: r297039 - head/sys/x86/x86

Konstantin Belousov kostikbel at gmail.com
Thu Mar 24 16:25:00 UTC 2016


On Fri, Mar 25, 2016 at 01:57:32AM +1100, Bruce Evans wrote:
> > In fact, if we can use TSC with the only requirement of being monotonic,
> > I do not see why do we need TSC at all. We can return to pre-r278325
> > loop, but calibrate the number of loop iterations for known delay in
> > 1us, once on boot.  Do you agree with this ?
> 
> As a comment in the previous version says, the old method is highly
> bogus since it is sensitive to CPU clock speed.
We do not care whether we time out in 5usec or 50usec.  We could even wait
for 1 second.  The only thing that must be avoided is an infinite wait, to
prevent a situation where the CPU spins forever with interrupts disabled.

> 
> My systems allow speed variations of about 4000:800 = 5:1 for one CPU and
> about 50:1 for different CPUs.  So the old method gave a variation of up
> to 50:1.  This can be reduced to only 5:1 using the boot-time calibration.
What do you mean by 'for different CPUs'?  I understand that modern ESS
can give us a CPU frequency between 800 and 4200MHz, which is what you mean
by 'for one CPU'.  We definitely do not care if the 5usec timeout becomes
25usecs, since we practically never time out there at all.

As I understand the original report, the LAPIC becomes ready for the next
IPI command much faster than 1usec, but not fast enough to report
readiness on the very next register read.

The timeout is more useful for AP startup, where microcode must
initialize and test the core, which does take time, and sometimes other
issues do prevent APs from starting.  But this happens during early boot,
when ESS and throttling cannot happen, so the initial calibration is more
or less accurate.

> 
> Old versions of DELAY() had similar problems.  E.g., in FreeBSD-7:
> 
> X 	if (tsc_freq != 0 && !tsc_is_broken) {
> X 		uint64_t start, end, now;
> X 
> X 		sched_pin();
> X 		start = rdtsc();
> X 		end = start + (tsc_freq * n) / 1000000;
> X 		do {
> X 			cpu_spinwait();
> X 			now = rdtsc();
> X 		} while (now < end || (now > start && end < start));
> X 		sched_unpin();
> X 		return;
> X 	}
> 
> This only works if the TSC is invariant.  tsc_freq_changed() updates
> tsc_freq, but it is too hard to do this atomically, so the above is
> broken when it is called while the frequency is being updated (and
> this can be arranged by single stepping through the update code).
> The bug might break at least the keyboard i/o used to do the single
> stepping, depending on whether the DELAY()s in it are really needed
> and on how much these DELAYS()s are lengthened or shortened.
> 
> This loop doesn't need to use the TSC, but can use a simple calibrated
> loop since the delay only needs to be right to within a few usec of
> a factor of 2 or so, or any longer time (so that it doesn't matter if
> the thread is preempted many quanta).  The calibration just needs to
> be updated whenever the CPU execution rate changes.
I do not see why we would need to re-calibrate for ipi_wait(); the
possible inaccuracy is not important.

> 
> Newer CPUs give complications like the CPU execution rate being independent
> of the (TSC) CPU frequency.  An invariant TSC gives an invariant CPU
> frequency, but the execution rate may change in lots of ways for lots
> of reasons.
> 
> The cputicker has similar problems with non-invariant CPUs.  It
> recalibrates, but there are races and other bugs in the recalibration.
> These bugs are small compared with counting ticks at old rate(s) as
> being at the current rate.

Yes, but I do not believe any of the listed problems are relevant for
the wait for LAPIC readiness.


More information about the svn-src-all mailing list