svn commit: r297039 - head/sys/x86/x86

John Baldwin jhb at freebsd.org
Mon Mar 28 18:58:21 UTC 2016


On Thursday, March 24, 2016 11:09:17 AM Konstantin Belousov wrote:
> On Wed, Mar 23, 2016 at 02:21:59PM -0700, John Baldwin wrote:
> > As you noted, the issue is if a timecounter needs locks (e.g., the i8254),
> > though outside of that I think the patch is great. :-/  Of course, if the TSC
> > isn't advertised as invariant, DELAY() is talking to the timecounter
> > directly as well.
> > 
> > However, I think we probably can use the TSC.  The only specific note I got
> > from Ryan (cc'd) was about the TSC being unstable as a timecounter under KVM.
> > That doesn't mean that the TSC is non-monotonic on a single vCPU.  In fact,
> > thinking about this more I have a different theory to explain how the TSC
> > can be out of whack on different vCPUs even if the hardware TSC is in sync
> > in the physical CPUs underneath.
> In fact, if we can use the TSC with the only requirement being that it
> is monotonic, I do not see why we need the TSC at all.  We can return
> to the pre-r278325 loop, but calibrate the number of loop iterations
> for a known 1us delay, once at boot.  Do you agree with this?

Yes, I think this is fine.  What we really want is to be sure the minimum wait
times aren't too short; beyond that we just need to be reasonably close.
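
Something along these lines is what I would expect (a userland sketch of
the idea, with clock_gettime() standing in for the boot-time timecounter
read; the names here are illustrative, not the actual patch):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static volatile unsigned long sink;
static uint64_t loops_per_usec;

/* The spin loop itself; the volatile store keeps the compiler honest. */
static void
delay_loop(uint64_t count)
{
	uint64_t i;

	for (i = 0; i < count; i++)
		sink = i;
}

/* One-time calibration: time a large iteration count, derive loops/us. */
static void
calibrate(void)
{
	struct timespec t0, t1;
	uint64_t iters = 100 * 1000 * 1000;
	uint64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	delay_loop(iters);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	ns = (uint64_t)(t1.tv_sec - t0.tv_sec) * 1000000000 +
	    (t1.tv_nsec - t0.tv_nsec);
	loops_per_usec = iters * 1000 / ns;
	if (loops_per_usec == 0)
		loops_per_usec = 1;	/* never round down to "no delay" */
}

/* Spin for at least usec microseconds. */
static void
delay_us(unsigned int usec)
{
	delay_loop(loops_per_usec * usec);
}

int
main(void)
{
	calibrate();
	printf("calibrated %ju loops/us\n", (uintmax_t)loops_per_usec);
	delay_us(1000);		/* ~1ms busy-wait */
	return (0);
}

Overestimating loops_per_usec is harmless here; only undershooting the
minimum delay would be a problem.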

> > One of the things present in the VMCS on Intel CPUs using VT-x is a TSC
> > adjustment.  The hypervisor can alter this TSC adjustment during a VM-exit to
> > alter the offset between the TSC in the guest and the "real" TSC value in the
> > physical CPU itself.  One way a hypervisor might use this is to try to
> > "pause" the TSC during a VM-exit by taking TSC timestamps at the start and
> > end of a VM-exit and adding that delta to the TSC offset just before each
> > VM-entry.  However, if you have two vCPUs, one of which is running in the
> > guest and one of which is handling a VM-exit in the hypervisor, the TSC on
> > the first vCPU will run while the effective TSC of the second vCPU is paused.
> > When the second vCPU resumes after a VM-entry, its TSC will now "unpause",
> > but it will lag the first vCPU by however long it took to handle its VM-exit.
> > 
> > It wouldn't surprise me if KVM was doing this.  bhyve does not do this to my
> > knowledge (so the TSC is actually still usable as a timecounter under bhyve
> > for some value of "usable").  However, even with this TSC pausing/unpausing,
> > the TSC would still increase monotonically on a single vCPU.  For the purposes
> > of DELAY() (and other spin loops on a pinned thread such as in
> > lapic_ipi_wait()), that is all you need.
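
To make the offset bookkeeping concrete, here is a toy model of that
pausing scheme.  All names are hypothetical; this is not KVM or bhyve
code, just the arithmetic:

#include <stdint.h>
#include <stdio.h>

static uint64_t host_tsc;	/* stand-in for the physical TSC */

struct vcpu {
	int64_t	tsc_offset;	/* models the VMCS TSC-offset field */
};

/* Guest-visible TSC: physical TSC plus the per-vCPU offset. */
static uint64_t
guest_rdtsc(struct vcpu *v)
{
	return (host_tsc + v->tsc_offset);
}

/*
 * Handle a VM-exit that takes 'cycles' of host time, "pausing" the
 * guest's TSC by folding the elapsed delta into its offset just
 * before VM-entry.
 */
static void
vmexit(struct vcpu *v, uint64_t cycles)
{
	uint64_t start = host_tsc;

	host_tsc += cycles;		/* host time passes */
	v->tsc_offset -= (int64_t)(host_tsc - start);
}

int
main(void)
{
	struct vcpu a = { 0 }, b = { 0 };

	/* b takes a 1000-cycle VM-exit while a keeps running in the guest. */
	vmexit(&b, 1000);
	host_tsc += 500;	/* both vCPUs now run in the guest */

	/* b is still monotonic on its own vCPU, but lags a by 1000. */
	printf("a: %ju  b: %ju\n", (uintmax_t)guest_rdtsc(&a),
	    (uintmax_t)guest_rdtsc(&b));
	return (0);
}

On its own vCPU, b's count never goes backwards, which is all DELAY()
needs; it is only the cross-vCPU comparison, and hence the timecounter,
that breaks.
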
> BTW, Intel exported this mechanism to the non-VT environment as well, on
> recent chips.  So I would not be too surprised if SMM handlers start to
> 'compensate' for some long delays in the near future.

I think this is more to allow you to keep the TSCs in sync across cores
more sanely by adjusting TSC_ADJ instead of trying to time a write to the
TSC to apply an offset (which is racy).  If it were targeted at SMM, it
wouldn't be exposed to the host OS.  I think Intel understands at this
point that OSes want a synchronized TSC on bare metal for "cheap"
timekeeping (the TSC is now more about that than about counting CPU
cycles).
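
Roughly (the MSR number is the architectural IA32_TSC_ADJUST; the
rdtsc()/rdmsr()/wrmsr() wrappers are assumed, and the handshake that
pairs the two TSC samples is omitted, so this is a sketch rather than
real synchronization code):

#include <stdint.h>

#define	MSR_IA32_TSC_ADJUST	0x3b	/* architectural MSR number */

/* Assumed wrappers for the corresponding instructions. */
extern uint64_t	rdtsc(void);
extern uint64_t	rdmsr(uint32_t msr);
extern void	wrmsr(uint32_t msr, uint64_t val);

/*
 * Bring this CPU's TSC in line with a reference sample taken (at the
 * same instant, via a handshake not shown here) on another CPU.
 * Because IA32_TSC_ADJUST shifts the TSC by a constant, and both TSCs
 * tick at the same rate, the delta stays valid no matter how long the
 * WRMSR takes to land, unlike writing the TSC itself, where the value
 * you computed is already stale by the time it is written.
 */
static void
tsc_sync_adjust(uint64_t ref_tsc)
{
	int64_t delta = (int64_t)(ref_tsc - rdtsc());

	wrmsr(MSR_IA32_TSC_ADJUST, rdmsr(MSR_IA32_TSC_ADJUST) + delta);
}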

I think your patch later in the thread looks fine.

Most of the worry later in the thread about older hardware with variable
TSCs is becoming less and less relevant.  Intel hasn't shipped a CPU with
a variable TSC in close to a decade now?

Bruce's points about the hardcoded timeouts for things like mutexes are
well founded.  I'm not sure how best to fix those, however.

-- 
John Baldwin

