svn commit: r297039 - head/sys/x86/x86

Konstantin Belousov kostikbel at gmail.com
Wed Mar 23 07:58:48 UTC 2016


On Mon, Mar 21, 2016 at 11:12:57AM -0700, John Baldwin wrote:
> On Saturday, March 19, 2016 05:22:16 AM Konstantin Belousov wrote:
> > On Fri, Mar 18, 2016 at 07:48:49PM +0000, John Baldwin wrote:
> > >  
> > > -	for (x = 0; x < delay; x += 5) {
> > > +	for (x = 0; x < delay; x++) {
> > >  		if ((lapic_read_icr_lo() & APIC_DELSTAT_MASK) ==
> > >  		    APIC_DELSTAT_IDLE)
> > >  			return (1);
> > > -		DELAY(5);
> > > +		DELAY(1);
> > >  	}
> > >  	return (0);
> > >  }
> > 
> > Ideally we would structure the loop differently. I think it is more
> > efficient WRT latency to only block execution by ia32_pause(), and
> > compare the the getbinuptime() results to calculate spent time, on each
> > loop step.
> 
> Yes.  I've thought about using the TSC directly to do that, but folks are
> worried about the TSC being unstable due to vcpus in a VM migrating
> across physical CPUs.  DELAY() does seem to DTRT in that case assuming the
> hypervisor doesn't advertise an invariant TSC via cpuid.  We'd have to
> essentially duplicate DELAY() (really delay_tc()) inline.

If TSC has the behaviour you described, i.e. suddenly jumping random
steps on single CPU, from the point of view of kernel, then the system
is seriosly misbehaving.  The timekeeping stuff would be badly broken
regardless of the ipi_wait().  I do not see why should we worry about
that in ipi_wait().

I proposed slightly different thing, i.e. using timekeep code to indirect
to TSC if configured so.  Below is the proof of concept patch, use of
nanouptime() may be too naive, and binuptime() would cause tiny bit less
overhead, but I do not want to think about arithmetic.

diff --git a/sys/x86/x86/local_apic.c b/sys/x86/x86/local_apic.c
index 7e5087b..fdc12a4 100644
--- a/sys/x86/x86/local_apic.c
+++ b/sys/x86/x86/local_apic.c
@@ -1621,31 +1621,37 @@ SYSINIT(apic_setup_io, SI_SUB_INTR, SI_ORDER_THIRD, apic_setup_io, NULL);
  * private to the MD code.  The public interface for the rest of the
  * kernel is defined in mp_machdep.c.
  */
+
+/*
+ * Wait delay microseconds for IPI to be sent.  If delay is -1, we
+ * wait forever.
+ */
 static int
 native_lapic_ipi_wait(int delay)
 {
-	int x;
+	struct timespec end, x;
 
 	/* LAPIC_ICR.APIC_DELSTAT_MASK is undefined in x2APIC mode */
 	if (x2apic_mode)
 		return (1);
 
-	/*
-	 * Wait delay microseconds for IPI to be sent.  If delay is
-	 * -1, we wait forever.
-	 */
-	if (delay == -1) {
-		while ((lapic_read_icr_lo() & APIC_DELSTAT_MASK) !=
-		    APIC_DELSTAT_IDLE)
-			ia32_pause();
-		return (1);
+	if (delay != -1) {
+		KASSERT(delay > 0, ("wrong delay %d", delay));
+		x.tv_sec = delay / 1000000;
+		x.tv_nsec = (delay % 1000000) * 1000;
+		nanouptime(&end);
+		timespecadd(&end, &x);
 	}
-
-	for (x = 0; x < delay; x++) {
+	for (;;) {
 		if ((lapic_read_icr_lo() & APIC_DELSTAT_MASK) ==
 		    APIC_DELSTAT_IDLE)
 			return (1);
-		DELAY(1);
+		ia32_pause();
+		if (delay != -1) {
+			nanouptime(&x);
+			if (timespeccmp(&x, &end, >))
+				break;
+		}
 	}
 	return (0);
 }


More information about the svn-src-head mailing list