svn commit: r222866 - head/sys/x86/x86

Bruce Evans brde at optusnet.com.au
Wed Jun 8 20:55:37 UTC 2011


On Wed, 8 Jun 2011, Jung-uk Kim wrote:

> Log:
>  Introduce low-resolution TSC timecounter "TSC-low".  It replaces the normal
>  TSC timecounter if TSC frequency is higher than ~4.29 MHz (or 2^32-1 Hz) or

Off by a factor of 1024.

>  multiple CPUs are present.  The "TSC-low" frequency is always lower than a
>  preset maximum value and derived from TSC frequency (by being halved until
>  it becomes lower than the maximum).  Note the maximum value for SMP case is
>  significantly lower than UP case because we want to reduce (rare but known)
>  "temporal anomalies" caused by non-serialized RDTSC instruction.  Normally,
>  it is still higher than "ACPI-fast" timecounter frequency (which was default
>  timecounter hardware for long time until r222222) to be useful.

It should be a separate timecounter so that the user can choose it
independently, at least in the SMP case where it is very low (at most
~4.29 GHz >> 8 ~= 17 MHz).

> Modified:
>  head/sys/x86/x86/tsc.c
>
> Modified: head/sys/x86/x86/tsc.c
> ==============================================================================
> --- head/sys/x86/x86/tsc.c	Wed Jun  8 19:28:59 2011	(r222865)
> +++ head/sys/x86/x86/tsc.c	Wed Jun  8 19:38:31 2011	(r222866)
> @@ -79,7 +79,8 @@ static void tsc_freq_changed(void *arg,
>     int status);
> static void tsc_freq_changing(void *arg, const struct cf_level *level,
>     int *status);
> -static	unsigned tsc_get_timecount(struct timecounter *tc);
> +static unsigned tsc_get_timecount(struct timecounter *tc);
> +static unsigned tsc_get_timecount_lowres(struct timecounter *tc);

Should be spelled low or lowres throughout.

> static void tsc_levels_changed(void *arg, int unit);
>
> static struct timecounter tsc_timecounter = {
> @@ -392,11 +393,19 @@ test_smp_tsc(void)
> static void
> init_TSC_tc(void)

This seems to only be called once at boot time.  So the lowness may be
much lower than necessary if the levels are reduced significantly later.

> {
> +	uint64_t max_freq;
> +	int shift;
>
> 	if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled)
> 		return;
>
> 	/*
> +	 * Limit timecounter frequency to fit in an int and prevent it from
> +	 * overflowing too fast.
> +	 */
> +	max_freq = UINT_MAX;
> +
> +	/*
> 	 * We can not use the TSC if we support APM.  Precise timekeeping
> 	 * on an APM'ed machine is at best a fools pursuit, since
> 	 * any and all of the time spent in various SMM code can't
> @@ -418,13 +427,27 @@ init_TSC_tc(void)
> 	 * We can not use the TSC in SMP mode unless the TSCs on all CPUs are
> 	 * synchronized.  If the user is sure that the system has synchronized
> 	 * TSCs, set kern.timecounter.smp_tsc tunable to a non-zero value.
> +	 * We also limit the frequency even lower to avoid "temporal anomalies"
> +	 * as much as possible.
> 	 */
> -	if (smp_cpus > 1)
> +	if (smp_cpus > 1) {
> 		tsc_timecounter.tc_quality = test_smp_tsc();
> +		max_freq >>= 8;
> +	}

This gives especially low lowness if the levels are reduced significantly.
Maybe as low as 100 MHz >> 8 = ~390 KHz = lower than an i8254.  OTOH, maybe
the temporal anomalies scale with the TSC frequency, so you need to right
shift by a few irrespective of the TSC frequency.  A shift count of 8
seems too much, but if the initial TSC frequency is already < 2**32
shifted by 8, then the final shift is 0.

> #endif
> init:
> +	for (shift = 0; shift < 32 && (tsc_freq >> shift) > max_freq; shift++)
> +		;
> +	if (shift > 0) {
> +		tsc_timecounter.tc_get_timecount = tsc_get_timecount_lowres;
> +		tsc_timecounter.tc_name = "TSC-low";
> +		if (bootverbose)
> +			printf("TSC timecounter discards lower %d bit(s).\n",
> +			    shift);

Error messages are not terminated by a "." in KNF.

> +	}
> 	if (tsc_freq != 0) {
> -		tsc_timecounter.tc_frequency = tsc_freq;
> +		tsc_timecounter.tc_frequency = tsc_freq >> shift;
> +		tsc_timecounter.tc_priv = (void *)(intptr_t)shift;
> 		tc_init(&tsc_timecounter);
> 	}
> }
> @@ -496,7 +519,8 @@ tsc_freq_changed(void *arg, const struct
> 	/* Total setting for this level gives the new frequency in MHz. */
> 	freq = (uint64_t)level->total_set.freq * 1000000;
> 	atomic_store_rel_64(&tsc_freq, freq);

I thought you would scale the low-level tsc_freq too, but this is not so
easy (the shift would have to be put in rdtsc()).

> -	atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq);
> +	tsc_timecounter.tc_frequency =
> +	    freq >> (int)(intptr_t)tsc_timecounter.tc_priv;

Perhaps the levels can also be increased significantly later.  Then
the timecounter frequency may exceed 4.29 GHz despite its scaling.

> }
>
> static int
> @@ -511,7 +535,8 @@ sysctl_machdep_tsc_freq(SYSCTL_HANDLER_A
> 	error = sysctl_handle_64(oidp, &freq, 0, req);
> 	if (error == 0 && req->newptr != NULL) {
> 		atomic_store_rel_64(&tsc_freq, freq);
> -		atomic_store_rel_64(&tsc_timecounter.tc_frequency, freq);
> +		tsc_timecounter.tc_frequency =
> +		    freq >> (int)(intptr_t)tsc_timecounter.tc_priv;

Root can easily increase this frequency from below 4.29 GHz (with no scaling)
to above (still with no scaling).  And this change might even be correct
if it is from 4.29 GHz - epsilon to 4.29 GHz plus epsilon.  Micro-adjustments
that cross the 4.29 GHz threshold are precisely the ones that need the atomic
op.

> 	}
> 	return (error);
> }
> @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq
>     0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter frequency");
>
> static u_int
> -tsc_get_timecount(struct timecounter *tc)
> +tsc_get_timecount(struct timecounter *tc __unused)
> {
>
> 	return (rdtsc32());
> }
> +
> +static u_int
> +tsc_get_timecount_lowres(struct timecounter *tc)
> +{
> +
> +	return (rdtsc() >> (int)(intptr_t)tc->tc_priv);

This forces a slow 64-bit shift (shrdl; shrl) in all cases.  rdtsc32()
with a scaled tc_counter_mask should work OK (essentially the same as
the non-low timecounter except for reduced accuracy; the only loss is
an decrease in the time until counter overflow to the same as for
the non-low timecounter).

> +}
>

Blindly changing the scale factor on every change to tsc_freq would give
some nice races, but ones that are essentially the same as for changing
the timecounter hardware.  This gives me an idea for another
implementation: have an array of TSC timecounters with 1 element for each
small shift count, and switch using the timecounter hardware switch code,
and let the user select the element.

Bruce


More information about the svn-src-head mailing list