svn commit: r238755 - head/sys/x86/x86

Wed Jul 25 18:05:44 UTC 2012

On Wed, Jul 25, 2012 at 01:27:48PM -0400, Jung-uk Kim wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
> > On Wed, 25 Jul 2012, Andriy Gapon wrote:
> > 
> >> on 25/07/2012 13:21 Konstantin Belousov said the following:
> >>> ... diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c index
> >>> 085c339..229b351 100644 --- a/sys/x86/x86/tsc.c +++
> >>> b/sys/x86/x86/tsc.c @@ -594,6 +594,7 @@ static u_int 
> >>> tsc_get_timecount(struct timecounter *tc __unused) {
> >>> 
> >>> +    rmb(); return (rdtsc32()); }
> >> 
> >> This makes sense to me.  We probably want correctness over
> >> performance here. [BTW, I originally thought that the change was
> >> here; brain malfunction]
> > 
> > And I liked the original change because it wasn't here :-).
> > 
> >>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter
> >>> *tc) { uint32_t rv;
> >>> 
> >>> +    rmb(); __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" -
> >>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); +
> >>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); return
> >>> (rv); }
> >>> 
> >> 
> >> It would correct here too, but not sure if it would make any 
> >> difference given that some lower bits are discarded anyway.
> >> Probably depends on exact CPU.
> > 
> > It is needed to pessimize this too. :-)
> > 
> > As I have complained before, the loss of resolution from the shift
> > is easy to see by reading the time from userland, even with syscall
> > overhead taking 10-20 times longer than the read.  On core2 with
> > TSC-low, a clock- checking utility gives:
> > 
> > % min 481, max 12031, mean 530.589452, std 51.633626 % 1th: 550
> > (1296487 observations) % 2th: 481 (448425 observations) % 3th: 482
> > (142650 observations) % 4th: 549 (61945 observations) % 5th: 551
> > (47619 observations)
> > 
> > The numbers are diffences in nanoseconds measured by
> > clock_gettime(). The jump from 481 to 549 is 68.  From this I can
> > tell that the clock frequency is 1.86 Ghz and the shift is 128, or
> > the clock frequency is 3.72 Ghz and the shift is 256.
> > 
> > On AthlonXP with TSC:
> > 
> > % min 273, max 29075, mean 274.412811, std 80.425963 % 1th: 273
> > (853962 observations) % 2th: 274 (745606 observations) % 3th: 275
> > (400212 observations) % 4th: 276 (20 observations) % 5th: 280 (10
> > observations)
> > 
> > Now the numbers cluster about the mean.  Although syscalls take
> > much longer than the loss of resolution with TSC-low, and even the
> > core2 TSC takes almost as long to read as the loss, it is still
> > possible to see things happening at the limits of the resolution
> > (~0.5 nsec).
> > 
> >> And, oh hmm, I read AMD Software Optimization Guide for AMD
> >> Family 10h Processors and they suggest using cpuid (with a note
> >> that it may be intercepted in virtualized environments) or
> >> _mfence_ in the discussed role (Appendix F of the document). 
> >> Googling for 'rdtsc mfence lfence' yields some interesting
> >> results.
> > 
> > The second hit was for the shrd pessimization/loss of resolution
> > and a memory access hack in lkml in 2011.  I now seem to remember
> > jkim mentioning the memory access hack.  rmb() on i386 has a
> > related memory access hack, but now with a lock prefix that defeats
> > the point of the 2011 hack (it wanted to save 5 nsec by removing
> > fences).  rmb() on amd64 uses lfence.
> 
> I believe I mentioned this thread at the time:
> 
> https://patchwork.kernel.org/patch/691712/
> 
> FYI, r238755 is essentially this commit for Linux:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2
> 
> > Some of the other hits are a bit old.  The 8th one was by me in
> > the thread about kib@ implementing gettimeofday() in userland.
> 
> Since we have gettimeofday() in userland, the above Linux thread is
> more relevant now, I guess.

For some unrelated reasons, we do have lfence;rdtsc sequence in the
userland already. Well, it is not exactly such sequence, there are
some instructions between, but the main fact is that two consequtive
invocations of gettimeofday(2) (*) or clock_gettime(2) are interleaved
with lfence on Intels, guaranteeing that backstep of the counter is
impossible.

* - it is not a syscall anymore.

As I said, using recommended mfence;rdtsc sequence for AMDs would require
some work, but lets handle the kernel and userspace issues separately.

And, I really failed to find what the patch from the thread you referenced
tried to fix. Was it really committed into Linux ?

I see actual problem of us allowing timecounters going back, and a
solution that exactly follows words of both Intel and AMD documentation.
This is good one step forward IMHO.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/svn-src-all/attachments/20120725/2124e7d9/attachment-0001.pgp