KVM Clock

John Baldwin jhb at freebsd.org
Mon Jan 20 20:50:25 UTC 2014


On Wednesday 15 January 2014 17:59:10 Julian Stecklina wrote:
> On Mi, 2014-01-15 at 17:04 +0100, Roger Pau Monné wrote:
> > On 15/01/14 14:45, Julian Stecklina wrote:
> > > On Mi, 2014-01-15 at 14:08 +0100, Roger Pau Monné wrote:
> > >> On 15/01/14 13:05, Julian Stecklina wrote:
> > >>> On 01/14/2014 05:13 PM, Peter Grehan wrote:
> > >>>> Hi Julian,
> > >>>> 
> > >>>>> is anyone working on KVM clock support for FreeBSD? If not, I
> > >>>>> might take a shot at it.
> > >>>> 
> > >>>> None I know of: go for it :)
> > >>> 
> > >>> Works for me so far:
> > >>> https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bde
> > >>> dc65c3148f> >> 
> > >> Looking
> > >> 
> > >> at the code it seems some common routines could be shared
> > >> between the KVM PV clock and the Xen PV clock
> > >> (sys/dev/xen/timer/timer.c). The data passed from the hypervisor to
> > >> the guest has exactly the same format (see struct vcpu_time_info in
> > >> Xen public headers).
> > > 
> > > Yes, I know. Didn't get around to making it pretty yesterday evening. ;)
> > > I'll post an updated patch, when I have some time. Any suggestions where
> > > to put the two common functions?
> > > 
> > >> At a first sight the KVM clock can benefit from using scale_delta
> > >> (which is going to be faster than the C version implemented in
> > >> kvmclock_get_timecount),
> > > 
> > > At least somewhat on amd64. 32-bit looks pretty identical.
> > > 
> > >>  and xen_fetch_vcpu_tinfo is exactly the same
> > >> 
> > >> as kvmclock_fetch.
> > > 
> > > I think xen_fetch_vcpu_tinfo has a subtle bug:
> > >   217         do {
> > >   218                 dst->version = src->version;
> > >   219                 rmb();
> > >   220                 dst->tsc_timestamp = src->tsc_timestamp;
> > >   221                 dst->system_time = src->system_time;
> > >   222                 dst->tsc_to_system_mul = src->tsc_to_system_mul;
> > >   223                 dst->tsc_shift = src->tsc_shift;
> > >   224                 rmb();
> > >   225         } while ((src->version & 1) | (dst->version ^
> > > 
> > > src->version));
> > > 
> > > In line 225 src->version is potentially read twice. If you consider the
> > > following schedule:
> > > 
> > > host starts updating data, sets src->version to 1
> > > guest reads src->version (1) and stores it into dst->version.
> > > guest copies inconsistent data
> > > guest reads src->version (1) and computes xor with dst->version.
> > > host finishes updating data and sets src->version to 2
> > > guest reads src->version (2) and checks whether lower bit is not set.
> > > while loop exits with inconsistent data!
> > > 
> > > I think the C standard (at least C11) permits this as it does not
> > > specify in which order the two reads in line 225 need to happen.
> > 
> > According to the operator precedence and associativity in C
> > (http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B#Operator_preceden
> > ce), if I'm reading it right, the condition in the while line will be
> > evaluated as follows (because of the left-to-right associativity of the
> > 
> > | operator):
> > 1. (src->version & 1)
> > 2. (dst->version ^ src->version)
> > 3. result of 1 | result of 2
> > 
> > So AFAICT the flow that you describe could never happen, because
> > (src->version & 1) is always done before (dst->version ^ src->version).
> 
> Operator precedence doesn't matter. Consider it written like this:
> 
> or(and(src->version, 1), xor(dst->version, src->version))
> 
> C gives you no guarantees whether the and or the xor will be executed
> first. There is no sequence point in between. And even with a sequence
> point, the C11 memory model gives you no guarantees about how the reads
> are ordered.
> 
> This discussion is somewhat theoretical, because
>  a) the hypervisor will probably update the data structure in the
> current vCPU context (making memory consistency issues go away).
>  b) the compiler will likely not be an asshole. ;)
> 
> Pseudocode for a better fetch could be:
> 
> dst->version = src->version;
> rmb();
> ... read data ...
> rmb();
> version_post = src->version;
> rmb(); <- keeps the compiler from reading src->version multiple times
> 
> and then doing the test with version_post and dst->version.
> Unfortunately, rmb() expands into lfence, even if there is no need for
> that here.

There is the __compiler_membar() macro in <sys/cdefs.h> that you could use if 
this code is x86-specific (and thus knows it only needs a compiler barrier).

-- 
John Baldwin


More information about the freebsd-virtualization mailing list