Fwd: [RFC] Kernel shared variables

Konstantin Belousov kostikbel at gmail.com
Tue Jun 5 13:39:02 UTC 2012


On Mon, Jun 04, 2012 at 05:22:07PM -0400, John Baldwin wrote:
> On Monday, June 04, 2012 2:19:17 pm Konstantin Belousov wrote:
> > On Mon, Jun 04, 2012 at 11:01:57AM -0400, John Baldwin wrote:
> > > On Sunday, June 03, 2012 6:49:27 am Bruce Evans wrote:
> > > > On Sun, 3 Jun 2012, Konstantin Belousov wrote:
> > > > 
> > > > > On Sun, Jun 03, 2012 at 07:28:09AM +1000, Bruce Evans wrote:
> > > > >> On Sat, 2 Jun 2012, Konstantin Belousov wrote:
> > > > >>> ...
> > > > >>> In fact, I think that if the whole goal is only fast clocks, then we
> > > > >>> do not need any additional system mechanisms, since we can easily export
> > > > >>> coefficients for rdtsc formula already. E.g. we can put it into elf auxv,
> > > > >>> which is ugly but bearable.
> > > > >>
> > > > >> How do you get the timehands offsets?  These only need to be updated
> > > > >> every second or so, or when used, but how can the application know
> > > > >> when they need to be updated if this is not done automatically in the
> > > > >> kernel by writing to a shared page?  I can only think of the
> > > > >> application arranging an alarm signal every second or so and updating
> > > > >> then.  No good for libraries.
> > > > > What are timehands offsets? Do you mean things like leap seconds?
> > > > 
> > > > Yes.  binuptime() is:
> > > > 
> > > > % void
> > > > % binuptime(struct bintime *bt)
> > > > % {
> > > > % 	struct timehands *th;
> > > > % 	u_int gen;
> > > > % 
> > > > % 	do {
> > > > % 		th = timehands;
> > > > % 		gen = th->th_generation;
> > > > % 		*bt = th->th_offset;
> > > > % 		bintime_addx(bt, th->th_scale * tc_delta(th));
> > > > % 	} while (gen == 0 || gen != th->th_generation);
> > > > % }
> > > > 
> > > > Without the kernel providing th->th_offset, you have to do lots of ntp
> > > > handling for yourself (compatibly with the kernel) just to get an
> > > > accuracy of 1 second.  Leap seconds don't affect CLOCK_MONOTONIC, but
> > > > they do affect CLOCK_REALTIME which is the clock id used by
> > > > gettimeofday().  For the former, you only have to advance the offset
> > > > for yourself occasionally (compatibly with the kernel) and manage
> > > > (compatibly with the kernel, especially in the long term) ntp slewing
> > > > and other syscall/sysctl kernel activity that micro-adjusts th->th_scale.
> > > 
> > > I think duplicating this logic in userland would just be wasteful.  I have
> > > a private fast gettimeofday() at my current job and it works by exporting
> > > the current timehands structure (well, the equivalent) to userland.  The
> > > userland bits then fetch a copy of the details and do the same as bintime().
> > > (I move the math (bintime_addx() and the multiply) out of the loop, however.)
> > Yesterday I started an implementation which uses a shared page to export
> > some variant of timehands, and uses auxv to provide libc with a pointer
> > to timehands when rdtsc is reasonable.
> > 
> > I have almost finished both the 32bit and 64bit userspace, but there is
> > kernel-side work left. Is your implementation ready or close to being ready
> > for commit? In other words, should I drop the effort, or continue?
> 
> No, mine is not general purpose.  I'll see if I can make a public patch of what
> it looks like.

My first version that seems to work on amd64 is at
http://people.freebsd.org/~kib/misc/moronix.1.patch

The plugs allow the new gettimeofday code to be replaced by a
vdso version in the future.

This is definitely WIP; in particular, the memory barrier handling in
__vdso_gettimeofday and in the tc_windup updater is missing.
Also, clock_gettime() support would require an ABI change.
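
To make the barrier point concrete, here is a minimal sketch of the
generation-count protocol from binuptime() with C11 fences on both the
updater and the reader side. It is not taken from the patch: struct
shared_th, sketch_windup() and sketch_binuptime() are hypothetical names,
the field layout is an assumption, a single updater is assumed, and the
counter value is passed in as an argument instead of being read with rdtsc.
The multiply and bintime_addx() are done after the loop, as suggested above.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Seconds plus a 64-bit binary fraction, as in the kernel's struct bintime. */
struct bintime {
	int64_t		sec;
	uint64_t	frac;
};

/* Hypothetical shared-page snapshot of a timehands (layout is an assumption). */
struct shared_th {
	_Atomic uint32_t gen;		/* 0 while an update is in progress */
	_Atomic int64_t	 offset_sec;	/* uptime at the last windup */
	_Atomic uint64_t offset_frac;
	_Atomic uint64_t scale;		/* 2^64 / counter frequency */
	_Atomic uint32_t counter;	/* counter value at the last windup */
};

static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)		/* the fraction wrapped: carry */
		bt->sec++;
}

/* Updater side: publish a new snapshot (single updater assumed). */
static void
sketch_windup(struct shared_th *th, struct bintime off, uint64_t scale,
    uint32_t counter)
{
	uint32_t gen = atomic_load_explicit(&th->gen, memory_order_relaxed);

	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* Order the gen = 0 store before the data stores below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset_sec, off.sec, memory_order_relaxed);
	atomic_store_explicit(&th->offset_frac, off.frac, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&th->counter, counter, memory_order_relaxed);
	if (++gen == 0)			/* 0 is reserved for "in progress" */
		gen = 1;
	/* Release: the data stores are visible before the new generation. */
	atomic_store_explicit(&th->gen, gen, memory_order_release);
}

/* Reader side: 'now' stands in for the rdtsc read. */
static void
sketch_binuptime(struct shared_th *th, uint32_t now, struct bintime *bt)
{
	uint32_t gen, delta;
	uint64_t scale;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		bt->sec = atomic_load_explicit(&th->offset_sec,
		    memory_order_relaxed);
		bt->frac = atomic_load_explicit(&th->offset_frac,
		    memory_order_relaxed);
		scale = atomic_load_explicit(&th->scale, memory_order_relaxed);
		delta = now - atomic_load_explicit(&th->counter,
		    memory_order_relaxed);
		/* Order the data loads before the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));

	/* Math outside the loop; valid only while delta is under one second. */
	bintime_addx(bt, scale * delta);
}
```

With an assumed 2^30 Hz counter the scale is 2^34, so a delta of 2^29
ticks adds exactly half a second to the published offset. The product
scale * delta overflows once delta reaches a full second of ticks, which
is why the updater has to run more often than that.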

I only compiled the amd64 kernel; i386 is probably broken, and other
architectures are definitely broken.


More information about the freebsd-arch mailing list