kqueue periodic timer confusion

Ian Lepore freebsd at damnhippie.dyndns.org
Thu Jul 12 15:40:43 UTC 2012


On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin <jhb at freebsd.org> wrote:
> > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> >> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> >> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> >> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> >> > > > Hi,
> >> > > >
> >> > > > Sorry about this repost but I'm confused about the responses I received
> >> > > > in my last post so I'm looking for some clarification.
> >> > > >
> >> > > > Specifically, I though I could use the kqueue timer as essentially a
> >> > > > "drop in" replacement for linuxfd_create/read, but was surprised that
> >> > > > the accuracy of the kqueue timer is much less than what I need for my
> >> > > > application.
> >> > > >
> >> > > > So my confusion at this point is whether this is consider to be a bug or
> >> > > > "feature"?
> >> > > >
> >> > > > Here's some test code if you want to verify the problem:
> >> > > >
> >> > > > #include <stdio.h>
> >> > > > #include <stdlib.h>
> >> > > > #include <string.h>
> >> > > > #include <unistd.h>
> >> > > > #include <errno.h>
> >> > > > #include <sys/types.h>
> >> > > > #include <sys/event.h>
> >> > > > #include <sys/time.h>
> >> > > >
> >> > > > int
> >> > > > main(void)
> >> > > > {
> >> > > >         int i,msec;
> >> > > >         int kq,nev;
> >> > > >         struct kevent inqueue;
> >> > > >         struct kevent outqueue;
> >> > > >         struct timeval start,end;
> >> > > >
> >> > > >         if ((kq = kqueue()) == -1) {
> >> > > >                 fprintf(stderr, "kqueue error!? errno = %s",
> >> > strerror(errno));
> >> > > >                 exit(EXIT_FAILURE);
> >> > > >         }
> >> > > >         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
> >> > > >
> >> > > >         gettimeofday(&start, 0);
> >> > > >         for (i = 0; i < 50; i++) {
> >> > > >                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) ==
> >> > -1) {
> >> > > >                         fprintf(stderr, "kevent error!? errno = %s",
> >> > strerror(errno));
> >> > > >                         exit(EXIT_FAILURE);
> >> > > >                 } else if (outqueue.flags & EV_ERROR) {
> >> > > >                         fprintf(stderr, "EV_ERROR: %s\n",
> >> > strerror(outqueue.data));
> >> > > >                         exit(EXIT_FAILURE);
> >> > > >                 }
> >> > > >         }
> >> > > >         gettimeofday(&end, 0);
> >> > > >
> >> > > >         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 +
> >> > end.tv_usec - start.tv_usec) / 1000) - 1000);
> >> > > >
> >> > > >         printf("msec = %d\n", msec);
> >> > > >
> >> > > >         close(kq);
> >> > > >         return EXIT_SUCCESS;
> >> > > > }
> >> > > >
> >> > > >
> >> > >
> >> > > What you are seeing is "just the way FreeBSD currently works."
> >> > >
> >> > > Sleeping (in most all of its various forms, and I've just looked at the
> >> > > kevent code to verify this is true there) is handled by converting the
> >> > > amount of time to sleep (usually specified in a timeval or timespec
> >> > > struct) to a count of timer ticks, using an internal routine called
> >> > > tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
> >> > > account for the current tick.  Whether that's a good idea or not (it
> >> > > probably was once, and probably not anymore) it's how things currently
> >> > > work, and could explain the fairly consistant +1ms you're seeing.
> >> >
> >> > This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
> >> > installs a periodic callout that executes KNOTE() and then resets itself (via
> >> > callout_reset()) each time it runs.  This should generally be closer to
> >> > regulary spaced intervals than something that does:
> >> >
> >>
> >> In what way is it irrelevant?  That is, what did I miss?  It appears to
> >> me that the next callout is scheduled by calling timertoticks() passing
> >> a count of milliseconds, that count is converted to a struct timeval and
> >> passed to tvtohz() which is where the +1 adjustment happens.  If you ask
> >> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
> >> There is some time, likely a small number of microseconds, that you've
> >> consumed of the current tick, and that's what the +1 in tvtohz() is
> >> supposed to account for according to the comments.
> >>
> >> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
> >> and then adds one tick on top of that.  That seems not quite right to
> >> me, except that it is a way to g'tee that you don't return early, and
> >> that is the one promise made by sleep routines on any OS; those magical
> >> "at least" words always appear in the docs.
> >>
> >> Actually what I'm missing (that I know of) is how the scheduler works.
> >> Maybe the +1 adjustment to account for the fraction of the current tick
> >> you've already consumed is the right thing to do, even when that
> >> fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
> >> behavior that I know nothing about.
> >
> > Ohhhhh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
> > case.  That is, the +1 makes sense when you are computing a one-time delta
> > for things like nanosleep().  It is incorrect when computing a periodic
> > delta such as for computing the interval for an itimer (setitimer) or
> > EVFILT_TIMER().
> >
> > Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
> >
> > sys/kern/kern_time.c:
> >
> > /*
> >  * Real interval timer expired:
> >  * send process whose timer expired an alarm signal.
> >  * If time is not set up to reload, then just return.
> >  * Else compute next time timer should go off which is > current time.
> >  * This is where delay in processing this timeout causes multiple
> >  * SIGALRM calls to be compressed into one.
> >  * tvtohz() always adds 1 to allow for the time until the next clock
> >  * interrupt being strictly less than 1 clock tick, but we don't want
> >  * that here since we want to appear to be in sync with the clock
> >  * interrupt even when we're delayed.
> >  */
> > void
> > realitexpire(void *arg)
> > {
> >         struct proc *p;
> >         struct timeval ctv, ntv;
> >
> >         p = (struct proc *)arg;
> >         PROC_LOCK(p);
> >         kern_psignal(p, SIGALRM);
> >         if (!timevalisset(&p->p_realtimer.it_interval)) {
> >                 timevalclear(&p->p_realtimer.it_value);
> >                 if (p->p_flag & P_WEXIT)
> >                         wakeup(&p->p_itcallout);
> >                 PROC_UNLOCK(p);
> >                 return;
> >         }
> >         for (;;) {
> >                 timevaladd(&p->p_realtimer.it_value,
> >                     &p->p_realtimer.it_interval);
> >                 getmicrouptime(&ctv);
> >                 if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
> >                         ntv = p->p_realtimer.it_value;
> >                         timevalsub(&ntv, &ctv);
> >                         callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
> >                             realitexpire, p);
> >                         PROC_UNLOCK(p);
> >                         return;
> >                 }
> >         }
> >         /*NOTREACHED*/
> > }
> >
> > Paul, try this patch for sys/kern/kern_event.c.  It uses the same approach as
> > seitimer() above:
> >
> > Index: kern_event.c
> > ===================================================================
> > --- kern_event.c        (revision 238365)
> > +++ kern_event.c        (working copy)
> > @@ -538,7 +538,7 @@ filt_timerexpire(void *knx)
> >
> >         if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
> >                 calloutp = (struct callout *)kn->kn_hook;
> > -               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata),
> > +               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1,
> >                     filt_timerexpire, kn);
> >         }
> >  }
> >
> > --
> > John Baldwin
> > _______________________________________________
> > freebsd-hackers at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
> 
> John,
> I don't think it's good to decrease by a unit the 'ticks' you pass to
> callout_reset_* KPI.
> If this have to be fixed it should be fixed at the callout level and
> not at the consumer level. In other words, subsystems that makes use
> of callout_reset_* should not deal with the inherent limitations of
> callout precision, as it is right now.
> 
> Davide

I'm not sure it can be fixed in the callout_reset_* routines because
those routines have no idea whether the passed-in tick count came from
tvtohz() where the +1 gets applied.  Either tvtohz() has to stop adding
a tick, then other code that needs the adjustment has to be changed to
add it, or code that doesn't need the adjustment has to subtract it back
out.  Neither prospect is very appealing, really.

-- Ian



More information about the freebsd-hackers mailing list