kqueue periodic timer confusion

Thu Jul 12 18:52:30 UTC 2012

On Thu, 2012-07-12 at 10:40 -0500, Ian Lepore wrote:
> On Thu, 2012-07-12 at 17:08 +0200, Davide Italiano wrote:
> > On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin <jhb at freebsd.org> wrote:
> > > On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
> > >> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
> > >> > On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
> > >> > > On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
> > >> > > > Hi,
> > >> > > >
> > >> > > > Sorry about this repost but I'm confused about the responses I received
> > >> > > > in my last post so I'm looking for some clarification.
> > >> > > >
> > >> > > > Specifically, I though I could use the kqueue timer as essentially a
> > >> > > > "drop in" replacement for linuxfd_create/read, but was surprised that
> > >> > > > the accuracy of the kqueue timer is much less than what I need for my
> > >> > > > application.
> > >> > > >
> > >> > > > So my confusion at this point is whether this is consider to be a bug or
> > >> > > > "feature"?
> > >> > > >
> > >> > > > Here's some test code if you want to verify the problem:
> > >> > > >
> > >> > > > #include <stdio.h>
> > >> > > > #include <stdlib.h>
> > >> > > > #include <string.h>
> > >> > > > #include <unistd.h>
> > >> > > > #include <errno.h>
> > >> > > > #include <sys/types.h>
> > >> > > > #include <sys/event.h>
> > >> > > > #include <sys/time.h>
> > >> > > >
> > >> > > > int
> > >> > > > main(void)
> > >> > > > {
> > >> > > >         int i,msec;
> > >> > > >         int kq,nev;
> > >> > > >         struct kevent inqueue;
> > >> > > >         struct kevent outqueue;
> > >> > > >         struct timeval start,end;
> > >> > > >
> > >> > > >         if ((kq = kqueue()) == -1) {
> > >> > > >                 fprintf(stderr, "kqueue error!? errno = %s",
> > >> > strerror(errno));
> > >> > > >                 exit(EXIT_FAILURE);
> > >> > > >         }
> > >> > > >         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
> > >> > > >
> > >> > > >         gettimeofday(&start, 0);
> > >> > > >         for (i = 0; i < 50; i++) {
> > >> > > >                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) ==
> > >> > -1) {
> > >> > > >                         fprintf(stderr, "kevent error!? errno = %s",
> > >> > strerror(errno));
> > >> > > >                         exit(EXIT_FAILURE);
> > >> > > >                 } else if (outqueue.flags & EV_ERROR) {
> > >> > > >                         fprintf(stderr, "EV_ERROR: %s\n",
> > >> > strerror(outqueue.data));
> > >> > > >                         exit(EXIT_FAILURE);
> > >> > > >                 }
> > >> > > >         }
> > >> > > >         gettimeofday(&end, 0);
> > >> > > >
> > >> > > >         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 +
> > >> > end.tv_usec - start.tv_usec) / 1000) - 1000);
> > >> > > >
> > >> > > >         printf("msec = %d\n", msec);
> > >> > > >
> > >> > > >         close(kq);
> > >> > > >         return EXIT_SUCCESS;
> > >> > > > }
> > >> > > >
> > >> > > >
> > >> > >
> > >> > > What you are seeing is "just the way FreeBSD currently works."
> > >> > >
> > >> > > Sleeping (in most all of its various forms, and I've just looked at the
> > >> > > kevent code to verify this is true there) is handled by converting the
> > >> > > amount of time to sleep (usually specified in a timeval or timespec
> > >> > > struct) to a count of timer ticks, using an internal routine called
> > >> > > tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
> > >> > > account for the current tick.  Whether that's a good idea or not (it
> > >> > > probably was once, and probably not anymore) it's how things currently
> > >> > > work, and could explain the fairly consistant +1ms you're seeing.
> > >> >
> > >> > This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
> > >> > installs a periodic callout that executes KNOTE() and then resets itself (via
> > >> > callout_reset()) each time it runs.  This should generally be closer to
> > >> > regulary spaced intervals than something that does:
> > >> >
> > >>
> > >> In what way is it irrelevant?  That is, what did I miss?  It appears to
> > >> me that the next callout is scheduled by calling timertoticks() passing
> > >> a count of milliseconds, that count is converted to a struct timeval and
> > >> passed to tvtohz() which is where the +1 adjustment happens.  If you ask
> > >> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
> > >> There is some time, likely a small number of microseconds, that you've
> > >> consumed of the current tick, and that's what the +1 in tvtohz() is
> > >> supposed to account for according to the comments.
> > >>
> > >> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
> > >> and then adds one tick on top of that.  That seems not quite right to
> > >> me, except that it is a way to g'tee that you don't return early, and
> > >> that is the one promise made by sleep routines on any OS; those magical
> > >> "at least" words always appear in the docs.
> > >>
> > >> Actually what I'm missing (that I know of) is how the scheduler works.
> > >> Maybe the +1 adjustment to account for the fraction of the current tick
> > >> you've already consumed is the right thing to do, even when that
> > >> fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
> > >> behavior that I know nothing about.
> > >
> > > Ohhhhh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
> > > case.  That is, the +1 makes sense when you are computing a one-time delta
> > > for things like nanosleep().  It is incorrect when computing a periodic
> > > delta such as for computing the interval for an itimer (setitimer) or
> > > EVFILT_TIMER().
> > >
> > > Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
> > >
> > > sys/kern/kern_time.c:
> > >
> > > /*
> > >  * Real interval timer expired:
> > >  * send process whose timer expired an alarm signal.
> > >  * If time is not set up to reload, then just return.
> > >  * Else compute next time timer should go off which is > current time.
> > >  * This is where delay in processing this timeout causes multiple
> > >  * SIGALRM calls to be compressed into one.
> > >  * tvtohz() always adds 1 to allow for the time until the next clock
> > >  * interrupt being strictly less than 1 clock tick, but we don't want
> > >  * that here since we want to appear to be in sync with the clock
> > >  * interrupt even when we're delayed.
> > >  */
> > > void
> > > realitexpire(void *arg)
> > > {
> > >         struct proc *p;
> > >         struct timeval ctv, ntv;
> > >
> > >         p = (struct proc *)arg;
> > >         PROC_LOCK(p);
> > >         kern_psignal(p, SIGALRM);
> > >         if (!timevalisset(&p->p_realtimer.it_interval)) {
> > >                 timevalclear(&p->p_realtimer.it_value);
> > >                 if (p->p_flag & P_WEXIT)
> > >                         wakeup(&p->p_itcallout);
> > >                 PROC_UNLOCK(p);
> > >                 return;
> > >         }
> > >         for (;;) {
> > >                 timevaladd(&p->p_realtimer.it_value,
> > >                     &p->p_realtimer.it_interval);
> > >                 getmicrouptime(&ctv);
> > >                 if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
> > >                         ntv = p->p_realtimer.it_value;
> > >                         timevalsub(&ntv, &ctv);
> > >                         callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
> > >                             realitexpire, p);
> > >                         PROC_UNLOCK(p);
> > >                         return;
> > >                 }
> > >         }
> > >         /*NOTREACHED*/
> > > }
> > >
> > > Paul, try this patch for sys/kern/kern_event.c.  It uses the same approach as
> > > seitimer() above:
> > >
> > > Index: kern_event.c
> > > ===================================================================
> > > --- kern_event.c        (revision 238365)
> > > +++ kern_event.c        (working copy)
> > > @@ -538,7 +538,7 @@ filt_timerexpire(void *knx)
> > >
> > >         if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
> > >                 calloutp = (struct callout *)kn->kn_hook;
> > > -               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata),
> > > +               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1,
> > >                     filt_timerexpire, kn);
> > >         }
> > >  }
> > >
> > > --
> > > John Baldwin
> > > _______________________________________________
> > > freebsd-hackers at freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
> > 
> > John,
> > I don't think it's good to decrease by a unit the 'ticks' you pass to
> > callout_reset_* KPI.
> > If this have to be fixed it should be fixed at the callout level and
> > not at the consumer level. In other words, subsystems that makes use
> > of callout_reset_* should not deal with the inherent limitations of
> > callout precision, as it is right now.
> > 
> > Davide
> 
> I'm not sure it can be fixed in the callout_reset_* routines because
> those routines have no idea whether the passed-in tick count came from
> tvtohz() where the +1 gets applied.  Either tvtohz() has to stop adding
> a tick, then other code that needs the adjustment has to be changed to
> add it, or code that doesn't need the adjustment has to subtract it back
> out.  Neither prospect is very appealing, really.
> 

One other thing to note that is that mac os has the same problem. Does
that make any difference in determining how to do the fix?

> -- Ian
> 
-- 
Paul Albrecht