kqueue periodic timer confusion

Thu Jul 19 20:33:30 UTC 2012

on 19/07/2012 18:11 Davide Italiano said the following:
> On Thu, Jul 19, 2012 at 5:06 PM, Paul Albrecht <albrecht at glccom.com> wrote:
>> On Fri, 2012-07-13 at 07:22 -0500, Davide Italiano wrote:
>>> On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin <jhb at freebsd.org> wrote:
>>>> On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
>>>>> On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin <jhb at freebsd.org> wrote:
>>>>>> On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
>>>>>>> On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
>>>>>>>> On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
>>>>>>>>> On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Sorry about this repost but I'm confused about the responses I received
>>>>>>>>>> in my last post so I'm looking for some clarification.
>>>>>>>>>>
>>>>>>>>>> Specifically, I though I could use the kqueue timer as essentially a
>>>>>>>>>> "drop in" replacement for linuxfd_create/read, but was surprised that
>>>>>>>>>> the accuracy of the kqueue timer is much less than what I need for my
>>>>>>>>>> application.
>>>>>>>>>>
>>>>>>>>>> So my confusion at this point is whether this is consider to be a bug or
>>>>>>>>>> "feature"?
>>>>>>>>>>
>>>>>>>>>> Here's some test code if you want to verify the problem:
>>>>>>>>>>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>> #include <string.h>
>>>>>>>>>> #include <unistd.h>
>>>>>>>>>> #include <errno.h>
>>>>>>>>>> #include <sys/types.h>
>>>>>>>>>> #include <sys/event.h>
>>>>>>>>>> #include <sys/time.h>
>>>>>>>>>>
>>>>>>>>>> int
>>>>>>>>>> main(void)
>>>>>>>>>> {
>>>>>>>>>>         int i,msec;
>>>>>>>>>>         int kq,nev;
>>>>>>>>>>         struct kevent inqueue;
>>>>>>>>>>         struct kevent outqueue;
>>>>>>>>>>         struct timeval start,end;
>>>>>>>>>>
>>>>>>>>>>         if ((kq = kqueue()) == -1) {
>>>>>>>>>>                 fprintf(stderr, "kqueue error!? errno = %s",
>>>>>>>> strerror(errno));
>>>>>>>>>>                 exit(EXIT_FAILURE);
>>>>>>>>>>         }
>>>>>>>>>>         EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);
>>>>>>>>>>
>>>>>>>>>>         gettimeofday(&start, 0);
>>>>>>>>>>         for (i = 0; i < 50; i++) {
>>>>>>>>>>                 if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) ==
>>>>>>>> -1) {
>>>>>>>>>>                         fprintf(stderr, "kevent error!? errno = %s",
>>>>>>>> strerror(errno));
>>>>>>>>>>                         exit(EXIT_FAILURE);
>>>>>>>>>>                 } else if (outqueue.flags & EV_ERROR) {
>>>>>>>>>>                         fprintf(stderr, "EV_ERROR: %s\n",
>>>>>>>> strerror(outqueue.data));
>>>>>>>>>>                         exit(EXIT_FAILURE);
>>>>>>>>>>                 }
>>>>>>>>>>         }
>>>>>>>>>>         gettimeofday(&end, 0);
>>>>>>>>>>
>>>>>>>>>>         msec = ((end.tv_sec - start.tv_sec) * 1000) + (((1000000 +
>>>>>>>> end.tv_usec - start.tv_usec) / 1000) - 1000);
>>>>>>>>>>
>>>>>>>>>>         printf("msec = %d\n", msec);
>>>>>>>>>>
>>>>>>>>>>         close(kq);
>>>>>>>>>>         return EXIT_SUCCESS;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What you are seeing is "just the way FreeBSD currently works."
>>>>>>>>>
>>>>>>>>> Sleeping (in most all of its various forms, and I've just looked at the
>>>>>>>>> kevent code to verify this is true there) is handled by converting the
>>>>>>>>> amount of time to sleep (usually specified in a timeval or timespec
>>>>>>>>> struct) to a count of timer ticks, using an internal routine called
>>>>>>>>> tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
>>>>>>>>> account for the current tick.  Whether that's a good idea or not (it
>>>>>>>>> probably was once, and probably not anymore) it's how things currently
>>>>>>>>> work, and could explain the fairly consistant +1ms you're seeing.
>>>>>>>>
>>>>>>>> This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
>>>>>>>> installs a periodic callout that executes KNOTE() and then resets itself (via
>>>>>>>> callout_reset()) each time it runs.  This should generally be closer to
>>>>>>>> regulary spaced intervals than something that does:
>>>>>>>>
>>>>>>>
>>>>>>> In what way is it irrelevant?  That is, what did I miss?  It appears to
>>>>>>> me that the next callout is scheduled by calling timertoticks() passing
>>>>>>> a count of milliseconds, that count is converted to a struct timeval and
>>>>>>> passed to tvtohz() which is where the +1 adjustment happens.  If you ask
>>>>>>> for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
>>>>>>> There is some time, likely a small number of microseconds, that you've
>>>>>>> consumed of the current tick, and that's what the +1 in tvtohz() is
>>>>>>> supposed to account for according to the comments.
>>>>>>>
>>>>>>> The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
>>>>>>> and then adds one tick on top of that.  That seems not quite right to
>>>>>>> me, except that it is a way to g'tee that you don't return early, and
>>>>>>> that is the one promise made by sleep routines on any OS; those magical
>>>>>>> "at least" words always appear in the docs.
>>>>>>>
>>>>>>> Actually what I'm missing (that I know of) is how the scheduler works.
>>>>>>> Maybe the +1 adjustment to account for the fraction of the current tick
>>>>>>> you've already consumed is the right thing to do, even when that
>>>>>>> fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
>>>>>>> behavior that I know nothing about.
>>>>>>
>>>>>> Ohhhhh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
>>>>>> case.  That is, the +1 makes sense when you are computing a one-time delta
>>>>>> for things like nanosleep().  It is incorrect when computing a periodic
>>>>>> delta such as for computing the interval for an itimer (setitimer) or
>>>>>> EVFILT_TIMER().
>>>>>>
>>>>>> Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:
>>>>>>
>>>>>> sys/kern/kern_time.c:
>>>>>>
>>>>>> /*
>>>>>>  * Real interval timer expired:
>>>>>>  * send process whose timer expired an alarm signal.
>>>>>>  * If time is not set up to reload, then just return.
>>>>>>  * Else compute next time timer should go off which is > current time.
>>>>>>  * This is where delay in processing this timeout causes multiple
>>>>>>  * SIGALRM calls to be compressed into one.
>>>>>>  * tvtohz() always adds 1 to allow for the time until the next clock
>>>>>>  * interrupt being strictly less than 1 clock tick, but we don't want
>>>>>>  * that here since we want to appear to be in sync with the clock
>>>>>>  * interrupt even when we're delayed.
>>>>>>  */
>>>>>> void
>>>>>> realitexpire(void *arg)
>>>>>> {
>>>>>>         struct proc *p;
>>>>>>         struct timeval ctv, ntv;
>>>>>>
>>>>>>         p = (struct proc *)arg;
>>>>>>         PROC_LOCK(p);
>>>>>>         kern_psignal(p, SIGALRM);
>>>>>>         if (!timevalisset(&p->p_realtimer.it_interval)) {
>>>>>>                 timevalclear(&p->p_realtimer.it_value);
>>>>>>                 if (p->p_flag & P_WEXIT)
>>>>>>                         wakeup(&p->p_itcallout);
>>>>>>                 PROC_UNLOCK(p);
>>>>>>                 return;
>>>>>>         }
>>>>>>         for (;;) {
>>>>>>                 timevaladd(&p->p_realtimer.it_value,
>>>>>>                     &p->p_realtimer.it_interval);
>>>>>>                 getmicrouptime(&ctv);
>>>>>>                 if (timevalcmp(&p->p_realtimer.it_value, &ctv, >)) {
>>>>>>                         ntv = p->p_realtimer.it_value;
>>>>>>                         timevalsub(&ntv, &ctv);
>>>>>>                         callout_reset(&p->p_itcallout, tvtohz(&ntv) - 1,
>>>>>>                             realitexpire, p);
>>>>>>                         PROC_UNLOCK(p);
>>>>>>                         return;
>>>>>>                 }
>>>>>>         }
>>>>>>         /*NOTREACHED*/
>>>>>> }
>>>>>>
>>>>>> Paul, try this patch for sys/kern/kern_event.c.  It uses the same approach as
>>>>>> seitimer() above:
>>>>>>
>>>>>> Index: kern_event.c
>>>>>> ===================================================================
>>>>>> --- kern_event.c        (revision 238365)
>>>>>> +++ kern_event.c        (working copy)
>>>>>> @@ -538,7 +538,7 @@ filt_timerexpire(void *knx)
>>>>>>
>>>>>>         if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
>>>>>>                 calloutp = (struct callout *)kn->kn_hook;
>>>>>> -               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata),
>>>>>> +               callout_reset_curcpu(calloutp, timertoticks(kn->kn_sdata) - 1,
>>>>>>                     filt_timerexpire, kn);
>>>>>>         }
>>>>>>  }
>>>>>>
>>>>>> --
>>>>>> John Baldwin
>>>>>> _______________________________________________
>>>>>> freebsd-hackers at freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>>>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>>>>>
>>>>> John,
>>>>> I don't think it's good to decrease by a unit the 'ticks' you pass to
>>>>> callout_reset_* KPI.
>>>>> If this have to be fixed it should be fixed at the callout level and
>>>>> not at the consumer level. In other words, subsystems that makes use
>>>>> of callout_reset_* should not deal with the inherent limitations of
>>>>> callout precision, as it is right now.
>>>>
>>>> Given that you are reworking callout already, it would seem a bit odd
>>>> to rework callout a separate time just to fix this bug.  A simple fix
>>>> like this (which follows the same pattern we already use for setitimer())
>>>> is something we can easily merge back to 8 and 9.  Reworking callout in
>>>> a different way than you are already doing, OTOH, is not something we can
>>>> merge trivially.
>>>>
>>>> --
>>>> John Baldwin
>>>
>>> I understand what you mean. Indeed I hadn't thought about the
>>> difficulties of merging that work back. Only a thing: if I'd were you
>>> before committing I'd add a comment explaining the reasons of the
>>> negative correction in the timeout value passed.
>>>
>>
>> ... and you're going to test the kqueue perioic timer after you make
>> your changes to make sure there aren't any regressions? Right?
>>
> Yes, this is the idea.

If you scrolled down here, then probably you have already learned what
over-quoting is :-)

>> The reason I'm asking is because darwin's bsd subsystem uses the callout
>> framework for its kqueue implementation and the periodic timer is of
>> course broken.
>>
>>> Davide
>> --
>> Paul Albrecht
>>

Including signatures into quotes is a sign of a bad mail user agent.

-- 
Andriy Gapon