bin/102747: ppp timer subsystem stops under certain circumstances

Sergey Zaharchenko doublef-ctm at yandex.ru
Fri Sep 1 11:10:24 UTC 2006


>Number:         102747
>Category:       bin
>Synopsis:       ppp timer subsystem stops under certain circumstances
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 01 11:10:21 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Sergey Zaharchenko
>Release:        FreeBSD 7.0-CURRENT i386
>Organization:
Volgograd State Technical University
>Environment:
System: FreeBSD shark.localdomain 7.0-CURRENT FreeBSD 7.0-CURRENT #1:
Mon Aug 28 15:41:24 UTC 2006
root at shark.localdomain:/var/obj/src/usr.src/sys/GENERIC i386
>Description:
There are certain situations which cause the ppp timer subsystem to stop
(that is, there are timers in the list, but none of them ever get
triggered). This is caused by calling setitimer() with a zero it_value
in timer_InitService(). Killing ppp with SIGALRM triggers the first
timer and resumes normal operation. Here's a description of one of the
situations when the offending setitimer() call could happen.

If two timers (call them A and B) are near expiration (less than ~0.05
seconds left before triggering; A is first in the list; B->rest==0,
because they trigger at roughly the same time), and, at the same time,
ppp decides to stop A using StopTimerNoBlock(), the call adjusts
TimerList->rest to the itimer's value ``so that it reflects what's
really happening''.

Now A->rest==RESTVAL(small time value)==0, B->rest==0. Then
B->rest+=A->rest; 0+0==0; B->rest is still zero. A is dequeued, B is now
first in the list, and TimerList==B. timer_InitService() is called,
because the first timer was deleted.

timer_InitService() calls setitimer() with it_value==(a conversion
of)B->rest==0, which stops the itimer according to getitimer(2). If a
timer is later added, B will still sit there with rest==0, preventing
the itimer from launching. Neither B nor any other timer will ever be
serviced, unless some magic intervenes. An example of such magic is the
user killing ppp with SIGALRM. Then B will get serviced and dequeued,
and itimer relaunched. This is not the only situation when
TimerList->rest==0.

>How-To-Repeat:
The described situation appears only rarely in ordinary practice. They
happen when a lot of timers are being started, stopped and expired (for
example, just after the authentification succeeds). A way of detecting
the situation would be to insert the following line into timer.c:

     if (setitimer(ITIMER_REAL, &itimer, NULL) == -1)
       log_Printf(LogERROR, "Unable to set itimer (%s)\n", strerror(errno));
+    if (TimerList->rest==0) raise(SIGSTOP);
   }

recompile and reinstall ppp, add "set log timer" to ppp.conf and run ppp
in the following script:

#!/bin/sh
ppp -dd whatever
# We assume 100 seconds is enough to connect and authenticate to the peer
until ps -o state,comm | grep '^T *ppp$'; do killall -INT ppp; sleep 100;done

The script should exit when the offending call has just been issued. You
can then ``killall -CONT ppp'' and observe the absence of any timer
activity in the log. You can ``killall -ALRM ppp'' to revive it.

>Fix:
Never calling timer_InitService() when TimerList->rest==0 seems to be
the right thing to do.

The following is a patch which avoids situations when TimerList->rest==0
and timer_InitService() is called. 

---------------------------------------------------------------------cut here
--- ppp-7.0-CURRENT.orig/timer.c	Wed Jan  5 19:25:29 2005
+++ ppp-7.0-CURRENT/timer.c	Fri Sep  1 13:37:08 2006
@@ -94,9 +94,16 @@
     return;
   }
 
-  /* Adjust our first delta so that it reflects what's really happening */
+  /*
+   * Adjust the tick count so that it roughly corresponds to reality.
+   * Changing TimerList->rest might cause it to become zero (if
+   * getitimer() returns a value close to zero), and the
+   * timer_InitService() call will call setitimer() with zero it_value,
+   * stopping the itimer.
+   */
+
   if (TimerList && getitimer(ITIMER_REAL, &itimer) == 0)
-    TimerList->rest = RESTVAL(itimer);
+    ticks=(RESTVAL(itimer)-(long)TimerList->rest);
 
   pt = NULL;
   for (t = TimerList; t; t = t->next) {
@@ -156,14 +163,24 @@
 	timer_TermService();	/* Terminate Timer Service */
     }
     if (t->next) {
+      unsigned t_rest=t->rest;
       if (!pt) {		/* t (tp) was the first in the list */
         struct itimerval itimer;
 
         if (getitimer(ITIMER_REAL, &itimer) == 0)
-          t->rest = RESTVAL(itimer);
+          t_rest = RESTVAL(itimer);
       }
-      t->next->rest += t->rest;
-      if (!pt)			/* t->next is now the first in the list */
+      t->next->rest += t_rest;
+
+      /*
+       * It's still possible for t->next->rest to be zero (just
+       * because the next timer should trigger right after the
+       * one being stopped), in which case calling timer_InitService()
+       * would be a bad idea. Don't do that, then. In this case, the
+       * SIGALRM will trigger the next timer, so that's OK.
+       */
+
+      if (!pt&&(t->next->rest>0)) /* t->next is now the first in the list */
         timer_InitService(1);
     }
   } else {
@@ -235,11 +252,18 @@
 {
   struct itimerval itimer;
   struct pppTimer *pt;
-  u_long rest = 0;
+  long rest = 0;	/* should allow negative values */
+
+  /*
+   * Adjust the base time so that the deltas reflect what's really
+   * happening. Changing TimerList->rest might cause it to become zero
+   * (if getitimer() returns a value close to zero), and the
+   * timer_InitService() call will call setitimer() with zero it_value,
+   * stopping the itimer.
+   */
 
-  /* Adjust our first delta so that it reflects what's really happening */
   if (TimerList && getitimer(ITIMER_REAL, &itimer) == 0)
-    TimerList->rest = RESTVAL(itimer);
+    rest=(RESTVAL(itimer)-(long)TimerList->rest);
 
 #define SECS(val)	((val) / SECTICKS)
 #define HSECS(val)	(((val) % SECTICKS) * 100 / SECTICKS)
---------------------------------------------------------------------cut here

Another way to fix the problem is to simply check for TimerList->rest==0
after setitimer() in timer_InitService() and do a raise(SIGALRM) then.
It's simpler, but some of the accuracy is lost.
>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list