calcru: negative time ... followed by freeze

Mon Jun 21 12:53:53 GMT 2004

On Mon, 21 Jun 2004, Don Lewis wrote:

> On 21 Jun, Don Lewis wrote:
> > On 21 Jun, Bruce Evans wrote:
> >> Ah, here is a likely cause of the bug in -current:
> >>
> >> % 	if (p == curthread->td_proc) {
> >> % 		/*
> >> % 		 * Adjust for the current time slice.  This is actually fairly
> >> % 		 * important since the error here is on the order of a time
> >> % 		 * quantum, which is much greater than the sampling error.
> >> % 		 * XXXKSE use a different test due to threads on other
> >> % 		 * processors also being 'current'.
> >> % 		 */
> >> % 		binuptime(&bt);
> >> % 		bintime_sub(&bt, PCPU_PTR(switchtime));
> >> % 		bintime_add(&bt, &p->p_runtime);
> >> % 	} else
> >> % 		bt = p->p_runtime;
> >>
> >> The XXXKSE comment is correct that this might be broken.  If the (p
> >> ...
>
> It looks like another way to trigger this in the SMP case would be to
> have two threads of the same process running at the same time, and for
> the second thread to call calcru() to have been running for a shorter
> period of time than when the first thread called calcru().

I think my test for the problem case covers this.  It's necessary to
add the runtime for the current slice for all threads in the process,
but the above adds it for at most one thread.  I set a flag to indicate
that there is a problem if any running thread in the process can't be
handled:

%%%
	rt = p->p_runtime;
	problemcase = 0;
	FOREACH_THREAD_IN_PROC(p, td) {
		/*
		 * Adjust for the current time slice.  This is actually fairly
		 * important since the error here is on the order of a time
		 * quantum, which is much greater than the sampling error.
		 */
		if (td == curthread) {
			binuptime(&bt);
			bintime_sub(&bt, PCPU_PTR(switchtime));
			bintime_add(&rt, &bt);
		} else {
			/*
			 * This case should add the current time less the
			 * switch time as above, but the switch time is
			 * inaccessible.  So we might end up with rt too
			 * small and then the monotonicity check might detect
			 * the problem.  Just set a flag to avoid warning
			 * about this known problem.
			 */
			problemcase = 1;
		}
	}
%%%

Oops, this is missing the critical TD_IS_RUNNING(td) condition for setting
problemcase.

> In the SMP cases, it probably makes sense to just silently to do
> 	if (tu < ptu)
> 		tu = ptu
> because of the complications of attempting to do an accurate
> calculation.

I'm tempted to do that because the debugging code is so ugly, but the
diagnostic has been useful for finding bugs that weren't all there when
it was written, so I'd prefer not to remove it.

> > There must be some !SMP trigger for this as well.  I just checked and I
> > was able to trigger this on my Pentium-M laptop as well by leaning on
> > the ^T key while I was logged on via ssh and running 'portupgrade -aP'.
> >
> > Jun 21 00:41:31 hairball kernel: calcru: negative time of 23169 usec for pid 44653 (sh)
> > Jun 21 00:41:32 hairball kernel: calcru: negative time of 21990 usec for pid 44665 (sh)
> >
> > I didn't use ^T on my Athlon box.  I might have had top running, though.
> >
> > It's interesting that this bug only seems to get triggered on /bin/sh.
> > Maybe it is fork()/exit()/wait() related?
>
> It looks like the bug is in the exit code.  I tweaked the printf() in
> calcru() to print out p_state, p_flag, and p_sflag in addition to the
> other info.  In all cases, the processes that trigger the printf were
> zombies, and show up as [running] in ttyinfo() on a uniprocessor box.

There is a PR about this (#52490).  The oops in my test fixes it for bogus
reasons.  I'm currently adding similar printfs to help figure out what is
going wrong.

> Jun 21 03:17:03 hairball kernel: calcru: negative time of 179 usec for
> pid 4543 (sh) p_state=0x2 p_flag=0x2002 p_sflag=0x1
>
> load: 0.71  cmd: sh 4543 [running] 0.00u 0.00s 3% 0k
>
> p_runtime only gets updated in mi_switch(), and it appears that it never
> gets updated after the calcru() call in exit1().

That explains the problem.  The calcru() values sets up to date (final)
values for the components of ptu.  When we look at the process after it
has become a zombie, we use only p_runtime since the process is not
running, but p_runtime is stale.

> It also looks like a bug that a zombie remains in the [running] state
> and thus looks interesting to ttyinfo().

I think it isn't really running.  ttyinfo() should pick it if it is the
only process the terminal.  ttyinfo() does pick it for the zombie in the
test program in the PR, and reports that it is running, but ps reports
it correctly as a zomble.

Bruce