Strange issue after early AP startup

John Baldwin jhb at freebsd.org
Tue Jan 17 19:10:22 UTC 2017


On Tuesday, January 17, 2017 07:04:15 PM Hans Petter Selasky wrote:
> On 01/17/17 16:50, John Baldwin wrote:
>  > On Monday, January 16, 2017 10:10:16 PM Hans Petter Selasky wrote:
>  >> On 01/16/17 20:31, John Baldwin wrote:
>  >>> On Monday, January 16, 2017 04:51:42 PM Hans Petter Selasky wrote:
>  >>>> Hi,
>  >>>>
>  >>>> When booting I observe an additional 30-second delay after this print:
>  >>>>
>  >>>>> Timecounters tick every 1.000 msec
>  >>>>
>  >>>> ~30 second delay and boot continues like normal.
>  >>>>
>  >>>> Checking "vmstat -i" reveals that some timers have been running loose.
>  >>>>
>  >>>>> cpu0:timer                         44300        442
>  >>>>> cpu1:timer                         40561        404
>  >>>>> cpu3:timer                      48462822     483058
>  >>>>> cpu2:timer                      48477898     483209
>  >>>>
>  >>>> Trying to add delays and/or prints around the Timecounters printout
>  >>>> makes the issue go away. Any ideas for debugging?
>  >>>
>  >>> I have generally used KTR tracing to trace what is happening during
>  >>> boot to debug EARLY_AP_STARTUP issues.
>  >>>
>  >>
>  >> Hi John,
>  >>
>  >> What happens is that getnextcpuevent(0) keeps on returning
>  >> "state->nextcall" which is in the past for CPU #2 and #3 on my box.
>  >>
>  >> In "cpu_new_callout()" there is a check if "bt >= state->nextcall",
>  >> which I suspect is true, so "state->nextcall" never gets set to real
>  >> minimum sbintime.
>  >>
>  >> The attached patch fixes the problem for me, but I'm not 100% sure 
> if it
>  >> is correct.
>  >
> 
> Hi,
> 
>  > I think we want to be honoring any currently scheduled callouts.
> 
> The problem here is that we might be changing the clocksource, then 
> sbinuptime() will change too, so I think the value should be reset by 
> configtimer() and then corrected at the next call to callout_process().
> 
>  > You could
>  > do that by setting it to 'cc_firstevent' of the associated CPU, but in
>  > practice 'state->nextcall' should already be set to that (it is 
> initalized
>  > to SBT_MAX in cpu_initclocks_bsp() and is then only set to other 
> values due
>  > to cpu_new_callout()).  Keep in mind that configtimer() is not just 
> called
>  > from boot, but is also invoked when starting/stopping the profiling 
> timer.
>  >
> 
>  > However, when setting 'nextevent' (which is used to schedule the next 
> timer
>  > interrupt), we should be honoring the existing 'nextcall' if it is sooner
>  > than the next hardclock.
> 
> Does this matter for the first tick? How often is configtimer() called?

As I said, it is called at runtime when profclock is started / stopped, not
just at boot.  Those changes at runtime probably have existing callouts
active and your change will not process any callouts until the next hardclock
tick fires (but only because you are setting nextcallopt to the bogus
'next' value).

>  > (One odd thing is that even in your case the first call to 
> handleevents(),
>  > the 'now => state->nextcallout' check in handleevents() should be true
>  > which resets both nextcall and nextcallopt and invokes 
> callout_process().)
> 
> Let me take you through the failure path, by code inspection:

I would really appreciate it if you could add traces to find out what actually
happens rather than what seems to happen by looking at the code. :-/

0) cpu_initclocks_bsp() is called and init's nextcall and nexcallopt to SBT_MAX
   similar to your change.  If no callout is scheduled before configtimer()
   then they remain set to SBT_MAX.  Your current patch happens to trigger a
   (bogus) call to callout_process() on the first hardclock() because it
   sets nextcallopt to 'next' even though no callout is actually scheduled to
   fire at time 'next'.

> 1) configtimer() is called and we init nextcall and nextcallopt:
> 
>  >                 next = now + timerperiod;
> ...
>  >                       state->nextcall = next;
>  >                       state->nextcallopt = next;

These both say "the next callout() should fire at 'next' which is the time of
the next hardclock()", even though there may be no callouts scheduled (in which
case both of these fields should be set to SBT_MAX from the call to
cpu_initclocks_bsp(), or there may be callouts scheduled in which case 'nextcall'
and 'nextcallopt' will reflect the time that those callouts are already
scheduled for and this overwrites that).

> 2) Any callout_reset() calls cpu_new_callout():
> 
>  >          */
>  >         state->nextcallopt = bt_opt;
>  >         if (bt >= state->nextcall)
> We follow this path, because "bt" is surely based on sbinuptime() and is 
> greater or equal to state->nextcall. Note that state->nextcallopt is 
> updated to bt_opt, which is in the future.

Note, my patch should _leave_ nextcall at SBT_MAX (from cpu_initclocks_bsp())
unless there was already an earlier call to callout_reset().  IOW, it
should be a NOP for the purposes of this branch compared with your change.
(You could add a warning to print out if 'nextcall' != SBT_MAX during boot
and see if it fires for example.)

>  >                 goto done;
>  >         state->nextcall = bt;
> 
> 3) getnextcpuevent(0) is called by the fast timercb() to setup the next 
> event:
> 
>  >         state = DPCPU_PTR(timerstate);
>  >         /* Handle hardclock() events, skipping some if CPU is idle. */
>  >         event = state->nexthard;
> ...
>  >         /* Handle callout events. */
>  >         if (event > state->nextcall)
> We then go looping into this path, because state->nextcall is still 
> equal to "next" as in step 1) which is now in the past, until "now >= 
> state->nextcallopt" inside handleevents(), which clears this condition.
>  >                 event = state->nextcall;
> ...
>  >         return (event);

I'm curious if there is a callout_reset() that has set 'nextcall' to a time
that is effectively before 'now'.

Maybe add a printf like this:

Index: kern_clocksource.c
===================================================================
--- kern_clocksource.c	(revision 312301)
+++ kern_clocksource.c	(working copy)
@@ -498,12 +498,18 @@ configtimer(int start)
 		CPU_FOREACH(cpu) {
 			state = DPCPU_ID_PTR(cpu, timerstate);
 			state->now = now;
+			printf("%s: CPU %d: now %jd nextcall %jd nextcallopt %jd next %jd\n", __func__, cpu, state->nextcall, state->nextcall, next);
 #ifndef EARLY_AP_STARTUP
 			if (!smp_started && cpu != CPU_FIRST())
 				state->nextevent = SBT_MAX;
 			else
 #endif

In particular what I am worried about with your patch is that for
post-boot calls to configtimer() you will delay any previously-scheduled
callouts until the next hardclock.

-- 
John Baldwin


More information about the freebsd-current mailing list