Revision 309657 to stack_machdep.c renders unbootable system

Fri Dec 16 23:19:13 UTC 2016

On Wednesday, December 14, 2016 04:50:21 PM Mark Johnston wrote:
> On Wed, Dec 14, 2016 at 03:48:04PM -0800, Steven G. Kargl wrote:
> > On Wed, Dec 14, 2016 at 02:10:48PM -0800, Mark Johnston wrote:
> > > On Wed, Dec 14, 2016 at 12:14:16PM -0800, Mark Johnston wrote:
> > > > On Wed, Dec 14, 2016 at 11:49:26AM -0800, Steven G. Kargl wrote:
> > > > > Well, after 3 days of bisection, I finally found the commit
> > > > > that renders my system unbootable.  The system does not panic.
> > > > > It simply gets stuck in some state.  Nonfunctional keyboard,
> > > > > so can't break into debugger.  No serial console available.
> > > > > The verbose dmesg.boot for a working kernel from revision
> > > > > 309656 is at
> > > > > 
> > > > > http://troutmask.apl.washington.edu/~kargl/freebsd/dmesg.309656.txt
> > > > > 
> > > > > The kernel config file is at
> > > > > 
> > > > > http://troutmask.apl.washington.edu/~kargl/freebsd/SPEW.txt
> > > > > 
> > > > > In looking at /usr/src/UPDATING, there is no warning that one
> > > > > can create a boat anchor by upgrading to 309657.  If compiling
> > > > > a kernel with 'options DDB' is no longer supported, this should
> > > > > be stated in UPDATING.  Or, UPDATING should state that 'options
> > > > > DDB' requires 'options STACK'.  Or, 'options DDB' should simply
> > > > > to the right thing and pull in whatever 'option STACK' does. 
> > > > 
> > > > It is supported though - the point of that change was to fix a problem
> > > > that occurred when DDB is configured but STACK isn't. While testing I
> > > > tried every combination of the two options, and I just tried and
> > > > successfully booted a kernel with DDB and !STACK.
> > > > 
> > > > Does the kernel boot successfully if STACK is added to your
> > > > configuration?
> > > 
> > > I tried your config (plus virtio drivers) and was able to reproduce the
> > > hang in bhyve. Adding STACK "fixed" the hang, as did reverting part of
> > > my change to re-add dead code into the kernel. My VM was always hanging
> > > after printing
> > > 
> > > 000.000050 [ 426] vtnet_netmap_attach       virtio attached txq=1, txd=1024 rxq=1, rxd=1024
> > > 
> > > Sure enough, removing "device netmap" from your config also fixes the
> > > hang. When the hang occurs, I can see with "bhyvectl --get-rip" that
> > > we're stuck in DELAY(), but I can't get a stack at that point. I think
> > > my change is an innocent bystander - it just happened to expose a latent
> > > issue elsewhere.
> > > 
> > > I don't have much more time to look at this right now, but I'll look
> > > into it more tonight.
> > 
> > Yes, adding STACK got me to a booting kernel.  I can't remember
> > why I added netmap to my config file.  Re-adding dead code seems to
> > point to some memory corruption issue or a rogue pointer. :(
> 
> It's not quite that bad, as it turns out. The key is that
> adding/removing the dead code changes the ordering of the items in the
> sysinit linker set. I discovered that if the ctl(4) module is
> initialized before the vtnet driver attaches, the hang occurs, and
> reverting my commit results in a sysinit order where vtnet comes
> _before_ ctl(4). So my change triggers the problem just because it
> happens to perturb something in the compile-time linker.
> 
> > 
> > BTW, I think it would be prudent to add something like 
> > 
> >   20161206:
> >      At revision 309657, 'options STACK' was introduced into
> >      sys/x86/x86/mstack_machdep.c.  Old kernel configuration files
> >      that included 'options DDB' are now required to include also
> >      'options STACK'.
> > 
> > to UPDATING or some such wording.  I was jumping from circ Oct 10th world
> > to top of tree, and got caught by ~3000 commits.
> 
> The issue actually seems to be in 4BSD, and more specifically in r308564
> and r308565. Switching to ULE or reverting either of those two commits
> fixes the hang. Here's what happens:
> 
> 1. ctl_init() runs and creates ctl_thresh_thread. This thread's main
>    loop cause pause(9) when it has no work to do. During boot, pause(9)
>    just calls DELAY() and does not yield the CPU.
> 2. thread0 attaches the vtnet driver. As part of this, it creates and
>    starts some high-priority taskqueue threads in
>    _taskqueue_thread_start(). They're added to the scheduler with:
> 
>    thread_lock();
>    sched_pri(...);
>    sched_add(...);
>    thread_unlock();
> 
>    4BSD's sched_add() will call maybe_preempt() in this case, which as
>    of r308564 will unconditionally set td_owepreempt in the current
>    thread.
> 3. thread_unlock() will release the critical section held by the current
>    thread and because td_owepreempt is set, we'll yield the CPU. The
>    taskqueue threads have nothing to do, but ctl_thresh_thread runs
>    and ends up busy-waiting in pause() forever.
> 
> r308565 removes a check in maybe_preempt() that would have stopped
> td_owepreempt from being set. Before r308564, maybe_preempt() would have
> switched directly to the new thread and apparently always switched back
> immediately.
> 
> I'm not sure what the correct fix is - jhb might have an idea. I wonder
> if pause() should try to yield periodically when called during boot.

So the hack in pause() is probably not as necessary now.  In particular, I
think we only need it for thread0, not for other threads.  The patch below
worked for me with SPEW's config:

Index: kern_synch.c
===================================================================

--- kern_synch.c	(revision 310128)
+++ kern_synch.c	(working copy)
@@ -321,7 +321,8 @@ pause_sbt(const char *wmesg, sbintime_t sbt, sbint
 	if (sbt == 0)
 		sbt = tick_sbt;
 
-	if (cold || kdb_active || SCHEDULER_STOPPED()) {
+	if ((cold && curthread == &thread0) || kdb_active ||
+	    SCHEDULER_STOPPED()) {
 		/*
 		 * We delay one second at a time to avoid overflowing the
 		 * system specific DELAY() function(s):



-- 
John Baldwin