Kernel hanging after today cvsup at Timecounter "TSC

Wed May 23 17:02:20 UTC 2007

2007/5/23, Robert Watson <rwatson at freebsd.org>:
>
> On Tue, 22 May 2007, David Wolfskill wrote:
>
> >> I saw an identical problem when I booted a kernel from yesterday with
> >> kernel modules from a week or so earlier, only mine froze before getting to
> >> the Timecounter line, rather, immediately after the vga0 line.  Perhaps
> >> we're looking at an issue with mixing modules with a kernel from
> >> before/after the compiler upgrade?
> >
> > I'm about as certain as I can be that such "module-mixing" did not cause
> > the hang that I saw on my laptop today, as I was able to successfully boot
> > from a kernel built (along with world, at the time) on 19 May (after the gcc
> > 4 import).
> >
> > In my case, the keyboard was unresponsive (e.g., didn't go into "scroll
> > lock" mode when I used the keyboard chord to request that), and the fan
> > started gaining speed pretty rapidly, so I elected to power-cycle the
> > machine after a few seconds of trying less disruptive means of getting its
> > attention.
> >
> > I believe the last message displayed had the word "probe" in it.  I am
> > certain that the system hadn't tried to mount any file systems yet.
> >
> > Sorry I don't have more specifics handy.  I did save the hanging kernel, of
> > course.
>
> Indeed, looks like I was on the wrong track -- with INVARIANTS enabled, I see
> something more sensible.  Apologies for any typos, this is hand-transcribed:
>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> panic: corrupt spinlock
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at kdb_enter+0x32: leave
> db> bt
> Tracing pid 0 tid 0 td 0xc0b36620
> kdb_enter(...) at kdb_enter+0x32
> panic(...) at panic+0xc5
> statclock(...) at statclock+0x108
> hardclock(...) at hardclock+0x37
> clkintr(...) at clkintr+0xd2
> intr_execute_handlers(...) at intr_execute_handlers+0xe5
> atpic_handle_intr(...) at atpic_handle_intr+0xba
> Xatpic_intr0() at Xatpic_intr0+0x21
> --- interrupt, eip = 0xc099409b, esp = 0xc1020cb7, ebp = 0xc1020cbc ---
> spinlock_exit(...) at spinlock_exit+0x2b
> _mtx_unlock_spin_flags(...) at _mtx_unlock_spin_flags+0xdc
> atpic_enable_source(...) at atpic_enable_source+0x86
> intr_add_handler(...) at intr_add_handler+0xb3
> cpu_initclocks(...) at cpu_initclocks+0x50
> initclocks(...) at initclocks+0xb
> mi_startup(...) at mi_startup+0x96
> begin() at begin+0x2c
>
> I've uploaded the actual stack trace with arguments as
>
>   http://www.watson.org/~robert/freebsd/20070523-panic.png

Ok, it seems that the new time_lock spinlock is initialized too late:
cpu_initclocks() is in fact called before the lock is initialized.

It is enough to move the mtx_init() call so that it comes just before
the cpu_initclocks() call in kern_clock.c::initclocks().
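
To make the ordering issue concrete, here is a minimal userspace sketch.
It is not the actual kern_clock.c code: every toy_* name is hypothetical,
and it assumes that the clock interrupt path takes the lock and that a
tick can arrive as soon as the clock source is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel spin mutex; NOT the real struct mtx. */
struct toy_spinlock {
	bool initialized;
	bool held;
};

static struct toy_spinlock toy_time_lock;	/* stands in for time_lock */
static bool toy_panicked;			/* set where the kernel would panic */

/* Stand-in for mtx_init(): marks the lock as usable. */
static void
toy_mtx_init(struct toy_spinlock *l)
{
	assert(l != NULL);
	l->initialized = true;
	l->held = false;
}

/* Stand-in for the clock interrupt path (statclock() in the trace). */
static void
toy_clock_tick(void)
{
	if (!toy_time_lock.initialized) {
		/* In the trace, panic() is reached from statclock(). */
		toy_panicked = true;
		return;
	}
	toy_time_lock.held = true;	/* "lock" */
	/* ... clock accounting would happen here ... */
	toy_time_lock.held = false;	/* "unlock" */
}

/*
 * Stand-in for cpu_initclocks(): we assume that enabling the interrupt
 * source can deliver a tick immediately.
 */
static void
toy_cpu_initclocks(void)
{
	toy_clock_tick();
}

/*
 * Stand-in for initclocks().  Returns 0 if the first tick found an
 * initialized lock, -1 if it arrived before the lock was set up.
 */
int
toy_initclocks(bool init_lock_first)
{
	toy_time_lock.initialized = false;
	toy_time_lock.held = false;
	toy_panicked = false;

	if (init_lock_first)
		toy_mtx_init(&toy_time_lock);	/* proposed ordering */
	toy_cpu_initclocks();
	if (!init_lock_first)
		toy_mtx_init(&toy_time_lock);	/* too late */

	return (toy_panicked ? -1 : 0);
}
```

In this sketch toy_initclocks(true) models the proposed ordering and
returns 0, while toy_initclocks(false) models the late initialization
and returns -1.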

Can someone try that and report if it works?

I'm sorry about that, but I (and presumably the other people testing the
patch) never hit this problem on any of the machines we used (probably
because it is only triggered when attaching vgaX).

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein