R: Re: 6.4-RC2 crashes after a few minutes of uptime

Tue Nov 25 03:35:25 PST 2008

Ken,

I built a GENERIC debug kernel and now have a backtrace for this
problem on 6.4-RC2:

surfer# kgdb /sys/i386/compile/GENERIC/kernel.debug /var/crash/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:
acd0: WARNING - READ_TOC read data overrun 18>12
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x78
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc06d39b9
stack pointer	        = 0x28:0xca865c10
frame pointer	        = 0x28:0xca865c14
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= resume, IOPL = 0
current process		= 19 (swi6: task queue)
trap number		= 12
panic: page fault
Uptime: 16m20s
Physical memory: 179 MB
Dumping 53 MB: 38 22 6

Reading symbols from /boot/kernel/snd_maestro.ko...done.
Loaded symbols for /boot/kernel/snd_maestro.ko
Reading symbols from /boot/kernel/sound.ko...done.
Loaded symbols for /boot/kernel/sound.ko
Reading symbols from /boot/kernel/acpi.ko...done.
Loaded symbols for /boot/kernel/acpi.ko
Reading symbols from /boot/kernel/mach64.ko...done.
Loaded symbols for /boot/kernel/mach64.ko
Reading symbols from /boot/kernel/drm.ko...done.
Loaded symbols for /boot/kernel/drm.ko
#0  doadump () at pcpu.h:165
165	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc06b2e3e in boot (howto=260) at ../../../kern/kern_shutdown.c:410
#2  0xc06b30d4 in panic (fmt=0xc098be6b "%s")
     at ../../../kern/kern_shutdown.c:566
#3  0xc092b1f4 in trap_fatal (frame=0xca865bd0, eva=120)
     at ../../../i386/i386/trap.c:838
#4  0xc092a992 in trap (frame=
       {tf_fs = 8, tf_es = -1038352344, tf_ds = -1038352344,
        tf_edi = -1033627044, tf_esi = -1038289792, tf_ebp = -897164268,
        tf_isp = -897164292, tf_ebx = -1039268288, tf_edx = 0, tf_ecx = 4,
        tf_eax = -1038289760, tf_trapno = 12, tf_err = 0,
        tf_eip = -1066583623, tf_cs = 32, tf_eflags = 589826,
        tf_esp = -1038289792, tf_ss = -897164232})
     at ../../../i386/i386/trap.c:270
#5  0xc0917e2a in calltrap () at ../../../i386/i386/exception.s:139
#6  0xc06d39b9 in turnstile_setowner (ts=0xc20e0640, owner=0x4)
     at ../../../kern/subr_turnstile.c:456
#7  0xc06d3d16 in turnstile_wait (lock=0xc2641aa8, owner=0x4, queue=0)
     at ../../../kern/subr_turnstile.c:661
#8  0xc06a9d2a in _mtx_lock_sleep (m=0xc2641aa8, tid=3256677504, opts=0,
     file=0x0, line=0) at ../../../kern/kern_mutex.c:579
#9  0xc06b2492 in _sema_post (sema=0xc2641aa8, file=0x0, line=0)
     at ../../../kern/kern_sema.c:79
#10 0xc04e7c26 in ata_completed (context=0xc2641a5c, dummy=1)
     at ../../../dev/ata/ata-queue.c:481
#11 0xc06d29a3 in taskqueue_run (queue=0xc21c4100)
     at ../../../kern/subr_taskqueue.c:257
#12 0xc06d2bb6 in taskqueue_swi_run (dummy=0x0)
     at ../../../kern/subr_taskqueue.c:299
#13 0xc069baad in ithread_execute_handlers (p=0xc21ce860, ie=0xc21c4080)
     at ../../../kern/kern_intr.c:682
#14 0xc069bbc8 in ithread_loop (arg=0xc214cb60)
     at ../../../kern/kern_intr.c:766
#15 0xc069aa34 in fork_exit (callout=0xc069bb74 <ithread_loop>,
     arg=0xc214cb60, frame=0xca865d38) at ../../../kern/kern_fork.c:788
#16 0xc0917e8c in fork_trampoline ()
     at ../../../i386/i386/exception.s:208
(kgdb) print panicstr
$1 = 0xc0a8d480 "page fault"
(kgdb)
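
For anyone cross-checking the trap frame in frame #4 against the panic
message: kgdb prints the saved registers as signed 32-bit integers, so
they do not look like the hex addresses in the "Fatal trap 12" block.
The following is a small illustrative sketch (Python, not part of the
original session; the helper name to_u32 is made up for this example)
that reinterprets a few of the values copied from the output above:

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


# Values copied verbatim from frame #4 of the backtrace above.
trap_frame = {
    "tf_eip": -1066583623,
    "tf_ebp": -897164268,
}

for name, value in trap_frame.items():
    print(f"{name} = {to_u32(value):#010x}")

# eva=120 in frame #3 is the faulting address in decimal.
print(f"eva = {120:#x}")
```

Converted this way, tf_eip is 0xc06d39b9 (the reported instruction
pointer, and the address of frame #6 in turnstile_setowner), tf_ebp is
0xca865c14 (the reported frame pointer), and eva is 0x78 (the reported
fault virtual address).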

This panic happened just a few minutes after boot completed, before
anyone had logged in.

Also, I've noticed that after some of these panics, savecore(8) is
unable to recover the core dump from the swap area. During boot, the
system seems to enable the swap partition well before savecore(8) has
a chance to scan it. So I wonder whether, once swap is enabled,
something gets written to the swap partition that overwrites the
signature savecore(8) checks to detect the existence of a core dump?
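
To make the question concrete, here is a rough sketch of the kind of
signature check being described. It rests on assumptions that should be
verified against sys/kerneldump.h and the savecore(8) source: that the
dump header starts with the magic string "FreeBSD Kernel Dump", that a
copy of the header sits in the last sector of the dump device, and that
the sector size is 512 bytes. The function name and the synthetic
"device images" are invented for illustration; the real savecore(8)
performs more checks than this.

```python
# Assumed value of KERNELDUMPMAGIC; verify against sys/kerneldump.h.
KERNELDUMP_MAGIC = b"FreeBSD Kernel Dump"
# Assumed sector size of the dump device.
SECTOR_SIZE = 512


def trailing_header_present(device_image: bytes,
                            sector_size: int = SECTOR_SIZE) -> bool:
    """Return True if the last sector begins with the assumed dump magic."""
    if len(device_image) < sector_size:
        return False
    last_sector = device_image[-sector_size:]
    return last_sector.startswith(KERNELDUMP_MAGIC)


# Synthetic stand-ins for a swap partition, not real dump data.
header = KERNELDUMP_MAGIC.ljust(SECTOR_SIZE, b"\0")
intact = bytes(4 * SECTOR_SIZE) + header
clobbered = bytes(4 * SECTOR_SIZE) + b"\xff" * SECTOR_SIZE

print(trailing_header_present(intact))     # header still in place
print(trailing_header_present(clobbered))  # last sector overwritten
```

If the assumptions hold, any write that lands on that last sector
before savecore(8) runs would make the dump invisible to it, which
would match what I am seeing.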

To recap (in case this doesn't get attached to the original thread):
as I said in the original message, another odd thing is that after it
crashes on the first (and sometimes second) cold boot, the machine
usually remains stable until it is shut down.

As I think I also mentioned, once logged into GNOME, I see a "blank
disc" icon flash on and off on the desktop, as if the system keeps
detecting a CD in the drive, so that might be related. However, the
same flashing icon appeared with 6.3 + GNOME, and as I said, the
system didn't panic there.

Thanks,

- rory