Panic: bad pte

Thu Mar 21 17:39:25 UTC 2013

On Wednesday 20 March 2013 20:09:42, David Demelier wrote:
> 2013/3/19 Jeremy Chadwick <jdc at koitsu.org>:
> > On Tue, Mar 19, 2013 at 06:34:24PM +0100, David Demelier wrote:
> >> Hello,
> >> 
> >> Here it is: all my computers running FreeBSD 9.1-RELEASE have panicked.
> >> I can only conclude there is a problem in 9.1-RELEASE, because I never
> >> had a panic before. What worries me is that my production server also
> >> panicked a few days ago; fortunately it does not happen very often.
> >> 
> >> This panic happened on my desktop computer, which has a graphics
> >> card. The crash usually occurs when X starts.
> >> 
> >> GNU gdb 6.1.1 [FreeBSD]
> >> Copyright 2004 Free Software Foundation, Inc.
> >> GDB is free software, covered by the GNU General Public License, and you
> >> are welcome to change it and/or distribute copies of it under certain
> >> conditions. Type "show copying" to see the conditions.
> >> There is absolutely no warranty for GDB.  Type "show warranty" for
> >> details.
> >> This GDB was configured as "amd64-marcel-freebsd"...
> >> 
> >> Unread portion of the kernel message buffer:
> >> panic: bad pte
> >> cpuid = 3
> >> KDB: stack backtrace:
> >> Uptime: 2m31s
> >> Dumping 183 out of 1950 MB:..9%..18%..27%..35%..44%..53%..62%..79%..88%..96%
> >> 
> >> Reading symbols from /boot/modules/nvidia.ko...done.
> >> Loaded symbols for /boot/modules/nvidia.ko
> >> #0  doadump (textdump=Variable "textdump" is not available.
> >> ) at pcpu.h:224
> >> 224     pcpu.h: No such file or directory.
> >>         in pcpu.h
> >> 
> >> (kgdb) bt
> >> #0  doadump (textdump=Variable "textdump" is not available.
> >> ) at pcpu.h:224
> >> #1  0x0000000000000004 in ?? ()
> >> #2  0xffffffff8048c156 in kern_reboot (howto=260) at
> >> /usr/src/sys/kern/kern_shutdown.c:448
> >> #3  0xffffffff8048c619 in panic (fmt=0x1 <Address 0x1 out of bounds>)
> >> at /usr/src/sys/kern/kern_shutdown.c:636
> >> #4  0xffffffff8065f88a in pmap_remove_pages (pmap=0xfffffe0005a2fa60)
> >> at /usr/src/sys/amd64/amd64/pmap.c:4156
> >> #5  0xffffffff8063d26b in vmspace_exit (td=0xfffffe0005a05470) at
> >> /usr/src/sys/vm/vm_map.c:422
> >> #6  0xffffffff8045d725 in exit1 (td=0xfffffe0005a05470, rv=Variable
> >> "rv" is not available.
> >> ) at /usr/src/sys/kern/kern_exit.c:315
> >> #7  0xffffffff8045e5ce in sys_sys_exit (td=Variable "td" is not
> >> available.
> >> ) at /usr/src/sys/kern/kern_exit.c:122
> >> #8  0xffffffff8066737f in amd64_syscall (td=0xfffffe0005a05470,
> >> traced=0) at subr_syscall.c:135
> >> #9  0xffffffff80652d97 in Xfast_syscall () at
> >> /usr/src/sys/amd64/amd64/exception.S:387
> >> #10 0x0000000800d51c1c in ?? ()
> >> Previous frame inner to this frame (corrupt stack?)
> >> (kgdb)
> >> 
> >> Of course I may be doing something wrong, and I hope so, but
> >> unfortunately I can't find any solution. Could the nvidia driver be the
> >> problem?
> > 
> > Interesting timing.  Semi-recently (February) src/sys/amd64/amd64/pmap.c
> > in 9.1-STABLE (not -RELEASE) was modified to increase the information
> > shown for this specific type of panic.  See revision 247079:
> > 
> > http://svnweb.freebsd.org/base/stable/9/sys/amd64/amd64/pmap.c?view=log
> > 
> > I've CC'd Konstantin Belousov (kib@), who should be able to help step
> > you through getting information out of the crash dump, to help track
> > down the root cause.
> > 
> > --
> > 
> > | Jeremy Chadwick                                   jdc at koitsu.org |
> > | UNIX Systems Administrator                http://jdc.koitsu.org/ |
> > | Mountain View, CA, US                                            |
> > | Making life hard for others since 1977.             PGP 4BD6C0CB |
> 
> You will not believe this: when I leave the desktop (usually in the
> evening), I completely unplug the AC adapter. Every day, when I plug it
> back in and start the machine, it panics. But after it reboots and starts
> again, there is no panic anymore. I just can't believe it.
> 
> This panic is completely different from yesterday's:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x8010
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8049db4e
> stack pointer           = 0x28:0xffffff8000225a90
> frame pointer           = 0x28:0xfffffe000247c8e0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 10 (idle: cpu0)
> trap number             = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace:
> Uptime: 1m3s
> Dumping 324 out of 1950 MB:..5%..15%..25%..35%..45%..55%..65%..74%..84%..94%
> 
> Reading symbols from /boot/modules/nvidia.ko...done.
> Loaded symbols for /boot/modules/nvidia.ko
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> 224     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) bt
> #0  doadump (textdump=Variable "textdump" is not available.
> ) at pcpu.h:224
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff80489506 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:448
> #3  0xffffffff804899c9 in panic (fmt=0x1 <Address 0x1 out of bounds>)
> at /usr/src/sys/kern/kern_shutdown.c:636
> #4  0xffffffff80664e39 in trap_fatal (frame=0xc, eva=Variable "eva" is
> not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:857
> #5  0xffffffff806651c4 in trap_pfault (frame=0xffffff80002259e0,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773
> #6  0xffffffff806655bb in trap (frame=0xffffff80002259e0) at
> /usr/src/sys/amd64/amd64/trap.c:456
> #7  0xffffffff8064fe5f in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:228
> #8  0xffffffff8049db4e in callout_tickstofirst (limit=250) at
> /usr/src/sys/kern/kern_timeout.c:381
> #9  0xffffffff806761d1 in getnextcpuevent (event=0xffffff8000225b10,
> idle=1) at /usr/src/sys/kern/kern_clocksource.c:282
> #10 0xffffffff8067741e in cpu_idleclock () at
> /usr/src/sys/kern/kern_clocksource.c:785
> #11 0xffffffff8065685a in cpu_idle (busy=0) at
> /usr/src/sys/amd64/amd64/machdep.c:801
> #12 0xffffffff804b0a3f in sched_idletd (dummy=Variable "dummy" is not
> available. ) at /usr/src/sys/kern/sched_ule.c:2617
> #13 0xffffffff8045c88d in fork_exit (callout=0xffffffff804b07f0
> <sched_idletd>, arg=0x0, frame=0xffffff8000225c40) at
> /usr/src/sys/kern/kern_fork.c:992
> #14 0xffffffff8065031e in fork_trampoline () at
> /usr/src/sys/amd64/amd64/exception.S:602
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000001 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0xffffffff809fcdc0 in affinity ()
> #40 0x0000000000000000 in ?? ()
> #41 0xfffffe000247cd20 in ?? ()
> #42 0xfffffe000247c8e0 in ?? ()
> #43 0x0000000000000000 in ?? ()
> #44 0xffffff8000225aa8 in ?? ()
> #45 0xfffffe000247f8e0 in ?? ()
> #46 0xffffffff804b1b49 in sched_switch (td=0xffffffff804b07f0,
> newtd=0x0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1921
> 
> As you can see, the uptime is almost the same: just after startup.
> I'm wondering whether I have a hardware failure, such as a power problem.
> And now I'm writing from this machine after an uptime of one hour without
> any problem. This is just crazy.
> 

You can disregard all my emails. It's definitely a hardware failure: even
Windows 7 is crashing now. A few weeks ago I sent the motherboard, CPU, and
memory to my reseller, and they said there was no problem at all, so for now
I suspect a hard drive or power supply failure. However, I wonder how those
parts could break the system.

Regards,

-- 
David Demelier