SMP crashes / reboots 5.4 with CPanel

Simon simon at optinet.com
Thu Apr 5 17:05:23 UTC 2007


Upgrade to 6.2. 5.4 is old and has known kernel bugs that are fixed in 6.x.

-Simon

On Thu, 05 Apr 2007 14:49:52 +0200, Robin Vley wrote:

>Hi!

>I posted this to the FBSD-Questions mailing list, because I'm not at all
>sure whether this is a hardware or a software problem. Last time I got some
>good pointers there, but since I'm completely in the dark about where this
>is coming from, I'm cross-posting it here.

>For a couple of years now I've been trying to find out why our
>hosting machine reboots randomly. I got some tips, mostly about hardware.
>What happens is that both the main server and the backup server (which
>is just idling) simply reboot. Sometimes after 60 days, sometimes after
>one day. No logs, no strange traffic patterns, nothing. I enabled kernel
>debugging and caught a crashdump on our backup machine, which I will post
>below. The process that crashes is the CPU monitor for Cpanel. I
>disabled that one, after which it crashed in other processes instead
>(httpd, perl, etc.). I tried disabling ACPI, rebuilt world with just -O
>in make.conf, etc. This morning the main server rebooted again; it didn't
>even leave a dump in /var/crash. The hardware is not the same across
>machines: I've seen this behaviour on dual Athlons (two different
>mainboards) and dual Xeons. It seems related to the SMP code. I played
>around with the idle and hyperthreading settings in sysctl too. Nothing
>seems to make any difference at all. The crashdump is below; does anyone
>have ANY idea what might cause this?

>The machine is running on a SuperMicro Dual Xeon board (X5DPA-TMG+).
>Crashes happen on this board, but also on the Tyan MPX dual Athlon
>systems. I think it has to be the Cpanel hosting panel, but such an
>application shouldn't be able to crash the OS...

>Fatal trap 12: page fault while in kernel mode
>cpuid = 0; apic id = 01
>fault virtual address   = 0x98
>fault code              = supervisor write, page not present
>instruction pointer     = 0x20:0xc06b7f1e
>stack pointer           = 0x28:0xece5f730
>frame pointer           = 0x28:0xece5f774
>code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, def32 1, gran 1
>processor eflags        = interrupt enabled, resume, IOPL = 0
>current process         = 69885 (dcpumon)
>trap number             = 12
>panic: page fault
>cpuid = 0
>Uptime: 2d22h1m13s
>Dumping 2047 MB (2 chunks)
>  chunk 0: 1MB (159 pages) ... ok
>  chunk 1: 2047MB (523904 pages) 2031 2015 1999 1983 1967 1951 1935 1919
>1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695
>1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471
>1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247
>1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023
>1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735
>719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447
>431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159
>143 127 111 95 79 63 47 31 15

>#0  doadump () at pcpu.h:165
>165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>(kgdb) backtrace
>#0  doadump () at pcpu.h:165
>#1  0xc063efca in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
>#2  0xc063f396 in panic (fmt=0xc0870bd4 "%s") at
>/usr/src/sys/kern/kern_shutdown.c:555
>#3  0xc082e16c in trap_fatal (frame=0xece5f6f0, eva=0) at
>/usr/src/sys/i386/i386/trap.c:831
>#4  0xc082de52 in trap_pfault (frame=0xece5f6f0, usermode=0, eva=152) at
>/usr/src/sys/i386/i386/trap.c:742
>#5  0xc082da02 in trap (frame=
>      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 4, tf_esi = 0, tf_ebp
>= -320473228, tf_isp = -320473316, tf_ebx = 4098, tf_edx = -1002850048,
>tf_ecx = 0, tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip =
>-1066696930, tf_cs = 32, tf_eflags = 66118, tf_esp = -320473100, tf_ss =
>1017})
>    at /usr/src/sys/i386/i386/trap.c:432
>#6  0xc0817d0a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
>#7  0xc06b7f1e in vn_lock (vp=0x0, flags=4098, td=0xc439b900) at
>atomic.h:149
>#8  0xc05eee46 in procfs_doprocfile (td=0xc439b900, p=0xc9068830,
>pn=0xc35f3900, sb=0x4, uio=0x0) at /usr/src/sys/fs/procfs/procfs.c:73
>#9  0xc05f3f5b in pfs_readlink (va=0x4) at pcpu.h:162
>#10 0xc0841a13 in VOP_READLINK_APV (vop=0x4, a=0xc439b900) at
>vnode_if.c:1481
>#11 0xc06b14e3 in kern_readlink (td=0xc439b900, path=0xc439b900 "<j\006É
>x\006É", pathseg=3292117248, buf=0x4 <Address 0x4 out of bounds>, bufseg=4,
>    count=1024) at vnode_if.h:772
>#12 0xc06b13e8 in readlink (td=0x4, uap=0xc439b900) at
>/usr/src/sys/kern/vfs_syscalls.c:2261
>#13 0xc082e573 in syscall (frame=
>      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 135512892, tf_esi =
>135663632, tf_ebp = -1077940936, tf_isp = -320471708, tf_ebx =
>674109588, tf_edx = -1077941960, tf_ecx = 0, tf_eax = 58, tf_trapno = 0,
>tf_err = 2, tf_eip = 672579140, tf_cs = 51, tf_eflags = 647, tf_esp =
>-1077942020, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976
>#14 0xc0817d5f in Xint0x80_syscall () at
>/usr/src/sys/i386/i386/exception.s:200
>#15 0x00000033 in ?? ()
>Previous frame inner to this frame (corrupt stack?)

>/Robin
>_______________________________________________
>freebsd-questions at freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-questions
>To unsubscribe, send any mail to "freebsd-questions-unsubscribe at freebsd.org"



>-- 
>Robin Vley
>F/X Services Managed Hosting
>http://www.fx-services.com
>_______________________________________________
>freebsd-hardware at freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
>To unsubscribe, send any mail to "freebsd-hardware-unsubscribe at freebsd.org"




