SMP crashes / reboots 5.4 with CPanel

Robin Vley viper at fx-services.com
Thu Apr 5 12:48:32 UTC 2007


Hi!

I posted this to the FreeBSD-questions mailing list because I can't tell
whether this is a hardware or a software problem. Last time I got some good
pointers there, but since I'm completely in the dark about where this is
coming from, I'm crossposting it here.

For a couple of years now I've been trying to find out why our
hosting machine reboots randomly. I got some tips, mostly about hardware.
What happens is that both the main server and the backup server (which
is just idling) simply reboot. Sometimes after 60 days, sometimes after
one day. No logs, no strange traffic patterns, nothing. I enabled kernel
debugging and caught a crash dump on our backup machine, which I will post
below. The process that crashes is the CPU monitor for cPanel (dcpumon). I
disabled that one, after which it crashed in other processes instead
(httpd, perl, etc.). I tried disabling ACPI, rebuilding world with just -O
in make.conf, and so on. This morning the main server rebooted again, and
it didn't even leave a dump in /var/crash. The hardware is not the same
across the affected machines: I've seen this behaviour on dual Athlons
(two different mainboards) and dual Xeons. It seems related to the SMP
code. I played around with the idle and hyperthreading settings in sysctl
too. Nothing seems to make any difference at all. The crash dump is below.
Does anyone have ANY idea what might cause this?

The machine is running on a SuperMicro dual Xeon board (X5DPA-TMG+).
Crashes happen on this board, but also on the Tyan MPX dual Athlon
systems. I think it has to be the cPanel hosting panel, but such an
application shouldn't be able to crash the OS...

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address   = 0x98
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc06b7f1e
stack pointer           = 0x28:0xece5f730
frame pointer           = 0x28:0xece5f774
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 69885 (dcpumon)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2d22h1m13s
Dumping 2047 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 2047MB (523904 pages) 2031 2015 1999 1983 1967 1951 1935 1919
1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695
1679 1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471
1455 1439 1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247
1231 1215 1199 1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023
1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735
719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447
431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159
143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc063efca in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063f396 in panic (fmt=0xc0870bd4 "%s") at
/usr/src/sys/kern/kern_shutdown.c:555
#3  0xc082e16c in trap_fatal (frame=0xece5f6f0, eva=0) at
/usr/src/sys/i386/i386/trap.c:831
#4  0xc082de52 in trap_pfault (frame=0xece5f6f0, usermode=0, eva=152) at
/usr/src/sys/i386/i386/trap.c:742
#5  0xc082da02 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 4, tf_esi = 0, tf_ebp
= -320473228, tf_isp = -320473316, tf_ebx = 4098, tf_edx = -1002850048,
tf_ecx = 0, tf_eax = 4, tf_trapno = 12, tf_err = 2, tf_eip =
-1066696930, tf_cs = 32, tf_eflags = 66118, tf_esp = -320473100, tf_ss =
1017})
    at /usr/src/sys/i386/i386/trap.c:432
#6  0xc0817d0a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc06b7f1e in vn_lock (vp=0x0, flags=4098, td=0xc439b900) at
atomic.h:149
#8  0xc05eee46 in procfs_doprocfile (td=0xc439b900, p=0xc9068830,
pn=0xc35f3900, sb=0x4, uio=0x0) at /usr/src/sys/fs/procfs/procfs.c:73
#9  0xc05f3f5b in pfs_readlink (va=0x4) at pcpu.h:162
#10 0xc0841a13 in VOP_READLINK_APV (vop=0x4, a=0xc439b900) at
vnode_if.c:1481
#11 0xc06b14e3 in kern_readlink (td=0xc439b900, path=0xc439b900 "<j\006É
x\006É", pathseg=3292117248, buf=0x4 <Address 0x4 out of bounds>, bufseg=4,
    count=1024) at vnode_if.h:772
#12 0xc06b13e8 in readlink (td=0x4, uap=0xc439b900) at
/usr/src/sys/kern/vfs_syscalls.c:2261
#13 0xc082e573 in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 135512892, tf_esi =
135663632, tf_ebp = -1077940936, tf_isp = -320471708, tf_ebx =
674109588, tf_edx = -1077941960, tf_ecx = 0, tf_eax = 58, tf_trapno = 0,
tf_err = 2, tf_eip = 672579140, tf_cs = 51, tf_eflags = 647, tf_esp =
-1077942020, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:976
#14 0xc0817d5f in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
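
A note for reading the dump above: kgdb prints the trap frame registers as
signed 32-bit decimals, while the panic message prints the same values as
unsigned hex. The short Python sketch below (not part of the original
debugging session; the helper name to_u32 is made up for illustration)
converts a few values copied from frame #5 and frame #4 so they can be
matched against the panic output. The fault address 0x98 is a small offset,
which would be consistent with a write through the NULL vnode pointer
(vp=0x0) shown in frame #7, although the dump alone does not prove that.

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


# Values copied from the backtrace above (frame #5 trap frame, frame #4 eva).
tf_eip = -1066696930   # instruction pointer at the time of the fault
tf_ebp = -320473228    # frame pointer at the time of the fault
eva = 152              # faulting virtual address passed to trap_pfault

print(hex(to_u32(tf_eip)))  # 0xc06b7f1e, the address of frame #7 (vn_lock)
print(hex(to_u32(tf_ebp)))  # 0xece5f774, the "frame pointer" line
print(hex(eva))             # 0x98, the "fault virtual address" line
```

The same conversion works for the other negative tf_* fields in the trap
frames if they need to be compared with addresses elsewhere in the dump.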

/Robin

-- 
Robin Vley
F/X Services Managed Hosting
http://www.fx-services.com

