For some time now we have been having a lot of trouble with one particular 
server, which is one of a farm of seven largely identical servers.  These 
servers run under extremely high load for most of the day and run a mix of 
postfix, MySQL (running as replication slaves) and custom filter software 
using MFS partitions.  All seven are identical SuperMicro 6013E-i 
SuperServers with dual hyper-threading 2.80GHz Xeon CPUs and 2G of RAM.  
It is not altogether uncommon for these machines to crash under extremely 
high load, but this one server in particular crashes much more frequently.

We started with memtest and CPU tests, which reported no errors.  As part 
of our troubleshooting we have replaced (or swapped out with the other 
servers) every piece of hardware in this box, replaced every cable and 
cord, and moved to different switch and power ports.  We've even changed 
physical locations in our data center.  So far we have been unable to 
resolve the more frequent crashes, or to move the increased instability to 
another server in an effort to find the cause.  We've also disabled 
hyper-threading in the BIOS and in FreeBSD on this machine, since it 
sounds as if we might see other benefits from this.  Also, as a stretch, 
I've moved this box to the ULE scheduler instead of the standard 4BSD.  
Really I'm starting to suspect it is haunted (or that I'm sleepdriving 
into work at night to foil my own progress).

These boxes traditionally run FreeBSD 4.11, but out of desperation we 
decided to take this particular machine up to FreeBSD 6.1, both to rule 
out problems addressed by later OS improvements and to ensure we are 
running the latest stable version of the different software pieces (and 
because it seems like the right move in the long term).  (We install 
service software manually, by the way, not from ports.  MySQL we've 
installed from their binary distribution for 6.x.)

With the upgrade we are still seeing crashes at the same frequency, and 
although the errors are reported a bit differently they appear to be the 
same errors: mostly a combination of "Fatal Trap 12" and "vm_page_fault" 
errors, though we have seen a couple of "Sleeping thread owns a 
non-sleepable lock" errors.

The biggest frustration in this is that, of the few dozen crashes we've 
had, I've only been able to get one successful dump.  All the other times 
I get the following savecore messages:

   kernel: kernel dumps on /dev/ad0s1b
   kernel: Checking for core dump on /dev/ad0s1b...
   kernel: unable to open bounds file, using 0
   kernel: checking for kernel dump on device /dev/ad0s1b
   kernel: mediasize = 4294967296
   kernel: sectorsize = 512
   kernel: magic mismatch on last dump header on /dev/ad0s1b
   kernel: savecore: no dumps found
   savecore: no dumps found

Is there something I am missing that would let me get successful dumps 
more reliably?  I have plenty of space on /var (22G) and my swap partition 
is 4G (with 2G of RAM).
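
As a sanity check on those sizes, here is a minimal sketch in Python that 
compares the mediasize savecore reports for /dev/ad0s1b with the page 
counts printed by the one dump that did succeed (159 and 524016 pages, 
from the output further down).  The 4 KiB page size is an assumption for 
i386, the function name is made up for illustration, and the sketch 
ignores any dump header overhead, so it only shows that raw capacity is 
unlikely to be the problem.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages on i386
MIB = 1024 * 1024


def dump_fits(mediasize_bytes, chunk_pages, page_size=PAGE_SIZE):
    """Return (bytes needed, True if the dump fits on the dump device).

    Hypothetical helper: ignores dump headers, so it is a lower bound
    on the space actually required.
    """
    dump_bytes = sum(chunk_pages) * page_size
    return dump_bytes, dump_bytes <= mediasize_bytes


# Values copied from the savecore and dump output in this message.
MEDIASIZE = 4294967296
CHUNK_PAGES = [159, 524016]

dump_bytes, fits = dump_fits(MEDIASIZE, CHUNK_PAGES)
print("dump needs %.1f MiB of %.0f MiB available; fits: %s"
      % (dump_bytes / MIB, MEDIASIZE / MIB, fits))
```

By this estimate the dump needs roughly half of the swap partition, which 
matches the 4G swap / 2G RAM figures above.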

The one successful dump produced the gdb output below.  I've also 
included the non-commented bits of our kernel config at the very bottom.

If anyone has any suggestions on what this dump information indicates I 
would be very appreciative.  Please let me know what other information I 
can furnish.  If I can determine how to get another vmcore I'd be happy to 
send along another debug session as well.

Thank you very much in advance.

Matt Ruzicka - Senior Systems Administrator
Front Range Internet, Inc.
matt at - (970) 212-0728


[GDB will not be able to debug user-mode threads: 
/usr/lib/ Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you 
welcome to change it and/or distribute copies of it under certain 
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for 
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
vm_page_free: pindex(3255307648), busy(194), PG_BUSY(1), hold(-10260)
panic: vm_page_free: freeing busy page
cpuid = 0
Uptime: 18h43m26s
Dumping 2047 MB (2 chunks)
   chunk 0: 1MB (159 pages) ... ok
   chunk 1: 2047MB (524016 pages) 2031 2015 1999 1983 1967 1951 1935 1919 
1903 1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 
1663 1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 
1423 1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 
1183 1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 
943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 
655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 
367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 
63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
         in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc04b029d in boot (howto=260) at 
#2  0xc04b05c5 in panic (fmt=0xc0600359 "vm_page_free: freeing busy page")
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/kern/kern_shutdown.c:558
#3  0xc05a2f45 in vm_page_free_toq (m=0xc207d7b0)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_page.c:1025
#4  0xc05a256d in vm_page_free (m=0xc207d7b0) at 
#5  0xc059ff39 in vm_object_terminate (object=0xc878b4a4)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_object.c:631
#6  0xc059fe13 in vm_object_deallocate (object=0xc878b4a4)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_object.c:564
#7  0xc059c8fa in vm_map_entry_delete (map=0xc9f7e12c, entry=0xca3e2c38)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_map.c:2207
#8  0xc059cac7 in vm_map_delete (map=0xc9f7e12c, start=3335031932, 
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_map.c:2300
#9  0xc059cb28 in vm_map_remove (map=0xc9f7e12c, start=0, end=3217031168)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/vm/vm_map.c:2319
#10 0xc0496fcd in exit1 (td=0xc9d93190, rv=0) at vm_map.h:211
#11 0xc04969b8 in sys_exit (td=0xc9d93190, uap=0x0)
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/kern/kern_exit.c:97
#12 0xc05d8917 in syscall (frame=
       {tf_fs = 59, tf_es = 59, tf_ds = -1079115717, tf_edi = -1077942712, 
tf_esi = -1077942820, tf_ebp = -1077942876, tf_isp = -387965596, tf_ebx = 
672734248, tf_edx = 10, tf_ecx = 672733680, tf_eax = 1, tf_trapno = 12, 
tf_err = 2, tf_eip = 672673571, tf_cs = 51, tf_eflags = 646, tf_esp = 
-1077942904, tf_ss = 59}) at 
#13 0xc05c58bf in Xint0x80_syscall () at 
#14 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up 2
#2  0xc04b05c5 in panic (fmt=0xc0600359 "vm_page_free: freeing busy page")
     at /u/frii/src/FreeBSD-6.1-RELEASE/sys/kern/kern_shutdown.c:558
558             boot(bootopt);
(kgdb) p bootopt
$1 = 260
(kgdb) p *bootopt
Cannot access memory at address 0x104
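
For reading the numbers in the session above: bootopt is a plain integer, 
so "p *bootopt" just tries to dereference address 0x104 (260 in hex) and 
fails.  A minimal sketch in Python for decoding that value as reboot flags 
and for turning the signed 32-bit values in the trap frame into unsigned 
addresses follows.  The flag values are an assumption, written from memory 
of sys/reboot.h, and should be checked against the source tree; the helper 
names are made up for illustration.

```python
# Assumed flag values, recalled from sys/reboot.h; verify against your tree.
RB_FLAGS = {
    0x001: "RB_ASKNAME",
    0x002: "RB_SINGLE",
    0x004: "RB_NOSYNC",
    0x008: "RB_HALT",
    0x100: "RB_DUMP",
}


def decode_howto(howto):
    """List the names of the flag bits set in howto (hypothetical helper).

    Bits not present in RB_FLAGS are reported as one trailing hex value.
    """
    names = []
    remaining = howto
    for bit, name in sorted(RB_FLAGS.items()):
        if howto & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


def to_u32(value):
    """Reinterpret a signed 32-bit integer, as printed by gdb, as unsigned."""
    return value & 0xFFFFFFFF


print(decode_howto(260))            # howto / bootopt from frames #1 and #2
print(hex(to_u32(-1077942904)))     # tf_esp from frame #12
```

If the assumed flag values are right, 260 means a dump was requested and 
the filesystem sync was skipped, which is what one would expect on the 
panic path.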


machine         i386
cpu             I686_CPU
ident           MAFILTER-NEW
makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug 
options         SCHED_ULE               # ULE scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big 
options         NFSCLIENT               # Network Filesystem Client
options         PROCFS                  # Process filesystem (requires 
options         PSEUDOFS                # Pseudo-filesystem framework
options         COMPAT_43               # Compatible with BSD 4.3 [KEEP 
options         COMPAT_FREEBSD4         # Compatible with FreeBSD4
options         COMPAT_FREEBSD5         # Compatible with FreeBSD5
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time 
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in 
                                         # output.  Adds ~128k to driver.
options         AHD_REG_PRETTY_PRINT    # Print register bitfields in 
                                         # output.  Adds ~215k to driver.
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         SMP                     # Symmetric MultiProcessor Kernel
device          apic                    # I/O APIC
device          eisa
device          pci
device          ata
device          atadisk         # ATA disk drives
device          atkbdc          # AT keyboard controller
device          atkbd           # AT keyboard
device          psm             # PS/2 mouse
device          kbdmux          # keyboard multiplexer
device          vga             # VGA video card driver
device          sc
device          em              # Intel PRO/1000 adapter Gigabit Ethernet 
device          miibus          # MII bus support
device          fxp             # Intel EtherExpress PRO/100B (82557, 
device          loop            # Network loopback
device          random          # Entropy device
device          ether           # Ethernet support
device          tun             # Packet tunnel.
device          pty             # Pseudo-ttys (telnet etc)
device          md              # Memory "disks"

