Strange crashing/rebooting problem

Dan Charrois dan at syz.com
Tue Oct 25 14:01:36 PDT 2005


Hi all.  I'm wondering if anyone can shed some light on a strange  
crashing/rebooting problem I'm having.  First, the specs:

Hardware: Dell PowerEdge 2850 rack-mounted server, Dual 3.4 GHz Xeon,
5 GB memory
Hard Drives: LSILogic PERC 4e/Di, configured as RAID 5, with 3 x 40
GB disks
OS: FreeBSD 5.4-RELEASE-p6 for amd64
Other related software: mysql  Ver 14.7 Distrib 4.1.14, for portbld-freebsd5.4 (amd64) using  4.3

I currently have hyperthreading enabled, since I'm not too concerned
about the security of the system. It's on an internal-only network
with no user accounts other than the administrator's, and I figure that
if the security issue associated with hyperthreading is the only
problem, it wouldn't hurt to get a bit more speed. It's intended to
be a single-purpose MySQL server for other client machines via TCP/IP,
and it's supposed to be a highly reliable machine that is as fast as
possible.

But the problem is this. I have it set to run mysqlhotcopy a couple
of times during the day to back up the databases, and twice now in
the last month or so, when it starts to run, it has brought down the
server. The odd thing is that the machine doesn't lock up indefinitely,
or even reboot itself normally. Instead, it suddenly seems to quit
as though someone unplugged it, and then it goes through the boot
sequence. It's at a remote location from me, so I haven't been able
to see the console while this happens, but according
to /var/log/messages, everything is running fine, and then it suddenly
starts writing its initial boot messages:

sql syslogd: kernel boot file is /boot/kernel/kernel
sql kernel: Copyright (c) 1992-2005 The FreeBSD Project.
etc.

There are no logs of any "shutting down" variety, and sure enough, I get

sql kernel: Mounting root from ufs:/dev/amrd0s1a
sql kernel: WARNING: / was not properly dismounted
sql kernel: WARNING: /usr was not properly dismounted

messages written a bit later in the boot sequence.

What gets me is that if the machine were "really" locking up due to a
kernel panic or something, I would expect it to stay frozen and not
restart itself. But within a couple of minutes of going down hard,
it has rebooted itself. There isn't some kind of watchdog timer I'm
not aware of that reboots the machine after a lockup, is there?
Because of this, I sometimes don't even realize it's happened until I
find that the odd MySQL database needs to be repaired, and then I
check the logs and see what has happened. According to the logs,
it's almost as though it's getting physically unplugged midstream,
then plugged back in and booting from there. But it's in a locked
cabinet in a colocation centre with other machines of mine that
aren't having the problem, and it's happened twice now at exactly the
same time: right as mysqlhotcopy is about to run.

Considering that this machine is supposed to be high availability,  
being down for even a couple of minutes like this is a problem.   
Plus, I really don't like not understanding what's making it go down  
like it does, and I'm obviously concerned about data corruption to  
the databases when something like this happens.

Does anyone have any advice on what may be wrong, or something to  
try?  I really have no idea even how to begin to troubleshoot this  
problem.  If you need any more information at all, please let me know.

Thanks for your help!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213



More information about the freebsd-stable mailing list