Sudden Reboots

Jim Durham durham at jcdurham.com
Thu Sep 30 07:03:02 PDT 2004


I have had this problem now with at least 3 FreeBSD servers over a period of 
about 2 years. I had put it down to some hardware problem but it seems to be 
too much of a coincidence with 3 different machines doing the same thing.

The first time was when I put 4.5-RELEASE on a brand new Dell Poweredge 2650. 
I ran it on the bench for a week or so, then decided all was well and put it 
in the server rack and started running the company's email service on it. 
After a few weeks, it began to 'reboot' suddenly for no apparent reason. No 
log entries, nothing at all except the usual stuff in /var/log/messages about 
'/ was not unmounted correctly', etc. Just as if you had pulled the power plug.

The 2nd instance was a server that I maintain for an ISP that was a mirror 
image of their primary server, a 'hot spare' so to speak. The primary, 
running the same software, was solid, but the backup would reboot at about 
5:20 every morning with the same syndrome: no log entries of any sort, just 
the usual entries in /var/log/messages saying that the / partition was not 
unmounted properly. The odd thing was that it was happening at virtually the 
same time every morning.
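
One way to confirm that kind of time-of-day pattern is to tally the hour of 
each boot record. This is only a minimal sketch: the function name 
reboot_hours and the sample lines are hypothetical, and it assumes the input 
looks like the 'reboot' lines that 'last reboot' prints, with an HH:MM time 
on each line.

```python
import re
from collections import Counter

# Matches an HH:MM time such as "05:20" anywhere in a line.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\b")

def reboot_hours(lines):
    """Count boot records per hour of day (hypothetical helper)."""
    hours = Counter()
    for line in lines:
        # Only look at boot records, not ordinary login sessions.
        if not line.startswith("reboot"):
            continue
        match = TIME_RE.search(line)
        if match:
            hours[int(match.group(1))] += 1
    return hours

# Hypothetical sample data, not taken from the real servers.
sample = [
    "reboot   ~   Mon Sep 27 05:21",
    "reboot   ~   Tue Sep 28 05:19",
    "reboot   ~   Wed Sep 29 05:20",
    "jim      ttyp0   host   Wed Sep 29 09:00 - 09:30  (00:30)",
]

for hour, count in reboot_hours(sample).most_common():
    print(f"{hour:02d}:xx  {count} boot(s)")
```

If nearly every boot lands in the same hour, anything scheduled around that 
time becomes the first thing worth ruling out.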

I upgraded both systems to the latest -RELEASE and it made no difference. 
Then, they both just *stopped doing it by themselves* with no apparent 
correlation to anything installed software-wise. Neither server has had any 
problem for over a year now.

The 3rd instance is happening now. Another server I maintain for my 'night 
job' is doing the same thing for a customer. It just 'stops' like you pulled 
the power plug. However, this time I thought to check using 'last' and found 
that I had accidentally left an ssh session open and that entry said 'crash'.
There are no other log entries I can find related to the 'reboot'.
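
The check with 'last' can be automated so that every session cut off by a 
crash is listed. A minimal sketch, assuming the text fed in is the output of 
'last' and that such sessions show 'crash' where the logout time would 
normally be; the function name crashed_sessions and the sample output 
(users, addresses, times) are hypothetical.

```python
import re

# A session ended by a crash shows "- crash" instead of a logout time.
CRASH_RE = re.compile(r"-\s+crash\b")

def crashed_sessions(last_output):
    """Return the lines of `last` output for sessions ended by a crash."""
    return [line for line in last_output.splitlines() if CRASH_RE.search(line)]

# Hypothetical sample output, not copied from the affected server.
sample_last_output = """\
jim      ttyp0    10.0.0.5     Wed Sep 29 23:40 - crash  (07:23)
reboot   ~                     Thu Sep 30 07:03
jim      ttyp1    10.0.0.5     Tue Sep 28 09:00 - 09:30  (00:30)
"""

for line in crashed_sessions(sample_last_output):
    print(line)
```

The login time on each line printed gives a lower bound on when the machine 
was last known to be up before it went down.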

I 'googled' this problem and found it mentioned dozens of times, with no 
answer offered.

I'm beginning to think this is real, but so intermittent that I don't know how 
to begin to debug or find it.  A wild guess would be something like an 
uninitialized pointer, where everything works until wherever it is pointing 
takes on some value that makes the system just die suddenly without even a 
panic message.

I also suspect this because the server that is doing it now had been running 
fine for a year. It is normally on a UPS and is never powered down, but the 
floods we had recently caused it to be powered off for a day or so. That may 
have changed the 'garbage' in memory, which would otherwise have stayed the 
same until the next power-down. I.e., if an uninitialized pointer is the 
culprit, maybe what it points to, or where it points, is critical; powering 
the machine down changes where it points, and that area then gets overwritten 
by some system process, which causes the reboot.

I'm posting this to 'hackers' because I thought it might be a kernel thing.

-- 
-Jim


More information about the freebsd-hackers mailing list