durham at jcdurham.com
Thu Sep 30 07:03:02 PDT 2004
I have had this problem now with at least 3 FreeBSD servers over a period of
about 2 years. I had put it down to some hardware problem but it seems to be
too much of a coincidence with 3 different machines doing the same thing.
The first time was when I put 4.5-RELEASE on a brand new Dell PowerEdge 2650.
I ran it on the bench for a week or so, then decided all was well and put it
in the server rack and started doing the company's email service on it. After
a few weeks, it suddenly would 'reboot' for no apparent reason. No log
entries, nothing at all except the usual stuff in /var/log/messages about '/
was not unmounted correctly', etc. Just like you had pulled the power plug.
The 2nd instance was a server that I maintain for an ISP that was a mirror
image of their primary server, a 'hot spare' so to speak. The primary,
running the same software, was solid, but the backup would reboot at about
5:20 every morning with the same syndrome: no log entries of any sort, just
the usual entries in /var/log/messages saying that the / partition was
not unmounted properly. The odd thing was that it was happening at virtually
the same time every morning.
I upgraded both systems to the latest -RELEASE and it made no difference.
Then, they both just *stopped doing it by themselves* with no apparent
correlation to anything installed software-wise. Neither server has had any
problem for over a year now.
The 3rd instance is happening now. Another server I maintain for my 'night
job' is doing the same thing for a customer. It just 'stops' like you pulled
the power plug. However, this time I thought to check using 'last' and found
that I had accidentally left an ssh session open and that entry said 'crash'.
There are no other log entries I can find related to the 'reboot'.
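For anyone wanting to repeat the 'last' check, here is a minimal sketch. The
helper name crash_sessions is made up for illustration, and it assumes that
'last' prints "- crash" in the logout column for a session that was cut off
by an unclean shutdown, which is what I saw here.

```shell
#!/bin/sh
# Hypothetical helper: reads 'last' output on stdin and keeps only the
# sessions whose logout field reads "crash".
# Assumption: the logout column is printed as "- crash".
crash_sessions() {
    grep -e '- crash'
}

# Usage (run by hand on the affected machine):
#   last | crash_sessions
#   last reboot        # lists the recorded boot times
```

Comparing the times on the crash lines with the boot times gives a rough
window for when the machine actually went down.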
I 'googled' this problem and found it mentioned dozens of times, but never
with an answer.
I'm beginning to think this is real, but so intermittent that I don't know how
to begin to debug or find it. A wild guess would be something like an
uninitialized pointer, where everything works until wherever it is pointing
assumes some value that makes it just die suddenly without even a panic.
The reason I suspect this is that the server doing it now ran fine for a
year. Then the recent floods caused it to be powered down for a day or so.
It is normally on a UPS and is never powered down, so that may have changed
the 'garbage' in memory, which would otherwise have stayed the same. I.e., if
an uninitialized pointer is the culprit, maybe what it points to, or where it
points, is critical; powering down changes where it points, and that area
then gets overwritten by some system process and causes the reboot.
I'm posting this to 'hackers' because I thought it might be a kernel thing.