Sudden Reboots
Jim Durham
durham at jcdurham.com
Mon Oct 4 10:49:39 PDT 2004
On Saturday 02 October 2004 06:42 pm, Mike Tancsa wrote:
> On Fri, 1 Oct 2004 21:50:26 -0500, in sentex.lists.freebsd.hackers you
> wrote:
> >On Oct 1, 2004, at 7:23 PM, Jim Durham wrote:
> >> These are very rare.... except they seem to happen about once a day
> >> for a while and then stop... very strange..
> >>
> >>> and usually caused by hardware problems (e.g. faulty power supply,
> >>> overheating CPU, bad RAM).
> >>
> >> Possible, but if so, the hardware fixed itself on the first two boxes I
> >> mentioned.
> >
> >All of this can be bad, or not quite bad -- just not healthy --
> >hardware. Say a power supply that can't supply reliable +5, when the
> >line voltage drops a tad while all the disks are being hammered. It
> >can be a nightmare to figure out. Set up crash dumps, but also make
> >sure that the UPS the box is attached to isn't having problems. If
> >it's not on conditioned power, fix that.
>
> Also, a lot of older UPSes do not have any AVR (automatic voltage
> regulation). This in conjunction with a marginal power supply can
> cause problems like you describe. One of our POPs is in an area that
> has seen tremendous residential and industrial growth, putting a strain
> on the local power. Prior to some major upgrades from the local
> utility company, we would see street power dropping below 100V during
> peak usage, and our APCs that have "smart boost" would all kick in to
> compensate. Also, a UPS can simply go "bad" over time.
>
> As others have said, it's pretty rare that reboots do not leave a crash
> dump behind when it's a software issue. At the very least, enable crash
> dumps on the machines in question. See the man page for dumpon. At
> least this way you can narrow down the odds as to whether it's
> pointing to a hardware or software issue.
>
> ---Mike
I will do that. However, after watching this for a few days now, there is
something really weird about it that I'd like to describe.
The reboots started out happening at 5:15 pm or so. I had them unplug the
server completely from AC and restart it, and now it's happening within a few
minutes of 12:40 pm every day.
The 'last' command output is the only thing showing anything log-wise. Look at
this:
reboot ~ Mon Oct 4 12:33
reboot ~ Sun Oct 3 12:37
reboot ~ Sat Oct 2 12:42
reboot ~ Fri Oct 1 12:45
Looks like it's creeping 3 to 5 minutes earlier every day. Of course, the fsck
time is involved, but that is probably about the same every time.
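A quick way to check the day-to-day drift is to compute it from the 'last'
lines directly. This is only a sketch in Python: the function names are made up
for illustration, the year is assumed to be 2004 (the 'last' output does not
show it), and the month abbreviation is assumed to parse in an English/C
locale.

```python
from datetime import datetime

# Reboot records copied from the 'last' output above (newest first).
LAST_OUTPUT = """\
reboot ~ Mon Oct 4 12:33
reboot ~ Sun Oct 3 12:37
reboot ~ Sat Oct 2 12:42
reboot ~ Fri Oct 1 12:45
"""


def parse_reboots(text, year=2004):
    """Return the reboot timestamps as datetimes, oldest first.

    The year is an assumption; 'last' does not print it.
    """
    stamps = []
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: reboot ~ <weekday> <month> <day> <HH:MM>
        if len(fields) < 6 or fields[0] != "reboot":
            continue
        month, day, clock = fields[3], fields[4], fields[5]
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M")
        )
    return sorted(stamps)


def daily_drift_minutes(stamps):
    """Change in time of day between consecutive reboots, in minutes.

    Negative values mean the reboot came earlier than the day before.
    """
    time_of_day = [s.hour * 60 + s.minute for s in stamps]
    return [later - earlier for earlier, later in zip(time_of_day, time_of_day[1:])]


stamps = parse_reboots(LAST_OUTPUT)
drift = daily_drift_minutes(stamps)
print("drift per day (minutes):", drift)
print("average drift (minutes):", sum(drift) / len(drift))
```

For the four reboots listed, the shifts come out as 3, 5 and 4 minutes earlier,
so the interval between reboots is a little under 24 hours, not exactly
24 hours.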
I don't have documentation any more, but the one server I remember noting the
time when it was doing this before did it at 5:15 or so every morning.
This sure doesn't sound like hardware to me unless it's something to do with
the motherboard clock. I can't think of anything in hardware that would cycle
like this.
I remember having an AM radio transmitter back in my youth that would blow HV
rectifiers every day at the same time. We traced it to an industrial plant
pulling a breaker on the same line as us. But this server is on a UPS, and the
time keeps creeping by a few minutes. Really strange.
I will set up crash dumps.
-Jim