Hardware or OS problem? System Crashing...
v.velox at vvelox.net
Thu Jan 6 15:20:06 PST 2005
On Thu, 06 Jan 2005 13:33:41 -0600
"Joseph Koenig (jWeb)" <joe at jwebmedia.com> wrote:
> >> On 1/5/2005 at 09:14 Joseph Koenig (jWeb) wrote:
> >>> Hi,
> >>> We have a system that is currently giving us some trouble. The
> >>> system is FreeBSD 4.9. It's a 2 GHz system with 1MB RAM and
> >>> (here's the kicker) 73GB RAID 1 ATA drives. The system serves as
> >>> a web/database server dedicated to 1 site. Daily the system goes
> >>> out and downloads real estate listings (via shell scripts and
> >>> cURL) and processes them (via PHP into MySQL). Also, nightly the
> >>> system downloads a zipped set of images (probably around 400-500)
> >>> and processes them into thumbnails (PHP scripts calling
> >>> ImageMagick). Over the last week or two, the system is crashing
> >>> and rebooting into single user mode. It's not consistently during
> >>> updates, or resizing of images, or anything like that. Yesterday,
> >>> it crashed with 99% processor idle and load averages of 0.00 0.00
> >>> 0.00 -- I was watching a 'top' when the machine died. When it
> >>> boots into single user mode, an fsck must be run, which
> >>> identified a few corrupt JPEG files -- however, the sysadmin who
> >>> reboots it never tells me which files they are. The sysadmin is
> >>> convinced it is a FreeBSD problem and says that Linux will not
> >>> crash because of a corrupt file and if it does, will not boot
> >>> into single user mode and he will be able to access it remotely
> >>> to do the fsck. About 3-4 weeks ago, one of the drives in the
> >>> mirror set crashed and had to be replaced. I'm not convinced that
> >>> drives are not to blame for these issues. Is there any way to
> >>> verify that? Is it possible a corrupt JPEG on the drive could
> >>> cause the system to crash randomly? What can I do to correctly
> >>> identify the problem so that we can fix it and not change the OS?
> >>> Thanks,
> >> The sysadmin has no clue about either linux or freebsd!
> >> A corrupt JPEG cannot cause a crash of the OS, for any real OS.
> >> (If it does, it is a bug in the OS, but I doubt one exists) Real
> >> OS includes Windows XP, linux, and FreeBSD.
> >> However, an OS crash can cause a corrupt JPEG!
> >> Either linux or FreeBSD may boot into single user mode when the
> >> filesystem is corrupt. What your sysadmin means is that with one
> >> of the newer filesystems Linux uses journeling, which is much less
> >> likely to enter this situation, but it still can happen. With soft
> >> updates FreeBSD is in the same situation as linux, but softupdates
> >> is (generally, there are exceptions) better than journeling. There
> >> is softupdates in Freebsd 4.9, but I'm not sure how to enable it,
> >> or how good it is. (in 5.3 it is awesome!)
> >> I suspect hardware.
> >> I'd burn memtest to a CD, and run that for a few hours to see if
> >> something is identified. Memtest won't catch everything, but it
> >> does a pretty good job.
> >> Also look at other factors. Does the HVAC kick in when this
> >> happens? Is someone hitting the panic stop switch? Situations like
> >> that have happened, and they can take a while to debug. They are
> >> not likely, but don't rule them out.
> >> FreeBSD 4.9 is fairly old at this point. You should seriously
> >> consider upgrading to 4.11 (due out in a few weeks), or 5.3 (my
> >> recommendation, but a much more involved upgrade).
> > In addition, to the original problem stated above, we are seeing a
> > number of problems like "...in free(): warning: modified (page-)
> > pointer" and "...in free(): warning: chunk is already free". I
> > have them admin running a memtest today, but wanted to make sure
> > these errors were not indicative of something else going on.
> > Thanks,
> Well, the sysadmin tells me that memtest passed. Any one have any
> suggestions as to what could be causing the crashes? Thanks,
Don't trust memtest. I've seen it fail to identify faulty hardware in
FreeBSD does not crash because of bad files, and I would be seriously
suspicious of an admin who is trying to feed you this. That, and he
does not appear to be concerned with it whatsoever.
Yeah, in 4.x a major file system problem is a lot more likely to need
fsck run manually than in 5.x. 5.x will boot and run a background fsck,
so you will still have network access, etc.
The best way to test the drives is this: run lots of transactions
across all parts of the disk for a good long while. Smartmontools
is also available in the ports.
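
A rough sketch of what I mean by running lots of transactions, in
Python. The function name, block size, and file path are made up for
illustration, and this only exercises whatever blocks the filesystem
hands to the test file, not every sector of the disk. For the
read-back to actually hit the drives instead of the buffer cache, the
file should be larger than the machine's RAM.

```python
import hashlib
import os


def exercise_disk(path, block_size=1 << 20, blocks=64, passes=3):
    """Write random blocks to `path`, read them back, count mismatches.

    Hypothetical helper: a crude write/verify loop, not a replacement
    for a real burn-in tool or for SMART self-tests.
    """
    mismatches = 0
    for _ in range(passes):
        digests = []
        # Write phase: remember a checksum for every block written.
        with open(path, "wb") as f:
            for _ in range(blocks):
                data = os.urandom(block_size)
                digests.append(hashlib.sha256(data).digest())
                f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to commit the data to disk
        # Verify phase: re-read each block and compare checksums.
        with open(path, "rb") as f:
            for expected in digests:
                actual = hashlib.sha256(f.read(block_size)).digest()
                if actual != expected:
                    mismatches += 1
    os.remove(path)
    return mismatches
```

Something like exercise_disk("/var/tmp/disktest.bin", blocks=4096,
passes=10) on the suspect filesystem would be the idea, with the path
and sizes adjusted to the box. Any nonzero result, or a crash partway
through, points at the storage path (drives, cables, controller) or at
RAM.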
I would be suspicious of nearly any possible chunk of the hardware in
that box, since you've not listed anything that would allow any piece
of hardware to be ruled out as a problem regardless of the OS being
The places I would focus my attention are the PSU, RAM, CPU,
motherboard, cables, any PCI card or the like, and the drives
themselves.
BTW you may want to check this out...
More information about the freebsd-questions mailing list