Hardware or OS problem?

Joseph Koenig (jWeb) joe at jwebmedia.com
Thu Jan 6 07:29:35 PST 2005


> On 1/5/2005 at 09:14 Joseph Koenig (jWeb) wrote:
> 
>> Hi,
>> 
>> We have a system that is currently giving us some trouble. The system is
>> FreeBSD 4.9. It's a 2 GHz system with 1MB RAM and (here's the kicker) 73GB
>> RAID 1 ATA drives. The system serves as a web/database server dedicated to
>> 1 site. Daily the system goes out and downloads real estate listings (via
>> shell scripts and cURL) and processes them (via PHP into MySQL). Also,
>> nightly the system downloads a zipped set of images (probably around
>> 400-500) and processes them into thumbnails (PHP scripts calling
>> ImageMagick). Over the last week or two, the system is crashing and
>> rebooting into single user mode. It's not consistently during updates, or
>> resizing of images, or anything like that. Yesterday, it crashed with 99%
>> processor idle and load averages of 0.00 0.00 0.00 -- I was watching a
>> 'top' when the machine died. When it boots into single user mode, an fsck
>> must be run, which identified a few corrupt JPEG files -- however, the
>> sysadmin who reboots it never tells me which files they are. The sysadmin
>> is convinced it is a FreeBSD problem and says that Linux will not crash
>> because of a corrupt file and if it does, will not boot into single user
>> mode and he will be able to access it remotely to do the fsck. About 3-4
>> weeks ago, one of the drives in the mirror set crashed and had to be
>> replaced. I'm not convinced that drives are not to blame for these issues.
>> Is there any way to verify that? Is it possible a corrupt JPEG on the
>> drive could cause the system to crash randomly? What can I do to correctly
>> identify the problem so that we can fix it and not change the OS? Thanks,
> 
> The sysadmin has no clue about either Linux or FreeBSD!
> 
> A corrupt JPEG cannot cause a crash of the OS, for any real OS.  (If it
> does, it is a bug in the OS, but I doubt one exists.)  Real OSes include
> Windows XP, Linux, and FreeBSD.
> 
> However, an OS crash can cause a corrupt JPEG!
> 
> Either Linux or FreeBSD may boot into single user mode when the
> filesystem is corrupt.  What your sysadmin means is that one of the
> newer filesystems Linux uses has journaling, which makes it much less
> likely to enter this situation, but it still can happen.  With soft
> updates FreeBSD is in the same situation as Linux, but soft updates is
> (generally, there are exceptions) better than journaling.  There is
> soft updates support in FreeBSD 4.9, but I'm not sure how to enable
> it, or how good it is.  (In 5.3 it is awesome!)
> 
> I suspect hardware.
> 
> I'd burn memtest to a CD, and run that for a few hours to see if
> something is identified.   Memtest won't catch everything, but it does
> a pretty good job.
> 
> Also look at other factors.  Does the HVAC kick in when this happens?
> Is someone hitting the panic stop switch?  Situations like that have
> happened, and they can take a while to debug.  They are not likely, but
> don't rule them out.
> 
> FreeBSD 4.9 is fairly old at this point.   You should seriously
> consider upgrading to 4.11 (due out in a few weeks), or 5.3 (my
> recommendation, but a much more involved upgrade).
> 

In addition to the original problem stated above, we are seeing a number of
messages like "...in free(): warning: modified (page-) pointer" and "...in
free(): warning: chunk is already free". I have the admin running a memtest
today, but wanted to make sure these errors were not indicative of something
else going on. Thanks,

Joe Koenig
Production Manager
jWeb New Media Design
joe at jWebmedia.com
http://www.jwebmedia.com/
636.928.3162 
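
One way to get more out of the free() warnings quoted in the message above
is to tally which programs print them. The sketch below is only an
illustration, not something from the thread: it assumes each warning line
has the shape "<program> in free(): warning: <text>" (the program name is
elided as "..." in the message), and the program names in the sample lines
are hypothetical. As a rough heuristic, warnings confined to a single
program may point at a bug in that program, while warnings spread across
unrelated programs would fit better with the bad-memory theory the memtest
run is meant to check.

```python
import re
from collections import Counter

# Assumed line shape: "<program> in free(): warning: <text>".
# The original message elides the program name, so this is a guess.
WARNING_RE = re.compile(r"(\S+) in free\(\): warning: (.+)")


def tally_free_warnings(lines):
    """Count free() warnings per (program, message) pair."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            program = match.group(1)
            message = match.group(2).strip()
            counts[(program, message)] += 1
    return counts


# Hypothetical sample lines; "php" and "convert" are made-up program names.
sample = [
    "php in free(): warning: chunk is already free",
    "php in free(): warning: modified (page-) pointer",
    "convert in free(): warning: chunk is already free",
    "an unrelated log line",
]

for (program, message), n in sorted(tally_free_warnings(sample).items()):
    print(f"{n:3d}  {program}: {message}")
```

To use it on real output, pass the lines of whatever file captured the
warnings (for example a web server error log) to tally_free_warnings()
in place of the sample list.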



More information about the freebsd-questions mailing list