File corruption: how to find the guilty?

Doug Ledford dledford at redhat.com
Thu Dec 17 08:35:34 PST 1998


Neil Conway wrote:
> 
> Doug Ledford wrote:
> >
> > Stephane Bortzmeyer wrote:
> > >
> > > I have a Linux box which shows random corruption of files. Example: all Perl
> > > scripts suddenly die with "segmentation fault". Reinstalling the same Perl
> > > package cures it. Two days ago, /etc/resolv.conf became corrupted: strange
> > > characters were in it.
> > >
> > > I wonder what to do? Change the disk? The SCSI controller? The kernel?
> > >
> > > I run Linux 2.0.35 (Debian distribution 2.0), patched for the Adaptec driver
> > > 5.1.2. Here is the configuration:
> >
> > It's memory corruption.  I've seen this float through this list or that about
> > 30 different times in the past.  Not once has it ever been a kernel or driver
> > issue.  In *every* case it has been either RAM, cache, or CPU.  Check the CPU
> > fan, check the cache (if it isn't part of the CPU) and check your RAM.
> 
> Well perhaps with a stable kernel this is the most likely culprit.
> However, it's dangerous to make blanket assertions - they come back to
> haunt you.  Alan Cox was telling me last month about how 2.1.129 was
> causing him random memory corruption leading to disk corruption, and
> this turned out to be a kernel bug (nfs-related I think).

Even in the devel kernels, 2.1.44 is the only one that was likely to do
this on a *local* filesystem.  There is a difference when running NFS. 
Not the least of that difference is that NFS is currently getting its
last fixes after having been re-done for the most part, whereas ext2fs
has hardly been touched during the entire 2.1 kernel series.
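
For the original question of telling bad RAM, cache, or CPU apart from a
bad disk, one rough check is to hash the same large file several times
and compare the results. This is only a sketch and not something
proposed in this thread: the function names, the chunk size, and the
choice of SHA-256 are all assumptions made for illustration. If the
digests differ between passes while nothing is writing to the file, the
data is probably being damaged on its way through memory or the CPU
rather than on the platter.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, rounds=5):
    """Hash the same file `rounds` times and return every digest (hypothetical helper)."""
    return [file_digest(path) for _ in range(rounds)]


def looks_stable(digests):
    """True if every pass produced the same digest."""
    return len(set(digests)) == 1
```

Calling looks_stable(repeated_digests(path)) on a large file that is not
being modified should return True on healthy hardware. A False result
points at the RAM, cache, or CPU. A True result does not clear them: a
small file is likely served from the page cache after the first pass, so
a file larger than installed RAM exercises more of the system.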

-- 
  Doug Ledford   <dledford at redhat.com>
   Opinions expressed are my own, but
      they should be everybody's.

To Unsubscribe: send mail to majordomo at FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message


