File corruption: how to find the guilty?
Doug Ledford
dledford at redhat.com
Thu Dec 17 08:35:34 PST 1998
Neil Conway wrote:
>
> Doug Ledford wrote:
> >
> > Stephane Bortzmeyer wrote:
> > >
> > > I have a Linux box which shows random corruption of files. Example: all Perl
> > > scripts suddenly die with "segmentation fault". Reinstalling the same Perl
> > > package cures it. Two days ago, /etc/resolv.conf became corrupted: strange
> > > characters were in it.
> > >
> > > I wonder what to do? Change the disk? The SCSI controller? The kernel?
> > >
> > > I run Linux 2.0.35 (Debian distribution 2.0), patched for the Adaptec driver
> > > 5.1.2. Here is the configuration:
> >
> > It's memory corruption. I've seen this float through this list or that about
> > 30 different times in the past. Not once has it ever been a kernel or driver
> > issue. In *every* case it has been either RAM, cache, or CPU. Check the CPU
> > fan, check the cache (if it isn't part of the CPU) and check your RAM.
>
> Well perhaps with a stable kernel this is the most likely culprit.
> However, it's dangerous to make blanket assertions - they come back to
> haunt you. Alan Cox was telling me last month about how 2.1.129 was
> causing him random memory corruption leading to disk corruption, and
> this turned out to be a kernel bug (nfs-related I think).

Even in the devel kernels, 2.1.44 is the only one that was likely to do
this on a *local* filesystem. There is a difference when running NFS.
Not the least of that difference is that NFS is currently getting its
last fixes after having been redone for the most part, whereas ext2fs
has hardly been touched during the entire 2.1 kernel series.
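
One rough way to tell flaky RAM/cache/CPU apart from data that is actually
wrong on disk is to read the same file several times and compare checksums.
The sketch below is only an illustration, not something anyone in this thread
ran: the function names and the "unstable"/"changed"/"stable" labels are made
up for the example. It assumes that reads which disagree with each other point
at the memory path, while reads that agree with each other but not with a
known-good checksum point at persistent damage. Repeated reads may be served
from the buffer cache, so a "stable" answer does not clear the hardware.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, passes=5):
    """Hash the same file `passes` times and return every digest."""
    return [file_digest(path) for _ in range(passes)]


def classify(digests, known_good=None):
    """Hypothetical labels for the pattern seen across repeated reads."""
    if len(set(digests)) > 1:
        # The same bytes hashed differently: suspect RAM, cache, CPU or bus.
        return "unstable"
    if known_good is not None and digests[0] != known_good:
        # Reads agree with each other but not with the reference copy.
        return "changed"
    return "stable"


if __name__ == "__main__":
    import sys

    # Usage: script.py FILE [KNOWN_GOOD_SHA256]
    target = sys.argv[1]
    reference = sys.argv[2] if len(sys.argv) > 2 else None
    print(classify(repeated_digests(target), reference))
```

A reference checksum would have to come from a copy known to be good, for
example the original package file hashed on a different machine.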
--
Doug Ledford <dledford at redhat.com>
Opinions expressed are my own, but
they should be everybody's.