Anyone??? (was Reproducible data corruption on 6.1-Stable)

Jonathan Stewart jonathan at kc8onw.net
Wed Sep 13 05:09:26 PDT 2006


Oliver Fromme wrote:
> Jonathan Stewart <jonathan at kc8onw.net> wrote:
>  > I set up a new server recently and transferred all the information from
>  > my old server over.  I tried to use unison to synchronize the backup of
>  > pictures I have taken and noticed that a large number of pictures were
>  > marked as changed on the server.  After checking the pictures by hand I
>  > confirmed that many of the pictures on the server were corrupted.  I
>  > attempted to use unison to update the files on the server with the
>  > correct local copies but it would fail on almost all the files with the
>  > message "destination updated during synchronization."
>  > 
>  > It appears the corruption happens during the read process because when I
>  > recompare the files in a graphical diff tool between cache flushes the
>  > differences move around!?!?!?  The differences also appear to be very
>  > small for the most part, single bytes scattered throughout the file.  I
>  > really have no idea what is causing the problem and would like to pin it
>  > down so I can either replace hardware if it's bad or fix whatever the
>  > bug is.
> 
> That very much sounds like bad RAM, or overclocked CPU
> or bus.  I assume you do not overclock, so I recommend
> you replace your RAM modules and check if the symptoms
> are gone.
> 
> Also check your BIOS settings for the RAM timings.
> Setting the timings to more conservative values might
> already solve the problem.

Thanks for the suggestions, but I have already tried lowering both the
processor clock rate and the RAM speed, with no luck whatsoever.

I forgot to mention that the problem appears no matter how I read the
file (unison, md5, etc.); only about 1 read in 100 returns the correct
data.  The OS lives on a separate drive, and I have done many
buildworlds/kernels on that drive, as well as compiling some very large
software packages, without any problems.  I'm wondering whether the
controller might be ignoring read errors from the hard drive, but in
that case I would expect more than the occasional single byte to be
changed.
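One way to quantify the "1 read in 100" observation is to hash the same
file many times and count how many distinct digests come back.  The
script below is a minimal Python sketch of that idea; `md5_of` and
`repeated_digests` are hypothetical helper names, not part of any
existing tool.  Repeated reads within a single run may be served from
the buffer cache, so the test is probably most informative when the
cache is flushed between runs, for example by unmounting and remounting
the filesystem.

```python
import hashlib
import sys
from collections import Counter


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, runs=10):
    """Hash the same file `runs` times; map each digest to its count.

    On a healthy read path every run yields the same digest, so the
    returned Counter has exactly one key.
    """
    return Counter(md5_of(path) for _ in range(runs))


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    runs = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    counts = repeated_digests(target, runs)
    for digest, n in counts.most_common():
        print(f"{digest}  {n}/{runs}")
    if len(counts) > 1:
        print("WARNING: repeated reads of the same file disagree")
```

Usage (hypothetical filename): `python3 rereads.py picture.jpg 100`.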

I'm thinking about trying to dd the file from the raw device to see
whether the problem occurs in the filesystem code or at a lower level.
Any suggestions on how to locate the file on the disk, or on how to
isolate the problem better, are welcome.  I don't mind doing the work;
I just have no idea where to look or what to try next.
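To confirm that the damage really is single bytes scattered through the
file, and to see whether those offsets move between reads, a byte-level
comparison against the known-good local copy can help.  The same
comparison could be run against a copy extracted with dd.  This is a
minimal Python sketch; `diff_offsets` is a hypothetical helper name, and
the script only compares two readable files.  It does not locate the
file's blocks on the raw device.

```python
import sys


def diff_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return a list of (offset, byte_a, byte_b) for every differing byte.

    If one file is longer, its extra tail is reported with None for the
    missing side.
    """
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Only walk byte by byte when the chunks actually differ.
                for i in range(max(len(a), len(b))):
                    x = a[i] if i < len(a) else None
                    y = b[i] if i < len(b) else None
                    if x != y:
                        diffs.append((offset + i, x, y))
            offset += max(len(a), len(b))
    return diffs


if __name__ == "__main__" and len(sys.argv) > 2:
    found = diff_offsets(sys.argv[1], sys.argv[2])
    for off, x, y in found:
        print(f"offset {off:#010x}: {x!r} != {y!r}")
    print(f"{len(found)} differing byte(s)")
```

Usage (hypothetical filenames): `python3 diffbytes.py good.jpg
server.jpg`.  Running it several times, with the cache flushed in
between, would show whether the differing offsets change from read to
read.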

Thanks,
Jonathan

