Anyone??? (was Reproducible data corruption on 6.1-Stable)

Jonathan Stewart jonathan at kc8onw.net
Thu Sep 14 19:27:26 PDT 2006


Daniel Gerzo wrote:
> Hello Jonathan,
> 
> Wednesday, September 13, 2006, 2:38:14 AM, you wrote:
> 
>> I set up a new server recently and transferred all the information from
>> my old server over.  I tried to use unison to synchronize the backup of
>> pictures I have taken and noticed that a large number of pictures were
>> marked as changed on the server.  After checking the pictures by hand I
>> confirmed that many of the pictures on the server were corrupted.
> 
>> It appears the corruption happens during the read process, because when I
>> recompare the files in a graphical diff tool between cache flushes, the
>> differences move around!  The differences are also very small for the
>> most part: single bytes scattered throughout the file.  I really have no
>> idea what is causing the problem and would like to pin it down so I can
>> either replace the hardware if it's bad or fix whatever the bug is.
> 
>> CPU: AMD Athlon(tm) XP 3200+ (2090.16-MHz 686-class CPU)
>>   Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
> 
> I saw very similar symptoms on a 3.2 GHz P4. I was able to build world
> without any problems and the machine was otherwise completely stable,
> but when I tried to install some ports, the md5 sums didn't match the
> source, even though I was sure the files were all right.
> 
> The following simple test demonstrates the problem I was hitting:
> 
> root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = b95ddf27bc0ffa379c9aa881ca39e92a7d79e0d08999b4dff6d7d9547ee2a72d
> root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = 71432841b3965b7ab2d83f0dc7c3049195ea4e9267a8dc2d825a8a0466982930
> root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = 83e44f5301b3270e821850164c74d275f6721bed5d126480cf518a9fe5ca0d6c
> root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> bd8c2e593e1fa4b01fd98eaf016329bb
> root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> bd8c2e593e1fa4b01fd98eaf016329bb
> root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> b9342bb213393238dd37322d4e2ee3fe
> root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
> 88efa7977fd3febaa8d260e3d5f21917
> 
> Memtest didn't show any problems with the RAM, and we were unable to
> figure out what was really going on. Then we managed to get the machine
> replaced with completely new hardware, and the problem was gone.
> Later, I was told that it is some kind of known bug in the BIOS of
> older P4 machines (and was advised to update the BIOS, since it should
> have been fixed in the meantime), but we were unable to find any
> information about the problem. Fortunately the colo company replaced
> the hardware without any trouble. So far so good: the box is running
> flawlessly.
> 
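A minimal Python sketch of the repeated-checksum test quoted above, for anyone who wants to script it rather than retype sha256 by hand. The helper names (sha256_of, repeat_check, diff_offsets) are hypothetical, not FreeBSD utilities. diff_offsets assumes you already have two full reads of the same file as bytes; for each differing offset it also reports how many bits changed, which makes it easy to see whether the scattered single-byte differences are single-bit flips.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeat_check(path: str, runs: int = 4) -> list[str]:
    """Hash the same file several times; healthy hardware gives identical digests."""
    return [sha256_of(path) for _ in range(runs)]


def diff_offsets(a: bytes, b: bytes) -> list[tuple[int, int, int, int]]:
    """List (offset, byte_a, byte_b, bits_flipped) wherever two reads disagree.

    Only the overlapping prefix is compared, so both reads should be the
    same length (two reads of the same file).
    """
    return [
        (i, x, y, bin(x ^ y).count("1"))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]
```

If repeat_check returns differing digests, reading the file twice with open(path, "rb").read() and passing both buffers to diff_offsets lists the same offsets a graphical diff tool would highlight, plus the number of flipped bits at each one.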

I don't think it's quite the same as my problem, since I have to use dd on
a large file to flush the cache and force FreeBSD to go back to the disk
before the checksum changes.  At this point I think I need to narrow down
further where the error is occurring, but I don't know what to try next.
I am 99.999% sure memory and CPU are not the problem, but beyond that I'm
getting into testing driver and filesystem code, which is a little
overwhelming to just dive into.
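The dd-based cache flush described above can be scripted the same way. This sketch assumes a scratch file larger than the machine's RAM (the scratch_path argument is hypothetical); streaming through it, much like dd if=<scratch> of=/dev/null, should push most of the target's pages out of the buffer cache so the next hash has to come from disk. The helper names are illustrative only.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return the SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def evict_by_reading(scratch_path: str, chunk_size: int = 1 << 20) -> int:
    """Stream through a scratch file and discard the data, like dd to /dev/null.

    Assumption: the scratch file is bigger than RAM, so its pages displace
    the target file's cached pages. Returns the number of bytes read.
    """
    total = 0
    with open(scratch_path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def hashes_across_flushes(path: str, scratch_path: str, rounds: int = 3) -> list[str]:
    """Flush the cache, then hash the target; repeat for several rounds.

    Matching digests mean reads from disk are stable; differing ones
    reproduce the corruption described in this thread.
    """
    digests = []
    for _ in range(rounds):
        evict_by_reading(scratch_path)  # force the next read back to the disk
        digests.append(sha256_of(path))
    return digests
```

Comparing these digests with two back-to-back hashes taken without a flush separates this case (stable while cached, changing only after a flush) from the quoted P4 case, where the checksum changed on every run.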

Jonathan


More information about the freebsd-stable mailing list