Anyone??? (was Reproducible data corruption on 6.1-Stable)

Daniel Gerzo danger at FreeBSD.org
Thu Sep 14 13:53:02 PDT 2006


Hello Jonathan,

Wednesday, September 13, 2006, 2:38:14 AM, you wrote:

> I set up a new server recently and transferred all the information from
> my old server over.  I tried to use unison to synchronize the backup of
> pictures I have taken and noticed that a large number of pictures were
> marked as changed on the server.  After checking the pictures by hand I
> confirmed that many of the pictures on the server were corrupted.

> It appears the corruption happens during the read process because when I
> recompare the files in a graphical diff tool between cache flushes the
> differences move around!?!?!?  The differences also appear to be very
> small for the most part, single bytes scattered throughout the file.  I
> really have no idea what is causing the problem and would like to pin it
> down so I can either replace hardware if it's bad or fix whatever the
> bug is.

> CPU: AMD Athlon(tm) XP 3200+ (2090.16-MHz 686-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0

I saw very similar symptoms on a 3.2 GHz P4. I was able to build world
without any problems and the machine was otherwise completely stable,
but when I tried to install some ports, the md5 sums of the sources
didn't match, even though I was sure the files themselves were all right.

The following simple test demonstrates the problem I was hitting:

root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = b95ddf27bc0ffa379c9aa881ca39e92a7d79e0d08999b4dff6d7d9547ee2a72d
root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = 71432841b3965b7ab2d83f0dc7c3049195ea4e9267a8dc2d825a8a0466982930
root@[bigbang ~]# sha256 /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
SHA256 (/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz) = 83e44f5301b3270e821850164c74d275f6721bed5d126480cf518a9fe5ca0d6c
root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
bd8c2e593e1fa4b01fd98eaf016329bb
root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
bd8c2e593e1fa4b01fd98eaf016329bb
root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
b9342bb213393238dd37322d4e2ee3fe
root@[bigbang ~]# md5 < /usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz
88efa7977fd3febaa8d260e3d5f21917
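
The same check can be scripted so that it runs unattended. What follows
is only a minimal sketch in Python, not something that was run on the
machine above; the function names (file_digest, repeated_digests,
is_stable) are made up for illustration. It hashes one file several
times and reports whether every read produced the same digest. On
healthy hardware the digests of an unchanged file should always be
identical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Read the file at `path` in chunks and return its hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, rounds=5, algorithm="sha256"):
    """Hash the same file `rounds` times and return all digests in order."""
    return [file_digest(path, algorithm) for _ in range(rounds)]


def is_stable(digests):
    """Return True when every round produced the same digest."""
    return len(set(digests)) == 1
```

For example, is_stable(repeated_digests("/usr/ports/distfiles/ruby/ruby-1.8.4.tar.gz"))
would be expected to return False on a box showing the behaviour above.
Note that repeated reads may be served from the buffer cache, so a
single clean run does not prove the hardware is fine.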

Memtest didn't show any problems with the RAM and we were unable to
determine what was really going on. We then managed to get the machine
replaced with completely new hardware, and the problem went away.
Later I was told that it is some kind of known bug in the BIOSes of
older P4 machines (and was advised to update the BIOS, where it should
have been fixed in the meantime), but we were unable to find any
information about the problem. Fortunately the colo company replaced
the hardware without any trouble. So far so good; the box is running
flawlessly.

-- 
Best regards,
 Daniel                            mailto:danger at FreeBSD.org



More information about the freebsd-stable mailing list