rare, random issue with read(), mmap() failing to read entire file

John Refling netbsdrat at gmail.com
Sat Nov 16 02:56:21 UTC 2013


 

I'm having some very insidious issues with copying and verifying (identical)
data from several hard disks.  This might be a hardware issue or something
very deep in the disk / filesystem code.  I have reproduced this with several
disks and motherboards.  It affects about 0.0096% of my files, and different
files each time!

 

Background:

 

1.  I have a 500 GB USB hard disk (with the new 4,096-byte [4K] sector size)
which I have been using to store a master archive of over 70,000 files.

 

2.  To make a backup of the USB disk, I copied everything over to a 500 GB
SATA hard disk, using various combinations of `cp -r', `scp -r', `tar -cf - . |
rsh ... tar -xf -', etc.

 

3.  To verify that the copy was correct, I did sha256 sums of all files on
both disks.

 

4.  When comparing the sha256 sums from both drives, I discovered that 6 or so
files did not match from one drive to the other.

 

5.  When I checked those files individually, they compared OK, and when I
recomputed their individual sha256 sums, I got sums DIFFERENT from the first
run, which were correct this time!

 

The above led me to investigate further, and using ONLY the USB disk, I
recomputed the sha256 sums for all files ON THAT DISK.  A small number
(6-12) of files ON THE SAME DISK had different sha256 sums than previously
computed!  The disk is read-only so nothing could have changed.

 

To try to get to the bottom of this, I took the sha256 code and put it in my
own file-reading routine, which reads data from the file using read().
On summing up the total bytes read in the read() loop, I discovered that on
the files that failed to compare, the read() returned EOF before the actual
EOF. According to the manual page this is impossible.  I compared the total
number of bytes read by the read() loop to the stat() file length value, and
they were different!  Obviously, the sha256 sum will be different since not
all the file is read.
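
The check described above can be sketched roughly as follows.  This is a
minimal illustration, not the code that was actually run: the sha256 update
is left out, the 64 KiB buffer size is an arbitrary assumption, and the name
check_read_length is hypothetical.  It counts the bytes returned by a read()
loop and compares the total with the st_size reported by fstat().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post).
 * Reads the whole file with read() and compares the byte count with st_size.
 * Returns 0 if they match, 1 if they differ (for example, read() reported
 * end of file early), and -1 on an open/fstat/read error.
 */
int check_read_length(const char *path, off_t *total_out, off_t *size_out)
{
    char buf[65536];            /* assumed buffer size, chosen arbitrarily */
    struct stat st;
    off_t total = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {            /* a short but positive count is legal */
            total += n;         /* the sha256 update would go here */
            continue;
        }
        if (n == 0)             /* read() claims end of file */
            break;
        if (errno == EINTR)     /* interrupted by a signal, retry */
            continue;
        close(fd);              /* any other error */
        return -1;
    }
    close(fd);

    if (total_out != NULL)
        *total_out = total;
    if (size_out != NULL)
        *size_out = st.st_size;
    return total == st.st_size ? 0 : 1;
}
```

A caller would log the path, the byte total, and the stat() size whenever
the function returns 1, so that the affected files can be re-read later.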

 

This happens consistently on 6 to 12 files out of 70,000+ *every* time, and
on DIFFERENT files *every* time.  So things work 99.9904% of the time.

 

But something fails about 0.0096% (roughly one hundredth of one percent) of
the time, which is significant with a large number of files!

 

Instead of read(), I tried mmap()ing chunks of the file.  Using mmap() to
access the data in the file instead of read() resulted in a (different)
sha256 sum than the read() version!  The mmap() version was correct, except
in ONE case where BOTH versions were WRONG, when compared to a 3rd and 4th
run!
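
For reference, reading a file through mmap() in chunks looks roughly like
the sketch below.  It is only an illustration under stated assumptions: the
16 MiB chunk size is arbitrary (it just needs to be a multiple of the page
size), a trivial additive checksum stands in for the sha256 update, and the
name mmap_scan is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed chunk size; must be a multiple of the system page size. */
#define CHUNK_BYTES ((off_t)(16 * 1024 * 1024))

/*
 * Hypothetical helper (not from the original post).
 * Maps the file read-only one chunk at a time and touches every byte.
 * Returns the number of bytes visited, or -1 on error.  The additive
 * checksum is a placeholder for the real sha256 update.
 */
off_t mmap_scan(const char *path, uint32_t *sum_out)
{
    struct stat st;
    uint32_t sum = 0;
    off_t off = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    while (off < st.st_size) {
        off_t left = st.st_size - off;
        size_t len = (size_t)(left < CHUNK_BYTES ? left : CHUNK_BYTES);
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        size_t i;

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < len; i++)   /* placeholder for the sha256 update */
            sum += p[i];
        munmap(p, len);
        off += (off_t)len;
    }
    close(fd);

    if (sum_out != NULL)
        *sum_out = sum;
    return off;
}
```

Note that in this form the byte count always equals the stat() size by
construction, so unlike the read() loop, only the checksum can reveal a
problem.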

 

Using `diff -rq disk1 disk2` resulted in similar issues.  There were always
a few files that failed to compare.  Doing another `diff -rq disk1 disk2`
resulted in a few *other* files that failed to compare, while the ones that
didn't compare OK the first time, DID compare OK the second time.  This
happened to 6-12 files out of 70,000+.
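
When two copies disagree, it helps to know whether one side simply ended
early or whether the contents differ.  The sketch below is one possible way
to find that out, not something diff does in this form: it returns the
offset of the first difference between two files.  The names read_full and
first_difference are hypothetical, and the 64 KiB buffers are an arbitrary
assumption.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill buf with up to len bytes, retrying short reads.
 * Returns the number of bytes read (less than len only at EOF), or -1. */
static ssize_t read_full(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n > 0) {
            got += (size_t)n;
            continue;
        }
        if (n == 0)             /* read() claims end of file */
            break;
        if (errno == EINTR)     /* interrupted by a signal, retry */
            continue;
        return -1;
    }
    return (ssize_t)got;
}

/*
 * Hypothetical helper (not from the original post).
 * Returns -1 if the files are identical, -2 on error, otherwise the byte
 * offset of the first difference.  If one file ends early, the offset
 * returned is the length of the shorter file.
 */
long long first_difference(const char *a, const char *b)
{
    unsigned char ba[65536], bb[65536];   /* assumed buffer size */
    long long off = 0, result = -1;
    int fa, fb;

    assert(a != NULL && b != NULL);
    fa = open(a, O_RDONLY);
    if (fa < 0)
        return -2;
    fb = open(b, O_RDONLY);
    if (fb < 0) {
        close(fa);
        return -2;
    }

    for (;;) {
        ssize_t na = read_full(fa, ba, sizeof ba);
        ssize_t nb = read_full(fb, bb, sizeof bb);
        size_t m, i = 0;

        if (na < 0 || nb < 0) {
            result = -2;
            break;
        }
        m = (size_t)(na < nb ? na : nb);
        while (i < m && ba[i] == bb[i])
            i++;
        if (i < m || na != nb) {          /* content or length mismatch */
            result = off + (long long)i;
            break;
        }
        if (na == 0)                      /* both at EOF, identical */
            break;
        off += na;
    }
    close(fa);
    close(fb);
    return result;
}
```

If the reported offset is smaller than both stat() sizes and a second run
on the same pair returns -1, that points at an early EOF from read() rather
than at a difference in the data stored on disk.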

 

Whatever is affecting my use of read() in my sha256 routine seems to also
affect system utilities such as diff!

 

This gets really insidious because I don't know if the original `cp -r disk1
disk2` did these short reads on a few files while copying the files, thus
corrupting my archive backup (on 6-12 files)!

 

Some of the files that fail are small (10 KB) and some are huge (8 GB).

 

HELP!

 

It takes 7 hours to recompute the sha256 sums of the files on the disk so
random experiments are time consuming, but I'm willing to try things that
are suggested.

 

System details:

 

This is observed with the following disks:

 

Western Digital 500 GB SATA, 512-byte sectors

Hitachi 500 GB SATA, 512-byte sectors

Iomega RPHD-UG3 500 GB USB, 4096-byte sectors

 

in combination with these motherboards:

 

P4M800Pro-M V2.0: Pentium D 2.66 GHz, 2GB memory

HP/Compaq Evo: Pentium 4, 2.8 GHz, 2GB memory

 

Operating system version:

FreeBSD: 9.1-RELEASE #0

 

No hardware errors were noted in /var/log/messages while the files were being
read.

 

I ran SpinRite on the disks to refresh (re-read/rewrite) all sectors, with no
errors.

 

The file systems were built using:

 

dd if=/dev/zero of=/dev/xxx bs=2m

newfs -m0 /dev/xxx

 

I looked through the mailing lists and bug reports but couldn't find anything
similar.

 

Thanks for your help,

 

John Refling

 


