kern/50201: [twe] 3ware RAID 5 resulting in data corruption

Jan Srzednicki w at
Sat Nov 26 16:00:20 GMT 2005

The following reply was made to PR kern/50201; it has been noted by GNATS.

From: Jan Srzednicki <w at>
To: bug-followup at, bruce at, dpk at
Subject: Re: kern/50201: [twe] 3ware RAID 5 resulting in data corruption
Date: Sat, 26 Nov 2005 16:58:36 +0100

 I'm experiencing a similar problem, though with a few notable differences.
 First of all, I'm running FreeBSD 5.4-RELEASE (with RELENG_5_4 fixes) on
 my machine. Here's the dmesg output related to the 3ware controller:
 [16:32] hostname:~ # dmesg | grep twe
 twe0: <3ware Storage Controller. Driver version> port 0xcc00-0xcc0f mem 0xfe000000-0xfe7fffff irq 21 at device 0.0 on pci2
 twe0: 8 ports, Firmware FE7X, BIOS BE7X
 twed0: <Unit 0, RAID5, Normal> on twe0
 twed0: 1192370MB (2441975040 sectors)
 The controller is a 7000-class 8-port RAID controller with PATA drives.
 I'm experiencing repeatable data corruption, but it was far more
 difficult to pin down. I'm using the array for backups, which I'm
 doing via ssh over the network (100Mbit ethernet) in the following way:
 dump | gzip | md5checker | network(ssh) | md5checker | split twe0/files
 md5checker is my small utility to calculate md5 sums of each 1MB chunk
 of data piped through it. It assured me that data corruption does not
 occur on the network, as the MD5 sums on both sides match. The
 total size of the backed-up data after gzipping is about 43GB.
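
For illustration only: the source of md5checker is not included in this message, so the following is a hypothetical Python sketch of a filter with the behaviour described above. It copies stdin to stdout unchanged and logs one MD5 digest per 1MB chunk to stderr, so the logs from both ends of the ssh pipe can be compared. The function names and the log format are assumptions, not the actual utility.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for md5checker (not the author's actual tool)."""
import hashlib
import sys

CHUNK_SIZE = 1024 * 1024  # 1MB chunks, as described in the message


def read_exact(stream, size):
    """Read up to `size` bytes, looping because pipe reads may return short."""
    parts = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:  # EOF
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def checksum_stream(src, dst, log, chunk_size=CHUNK_SIZE):
    """Copy src to dst unchanged and write "<index> <md5>" per chunk to log.

    Returns the list of hex digests, one per chunk.
    """
    digests = []
    index = 0
    while True:
        chunk = read_exact(src, chunk_size)
        if not chunk:
            break
        digest = hashlib.md5(chunk).hexdigest()
        digests.append(digest)
        log.write(f"{index} {digest}\n")  # assumed log format
        dst.write(chunk)                  # data passes through untouched
        index += 1
    dst.flush()
    return digests


if __name__ == "__main__":
    checksum_stream(sys.stdin.buffer, sys.stdout.buffer, sys.stderr)
```

With such a filter on each side, redirecting stderr to a file on both hosts and diffing the two logs would show whether any 1MB chunk changed in transit.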
 The strange thing was that performing _the same_ backup in the following way:
 dump | gzip > file
 cat file | md5checker | network(ssh) | md5checker | split twe0/files
 .. did not produce any errors (I repeated both "ways" several times, to
 make sure). Well, it appears that the data corruption is somehow related
 to the speed of the data transmission, as dump output is quite irregular
 and becomes rather slow when it hits a bunch of small files. The whole
 dump process takes about 6 hours. 
 I tried dumping the data into an IDE disk on the machine with the
 controller, which resulted in no errors. I also tried turning off
 softupdates on the filesystem on the 3ware array, with no effect. It
 clearly appears the data corruption is somehow related to the 3ware
 controller.
 After some investigation, I've discovered the following facts:
  - data is corrupted in exact 128kB chunks; the whole 128kB is bad and
    appears to be random (that is, I could not find any similar chunk in
    other files on the partition).
  - errors are pretty rare; in the whole 43GB stream I'm getting about 3
    or 4 errors.
  - I'm not able to repeat data corruption locally. Things like:
 	cat /dev/(zero|urandom) | md5checker | split array/files
    .. did not produce _any_ errors, after piping about a terabyte of
    data.
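
The message does not say how the 128kB chunks were located. One way to do it, sketched here under the assumption that a known-good copy of a file is available (for example the locally written "dump | gzip > file" output), is to compare the two copies block by block. The function name and paths are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: list the 128kB blocks that differ between two files."""
import sys

BLOCK_SIZE = 128 * 1024  # corruption granularity reported in the message


def find_bad_blocks(good_path, suspect_path, block_size=BLOCK_SIZE):
    """Return the byte offsets of every block that differs between the files."""
    bad_offsets = []
    offset = 0
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(block_size)
            b = suspect.read(block_size)
            if not a and not b:  # both files exhausted
                break
            if a != b:
                bad_offsets.append(offset)
            offset += block_size
    return bad_offsets


if __name__ == "__main__" and len(sys.argv) == 3:
    for off in find_bad_blocks(sys.argv[1], sys.argv[2]):
        print(f"mismatch at offset {off} (block {off // BLOCK_SIZE})")
```

If the reported offsets always fall on 128kB boundaries and each mismatch spans a whole block, that would match the pattern described above.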
 It also appears that turning off write-cache on the controller fixed the
 problem, but writes are very slow now.
 I don't have another 3ware controller, so I cannot rule out a hardware
 fault in this one.
 I'm of course willing to provide any feedback needed on this issue, but
 because each run takes so long, testing is rather slow.
