graid3 data corruption?!?

Michael Reifenberger mike at Reifenberger.com
Thu Feb 23 04:31:20 PST 2006


Hi,
I have 5 FireWire disks in one graid3 set,
and I'm running a fresh STABLE on SMP with a dual AMD64 in i386 mode.

While doing an md5 checksum of all files in the filesystem
(~770GB of data), one disk died. graid3 did the right thing
and disconnected the disk.

BUT:
after diffing the md5sums of the files, one large file (probably the one
that was being checksummed during the disk failure) had a different md5sum than before:
--- md5_11.log  Fri Dec  9 13:23:07 2005
+++ md5_12.log  Wed Feb 22 18:03:03 2006
@@ -4460,3 +4460,3 @@
  MD5 (Backup/totum/root_0_050211_i386.dmp.gz) = 5a3e7b03f48ea4c2cba10624edd996cf
-MD5 (Backup/totum/root_0_050715.dmp.gz) = 0e154301cbec84571d1df94bf68e3d79
+MD5 (Backup/totum/root_0_050715.dmp.gz) = 172d7c12b78f3f191c184d467e31a53c
  MD5 (RIP/.pgp/PGPMacBinaryMappings.txt) = bf1b637a3a69bcbb8d4177be46a1c3ac
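
A comparison like the one above can also be scripted. The following is only
a minimal sketch, assuming both logs contain plain md5(1)-style lines of the
form `MD5 (path) = digest`; the function names are made up for illustration
and are not part of any FreeBSD tool.

```python
import re

# Matches md5(1)-style lines such as: MD5 (some/path) = <32 hex digits>
MD5_LINE = re.compile(r"^MD5 \((.+)\) = ([0-9a-f]{32})$")


def parse_md5_log(text):
    """Return {path: digest} for every md5(1)-style line found in text."""
    digests = {}
    for line in text.splitlines():
        match = MD5_LINE.match(line.strip())
        if match:
            digests[match.group(1)] = match.group(2)
    return digests


def changed_files(old_text, new_text):
    """Return {path: (old_digest, new_digest)} for paths whose digest differs.

    Only paths present in both logs are compared; added or removed files
    are ignored in this sketch.
    """
    old = parse_md5_log(old_text)
    new = parse_md5_log(new_text)
    return {
        path: (old[path], new[path])
        for path in old.keys() & new.keys()
        if old[path] != new[path]
    }
```

Calling `changed_files(open("md5_11.log").read(), open("md5_12.log").read())`
would then list only the files whose checksum changed between the two runs.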

BUT:
doing a fresh md5sum of the file now, in degraded mode, I again get
the correct value of:
MD5 (Backup/totum/root_0_050715.dmp.gz) = 0e154301cbec84571d1df94bf68e3d79
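
Such a re-check can be repeated a few times to see whether reads of the file
are stable. This is a rough sketch only: the helper names and the number of
passes are arbitrary choices, and repeated reads may be served from the
buffer cache rather than from the disks, so a matching result does not prove
that the array itself returned good data every time.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_is_stable(path, expected, passes=3):
    """Return True only if every pass over the file yields the expected digest."""
    return all(md5_of_file(path) == expected for _ in range(passes))
```

For the file above, `expected` would be
`0e154301cbec84571d1df94bf68e3d79`, the value recorded in the older log.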

For me this means that graid3 returned incorrect data during the disk loss.
This shouldn't happen!

Any clues how this could happen?
Has anyone else seen this behaviour?

BTW: dmesg showed:
...
GEOM_RAID3: Device data created (id=0).
GEOM_RAID3: Device data: provider da5s1a detected.
GEOM_RAID3: Device data: provider da4s1a detected.
GEOM_RAID3: Device data: provider da3s1a detected.
GEOM_RAID3: Device data: provider da2s1a detected.
GEOM_RAID3: Device data: provider da1s1a detected.
GEOM_RAID3: Device data: provider da1s1a activated.
GEOM_RAID3: Device data: provider da2s1a activated.
GEOM_RAID3: Device data: provider da4s1a activated.
GEOM_RAID3: Device data: provider da3s1a activated.
GEOM_RAID3: Device data: provider da5s1a activated.
GEOM_RAID3: Device data: provider raid3/data launched.
...
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 6f 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): ABORTED COMMAND asc:0,0
(da2:sbp0:0:0:0): No additional sense information
(da2:sbp0:0:0:0): Retries Exhausted
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432531968, length=32768)]
GEOM_RAID3: Device data: provider da2s1a disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432761344, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432695808, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432663040, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432630272, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
GEOM_RAID3: Request failed. da2s1a[READ(offset=79432597504, length=32768)]
GEOM_RAID3: Device data: provider [unknown] disconnected.
...
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 80 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)
(da2:sbp0:0:0:0): READ(10). CDB: 28 0 9 3f 46 80 0 0 40 0
(da2:sbp0:0:0:0): CAM Status: SCSI Status Error
(da2:sbp0:0:0:0): SCSI Status: Check Condition
(da2:sbp0:0:0:0): MEDIUM ERROR asc:4b,0
(da2:sbp0:0:0:0): Data phase error
(da2:sbp0:0:0:0): Retrying Command (per Sense Data)

The last CAM errors occurred during a `dd`.
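
For reading the log above: a READ(10) CDB carries the starting block address
in bytes 2-5 and the transfer length in bytes 7-8, both big-endian. A small
sketch to decode the lines printed by CAM follows; the function name is made
up, and the 512-byte sector size is an assumption, not something the log
states.

```python
def decode_read10(cdb_text, sector_size=512):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    sector_size is an assumption (512 bytes); adjust it for other media.
    """
    cdb = [int(byte, 16) for byte in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # transfer length in blocks
    return {
        "lba": lba,
        "blocks": blocks,
        "byte_offset": lba * sector_size,
        "byte_length": blocks * sector_size,
    }


info = decode_read10("28 0 9 3f 46 6f 0 0 40 0")
print(info)
```

With 512-byte sectors the failing command is a 64-block read, i.e. 32768
bytes, which matches the length in the GEOM_RAID3 "Request failed" lines.
The decoded byte offset is relative to the whole disk da2, while the
GEOM_RAID3 offsets are relative to da2s1a, so the two are not expected to be
identical.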

Bye/2
---
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: Michael.Reifenberger at plaut.de | Priv: Michael at Reifenberger.com
       http://www.plaut.de           |       http://www.Reifenberger.com



More information about the freebsd-stable mailing list