ZFS: 'checksum mismatch' all over the place

Kenneth Vestergaard Schmidt kvs at pil.dk
Mon Aug 20 05:20:36 PDT 2007


Pawel Jakub Dawidek <pjd at FreeBSD.org> writes:
>> The drive-cage was previously used to expose a RAID-5 array, composed of
>> the 12 disks. This worked just fine, connecting to the same machine and
>> controller (i386 IBM xSeries X335, mpt(4) controller).
>
> How do you know it was fine? Did you have something that did
> checksumming? You could try geli with integrity verification feature
> turned on, fill the disks with some random data and then read it back,
> if your controller corrupts the data, geli should tell you this.

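I may have to do this. The previous drive was almost filled to the brim
with data, which rsync checked every day, and we saw very little
re-transfer. That doesn't necessarily prove anything, though: by default
rsync only compares file size and modification time, so silent
corruption of data at rest would go unnoticed.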

The same controller is used in 50+ other machines, but there it is only
connected to two internal drives, and none of those machines show any
problems.

Still, the really weird thing is that we're seeing checksum errors in
the same block across many drives. That does smell like an issue with
either the driver, the controller or the drive cage, rather than with
ZFS or GEOM.

The machine should already have been in production, but the array has
just failed, and if I can't salvage it, I'll have to start over anyway.
In that case I might as well run the geli integrity-verification test
before recreating the ZFS array.
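The fill-and-read-back test suggested above relies on geli's integrity
verification. The same idea can be sketched in plain Python as a
swapped-in technique (this is not geli): keep a SHA-256 digest of every
block written, then re-read the blocks and compare. The block size and
the function names are assumptions for illustration only. Pointed at a
disk device it overwrites everything on that disk. Pointed at a regular
file, the read-back is likely served from the OS cache, so it only
exercises the controller when the reads actually hit the hardware (as
far as I know, FreeBSD disk devices are uncached).

```python
import hashlib
import os

BLOCK_SIZE = 64 * 1024  # assumed block size; any sector-aligned size works


def fill(path, nblocks, block_size=BLOCK_SIZE):
    """Write nblocks of random data to path; return one SHA-256 digest per block.

    WARNING: overwrites whatever is at path (destroys data on a disk device).
    """
    digests = []
    # "r+b" avoids truncating an existing target such as a device node.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        for _ in range(nblocks):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).digest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out before reading it back
    return digests


def verify(path, digests, block_size=BLOCK_SIZE):
    """Re-read the blocks and return the indices whose digest no longer matches."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            block = f.read(block_size)
            if hashlib.sha256(block).digest() != expected:
                bad.append(i)
    return bad


if __name__ == "__main__":
    import tempfile

    # Try it on a scratch file first; only use a real disk deliberately.
    path = os.path.join(tempfile.mkdtemp(), "scratch.img")
    sums = fill(path, nblocks=16)
    print(verify(path, sums))  # [] when nothing was corrupted
```

Any index returned by verify() marks a block that came back different
from what was written, which is the kind of corruption geli's
authentication would also report.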

-- 
Kenneth Schmidt
pil.dk

