ZFS Checksum errors
Daniel Kalchev
daniel at digsys.bg
Fri Jun 22 08:08:22 UTC 2012
On 22.06.12 00:48, rondzierwa at comcast.net wrote:
> its these kind of antics that make me resistant to the thought of allowing ZFS to manage the raid. it seems to be having problems just managing a big file system. I don't want it to correct anything, or restore anything, just let me delete the files that hurt, fix up the free space list so it doesn't point outside the bounds of the disk, and get on with life.
Been there, done that... I too believed the hype that "hardware RAID" is
the better, more "reliable" solution.
But when a third 3ware RAID array failed on me (I have lots) and I had
to spend a few days and nights reassembling the pieces, I finally
decided to migrate everything to ZFS. In some cases I still use the
expensive 3ware RAID controllers, but only as SLOW multi-port SATA
controllers... I have discovered that my disks aren't actually that
slow, just by using a normal HBA instead of a "hardware RAID" controller.
I also had a few cases where a disk was considered just perfect by both
the 3ware controller (via any kind of test or verification) and
S.M.A.R.T., but produced checksum errors when used with ZFS. I keep one
or two such disks lying around to show to unbelievers.
ZFS is way, way, WAY more reliable for your data than any RAID
controller could ever be. The reason is that ZFS checksums each and
every block (metadata and data) in memory before sending it down the
pipe to the disks, and verifies those checksums when the data comes
back into memory. Hardly any other system does this end to end. If your
memory is reliable, then you can trust ZFS when it tells you there are
checksum errors: these happened somewhere between memory and the disks,
most probably in a corrupted RAID controller cache or on-disk cache, or
on some flaky bus. If your memory is unreliable, bad luck -- no file
system can help.
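
To make the idea concrete, here is a minimal Python sketch of end-to-end
checksumming. It is a toy model, not ZFS code: the names BlockStore,
write_block, read_block and ChecksumError are made up for illustration,
and SHA-256 is used only because it ships with Python's standard library.
The point is that the checksum is computed in memory and kept in the
block pointer, away from the data, so anything that damages the block on
the way to or from the "disk" is caught on read.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when data read back does not match the stored checksum."""


class BlockStore:
    """Toy stand-in for controller + disk: maps block address -> bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = bytes(data)

    def read(self, addr):
        return self.blocks[addr]


def checksum(data):
    # Assumption: SHA-256 chosen for convenience in this sketch.
    return hashlib.sha256(data).digest()


def write_block(store, addr, data):
    """Checksum the data in memory first, then send it to the store.

    Returns a hypothetical 'block pointer' that holds the address and
    the checksum separately from the data itself.
    """
    ptr = {"addr": addr, "cksum": checksum(data)}
    store.write(addr, data)
    return ptr


def read_block(store, ptr):
    """Read the block back and verify it against the pointer's checksum."""
    data = store.read(ptr["addr"])
    if checksum(data) != ptr["cksum"]:
        raise ChecksumError("block %r failed verification" % ptr["addr"])
    return data


store = BlockStore()
ptr = write_block(store, 7, b"important data")
assert read_block(store, ptr) == b"important data"
```

If something below memory flips a bit in the stored block (for example,
store.blocks[7] = b"important dbta"), the next read_block() raises
ChecksumError instead of silently returning bad data.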
With ZFS over RAID you can only know there are problems "somewhere".
With ZFS directly managing your disks, you know exactly which disk (or
the bus to it) is failing, and ZFS will automatically correct things. If
you have enough redundancy, no data will be damaged.
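
A rough Python sketch of why direct disk access matters, again a toy
model with made-up names (mirror_write, mirror_read) and not how ZFS is
actually implemented: with a two-way mirror, each copy is checked
against the checksum on its own, so the bad disk can be named and its
copy rewritten from a good one. This sketch reads every copy on each
read, which is closer to what a scrub does than to a normal read. A
RAID controller that presents a single volume hides the individual
copies, so neither step is possible.

```python
import hashlib


def checksum(data):
    # Assumption: SHA-256 chosen for convenience in this sketch.
    return hashlib.sha256(data).digest()


def mirror_write(disks, addr, data):
    """Write the same block to every disk; return a pointer with checksum."""
    for disk in disks:
        disk[addr] = bytes(data)
    return {"addr": addr, "cksum": checksum(data)}


def mirror_read(disks, ptr):
    """Verify every copy, repair bad ones, and report which disks were bad.

    Returns (data, bad_disk_indexes). Raises IOError if no copy is valid,
    i.e. when there is not enough redundancy left to recover.
    """
    good = None
    bad = []
    for index, disk in enumerate(disks):
        copy = disk.get(ptr["addr"])
        if copy is not None and checksum(copy) == ptr["cksum"]:
            if good is None:
                good = copy
        else:
            bad.append(index)
    if good is None:
        raise IOError("no valid copy of block %r" % ptr["addr"])
    for index in bad:
        # Self-heal: overwrite the damaged copy with verified data.
        disks[index][ptr["addr"]] = good
    return good, bad


disks = [{}, {}]                  # two toy "disks" in a mirror
ptr = mirror_write(disks, 3, b"payload")
disks[1][3] = b"pAyload"          # simulate silent corruption on disk 1
data, bad = mirror_read(disks, ptr)
assert data == b"payload" and bad == [1]
assert disks[1][3] == b"payload"  # the bad copy has been repaired
```

The list of bad disk indexes plays the role of the per-device checksum
error counters: it tells you which disk to distrust, not just that
"something" is wrong.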
ZFS doesn't really have a FAT table or anything like it, and free space
is managed differently. But you are correct -- there should be tools to
fix this kind of corruption. There is instrumentation for this, via zdb,
but there is not enough good documentation and no one-click tool. In any
case, you should be more concerned about how you got to that corruption.
ZFS does not have a problem managing a big file system. In fact, if
anything can manage a BIG file system, it is ZFS.
Your 12TB is, in fact, a moderately small filesystem for ZFS -- it's
used by far larger installations.
Just let ZFS manage the disks directly.
Daniel
More information about the freebsd-fs mailing list