Is ZFS production ready?

Wojciech Puchar wojtek at wojtek.tensor.gdynia.pl
Thu Jun 21 14:44:33 UTC 2012


> One interesting feature of ZFS is its block checksum: all reads and writes 
> include a block checksum, so it can easily detect situations where, for 
> example, data is quietly corrupted by RAM.

You may be shocked, but you are sometimes wrong. I already demonstrated it: 
checksumming reports no errors, and ZFS does write wrong data with right 
checksums :)

It's quite easy to explain if one understands the hardware details.

Checksumming will protect you from:

- a failed SATA/SAS port or on-disk controller that returns bad data as 
good. This is actually a really rare case. I have never seen it, but maybe 
it happens.

- some types of DRAM failure, but not all. Actually just a small 
fraction, because a DRAM failure like that would crash your system so 
quickly that you are unlikely to get much data corruption.

The common case with DRAM is that after you write to it, it keeps the right 
data for some time and RARELY flips a bit later, in spite of refresh.

With this type of failure you may run your machine for hours, even days or 
longer. And ZFS will calculate a proper checksum of the wrong data and write 
it to disk.
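
The difference between the two cases can be shown with a small simulation. 
This is only a sketch, not ZFS code: SHA-256 from Python's hashlib stands in 
for the block checksum, and flip_bit, write_block and verify are hypothetical 
helper names made up for this illustration.

```python
import hashlib


def checksum(block):
    # Stand-in block checksum (assumption: SHA-256 is used here only for
    # illustration; this is not a model of ZFS internals).
    return hashlib.sha256(block).digest()


def flip_bit(block, bit_index):
    # Return a copy of the block with a single bit inverted.
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(block):
    # Simulated write: the checksum is computed over whatever is in RAM
    # at the moment of the write, right or wrong.
    return block, checksum(block)


def verify(stored):
    # Simulated read: recompute the checksum and compare with the stored one.
    data, stored_sum = stored
    return checksum(data) == stored_sum


original = b"important data" * 8

# Case 1: the data is damaged AFTER the checksum was computed
# (bad port, bad on-disk controller). The mismatch is detected.
data, csum = write_block(original)
detected_on_disk = not verify((flip_bit(data, 5), csum))

# Case 2: a bit flips in RAM BEFORE the checksum is computed.
# The checksum matches the wrong data, so nothing is detected.
stored = write_block(flip_bit(original, 5))
detected_in_ram = not verify(stored)
silently_wrong = stored[0] != original

print("corruption after checksum detected: ", detected_on_disk)
print("corruption before checksum detected:", detected_in_ram)
print("wrong data stored with valid checksum:", silently_wrong)
```

In the second case the read-side verification passes, because the checksum 
was computed over data that was already wrong.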


This is the reason I keep a few failed DIMMs: for testing how 
different software behaves on a broken machine.

UFS ended up with a few corrupted files after half a day of heavy work and 4 
crashes. fsck always recovered things well (of course with "unexpected 
softupdate inconsistency....")

ZFS survived 2 crashes. After the third it panicked on startup.

Of course, there is no zfs_fsck.
And there is no possibility of making a really good zfs_fsck because of the 
data layout, at least not easily.


> This feature is very important for databases.
Is data integrity not important for the rest? :)

Still, the disks themselves perform quite heavy ECC, and both SATA and SAS 
links carry their own error checking.


More information about the freebsd-questions mailing list