Is ZFS ready for prime time?

Wed Nov 17 00:10:31 UTC 2010

On 11/16/10 20:23, Adam Vande More wrote:
> On Tue, Nov 16, 2010 at 8:11 AM, Ivan Voras <ivoras at freebsd.org> wrote:
>
>> Actually, I don't see anything incorrect in the above archive post.
>>
>
> I do.  Cherry-picking ZFS deficiencies without addressing the properly
> documented ways to work around them, or even acknowledging that it's possible
> to do so, is FUD.  It's not like traditional RAID doesn't have its own set of
> gotchas and proper usage environment.

Well, you are also cherry-picking *good* features, so I'd say 
there's no conceptual difference here :)

NHF, I'm not attacking you; as with everything else, people need to test 
technologies they are going to use and decide if they are good enough.

> Dismissing the value of checksumming your data seems foolhardy, to say the
> least.  The place where silent data corruption most frequently occurs, in
> large archive-type filesystems, also happens to be one of the prime use
> cases for RAIDZ.

Now if only the default checksum wasn't so weak:

http://opensolaris.org/jive/thread.jspa?threadID=69655&tstart=30
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597

There are no details about its "fixed" status so I think the problem is 
still there.

(of course, stronger options are available, etc. - and it's better than 
nothing)
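To make "weak" concrete: the usual complaint about fletcher2 is that its running sums wrap around modulo 2^64, so changes in the high-order bits can cancel out. The sketch below is an illustration under stated assumptions, not ZFS code. It assumes the two-lane loop structure of ZFS's fletcher_2 (a0/a1 sum the even/odd 64-bit words, b0/b1 sum the running a0/a1), and `fletcher2_sketch` and `flip_top_bit` are hypothetical names. Whether this is exactly the defect tracked in the bug above is also an assumption. It shows two flipped bits that leave all four checksum words unchanged.

```python
import struct

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wrap-around

def fletcher2_sketch(buf: bytes) -> tuple[int, int, int, int]:
    """Fletcher-2-style checksum over little-endian 64-bit words.

    Assumption: modeled on the two-lane fletcher_2 loop in ZFS;
    not taken from any ZFS release.
    """
    if len(buf) % 16:
        raise ValueError("buffer length must be a multiple of 16 bytes")
    words = struct.unpack(f"<{len(buf) // 8}Q", buf)
    a0 = a1 = b0 = b1 = 0
    for i in range(0, len(words), 2):
        a0 = (a0 + words[i]) & MASK64      # lane 0: even-indexed words
        a1 = (a1 + words[i + 1]) & MASK64  # lane 1: odd-indexed words
        b0 = (b0 + a0) & MASK64
        b1 = (b1 + a1) & MASK64
    return a0, a1, b0, b1

def flip_top_bit(buf: bytearray, word_index: int) -> None:
    # Bit 63 of a little-endian 64-bit word is the high bit of its last byte.
    buf[word_index * 8 + 7] ^= 0x80

original = bytes(range(64))   # 8 words, i.e. 4 words per lane
corrupted = bytearray(original)
flip_top_bit(corrupted, 0)    # lane 0, position 0 (weight 4 in b0)
flip_top_bit(corrupted, 4)    # lane 0, position 2 (weight 2 in b0)

print(bytes(corrupted) != original)                                      # True
print(fletcher2_sketch(bytes(corrupted)) == fletcher2_sketch(original))  # True
```

Each flip changes its word by 2^63 (mod 2^64), so a0 changes by 2 * 2^63 and b0 by (4 + 2) * 2^63, both multiples of 2^64, and the corruption goes unnoticed. A cryptographic hash such as the one selected with `zfs set checksum=sha256` has no such linear structure, which is the kind of "stronger option" referred to above.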

>> As for specific problems with ZFS, I'm also pessimistic right now - it's
>> enough to read the freebsd-fs @ freebsd.org and zfs-discuss @
>> opensolaris.org lists to see that there are frequent problems and
>> outstanding issues. You can almost grep for people losing data on ZFS
>> weekly. Compare this to the volume of complaints about UFS in both OSes
>> (almost none).
>>
>
> There are actually very few stories of ZFS/zpool loss on the FreeBSD
> list (some are misidentifications of issues like this:
> http://lists.freebsd.org/pipermail/freebsd-fs/2010-September/009417.html);
> another source I would point you to is http://forums.freebsd.org/.  The
> single recent valid one I can find involves a pool on geli, but I will grant
> you that the fact that it happens at all is quite disconcerting.

Yes, especially since GELI is very sensitive to corruption.

But I'm also counting cases like the inability to replace a failed 
drive, log device corruption and similar things which, even if they 
don't result in a totally broken file system, leave it wedged in a way 
that requires it to be re-created.

In many of those, though, it's not clear if the error is in ZFS or FreeBSD.

> UFS has its own set of
> issues/limitations, so regardless of what you pick, make sure you're aware of
> them and take steps to address them before problems occur.

Of course, UFS *is* old and "classical" in its implementation - it would 
be just as wrong to expect fancy features from UFS as it would be to 
expect such time-tested stability from ZFS.

And new technologies need time to settle down: there are still 
occasional reports of SUJ (soft updates journaling) problems.

Personally, I have encountered only stability issues and currently have 
only one server with ZFS in production (down from several about a year 
ago), but I'm constantly testing it in staging. If the v28 import 
doesn't destabilize it in FreeBSD 9, I'm going to give it another chance.