ZFS Panic after freebsd-update

Jeremy Chadwick jdc at koitsu.org
Tue Jul 2 07:57:33 UTC 2013


On Tue, Jul 02, 2013 at 08:59:56AM +0300, Andriy Gapon wrote:
> on 01/07/2013 21:50 Jeremy Chadwick said the following:
> > The issue is that ZFS on FreeBSD is still young compared to other
> > filesystems (specifically UFS).
> 
> That's a fact.
> 
> > Nothing is perfect, but FFS/UFS tends
> > to have a significantly larger number of bugs worked out of it to the
> > point where people can use it without losing sleep (barring the SUJ
> > stuff, don't get me started).
> 
> That's subjective.
> 
> > I have the same concerns over other
> > things, like ext2fs and fusefs for that matter -- but this thread is
> > about a ZFS-related crash, and that's why I'm "over-focused" on it.
> 
> I have an impression that you seem to state your (negative) opinion of ZFS in
> every other thread about ZFS problems.

The OP in question ended his post with the line "Thoughts?", and I have
given those thoughts.  My thoughts/opinions/experience may differ from
that of others.  Diversity of thoughts/opinions/experiences is good.
I'm not some kind of "authoritative ZFS guru" -- far from it.  If I
misunderstood what "Thoughts?" meant/implied, then draw and quarter me
for it; my actions/words = my responsibility.

I do not feel I have a "negative opinion" of ZFS.  I still use it today
on FreeBSD, donated money to Pawel when the project was originally
announced (because I wanted to see something new and useful thrive on
FreeBSD), and try my best to assist with issues pertaining to it where
applicable.  These are not the actions of someone with a negative
opinion, these are the actions of someone who is supportive while
simultaneously very cautious.

Is ZFS better today than it was when it was introduced?  By a long shot.
For example, on my stable/9 system here I don't tune /boot/loader.conf
any longer.  But that doesn't change my viewpoint when it comes to using
ZFS exclusively on a FreeBSD box.

> > A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
> > results in a system where an admin can upgrade + boot into single-user
> > and perform some tasks to test/troubleshoot; if the ZFS layer is
> > broken, it doesn't mean an essentially useless box.  That isn't FUD,
> > that's just the stage we're at right now.  I'm aware lots of people have
> > working ZFS-exclusive setups; like I said, "works great until it
> > doesn't".
> 
> Yeah, a heterogeneous setup can have its benefits, but it can have its drawbacks
> too.  This is true for heterogeneous vs monoculture in general.
> But the sword cuts both ways: what if something is broken in "UFS layer" or god
> forbid in VFS layer and you have only UFS?
> Besides, without mentioning specific classes of problems "ZFS layer is broken"
> is too vague.

The likelihood of something being broken in UFS is significantly lower
given its established history.  I have to go off of experience, both
personal and professional -- in my years of dealing with FreeBSD
(1997-present), I have only encountered issues with UFS a few times (I
can count them on one, maybe two hands), and I'm choosing to exclude
SU+J from the picture for what should be obvious reasons.  With ZFS,
well... just look at the mailing lists and PR count.  I don't want to be
a jerk about it, but you really have to look at the quantity.  It
doesn't mean ZFS is crap, it just means that for me, I don't think
we're quite "there" yet.

And I will gladly admit -- because you are the one who taught me this --
that every incident needs to be treated as unique.  But one can't deny that a
substantial percentage (I would say majority) of -fs and -stable posts
relate somehow to ZFS; I'm often thrilled when it turns out to be
something else.

Playing a strange devil's advocate, let me give you an interesting
example: softupdates.  When SU was introduced to FreeBSD back in the
late 90s, there were issues and concerns -- lots.  As such, SU was
chosen to be disabled by default on root filesystems given the
importance of that filesystem (re: "we do not want to risk losing as
much data in the case of a crash" -- see the official FAQ, section 8.3).
All other filesystems defaulted to SU enabled.  It's been like that up
until 9.x where it now defaults to enabled.  So that's what, 15 years?

You could say that my example could also apply to ZFS, i.e. the reports
are a part of its growth and maturity, and I'd agree.  But I don't feel
it's reached the point where I'm willing to risk going ZFS-only.  Down
the road, sure, but not now.  That's just my take on it.

Please make sure to also consider, politely, that a lot of people who
have issues with ZFS have not been subscribed to the lists for long
periods of time.  They sign up/post when they have a problem.  Meaning:
they do not necessarily know of the history.  If they did, I (again
politely) believe they'd be more likely to use a UFS+ZFS mix, or maybe a
gmirror+UFS+ZFS mix (though the GPT/gmirror thing is... never mind...).

> > So, how do you kernel guys debug a problem in this environment:
> > 
> > - ZFS-only
> > - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt
> >   with added debugging features, etc.)
> > - No swap configured
> > - No serial console
> 
> I use boot environments and boot to a previous / known-good environment if I hit
> a loader bug, a kernel bug or a major userland problem in a new environment.
> I also use a mirrored setup and keep two copies of earlier boot chains.
> I am also not shy of live media in the case everything else fails.
>
> Now I wonder how you deal with the same kind of UFS-only environment.

The very few times I have had to deal with a system with "filesystem
oddities" with UFS, the disk was removed from the system and put into a
separate system (running the same kernel/world bits) which was then
booted into single-user and things manually dealt with.  The points were
that the other system 1) was dedicated to this task, 2) had swap set up,
and 3) had serial console set up.  That system could be rebuilt (from
source) to include kernel adjustments/etc. if further debugging data was
needed (kernel compile-time features, mainly).

All of these could apply to ZFS too, obviously.

But in the OP's case, the situation sounds dire given the limitations --
limitations that someone (apparently not him) chose, which greatly
hinder debugging/troubleshooting.  Had a heterogeneous setup been
chosen, the debugging/troubleshooting pains would be less (IMO).  When I
see this, it makes me step back and ponder the decisions that led to the
ZFS-only setup.

I work under the model that ZFS is young and therefore will break/cause
chaos for me in some way.  It's a safety net stemming from actual
experiences, in addition to what I see on the lists.  I operate under
the same premise when it comes to things like HAMMER on DragonFly BSD
and Btrfs on Linux.  I do not operate this way when it comes to UFS,
just like I do not operate this way when it comes to ext2/ext3 on Linux.

I choose to use UFS for root/var/tmp/usr and ZFS for "other stuff"
because it gives me debugging assistance without having to boot
alternate media, play around with ISO/memstick images, set up a PXE boot
environment, worry about bootloaders, or other whatnots.  I just boot
the system in single-user and go from there.

What about the fact that you do work on ZFS and have familiarity with
its code?  Would you say your familiarity makes you more comfortable
with a ZFS-only setup than others who do not have this familiarity?

So with regards to "spreading FUD":

- Fear: I'm not afraid of ZFS, I am simply not willing to accept the
  present-day risks given the alternatives that have been solid for
  me historically and given my skill set,
- Uncertainty: true, I am always uncertain of youthful filesystems,
- Doubt: I have no doubts regarding ZFS and its capabilities, potential,
  usefulness (see above, re: my experience), nor the fact it can (in
  the binary (yes/no) sense) be used for a root filesystem and/or other
  critical filesystems.

"Spreading FUD" to me conjures the image of someone running around
trying to make people dislike or become afraid of something (I consider
this a form of trolling) -- the polar and extreme opposite of advocacy.
Such is not my intent, nor has it ever been.  While I do have "problems"
with FreeBSD (as a whole, the direction it's going, etc.), and it would
be silly to deny that this influences the tone I use in my mails, it is
something quite separate and I would rather not go into that.

My intent is to make people think about their setup decisions given what
they've now experienced, and (hopefully) to get indirect answers as to
why they chose the path they did (not quite relevant in this case, since
the OP was not the one who deployed this setup).

If you feel that's FUD, one might say *that's* subjective, and
understandably so -- and I respect that.

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |



More information about the freebsd-stable mailing list