When will ZFS become stable?

Kris Kennaway kris at FreeBSD.org
Sun Jan 6 07:46:12 PST 2008

Maciej Suszko wrote:
> Kris Kennaway wrote:
>> Ivan Voras wrote:
>>> On 06/01/2008, Peter Schuller <peter.schuller at infidyne.com> wrote:
>>>>> This number is not so large. It seems to be easily crashed by
>>>>> rsync, for example (speaking from my own experience, and also
>>>>> some of my colleagues).
>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files /
>>>> large files) on ZFS without any stability issues. Problems for me
>>>> have been limited to 32bit and the memory exhaustion issue rather
>>>> than "hard" issues.
>>> It's not generally true since kmem problems with rsync are often
>>> hard to repeat - I have them on one machine, but not on another,
>>> similar machine. This nonrepeatability is also a part of the
>>> problem.
>>>> But perhaps that's all you are referring to.
>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>> but only once.
>> kmem problems are just tuning.  They are not indicative of stability 
>> problems in ZFS.  Please report any further non-kmem panics you
>> experience.
> I agree that ZFS is pretty stable itself. I use a 32-bit machine with
> 2 GB of RAM and all hang cases are kmem related, but the fact is that
> I haven't found any way of tuning to stop it crashing. When I do some
> rsyncing, especially between different pools, it hangs or reboots,
> mostly on bigger files (e.g. rsyncing the ports tree with distfiles).
> At the moment I patched the kernel with vm_kern.c.2.patch and it just
> stopped crashing, but from time to time the machine looks like it is
> frozen for a second or two, after which it works normally.
> Have you got any similar experience?

That is expected.  That patch makes the system do more work to try and 
reclaim memory when it would previously have panicked from lack of 
memory.  However, the same advice applies as to Ivan: you should try to 
tune the memory parameters better to avoid this last-ditch situation.
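A minimal sketch of what such tuning might look like on a FreeBSD 7.x i386 machine, assuming the usual /boot/loader.conf tunables vm.kmem_size, vm.kmem_size_max and vfs.zfs.arc_max. The sizes below are placeholders for illustration, not tested recommendations: on i386 the ceiling on kmem is set by the kernel's virtual address space (which can be raised with the KVA_PAGES kernel option), so suitable values depend on the machine and the kernel build.

```
# /boot/loader.conf (illustrative values only; adjust for your RAM and KVA)
# Size of the kernel memory map that ZFS allocates from
vm.kmem_size="512M"
# Upper bound on the kmem map
vm.kmem_size_max="512M"
# Cap the ZFS ARC so it leaves headroom in kmem for everything else
vfs.zfs.arc_max="128M"
```

The idea is to keep the ARC well below the kmem limit, so ZFS shrinks its cache before the kernel runs out of kmem rather than panicking.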


P.S. It sounds like you do not have sufficient debugging configured 
either: crashes should produce either a DDB prompt or a coredump so they 
can be studied and understood.
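As a rough sketch of that debugging setup, assume a kernel built with `options KDB` and `options DDB` (so a panic should stop at the DDB prompt) and a swap partition large enough to hold a dump. /etc/rc.conf can then tell the system where to write crash dumps; savecore(8) copies them into dumpdir on the next boot. The values shown are the conventional defaults, used here as an assumption rather than a prescription.

```
# /etc/rc.conf
# Use the configured swap device as the crash dump device
dumpdev="AUTO"
# Directory where savecore(8) writes vmcore files on the next boot
dumpdir="/var/crash"
```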

