When will ZFS become stable?

Kris Kennaway kris at FreeBSD.org
Sun Jan 6 08:03:49 PST 2008


Henri Hennebert wrote:
> Kris Kennaway wrote:
>> Ivan Voras wrote:
>>> On 06/01/2008, Peter Schuller <peter.schuller at infidyne.com> wrote:
>>>>> This number is not so large. It seems to be easily crashed by rsync,
>>>>> for example (speaking from my own experience, and also some of my
>>>>> colleagues).
>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files / large
>>>> files) on ZFS without any stability issues. Problems for me have been
>>>> limited to 32bit and the memory exhaustion issue rather than "hard"
>>>> issues.
>>>
>>> It's not generally true since kmem problems with rsync are often hard
>>> to repeat - I have them on one machine, but not on another, similar
>>> machine. This nonrepeatability is also a part of the problem.
>>>
>>>> But perhaps that's all you are referring to.
>>>
>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>> but only once.
>>
>> kmem problems are just tuning.  They are not indicative of stability 
>> problems in ZFS.  Please report any further non-kmem panics you 
>> experience.
> 
> I have hit a deadlock twice during high I/O activity (the last one
> during rsync + rm -r on a 5GB hierarchy, openoffice-2/work).
> 
> I was running with this patch:
> http://people.freebsd.org/~pjd/patches/zgd_done.patch
> 
> db> show allpcpu
> Current CPU: 1
> 
> cpuid        = 0
> curthread    = 0xa5ebe440: pid 3422 "txg_thread_enter"
> curpcb       = 0xeb175d90
> fpcurthread  = none
> idlethread   = 0xa5529aa0: pid 12 "idle: cpu0"
> APIC ID      = 0
> currentldt   = 0x50
> 
> cpuid        = 1
> curthread    = 0xa56ab220: pid 47 "arc_reclaim_thread"
> curpcb       = 0xe6837d90
> fpcurthread  = none
> idlethread   = 0xa5529880: pid 11 "idle: cpu1"
> APIC ID      = 1
> currentldt   = 0x50
> 
> Both times, arc_reclaim_thread was `running`.

Backtraces of the affected processes (or just alltrace) are usually
required to proceed with debugging, and lock status is also often vital
(show alllocks, which requires a kernel built with WITNESS).  Also, when
threads are actually running (not deadlocked), it is often useful to
repeatedly break/continue and sample many backtraces to try to determine
where the threads are looping.
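
As a rough sketch of the sampling idea above (this is not an existing
FreeBSD tool): once several backtraces of the same thread have been
captured, the innermost function that shows up in every sample is a
reasonable first guess at where the loop lives.  The frame-line pattern,
the helper names, and the sample traces below are all assumptions made up
for illustration; real DDB output differs between architectures and
releases.

```python
import re
from collections import Counter

# Assumed frame format: "func(args) at func+0xOFFSET".
# Adjust the pattern if your DDB prints frames differently.
FRAME_RE = re.compile(r"\bat\s+([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-fA-F]+")


def frames(trace_text):
    """Return the function names in one backtrace, innermost frame first."""
    return FRAME_RE.findall(trace_text)


def tally(samples):
    """Count in how many samples each function appears (once per sample)."""
    counts = Counter()
    for text in samples:
        counts.update(set(frames(text)))
    return counts


def deepest_common(samples):
    """Return the innermost function present in every sample, or None."""
    stacks = [frames(text) for text in samples]
    if not stacks:
        return None
    for name in stacks[0]:
        if all(name in stack for stack in stacks[1:]):
            return name
    return None


# Invented samples with hypothetical function names; real ones would be
# pasted from repeated traces taken across break/continue cycles.
SAMPLES = [
    "example_leaf_a(1,2) at example_leaf_a+0x10\n"
    "example_loop(3) at example_loop+0x44\n"
    "example_thread_main(0) at example_thread_main+0xb8\n",
    "example_leaf_b(7) at example_leaf_b+0x2c\n"
    "example_loop(3) at example_loop+0x51\n"
    "example_thread_main(0) at example_thread_main+0xb8\n",
    "example_loop(3) at example_loop+0x18\n"
    "example_thread_main(0) at example_thread_main+0xb8\n",
]

if __name__ == "__main__":
    for name, count in tally(SAMPLES).most_common():
        print(f"{count:3d}  {name}")
    print("likely loop site:", deepest_common(SAMPLES))
```

With these made-up samples the leaf functions change between samples
while example_loop is always on the stack, so it is the one reported.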

Kris

