When will ZFS become stable?
Henri Hennebert
hlh at restart.be
Sun Jan 6 08:47:33 PST 2008
Kris Kennaway wrote:
> Henri Hennebert wrote:
>> Kris Kennaway wrote:
>>> Ivan Voras wrote:
>>>> On 06/01/2008, Peter Schuller <peter.schuller at infidyne.com> wrote:
>>>>>> This number is not so large. It seems to be easily crashed by rsync,
>>>>>> for example (speaking from my own experience, and also some of my
>>>>>> colleagues).
>>>>> I can definitely say this is not *generally* true, as I do a lot of
>>>>> rsyncing/rdiff-backup:ing and similar stuff (with many files /
>>>>> large files)
>>>>> on ZFS without any stability issues. Problems for me have been
>>>>> limited to
>>>>> 32bit and the memory exhaustion issue rather than "hard" issues.
>>>>
>>>> It's not generally true since kmem problems with rsync are often hard
>>>> to repeat - I have them on one machine, but not on another, similar
>>>> machine. This nonrepeatability is also a part of the problem.
>>>>
>>>>> But perhaps that's all you are referring to.
>>>>
>>>> Mostly. I did have a ZFS crash with rsync that wasn't kmem related,
>>>> but only once.
>>>
>>> kmem problems are just tuning. They are not indicative of stability
>>> problems in ZFS. Please report any further non-kmem panics you
>>> experience.
>>
>> I have twice encountered a deadlock during high I/O activity (the last
>> one during rsync + rm -r on a 5GB hierarchy, openoffice-2/work).
>>
>> I was running with this patch:
>> http://people.freebsd.org/~pjd/patches/zgd_done.patch
>> db> show allpcpu
>> Current CPU: 1
>>
>> cpuid = 0
>> curthread = 0xa5ebe440: pid 3422 "txg_thread_enter"
>> curpcb = 0xeb175d90
>> fpcurthread = none
>> idlethread = 0xa5529aa0: pid 12 "idle: cpu0"
>> APIC ID = 0
>> currentldt = 0x50
>>
>> cpuid = 1
>> curthread = 0xa56ab220: pid 47 "arc_reclaim_thread"
>> curpcb = 0xe6837d90
>> fpcurthread = none
>> idlethread = 0xa5529880: pid 11 "idle: cpu1"
>> APIC ID = 1
>> currentldt = 0x50
>>
>> Both times, arc_reclaim_thread was `running`.
>
> Backtraces of the affected processes (or just alltrace) are usually
> required to proceed with debugging,

Noted for next time.

> and lock status is also often vital (show alllocks, requires witness).

I am adding it to my kernel config.

> Also, in the case when threads are
> actually running (not deadlocked), then it is often useful to repeatedly
> break/continue and sample many backtraces to try and determine where the
> threads are looping.

I did this after the second deadlock: arc_reclaim_thread was always
there and the second CPU was idle.

Henri
>
> Kris
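
For reference, a minimal sketch of the kernel configuration options that
make the lock reporting discussed above available. This assumes a custom
kernel config file; the thread does not say which options were already
enabled on this machine, so treat the set below as an illustration, not
as the configuration actually used.

```
# Kernel debugger framework and the interactive DDB backend
# (needed for commands such as "show allpcpu" and "alltrace").
options 	KDB
options 	DDB

# Lock order verification; "show alllocks" requires this option.
options 	WITNESS
# Skip spin mutexes to reduce WITNESS overhead (optional).
options 	WITNESS_SKIPSPIN

# Extra run-time consistency checks, commonly enabled alongside WITNESS.
options 	INVARIANTS
options 	INVARIANT_SUPPORT
```

After rebuilding and booting such a kernel, "show alllocks" and
"alltrace" can be run at the db> prompt during the next hang. WITNESS
adds noticeable overhead, so it is usually enabled only while debugging.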