NFS-exported ZFS instability

Wed Jan 30 08:31:48 UTC 2013

on 30/01/2013 01:06 Rick Macklem said the following:
> Andriy Gapon wrote:
>> on 29/01/2013 23:44 Hiroki Sato said the following:
>>>   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
>>>   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
>>
>> I recognize here a ZFS ARC deadlock that should have been prevented by
>> r241773
>> and its MFCs (r242858 for 9, r242859 for 8).
>>
> Unfortunately, pool-20130130-info.txt shows a kernel built from r244417,
> unless I somehow misread it.

You are right.  I misdiagnosed the problem: it is not the same deadlock but a
slightly different one.  It has almost the same cause, but r241773 did not
handle this situation.

Basically:
- a thread goes into ARC, acquires some ARC lock and then calls malloc(M_WAITOK)
- there is a page shortage, so the thread ends up in VM_WAIT() waiting on the pagedaemon
- the pagedaemon synchronously invokes the lowmem hook
- the ARC hook sleeps waiting for the ARC reclaim thread to make a pass
- the ARC reclaim thread is blocked on the ARC lock held by the original thread
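
The chain above can be viewed as a cycle in a wait-for graph.  The following
is only a toy userland model of that reasoning, not kernel code: the actor
names, the waits_for table and in_wait_cycle() are made up for illustration,
and "nfsd" stands in for the original thread (tid 100639 in the dump).

```c
#include <assert.h>

/* Actors in the toy model; these are labels for this sketch only. */
enum actor { NFSD, PAGEDAEMON, ARC_RECLAIM, NACTORS };
#define NOBODY (-1)

/* waits_for[a] is the actor that a is blocked on, or NOBODY. */
static const int waits_for[NACTORS] = {
    [NFSD]        = PAGEDAEMON,  /* holds an ARC lock, sleeps in VM_WAIT() */
    [PAGEDAEMON]  = ARC_RECLAIM, /* lowmem hook sleeps until a reclaim pass */
    [ARC_RECLAIM] = NFSD,        /* needs the ARC lock held by the first thread */
};

/*
 * Follow wait-for edges starting at 'start'.
 * Returns 1 if the chain leads back to 'start' (a deadlock cycle),
 * 0 if it ends at an actor that is not waiting on anybody.
 */
int in_wait_cycle(const int *graph, int n, int start)
{
    assert(start >= 0 && start < n);

    int cur = graph[start];
    for (int steps = 0; steps < n && cur != NOBODY; steps++) {
        if (cur == start)
            return 1;
        cur = graph[cur];
    }
    return 0;
}
```

With the table above, in_wait_cycle(waits_for, NACTORS, NFSD) reports a cycle.
Removing any single edge, for example making the pagedaemon entry NOBODY,
breaks it.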

My conclusion: the ARC lowmem hook should never wait on the ARC reclaim thread,
at least as long as the ARC code calls malloc(M_WAITOK) while holding locks.
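
To illustrate the idea of a hook that requests a reclaim pass but does not
wait for it, here is a minimal userland sketch using pthreads.  It is not the
actual ARC code and not a proposed patch: struct reclaim_ctl,
RECLAIM_CTL_INIT and lowmem_hook_nowait() are hypothetical names standing in
for the reclaim thread's lock, condition variable and "need free" flag.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-ins for the reclaim thread's lock, cv and flag. */
struct reclaim_ctl {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             needfree; /* set by the hook, cleared by the reclaim thread */
};

#define RECLAIM_CTL_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/*
 * Low-memory hook that only requests a reclaim pass.
 * It sets the flag, wakes the reclaim thread and returns.  It never sleeps
 * until needfree is cleared, so the caller cannot end up blocked behind a
 * reclaim thread that is itself stuck on another lock.
 */
void lowmem_hook_nowait(struct reclaim_ctl *rc)
{
    assert(rc != NULL);

    pthread_mutex_lock(&rc->lock);
    rc->needfree = 1;
    pthread_cond_signal(&rc->cv);
    pthread_mutex_unlock(&rc->lock);
}
```

The trade-off in such a scheme is that the caller gets no guarantee that any
memory has been freed by the time the hook returns.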

Perhaps the root cause here is that we treat both KM_PUSHPAGE and KM_SLEEP as
M_WAITOK.  We do not seem to have an equivalent of KM_PUSHPAGE.
Perhaps a resurrected M_USE_RESERVE could serve this role?

Quote:
A small pool of reserved memory is available to allow the system to progress
toward the goal of freeing additional memory while in a low memory situation.
The KM_PUSHPAGE flag enables use of this reserved memory pool on an allocation.
This flag can be used by drivers that implement strategy(9E) on memory
allocations associated with a single I/O operation. The driver guarantees that
the I/O operation will complete (or timeout) and, on completion, that the memory
will be returned. The KM_PUSHPAGE flag should be used only in kmem_cache_alloc()
calls. All allocations from a given cache should be consistent in their use of
the flag. A driver that adheres to these restrictions can guarantee progress in
a low memory situation without resorting to complex private allocation and
queuing schemes. If KM_PUSHPAGE is specified, KM_SLEEP can also be used without
causing deadlock.

But please note that the Solaris API allows KM_PUSHPAGE to be used together
with KM_SLEEP; I am not sure what goes on under the hood in that case.

>> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and tid
>> 100639
>> (nfsd in kmem_back).
>>
>> --
>> Andriy Gapon

-- 
Andriy Gapon