NFS-exported ZFS instability

Rick Macklem rmacklem at uoguelph.ca
Wed Jan 30 15:47:26 UTC 2013


Andriy Gapon wrote:
> on 30/01/2013 01:06 Rick Macklem said the following:
> > Andriy Gapon wrote:
> >> on 29/01/2013 23:44 Hiroki Sato said the following:
> >>>   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
> >>>   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
> >>
> >> I recognize here a ZFS ARC deadlock that should have been prevented
> >> by r241773 and its MFCs (r242858 for 9, r242859 for 8).
> >>
> > Unfortunately, pool-20130130-info.txt shows a kernel built from
> > r244417, unless I somehow misread it.
> 
> You are right. I slightly misdiagnosed the problem - it's not the
> same, but a slightly different problem. So it has "almost the same"
> cause, but r241773 didn't handle this situation.
> 
> Basically:
> - a thread goes into ARC, acquires some ARC lock and then calls
>   malloc(M_WAITOK)
> - there is a page shortage, so the thread ends up in VM_WAIT() waiting
>   on pagedaemon
> - pagedaemon synchronously invokes lowmem hook
> - the ARC hook sleeps waiting on ARC reclaim thread to make a pass
> - ARC reclaim thread is blocked on the ARC lock held by the original
>   thread
> 
> My conclusion: ARC lowmem hook should never wait on ARC reclaim
> thread. At least as long as the ARC code calls malloc(M_WAITOK) while
> holding locks.
> 
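The chain in the list above can be modelled as a wait-for graph with one
outgoing edge per blocked actor. The following is only a minimal
user-space sketch of that model, not kernel code: the enum names,
build_scenario() and find_deadlock() are all made up for illustration,
and the lowmem hook is folded into pagedaemon because it runs
synchronously in that thread.

```c
#include <assert.h>

/* Hypothetical actors taken from the scenario described above. */
enum actor { NFSD_THREAD, PAGEDAEMON, ARC_RECLAIM_THREAD, N_ACTORS };
#define NOBODY (-1)

/*
 * Fill in who each actor is blocked on.
 * hook_waits_for_reclaim selects whether the ARC lowmem hook (run
 * synchronously by pagedaemon) sleeps until the reclaim thread makes a pass.
 */
void build_scenario(int waits_on[N_ACTORS], int hook_waits_for_reclaim)
{
    /* Holds an ARC lock, calls malloc(M_WAITOK), ends up in VM_WAIT(). */
    waits_on[NFSD_THREAD] = PAGEDAEMON;
    /* The lowmem hook either sleeps on the reclaim thread or returns. */
    waits_on[PAGEDAEMON] = hook_waits_for_reclaim ? ARC_RECLAIM_THREAD : NOBODY;
    /* Needs the ARC lock that the original thread still holds. */
    waits_on[ARC_RECLAIM_THREAD] = NFSD_THREAD;
}

/* Return 1 if following the wait-for edges from start leads back to start. */
int find_deadlock(const int waits_on[N_ACTORS], int start)
{
    int cur = start;

    assert(start >= 0 && start < N_ACTORS);
    for (int steps = 0; steps < N_ACTORS; steps++) {
        cur = waits_on[cur];
        if (cur == NOBODY)
            return 0;   /* someone in the chain can still run */
        if (cur == start)
            return 1;   /* the chain closed on itself */
    }
    return 0;
}
```

With hook_waits_for_reclaim set, the chain from the nfsd thread closes on
itself; with it cleared, pagedaemon is not blocked and the cycle is gone,
which matches the conclusion that the hook should never wait on the
reclaim thread.
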
> Perhaps the root cause here is that we treat both KM_PUSHPAGE and
> KM_SLEEP as M_WAITOK. We do not seem to have an equivalent of
> KM_PUSHPAGE? Perhaps resurrected M_USE_RESERVE could serve this role?
> 
Good work figuring this out! Obviously, better folk than I will
have to figure out how to fix this.

Good luck with it, rick
ps: Having some "special" place malloc() can go for critical allocations
    sounds like a good plan to me. Possibly malloc() could follow the
    M_NOWAIT path and then fall back to this area when M_NOWAIT fails to
    allocate?
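
To make that idea concrete, here is a rough user-space sketch, not a
proposed patch. Everything in it is hypothetical: alloc_nowait() merely
stands in for malloc(M_NOWAIT), simulate_shortage fakes a page shortage,
and the reserve pool, its sizes and the function names are invented for
illustration. None of this is existing FreeBSD API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reserve: a few fixed-size slots set aside in advance. */
#define RESERVE_SLOTS     4
#define RESERVE_SLOT_SIZE 256

unsigned char reserve_pool[RESERVE_SLOTS][RESERVE_SLOT_SIZE];
int reserve_used[RESERVE_SLOTS];

/* Test knob standing in for a page shortage: nonzero makes the normal path fail. */
int simulate_shortage;

/* Models a non-sleeping allocation: returns NULL rather than waiting. */
void *alloc_nowait(size_t size)
{
    if (simulate_shortage)
        return NULL;
    return malloc(size);
}

/* Return the reserve slot index that p points at, or -1 if p is not a slot. */
int reserve_slot_of(const void *p)
{
    for (int i = 0; i < RESERVE_SLOTS; i++)
        if (p == (const void *)reserve_pool[i])
            return i;
    return -1;
}

/* Try the non-sleeping path first; fall back to the reserve if it fails. */
void *alloc_critical(size_t size)
{
    void *p = alloc_nowait(size);

    if (p != NULL)
        return p;
    if (size > RESERVE_SLOT_SIZE)
        return NULL;            /* too large for a reserve slot */
    for (int i = 0; i < RESERVE_SLOTS; i++) {
        if (!reserve_used[i]) {
            reserve_used[i] = 1;
            return reserve_pool[i];
        }
    }
    return NULL;                /* reserve exhausted; caller must cope */
}

/* Release memory obtained from alloc_critical(). */
void free_critical(void *p)
{
    int slot = reserve_slot_of(p);

    if (slot < 0) {
        free(p);                /* came from the normal allocator */
        return;
    }
    assert(reserve_used[slot]); /* catch a double free of a slot */
    reserve_used[slot] = 0;
}
```

The point of the sketch is only that the caller never sleeps while
holding a lock: it either gets memory right away, from the normal path or
from the reserve, or gets NULL and has to back off. How large such a
reserve would need to be, and who would be allowed to use it, is left
open here.
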

> Quote:
> A small pool of reserved memory is available to allow the system to
> progress toward the goal of freeing additional memory while in a low
> memory situation. The KM_PUSHPAGE flag enables use of this reserved
> memory pool on an allocation. This flag can be used by drivers that
> implement strategy(9E) on memory allocations associated with a single
> I/O operation. The driver guarantees that the I/O operation will
> complete (or timeout) and, on completion, that the memory will be
> returned. The KM_PUSHPAGE flag should be used only in
> kmem_cache_alloc() calls. All allocations from a given cache should be
> consistent in their use of the flag. A driver that adheres to these
> restrictions can guarantee progress in a low memory situation without
> resorting to complex private allocation and queuing schemes. If
> KM_PUSHPAGE is specified, KM_SLEEP can also be used without causing
> deadlock.
> 
> 
> But please note how the Solaris API allows KM_PUSHPAGE to be used
> with KM_SLEEP; I'm not sure what's going on under the hood in that
> case.
> 
> >> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and
> >> tid 100639 (nfsd in kmem_back).
> >>
> >> --
> >> Andriy Gapon
> 
> 
> --
> Andriy Gapon