When will ZFS become stable?
Robert Watson
rwatson at FreeBSD.org
Tue Jan 8 01:20:00 PST 2008
On Sun, 6 Jan 2008, Ivan Voras wrote:
> Robert Watson wrote:
>
>> Actually, with mbuma, this has changed -- mbufs are now allocated from the
>> general kernel map. Pipe buffer memory and a few other things are still
>> allocated from separate maps, however. In fact, this was one of the known
>> issues with the introduction of large cluster sizes without resource
>> limits: address space and memory use were potentially unbounded, so Randall
>> recently properly implemented the resource limits on mbuf clusters of large
>> sizes.
>
> Is this related to reported panics with ZFS and a heavy network load (NFS
> mostly)?
Handling resource exhaustion is a tricky issue, because sometimes it takes
resources to make resources available. In the presence of a really greedy
(that is to say, effectively leaking) subsystem, there isn't really any way to
recover. There are really two alternatives: deadlock (no resources are
available, so no progress can be made) or panic (no resources are available so
do the only thing we can). Subsystems are relied upon to impose their own
limits, or at least provide those limits to UMA so that UMA can impose them,
as "appropriate" limits are entirely dependent on context. It's indeed the
case that the more load the system is under, the more resources are in use,
and therefore the lower the threshold for any particular subsystem to
contribute to a potential exhaustion of resources. If the network is at a very
high watermark, then ZFS indeed needs to use less memory to exhaust what remains.
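
As a rough illustration of why per-subsystem limits matter, here is a toy
Python model, not kernel code. The names Pool, Zone and run are hypothetical
and do not correspond to the UMA API; the sketch only assumes a fixed shared
pool and an optional cap supplied by each subsystem.

```python
# Toy model only: Pool, Zone and run are hypothetical names, not the UMA API.

class Pool:
    """A fixed amount of memory shared by every zone."""

    def __init__(self, total):
        self.free = total


class Zone:
    """A per-subsystem allocator that draws from a shared pool."""

    def __init__(self, pool, name, limit=None):
        self.pool = pool
        self.name = name
        self.limit = limit  # None means the subsystem supplied no limit
        self.used = 0

    def alloc(self, n=1):
        # Refuse if the subsystem's own limit would be exceeded.
        if self.limit is not None and self.used + n > self.limit:
            return False
        # Refuse if the shared pool cannot satisfy the request.
        if self.pool.free < n:
            return False
        self.pool.free -= n
        self.used += n
        return True


def run(greedy_limit):
    """Let a greedy zone allocate until refused, then let the network try."""
    pool = Pool(total=100)
    greedy = Zone(pool, "greedy", limit=greedy_limit)
    network = Zone(pool, "network", limit=40)
    while greedy.alloc():
        pass
    network_ok = network.alloc(10)
    return network_ok, greedy.used, pool.free
```

With no limit on the greedy zone, the network's later request is refused
because the pool is already empty; with a limit of 60, the greedy zone is the
one that gets refused and the network can still make progress.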
Normally, subsystems like the network stack and file systems rely on "back
pressure" to cause them to release memory -- the network stack largely
allocates using UMA, so the VM low memory event frees up its caches, and it
also implements its own per-protocol low memory handlers, doing things like
discarding TCP reassembly buffers, etc. VM also knows to discard un-dirtied
pages. Pawel has a patch to make ZFS more aggressively call low memory event
handlers when it gets a bit too greedy, which I saw in the re@ MFC queue
yesterday; you might find this improves behavior a bit more. However,
things do get quite tricky when you're low on resources, because waiting
indefinitely for resources rather than panicking may actually be worse:
the system may never recover. That's why constraining initial resource use
and responding to back pressure early are critical, in order to avoid getting
into situations where the only possible response is to hang or panic.
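
The back pressure idea can be sketched with another toy Python model. This is
an illustration under stated assumptions, not FreeBSD's low memory event or VM
interfaces: LowMemNotifier, Cache and demo are hypothetical names, and the
handler simply drops everything it has cached.

```python
# Toy model only: LowMemNotifier, Cache and demo are hypothetical names,
# not FreeBSD's event handler or VM interfaces.

class LowMemNotifier:
    """Shared memory that calls registered handlers when it runs low."""

    def __init__(self, total, low_watermark):
        self.free = total
        self.low_watermark = low_watermark
        self.handlers = []

    def register(self, handler):
        self.handlers.append(handler)

    def alloc(self, n):
        # Apply back pressure early: ask caches to release memory as soon
        # as this request would push free memory below the watermark.
        if self.free - n < self.low_watermark:
            for handler in self.handlers:
                self.free += handler()
        if self.free < n:
            return False
        self.free -= n
        return True


class Cache:
    """A subsystem cache that can be discarded under memory pressure."""

    def __init__(self, mem, responds_to_pressure=True):
        self.mem = mem
        self.cached = 0
        if responds_to_pressure:
            mem.register(self.on_low_memory)

    def fill(self, n):
        if self.mem.alloc(n):
            self.cached += n
            return True
        return False

    def on_low_memory(self):
        # Discard everything and report how much was released.
        released = self.cached
        self.cached = 0
        return released


def demo(responds_to_pressure):
    mem = LowMemNotifier(total=100, low_watermark=20)
    cache = Cache(mem, responds_to_pressure)
    cache.fill(90)
    ok = mem.alloc(30)  # a later request from some other subsystem
    return ok, cache.cached, mem.free
```

When the cache registers a handler, the later request succeeds because the
cache gives its memory back; when it does not, the request fails and the
caller is left with the hang-or-panic choice described above.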
There's an interesting paper by Gibson, et al, from CMU on economic models for
"investing" memory pages in different sorts of cache -- prefetch, read-ahead,
buffer cache, etc. It is a good read for getting a grasp of just how tricky
the balance is to find.
Robert N M Watson
Computer Laboratory
University of Cambridge
More information about the freebsd-current
mailing list