When will ZFS become stable?
rwatson at FreeBSD.org
Mon Jan 7 15:39:14 PST 2008
On Tue, 8 Jan 2008, Vadim Goncharov wrote:
>> To make life slightly more complicated, small malloc allocations are
>> actually implemented using uma -- there are a small number of small object
>> size zones reserved for this purpose, and malloc just rounds up to the next
>> such bucket size and allocates from that bucket. For larger sizes,
>> malloc goes through uma, but pretty much directly to VM which makes pages
>> available directly. So when you look at "vmstat -z" output, be aware that
>> some of the zones presented there (named things like "128", "256", etc.)
>> are actually the pools from which malloc allocations come, so there's
>> double-counting.
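The rounding described above can be sketched in a few lines of C. This is an illustrative model only: the bucket sizes below are assumed powers of two for the sketch, not the kernel's actual kmemzones table, and zone_for() is a hypothetical helper, not a real kernel function.

```c
#include <stddef.h>

/*
 * Illustrative model of how malloc(9) picks a UMA zone for a small
 * request: round the size up to the smallest zone that fits.  The
 * zone sizes here are an assumption for the sketch.
 */
static const size_t zone_sizes[] = { 16, 32, 64, 128, 256, 512, 1024, 2048, 4096 };
#define NZONES (sizeof(zone_sizes) / sizeof(zone_sizes[0]))

/*
 * Return the zone size a request would land in, or 0 for "too big",
 * which models the case where malloc bypasses the small-object zones
 * and goes to the VM system for whole pages.
 */
static size_t
zone_for(size_t req)
{
	size_t i;

	for (i = 0; i < NZONES; i++)
		if (req <= zone_sizes[i])
			return (zone_sizes[i]);
	return (0);
}
```

Under this model a 100-byte request is satisfied from the "128" zone, which is why those numerically named zones in "vmstat -z" account for memory that also shows up as malloc usage.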
> Yes, I knew that, but I didn't know what exactly the column names mean.
> Requests/Failures, I guess, are pure statistics, and Size is the size of
> one element, but why is USED + FREE != LIMIT (on those zones where the
> limit is non-zero)?
Possibly we should rename the "FREE" column to "CACHE" -- the free count is
the number of items in the UMA cache. These may be hung in buckets off the
per-CPU cache, or be spare buckets in the zone. Either way, the memory has to
be reclaimed before it can be used for other purposes, and generally for
complex objects, it can be allocated much more quickly than going back to VM
for more memory. LIMIT is an administrative limit that may be configured on
the zone, and is configured for some but not all zones.
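The accounting can be made concrete with a toy model. The struct and function names below are assumptions for the sketch, not the kernel's actual struct uma_zone; the point is only that LIMIT is a ceiling on items ever created, while FREE counts cached items waiting for reuse, so USED + FREE need not equal LIMIT.

```c
#include <stddef.h>

/* Toy model of one zone's "vmstat -z" counters (names are invented). */
struct zone {
	size_t used;	/* items currently handed out (USED) */
	size_t cache;	/* freed items kept in the zone's cache (FREE) */
	size_t limit;	/* administrative ceiling, 0 = no limit (LIMIT) */
};

/*
 * Allocation draws from the cache first; only a cache miss creates a
 * new item, and the limit caps the total number of items in existence.
 * Returns 1 on success, 0 on an allocation failure (FAILURES column).
 */
static int
zone_alloc(struct zone *z)
{
	if (z->cache > 0) {
		z->cache--;
		z->used++;
		return (1);
	}
	if (z->limit != 0 && z->used + z->cache >= z->limit)
		return (0);
	z->used++;
	return (1);
}

/*
 * Freeing returns the item to the cache rather than to the VM system,
 * so the memory stays reserved for this zone until reclaimed.
 */
static void
zone_free(struct zone *z)
{
	z->used--;
	z->cache++;
}
```

In this model a zone with LIMIT 4 that has only ever had two items allocated shows USED + FREE == 2, not 4: the limit is how high the zone may grow, not how much it currently holds.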
I'll let someone with a bit more VM experience follow up with more information
about how the various maps and submaps relate to each other.
>> The concept of kernel memory, as seen above, is a bit of a convoluted
>> concept. Simple memory allocated by the kernel for its internal data
>> structures, such as vnodes, sockets, mbufs, etc, is almost always not
>> something that can be paged, as it may be accessed from contexts where
>> blocking on I/O is not permitted (for example, in interrupt threads or with
>> critical mutexes held). However, other memory in the kernel map may well be
>> pageable, such as kernel thread stacks for sleeping user threads
> We can assume for simplicity that their memory is not-so-kernel but part
> of the process address space :)
If it is mapped into the kernel address space, then it still counts towards
the limit on the map. There are really two critical resources: memory itself,
and address space to map it into. Over time, the balance between address
space and memory changes -- for a long time, 32 bits was the 640k of the UNIX
world, so there was always plenty of address space and not enough memory to
fill it. More recently, physical memory started to overtake address space,
and now with the advent of widely available 64-bit systems, it's swinging in
the other direction. The trick is always in how to tune things, as tuning
parameters designed for "memory is bounded and address space is infinite"
often work less well when that's not the case. For example, in the early 5.x
series we saw a lot of kernel panics because kernel constants scaled with
physical memory rather than address space, so the kernel would run out of
address space.
>> (which can be swapped out under heavy memory load), pipe buffers, and
>> general cached data for the buffer cache / file system, which will be paged
>> out or discarded when memory pressure goes up.
> Umm. I think there is no point in swapping disk cache which can simply be
> discarded, so in practice the main part of kernel memory that is swappable
> is anonymous pipe(2) buffers?
Yes, that's what I meant. There are some other types of pageable kernel
memory, such as memory used for swap-backed md devices.
Robert N M Watson
University of Cambridge