ten thousand small processes
Chuck Swiger
cswiger at mac.com
Wed Jun 25 15:57:59 PDT 2003
D. J. Bernstein wrote:
[ ... ]
> Why does the memory manager keep the stack separate from data? Suppose a
> program has 1000 bytes of data+bss. You could organize VM as follows:
>
>             0x7fffac18    0x7fffb000                      0x80000000
> <---- stack |  data+bss   |  text, say 5 pages            |  heap ---->
>
> As long as the stack doesn't chew up more than 3096 bytes and the heap
> isn't used, there's just one page per process.
Remember that the VMM hardware requires page alignment: TEXT should be on pages
marked X (or R+X, if the local architecture requires it), DATA+BSS should be RW,
and I think FreeBSD needs the stack to be RWX. We also need to consider the
kernel's address space -- 32-bit systems generally reserve the top 2GB, or
sometimes less, exclusively for the kernel.
Besides, most programs are probably not built as PIC and could not have their
starting addresses relocated arbitrarily, although it might be interesting
to consider the following process address map:
VM Address        Usage
0x0               PAGEZERO
0x4000            XXX bytes reserved, per the hard limit on process stack size
XXX (+ 0x4000)    TEXT segment
YYY               DATA + BSS
ZZZ               heap
0x80000000        KVA
> As for page tables: Instead of allocating space for a bunch of nearly
> identical page tables, why not overlap page tables, with the changes
> copied on a process switch?
Mach uses copy-on-write for the VM objects (VMOs) associated with the virtual
address space used by processes; these are an abstraction similar to the page
table entries used under classic BSD. From "man vmmap":
The share mode describes whether pages are shared between processes, and
what happens when pages are modified. Private pages (PRV) are pages only
visible to this process. They are allocated as they are written to, and
can be paged out to disk. Copy-on-write (COW) pages are shared by
multiple processes (or shared by a single process in multiple locations).
When the page is modified, the writing process then receives its own copy
of the page. Empty (NUL) sharing implies that the page does not really
exist in physical memory. Aliased (ALI) and shared (SHM) memory is
shared between processes.

The share mode typically describes the general mode controlling the
region. For example, as copy-on-write pages are modified, they become
private to the application. Even with the private pages, the region is
still COW until all pages become private. Once all pages are private,
then the share mode would change to private.

The far left column names the purpose of the memory: text segment, data
segment, allocated via malloc, stack, etc. For regions loaded from
binaries, the far right shows the library loaded into the memory.

Some lines in vmmap's output describe submaps. A submap is a shared set
of virtual memory page descriptions that the operating system can reuse
between multiple processes. The memory between 0x70000000 and
0x80000000, for example, is a submap containing the most common dynamic
libraries. Submaps minimize the operating system's memory usage by
representing the virtual memory regions only once. Submaps can either be
shared by all processes (machine-wide) or local to the process
(process-only). If the contents of a machine-wide submap are changed --
for example, the debugger makes a section of memory for a dylib writable
so it can insert debugging traps -- then the submap becomes local, and
the kernel will allocate memory to store the extra copy.
8-cube# vmmap 252
==== Non-writable regions for process 252
__PAGEZERO 0 [ 4K] ---/--- SM=NUL syslogd
__TEXT 1000 [ 20K] r-x/rwx SM=COW syslogd
__LINKEDIT 7000 [ 4K] r--/rwx SM=COW syslogd
Submap 90000000-9fffffff r--/r-- machine-wide submap
__TEXT 90000000 [ 932K] r-x/r-x SM=COW ...System.B.dylib
__LINKEDIT 900e9000 [ 260K] r--/r-- SM=COW ...System.B.dylib
__TEXT 93a40000 [ 20K] r-x/r-x SM=COW ...Common.A.dylib
__LINKEDIT 93a45000 [ 4K] r--/r-- SM=COW ...Common.A.dylib
Submap a000b000-a3a3ffff r--/r-- process-only submap
Submap a3a41000-afffffff r--/r-- process-only submap
aff80000 [ 512K] r--/r-- SM=SHM
==== Writable regions for process 252
__DATA 6000 [ 4K] rw-/rwx SM=PRV syslogd
MALLOC_USED(DefaultMallocZone_ 8000 [ 20K] rw-/rwx SM=COW
MALLOC_USED(DefaultMallocZone_ d000 [ 4K] rw-/rwx SM=ZER
MALLOC_USED(DefaultMallocZone_ e000 [ 4K] rw-/rwx SM=COW
MALLOC_FREE(DefaultMallocZone_ f000 [ 228K] rw-/rwx SM=ZER
__TEXT 8fe00000 [ 288K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe48000 [ 8K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe4a000 [ 4K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe4b000 [ 4K] rw-/rwx SM=ZER /usr/lib/dyld
__DATA 8fe4c000 [ 12K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 8fe4f000 [ 144K] rw-/rwx SM=ZER /usr/lib/dyld
__LOCK 8fe73000 [ 4K] rw-/rwx SM=NUL /usr/lib/dyld
__LINKEDIT 8fe74000 [ 44K] rw-/rwx SM=COW /usr/lib/dyld
Submap 90000000-9fffffff r--/r-- machine-wide submap
__DATA a0000000 [ 4K] rw-/rw- SM=ZER ...System.B.dylib
__DATA a0001000 [ 4K] rw-/rw- SM=COW ...System.B.dylib
__DATA a0002000 [ 20K] rw-/rw- SM=COW ...System.B.dylib
__DATA a0007000 [ 16K] rw-/rw- SM=PRV ...System.B.dylib
Submap a000b000-a3a3ffff r--/r-- process-only submap
__DATA a3a40000 [ 4K] rw-/rw- SM=COW ...Common.A.dylib
Submap a3a41000-afffffff r--/r-- process-only submap
STACK[0] bff80000 [ 508K] rw-/rwx SM=PRV
bffff000 [ 4K] rw-/rwx SM=PRV
==== Legend
SM=sharing mode:
COW=copy_on_write PRV=private NUL=empty ALI=aliased
SHM=shared ZER=zero_filled S/A=shared_alias
==== Summary for process 252
ReadOnly portion of Libraries: Total=1572KB resident=1444KB(92%) swapped_out_or_unallocated=128KB(8%)
Writable regions: Total=968KB written=40KB(4%) resident=88KB(9%) swapped_out=0KB(0%) unallocated=880KB(91%)
> As for 39 pages of VM, mostly stack: Can the system actually allocate
> 390000 pages of VM?
I believe 390000 4K pages comes to about 1523 MB: if the datasize resource
limit is set high enough and you have the RAM or swap space available, the
answer to your question should be yes.
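The datasize limit in question is visible from inside the process via
getrlimit(2) (the shell's "ulimit -d" reports the same knob). A small sketch,
with datasize_limit as an illustrative name:

```c
#include <sys/resource.h>

/* Report the current soft datasize limit in bytes, the limit that
 * bounds how far the heap may grow.  Returns -2 for "unlimited" and
 * -1 on error. */
static long long datasize_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_DATA, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY ? -2 : (long long)rl.rlim_cur;
}
```

A process that intends to allocate on the order of 1.5 GB would want to check
this (and raise the soft limit toward rlim_max with setrlimit(2) if needed)
before assuming the allocation can succeed.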
> I'm only mildly concerned with the memory-management
> time; what bothers me is the loss of valuable address space. I hope that
> this 128-kilobyte stack carelessness doesn't reflect a general policy of
> dishonest VM allocation (``overcommitment''); I need to be able to
> preallocate memory with proper error detection, so that I can guarantee
> the success of subsequent operations.
Preallocate at compile time, or preallocate at process run time?
> As for malloc()'s careless use of memory: Is it really asking so much
> that a single malloc(1) not be expanded by a factor of 16384?
>
> Here's a really easy way to improve malloc(). Apparently, right now,
> there's no use of the space between the initial brk and the next page
> boundary. Okay: allocate that space in the simplest possible way---
It's easy to write a memory allocator that performs well in one specific case;
writing a general-purpose malloc is significantly more complicated, and
FreeBSD's malloc is tuned for programs much larger than your example.
If you know of a malloc() implementation that does better than FreeBSD's, and
is suitable for SMP systems, let us know.
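For what it's worth, the "simplest possible way" to use the space up to the
next page boundary is a bump allocator over sbrk(2). A toy sketch (bump_alloc
is an illustrative name) -- it hands out the slack before asking the kernel
for more, which is exactly the special case that a general-purpose,
SMP-capable malloc cannot optimize for exclusively:

```c
#include <stdint.h>
#include <unistd.h>

/* Toy bump allocator: consume the bytes between the current break and
 * the next page boundary before growing the break again.  Not
 * thread-safe, no free() -- a sketch of the special case, not a
 * replacement for a general-purpose malloc. */
static void *bump_alloc(size_t n)
{
    static char *cur, *end;

    n = (n + 15) & ~(size_t)15;              /* 16-byte alignment */
    if ((size_t)(end - cur) < n) {
        size_t pg = 4096;
        size_t want = (n + pg - 1) & ~(pg - 1);
        char *p = sbrk((intptr_t)want);
        if (p == (char *)-1)
            return NULL;
        if (cur == NULL || p != end) {       /* first call, or someone
                                                else moved the break */
            cur = p;
            end = p + want;
        } else {
            end += want;                     /* contiguous growth */
        }
    }
    void *ret = cur;
    cur += n;
    return ret;
}
```

With this scheme a single malloc(1) costs 16 bytes of the current page
instead of a fresh page -- at the price of giving up free(), thread safety,
and everything else a real allocator must provide.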
-Chuck