ten thousand small processes

Jeff Roberson jroberson at chesapeake.net
Wed Jun 25 00:06:57 PDT 2003


On 25 Jun 2003, D. J. Bernstein wrote:

> As I said, I don't particularly care about the text segment. I'm not
> talking about ten thousand separate programs.
>
> Why does the memory manager keep the stack separate from data? Suppose a
> program has 1000 bytes of data+bss. You could organize VM as follows:
>
>                    0x7fffac18    0x7fffb000             0x80000000
>      <---- stack   data+bss      text, say 5 pages      heap ---->

This is a layout that is chosen by some 64-bit architectures, Alpha for
example.  The difference is that on Alpha you have a LOT of address space
and so you have many options for placing shared libraries.  On x86 if you
place them roughly in the middle you take space away from heap and stack
equally.

Furthermore, text is typically linked to run relative to address 0.  This
isn't up to the operating system.  This is up to the tool chain and object
format.  In some cases it is up to the ABI.  The other problem with this
arrangement is that it restricts the heap size.   On FreeBSD this would
leave you with 1GB of heap and nearly 2GB of stack.  Perhaps you use your
stack differently than I do but that does not sound so appealing.
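
The arithmetic behind those figures can be sketched in C.  The three
addresses come from the quoted diagram; USER_TOP (a 3GB user address
space) is an assumption made for illustration, and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses taken from the quoted diagram. */
#define DATA_START 0x7fffac18u  /* data+bss begins here; stack grows down from it */
#define TEXT_START 0x7fffb000u  /* text begins here, 5 pages long */
#define HEAP_START 0x80000000u  /* heap begins here and grows up */

/* ASSUMPTION: user address space ends at 3GB.  The exact limit is a
 * kernel constant that this thread does not state. */
#define USER_TOP   0xc0000000u

/* Number of bytes in the half-open range [lo, hi). */
static uint32_t span_bytes(uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    return hi - lo;
}

/* Size of data+bss implied by the diagram. */
static uint32_t data_bss_bytes(void)
{
    return span_bytes(DATA_START, TEXT_START);
}

/* Most the heap could ever grow to under this layout. */
static uint32_t max_heap_bytes(void)
{
    return span_bytes(HEAP_START, USER_TOP);
}

/* Most the stack could ever grow to under this layout. */
static uint32_t max_stack_bytes(void)
{
    return span_bytes(0u, DATA_START);
}
```

Under that assumption data_bss_bytes() gives the 1000 bytes from the
example, max_heap_bytes() is exactly 1GB, and max_stack_bytes() is just
under 2GB.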

>
> As long as the stack doesn't chew up more than 3096 bytes and the heap
> isn't used, there's just one page per process.

Except that the operating system needs a stack too.  That's several pages.
And the uarea adds another page.  And the proc structure, and the vm
space, and the file desc table, and the thread structures now that FreeBSD
is multithreaded.  That's probably another 20KB or so on x86.  The minor
savings in user space are far outweighed by the kernel usage.  Amdahl
would have something to say about that.
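
A rough sketch of that accounting in C.  The uarea page and the 20KB
figure are the estimates given above; treating "several pages" of kernel
stack as 2 pages is purely an assumption for illustration, and the
function names are hypothetical.

```c
#include <assert.h>

#define PAGE_BYTES 4096ULL                    /* x86 page size */

/* ASSUMPTION: "several pages" of kernel stack is taken as 2 here. */
#define KSTACK_PAGES 2ULL
#define UAREA_PAGES  1ULL                     /* the uarea adds another page */
#define OTHER_KERNEL_BYTES (20ULL * 1024ULL)  /* proc, vmspace, fd table, threads */

/* Estimated kernel memory consumed by one process. */
static unsigned long long kernel_bytes_per_process(void)
{
    return (KSTACK_PAGES + UAREA_PAGES) * PAGE_BYTES + OTHER_KERNEL_BYTES;
}

/* Estimated total for nprocs processes that each use user_pages
 * pages of user memory. */
static unsigned long long total_bytes(unsigned long long nprocs,
                                      unsigned long long user_pages)
{
    assert(nprocs > 0);
    return nprocs * (user_pages * PAGE_BYTES + kernel_bytes_per_process());
}
```

With these inputs the kernel side is 32768 bytes per process against
4096 bytes of user memory, so even shrinking the user side to nothing
could not cut the total by more than one ninth.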

Furthermore, the VM treats stack pages and data pages differently.  It
also treats bss pages differently.  Sure you could fit them all in if you
wrote special case code to handle this situation, but how often does it
really occur?  I'm guessing just about never for almost all applications
that FreeBSD is used for.  This is a general purpose operating system that
needs to work for normal cases.

> As for page tables: Instead of allocating space for a bunch of nearly
> identical page tables, why not overlap page tables, with the changes
> copied on a process switch?

They aren't nearly identical.  They point at different pages.  You can't
overlap them unless you have 4MB of aligned mapped pages that are
identical across two processes as is the case with large shared memory
segments.  Again, I think you would do well to read up on MMUs and paging
hardware.

If I gave two processes the same page directory and page tables they would
overwrite each other's memory!
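
For reference, here is a minimal sketch of how a virtual address is
split up, assuming classic 32-bit x86 paging with 4KB pages and no PAE.
The helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT     12       /* 4KB pages: low 12 bits are the offset */
#define TABLE_BITS     10       /* 1024 entries per page table */
#define ENTRIES        1024u
#define PAGE_BYTES     4096u

/* Index into the page directory: the top 10 bits of the address. */
static uint32_t pde_index(uint32_t va)
{
    return va >> (PAGE_SHIFT + TABLE_BITS);
}

/* Index into the page table selected by the directory entry:
 * the middle 10 bits of the address. */
static uint32_t pte_index(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (ENTRIES - 1u);
}

/* Byte offset within the 4KB page. */
static uint32_t page_offset(uint32_t va)
{
    return va & (PAGE_BYTES - 1u);
}

/* Address space covered by one page table: 1024 entries of 4KB each. */
static uint32_t bytes_per_page_table(void)
{
    return ENTRIES * PAGE_BYTES;
}
```

One page table covers 4MB of address space, which is why two processes
can only share a table when an entire aligned 4MB region maps the same
physical pages in both.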

> As for 39 pages of VM, mostly stack: Can the system actually allocate
> 390000 pages of VM? I'm only mildly concerned with the memory-management

There is no special allocation for virtual address space that is
contiguous with another region.  It is simply the upper bound on an
address.  The system can allocate more vm than the system has swap and
physical memory.  The system can allocate more vm than available disk
space if you ask for the right thing in the right number of processes.
390000 is only 1.5 gigs.  You could allocate that many pages in one
process on x86.
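
The conversion is simple enough to check in C, assuming 4KB pages.  The
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space covered by a number of pages. */
static uint64_t pages_to_bytes(uint64_t pages, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return pages * page_bytes;
}

/* 39 pages in each of 10000 processes, as in the quoted question. */
static uint64_t total_pages(void)
{
    return 39u * 10000u;
}
```

pages_to_bytes(total_pages(), 4096) comes to 1597440000 bytes, a little
under 1.5GB.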

> time; what bothers me is the loss of valuable address space. I hope that
> this 128-kilobyte stack carelessness doesn't reflect a general policy of
> dishonest VM allocation (``overcommitment''); I need to be able to
> preallocate memory with proper error detection, so that I can guarantee
> the success of subsequent operations.

You need to look at the situation realistically.  FreeBSD is not being
developed for your mythical one page process.  It's developed for real
applications that use up stack space.  That limit is set so that in the
common case we don't have to do an expensive operation to grow the stack's
map.  Make the common case fast, right?  I don't appreciate your tone
here, especially coming from someone who obviously is not familiar with
VMs.

>
> As for malloc()'s careless use of memory: Is it really asking so much
> that a single malloc(1) not be expanded by a factor of 16384?

Yes, when in the common case that extra allocation will be used later.
The size of the allocation from the back end dramatically impacts the
performance of malloc and the vm system.  It also affects fragmentation.

> Here's a really easy way to improve malloc(). Apparently, right now,
> there's no use of the space between the initial brk and the next page
> boundary. Okay: allocate that space in the simplest possible way---

This is fairly extreme hackery to save a half page of memory on average
and take a branch mispredict the rest of the time.

[code removed]
>
> ---with no waste of space and practically no waste of time. Maybe add


Except for the most important time: the developers'.  This is an absurd
suggestion.

> 8192 to wherewenormallystart; this is lots of room for people who know
> how to write small programs, and the cost is unnoticeable for people who
> don't.

People who know how to write really small programs would know not to use
the standard libc or at least not the standard malloc implementation.  It
is designed for average programs for real systems.

> (Quite a few of my programs simulate this effect by checking for space
> in a bss array, typically 2K. But setting aside the right amount of
> space would mean compiling, inspecting the brk alignment, and
> recompiling. I also feel bad chewing up space on systems where malloc()
> actually knows what it's doing.)

I'm sure your programs are very small.  Our userland malloc is actually
quite good.  We have phk to thank for that.  I'm sure he'd love to hear
your critiques and suggestions.

> As for the safety of writing code that makes malloc() fail horribly:
> After the Solaris treatment of BSD sockets, and the ``look, Ma, I can
> make an only-slightly-broken imitation of poll() using select()!''
> epidemic, I don't trust OS distributors to reserve syscall names for
> actual syscalls. I encounter more than enough portability problems
> without going out of my way to look for them.

The man pages specifically warn against using brk and sbrk yourself if
you're going to use malloc() and free().  You get what you deserve if you
do that.

> ---D. J. Bernstein, Associate Professor, Department of Mathematics,
> Statistics, and Computer Science, University of Illinois at Chicago
>

As I said before, it sounds like your application is better suited for
DOS.  I'm sure you'll find that you have much more control over the
address layout of your system.

Cheers,
Jeff



More information about the freebsd-performance mailing list