sbrk(2) broken

Thu Jan 3 16:26:32 PST 2008

On Thu, 3 Jan 2008, Dag-Erling Smørgrav wrote:

> Jason Evans <jasone at freebsd.org> writes:
>> [sbrk is broken]
>
> The real question is why we would revert perfectly good code (jemalloc) from 
> using a modern interface to using one that has been obsolete for twenty 
> years, and marked as such in the man page for seven years.
>
> If rwatson@ wants malloc() to respect resource limits, he can bloody well 
> fix mmap().  Until he does, the datasize limit is a joke anyway, as anyone 
> can circumvent it by either using mmap() instead of malloc() or setting 
> _malloc_options before calling malloc().

The issue here was that there were a number of reports that out-of-control 
applications were toasting systems that weren't getting toasted under 6.x.  I 
experienced this on my web server, but the ports build cluster has been 
running into it for months.  The symptom is that a single application exhausts 
swap, causing all sorts of things to break (tm), the killing of other large 
processes, etc.  To be clear, in the new world order, instead of getting NULL 
back from malloc(3), SIGKILL is delivered to large processes.

When I e-mailed Jason Evans and Alan Cox about it, I suggested that we 
actually teach malloc(3) to enforce an allocation limit itself by querying a 
limit once at process startup, and then using its own accounting to decide 
when to start failing requests.  As an alternative model that would require 
some more infrastructural changes, I suggested a new mmap() flag that hinted 
to the kernel that the page should count against a swap/anonymous memory 
limit, but that we should avoid more serious changes at the last minute before 
a release.  Alan suggested the model Jason ended up implementing as a 
lower risk way to restore the 6.x resource limits non-disruptively.  As it 
turned out, this proved much more complicated than expected.
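
To make the first suggestion concrete, here is a rough sketch of an allocator 
that reads a limit once and then does its own accounting.  It is not jemalloc 
code and not what was committed: every lm_* name is made up for illustration, 
the choice of RLIMIT_DATA as the limit to query is an assumption, it wraps the 
system malloc(3) rather than replacing it, and it is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>

/* All lm_* names are hypothetical; this is an illustration only. */

static size_t lm_limit;   /* byte budget, captured once */
static size_t lm_used;    /* payload bytes currently handed out */
static int    lm_ready;   /* nonzero once lm_limit is valid */

/* Header that records the request size and keeps the payload aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} lm_header;

/* Query the limit a single time.  RLIMIT_DATA is an assumed choice. */
static void lm_init(void)
{
    struct rlimit rl;

    lm_limit = SIZE_MAX;  /* treat "no limit" as unlimited */
    if (getrlimit(RLIMIT_DATA, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY &&
        (uintmax_t)rl.rlim_cur < SIZE_MAX)
        lm_limit = (size_t)rl.rlim_cur;
    lm_ready = 1;
}

/* Hook to override the budget, e.g. for experiments or tests. */
void lm_set_limit(size_t limit)
{
    lm_limit = limit;
    lm_ready = 1;
}

/* Fail with NULL once the budget would be exceeded. */
void *lm_malloc(size_t size)
{
    lm_header *h;

    if (!lm_ready)
        lm_init();
    if (lm_used > lm_limit || size > lm_limit - lm_used)
        return NULL;      /* over budget: refuse before allocating */
    if (size > SIZE_MAX - sizeof(*h))
        return NULL;      /* header + payload would overflow */
    h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    lm_used += size;
    return h + 1;         /* payload starts right after the header */
}

/* Return the payload bytes to the budget and release the memory. */
void lm_free(void *p)
{
    lm_header *h;

    if (p == NULL)
        return;
    h = (lm_header *)p - 1;
    assert(h->size <= lm_used);
    lm_used -= h->size;
    free(h);
}
```

The point of the sketch is only that the caller sees NULL from the allocator 
at a predictable threshold.  It does nothing about memory obtained by calling 
mmap() directly, which is why the mmap() flag was floated as the alternative.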

The right answer is presumably to introduce a new LIMIT_SWAP, which limits the 
allocation of anonymous memory by processes, and size it to something like 90% 
of swap space by default.  Since that won't be happening before 7.0, I believe 
the consensus is to simply not MFC the changes for 7 and proceed with the 
release.  However, having a resource limit on swap use in order to prevent the 
above scenario is actually quite important: SIGKILL of arbitrary processes is 
not a good way to deal with one run-away process, and the virtual memory size 
limit, while also useful, prevents you from limiting the allocation of swap 
without also limiting memory mapping.  So it wouldn't help, for example, to 
limit swap used by a web cache that memory-mapped its cache files.

Robert N M Watson
Computer Laboratory
University of Cambridge