sbrk(2) broken

Robert Watson rwatson at FreeBSD.org
Fri Jan 4 03:38:24 PST 2008


On Fri, 4 Jan 2008, Igor Mozolevsky wrote:

> On 04/01/2008, Robert Watson <rwatson at freebsd.org> wrote:
>> On Fri, 4 Jan 2008, Igor Mozolevsky wrote:
>>
>>>> Of course, if you're afraid of memory overcommit and you know in advance
>>>> how much memory you need, you can simply allocate a sufficient amount of 
>>>> address space at startup and touch it all.  This way, you will either be 
>>>> killed right away, or be guaranteed to have sufficient memory for the 
>>>> rest of your (process) lifetime.  Alternatively, do what Varnish does: 
>>>> create a large file, mmap it, and allocate everything you need from that 
>>>> area, so you have your own private swap space.  Just make sure to 
>>>> actually allocate the disk space you need (by filling the file with 
>>>> zeroes, or at the minimum writing a zero to the file every sb.st_blksize 
>>>> bytes, preferably sequentially to avoid excessive fragmentation).
>>>
>>> Surely you can just fseek() on the file at the correct length?
>>
>> That will create a sparse file without file system blocks to back it, and 
>> is effectively also over-commit.  When the file system runs out of room, 
>> you will get SIGSEGV when the vnode pager discovers it can't write a page 
>> to disk.  If you zero-fill it, the blocks are pre-allocated.
>
> Surely you should not be allowed to overcommit on fseek() followed by 
> write(,,1); zeroing out gigs of hdd space seems rather silly...

Sparse files are a feature.  It just becomes inconvenient at that point, 
because you discover the lack of space asynchronously, rather than at a point 
where the user process can usefully handle it.  When memory pressure gets 
high, the vnode pager decides it's time to push a dirty page to disk, and only 
then discovers that there are no free blocks on the file system to write to.  
As I mentioned in my e-mail, it would be nice if our file system supported a 
way to reserve blocks for a file without hooking them up to the file's visible 
address space (in order to avoid zeroing them, which is required if you do 
want to hook them up for an unprivileged process).  However, that feature 
doesn't currently exist.

Many systems that are sensitive to on-demand allocation costs, and that lack 
security requirements, allow files to be extended without zeroing.  On systems 
with security requirements, this becomes a privileged operation (as on 
Mac OS X), because exposing unzeroed pages from other files or processes that 
were not explicitly shared is Not Allowed.

Robert N M Watson
Computer Laboratory
University of Cambridge

