kern/87255: Large malloc-backed mfs crashes the system
Robert Watson
rwatson at FreeBSD.org
Wed Jul 5 11:20:29 UTC 2006
The following reply was made to PR kern/87255; it has been noted by GNATS.
From: Robert Watson <rwatson at FreeBSD.org>
To: Yar Tikhiy <yar at comp.chem.msu.su>
Cc: freebsd-bugs at FreeBSD.org, bug-followup at FreeBSD.org
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system
Date: Wed, 5 Jul 2006 12:16:11 +0100 (BST)
On Wed, 26 Oct 2005, Yar Tikhiy wrote:
> > In all cases it is a "don't do that then" class of problem.
>
> Yes, of course. The question is whether we consider it normal for root to
> have ability to panic the system using standard tools. "cat /dev/zero >
> /dev/mem" still is the ultimate way to. IMHO it is a key issue whether we
> fall back at the academical/research stage where rough corners are OK and
> the system is just a toy for eggheads, or we pretend our system is stable
> and robust. I doubt if an admin can crash the Windows NT kernel from the
> userland using conventional interfaces. I by no means expect this issue to
> be resolved soon, but it's worth being reflected on at tea-time :-)
>
> Apropos, here's another reproducible crash induced by md:
>
> # mdconfig -a -t malloc -s 300m
> md0
> # dd if=/dev/urandom of=/dev/md0 bs=1
> dd: /dev/md0: Input/output error
> 79+0 records in
> 78+9 records out
> # reboot
> panic: kmem_malloc(4096): kmem_map too small: 86224896 total allocated
>
> Apparently, it is not a fault of md, just our kernel memory allocator allows
> other kernel parts to starve it to death.
I'm not sure I entirely go along with this interpretation. The answer to the
question "What to do when the kernel runs out of address space?" is not easily
found. The "problem" is that md performs potentially unbounded allocation of
a quite bounded resource -- remember that resource deadlocks are very real;
sometimes it takes memory to release memory (abstractly, think of memory
allocation as locking). UMA supports allocator-enforced resource limits,
which can be requested by the consumer using uma_zone_set_max(). md(4) should
probably be using that interface and requesting a resource limit.
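
As a rough illustration of what an allocator-enforced limit buys, here is a
userland sketch. It is not UMA code and makes no claim about UMA's internals:
struct capped_zone, zone_init(), zone_alloc() and zone_free() are hypothetical
names invented for this example, and the max_items field merely stands in for
the kind of limit a consumer would request with uma_zone_set_max().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userland stand-in for a zone with an item limit. */
struct capped_zone {
	size_t item_size;	/* bytes per item */
	size_t max_items;	/* cap requested by the consumer */
	size_t in_use;		/* items currently allocated */
};

static void
zone_init(struct capped_zone *z, size_t item_size, size_t max_items)
{
	z->item_size = item_size;
	z->max_items = max_items;
	z->in_use = 0;
}

/*
 * Fail the request once the cap is reached, so that one consumer cannot
 * drain the pool that everybody else depends on.
 */
static void *
zone_alloc(struct capped_zone *z)
{
	void *p;

	if (z->in_use >= z->max_items)
		return (NULL);
	p = calloc(1, z->item_size);
	if (p != NULL)
		z->in_use++;
	return (p);
}

/* Return an item to the zone; freeing makes room under the cap again. */
static void
zone_free(struct capped_zone *z, void *p)
{
	if (p == NULL)
		return;
	free(p);
	z->in_use--;
}
```

With a cap of two items, the third zone_alloc() returns NULL until one of the
first two is freed. The failure is reported to the one consumer that hit its
limit, which then has to cope with it, rather than surfacing later as a panic
in an unrelated allocation.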
There is also a problem then regarding what happens when md(4) runs out of
resources to allocate when it has already "promised" that it's a disk of a
certain size up the stack. I.e., if the result isn't a panic, then how will
md(4) handle failure? Most file systems will not be happy when they get EIO,
so then perhaps the problem is that md(4) provides an abstraction for a
non-sparse device up the storage stack, but is in fact over-committing. This
suggests that the size of an md device should be strictly bounded if it
is malloc-backed. Picking that maximum bound is also tricky. This is why, in
practice, we recommend using swap-backed md devices, so that the pages
associated with the md device can be swapped out under memory pressure, and
making sure that the swap system has enough space to fully back the md device.
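
The over-commit problem can be sketched with a toy userland model. Everything
here is hypothetical and written only for illustration (struct toy_md, the
toy_md_*() functions, the "budget" parameter); it assumes sectors are
allocated lazily on first write, which is what the over-committing described
above implies, and it is not the md(4) implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Toy model of a memory-backed disk; not the real md(4) data structures. */
struct toy_md {
	size_t nsectors;		/* size "promised" up the stack */
	size_t budget;			/* sectors we may actually allocate */
	size_t allocated;		/* sectors allocated so far */
	unsigned char **sectors;	/* NULL until first written */
};

/*
 * With strict != 0 the device size is bounded by the budget at creation
 * time. With strict == 0 the device over-commits and may fail later.
 */
static int
toy_md_create(struct toy_md *md, size_t nsectors, size_t budget, int strict)
{
	if (nsectors == 0 || (strict && nsectors > budget))
		return (EINVAL);
	md->sectors = calloc(nsectors, sizeof(*md->sectors));
	if (md->sectors == NULL)
		return (ENOMEM);
	md->nsectors = nsectors;
	md->budget = budget;
	md->allocated = 0;
	return (0);
}

/* Write one full sector, allocating its backing store on first use. */
static int
toy_md_write(struct toy_md *md, size_t sector, const unsigned char *buf)
{
	if (sector >= md->nsectors)
		return (EINVAL);
	if (md->sectors[sector] == NULL) {
		/* This is where the promise made at creation is broken. */
		if (md->allocated >= md->budget)
			return (EIO);
		md->sectors[sector] = malloc(SECTOR_SIZE);
		if (md->sectors[sector] == NULL)
			return (EIO);
		md->allocated++;
	}
	memcpy(md->sectors[sector], buf, SECTOR_SIZE);
	return (0);
}

/* Release every allocated sector and the sector table itself. */
static void
toy_md_destroy(struct toy_md *md)
{
	for (size_t i = 0; i < md->nsectors; i++)
		free(md->sectors[i]);
	free(md->sectors);
	md->sectors = NULL;
}
```

A four-sector device created with a budget of two and strict == 0 accepts
writes to two distinct sectors and then returns EIO on the third, long after
its size was advertised. With strict != 0 the same request is refused up
front with EINVAL, which moves the failure to creation time but leaves open
the question of how to pick the bound.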
Robert N M Watson
Computer Laboratory
University of Cambridge