kern/87255: Large malloc-backed mfs crashes the system

Wed Jul 5 11:20:29 UTC 2006

The following reply was made to PR kern/87255; it has been noted by GNATS.

From: Robert Watson <rwatson at FreeBSD.org>
To: Yar Tikhiy <yar at comp.chem.msu.su>
Cc: freebsd-bugs at FreeBSD.org, bug-followup at FreeBSD.org
Subject: Re: kern/87255: Large malloc-backed mfs crashes the system
Date: Wed, 5 Jul 2006 12:16:11 +0100 (BST)

 On Wed, 26 Oct 2005, Yar Tikhiy wrote:

 > > In all cases it is a "don't do that then" class of problem.
 >
 > Yes, of course.  The question is whether we consider it normal for root to 
 > have the ability to panic the system using standard tools.  "cat /dev/zero > 
 > /dev/mem" is still the ultimate way to do so.  IMHO the key issue is whether 
 > we remain at the academic/research stage, where rough edges are OK and the 
 > system is just a toy for eggheads, or claim that our system is stable and 
 > robust.  I doubt that an admin can crash the Windows NT kernel from 
 > userland using conventional interfaces.  I by no means expect this issue to 
 > be resolved soon, but it's worth reflecting on at tea-time :-)
 >
 > Apropos, here's another reproducible crash induced by md:
 >
 > 	# mdconfig -a -t malloc -s 300m
 > 	md0
 > 	# dd if=/dev/urandom of=/dev/md0 bs=1
 > 	dd: /dev/md0: Input/output error
 > 	79+0 records in
 > 	78+9 records out
 > 	# reboot
 > 	panic: kmem_malloc(4096): kmem_map too small: 86224896 total allocated
 >
 > Apparently, it is not md's fault; our kernel memory allocator simply allows 
 > other kernel parts to starve it to death.

 I'm not sure I entirely go along with this interpretation.  The answer to the 
 question "What to do when the kernel runs out of address space?" is not easily 
 found.  The "problem" is that md performs potentially unbounded allocation of 
 a quite bounded resource -- remember that resource deadlocks are very real: 
 sometimes it takes memory to release memory (abstractly, think of memory 
 allocation as locking).  UMA supports allocator-enforced resource limits, 
 which can be requested by the consumer using uma_zone_set_max().  md(4) should 
 probably be using that interface and requesting a resource limit.

 There is also a problem then regarding what happens when md(4) runs out of 
 resources to allocate when it has already "promised" that it's a disk of a 
 certain size up the stack.  I.e., if the result isn't a panic, then how will 
 md(4) handle failure?  Most file systems will not be happy when they get EIO, 
 so then perhaps the problem is that md(4) provides an abstraction for a 
 non-sparse device up the storage stack, but is in fact over-committing.  This 
 suggests that the size of a malloc-backed md device should be strictly 
 bounded.  Picking that maximum bound is also tricky.  This is why, in 
 practice, we recommend using swap-backed md devices, so that the pages 
 associated with the md device can be swapped out under memory pressure, and 
 configuring enough swap space to fully back the md device.

 Robert N M Watson
 Computer Laboratory
 University of Cambridge