[RFC] [PATCH] VM & VFS changes

Don Lewis truckman at FreeBSD.org
Thu Jun 2 11:29:05 GMT 2005


On  2 Jun, Jeremie Le Hen wrote:
> Hi Don, hi all,
> 
>> I just thought of a case where this might not work.  It is possible to
>> create a swap-backed md, use it to back a file system, add a bunch of
>> swap, and then fill the file system, consuming the swap space.  If you
>> do the cleanup processing in reverse order, the first operation would be
>> to remove the swap device, which you might not be able to do because of
>> a lack of RAM and alternate swap space.
> 
> Maybe I am missing something, but at shutdown time, when filesystems
> are about to be unmounted, I think user processes no longer exist.
> Do kernel threads sometimes consume so much memory that they won't
> fit in RAM?

Other things besides user processes can occupy swap space, such as
swap-backed memory disks that can be used for temporary file system
space.  The mdconfig(8) man page gives this example:

     To create and mount a 128MByte swap backed file system on /tmp:

           mdconfig -a -t swap -s 128M -u 10
           newfs -U /dev/md10
           mount /dev/md10 /tmp
           chmod 1777 /tmp

There's even a shortcut for doing this; see mdmfs(8).

If you had 10+ GB of swap configured, you could create a 10 GB /tmp file
system this way.  Then if you write 10 GB of data to /tmp, 10 GB of your
swap will be in use, even after you kill off all the user processes.  You
can only reclaim the swap space by unmounting the file system and
detaching the memory disk with "mdconfig -d".
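
As a rough sketch of that cleanup, assuming the md unit 10 mounted on
/tmp from the man page example: DRY_RUN, run() and release_md() are
hypothetical helpers invented for this sketch, and by default it only
prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch only: release the swap held by the example memory disk above
# (unit 10, mounted on /tmp).  DRY_RUN, run() and release_md() are
# hypothetical helpers for this sketch, not part of mdconfig(8).
: "${DRY_RUN:=1}"    # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

release_md() {
    # $1 = mount point, $2 = md unit number
    # Unmount first; only then can the md be detached and its swap freed.
    run umount "$1" && run mdconfig -d -u "$2"
}

release_md /tmp 10
```

Running it with DRY_RUN=0 as root would execute the two commands for
real; the order matters because the md cannot be detached while the file
system on it is still mounted.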

You don't even need to create and mount the file system; you can consume
the swap space just by configuring the md device and writing to it:
	dd if=/dev/random of=/dev/md10 bs=8k


>> [...]
>> Taking care of the md's first is a good idea for the sake of efficiency,
>> because it eliminates the need to page in any of their contents that
>> have been paged out to swap.  The problem is that if they are used to
>> back file systems, any dependent file systems must be unmounted first,
>> and that might not be possible if one of the dependent file systems
>> contains a swap file.  An example would be using an md to back /tmp,
>> mounting /dev/adXXX on /tmp/foo, and adding /tmp/foo/swapfile as a swap
>> device.  It might not be possible to clean up this arrangement at
>> shutdown without deadlocking.
>> 
>> If a dependency tree is maintained, it should be possible to prevent the
>> troublesome cases from happening.  If swapping to a swap-backed md, or
>> to a vnode-backed md residing in a file system that depends on a file
>> system residing on a swap-backed md, is forbidden, I think that is
>> sufficient to prevent deadlock.
> 
> Does suppressing a warning at shutdown (devfs) really need to
> lead to removing a feature?  That might be overkill.
> Although the setups you describe seem to be of little use, we
> can imagine that, in the future, resource restrictions may be
> applied to jails, and this would allow setting up a vnode-backed swap
> partition for each jail.  I'm not at all sure this is going to
> happen someday, but I just wanted to point out that removing a feature
> now for this purpose could be a problem in the future.

The troublesome cases could prevent disk-backed file systems from being
unmounted on shutdown, causing them to be marked dirty and requiring
them to be fsck'ed at boot time.

Vnode-backed swap isn't a problem unless the file system that holds the
swap file is swap-backed or there is a swap-backed file system between
its mount point and /.
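
To make that rule concrete, here is a toy model of the check in shell.
It is only an illustration, not kernel code: the MOUNTS table, the
"swap-md" label and the swapfile_is_safe() function are all made up for
this sketch, and the table mirrors the /tmp/foo/swapfile arrangement
from earlier in this thread.

```sh
#!/bin/sh
# Hypothetical mount table for this sketch: "mountpoint backing" per line.
# /tmp sits on a swap-backed md; /tmp/foo is a disk mounted beneath it.
MOUNTS='/ disk
/tmp swap-md
/tmp/foo disk'

# Return 0 if a swap file at the given absolute path is safe, 1 if a
# swap-backed file system holds it or lies between it and /, and 2 if
# the path is not absolute.
swapfile_is_safe() {
    path=$1
    case $path in
        /*) ;;
        *) return 2 ;;
    esac
    while :; do
        # Is this directory a mount point in the table, and how is it backed?
        backing=$(printf '%s\n' "$MOUNTS" |
            awk -v m="$path" '$1 == m { print $2 }')
        if [ "$backing" = swap-md ]; then
            return 1
        fi
        if [ "$path" = / ]; then
            return 0
        fi
        path=$(dirname "$path")    # step one level closer to /
    done
}

for f in /tmp/foo/swapfile /usr/swap0; do
    if swapfile_is_safe "$f"; then
        echo "$f: ok"
    else
        echo "$f: refuse, a swap-backed file system lies on the path to /"
    fi
done
```

With this table, /tmp/foo/swapfile is refused because /tmp is
swap-backed, while /usr/swap0 passes because only the disk-backed / is
above it.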

That said, I think a separate vnode-backed swap file per jail would be
very inefficient.  Vnode-backed swap is already slower than
device-backed swap because of the extra translation needed to map the
offset into swap to the actual disk sectors.  As I recall, we also limit
the number of swap devices to a fairly small number because the amount
of kernel memory needed to map between vm pages and swap offsets
increases with the number of swap devices.  A per-jail swap quota would
seem to be easier to implement and would perform a lot better.



