Maximum Swapsize
Matthew Dillon
dillon at apollo.backplane.com
Tue Apr 11 21:18:28 UTC 2006
From 'man tuning' (I think I wrote this, a long time ago):
You should typically size your swap space to approximately 2x main
memory. If you do not have a lot of RAM, though, you will generally want
a lot more swap. It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition. The kernel's VM paging
algorithms are tuned to perform best when there is at least 2x swap
versus main memory. Configuring too little swap can lead to
inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine. Finally, on larger
systems with multiple SCSI disks
(or multiple IDE disks operating on different controllers), we strongly
recommend that you configure swap on each drive (up to four drives). The
swap partitions on the drives should be approximately the same size. The
kernel can handle arbitrary sizes but internal data structures scale to 4
times the largest swap partition. Keeping the swap partitions near the
same size will allow the kernel to optimally stripe swap space across the
N disks. Do not worry about overdoing it a little, swap space is the
saving grace of UNIX and even if you do not normally use much swap, it
can give you more time to recover from a runaway program before being
forced to reboot.
--
The last sentence is probably the most important. The primary reason why
you want to configure a fairly large amount of swap has less to do with
performance and more to do with giving the system admin a long runway
to have the time to deal with unexpected situations before the machine
blows itself to bits.
The swap subsystem has the following limitation:
    /*
     * If we go beyond this, we get overflows in the radix
     * tree bitmap code.
     */
    if (nblks > 0x40000000 / BLIST_META_RADIX / nswdev) {
        printf("exceeded maximum of %d blocks per swap unit\n",
            0x40000000 / BLIST_META_RADIX / nswdev);
        VOP_CLOSE(vp, FREAD | FWRITE, td);
        return (ENXIO);
    }
By default, BLIST_META_RADIX is 16 and nswdev is 4, so the maximum
number of blocks *PER* swap device is 16 million. If PAGE_SIZE is 4K,
the limitation is 64 GB per swap device and up to 4 swap devices
(256 GB total swap).
The kernel has to allocate memory to track the swap space. This memory
is allocated and managed by kern/subr_blist.c (assuming you haven't
changed things since I wrote it). This is basically implemented as a
flattened radix tree using a fixed radix of 16. The memory overhead is
fixed (based on the amount of swap configured) and comes to
approximately 2 bits per VM page. Performance is approximately O(log N).
Additionally, once pages are actually swapped out, the VM object must
record the swap index for each page. This costs around 4 bytes per
swapped-out page and is probably the greatest limiting factor in the
amount of swap you can actually use. 256GB of 100% used swap (64M
pages of 4K each) would eat 256MB of kernel RAM.
I believe that large linear chunks of reserved swap, such as used by MD,
currently still require the per-page overhead. However, theoretically,
since the reservation model uses a radix tree, it *IS* possible to
reserve huge swaths of linear-addressed swap space with no per-page
storage requirements in the VM object. It is even possible to do away
with the 2 bits per page that the radix tree uses if the radix tree
were allocated dynamically. I decided against doing that because I
did not want the swap subsystem to be reliant on malloc() during
critical low-memory paging situations.
-Matt