vm.swap_reserved toooooo large?
Matthew Dillon
dillon at apollo.backplane.com
Mon Dec 20 23:14:33 UTC 2010
One of the problems with resource management in general is
that it has traditionally been per-process, and due to the
multiplicative effect (e.g. max-descriptors * limit-per-descriptor),
per-process limits cannot be set such that any given user is
prevented from DDOSing the system without making them so low that
normal programs begin to fail for no good reason.
Hence the advent of per-user and other more suitable resource
limits, nominally set via sysctl. Even with these, however,
it is virtually impossible to protect against a user DDOS.
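To put a number on the multiplicative effect: a sketch like the
following (the loop count and buffer size are illustrative, not real
limits) stays comfortably inside a typical per-process descriptor
limit, yet pins descriptor-count times pipe-buffer-size bytes of
kernel memory:

    #include <unistd.h>
    #include <fcntl.h>
    #include <string.h>

    int main(void) {
        char junk[4096];
        memset(junk, 'x', sizeof(junk));
        for (int i = 0; i < 400; i++) {   /* well under typical fd limits */
            int fds[2];
            if (pipe(fds) < 0)
                break;
            fcntl(fds[1], F_SETFL, O_NONBLOCK);
            /* Fill the pipe; each full pipe pins its kernel buffer. */
            while (write(fds[1], junk, sizeof(junk)) > 0)
                ;
            /* Never read, never close: the kernel memory stays pinned. */
        }
        pause();                          /* hold everything */
        return 0;
    }

At (say) 16-64K of kernel buffer per pipe that is tens of megabytes
per process while every individual limit is respected, and a user
allowed hundreds of processes multiplies it again into gigabytes.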
The kernel itself has resource limitations which are fairly easy
to blow out... mbufs are usually the easiest to blow up, followed
by pipe KVM memory. Filesystems can be blown up too by creating
sparse files and mmap()ing them (thus circumventing normal overcommit
limitations).
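For the sparse file case, a minimal sketch (the path and the 64G
size are made up; assumes a 64-bit system and a filesystem that
supports holes):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void) {
        const off_t len = 64LL << 30;     /* 64G of holes, no disk used yet */
        int fd = open("/tmp/sparse", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        /* File-backed MAP_SHARED pages are not charged against the
         * anonymous-memory overcommit accounting, yet dirtying them
         * forces the filesystem to allocate real blocks, page by page. */
        for (off_t i = 0; i < len; i += 4096)
            p[i] = 1;
        return 0;
    }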
Paging by itself, without running the system out of VM, can destroy
a machine's performance and be just as effective a DDOS attack as
resource starvation is.
Virtual memory resources are similarly impacted. Overcommit limiting
features have as many downsides as they have upsides. It's an endless
argument, but I've seen systems blow up with overcommit limits set even
more readily than with no (overcommit) limits set. Theoretically
overcommit limits make the system more manageable, but in actual practice
they only work when the application base is written with such limits
in mind (and most are not). So for a general purpose unix environment,
putting limits on overcommit tends to create headaches. To be sure, in
a turn-key environment overcommit limiting serves a very important
function. In a non-turn-key environment, however, it will likely create
more problems than it will solve.
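The failure mode is easy to picture. Plenty of perfectly reasonable
programs reserve large address ranges they never fully touch, as in
this sketch (the 8G figure is arbitrary; assumes a 64-bit system):

    #include <sys/mman.h>
    #include <stdio.h>

    int main(void) {
        size_t len = (size_t)8 << 30;     /* big reservation, mostly untouched */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            /* Under strict overcommit accounting the reservation is
             * charged in full up front; this is where a program not
             * written with such limits in mind falls over. */
            perror("mmap");
            return 1;
        }
        printf("reserved %zu bytes without touching them\n", len);
        return 0;
    }

With lazy overcommit the reservation costs nothing until pages are
actually dirtied; with strict accounting every such program has to be
sized against backing store it will never use.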
The only way to realistically deal with the mess, if it is important
to you, is to partition the system's real resources and run stuff
inside their own virtualized kernels, each of which does its own
independent resource management and whose I/O on the real system can
be well-controlled as an aggregate.
Alternatively, creating very large swap partitions works very well to
mitigate the more common problems. Swap itself is changing its function.
Swap is no longer just used for real memory overcommit (in fact,
real memory overcommit is quite uncommon these days). It is now also
used for things like tmpfs, temporary virtual disks, meta-data
caching, and so forth. These days the minimum amount of swap I
configure is 32G, and as efficient swap storage (e.g. SSDs) becomes
more cost effective, significantly more: 70G, 110G, etc.
It becomes more a matter of being able to detect and act on the
DDOS/resource issue BEFORE it gets to the point of killing important
processes (definition: whatever is important for the functioning of
that particular machine, user-run or root-run), and less a matter of
hoping the system will do the right thing when the resource limit is
actually reached. Having a lot of swap gives you more time to act.
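Even something as simple as the following sketch run periodically
gives you that warning while there is still time to act (it assumes
FreeBSD's vm.swap_reserved and vm.swap_total sysctls exported as
64-bit byte counts; the 80% threshold is an arbitrary example):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int main(void) {
        long long reserved = 0, total = 0;
        size_t len = sizeof(reserved);
        if (sysctlbyname("vm.swap_reserved", &reserved, &len, NULL, 0) < 0) {
            perror("vm.swap_reserved");
            return 1;
        }
        len = sizeof(total);
        if (sysctlbyname("vm.swap_total", &total, &len, NULL, 0) < 0) {
            perror("vm.swap_total");
            return 1;
        }
        /* Warn well before reservations approach exhaustion. */
        if (total > 0 && reserved > total / 10 * 8)
            printf("WARNING: %lld of %lld swap bytes reserved\n",
                   reserved, total);
        return 0;
    }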
-Matt