protecting some processes from out-of-swap killer

Tue Apr 28 13:19:52 UTC 2015

On Apr 28, 2015, at 5:51 AM, Ronald Klop <ronald-lists at klop.ws> wrote:

> On Sat, 25 Apr 2015 13:15:32 +0200, Dmitry Morozovsky <marck at rinet.ru> wrote:
> 
>> On Sat, 25 Apr 2015, Baptiste Daroussin wrote:
>> 
>>> > However, sometimes postgres processes got killed by 'out of swap space'.
>>> > I suppose the source of the problem could be that the VSZ of the postgres processes
>>> > (8-9 GB) is bigger than the configured swap (4 GB).
>>> >
>>> > Is there any way to prevent this, besides reallocating space for swap?
>>> 
>>> protect(1) ?
>> 
>> Of course.  I really do not understand how Google hides the man page from me.
>> 
>> Thanks, and sorry for the noise.
>> 
> 
> 
> Having the OS kill a process is probably not what you want in the first place. If you protect(1) postgres, the OS will simply kill some other process instead, and I hope that process isn't running without a reason either.
> My advice would be to do one of the following:
> - increase your swap space
> - tune postgresql to use less memory
> - limit tmpfs (tmpfs uses swap when RAM is short)
> - tune zfs to use less memory

That is good advice, although I do think protect(1) has its place as a safeguard against unforeseen accidents like the one mentioned above.  I believe it's good to be able to designate certain processes as more valuable than others when you know that to be the case.

A case in point: we have an Omeka instance on a fairly low-resource system.  Omeka uses ImageMagick to generate thumbnails for items added to collections.  In one case, someone added a very large TIFF file, and generating its thumbnail made the thumbnail-generating process consume large amounts of memory and swap.  When swap was exhausted, the OS decided it needed to kill something and settled on Apache (under which Omeka runs), which took Omeka down.  At other times, the out-of-swap killer has chosen to kill the SSH daemon, making it harder to get into the box afterwards and fix the problem.

Being able to protect httpd, or even just sshd, served as a handy safety belt after that happened. :-)

I guess the proper way to address this would be to set limits on Apache or the thumbnail generation so it doesn't go hog wild in the future...
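Paul doesn't say which limiting mechanism he has in mind. As one illustrative sketch, assuming the thumbnail job can be launched from a wrapper script, the Python below caps a child process's virtual address space with setrlimit(RLIMIT_AS) before exec, so a runaway conversion gets allocation failures instead of driving the whole box out of swap. The function name run_with_memory_cap is hypothetical, not something from this thread. Keep in mind that RLIMIT_AS limits virtual size (the VSZ mentioned earlier), which can be considerably larger than the memory a process actually touches, so the cap needs some headroom.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes):
    """Run argv as a child process whose address space is capped at max_bytes.

    Returns the child's exit status. A child that tries to grow past the cap
    gets allocation failures instead of pushing the host into swap exhaustion.
    """

    def apply_limit():
        # Called in the child after fork() and before exec(), so only the
        # child (and anything it spawns) inherits the limit; the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    result = subprocess.run(argv, preexec_fn=apply_limit)
    if result.returncode != 0:
        # Report failures; a capped job that hits its limit usually ends up here.
        print(
            f"{argv[0]} exited with status {result.returncode} "
            f"(address-space cap: {max_bytes} bytes)",
            file=sys.stderr,
        )
    return result.returncode
```

For example, run_with_memory_cap(["convert", "huge.tif", "-thumbnail", "200x200", "thumb.jpg"], 512 * 1024 * 1024) would run a hypothetical ImageMagick invocation (placeholder file names, arbitrary 512 MB cap) so that only the conversion fails if the TIFF is too big. Omeka would still need to cope with the missing thumbnail, but Apache and sshd stay up.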

Cheers,

Paul.