Re: limiting jail memory use with rctl/racct
- In reply to: Andriy Gapon : "limiting jail memory use with rctl/racct"
Date: Thu, 30 Oct 2025 14:53:13 UTC
On Wed, Sep 24, 2025 at 02:08:11PM +0300, Andriy Gapon wrote:
>
> I wonder if people here use rctl to limit memory utilization for some
> practical purposes and what your experience is.
>
> Recently I had a "bright" idea to limit memory use of Firefox (which, for
> me, tends to consume all memory and swap impacting everything else on the
> system).
> Since Firefox is multi-process now, I decided to use a "null" jail as a
> resource container.
> That is, a jail configured with path=/ mount.nodevfs host=inherit ip4=inherit.
> There is no filesystem or network isolation (so, no security benefits), just
> grouping of related Firefox processes.
>
> The memory limit is set with this rule:
> jail:firefox-cage:memoryuse:deny=8g
>
> I didn't know in advance how the memory limiting would affect Firefox and
> how Firefox would react to it, so I decided to go ahead and experiment.
>
> I want to add that initially I also had a rule to limit swapuse but with it
> enabled, Firefox wouldn't even start. When I removed the rule I observed
> that initially rctl reported some absurdly high and unstable swapuse for the
> jail. Gradually, it went down to some reasonable values. Maybe there is
> some bug in RACCT code about accounting swap.

I think the implementation is bogus. It hooks into swap_reserve_by_cred(),
so really it's limiting the amount of swap-backed virtual memory which can
be allocated to the process. Each swap-backed VM object (typically
corresponding to anonymous memory) has an associated user ID which is
charged for the virtual mappings of that object. This can be seen by
looking at RLIMIT_SWAP, e.g., on my desktop
`procstat rlimitusage $(pgrep firefox) | grep swap` shows the same value for
all processes. racct is hooking in at the wrong place. It also assumes that
calls to swap_reserve_by_cred() and swap_release_by_cred() are balanced
within a single process, which I think is not true.
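To illustrate the balancing concern, here is a toy model. It is hypothetical and is not FreeBSD code: `ToySwapLedger`, `reserve`, `release` and the pids/uid are invented for illustration. One reservation is charged two ways: to the uid that owns the object (roughly what RLIMIT_SWAP tracks) and to whichever process happens to be running when the call is made. If the matching release is issued from a different process than the reservation, the per-uid total balances but the per-process counters do not.

```python
from collections import defaultdict

MiB = 1024 * 1024


class ToySwapLedger:
    """Hypothetical model, not FreeBSD code: one reservation charged two ways."""

    def __init__(self):
        self.by_uid = defaultdict(int)  # charged to the object's owning uid
        self.by_pid = defaultdict(int)  # charged to whichever process made the call

    def reserve(self, pid, uid, nbytes):
        self.by_uid[uid] += nbytes
        self.by_pid[pid] += nbytes

    def release(self, pid, uid, nbytes):
        # The uid comes from the VM object, but the pid is simply the
        # process that happens to issue the release.
        self.by_uid[uid] -= nbytes
        self.by_pid[pid] -= nbytes


def release_from_other_process():
    """Hypothetical scenario: pid 100 reserves, pid 101 releases."""
    ledger = ToySwapLedger()
    uid = 1001
    # pid 100 creates a 64 MiB swap-backed object and is charged for it.
    ledger.reserve(pid=100, uid=uid, nbytes=64 * MiB)
    # The object outlives pid 100; the matching release happens in pid 101.
    ledger.release(pid=101, uid=uid, nbytes=64 * MiB)
    return ledger


ledger = release_from_other_process()
print(dict(ledger.by_uid))  # {1001: 0}
print(dict(ledger.by_pid))  # {100: 67108864, 101: -67108864}
```

The per-uid view ends at zero, while each per-process counter is left 64 MiB off in opposite directions. Any accounting built on per-process sums then depends on what happens to those leftovers.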
> For example:
> $ rctl -h -u jail:firefox-cage: | sort
> coredumpsize=0
> cputime=524
> datasize=276K
> maxproc=23
> memorylocked=0
> memoryuse=8236M
> msgqqueued=0
> msgqsize=0
> nmsgq=0
> nsem=0
> nsemop=0
> nshm=0
> nthr=559
> openfiles=5376
> pcpu=93
> pseudoterminals=0
> readbps=0
> readiops=0
> shmsize=0
> stacksize=8792K
> swapuse=32G
> vmemoryuse=73G
> wallclock=3445
> writebps=288
> writeiops=2
>
> One minute later:
> $ rctl -h -u jail:firefox-cage: | sort
> coredumpsize=0
> cputime=588
> datasize=312K
> maxproc=26
> memorylocked=0
> memoryuse=8249M
> msgqqueued=0
> msgqsize=0
> nmsgq=0
> nsem=0
> nsemop=0
> nshm=0
> nthr=633
> openfiles=5496
> pcpu=80
> pseudoterminals=0
> readbps=0
> readiops=0
> shmsize=0
> stacksize=10M
> swapuse=19G
> vmemoryuse=73G
> wallclock=5140
> writebps=32K
> writeiops=16
>
> So, I had to ditch that rule although I find limiting memoryuse without
> limiting swapuse to be incomplete.
>
> Also, I didn't even consider limiting vmemoryuse because it is very large,
> it is hard to predict and it seems to have little correlation with the
> physical memory use.

I agree, limiting vmemoryuse is not very useful in general. It's just easy
to implement.

> Regarding the experiment, Firefox more or less works, but not without issues.
> When there are a lot of sites open in tabs, especially some "web
> applications" that I have to use and which I know to be memory hogs, Firefox
> starts glitching here and there. Mostly it looks like some broken
> JavaScript.
>
> Another observation is that memoryuse always stays somewhat above the 8 GB
> limit. Sometimes it's just very slightly above, sometimes it's a couple of
> hundred megs (or a few percent) above, e.g., memoryuse=8455M.
>
> And almost all the time I see a vmdaemon thread being active:
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>    16 root        119    -     0B    16K CPU4     4  57.1H  68.98% vmdaemon
>
> And it's always somewhere in this call chain:
> procstat -kk 16
> PID TID COMM TDNAME KSTACK
> 16 100177 vmdaemon - vm_swapout_object_deactivate+0x130
> vm_swapout_map_deactivate_pages+0x1f3 vm_daemon+0x87d fork_exit+0xc7
> fork_trampoline+0xe
>
> My impression is that vm_daemon is trying to inactivate some pages belonging
> to processes in the jail, so that they could get swapped out. But either
> they get reactivated or the pageout code does not see a need to swap them
> out and they remain resident.

I think the RACCT_RSS implementation doesn't work well in general. The
vm_daemon loop periodically (1Hz) scans all processes in the system, and for
each process updates the stored RSS and checks to see if a limit applying to
the process is reached. If so, it picks some pages mapped into the process
and tries to unmap them, so they don't count against the RSS anymore. But
its strategy for picking pages to unmap is totally unrelated to their usage,
i.e., it may unmap frequently accessed pages, in which case they will be
faulted back into the pmap very quickly. In that case, vmdaemon and firefox
will constantly be fighting each other.

> I'd say that this is kind of an unexpected consequence.
> It keeps a CPU core busy and doesn't allow the system to enter power saving states.
>
> To conclude, this has been a useful experiment for me.
> Initially, I had some naive expectations that memory limiting would just
> magically "limit memory". The experiment forced me to think about what it
> actually means to limit memory, how it could be done, what consequences it
> would have and in what cases it could be useful.
>
> If anyone has better suggestions and better experience, please let me know.

I don't have any better suggestions.
It would take a fair bit of work to improve racct such that it's able to limit memory usage the way you want.
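To make the vmdaemon vs. firefox "fight" described above concrete, here is a toy simulation. It is a hypothetical model, not vm_daemon's actual algorithm. The page counts, the `address_order` victim policy (standing in for "unrelated to their usage") and the idealized `cold_first` policy (which cheats by knowing the hot set exactly) are all assumptions. Each scan trims RSS back to the limit, and between scans the workload touches its hot set, refaulting anything that was unmapped.

```python
def simulate(policy, ticks=5, n_pages=120, hot=range(0, 80), limit=100):
    """Toy model (hypothetical). Returns (RSS at each scan, pages unmapped, refaults).

    `limit` is in pages; the workload only needs its 80 hot pages.
    """
    hot = set(hot)
    mapped = set(range(n_pages))  # every page starts resident
    samples, unmapped_total, refaults = [], 0, 0
    for _ in range(ticks):  # one iteration per (1 Hz) scan
        rss = len(mapped)
        samples.append(rss)
        excess = rss - limit
        if excess > 0:
            victims = policy(sorted(mapped), hot)[:excess]
            mapped.difference_update(victims)
            unmapped_total += len(victims)
        # Between scans the workload touches its hot set and faults back
        # any of those pages that were just unmapped.
        for page in hot:
            if page not in mapped:
                mapped.add(page)
                refaults += 1
    return samples, unmapped_total, refaults


def address_order(pages, hot):
    """Usage-unaware: take pages in address order, ignoring the hot set."""
    return pages


def cold_first(pages, hot):
    """Idealized usage-aware policy: evict untouched pages before hot ones."""
    return [p for p in pages if p not in hot] + [p for p in pages if p in hot]


print(simulate(address_order))  # ([120, 120, 120, 120, 120], 100, 100)
print(simulate(cold_first))     # ([120, 100, 100, 100, 100], 20, 0)
```

With the usage-unaware policy every scan unmaps 20 hot pages that are immediately refaulted, so RSS never drops below 120 pages at scan time and the daemon works on every tick. The usage-aware policy converges after one scan. Real behaviour also depends on how deactivated pages are reactivated or swapped out, which this model ignores.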