Re: limiting jail memory use with rctl/racct

From: Mark Johnston <markj_at_freebsd.org>
Date: Thu, 30 Oct 2025 14:53:13 UTC

On Wed, Sep 24, 2025 at 02:08:11PM +0300, Andriy Gapon wrote:
> 
> I wonder if people here use rctl to limit memory utilization for some
> practical purposes and what your experience is.
> 
> Recently I had a "bright" idea to limit memory use of Firefox (which, for
> me, tends to consume all memory and swap impacting everything else on the
> system).
> Since Firefox is multi-process now, I decided to use a "null" jail as a
> resource container.
> That is, a jail configured with path=/ mount.nodevfs host=inherit ip4=inherit.
> There is no filesystem or network isolation (so, no security benefits), just
> grouping of related Firefox processes.
> 
> The memory limit is set with this rule:
> jail:firefox-cage:memoryuse:deny=8g
> 
> I didn't know in advance how the memory limiting would affect Firefox and
> how Firefox would react to it, so I decided to go ahead and experiment.
> 
> I want to add that initially I also had a rule to limit swapuse but with it
> enabled, Firefox wouldn't even start.  When I removed the rule I observed
> that initially rctl reported some absurdly high and unstable swapuse for the
> jail. Gradually, it went down to some reasonable values.  Maybe there is
> some bug in RACCT code about accounting swap.

I think the swapuse implementation is bogus.  It hooks into
swap_reserve_by_cred(), so what it really limits is the amount of
swap-backed virtual memory which can be allocated to the process, not
the amount of swap actually in use.

Each swap-backed VM object (typically corresponding to anonymous memory)
has an associated user ID which is charged for the virtual mappings of
that object.  This can be seen by looking at RLIMIT_SWAP, e.g., on my
desktop `procstat rlimitusage $(pgrep firefox) | grep swap` shows the
same value for all processes.

racct is hooking in at the wrong place.  It also assumes that calls to
swap_reserve_by_cred() and swap_release_by_cred() are balanced within a
single process, which I think is not true.
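
As a rough illustration of why unbalanced calls matter, here is a toy
model in Python.  It is not kernel code: the SwapLedger class, the pids
and the uid are all made up, and the scenario (a reservation made in one
process and released in the context of another) is an assumption chosen
to show the effect, not a confirmed description of what Firefox triggers.

```python
from collections import defaultdict


class SwapLedger:
    """Toy ledger tracking the same reservations two ways.

    by_uid models the per-credential charge attached to the VM object.
    by_pid models a hypothetical per-process counter that assumes every
    reserve is later released by the same process.
    """

    def __init__(self):
        self.by_uid = defaultdict(int)
        self.by_pid = defaultdict(int)

    def reserve(self, pid, uid, nbytes):
        self.by_uid[uid] += nbytes
        self.by_pid[pid] += nbytes

    def release(self, pid, uid, nbytes):
        self.by_uid[uid] -= nbytes
        self.by_pid[pid] -= nbytes


def demo():
    ledger = SwapLedger()
    uid, parent, child = 1001, 100, 101  # hypothetical ids
    one_gib = 1 << 30
    # The parent creates a 1 GiB swap-backed object.
    ledger.reserve(parent, uid, one_gib)
    # Assumed scenario: the release happens while the child is running.
    ledger.release(child, uid, one_gib)
    return dict(ledger.by_uid), dict(ledger.by_pid)
```

In this model the per-uid total returns to zero, while the parent's
per-process counter stays 1 GiB too high and the child's goes 1 GiB
negative.  Counters that drift like this would be one plausible source
of large, unstable swapuse readings.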

> For example:
> $ rctl -h -u jail:firefox-cage: | sort
> coredumpsize=0
> cputime=524
> datasize=276K
> maxproc=23
> memorylocked=0
> memoryuse=8236M
> msgqqueued=0
> msgqsize=0
> nmsgq=0
> nsem=0
> nsemop=0
> nshm=0
> nthr=559
> openfiles=5376
> pcpu=93
> pseudoterminals=0
> readbps=0
> readiops=0
> shmsize=0
> stacksize=8792K
> swapuse=32G
> vmemoryuse=73G
> wallclock=3445
> writebps=288
> writeiops=2
> 
> One minute later:
> $ rctl -h -u jail:firefox-cage: | sort
> coredumpsize=0
> cputime=588
> datasize=312K
> maxproc=26
> memorylocked=0
> memoryuse=8249M
> msgqqueued=0
> msgqsize=0
> nmsgq=0
> nsem=0
> nsemop=0
> nshm=0
> nthr=633
> openfiles=5496
> pcpu=80
> pseudoterminals=0
> readbps=0
> readiops=0
> shmsize=0
> stacksize=10M
> swapuse=19G
> vmemoryuse=73G
> wallclock=5140
> writebps=32K
> writeiops=16
> 
> So, I had to ditch that rule although I find limiting memoryuse without
> limiting swapuse to be incomplete.
> 
> Also, I didn't even consider limiting vmemoryuse because it is very large,
> it is hard to predict and it seems to have little correlation with the
> physical memory use.

I agree, limiting vmemoryuse is not very useful in general.  It's just
easy to implement.
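
A small Python sketch of why the virtual size says little about
physical memory use: it maps a large anonymous region but touches only a
few pages of it.  The sizes are arbitrary, the helper names are invented
for this example, and peak RSS from getrusage() is only a coarse proxy
for what racct reports as memoryuse.

```python
import mmap
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process, in KiB."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on FreeBSD and Linux.
    return rss // 1024 if sys.platform == "darwin" else rss


def reserve_and_touch(map_bytes=256 * 1024 * 1024, touched_pages=16):
    """Map map_bytes of anonymous memory and touch only a few pages.

    Returns (mapped KiB, growth of peak RSS in KiB).
    """
    before = peak_rss_kib()
    # The whole mapping counts toward the address-space size at once.
    region = mmap.mmap(-1, map_bytes)
    for i in range(touched_pages):
        # Only pages that are actually written become resident.
        region[i * mmap.PAGESIZE] = 1
    after = peak_rss_kib()
    region.close()
    return map_bytes // 1024, after - before
```

The first returned value is the full 256 MiB, while the second should
stay small because only a handful of pages were ever touched.  A limit
on vmemoryuse would count the former.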

> Regarding the experiment, Firefox more or less works, but not without issues.
> When a lot of sites are open in tabs, especially some "web
> applications" that I have to use and which I know to be memory hogs, Firefox
> starts glitching here and there.  Mostly it looks like some broken
> JavaScript.
> 
> Another observation is that memoryuse always stays somewhat above the 8 GB
> limit.  Sometimes it's just very slightly above, sometimes it's a couple of
> hundred megs (or a few percent) above, e.g., memoryuse=8455M.
> 
> And almost all the time I see a vmdaemon thread being active:
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>    16 root        119    -     0B    16K CPU4     4  57.1H  68.98% vmdaemon
> 
> And it's always somewhere in this call chain:
> procstat -kk 16
>   PID    TID COMM                TDNAME              KSTACK
>    16 100177 vmdaemon            - vm_swapout_object_deactivate+0x130
> vm_swapout_map_deactivate_pages+0x1f3 vm_daemon+0x87d fork_exit+0xc7
> fork_trampoline+0xe
> 
> My impression is that vm_daemon is trying to inactivate some pages belonging
> to processes in the jail, so that they could get swapped out.  But either
> they get reactivated or the pageout code does not see a need to swap them
> out and they remain resident.

I think the RACCT_RSS implementation doesn't work well in general.  The
vm_daemon loop periodically (at 1 Hz) scans all processes in the system,
and for each process updates the stored RSS and checks whether a limit
applying to the process has been reached.  If so, it picks some pages
mapped into the process and tries to unmap them, so that they no longer
count against the RSS.  But its strategy for picking pages to unmap is
totally unrelated to their usage: it may unmap frequently accessed
pages, in which case they will be faulted back into the pmap very
quickly.  In that case, vmdaemon and firefox will constantly be fighting
each other.
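
The effect can be sketched with a toy simulation in Python.  This is
not the vm_daemon algorithm: the function name, the page counts and the
"usage-aware" alternative are all invented for illustration.  The only
parts taken from the description above are the periodic scan and the
fact that victims are picked without regard to how often they are used.

```python
import random


def simulate(total_pages, limit, hot_pages, cold_touch, ticks,
             usage_aware, seed=0):
    """Count how often frequently used pages must be faulted back in.

    Pages 0..hot_pages-1 are touched on every tick; the rest are cold,
    and the process touches cold_touch random cold pages per tick.
    Returns (hot-page refaults, resident pages after the last tick).
    """
    rng = random.Random(seed)
    hot = set(range(hot_pages))
    mapped = set(range(total_pages))  # everything starts out resident
    refaults = 0
    for _ in range(ticks):
        # Periodic scan: if over the limit, unmap the excess.
        excess = len(mapped) - limit
        if excess > 0:
            if usage_aware:
                # Hypothetical policy: evict cold pages before hot ones.
                order = sorted(mapped - hot) + sorted(mapped & hot)
                victims = order[:excess]
            else:
                # Usage-blind policy: victims unrelated to access pattern.
                victims = rng.sample(sorted(mapped), excess)
            mapped.difference_update(victims)
        # The process runs: every hot page is touched again.
        missing = hot - mapped
        refaults += len(missing)
        mapped |= missing
        # A few cold pages are touched as well.
        mapped.update(rng.sample(range(hot_pages, total_pages), cold_touch))
    return refaults, len(mapped)


blind = simulate(1000, 800, 600, 50, 60, usage_aware=False)
aware = simulate(1000, 800, 600, 50, 60, usage_aware=True)
```

With these made-up parameters the usage-aware variant never evicts a
hot page, so its refault count is zero, while the usage-blind variant
keeps unmapping pages that are immediately faulted back in.  In the
model this also leaves the resident size above the limit after each
round.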

> I'd say that this is a rather unexpected consequence.
> It keeps a CPU core busy and doesn't allow the system to enter power saving states.
> 
> To conclude, this has been a useful experiment for me.
> Initially, I had some naive expectations that memory limiting would just
> magically "limit memory".  The experiment forced me to think about what it
> actually means to limit memory, how it could be done, what consequences it
> would have and in what cases it could be useful.
> 
> If anyone has better suggestions and better experience, please let me know.

I don't have any better suggestions.  It would take a fair bit of work
to improve racct such that it's able to limit memory usage the way you
want.