Re: Private resident count in procstat(1)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 18 Jul 2022 12:20:07 UTC

On Mon, Jul 18, 2022 at 12:30:59PM +0530, Arka Sharma wrote:
> Hi All,
> 
> I tracked the PRES field and observed that 'kve->kve_private_resident' is
> the count of vm_page contained by vm_object if the vm_map entry doesn't
> contain a shadow object. My question is why this field is called
> private resident as I understand the underlying vm_object will also contain
> pages which correspond to other map entries for the same process, and for
> shared mapping it could be referenced by map entries with other process as
> well.
> 
> For the RES field my understanding is, it is the sum of pages of shadow
> objects and the tail of the object chain for the range of the given map
> entry and we also have 'ki_rssize' which is 'pm_stats.resident_count' plus
> the sum of kernel stack pages of the threads. Is there a situation the RES
> value will be greater than 'ki_rssize' ?
> 
> Please correct me if above understanding is wrong.

The definitions of the resident and 'private resident' per-process counts
are quite arbitrary. I do not think it is possible for the Mach VM to have
a single definition that would serve all desired uses. Whatever algorithm
is chosen to calculate the values, it should be good enough for the
common case, while not penalizing corner cases with overly expensive
computations.

For instance, if we have a shadow chain behind the top object, should
the resident pages in the objects below the top be accounted to RSS?  Should
shadowed pages be accounted, and if not, where should their count go?  They
do consume memory, leaving fewer pages available to the rest of the system.
On the other hand, if we account them to each address space that this
shadow participates in, we would over-account.
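
To make the trade-off concrete, here is a toy userland model, not kernel
code. 'struct toy_object' and both counting functions are hypothetical
names invented for this sketch; they only mimic the shape of an object
with a backing (shadowed) object behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a VM object; this is NOT the kernel's struct vm_object. */
struct toy_object {
	size_t resident;            /* pages resident in this object */
	struct toy_object *backing; /* next object down the chain, or NULL */
};

/* Policy A: account only the pages of the top object. */
size_t
rss_top_only(const struct toy_object *top)
{
	assert(top != NULL);
	return (top->resident);
}

/* Policy B: account the pages of every object in the chain. */
size_t
rss_whole_chain(const struct toy_object *top)
{
	size_t n = 0;

	assert(top != NULL);
	for (; top != NULL; top = top->backing)
		n += top->resident;
	return (n);
}
```

Take a bottom object with 100 resident pages that is shadowed by two top
objects, one with 10 pages and one with 5, each belonging to a different
address space. The memory actually consumed is 115 pages. Summing policy A
over both address spaces gives 15, so the 100 pages are not attributed to
anybody. Summing policy B gives 215, so they are counted twice.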

Another issue is that it is not possible to quickly determine whether a
mapping is only formally shared or truly shared.  Imagine a MAP_SHARED
anonymous mapping in a process that never forked.  Or a tmpfs-mapped file
with no other mappings.  In all those cases the mappings are formally
shared, but it is very hard to prove that there are no other mappings.

Moreover, there are at least three places in the kernel that look at RSS:
- the pure userspace reporting in kern.proc.vmmap
- OOM code that looks for the largest process
- swapout code that selects a process to swap out
Each place uses a (slightly) different definition of residency.