Re: What can I learn about data that is staying paged out? (There is a more specific poudriere bulk related context given.)

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 07 Jun 2022 05:05:08 UTC
Daniel Ebdrup Jensen <debdrup_at_FreeBSD.org> wrote on
Date: Mon, 06 Jun 2022 12:49:37 UTC :

> On Sun, Jun 05, 2022 at 03:55:21PM -0700, Mark Millard wrote:
> >Thanks for the idea. Know how I could find an approximation
> >to the amount of paged out buffer cache to see about how much
> >of the ~300 MiBytes it might explain?
> >
> . . .
> 
>       I believe what you're looking for is:
>       vmstat -o | awk '$7 == "sw" { print $0 }'
> 
>       The definition of the 7th column is escaping me right now; I'm
>       pretty sure I've seen it in a manual page somewhere, but can't for
>       the life of me remember it - so if anyone knows, do tell and I'll
>       figure out a way to get it added to vmstat. ;)
> 
>       If a lot of lines in vmstat -o are blank, it's usually a good idea
>       to have a look at `pstat -f` because it'll either be shared memory
>       objects, sockets, pipes, or things like that.
> 
>       There's also vmstat -m or vmstat -z which can be useful in breaking
>       down types of VM objects by allocator.
> 
>       I've also taken the liberty of including `zones.pl` which has been
>       floating around FreeBSD circles for ages, and which uses the vmstat
>       -z flag mentioned above plus a bit of perl to sum everything up
>       nicely.
> 
>       This is just what I've picked up over the course of sysadmining,
>       I'm by no means a VM expert - as evidenced by the fact that I
>       didn't know about systat -swap, despite using systat regularly, and
>       wishing it had a `systat -sensors` page which detailed the
>       temperature values that can be found via acpi(4), coretemp(4) and
>       occasionally others, as well as fan-speed as reported by
>       acpi_ibm(4) and others of its kind in /boot/kernel/acpi_*.
> 

Thanks. It was an interesting exploration.


# vmstat -o | more
  RES   ACT INACT REF SHD  CM TP PATH
    1     0     1   0   0 WB  vn /usr/local/poudriere/data/.m/main-CA7-bulk_a-default/01/usr/local/share/doc/gettext/examples/hello-objc-gnustep/po/ro.po
   33     4    29  29  17 WB  vn /lib/libtinfow.so.9
    0     0     0   1   0 WB  sw 
. . .

"TP" is probably TyPe and "sw" is probably for
swap-backed. ("vn" for vnode-backed.)

RES, ACT, INACT, REF, and SHD seem to give no
clue about how much is paged out to swap space,
only what is in RAM. Nor does there seem to be
any specific information about the content, even
when at least one of the size figures is non-zero.
For example, I do not see how to identify tmpfs
material vs. non-tmpfs material in the output.
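
That said, a rough sketch for how much of the sw-backed
objects is still resident (assuming 4 KiByte pages, as on
the RPi4B, and the column layout shown above) would be:

# vmstat -o | awk '
    $7 == "sw" { res += $1 }
    END { printf "%.1f MiBytes resident in sw-backed objects\n", res * 4096 / 1048576 }'

The paged-out remainder is exactly what the columns
do not show.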

# pstat -f | more
460/256959 open files
       LOC       TYPE   FLG  CNT MSG       DATA            OFFSET
ffffa0009e78f550 pipe     RW   1   0 ffffa00076cbc2e8                0
ffffa000136bf910 inode    RW  13   0 ffffa001c24f7400           8c4a4d
. . .
ffffa0009e78f910 fifo     RW  30   0 ffffa000080845d0                0
. . .

This seems not to have information about swap-backed
objects. But even if it did, I've not identified how I
could put the output to use. I expect the inode entries
are vnode-backed rather than swap-backed. (Although
tmpfs might be an odd fit for that claim; I do not see
how to distinguish tmpfs material from non-tmpfs
material in this output either.)
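
For a per-process view, procstat's virtual memory mapping
output has a similar type column ("vn", "sw", "df" for
default/anonymous, and such). A sketch, assuming the type
lands in the 10th whitespace-separated field (12345 being
a placeholder pid):

# procstat -v 12345 | awk '$10 == "sw" || $10 == "df"'

But the RES figure there is again resident pages, not
paged-out content.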


# vmstat -m | more
         Type InUse MemUse Requests  Size(s)
CAM I/O Scheduler     1     1K        1  128
        evdev     2     2K        2  1024
. . .

I do not see any way to identify any mounts
paged out to swap space here. There is some
material about tmpfs overhead:

  tmpfs mount     6     1K        8  128
   tmpfs name 122014  5917K   989436  16,32,64,128
    tmpfs dir 122020  7627K   989458  64

but I've not identified anything for the space in
tmpfs file systems.
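
Totaling just that metadata is easy enough (a sketch that
assumes the tmpfs rows keep the MemUse figure in the 4th
whitespace-separated field, as in the output above):

# vmstat -m | awk '
    $1 == "tmpfs" { sub("K", "", $4); sum += $4 }
    END { printf "%d KiBytes of tmpfs metadata\n", sum }'

But the tmpfs file content itself does not live in these
malloc buckets.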

# vmstat -z | more
ITEM                   SIZE  LIMIT     USED     FREE      REQ     FAIL SLEEP XDOMAIN
kstack_cache:         16384,      0,     590,      12,  820664,   0,   0,   0
buf free cache:         432,      0,   42016,    5737,431716208,   0,   0,   0
vm pgcache:            4096,      0, 1704016,     176,10384187744, 944,   0,   0
vm pgcache:            4096,      0,  157200,     346,851328618, 286,   0,   0
. . .

I do not see any way to identify any mounts
paged out to swap space here either. (The two
lines with the same name "vm pgcache" are an
interesting oddity/ambiguity.)

# ~/zones.pl
KMAP_ENTRY                        0.004    0.000    0.003   92.5%
. . .
TMPFS_node                       85.698   26.124   59.574   69.5%
VNODE                            88.432   48.078   40.354   45.6%
VM_OBJECT                       165.794   50.435  115.360   69.6%
vm_pgcache                      626.609  624.301    2.309    0.4%
vm_pgcache                     5608.180 5606.609    1.570    0.0%
TOTAL                          6858.561 6511.032  347.529    0.0%

This being a transformation of the vmstat -z data, the same
points apply.
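
For reference, the gist of what zones.pl computes can be
approximated directly (a sketch; the field splitting
assumes the colon/comma layout of the vmstat -z output
above, printing total and used MiBytes per zone):

# vmstat -z | awk -F'[:,] *' '
    NR > 1 { printf "%-28s %10.3f %10.3f\n", $1,
        $2 * ($4 + $5) / 1048576, $2 * $4 / 1048576 }'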


It is also not clear how much of the used swap space
has exactly matching memory data vs. how much would
have to be read back in if swapoff were used to turn
off all the swap space. I've not figured out a good
way to get evidence from such an experiment, and it
would disturb learning how big a swap space the
bulk -a -c puts to use for as long as I let it
continue.
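
One partial probe, if the installed top(1) supports it
(recent FreeBSD versions document -w for approximate
per-process swap usage and accept "swap" as a -o sort
key), would be along these lines:

# top -b -w -o swap 10

Swap holding tmpfs file content would not be attributed
to any process this way, though.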


Note: I do wonder about a tmpfs related leak being
involved:

# du -xsAm /usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p/
103	/usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p/

# du -xsm /usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p/
68	/usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p/

but:

# df -m /usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p
Filesystem 1M-blocks Used Avail Capacity  Mounted on
tmpfs           1024  405   618    40%    /usr/local/poudriere/data/.m/main-CA7-bulk_a-default/ref/.p

405 MiBytes is big enough to supply sufficient material
for the (now) 330988 KiByte of used swap space. But why
the huge difference vs. the du totals? Perhaps cpdup and
copy-on-write across the 4 builders, or some such, and a
question of where that usage is accounted for? Files that
have been unlinked but are still held open would also
count toward df's Used figure while staying invisible
to du.

But I've not found a way to check if the 330988Ki-used
swap space is mostly filled with tmpfs content or not.
It is not obvious to me that such a large tmpfs "Used"
should exist at all for the USE_TMPFS="data" style and
only 4 poudriere-devel builders.
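
One way to at least bound the possible tmpfs contribution
is to total the Used figures over all tmpfs mounts and
compare against the swap figure (a sketch):

# df -m -t tmpfs | awk '
    NR > 1 { sum += $3 }
    END { printf "%d MiBytes used across tmpfs mounts\n", sum }'

Whatever part of that total does not stay resident in RAM
has to be in swap.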

For reference: 7122 ports built in about 235 hours,
with builders still running:

Swap: 30720Mi Total, 330988Ki Used, 30397Mi Free, 1% Inuse

I'll note that poudriere only reports small tmpfs use
figures, such as in:

[main-CA7-bulk_a-default] [2022-05-28_01h56m38s] [parallel_build:] Queued: 32034 Built: 7120  Failed: 32    Skipped: 1774  Ignored: 1413  Fetched: 0     Tobuild: 21695  Time: 234:41:54
 ID  TOTAL                          ORIGIN   PKGNAME                             PHASE PHASE    TMPFS       CPU% MEM%
[01] 00:19:33            finance/aqbanking | aqbanking-6.4.1                     build 00:00:48 40.00 KiB   0.1% 0.8%
[02] 00:13:10 www/py-django-treebeard@py38 | py38-django-treebeard-4.5.1 build-depends 00:12:07 56.00 KiB   0.6% 0.3%
[03] 07:41:31                 devel/llvm90 | llvm90-9.0.1_6                      build 07:28:53 4.00 KiB  258.6% 9.6%
[04] 13:06:10               emulators/mame | mame-0.226_1                        build 12:59:46 4.00 KiB   12.1% 1.2%

So if tmpfs is really using much of the 330988Ki Used
swap space, poudriere is far from counting all the tmpfs
use in its reporting.


One thing about using an RPi4B for this kind of experiment:
it gives time to observe the behavior. (But I'll not be able
to let it run for the 5+ wk or whatever it would take for the
"bulk -a -c" to actually finish, given how I've set things up.)


===
Mark Millard
marklmi at yahoo.com