[Bug 260245] swap/vm: Apparent memory leak: 100% swap usage

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 05 Dec 2021 23:58:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260245

            Bug ID: 260245
           Summary: swap/vm: Apparent memory leak: 100% swap usage
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Keywords: needs-qa
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: koobs@FreeBSD.org
                CC: markj@FreeBSD.org
             Flags: mfc-stable13?, mfc-stable12?

A system running buildbot workers for the upstream (C)Python project CI,
running 12.2-RELEASE-p7 GENERIC amd64, sees swap usage increase over time
(multiple runs) until 100% is consumed, at which point the following errors are
generated:

Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(18): failed
Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(9): failed
Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(3): failed
Nov 23 10:10:58 122-RELEASE-p10-amd64-9e36 kernel: pid 24131 (python), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: pid 78211 (python3.9), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(32): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 syslogd: last message repeated 26 times
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(24): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(32): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(24): failed
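
For anyone triaging a longer log of this kind, the messages above can be tallied with a short script. This is only a rough sketch: the function name `summarize`, the regular expressions, and the choice to credit "last message repeated N times" to the preceding allocation failure are my assumptions, and the number in parentheses is treated simply as the request size printed by the kernel. The sample input is a subset of the lines quoted above, with the mail wrapping undone.

```python
import re
from collections import Counter

# Subset of the log lines from the report, one message per line.
SAMPLE_LOG = """\
Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(18): failed
Nov 23 10:10:58 122-RELEASE-p10-amd64-9e36 kernel: pid 24131 (python), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: pid 78211 (python3.9), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(32): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 syslogd: last message repeated 26 times
"""

ALLOC_FAILED = re.compile(r"swap_pager_getswapspace\((\d+)\): failed")
KILLED = re.compile(r"pid (\d+) \(([^)]+)\).*was killed: out of swap space")
REPEATED = re.compile(r"last message repeated (\d+) times")


def summarize(log_text):
    """Return (failures per request size, list of (pid, command) killed)."""
    failures = Counter()
    killed = []
    last_size = None  # request size of the most recent allocation failure
    for line in log_text.splitlines():
        m = ALLOC_FAILED.search(line)
        if m:
            last_size = int(m.group(1))
            failures[last_size] += 1
            continue
        m = KILLED.search(line)
        if m:
            killed.append((int(m.group(1)), m.group(2)))
            last_size = None  # a repeat marker would no longer refer to a failure
            continue
        m = REPEATED.search(line)
        if m and last_size is not None:
            # Assumption: the repeat count belongs to the preceding failure.
            failures[last_size] += int(m.group(1))
    return failures, killed


if __name__ == "__main__":
    failures, killed = summarize(SAMPLE_LOG)
    print("allocation failures by request size:", dict(failures))
    print("processes killed:", killed)
```

Feeding it the contents of /var/log/messages instead of `SAMPLE_LOG` would give a quick count of how many allocations failed and which processes were killed.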

Usually, a single test run consumes 30-40% of the 8 GB of total swap, and a
second or third run consumes the remainder.

The system is unable to swapoff, and memory/vm resource utilisation does not
change after processes are killed.

The issue appears not to be reproducible after updating to a stable/12 kernel
and installing in-place.

Steps to reproduce:

- Install 12.2-RELEASE and update to the latest patch level (p7 at time of
  writing)
- Check out the CPython (main) sources and run the test suite [1]

Additional References:

Buildbot worker build run history:
https://buildbot.python.org/all/#/builders/172 (test failure output may provide
additional info)

[1] test command: make buildbottest TESTOPTS=-j4 TESTTIMEOUT=2100

I will follow this report up with `sysctl -a | grep vm` and `vmstat -z` output
captured before and after reproducing the issue.
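
To compare the before and after `vmstat -z` snapshots, something along these lines may help. It is a minimal sketch that assumes each zone is printed as `name: SIZE, LIMIT, USED, ...` with comma-separated numeric columns; the helper names `parse_vmstat_z` and `used_growth` are hypothetical, and the zone names and numbers in the sample snapshots are invented purely for illustration, not taken from the affected system.

```python
def parse_vmstat_z(text):
    """Map zone name -> USED count from `vmstat -z`-style text.

    Assumed row layout: "name: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP".
    """
    used = {}
    for line in text.splitlines():
        # Split on the last colon so a zone name containing ':' stays whole.
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # not a data row in the assumed layout
        used[name.strip()] = int(fields[2])
    return used


def used_growth(before_text, after_text):
    """List (zone, increase in USED) for zones that grew, largest first."""
    before = parse_vmstat_z(before_text)
    after = parse_vmstat_z(after_text)
    growth = {}
    for zone, count in after.items():
        delta = count - before.get(zone, 0)
        if delta > 0:
            growth[zone] = delta
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


# Invented snapshots, for illustration only.
BEFORE = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

zone_a:                  64,      0,     100,      20,     500,   0,   0
zone_b:                 128,      0,      10,       5,      40,   0,   0
"""

AFTER = """\
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

zone_a:                  64,      0,     100,      25,     900,   0,   0
zone_b:                 128,      0,     250,       3,    4000,   0,   0
zone_c:                  32,      0,       7,       1,       8,   0,   0
"""

if __name__ == "__main__":
    for zone, delta in used_growth(BEFORE, AFTER):
        print(f"{zone}: +{delta}")
```

In practice `BEFORE` and `AFTER` would be replaced by the saved output of `vmstat -z` taken before and after a test run; zones whose USED count keeps growing across runs are the first candidates to look at.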
