[Bug 231457] Out of swap space on ZFS

Thu Feb 7 09:40:18 UTC 2019

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231457

mail at rubenvos.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mail at rubenvos.com

--- Comment #15 from mail at rubenvos.com ---
Hi,

We are seeing similar behaviour on one of our zfs-nfs servers as well.

Jan 31 10:41:13 volume1 kernel: pid 17505 (collectd), uid 0, was killed: out of
swap space
Jan 31 10:41:13 volume1 kernel: pid 51659 (ntpd), uid 0, was killed: out of
swap space
Jan 31 10:42:54 volume1 kernel: pid 73673 (devd), uid 0, was killed: out of
swap space
Jan 31 10:43:11 volume1 kernel: pid 31167 (mountd), uid 0, was killed: out of
swap space
Jan 31 10:44:12 volume1 kernel: pid 50359 (nfsd), uid 0, was killed: out of
swap space
Jan 31 10:44:36 volume1 kernel: pid 81152 (zsh), uid 0, was killed: out of swap
space
Jan 31 10:44:54 volume1 kernel: pid 49005 (zsh), uid 4002, was killed: out of
swap space
Jan 31 10:46:13 volume1 kernel: pid 95263 (nrpe3), uid 181, was killed: out of
swap space
Jan 31 10:46:36 volume1 kernel: pid 48518 (sshd), uid 4002, was killed: out of
swap space
Jan 31 10:46:55 volume1 kernel: pid 92367 (rpcbind), uid 0, was killed: out of
swap space
Jan 31 10:47:11 volume1 kernel: pid 56206 (nfsd), uid 0, was killed: out of
swap space
Jan 31 10:47:23 volume1 kernel: pid 68827 (dhclient), uid 65, was killed: out
of swap space
Jan 31 10:47:38 volume1 kernel: pid 87548 (getty), uid 0, was killed: out of
swap space
Jan 31 10:47:50 volume1 kernel: pid 24945 (getty), uid 0, was killed: out of
swap space
Jan 31 10:49:14 volume1 kernel: pid 29466 (getty), uid 0, was killed: out of
swap space
Jan 31 10:49:37 volume1 kernel: pid 77339 (getty), uid 0, was killed: out of
swap space
Jan 31 10:49:51 volume1 kernel: pid 78317 (getty), uid 0, was killed: out of
swap space
Jan 31 10:50:13 volume1 kernel: pid 81831 (getty), uid 0, was killed: out of
swap space
Jan 31 10:50:37 volume1 kernel: pid 89762 (getty), uid 0, was killed: out of
swap space
Jan 31 10:50:51 volume1 kernel: pid 92067 (getty), uid 0, was killed: out of
swap space
Jan 31 10:51:49 volume1 kernel: pid 97499 (getty), uid 0, was killed: out of
swap space
Jan 31 10:52:14 volume1 kernel: pid 96091 (getty), uid 0, was killed: out of
swap space
Jan 31 10:52:37 volume1 kernel: pid 98907 (getty), uid 0, was killed: out of
swap space
Jan 31 10:52:51 volume1 kernel: pid 99595 (getty), uid 0, was killed: out of
swap space
Jan 31 10:55:47 volume1 kernel: pid 60068 (zsh), uid 0, was killed: out of swap
space
Feb  7 09:57:40 volume1 collectd[25157]: plugin_read_thread: read-function of
the `swap' plugin took 19.765 seconds, which is above its read interval (10.000
seconds). You might want to adjust the `Interval' or `ReadThreads' settings.
Feb  7 09:59:48 volume1 kernel: pid 25157 (collectd), uid 0, was killed: out of
swap space
Feb  7 09:59:48 volume1 kernel: pid 94240 (atop), uid 0, was killed: out of
swap space
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 327109, size: 16384
Feb  7 09:59:48 volume1 kernel: pid 51515 (ntpd), uid 0, was killed: out of
swap space
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 326787, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 102263, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 327152, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 100915, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 326754, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 8471, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 106028, size: 12288
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 8229, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0,
blkno: 103890, size: 8192
Feb  7 10:03:11 volume1 kernel: swap_pager_getswapspace(32): failed
Feb  7 10:06:00 volume1 kernel: swap_pager_getswapspace(32): failed

root at volume1:~ # grep arc /boot/loader.conf 
vfs.zfs.arc_min="10024M"
vfs.zfs.arc_max="13084M"
root at volume1:~ # sysctl -a | grep phys
kern.ipc.shm_use_phys: 0
vm.phys_segs: 
vm.phys_free: 
vm.phys_pager_cluster: 1024
hw.physmem: 17139478528
root at volume1:~ # sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 120
root at volume1:~ # 
root at volume1:~ # swapinfo 
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/swap     8388608    26080  8362528     0%
root at volume1:~ # freebsd-version -uk
11.2-RELEASE-p8
11.2-RELEASE-p8
root at volume1:~ # 
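
For reference, the figures above can be turned into a rough estimate of how
much RAM is left outside the ARC when it sits at its configured limits. This
is only a sketch: headroom_mib is a hypothetical helper name, the inputs are
the hw.physmem and vfs.zfs.arc_min/arc_max values shown above, and the result
ignores everything else the kernel wires down, so it does not by itself
explain the kills.

```shell
#!/bin/sh
# headroom_mib (hypothetical helper): MiB of physical memory left outside
# the ARC, given hw.physmem in bytes and an ARC limit in MiB.
# Usage: headroom_mib PHYSMEM_BYTES ARC_LIMIT_MIB
headroom_mib() {
    physmem_mib=$(( $1 / 1024 / 1024 ))   # bytes -> MiB, rounded down
    echo $(( physmem_mib - $2 ))
}

# Values taken from the output above.
headroom_mib 17139478528 13084   # with the ARC at vfs.zfs.arc_max
headroom_mib 17139478528 10024   # with the ARC at vfs.zfs.arc_min
```

With these inputs, the two calls print the memory left for userland and the
rest of the kernel at each ARC limit.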

We have reason to suspect that the VM's storage backend is periodically
affected by an extremely slow storage provider (it is running as a VM on
OpenStack), as indicated by the "swap_pager: indefinite wait buffer: bufobj"
messages. It is worrying that important processes (nfsd, for instance) are
killed by the OOM killer with the default value of vm.pageout_oom_seq (if the
default setting of that sysctl does turn out to be what triggers the kills).

We've just changed vm.pageout_oom_seq from its default of 12 to 120 and are
monitoring the impact of that change.
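
A minimal sketch of one way to make such a change survive a reboot, assuming
the setting is kept in /etc/sysctl.conf. The helper name set_sysctl_conf is
hypothetical and not part of FreeBSD; it rewrites the given file so that it
holds exactly one assignment for the named sysctl.

```shell
#!/bin/sh
# set_sysctl_conf (hypothetical helper): record NAME=VALUE in a
# sysctl.conf-style file, replacing any earlier assignment of NAME.
# Usage: set_sysctl_conf NAME VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
set_sysctl_conf() {
    name=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp="${file}.tmp.$$"

    touch "$file" || return 1
    # Keep every line that does not begin with the literal string "NAME=".
    awk -v prefix="${name}=" 'index($0, prefix) != 1' "$file" > "$tmp" || return 1
    # Append the new assignment and swap the file into place.
    printf '%s=%s\n' "$name" "$value" >> "$tmp" || return 1
    mv "$tmp" "$file"
}
```

For example, "set_sysctl_conf vm.pageout_oom_seq 120" would record the value
for the next boot, while "sysctl vm.pageout_oom_seq=120" run as root applies
it to the running system.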

Ruben

(In reply to Billg from comment #13)
