[Bug 262743] Memory leak in strongswan's charon daemon when communicating over vici socket.

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 23 Mar 2022 17:20:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262743

            Bug ID: 262743
           Summary: Memory leak in strongswan's charon daemon when
                    communicating over vici socket.
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: mskalski13@gmail.com
 Attachment #232660 text/plain
         mime type:

Created attachment 232660
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=232660&action=edit
Dump of statistics of jemalloc library at charon daemon exit

On FreeBSD (amd64, arm64), communicating with charon over the vici socket makes
the virtual and resident memory (VSS and RSS) of the process grow constantly
until all system memory is exhausted. The kernel then kills the charon process
with the message: kernel: pid 903 (charon), jid 0, uid 0,
was killed: failed to reclaim memory.
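
To check afterwards whether the kernel killed charon for this reason, the
system log can be searched for the message quoted above. This is only a minimal
sketch: it assumes kernel messages end up in /var/log/messages (the stock
FreeBSD syslog configuration), and the function name find_oom_kills is purely
illustrative.

```shell
#!/bin/sh
# find_oom_kills [LOGFILE]: print kernel messages reporting that a charon
# process was killed because memory could not be reclaimed.
# The default log path is an assumption; adjust it to the local syslog setup.
find_oom_kills() {
    logfile="${1:-/var/log/messages}"
    grep -E 'pid [0-9]+ \(charon\).*was killed: failed to reclaim memory' "$logfile"
}
```

Usage: find_oom_kills /var/log/messages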

Memory leak detection tools (valgrind, ktrace) do not report any memory leaks;
the increasing RSS is the only symptom.

The same behaviour was observed on FreeBSD 12.1, 12.2 and 9.3 (the latter being
the last release before the jemalloc library was incorporated into FreeBSD's
libc).

When running the charon daemon on Linux (tested on Ubuntu 20.04 and Debian 10
bookworm/sid), the problem does not occur.


I think this behaviour is caused by the frequent memory allocation and
deallocation (malloc/free) done in the vici plugin. I also observed that the
increase can be caused by SA renegotiations, but that is harder to isolate.

There is no special malloc configuration for the charon daemon. Other
applications on the FreeBSD box are not affected, e.g. some running Python
daemons (which I believe do massive allocations and use multiple threads). I
wonder what is specific to the way strongswan allocates memory that makes the
process RSS increase so much?

To reproduce:
=============
1. Download any VM image with FreeBSD 12.0+ (also tested on the latest amd64
13.1-BETA2 to confirm).
Configure the virtual machine; give it more memory for the strongswan
compilation, but 256 MB is enough for the test.

2. Run the VM and disable swap (to speed up the failure):

# swapoff /dev/gpt/swapfs

3. Install the packages required for strongswan compilation:

# pkg install git autoconf gperf autoconf-archive libtool m4 automake flex \
    bison pkgconf gettext

4. Get strongswan: git clone https://github.com/strongswan/strongswan

5. Compile strongswan:

cd strongswan
./autogen.sh
./configure --disable-kernel-netlink --enable-kernel-pfroute \
    --enable-kernel-pfkey --disable-gmp --enable-openssl --enable-mediation \
    --disable-scripts --with-group=wheel --enable-gcm --enable-ccm \
    --enable-pkcs11
make -j4
make install

6. Start strongswan: ipsec start

7. Run in a loop any command that communicates over the vici interface; swanctl
--stats is enough to reproduce the error:

sh -c 'while swanctl --stats >/dev/null; do true; done'

8. Observe the increase of the VSS and RSS (virtual and resident memory) of the
charon process, using e.g. top.

9. After a few hours charon should be killed by the kernel due to insufficient
memory/swap space.
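
The growth described in step 8 can also be recorded over time instead of being
watched in top. The following is a minimal sketch, assuming that ps accepts
-o rss= -o vsz= -p (as FreeBSD's ps does, reporting both values in KiB); the
function name sample_mem, the sampling interval and the log path are
illustrative choices, not part of the original test.

```shell
#!/bin/sh
# sample_mem PID: print "<epoch seconds> <RSS KiB> <VSZ KiB>" for one process.
# Returns non-zero when the process no longer exists.
sample_mem() {
    mem=$(ps -o rss= -o vsz= -p "$1") || return 1
    [ -n "$mem" ] || return 1
    # ps pads its columns; the unquoted $mem lets echo collapse the spaces.
    echo "$(date +%s)" $mem
}
```

For example, while the loop from step 7 runs in another terminal (assuming
pgrep is available and a single charon process is running):

    pid=$(pgrep -x charon)
    while sample_mem "$pid" >> /tmp/charon-mem.log; do sleep 60; done

The loop ends by itself once charon has been killed, and the log then shows how
fast RSS and VSS grew.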


Additional info
===============
The problem occurred while monitoring the state of the charon daemon (tunnel
definitions, SAs, etc.) via the vici socket, but it was also reproduced using a
simple swanctl --stats command repeated in a loop.

No change in this behaviour is observed when using different values of
configure's --with-printf-hooks= option. According to an issue in pfSense,
https://redmine.pfsense.org/issues/5149, this could be the cause, but tests
with --with-printf-hooks=builtin, --with-printf-hooks=glibc and
--with-printf-hooks=vstr did not fix the error.


I did some tests using various `jemalloc` settings and am attaching the
results, but I don't know how to interpret them. They were gathered using the
following command:

sh -c "MALLOC_CONF='stats_print:true,narenas:1' /usr/local/libexec/ipsec/charon \
    2>/var/log/charon-memdump-0.log"
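
As a possible starting point for reading such a dump, the sketch below pulls the
allocated and resident byte counts out of the summary line. It assumes the dump
contains a line of the form "Allocated: N, active: N, metadata: N ...,
resident: N, mapped: N, retained: N", as printed by jemalloc 5.x; the function
name jemalloc_summary is illustrative. If allocated stays small while resident
is large, the memory is being held by the allocator rather than by live
allocations, which would fit the fact that leak detectors report nothing. This
is a reading aid under those assumptions, not a diagnosis.

```shell
#!/bin/sh
# jemalloc_summary FILE: print the allocated and resident byte counts taken
# from the first "Allocated: ..." summary line of a stats_print dump.
# The line format is an assumption based on jemalloc 5.x output.
jemalloc_summary() {
    awk '/^Allocated:/ {
        for (i = 1; i < NF; i++) {
            v = $(i + 1); sub(/,$/, "", v)   # strip the trailing comma
            if ($i == "Allocated:") alloc = v
            if ($i == "resident:")  res = v
        }
        printf "allocated=%s resident=%s\n", alloc, res
        exit
    }' "$1"
}
```

Usage: jemalloc_summary /var/log/charon-memdump-0.log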
