Date: Wed, 23 Mar 2022 17:20:28 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262743

            Bug ID: 262743
           Summary: Memory leak in strongswan's charon daemon when
                    communicating over vici socket.
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: email@example.com
 Attachment #232660 text/plain
         mime type:

Created attachment 232660
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=232660&action=edit
Dump of statistics of jemalloc library at charon daemon exit

On FreeBSD systems (amd64, arm64), communicating with the charon daemon
over the vici socket makes the Virtual and Resident memory (VMS and RSS)
of the process grow constantly until all system memory is exhausted. At
that point the kernel kills charon with the message:

    kernel: pid 903 (charon), jid 0, uid 0, was killed: failed to reclaim memory.

None of the memory leak detection tools I tried (valgrind, ktrace)
reports any leak; the increasing RSS is the only symptom.

The same behaviour was observed on FreeBSD 12.1, 12.2 and 9.3 (the
latter is the last release before incorporating jemalloc library to
FreeBSD's libc). When running the charon daemon on Linux (tested on
Ubuntu 20.04 and Debian 10 bookworm/sid) the problem does not occur.

I think this behaviour is caused by the frequent memory allocation and
deallocation (malloc/free functions) done in the vici plugin. I also
observed that the increase can be caused by SA renegotiations, but that
is harder to isolate.

There is no special malloc configuration for the charon daemon. Other
applications on the same FreeBSD box are not affected, e.g. some running
Python daemons (which I believe do massive allocations and use multiple
threads).

What is specific about the way strongswan allocates memory that makes
the RSS of the process grow so much?

To reproduce:
=============

1. Download any VM image with FreeBSD 12.0+ (also tested on the latest
   amd64 13.1-BETA2 to confirm). Configure the virtual machine; give it
   more memory for the strongswan compilation, but 256 MB is enough for
   the test itself.

2. Run the VM and disable swap (to speed up the failure):

    # swapoff /dev/gpt/swapfs

3. Install the packages required for the strongswan compilation:

    # pkg install git autoconf gperf autoconf-archive libtool m4 automake flex bison pkgconf gettext

4. Get strongswan:

    git clone https://github.com/strongswan/strongswan

5. Compile strongswan:

    cd strongswan
    ./configure --disable-kernel-netlink --enable-kernel-pfroute \
        --enable-kernel-pfkey --disable-gmp --enable-openssl \
        --enable-mediation --disable-scripts --with-group=wheel \
        --enable-gcm --enable-ccm --enable-pkcs11
    make -j4
    make install

6. Start strongswan:

    ipsec start

7. Run in a loop any command which communicates over the vici
   interface; swanctl --stats is enough to reproduce the error:

    sh -c 'while swanctl --stats >/dev/null; do true; done'

8. Observe the increase of VSS and RSS (virtual and resident) memory of
   the charon process, using e.g. top.

9. After a few hours charon should be killed by the kernel because
   memory/swap space runs out.

Additional info
===============

The problem first occurred while monitoring the state of the charon
daemon (tunnel definitions, SAs, etc.) via the vici socket, but it was
also reproduced with a simple swanctl --stats command repeated in a
loop.

Using different values of configure's --with-printf-hooks= option does
not change this behaviour. According to an issue in pfsense
(https://redmine.pfsense.org/issues/5149) this could be the reason, but
tests with --with-printf-hooks=builtin, --with-printf-hooks=glibc and
--with-printf-hooks=vstr did not fix the error.

I did some tests using various settings of `jemalloc` and am attaching
the results, but I don't know how to interpret them.
The attached dump was gathered using the following command:

    sh -c "MALLOC_CONF='stats_print:true,narenas:1' /usr/local/libexec/ipsec/charon 2>/var/log/charon-memdump-0.log"
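
Step 8 of the reproduction can be automated instead of watching top, so
that the growth is recorded as numbers over time. The following is only
a minimal sketch, not part of the original test setup: the function
names sample_mem and watch_mem and the log path in the usage line are
made up for illustration. It assumes a ps that accepts the rss and vsz
output keywords (both reported in KiB).

```sh
#!/bin/sh
# Hypothetical helpers (not from the original report) for logging the
# memory usage of one process over time.

# Print "<epoch-seconds> <rss-KiB> <vsz-KiB>" for the given PID.
# Returns non-zero if the process does not exist.
sample_mem() {
    # "rss=" and "vsz=" request the columns without a header line.
    mem=$(ps -o rss= -o vsz= -p "$1" 2>/dev/null) || return 1
    # Split the two numbers into $1 and $2 (local to this function).
    set -- $mem
    [ "$#" -eq 2 ] || return 1
    printf '%s %s %s\n' "$(date +%s)" "$1" "$2"
}

# Sample the given PID every <interval> seconds (default 60) until the
# process disappears, e.g. because the kernel killed it.
watch_mem() {
    pid=$1
    interval=${2:-60}
    while sample_mem "$pid"; do
        sleep "$interval"
    done
}
```

Possible usage while the loop from step 7 is running in another
terminal (the log file name is an arbitrary choice):

    watch_mem "$(pgrep -x charon)" 60 >> /var/log/charon-rss.log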