[Bug 230704] All the memory eaten away by ZFS 'solaris' malloc
bugzilla-noreply at freebsd.org
Fri Aug 17 13:33:31 UTC 2018
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230704
Bug ID: 230704
Summary: All the memory eaten away by ZFS 'solaris' malloc
Product: Base System
Version: 11.2-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: Mark.Martinec at ijs.si
Affected: ZFS on 11.2-RELEASE and 11.1-RELEASE-p11 (but not on 10.3).
Running commands like 'zpool list', 'zpool status' or 'zpool iostat'
on a defunct pool (with broken underlying disks) leaks memory.
When such a command runs frequently (e.g. from a monitoring tool
such as 'telegraf'), the system runs out of memory in a couple of
days: applications start swapping and eventually everything grinds
to a standstill, requiring a forced reboot.
In a few days, shortly before a freeze, 'vmstat -m' shows the
'solaris' malloc type approaching the total size of physical memory
(prior to that the number grows steadily and linearly):
$ vmstat -m
Type InUse MemUse HighUse Requests Size(s)
solaris 39359484 2652696K - 234986296 ...
How to repeat:
# create a test pool on md
mdconfig -a -t swap -s 1g
gpart create -s gpt /dev/md0
gpart add -t freebsd-zfs -a 4k /dev/md0
zpool create test /dev/md0p1
# destroy the disk underneath the pool, making it "unavailable"
mdconfig -d -u 0 -o force
reboot  # the trouble does not start until after a reboot
Now run 'zpool list' periodically, monitoring the growth of
the 'solaris' malloc type:
(while true; do zpool list >/dev/null; vmstat -m | \
fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
12224540
2509
3121
5022
2507
1834
2508
2505
As suggested by Mark Johnston, here is a dtrace log
https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2
captured with the following command:
# dtrace -c "zpool list -Hp" -x temporal=off -n '
dtmalloc::solaris:malloc
/pid == $target/{@allocs[stack(), args[3]] = count()}
dtmalloc::solaris:free
/pid == $target/{@frees[stack(), args[3]] = count();}'
This will record all allocations and frees from a single instance
of "zpool list".
Andriy Gapon wrote on the mailing list:
I see one memory leak, not sure if it's the only one.
It looks like vdev_geom_read_config() leaks all parsed vdev nvlist-s but
the last. The problem seems to come from r316760. Before that commit
the function would return upon finding the first valid config, but now
it keeps iterating.
The memory leak should not be a problem when vdev-s are probed
sufficiently rarely, but it appears that with an unhealthy pool the
probing can happen much more frequently (e.g., every time pools are listed).
The whole discussion leading to the above findings is on
the stable at freebsd.org mailing list, 2018-07-23 to 2018-08-14,
subject:
"All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64"