[Bug 230704] All the memory eaten away by ZFS 'solaris' malloc

bugzilla-noreply at freebsd.org
Fri Aug 17 13:33:31 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230704

            Bug ID: 230704
           Summary: All the memory eaten away by ZFS 'solaris' malloc
           Product: Base System
           Version: 11.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: Mark.Martinec at ijs.si

Affected: ZFS on 11.2-RELEASE and 11.1-RELEASE-p11 (but not on 10.3).

Running commands like 'zpool list', 'zpool status' or 'zpool iostat'
on a defunct pool (with broken underlying disks) leaks memory.

When such a command runs frequently (e.g. from a monitoring tool
like 'telegraf'), the system runs out of memory within a couple of
days: applications start swapping and eventually everything grinds
to a standstill, requiring a forced reboot.

After a few days, shortly before the freeze, 'vmstat -m' shows
the 'solaris' malloc type approaching the total size of memory
(up to that point the number grows steadily and linearly):

  $ vmstat -m
         Type    InUse   MemUse HighUse   Requests  Size(s)
      solaris 39359484 2652696K       -  234986296  ...


How to repeat:

  # create a test pool on md
  mdconfig -a -t swap -s 1g
  gpart create -s gpt /dev/md0
  gpart add -t freebsd-zfs -a 4k /dev/md0
  zpool create test /dev/md0p1
  # destroy the disk underneath the pool, making it "unavailable"
  mdconfig -d -u 0 -o force

  Now reboot (the trouble does not start until after a reboot).

  Now run 'zpool list' periodically and watch the 'solaris' malloc
  type grow (the first number printed below is the absolute InUse
  count; each subsequent line is the per-iteration increment, i.e.
  roughly 2500 allocations leaked per 'zpool list' run):

  (while true; do zpool list >/dev/null; vmstat -m | \
     fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
  12224540
  2509
  3121
  5022
  2507
  1834
  2508
  2505


As suggested by Mark Johnston, here is a dtrace capture

  https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2

produced by the following command:

  # dtrace -c "zpool list -Hp" -x temporal=off -n '
             dtmalloc::solaris:malloc
               /pid == $target/{@allocs[stack(), args[3]] = count()}
         dtmalloc::solaris:free
           /pid == $target/{@frees[stack(), args[3]] = count();}'
  This will record all allocations and frees from a single instance
  of "zpool list".



Andriy Gapon wrote on the mailing list:

I see one memory leak, not sure if it's the only one.
It looks like vdev_geom_read_config() leaks all parsed vdev nvlists but
the last.  The problem seems to come from r316760.  Before that commit
the function would return upon finding the first valid config, but now
it keeps iterating.

The memory leak should not be a problem when vdevs are probed
sufficiently rarely, but it appears that with an unhealthy pool the
probing can happen much more frequently (e.g., every time pools are
listed).
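
To make the pattern concrete, here is a minimal stand-alone C sketch of
the bug shape (illustrative only, not the actual vdev_geom.c source;
label_unpack() is a hypothetical stand-in for reading one label and
unpacking its nvlist):

  #include <stdio.h>
  #include <stdlib.h>

  #define VDEV_LABELS 4

  /*
   * Hypothetical stand-in for reading one label and unpacking its
   * config; in the kernel this is the nvlist_unpack() call inside
   * vdev_geom_read_config().
   */
  static char *
  label_unpack(int l)
  {
          char *cfg;

          if ((cfg = malloc(64)) != NULL)
                  snprintf(cfg, 64, "config from label %d", l);
          return (cfg);
  }

  int
  main(void)
  {
          char *config, *cur;
          int l;

          config = NULL;
          for (l = 0; l < VDEV_LABELS; l++) {
                  if ((cur = label_unpack(l)) == NULL)
                          continue;
                  /*
                   * Before r316760 the loop effectively returned here
                   * on the first valid config.  Afterwards it keeps
                   * iterating, and without the free() below every
                   * config but the last one is leaked.
                   */
                  free(config);   /* the missing "nvlist_free" */
                  config = cur;
          }
          if (config != NULL)
                  printf("kept: %s\n", config);
          free(config);
          return (0);
  }

A fix along the lines of the free() above -- releasing the previously
unpacked nvlist before overwriting the pointer, or returning on the
first valid config as the pre-r316760 code did -- would plug the leak.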



The whole discussion leading to the above findings took place on
the stable at freebsd.org mailing list, 2018-07-23 to 2018-08-14,
under the subject:
  "All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64"
