[Bug 221743] mountd at 100% CPU for 24+ hours - getmntinfo() inefficient with thousands of filesystems and snapshots

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Aug 23 15:45:06 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221743

            Bug ID: 221743
           Summary: mountd at 100% CPU for 24+ hours - getmntinfo()
                    inefficient with thousands of filesystems and
                    snapshots
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: peter at ifm.liu.se

Created attachment 185694
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=185694&action=edit
Fixed getmntinfo.c.diff

I noticed that mountd on two of our file servers was running at 100% CPU for
over 24 hours.

These systems are running FreeBSD 11.0 and are Dell 730xd servers with 256GB
RAM and around 140TB of disk where there are around 16000 user filesystems
(with around 20-40 hourly snapshots per filesystem).

A "truss" of one of them showed it busy in a loop calling getfsstat(),
munmap() and mmap(), slowly loading more and more filesystems (one more per
loop iteration) into a dynamically allocated buffer.

At the time I looked it was up at 280000 filesystems+snapshots out of the
360000 available ones (16000 filesystems, the rest snapshots).

Looking at the code for getmntinfo() in /usr/src/lib/libc/gen/getmntinfo.c, I
see that it calls getfsstat() to load the list of filesystems. If the call
fills the buffer, meaning there may be more filesystems than expected, it loops
back and retries with the buffer resized to fit one more filesystem.
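
To illustrate the cost of that retry pattern, here is a minimal simulation in
C. It is a sketch of the behaviour as I describe it above, not the libc source:
fake_fill() and the "mounted" counter are hypothetical stand-ins for the kernel
query and the mount table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the mount table: just a count of entries. */
static long mounted = 1000;

/* Hypothetical stand-in for the kernel query: reports how many entries
 * were stored in a buffer with room for `capacity` entries. */
static long fake_fill(long capacity)
{
    return mounted < capacity ? mounted : capacity;
}

/* Grow-by-one retry: whenever the buffer comes back full, make room for
 * exactly one more entry and query again. Returns the number of entries
 * loaded and stores the number of queries issued in *calls. */
static long load_grow_by_one(long initial_capacity, long *calls)
{
    long capacity = initial_capacity;
    long got;

    assert(initial_capacity > 0 && calls != NULL);
    *calls = 0;
    for (;;) {
        got = fake_fill(capacity);
        (*calls)++;
        if (got < capacity)
            break;              /* buffer not full: everything fit */
        capacity = got + 1;     /* full: room for one more, then retry */
    }
    return got;
}
```

If the starting size is far too small, the number of queries grows linearly
with the number of missing entries, and each query copies the whole list again,
so the total work is roughly quadratic.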

The problem seems to be that at around 250000-300000 filesystems+snapshots the
loop took so long that, with 16000 new snapshots created every hour, it never
really caught up...

In the attached patch I've modified getmntinfo() to call getfsstat() inside the
loop to get the current number of available filesystems, to allocate a larger
"extra" space, and to give up after 3 rounds of the loop and return the list it
has at that point.
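
The idea can be sketched as the following simulation in C. This is not the
attached diff: fake_count(), fake_fill(), the "mounted" and "growth_per_call"
variables and the SLACK value are all hypothetical, chosen only to show the
bounded retry.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUNDS 3    /* give up after 3 rounds */
#define SLACK      64   /* hypothetical headroom; not the value in the diff */

/* Hypothetical mount table that gains entries on every query. */
static long mounted = 280000;
static long growth_per_call = 10;

/* Hypothetical count-only query: how many entries exist right now. */
static long fake_count(void)
{
    mounted += growth_per_call;
    return mounted;
}

/* Hypothetical fill query: how many entries fit into `capacity` slots. */
static long fake_fill(long capacity)
{
    mounted += growth_per_call;
    return mounted < capacity ? mounted : capacity;
}

/* Bounded retry: re-read the current count each round, add headroom, and
 * stop after MAX_ROUNDS even if the list is still incomplete. */
static long load_bounded(long *rounds)
{
    long got = 0;

    assert(rounds != NULL);
    *rounds = 0;
    while (*rounds < MAX_ROUNDS) {
        long capacity = fake_count() + SLACK;

        (*rounds)++;
        got = fake_fill(capacity);
        if (got < capacity)
            break;      /* headroom was enough: list is complete */
    }
    return got;         /* may be a truncated list after MAX_ROUNDS */
}
```

The trade-off is that a caller may get an incomplete list when mounts appear
faster than the headroom allows, in exchange for a guaranteed upper bound on
the number of rounds.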

Btw, we also noticed that the snapshots were only sometimes included in the
list from getfsstat(). It seems a snapshot must be accessed before it shows up
in the list (an "ls -l" in ".zfs/snapshot" triggers a "mount", as did, in our
case, an rsync backup job).

I include a patch for a modified getmntinfo() function.
