how to measure what is consuming swap space
bc979 at lafn.org
Sun Jul 23 20:11:45 UTC 2017
> On 22 July 2017, at 02:43, tech-lists <tech-lists at zyxst.net> wrote:
> Hello list,
> What's the best command to see what processes are consuming swap space?
I don't see anyone else responding. I will not claim to be an expert, or even to understand all of this, but here is what I went through to resolve a similar issue over a couple of years.
I had a system that would occasionally kill a specific process. The system did so much logging that I could never find the cause of the problem. Eventually it happened while I was doing a tail of messages, and I saw a message saying that the system had run out of swap space and had "somehow" picked a process to kill. I started monitoring swap space and found that the system used very little swap for a couple of weeks and then started using it quite quickly, eventually running out. Once the process was killed, swap usage went down to somewhere between 0 and 1 percent. Unfortunately, killing that process also killed the usefulness of the system. So I used nagios to check swap usage, and when it got over 50% I would pick a convenient time to restart that process.
After a bunch of discussion here, someone told me about procstat -v. I ran procstat -v on that process right after starting it up and there were a few entries. Most of them seemed reasonable. Once the swap usage got to 50%, there were a huge number of entries. Most of them were of type df. Somewhere I read that df types are generally created by mmaps that are not file backed. They use swap space as the backing store for that segment. However, sw types also appear to be swap backed. I don't know what the difference between the two is.
procstat -va works, but generates way too much information to analyze. It's much easier if you can find the process causing the problem via other approaches first. For example, I took a system running only an incoming mail server and ran top to see how much swap was in use. It was about 3 percent. Dividing the number of bytes in use by the 4K page size gives you the number of swap pages in use. There could be that many entries in the procstat -va output with types sw and df. It's not likely to be that large, as many of the allocations are larger than one page, but the number is daunting.
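To make the procstat -va output less daunting, the entries can be counted per type instead of read one by one. The following is only a rough sketch in Python, not something from the original thread. It assumes the usual procstat -v column layout with a header row containing START, END and TP; the function names and the sample text are made up for illustration. The sizes it reports are mapped address ranges (END minus START), which is not the same thing as swap actually consumed.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumption: 4K pages, as in the text above

# Made-up sample in the shape of `procstat -v` output (not real data).
SAMPLE = """\
  PID              START                END PRT  RES PRES REF SHD FLAG  TP PATH
 1234           0x400000           0x401000 r-x    1    2   1   0 CN--  vn /usr/local/bin/example
 1234        0x800600000        0x800700000 rw-   20   20   1   0 ----  df
 1234        0x800700000        0x800702000 rw-    0    0   1   0 ----  sw
"""


def summarize_mappings(procstat_text):
    """Return {type: (entry_count, mapped_bytes)} for procstat -v style text."""
    totals = defaultdict(lambda: [0, 0])
    tp_col = None
    for line in procstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            # Header row: find the TP column by name rather than by position.
            tp_col = fields.index("TP")
            continue
        if tp_col is None or len(fields) <= tp_col:
            continue  # no header seen yet, or a line too short to parse
        start = int(fields[1], 16)  # START address, hex with 0x prefix
        end = int(fields[2], 16)    # END address
        entry = totals[fields[tp_col]]
        entry[0] += 1
        entry[1] += end - start
    return {tp: tuple(vals) for tp, vals in totals.items()}


def swap_pages(used_bytes, page_size=PAGE_SIZE):
    """Number of whole pages covered by used_bytes of swap."""
    return used_bytes // page_size
```

To try it on a real process, the output of procstat -v for that PID can be saved to a file and passed to summarize_mappings() as a string. A df or sw count that keeps growing between runs would match the behaviour described above.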
I went through the source for the process causing the issue and there were no non-file-backed mmaps. There were a bunch of file-backed mmaps. What's even more interesting is that this process runs on a large number of systems and only one ever showed the problem. I never found the real cause. The developer of the process did a major restructuring of the code and released a new version. It no longer has the problem. He has no idea what he could have done that would have fixed it either.
I know this is not a good cookbook solution, but that's what I went through.