Significant memory leak in 9.3p10?

Thu Mar 26 23:39:14 UTC 2015

The Lost Admin
thelostadmin at gmail.com

On Mar 26, 2015, at 3:46 PM, J David <j.david.lists at gmail.com> wrote:

> On Mon, Mar 16, 2015 at 7:52 PM, J David <j.david.lists at gmail.com> wrote:
>> On Mon, Mar 16, 2015 at 7:24 PM, Konstantin Belousov
>> <kostikbel at gmail.com> wrote:
>>> There are a lot of possibilities to create persistent anonymous shared
>>> memory objects.  Not complete list is tmpfs mounts, swap-backed md disks,
>>> sysv shared memory, possibly posix shared memory (I do not remember which
>>> implementation is used in stable/9).
>> 
>> If that's the explanation, how could it be
>> detected/measured/investigated/resolved/prevented?
>> 
>> Under ordinary circumstances, machines will go run like this for days/weeks:
>> 
>> Mem: 549M Active, 3623M Inact, 567M Wired, 3484K Cache, 827M Buf, 3156M Free
>> Swap: 1024M Total, 1024M Free
>> 
>> Then, when this happens, it rapidly degrades from that to so bad that
>> processes start getting killed for being out of swap space.
> 
> These FreeBSD machines running out of swap space and dying continues
> to be a daily problem causing outages and unscheduled reboots.  Is
> there really no way to even research what might be causing the
> problem?
> 
> (Widening the cross-posting in the hopes of eliciting more help, so
> the brief summary of the problem originally posted to freebsd-stable is
> that an unknown actor consumes all the user-space memory in the
> system, including swap space, to the point where processes are killed
> for being out of swap space, but if every process on the machine is
> stopped, very little of the user-space memory in use is freed.
> Original message with more details is here:
> https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/081986.html
> .)
> 
> There are no tmpfs mounts or md disks, so it would have to be one of
> the other causes.  How can FreeBSD's use of persistent, anonymous
> shared memory objects be investigated, measured, or controlled so we
> can get a handle on this issue?

In your initial thread, you said:
$ sudo halt -p
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
> All buffers synced.  <----- 10 MINUTE HANG AFTER PRINTING THIS
> Uptime: 3d15h56m32s
> usbus0: Controller shutdown
> uhub0: at usbus0, port 1, addr 1 (disconnected)
> usbus0: controller did not stop
> usbus0: Controller shutdown complete
> acpi0: Powering system off
> Connection closed by foreign host.

> So it seems like somewhere after "All buffers synced" and printing the
> uptime, it's very slowly unwinding whatever is using up all that RAM
> and swap.
Have you looked through the system shutdown scripts (part of init/rc) to see what happens after the uptime is printed? That might give you a lead.

The output from your ps seems to be much shorter than I would expect. Are you sure it included everything? For example, I would expect to see processes for cron, syslog, and normally sshd. I’ve also got a few more kernel processes that you don’t appear to have, most notably pagedaemon.
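
One way to double-check that the listing is complete is to capture the output of `ps -ax` and test it against the names you expect to see. This is only a rough sketch: the helper name `missing_processes` and the `EXPECTED` tuple are made up for illustration, the five-column layout (PID, TT, STAT, TIME, COMMAND) is assumed, and the plain substring match can be fooled by unrelated commands that happen to contain one of the names.

```python
# Hypothetical helper, not part of any FreeBSD tool: report which expected
# process names are absent from captured `ps -ax` output.
# The names below are illustrative assumptions taken from this message.
EXPECTED = ("cron", "syslog", "sshd", "pagedaemon")


def missing_processes(ps_output, expected=EXPECTED):
    """Return the expected names found in no COMMAND column of ps_output."""
    commands = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Assumed columns: PID TT STAT TIME COMMAND (COMMAND may contain spaces)
        fields = line.split(None, 4)
        if len(fields) == 5:
            commands.append(fields[4])
    # Substring match, so "syslog" also matches "/usr/sbin/syslogd -s"
    return [name for name in expected
            if not any(name in command for command in commands)]
```

Save the listing to a file (for example `ps -ax > ps.txt`), read it into a string, and pass that string to `missing_processes`. An empty list means every expected name showed up somewhere in the listing.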

For what it’s worth, I’m running 9.3 RELEASE-P12 (the -p10 kernel) on a system 24x7 (6 days since the last reboot) and I haven’t had an issue. It’s a low-volume NFS server.