koitsu at FreeBSD.org
Wed Oct 1 12:49:57 UTC 2008
On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
> Jeremy Chadwick wrote:
>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>> Hello List,
>>> I am running into a strange problem that points to a resource leak.
>>> The problem manifests itself after one of our remote systems has been
>>> up around 100 days.
>>> The symptom is that it appears no new processes can be spawned. If I try to
>>> ssh to the unit, I can see the 3-way TCP handshake and then no more traffic.
>>> Log files such as the cron log show that once this happens, no more entries
>>> are written to them. The unit is acting as a firewall,
>>> router and VPN appliance, and these functions continue to work. We have a
>>> C application, periodically started from a shell script, that
>>> reports various information about the system. It stops reporting,
>>> while VPNs, OSPF routing, and ipfilter firewalling continue to work
>>> and write to their logfiles.
>>> My question is how do I monitor the various resources in the system that could
>>> prevent the spawning of a new process?
>> Periodically logging "ps -auxw" output to a file would be useful, as
>> ideally you'd gradually see the list get longer and longer over time;
>> it's possible you have many zombie processes as a result of a parent
>> which is not reaping its children (calling waitpid(2) or its friends).
>> Other things that might come in useful are "fstat" and "vmstat -s".
>> It sounds like your C program relies heavily on system() or execl() and
>> fork(), which is why it's affected -- while the other programs are
>> likely kernel-level.
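
A rough sketch of the periodic logging suggested above, assuming a plain
/bin/sh script. The function name "snapshot" and the log path in the usage
note are placeholders, not anything from the original setup:

```sh
#!/bin/sh
# Sketch: print one timestamped resource snapshot on stdout.
# The caller decides where to append it.
snapshot() {
    echo "==== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ===="
    # Process list; a list that keeps growing between snapshots, or many
    # entries in state 'Z', would point at unreaped children.
    echo "-- ps -auxw"
    ps -auxw 2>&1
    # fstat(1) and "vmstat -s" as mentioned above; skipped if not installed.
    for cmd in "fstat" "vmstat -s"; do
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            echo "-- $cmd"
            $cmd 2>&1
        fi
    done
}
```

From a periodic script this could be called as
"snapshot >> /var/log/resource-snapshot.log" (hypothetical path), so that
successive snapshots can be compared later.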
> Thanks Jeremy,
> I have added those commands to a periodic daily script.
> Another thing I have noticed is that quite often the problem seems to
> start at 2am, right when the periodic daily script runs.
> But I think that is a coincidence: we have reached the edge of the
> resource limit, and all the jobs that get spawned by the periodic daily
> scripts push us over the limit.
> The other thing is that having logged into some of the systems that have
> been up in the 80 day range, I don't see many zombies, if any. I wonder
> whether it is an fd leak; the fstat output should point that out.
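
To check both theories from saved output, something along these lines might
help. It is only a sketch: it assumes fstat's default column layout
(USER CMD PID FD ...) and command names without spaces, and the function
names are made up for illustration:

```sh
#!/bin/sh
# Sketch: rank processes by number of fstat(1) rows, read from stdin.
# Each row is one open file, including the "text", "wd" and "root" entries,
# so the absolute numbers matter less than how they grow between snapshots.
fds_by_pid() {
    awk 'NR > 1 { n[$3 " " $2]++ }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Sketch: count zombies from "ps -axo stat" output read from stdin.
# The first line is the STAT header; zombie states start with 'Z'.
count_zombies() {
    awk 'NR > 1 && $1 ~ /^Z/ { z++ } END { print z + 0 }'
}
```

Usage would be "fstat | fds_by_pid | head" and "ps -axo stat | count_zombies".
A process whose count keeps climbing from one snapshot to the next is the
likely leaker.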
You might find the below thread beneficial -- an individual came to the
lists stating that they were running out of fds as a result of some
Java software running amok on their systems.
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |