resource leak
Stephen Clark
sclark46 at earthlink.net
Wed Oct 1 13:35:06 UTC 2008
Jeremy Chadwick wrote:
> On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
>> Jeremy Chadwick wrote:
>>> On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
>>>> Hello List,
>>>>
>>>> I am running into a strange problem that points to a resource leak.
>>>> The problem manifests itself after one of our remote systems has been
>>>> up around 100 days.
>>>> The symptom is that it appears no new processes can be spawned. If I try to
>>>> ssh to the unit, I can see the 3-way TCP handshake and then no more traffic.
>>>> Examining log files such as cron's shows that when this happens no more
>>>> entries are written to them. The unit is acting as a firewall,
>>>> router and VPN appliance, and these functions continue to work. We have a
>>>> C application, periodically started out of a shell script, that
>>>> reports various information about the system; it stops reporting,
>>>> while VPNs, OSPF routing, and ipfilter firewalling continue to work
>>>> and write into their logfiles.
>>>>
>>>> My question is how do I monitor the various resources in the system that could
>>>> prevent the spawning of a new process?
>>> Periodically logging "ps -auxw" output to a file would be useful, as
>>> ideally you'd gradually see the list get longer and longer over time;
>>> it's possible you have many zombie processes as a result of a parent
>>> which is not reaping its children (calling waitpid(2) or its friends).
>>>
>>> Other things that might come in useful are "fstat" and "vmstat -s".
>>>
>>> It sounds like your C program relies heavily on system() or execl() and
>>> fork(), which is why it's affected -- while the other programs are
>>> likely kernel-level.
>>>
>> Thanks Jeremy,
>>
>> I have added those commands to a periodic daily script.
>>
>> Another thing I have noticed is that quite often the problem seems to
>> start at 2am, right when the periodic daily script runs.
>>
>> But I think it is coincidence: we have reached the edge of the
>> resource limit, and all the jobs that get spawned by the periodic daily
>> scripts push us over the limit.
>>
>> The other thing is that having logged into some of the systems that have
>> been up in the 80 day range, I don't see many, if any, zombies. I just wonder
>> if it is an fd leak; the fstat output should point that out.
>
> You might find the below thread beneficial -- an individual came to the
> lists stating that they were running out of fds as a result of some
> Java software running amok on their systems.
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/thread.html#45383
> http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045383.html
>
Thanks, but after reading the thread: is there a single place in the kernel that
reports how many fds are currently in use? Does the "no more fds" message
get logged in /var/log/messages or only in the kernel log buffer? I
haven't seen that message in the messages file, and since we are forced to have
a remote user reboot the box, the kernel buffer is gone.
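
For reference, a rough sketch of the kind of check being asked about, assuming
the system exposes the kern.openfiles and kern.maxfiles sysctls (as FreeBSD
does) and that sysctl(8) prints them as "name: value" lines. The function names
are hypothetical and only illustrate how the counter could be sampled from a
periodic script:

```python
import subprocess

# Assumed sysctl names: system-wide open files and the system-wide limit.
OIDS = ("kern.openfiles", "kern.maxfiles")


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of ints, skipping anything else."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def fd_usage(values):
    """Return the fraction of the system-wide file table currently in use."""
    return values["kern.openfiles"] / values["kern.maxfiles"]


def read_sysctl():
    """Run sysctl(8) for the OIDs above and return the parsed values."""
    result = subprocess.run(
        ["sysctl", *OIDS], capture_output=True, text=True, check=True
    )
    return parse_sysctl(result.stdout)
```

Appending fd_usage(read_sysctl()) with a timestamp to a file every few minutes
would show whether the count creeps upward over the weeks before the hang.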
Steve
--
"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety." (Ben Franklin)
"The course of history shows that as a government grows, liberty
decreases." (Thomas Jefferson)