8.0: OpenSSL stat()'s NLS 500+ times causing extreme system load
Jonathan McKeown
j.mckeown at ru.ac.za
Thu Dec 17 07:05:52 UTC 2009
On Tuesday 15 December 2009 23:24:16 Linda Messerschmidt wrote:
> On Tue, Dec 15, 2009 at 12:53 PM, Dan Nelson <dnelson at allantgroup.com> wrote:
> > It's defined in src/lib/libc/Makefile, so you should be able to remove
> > that line, rebuild libc and reinstall, and see whether your performance
> > issue goes away.
>
> I tried that and as you predicted, all the bogus stat calls went away.
>
> Unfortunately the performance issue did not. :( Back to the drawing
> board for me!
>
> Upon further inspection, it seems as though for each check, Nagios
> spawns a process that spawns a process that spawns a process that runs
> the check. I did "ktrace -i -t w -p (nagiospid)" on Nagios for 30
> seconds and the ktrace output contained records from 2365 different
> processes spawned in that 30 seconds. During that time, I would
> expect about 800 checks to have run, so it does seem like it's right
> at 3 processes per check.
>
> I just don't think the system can keep up with all that fork()ing
> without going all out; it's just a limit of the Nagios plugin
> architecture.
You've probably already spotted this, but this behaviour is documented in
largeinstallationtweaks.html:
``Normally Nagios will fork() twice when it executes host and service checks.
This is done to (1) ensure a high level of resistance against plugins that go
awry and segfault and (2) make the OS deal with cleaning up the grandchild
process once it exits. The extra fork() is not really necessary, so it is
skipped when you enable this option. As a result, Nagios will itself clean up
child processes that exit (instead of leaving that job to the OS). This
feature should result in significant load savings on your Nagios
installation.''
The single-fork behaviour can also be enabled on its own in Nagios's main
config file; child_processes_fork_twice is the option to look for.
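
To make the difference concrete, here is a rough sketch of the two strategies
the documentation describes. It is written in Python purely for illustration
and is not Nagios's actual code (Nagios is written in C). The function names
check_single_fork, check_double_fork and _run_plugin are made up for this
example, and passing the result back over a pipe is my own assumption about
how a grandchild might report its status.

```python
import os


def _run_plugin(argv):
    """Fork, exec argv in the new process, wait for it, return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def check_single_fork(argv):
    """No double fork: the caller reaps the plugin process itself."""
    return _run_plugin(argv)


def check_double_fork(argv):
    """Double fork: the grandchild is orphaned and later reaped by the OS."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child
        os.close(r)
        if os.fork() == 0:            # grandchild, orphaned once the child exits
            code = _run_plugin(argv)  # a third process runs the actual check
            os.write(w, bytes([code & 0xFF]))
        os._exit(0)                   # child and grandchild both end here
    os.close(w)
    os.waitpid(pid, 0)                # the child exits at once, so this is quick
    data = os.read(r, 1)              # blocks until the grandchild reports
    os.close(r)
    return data[0] if data else -1
```

In this sketch the double-fork path creates three processes per check, which
is in line with the roughly 3 processes per check seen in the ktrace output,
while the single-fork path creates one and leaves the reaping to the caller.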
Jonathan