Using sysctl(1) to gather resource consumption data
david at catwhisker.org
Sat Sep 13 00:15:08 UTC 2008
At $work, I've been trying to gather information on "interesting
patterns" of resource consumption during moderately long-running (5 - 8
hour) tasks; the hosts in question usually run FreeBSD 6.2, though
there's an occasional 6.x that's more recent, as well as a bit of
I wanted to have a low impact on the system being measured (of course),
and I was unwilling to require that a system to be measured had any
software installed on it other than base FreeBSD. (Yes, that means I
didn't assume Perl, though in practice in this environment, each does.)
I also wanted the data to be transferred reasonably securely, even if
part of that transit was over facilities over which I had no control.
(Some of the machines being measured happen to be in a continent other
than where I am.)
So I cobbled up a Perl script to run on a data-gathering machine (that
one was mine, so I could require that it had any software I wanted on
it); it acts (if you will) as a "shepherd," watching over child
processes, one of which is created for each host to be measured.
A given child process copies over a shell script to the remote machine,
then redirects STDOUT to append to a file on the data-gathering machine,
and exec()s ssh(1), telling it to run the shell script on the remote
The shell script fabricates a string (depending on the arguments with
which it was invoked), then sits in a loop:
* eval the string
* sleep for the amount of time remaining
indefinitely. (In practice, the usual nominal time between successive
eval()s is 5 minutes. I have recently been doing some experiments at a
Periodically, back on the data-gathering machine, a couple of different
* The "shepherd" script wakes up and checks the mtime on the file for
each per-host process (to see if it's been updated "sufficiently
recently"). Actually, it first checks the file that lists the hosts
to watch; if its mtime has changed, it's re-read, and the list of
hosts is modified as appropriate. Anyway, if a given per-host file is
"too old," the corresponding child process is killed. Then the
script runs through the list of hosts that should be checked,
creating a per-host process for each one for which that's necessary.
There's a fair amount of detail I'm eliding (such as limited
exponential backoff for unresponsive hosts).
In practice, this runs every 2 minutes at the moment.
* There's a cron(8)-initiated make(1) process that runs, reading the
files created by the per-host processes and writing to a corresponding
RRD. (I cobbled up a Perl script to do this.)
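As a rough illustration of the shepherd's staleness check and the limited exponential backoff mentioned above, here is a minimal Python sketch. It is not the actual Perl script: the function names, the 120-second base delay, the one-hour cap, and the choice to treat a missing file as stale are all assumptions made for the example.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True if the per-host file has not been updated recently enough.

    A missing file is treated as stale (an assumption for this sketch).
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return (now - mtime) > max_age_seconds


def backoff_delay(failures, base_seconds=120, cap_seconds=3600):
    """Limited exponential backoff for an unresponsive host.

    The delay doubles with each consecutive failure and is clamped to
    cap_seconds. Both defaults are hypothetical values.
    """
    if failures <= 0:
        return 0
    return min(cap_seconds, base_seconds * 2 ** (failures - 1))
```

A shepherd loop built on these would kill and restart the child for any host whose file is stale, waiting backoff_delay(n) seconds before retrying a host that has failed n times in a row.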
While I tried to externalize a fair amount of this -- e.g., the list of
sysctl(1) OIDs to use is read from an external file -- it turns out that
certain types of change are a bit ... painful. In particular, adding a
new "data source" to the RRD qualifies (as "painful").
I recently modified the scripts involved to allow them to also be used
to gather per-NIC statistics (via invocation of "netstat -nibf inet").
I'm about to implement that change over the weekend, so it occurred to
me that this might be a good time to add some more sysctl(1) OIDs.
So I'm asking for suggestions -- ideally, for OIDs that are fairly
easily parseable. (I started out limited to OIDs that were
presented as a single numeric value per line, then figured out how to
handle kern.cp_time (which is an ordered quintuple); later I figured out
how to cope with vm.loadavg (which is an ordered triplet ... surrounded by
curly braces). I don't currently have logic to cope with anything more
complicated than those.)
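To make the three shapes concrete (a single number, the kern.cp_time quintuple, and the brace-wrapped vm.loadavg triplet), here is a small Python sketch of one way to parse them uniformly. It assumes the usual "name: value" form that sysctl(1) prints by default; the function name is hypothetical, the sample values in the comments are made up, and this is not the logic in my actual scripts.

```python
def parse_sysctl_line(line):
    """Split one 'name: value' line into (name, [numbers]).

    Handles a single numeric value, a space-separated tuple such as
    kern.cp_time, and a brace-wrapped tuple such as vm.loadavg.
    Anything non-numeric raises ValueError.
    """
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError("no ':' separator in line: %r" % line)
    # Drop the curly braces that surround vm.loadavg, then split on whitespace.
    tokens = value.replace("{", " ").replace("}", " ").split()
    if not tokens:
        raise ValueError("no value in line: %r" % line)
    return name.strip(), [float(tok) for tok in tokens]


# Illustrative inputs (values invented for the example):
#   "kern.openfiles: 153"
#   "kern.cp_time: 1200 30 450 12 98000"
#   "vm.loadavg: { 0.25 0.30 0.28 }"
```

Returning a list in every case means the code that feeds the RRD only has to care about how many values an OID yields, not how they were punctuated.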
Here's a list of the OIDs I'm currently using:
I admit that I don't know what several of those actually mean: I figured
I'd capture what I can, then try to make sense of it. It's very easy to
ignore data that I've captured, but don't need; it's a little harder to take
appropriate corrective action if I determine that there was some
information I should have captured, but didn't. :-}
Still, if something's in there that's just silly, I wouldn't mind knowing
about it. :-)
David H. Wolfskill david at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.