Using sysctl(1) to gather resource consumption data

Brian Scott bscott at bunyatech.com.au
Sun Sep 14 09:18:38 UTC 2008


David Wolfskill wrote:
> At $work, I've been trying to gather information on "interesting
> patterns" of resource consumption during moderately long-running (5 - 8
> hour) tasks; the hosts in question usually run FreeBSD 6.2, though
> there's an occasional 6.x that's more recent, as well as a bit of
> 7-STABLE.
> 
> I wanted to have a low impact on the system being measured (of course),
> and I was unwilling to require that a system to be measured have any
> software installed on it other than base FreeBSD.  (Yes, that means I
> didn't assume Perl, though in practice each host in this environment has it.)
> 
> I also wanted the data to be transferred reasonably securely, even if
> part of that transit was over facilities over which I had no control.
> (Some of the machines being measured happen to be in a continent other
> than where I am.)
> 
> So I cobbled up a Perl script to run on a data-gathering machine (that
> one was mine, so I could require that it had any software I wanted on
> it); it acts (if you will) as a "shepherd," watching over child
> processes, one of which is created for each host to be measured.
> 
> A given child process copies over a shell script to the remote machine,
> then redirects STDOUT to append to a file on the data-gathering machine,
> and exec()s ssh(1), telling it to run the shell script on the remote
> machine.
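
In shell terms, each per-host child amounts to something like the sketch
below; the script name and paths are illustrative assumptions, not details
from the original setup:

    #!/bin/sh
    # Hypothetical equivalent of one per-host child: copy the collector
    # script to the remote host, then exec ssh(1) with stdout appended
    # to the local per-host data file.  gather.sh and /var/data are
    # assumed names.
    host=$1
    outfile="/var/data/${host}.out"

    scp -q gather.sh "${host}:/tmp/gather.sh"

    # exec replaces this process with ssh, as the Perl child does;
    # everything the remote script prints lands in the local file.
    exec ssh "$host" 'sh /tmp/gather.sh' >> "$outfile"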
> 
> The shell script fabricates a string (depending on the arguments with
> which it was invoked), then sits in a loop:
> 
> * eval the string
> * sleep for the amount of time remaining
> 
> indefinitely.  (In practice, the usual nominal time between successive
> eval()s is 5 minutes.  I have recently been doing some experiments at a
> 10-second interval.)
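
A minimal sketch of that remote loop in plain /bin/sh, assuming a
300-second interval and a made-up command string (the real script
fabricates the string from its arguments):

    #!/bin/sh
    # Sketch of the remote collector loop; the interval and command
    # string are illustrative.  Needs nothing beyond base FreeBSD.
    interval=${1:-300}
    cmd='sysctl -n kern.cp_time vm.loadavg'

    while :; do
        start=$(date +%s)
        eval "$cmd"
        # Sleep only for the time remaining in the interval, so the
        # sampling cadence stays roughly fixed no matter how long the
        # eval itself took.
        elapsed=$(( $(date +%s) - start ))
        [ "$elapsed" -lt "$interval" ] && sleep $(( interval - elapsed ))
    done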
> 
> Periodically, back on the data-gathering machine, a couple of different
> things happen:
> 
> * The "shepherd" script wakes up and checks the mtime on the file for
>   each per-host process (to see if it's been updated "sufficiently
>   recently").  Acttually, it first checks the file that lists the hosts
>   to watch; if its mtime has changed, it's re-read, and the list of
>   hosts is modified as appropriate.  Anyway, if a given per-host file is
>   "too old," the corresponding child process is killed.  The the
>   script runs through the list of hosts that should be checked,
>   creating a per-host process for each one for which that's necessary.
> 
>   There's a fair amount of detail I'm eliding (such as limited
>   exponential backoff for unresponsive hosts).
> 
>   In practice, this runs every 2 minutes at the moment.
> 
> * There's a cron(8)-initiated make(1) process that runs, reading the
>   files created by the per-host processes and writing to a corresponding
>   RRD.  (I cobbled up a Perl script to do this; a sketch of the update
>   step also follows below.)
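
Reduced to its essentials, the shepherd's staleness check looks something
like this; the real one is Perl and also handles the host-list reload and
exponential backoff, and the paths, pidfile layout, and 900-second
threshold here are all assumptions:

    #!/bin/sh
    # Simplified staleness check: kill the child behind any per-host
    # file that hasn't been updated recently enough.  The pidfile
    # layout and threshold are invented for illustration.
    now=$(date +%s)
    max_age=900

    for f in /var/data/*.out; do
        mtime=$(stat -f %m "$f")      # FreeBSD stat(1): mtime in seconds
        if [ $(( now - mtime )) -gt "$max_age" ]; then
            host=$(basename "$f" .out)
            kill "$(cat "/var/run/gather-${host}.pid")" 2>/dev/null
        fi
    done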
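
And the RRD update step, similarly reduced (the loader described above is
a Perl script driven by make(1); the one-sample-per-line "timestamp value"
format is assumed, and rrdtool needs to exist only on the gathering
machine):

    #!/bin/sh
    # Feed the newest sample from each per-host file into its RRD.
    # The "timestamp value" line format is an assumption made for
    # this sketch.
    for f in /var/data/*.out; do
        host=$(basename "$f" .out)
        tail -1 "$f" | while read ts value; do
            rrdtool update "/var/rrd/${host}.rrd" "${ts}:${value}"
        done
    done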
> 
> While I tried to externalize a fair amount of this -- e.g., the list of
> sysctl(1) OIDs to use is read from an external file -- it turns out that
> certain types of change are a bit ... painful.  In particular, adding a
> new "data source" to the RRD qualifies (as "painful").
> 
> I recently modified the scripts involved to allow them to also be used
> to gather per-NIC statistics (via invocation of "netstat -nibf inet").
> 
> I'm about to implement that change over the weekend, so it occurred to
> me that this might be a good time to add some more sysctl(1) OIDs.
> 
> So I'm asking for suggestions -- ideally, for OIDs that are fairly
> easily parseable.  (I started out limited to OIDs that were
> presented as a single numeric value per line, then figured out how to
> handle kern.cp_time (which is an ordered quintuple); later I figured out
> how to cope with vm.loadavg (which is an ordered triplet ... surrounded by
> curly braces).  I don't currently have logic to cope with anything more
> complicated than those.)
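
For reference, both of those formats parse with nothing beyond base
awk(1); the field labels below are the conventional meanings on
FreeBSD 6/7:

    #!/bin/sh
    # kern.cp_time: five counters on one line (user nice sys intr idle).
    sysctl -n kern.cp_time | \
        awk '{ printf "user=%s nice=%s sys=%s intr=%s idle=%s\n", $1,$2,$3,$4,$5 }'

    # vm.loadavg: an ordered triplet in braces, e.g. "{ 0.15 0.10 0.05 }".
    sysctl -n vm.loadavg | \
        awk '{ gsub(/[{}]/, ""); printf "load1=%s load5=%s load15=%s\n", $1,$2,$3 }'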
> 
> Here's a list of the OIDs I'm currently using:
> 
-------- Snip ---------
> 
> 
> I admit that I don't know what several of those actually mean: I figured
> I'd capture what I can, then try to make sense of it.  It's very easy to
> ignore data that I've captured, but don't need; it's a little harder to take
> appropriate corrective action if I determine that there was some
> information I should have captured, but didn't.  :-}
> 
> Still, if something's in there that's just silly, I wouldn't mind knowing
> about it.  :-)
> 
> Thanks!
> 
> Peace,
> david

You may be interested in some software that I've written over the last 5
years or so called FreePDB. It's written in Perl and requires an XML
library to be installed. That sort of breaks your first requirement, but
I'll describe it anyway.

I schedule a program to run regularly with cron. The program reads some
configuration data from an XML file telling it what needs to be
collected (and what mechanisms to use to collect it), issues the
necessary commands (sysctl is definitely one of the possibilities), and
writes rows into one or more text files.

In your case, I expect you would transfer the text files over to a 
central system (the logger just creates a new file if someone steals the 
old one), where another program loads the text files into database tables.

Graphing support includes the option of extracting data into an RRD
file, as well as driving gnuplot or some Perl GD::Graph stuff, or even
hooking Excel up via ODBC from a Windows box and using the graph wizard.

Anyway, I just thought I'd mention it since it might save you some work.

It can be found at freepdb.sourceforge.net. It definitely runs on
FreeBSD, including 7.0 (I recently upgraded a 4.7 machine, but before
that it ran there quite nicely).

I'm just cleaning up a new release that includes choice of database 
systems and a few performance/usability improvements. As they say in the 
classics, "If you don't see what you need, just ask".

Regards,

Brian

