From david at catwhisker.org  Sat Sep 13 00:15:08 2008
From: david at catwhisker.org (David Wolfskill)
Date: Sat Sep 13 00:15:16 2008
Subject: Using sysctl(1) to gather resource consumption data
Message-ID: <20080912234822.GK11991@bunrab.catwhisker.org>

At $work, I've been trying to gather information on "interesting
patterns" of resource consumption during moderately long-running (5 - 8
hour) tasks; the hosts in question usually run FreeBSD 6.2, though
there's an occasional 6.x that's more recent, as well as a bit of
7-STABLE.

I wanted to have a low impact on the system being measured (of course),
and I was unwilling to require that a system to be measured had any
software installed on it other than base FreeBSD. (Yes, that means I
didn't assume Perl, though in practice in this environment, each does.)

I also wanted the data to be transferred reasonably securely, even if
part of that transit was over facilities over which I had no control.
(Some of the machines being measured happen to be on a continent other
than where I am.)

So I cobbled up a Perl script to run on a data-gathering machine (that
one was mine, so I could require that it had any software I wanted on
it); it acts (if you will) as a "shepherd," watching over child
processes, one of which is created for each host to be measured.

A given child process copies over a shell script to the remote machine,
then redirects STDOUT to append to a file on the data-gathering machine,
and exec()s ssh(1), telling it to run the shell script on the remote
machine.

The shell script fabricates a string (depending on the arguments with
which it was invoked), then sits in a loop:

* eval the string
* sleep for the amount of time remaining

indefinitely. (In practice, the usual nominal time between successive
eval()s is 5 minutes. I have recently been doing some experiments at a
10-second interval.)
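The "sleep for the amount of time remaining" step is what keeps samples on a fixed cadence even though each eval takes a variable amount of time. Below is a minimal sketch of that arithmetic, written in Python only for readability (the remote side described above is a plain shell script, since nothing beyond base FreeBSD can be assumed there). The function names and the 300-second default are illustrative assumptions, not taken from the actual script:

```python
import time


def seconds_until_next_sample(interval, start, now):
    """Seconds to sleep so samples stay aligned to start + k * interval.

    Assumes interval > 0 and now >= start, all in seconds. If one
    collection overruns the interval, this skips ahead to the next
    boundary rather than firing back-to-back samples to "catch up".
    """
    elapsed = now - start
    return interval - (elapsed % interval)


def sample_forever(collect, interval=300.0):
    """Hypothetical driver loop: collect, then sleep out the interval."""
    # time.monotonic() is unaffected by wall-clock steps (e.g. from ntpd),
    # so the sampling cadence does not jump when the clock is adjusted.
    start = time.monotonic()
    while True:
        collect()  # stands in for "eval the string"
        time.sleep(seconds_until_next_sample(interval, start, time.monotonic()))
```

For example, with a 300-second interval, a collection that finishes 12 seconds into the cycle leads to a 288-second sleep. Sleeping a fixed 300 seconds instead would push every later sample back by the collection time, so the sample times would slowly drift.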
Periodically, back on the data-gathering machine, a couple of different
things happen:

* The "shepherd" script wakes up and checks the mtime on the file for
  each per-host process (to see if it's been updated "sufficiently
  recently"). Actually, it first checks the file that lists the hosts
  to watch; if its mtime has changed, it's re-read, and the list of
  hosts is modified as appropriate. Anyway, if a given per-host file is
  "too old," the corresponding child process is killed. Then the
  script runs through the list of hosts that should be checked,
  creating a per-host process for each one for which that's necessary.

  There's a fair amount of detail I'm eliding (such as limited
  exponential backoff for unresponsive hosts).

  In practice, this runs every 2 minutes at the moment.

* There's a cron(8)-initiated make(1) process that runs, reading the
  files created by the per-host processes and writing to a corresponding
  RRD. (I cobbled up a Perl script to do this.)

While I tried to externalize a fair amount of this -- e.g., the list of
sysctl(1) OIDs to use is read from an external file -- it turns out that
certain types of change are a bit ... painful. In particular, adding a
new "data source" to the RRD qualifies (as "painful").

I recently modified the scripts involved to allow them to also be used
to gather per-NIC statistics (via invocation of "netstat -nibf inet").

I'm about to implement that change over the weekend, so it occurred to
me that this might be a good time to add some more sysctl(1) OIDs.

So I'm asking for suggestions -- ideally, for OIDs that are fairly
easily parseable. (I started out limited to OIDs that were presented as
a single numeric value per line, then figured out how to handle
kern.cp_time (which is an ordered quintuple); later I figured out how
to cope with vm.loadavg (which is an ordered triplet ... surrounded by
curly braces). I don't currently have logic to cope with anything more
complicated than those.)
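Since the question is partly about which OIDs are easy to parse, here is a small sketch of one way to normalize the three shapes mentioned (a single number, kern.cp_time's five space-separated counters, and vm.loadavg's brace-wrapped triplet) into one named value per data source. It is Python for illustration only and assumes the default "oid: value" output of sysctl(8), i.e. without -n. The labels for the tuple members (user/nice/sys/intr/idle, and 1/5/15-minute averages) are assumptions about field order worth checking against the target FreeBSD release; the helper names are hypothetical:

```python
# Hypothetical labels for multi-valued OIDs; any other multi-valued OID
# falls back to numeric suffixes (.0, .1, ...).
TUPLE_LABELS = {
    "kern.cp_time": ["user", "nice", "sys", "intr", "idle"],
    "vm.loadavg": ["1min", "5min", "15min"],
}


def parse_number(token):
    """Parse an integer or decimal field from sysctl output."""
    return float(token) if "." in token else int(token)


def parse_sysctl_line(line):
    """Split one 'oid: value ...' line into (oid, [numbers]).

    Curly braces (as in vm.loadavg) are treated as whitespace.
    Raises ValueError for lines without ':' or with non-numeric values.
    """
    oid, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"not an 'oid: value' line: {line!r}")
    tokens = rest.replace("{", " ").replace("}", " ").split()
    if not tokens:
        raise ValueError(f"no value for {oid.strip()!r}")
    return oid.strip(), [parse_number(t) for t in tokens]


def to_data_sources(line):
    """Map one sysctl output line to {data_source_name: value}."""
    oid, values = parse_sysctl_line(line)
    if len(values) == 1:
        return {oid: values[0]}
    labels = TUPLE_LABELS.get(oid, [str(i) for i in range(len(values))])
    if len(labels) != len(values):
        raise ValueError(f"{oid}: expected {len(labels)} values, got {len(values)}")
    return {f"{oid}.{label}": v for label, v in zip(labels, values)}
```

Flattening each tuple into separately named values yields one number per data source, which fits a store such as an RRD; a line that doesn't parse raises an error instead of silently recording a bogus value.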
Here's a list of the OIDs I'm currently using:

  debug.dir_entry
  debug.direct_blk_ptrs
  debug.numcache
  debug.numcachehv
  debug.numneg
  debug.to_avg_depth
  debug.to_avg_gcalls
  debug.to_avg_mpcalls
  hw.usermem
  kern.cp_time
  kern.ipc.max_datalen
  kern.ipc.max_hdr
  kern.ipc.maxsockbuf
  kern.ipc.msgmax
  kern.ipc.msgmnb
  kern.ipc.msgmni
  kern.ipc.msgtql
  kern.ipc.nmbclusters
  kern.ipc.nmbjumbo16
  kern.ipc.nmbjumbo9
  kern.ipc.nmbjumbop
  kern.ipc.nsfbufs
  kern.ipc.nsfbufspeak
  kern.ipc.nsfbufsused
  kern.ipc.numopensockets
  kern.ipc.pipekva
  kern.ipc.pipes
  kern.kstack_pages
  kern.malloc_count
  kern.maxfiles
  kern.maxusers
  kern.nselcoll
  kern.openfiles
  net.isr.count
  net.isr.deferred
  net.isr.directed
  net.isr.drop
  net.isr.queued
  vfs.bufdefragcnt
  vfs.buffreekvacnt
  vfs.bufmallocspace
  vfs.bufreusecnt
  vfs.bufspace
  vfs.cache.dotdothits
  vfs.cache.dothits
  vfs.cache.numcache
  vfs.cache.numcalls
  vfs.cache.numchecks
  vfs.cache.numfullpathcalls
  vfs.cache.numfullpathfail1
  vfs.cache.numfullpathfail2
  vfs.cache.numfullpathfail4
  vfs.cache.numfullpathfound
  vfs.cache.nummiss
  vfs.cache.nummisszap
  vfs.cache.numneg
  vfs.cache.numneghits
  vfs.cache.numnegzaps
  vfs.cache.numposhits
  vfs.cache.numposzaps
  vfs.dirtybufferflushes
  vfs.dirtybufthresh
  vfs.flushwithdeps
  vfs.freevnodes
  vfs.getnewbufcalls
  vfs.getnewbufrestarts
  vfs.hibufspace
  vfs.hidirtybuffers
  vfs.hirunningspace
  vfs.lobufspace
  vfs.lodirtybuffers
  vfs.lorunningspace
  vfs.maxbufspace
  vfs.maxmallocbufspace
  vfs.nfs.downdelayinitial
  vfs.nfs.downdelayinterval
  vfs.nfs.realign_count
  vfs.nfs.realign_test
  vfs.nfs.reconnects
  vfs.nfs4.access_cache_timeout
  vfs.numdirtybuffers
  vfs.numfreebuffers
  vfs.numvnodes
  vfs.read_max
  vfs.reassignbufcalls
  vfs.wantfreevnodes
  vfs.write_behind
  vm.loadavg
  vm.stats.misc.cnt_prezero
  vm.stats.misc.zero_page_count
  vm.stats.sys.v_intr
  vm.stats.sys.v_soft
  vm.stats.sys.v_swtch
  vm.stats.sys.v_syscall
  vm.stats.sys.v_trap
  vm.stats.vm.v_active_count
  vm.stats.vm.v_cow_faults
  vm.stats.vm.v_cow_optim
  vm.stats.vm.v_forkpages
  vm.stats.vm.v_forks
  vm.stats.vm.v_free_count
  vm.stats.vm.v_inactive_count
  vm.stats.vm.v_intrans
  vm.stats.vm.v_kthreads
  vm.stats.vm.v_ozfod
  vm.stats.vm.v_pdpages
  vm.stats.vm.v_pdwakeups
  vm.stats.vm.v_pfree
  vm.stats.vm.v_reactivated
  vm.stats.vm.v_rforks
  vm.stats.vm.v_swapin
  vm.stats.vm.v_swapout
  vm.stats.vm.v_swappgsin
  vm.stats.vm.v_swappgsout
  vm.stats.vm.v_tfree
  vm.stats.vm.v_vforkpages
  vm.stats.vm.v_vforks
  vm.stats.vm.v_vm_faults
  vm.stats.vm.v_vnodein
  vm.stats.vm.v_vnodeout
  vm.stats.vm.v_vnodepgsin
  vm.stats.vm.v_vnodepgsout
  vm.stats.vm.v_wire_count
  vm.stats.vm.v_zfod
  vm.swap_idle_threshold1
  vm.swap_idle_threshold2

I admit that I don't know what several of those actually mean: I figured
I'd capture what I can, then try to make sense of it. It's very easy to
ignore data that I've captured but don't need; it's a little harder to
take appropriate corrective action if I determine that there was some
information I should have captured, but didn't. :-}

Still, if something's in there that's just silly, I wouldn't mind
knowing about it. :-)

Thanks!

Peace,
david
-- 
David H. Wolfskill  david@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-performance/attachments/20080913/8b465c96/attachment.pgp

From bscott at bunyatech.com.au  Sun Sep 14 09:18:38 2008
From: bscott at bunyatech.com.au (Brian Scott)
Date: Sun Sep 14 09:18:45 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080912234822.GK11991@bunrab.catwhisker.org>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
Message-ID: <48CCD68B.8050408@bunyatech.com.au>

David Wolfskill wrote:
> At $work, I've been trying to gather information on "interesting
> patterns" of resource consumption during moderately long-running (5 - 8
> hour) tasks; the hosts in question usually run FreeBSD 6.2, though
> there's an occasional 6.x that's more recent, as well as a bit of
> 7-STABLE.
>
> I wanted to have a low impact on the system being measured (of course),
> and I was unwilling to require that a system to be measured had any
> software installed on it other than base FreeBSD. (Yes, that means I
> didn't assume Perl, though in practice in this environment, each does.)
>
> [...]

You may be interested in some software that I've written over the last
5 years or so called FreePDB. It's written in Perl and requires an XML
library to be installed. This sort of breaks your first requirement,
but I'll describe it anyway.

I schedule a program to run regularly with cron. The program reads some
configuration data from an XML file telling it what needs to be
collected (and what mechanisms to use to collect it), issues the
necessary commands (sysctl is definitely one of the possibilities), and
spits out rows into one or more text files.

In your case, I expect you would transfer the text files over to a
central system (the logger just creates a new file if someone steals the
old one), where another program loads the text files into database
tables.
Graphing support includes the possibility to extract data into an rrd
file, as well as driving gnuplot or some Perl GD::Graph stuff, or even
hooking up Excel with ODBC from a Windows box and using the graph
wizard.

Anyway, I just thought I'd mention it since it might save you some work.
It can be found at freepdb.sourceforge.net. It definitely runs on
FreeBSD, including 7.0 (I recently upgraded a 4.7 machine, but before
that it ran there quite nicely). I'm just cleaning up a new release that
includes a choice of database systems and a few performance/usability
improvements.

As they say in the classics, "If you don't see what you need, just ask".

Regards,

Brian

From numardbsd at gmail.com  Sun Sep 14 11:44:12 2008
From: numardbsd at gmail.com (Norberto Meijome)
Date: Sun Sep 14 11:44:19 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080912234822.GK11991@bunrab.catwhisker.org>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
Message-ID: <20080914211136.1be3550d@ayiin>

On Fri, 12 Sep 2008 16:48:22 -0700
David Wolfskill wrote:

> I wanted to have a low impact on the system being measured (of course),
> and I was unwilling to require that a system to be measured had any
> software installed on it other than base FreeBSD. (Yes, that means I
> didn't assume Perl, though in practice in this environment, each does.)

Out of curiosity, how does bsnmpd compare to your approach with regard
to impact on the system? It is part of 7.0 (not sure about previous
versions), and it is definitely a more standard and cross-platform
approach, with support on the NOC / alerting side of things.

(For what it's worth, I've only used net-snmpd, not bsnmpd.)

B
_________________________
{Beto|Norberto|Numard} Meijome

"Whenever you find that you are on the side of the majority, it is time
to reform."
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse.
You have been Warned.

From david at catwhisker.org  Sun Sep 14 12:45:15 2008
From: david at catwhisker.org (David Wolfskill)
Date: Sun Sep 14 12:45:21 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080914211136.1be3550d@ayiin>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
 <20080914211136.1be3550d@ayiin>
Message-ID: <20080914120749.GN11991@bunrab.catwhisker.org>

On Sun, Sep 14, 2008 at 09:11:36PM +1000, Norberto Meijome wrote:
> ...
> Out of curiosity, how does bsnmpd compare to your approach with regard
> to impact on the system? It is part of 7.0 (not sure about previous
> versions), and it is definitely a more standard and cross-platform
> approach, with support on the NOC / alerting side of things.
>
> (For what it's worth, I've only used net-snmpd, not bsnmpd.)

Understood. As I understand it, an SNMP daemon (whether bsnmpd or
net-snmpd) would require some configuration on the remote host, and I
wasn't willing to require that.

Also, the only times I have used SNMP, it has been using a version that
did not support encryption in any form (as far as I know), and since
some of the transit was over facilities we don't control, I thought it
would be a bit more sensible to use SSH for the transport.

There is a moderate amount of work in setting up the SSH connection in
the first place: the first version of my script actually had the
"shepherd" script establish a new SSH connection to each remote host
every 5 minutes; examining a ktrace of that convinced me that SSH
session creation was not something I wanted to do on a frequent basis
for a mechanism that was intended to be low impact. But keeping that
SSH session around and "squirting" a little over 800 bytes of payload
down the pipe every 5 minutes -- or even every 10 seconds -- shouldn't
be too much impact. (As a colleague pointed out, that's probably less
impact than running top(1) has.)
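To put the "shouldn't be too much impact" point in numbers, here is a back-of-the-envelope sketch of the payload volume. It counts only the roughly 800-byte payload mentioned above and ignores SSH, TCP, and IP framing overhead, so real on-the-wire figures will be somewhat higher. The function name and the 300-host figure are illustrative assumptions:

```python
SECONDS_PER_DAY = 86_400


def payload_bytes_per_day(payload_bytes, interval_s, hosts=1):
    """Daily payload volume for `hosts` hosts, each sending one sample
    of `payload_bytes` every `interval_s` seconds.

    Ignores SSH/TCP/IP overhead, so this is a lower bound on traffic.
    """
    samples_per_day = SECONDS_PER_DAY / interval_s
    return payload_bytes * samples_per_day * hosts


if __name__ == "__main__":
    for interval in (300, 10):
        per_day = payload_bytes_per_day(800, interval)
        print(f"every {interval:>3} s: {per_day:,.0f} bytes/day, "
              f"{per_day / SECONDS_PER_DAY:.1f} bytes/s per host")
```

That works out to 230,400 bytes/day per host at the 5-minute interval and 6,912,000 bytes/day (80 bytes/s) at 10 seconds. Even 300 hosts at the 10-second interval come to about 24,000 bytes/s of payload arriving at the data-gathering machine, which is consistent with "hundreds" of hosts being feasible.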
Granted, this isn't intended for the one "shepherd" script to deal with
thousands of remote hosts -- but I believe that "hundreds" is feasible.

Mind, I'm not especially keen on re-inventing stuff that already works
(or can be reasonably persuaded to work). But in this case, running an
SNMP daemon seemed to fail to meet my (admittedly, somewhat
self-imposed) requirements.

Peace,
david

From valerio.daelli at gmail.com  Sun Sep 14 19:08:49 2008
From: valerio.daelli at gmail.com (Valerio Daelli)
Date: Sun Sep 14 19:08:55 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080914120749.GN11991@bunrab.catwhisker.org>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
 <20080914211136.1be3550d@ayiin>
 <20080914120749.GN11991@bunrab.catwhisker.org>
Message-ID: <27dbfc8c0809141147i5404c1dbp3064c94e8b9d7636@mail.gmail.com>

On Sun, Sep 14, 2008 at 2:07 PM, David Wolfskill wrote:
> On Sun, Sep 14, 2008 at 09:11:36PM +1000, Norberto Meijome wrote:
>> ...

Hi,

I was thinking about extending net-snmp to gather some resource
consumption data (read-only MIBs). I'll post a PR as soon as I have a
working patch.
Valerio

From numardbsd at gmail.com  Wed Sep 17 14:39:34 2008
From: numardbsd at gmail.com (Norberto Meijome)
Date: Wed Sep 17 14:39:40 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080914120749.GN11991@bunrab.catwhisker.org>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
 <20080914211136.1be3550d@ayiin>
 <20080914120749.GN11991@bunrab.catwhisker.org>
Message-ID: <20080918003927.56387d63@ayiin>

On Sun, 14 Sep 2008 05:07:49 -0700
David Wolfskill wrote:

> On Sun, Sep 14, 2008 at 09:11:36PM +1000, Norberto Meijome wrote:
> > ...
> > Out of curiosity, how does bsnmpd compare to your approach with regard
> > to impact on the system? It is part of 7.0 (not sure about previous
> > versions), and it is definitely a more standard and cross-platform
> > approach, with support on the NOC / alerting side of things.
> >
> > (For what it's worth, I've only used net-snmpd, not bsnmpd.)
>
> Understood. As I understand it, an SNMP daemon (whether bsnmpd or
> net-snmpd) would require some configuration on the remote host, and I
> wasn't willing to require that.

Fair enough. I don't know about the default config of bsnmpd, but
"default" in net-snmpd, IIRC, means access with the "public" community,
which is pretty open. Not sure if there are MIBs for the information
you need, though.

> Also, the only times I have used SNMP, it has been using a version that
> did not support encryption in any form (as far as I know), and since
> some of the transit was over facilities we don't control, I thought it
> would be a bit more sensible to use SSH for the transport.

But do you use encryption with your current system?

[...]
> Mind, I'm not especially keen on re-inventing stuff that already works
> (or can be reasonably persuaded to work). But in this case, running an
> SNMP daemon seemed to fail to meet my (admittedly, somewhat
> self-imposed) requirements.

Hey, your requirements are yours :) I was just curious to know why SNMP
didn't cut it.
B
_________________________
{Beto|Norberto|Numard} Meijome

"Gravity cannot be blamed for people falling in love."
  Albert Einstein

From david at catwhisker.org  Wed Sep 17 15:29:57 2008
From: david at catwhisker.org (David Wolfskill)
Date: Wed Sep 17 15:30:04 2008
Subject: Using sysctl(1) to gather resource consumption data
In-Reply-To: <20080918003927.56387d63@ayiin>
References: <20080912234822.GK11991@bunrab.catwhisker.org>
 <20080914211136.1be3550d@ayiin>
 <20080914120749.GN11991@bunrab.catwhisker.org>
 <20080918003927.56387d63@ayiin>
Message-ID: <20080917152951.GH11991@bunrab.catwhisker.org>

On Thu, Sep 18, 2008 at 12:39:27AM +1000, Norberto Meijome wrote:
> ...
> > Also, the only times I have used SNMP, it has been using a version that
> > did not support encryption in any form (as far as I know), and since
> > some of the transit was over facilities we don't control, I thought it
> > would be a bit more sensible to use SSH for the transport.
>
> But do you use encryption with your current system?

Since it uses SSH for transport, yes. And it uses authentication, too
(for the same reason).

> [...]
> > Mind, I'm not especially keen on re-inventing stuff that already works
> > (or can be reasonably persuaded to work). But in this case, running an
> > SNMP daemon seemed to fail to meet my (admittedly, somewhat
> > self-imposed) requirements.
>
> Hey, your requirements are yours :) I was just curious to know why SNMP
> didn't cut it.

Fair enough. :-}

Peace,
david