ports/110498: net-snmp proc monitoring randomly fails

Mike Andrews mandrews at bit0.com
Mon Mar 19 03:40:05 UTC 2007


>Number:         110498
>Category:       ports
>Synopsis:       net-snmp proc monitoring randomly fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 19 03:40:04 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Mike Andrews
>Release:        FreeBSD 6.2-RELEASE-p2 amd64
>Organization:
Fark.com LLC
>Environment:
System: FreeBSD mindcrime.bit0.com 6.2-RELEASE-p2 FreeBSD 6.2-RELEASE-p2 #19: Sun Mar 4 15:16:21 EST 2007 mandrews at mindcrime.bit0.com:/usr/obj/usr/src/sys/MINDCRIME amd64


>Description:

With net-snmp 5.3.1 on FreeBSD 6.2-RELEASE (i386 or amd64), the "proc"
monitoring facility randomly raises alarms that certain processes are
not running (or that not enough are running) when in fact they are.
The alarms start without warning and then clear themselves several
hours later.

If you have Nagios checking these alarms, it can be highly annoying. :)

I'm fairly certain net-snmp 5.2.x and earlier don't have this problem
(I've been using them for years).

The problem is that net-snmp uses /bin/ps to get a list of processes
and writes the output of ps to /var/net-snmp/.snmp-exec-cache.  The
file is truncated at 16000 bytes (MAXCACHESIZE, defined as 200*80).
That is far too small for systems with many hundreds of processes
running at a time.
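
To illustrate the failure mode described above, here is a minimal sketch
in C.  It is not net-snmp's code: count_in_capped() is a hypothetical
helper that counts matching lines only within the first `cap` bytes of a
ps-style listing, which is roughly what happens when the cache file is
cut off at a fixed size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not net-snmp code: count the lines within the
 * first `cap` bytes of `text` that contain `name`.  Bytes past `cap`
 * are ignored, imitating a cache file cut off at a fixed size. */
static int
count_in_capped(const char *text, size_t cap, const char *name)
{
    size_t len, namelen, i, j, start = 0;
    int count = 0;

    assert(text != NULL && name != NULL);
    len = strlen(text);
    namelen = strlen(name);
    if (len > cap)
        len = cap;                  /* everything after the cap is lost */
    if (namelen == 0)
        return 0;

    for (i = 0; i <= len; i++) {
        if (i == len || text[i] == '\n') {
            /* current line is text[start .. i) */
            size_t linelen = i - start;
            for (j = 0; j + namelen <= linelen; j++) {
                if (memcmp(text + start + j, name, namelen) == 0) {
                    count++;        /* count each line at most once */
                    break;
                }
            }
            start = i + 1;
        }
    }
    return count;
}
```

Any process whose ps line falls past the cap is counted as 0, and a
process with many instances is undercounted.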

Maybe previous versions (5.2.x and earlier) of net-snmp used something
other than /bin/ps to get the process list?  I don't have a procfs
filesystem mounted (I did try mounting one to see if it would help, and
it didn't).

>How-To-Repeat:

bourbon# grep proc /usr/local/share/snmp/snmpd.conf
proc syslogd 1 1
proc httpd
proc ntpd 1 1
proc smartd
proc clamd
proc freshclam
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# snmpwalk -v 2c -c ___ localhost .1.3.6.1.4.1.2021.2.1
UCD-SNMP-MIB::prIndex.1 = INTEGER: 1
UCD-SNMP-MIB::prIndex.2 = INTEGER: 2
UCD-SNMP-MIB::prIndex.3 = INTEGER: 3
UCD-SNMP-MIB::prIndex.4 = INTEGER: 4
UCD-SNMP-MIB::prIndex.5 = INTEGER: 5
UCD-SNMP-MIB::prIndex.6 = INTEGER: 6
UCD-SNMP-MIB::prNames.1 = STRING: syslogd
UCD-SNMP-MIB::prNames.2 = STRING: httpd
UCD-SNMP-MIB::prNames.3 = STRING: ntpd
UCD-SNMP-MIB::prNames.4 = STRING: smartd
UCD-SNMP-MIB::prNames.5 = STRING: clamd
UCD-SNMP-MIB::prNames.6 = STRING: freshclam
UCD-SNMP-MIB::prMin.1 = INTEGER: 1
UCD-SNMP-MIB::prMin.2 = INTEGER: 0
UCD-SNMP-MIB::prMin.3 = INTEGER: 1
UCD-SNMP-MIB::prMin.4 = INTEGER: 0
UCD-SNMP-MIB::prMin.5 = INTEGER: 0
UCD-SNMP-MIB::prMin.6 = INTEGER: 0
UCD-SNMP-MIB::prMax.1 = INTEGER: 1
UCD-SNMP-MIB::prMax.2 = INTEGER: 0
UCD-SNMP-MIB::prMax.3 = INTEGER: 1
UCD-SNMP-MIB::prMax.4 = INTEGER: 0
UCD-SNMP-MIB::prMax.5 = INTEGER: 0
UCD-SNMP-MIB::prMax.6 = INTEGER: 0
UCD-SNMP-MIB::prCount.1 = INTEGER: 1
UCD-SNMP-MIB::prCount.2 = INTEGER: 345
UCD-SNMP-MIB::prCount.3 = INTEGER: 1
UCD-SNMP-MIB::prCount.4 = INTEGER: 1
UCD-SNMP-MIB::prCount.5 = INTEGER: 0
UCD-SNMP-MIB::prCount.6 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.1 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.2 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.3 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.4 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.5 = INTEGER: 1
UCD-SNMP-MIB::prErrorFlag.6 = INTEGER: 1
UCD-SNMP-MIB::prErrMessage.1 = STRING:
UCD-SNMP-MIB::prErrMessage.2 = STRING:
UCD-SNMP-MIB::prErrMessage.3 = STRING:
UCD-SNMP-MIB::prErrMessage.4 = STRING:
UCD-SNMP-MIB::prErrMessage.5 = STRING: No clamd process running.
UCD-SNMP-MIB::prErrMessage.6 = STRING: No freshclam process running.
UCD-SNMP-MIB::prErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.3 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.4 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.5 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.6 = INTEGER: 0
UCD-SNMP-MIB::prErrFixCmd.1 = STRING:
UCD-SNMP-MIB::prErrFixCmd.2 = STRING:
UCD-SNMP-MIB::prErrFixCmd.3 = STRING:
UCD-SNMP-MIB::prErrFixCmd.4 = STRING:
UCD-SNMP-MIB::prErrFixCmd.5 = STRING:
UCD-SNMP-MIB::prErrFixCmd.6 = STRING:
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# ps -acx | grep httpd | wc
     744    3720   23808

(ps shows 744 httpd processes, but prCount.2 reports only 345.)   ;-)
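
As a rough check on the sizes involved: the wc output above works out to
23808 / 744 = 32 bytes per httpd line in `ps -acx` format.  Assuming
about 32 bytes per line (an assumption; the ps invocation net-snmp
actually uses may produce wider lines), the sketch below estimates how
many lines fit in the cache with the current MAXCACHESIZE and with the
value proposed under >Fix:.  lines_that_fit() and the OLD_/NEW_ macro
names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define OLD_MAXCACHESIZE (200*80)    /* 16000 bytes, current value */
#define NEW_MAXCACHESIZE (1500*80)   /* 120000 bytes, value proposed in the patch */

/* Hypothetical helper: number of whole lines of avg_line_bytes each
 * that fit in a cache of cache_bytes. */
static size_t
lines_that_fit(size_t cache_bytes, size_t avg_line_bytes)
{
    assert(avg_line_bytes > 0);
    return cache_bytes / avg_line_bytes;
}
```

At 32 bytes per line, lines_that_fit(OLD_MAXCACHESIZE, 32) is 500 and
lines_that_fit(NEW_MAXCACHESIZE, 32) is 3750.  A box with 744 httpd
processes plus everything else does not fit in the former but fits
easily in the latter.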

>Fix:

Try this patch.  Only the second half (the change to
include/net-snmp/net-snmp-config.h.in) seems to actually fix the problem:


*** acconfig.h.orig     Fri May 26 12:36:06 2006
--- acconfig.h  Sun Mar 18 22:24:27 2007
***************
*** 488,494 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 488,494 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

*** include/net-snmp/net-snmp-config.h.in.orig  Fri May 26 12:36:06 2006
--- include/net-snmp/net-snmp-config.h.in       Sun Mar 18 22:54:13 2007
***************
*** 1334,1340 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 1334,1340 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

>Release-Note:
>Audit-Trail:
>Unformatted:


