bsnmpd returns incorrect hrProcessorLoad values

Wed Feb 3 18:50:35 UTC 2010

Mikolaj Golub wrote:
> On Fri, 29 Jan 2010 12:37:52 +0100 Gustau Pérez wrote:
>
>   
>>   Hi,
>>
>>   I'm using cacti to monitor some servers running FreeBSD. I was using 7.2
>> with SCHED_4BSD. With this configuration, bsnmpd+bsnmp-ucd was
>> returning correct values for the cores' load.
>>
>>    I recently updated the servers (via csup) to RELENG_8 and bsnmpd is
>> returning negative values for the cores' load. If I try something like
>> this on a 4-core system:
>>
>>               snmpwalk -v 2c -c community server .1.3.6.1.2.1.25.3.3.1
>>
>>    what I get is :
>>
>>         .1.3.6.1.2.1.25.3.3.1.1.6 = OID: .0.0
>>         .1.3.6.1.2.1.25.3.3.1.1.10 = OID: .0.0
>>         .1.3.6.1.2.1.25.3.3.1.1.14 = OID: .0.0
>>         .1.3.6.1.2.1.25.3.3.1.1.18 = OID: .0.0
>>         .1.3.6.1.2.1.25.3.3.1.2.6 = INTEGER: -182
>>         .1.3.6.1.2.1.25.3.3.1.2.10 = INTEGER: -182
>>         .1.3.6.1.2.1.25.3.3.1.2.14 = INTEGER: -182
>>         .1.3.6.1.2.1.25.3.3.1.2.18 = INTEGER: -182
>>
>>   I tried an old bsnmp-ucd (0.2.1, works fine on a 7.2 system) on an
>> 8.0 system. Same wrong results. And it seems bsnmpd in /usr/src/contrib
>> has not changed between 7.2 and 8.0.
>>
>>   Any ideas? I'm not an expert, but with tcpdump I see different
>> results. Against an old 7.2 system, the field for each core's load
>> gives the right value. Against an 8.0 system, those fields show
>> (in hex) values like fd 4b. What I don't know is how bsnmp-ucd retrieves
>> those values and how it constructs the UDP response packet.
>>     
>
> bsnmp-ucd has nothing to do with HOST-RESOURCES-MIB. These MIBs are provided
> by the snmp_hostres(3) module (/usr/lib/snmp_hostres.so). So something is wrong
> there (I suppose it is not in sync with some recent changes in the kernel or
> libkvm).
>
>   
    You are right. I checked
usr.sbin/bsnmpd/modules/snmp_hostres/hostres_processor_tbl.c, and I think
the problem has something to do with the processor_getpcpu function
(line 122). The code is:

>         if (ccpu == 0 || fscale == 0)
>                 return (0.0);
>  
> #define fxtofl(fixpt) ((double)(fixpt) / fscale)
>         return (100.0 * fxtofl(ki_p->ki_pctcpu) /
>             (1.0 - exp(ki_p->ki_swtime * log(fxtofl(ccpu)))));

   On a 4-core SCHED_ULE system I checked, and ccpu is always 0
(sysctl kern.ccpu gives 0 too), so this routine always returns 0.0. That
makes the save_sample routine fill e->samples[#cpu] with 100. If I
comment out the ccpu == 0 check (a local change to the code, I know),
I see strange values.

   With some printfs, I see that the value returned when bsnmpd starts is
98~99, but then it goes up to 350~400 (strange). I added more printfs
and saw that when the daemon starts it returns 98~99 for each processor
and ki_pctcpu is 2026 (in my case). The next time bsnmpd refreshes its
values it returns wrong values, and ki_pctcpu is about four times
larger. So the function returns nearly 400% idle time for each
processor...

  So I checked with SCHED_4BSD on an 8-core system. Same behaviour,
but this time ki_pctcpu increased about eight times.
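
   The numbers above are consistent with plain fixed-point scaling.
A small sketch of that arithmetic follows; it assumes FSCALE is
1 << 11 = 2048 (please check sys/param.h on your system), and
ASSUMED_FSCALE, fixpt_to_percent and show are hypothetical names used
only for illustration. It is arithmetic on the values I printed, not a
diagnosis of where the factor comes from.

```c
#include <stdio.h>

/* Assumption: FSCALE is 1 << 11 = 2048; verify in sys/param.h. */
#define ASSUMED_FSCALE 2048

/* Convert a fixed-point fraction such as ki_pctcpu to a percentage. */
double
fixpt_to_percent(unsigned int fixpt, int fscale)
{
	return (100.0 * (double)fixpt / fscale);
}

/* Print one raw value next to its percentage, for quick experiments. */
void
show(const char *label, unsigned int fixpt)
{
	printf("%-24s ki_pctcpu=%5u -> %.1f%%\n", label, fixpt,
	    fixpt_to_percent(fixpt, ASSUMED_FSCALE));
}
```

   Under that assumption, 2026 works out to about 98.9%, 4 * 2026 to
about 395.7% and 8 * 2026 to about 791.4%, which fits the 98~99 at
startup and the nearly 400% I see later on the 4-core box.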

   Now I'm stuck here. I think the kinfo_proc info is obtained by
using kvm_getprocs. Do you have any idea why it returns those values?

   Regards,

   Gus


-- 
PGP KEY : http://www-entel.upc.edu/gus/gus.asc