CPU time accounting broken on 8-STABLE machine after a few hours of uptime

Don Lewis truckman at FreeBSD.org
Tue Sep 28 23:45:03 UTC 2010


On 28 Sep, Jeremy Chadwick wrote:

> Still speaking purely about ntpd:
> 
> The above doesn't indicate a single problem.  The deltas shown in both
> delay, offset, and jitter are all 100% legitimate.  A dd (to induce more
> interrupt use) isn't going to exacerbate the problem (depending on your
> system configuration, IRQ setup, local APIC, etc.).
> 
> How about writing a small shell script that runs every minute in a
> cronjob that does vmstat -i >> /some/file.log?  Then when you see calcru
> messages, look around the time frame where vmstat -i was run.  Look for
> high interrupt rates, aside from those associated with cpuX devices.

Looking at the timestamps of things and comparing to my logs, I
discovered that the last instance of ntp instability happened when I was
running "make index" in /usr/ports.  I tried it again with entertaining
results.  After a while, the machine became unresponsive.  I was logged
in over ssh and it stopped echoing keystrokes.  In parallel I was
running a script that echoed the date, the results of "vmstat -i", and
the results of "ntpq -c pe".  The latter showed jitter and offset going
insane.  Eventually "make index" finished and the machine was responsive
again, but the time was way off and ntpd croaked because the necessary
time correction was too large.  Nothing else anomalous showed up in the
logs.  Hmn, about half an hour after ntpd died I started my CPU time
accounting test and two minutes into that test I got a spew of calcru
messages ...

Tue Sep 28 14:52:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64077827        999
irq1: atkbd0                          26          0
irq8: rtc                        8199966        127
irq9: acpi0                           19          0
irq10: ohci0 ehci1+             10356112        161
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96064          1
irq15: nfe0 ata1                   23350          0
Total                           82885524       1293
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  137  128  377    0.195    0.111   0.030

Tue Sep 28 14:53:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64137854        999
irq1: atkbd0                          26          0
irq8: rtc                        8207648        127
irq9: acpi0                           19          0
irq10: ohci0 ehci1+             10360184        161
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96154          1
irq15: nfe0 ata1                   23379          0
Total                           82957424       1293
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u   56  128  377    0.195    0.111 853895.

Tue Sep 28 14:54:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64197881        999
irq1: atkbd0                          26          0
irq8: rtc                        8215329        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10360777        161
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96244          1
irq15: nfe0 ata1                   23405          0
Total                           83025843       1293
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  116  128  377    0.195    0.111 853895.

Tue Sep 28 14:55:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64257907        999
irq1: atkbd0                          26          0
irq8: rtc                        8223011        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10360836        161
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96334          1
irq15: nfe0 ata1                   23424          0
Total                           83093719       1292
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u   48  128  377    0.197  2259195 2091608

Tue Sep 28 14:56:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64317933        999
irq1: atkbd0                          26          0
irq8: rtc                        8230692        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10360857        161
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96424          1
irq15: nfe0 ata1                   23448          0
Total                           83161561       1292
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u  108  128  377    0.197  2259195 2091608

Tue Sep 28 14:57:27 PDT 2010
interrupt                          total       rate
irq0: clk                       64377960        999
irq1: atkbd0                          26          0
irq8: rtc                        8238374        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10360869        160
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96514          1
irq15: nfe0 ata1                   23469          0
Total                           83229393       1292
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u   39  128  377    0.176  2259195 1909368

Tue Sep 28 14:59:51 PDT 2010
interrupt                          total       rate
irq0: clk                       64521959        999
irq1: atkbd0                          26          0
irq8: rtc                        8256801        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10360941        160
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96730          1
irq15: nfe0 ata1                   23641          0
Total                           83392279       1292
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u   55  128  377    0.174  2259195 1707791

Tue Sep 28 15:00:51 PDT 2010
interrupt                          total       rate
irq0: clk                       64581986        999
irq1: atkbd0                          26          0
irq8: rtc                        8264482        127
irq9: acpi0                           21          0
irq10: ohci0 ehci1+             10361001        160
irq11: fwohci0 ahc+               132133          2
irq12: psm0                           27          0
irq14: ata0                        96820          1
irq15: nfe0 ata1                   23658          0
Total                           83460154       1292
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u  115  128  377    0.174  2259195 1707791



Sep 28 15:16:06 scratch ntpd[1141]: time correction of 2259 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.



More information about the freebsd-stable mailing list