CPU time accounting broken on 8-STABLE machine after a few hours of uptime

Don Lewis truckman at FreeBSD.org
Fri Oct 1 03:15:38 UTC 2010


On 30 Sep, Andriy Gapon wrote:
> on 30/09/2010 02:27 Don Lewis said the following:

>> I tried enabling apic and got worse results.  I saw ping RTTs as high as
>> 67 seconds.  Here's the timer info with apic enabled:
[snip]
>> Here's the verbose boot info with apic:
>> <http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt>
> 
> vmstat -i ?

Here's the vmstat -i output at the time the machine starts experiencing
freezes and ntp goes insane:

Thu Sep 30 11:38:57 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           10          0
irq12: psm0                           18          0
irq14: ata0                         2845          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      74628         40
cpu0: timer                      3676399       1999
irq256: nfe0                        3915          2
Total                            3758132       2043
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  129  128  377    0.185   -0.307   0.020

Thu Sep 30 11:39:59 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           10          0
irq12: psm0                           18          0
irq14: ata0                         2935          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      78954         41
cpu0: timer                      3796447       1998
irq256: nfe0                        4090          2
Total                            3882771       2043
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u   61  128  377    0.185   -0.307   0.023

Thu Sep 30 11:40:59 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           10          0
irq12: psm0                           18          0
irq14: ata0                         3025          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      85038         43
cpu0: timer                      3916483       1998
irq256: nfe0                        4247          2
Total                            4009138       2045
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  121  128  377    0.185   -0.307   0.023

Thu Sep 30 11:41:59 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           10          0
irq12: psm0                           18          0
irq14: ata0                         3115          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      89099         44
cpu0: timer                      4036529       1998
irq256: nfe0                        4384          2
Total                            4133472       2046
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u   54  128  377    0.185   -0.307 43008.9

Thu Sep 30 11:42:59 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           11          0
irq12: psm0                           18          0
irq14: ata0                         3205          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      92111         44
cpu0: timer                      4156575       1998
irq256: nfe0                        4421          2
Total                            4256658       2046
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gw.catspoiler.o .GPS.            1 u  114  128  377    0.185   -0.307 43008.9

Thu Sep 30 11:43:59 PDT 2010
interrupt                          total       rate
irq1: atkbd0                           6          0
irq9: acpi0                           12          0
irq12: psm0                           18          0
irq14: ata0                         3295          1
irq17: ahc0                          310          0
irq19: fwohci0                         1          0
irq22: ehci0+                      92132         43
cpu0: timer                      4276621       1998
irq256: nfe0                        4444          2
Total                            4376839       2045
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 gw.catspoiler.o .GPS.            1 u   44  128  377    0.177  113790. 105350.



I also hacked a kernel compiled with SMP so that it only finds one CPU
core.  The machine still freezes and causes long ping RTTs and ntp
insanity.

BTW, I first tried the above test by disabling the second core using the
machdep.hlt_cpus sysctl knob.  The results were most entertaining.  When
I tried to run "make index", it would hang early on.  It looked like
SCHED_ULE was trying to schedule the processes at both ends of a pipe on
both CPU cores even though it should have only been trying to use one
core.  The downstream process would then wait forever in piperd.



More information about the freebsd-stable mailing list