CPU time accounting broken on 8-STABLE machine after a few
hours of uptime
Don Lewis
truckman at FreeBSD.org
Fri Oct 1 03:15:38 UTC 2010
On 30 Sep, Andriy Gapon wrote:
> on 30/09/2010 02:27 Don Lewis said the following:
>> I tried enabling apic and got worse results. I saw ping RTTs as high as
>> 67 seconds. Here's the timer info with apic enabled:
[snip]
>> Here's the verbose boot info with apic:
>> <http://people.freebsd.org/~truckman/AN-M2_HD-8.1-STABLE-apic-verbose.txt>
>
> vmstat -i ?
Here's the vmstat -i output at the time the machine starts experiencing
freezes and ntp goes insane:
Thu Sep 30 11:38:57 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 10 0
irq12: psm0 18 0
irq14: ata0 2845 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 74628 40
cpu0: timer 3676399 1999
irq256: nfe0 3915 2
Total 3758132 2043
remote refid st t when poll reach delay offset jitter
==============================================================================
*gw.catspoiler.o .GPS. 1 u 129 128 377 0.185 -0.307 0.020
Thu Sep 30 11:39:59 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 10 0
irq12: psm0 18 0
irq14: ata0 2935 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 78954 41
cpu0: timer 3796447 1998
irq256: nfe0 4090 2
Total 3882771 2043
remote refid st t when poll reach delay offset jitter
==============================================================================
*gw.catspoiler.o .GPS. 1 u 61 128 377 0.185 -0.307 0.023
Thu Sep 30 11:40:59 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 10 0
irq12: psm0 18 0
irq14: ata0 3025 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 85038 43
cpu0: timer 3916483 1998
irq256: nfe0 4247 2
Total 4009138 2045
remote refid st t when poll reach delay offset jitter
==============================================================================
*gw.catspoiler.o .GPS. 1 u 121 128 377 0.185 -0.307 0.023
Thu Sep 30 11:41:59 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 10 0
irq12: psm0 18 0
irq14: ata0 3115 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 89099 44
cpu0: timer 4036529 1998
irq256: nfe0 4384 2
Total 4133472 2046
remote refid st t when poll reach delay offset jitter
==============================================================================
*gw.catspoiler.o .GPS. 1 u 54 128 377 0.185 -0.307 43008.9
Thu Sep 30 11:42:59 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 11 0
irq12: psm0 18 0
irq14: ata0 3205 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 92111 44
cpu0: timer 4156575 1998
irq256: nfe0 4421 2
Total 4256658 2046
remote refid st t when poll reach delay offset jitter
==============================================================================
*gw.catspoiler.o .GPS. 1 u 114 128 377 0.185 -0.307 43008.9
Thu Sep 30 11:43:59 PDT 2010
interrupt total rate
irq1: atkbd0 6 0
irq9: acpi0 12 0
irq12: psm0 18 0
irq14: ata0 3295 1
irq17: ahc0 310 0
irq19: fwohci0 1 0
irq22: ehci0+ 92132 43
cpu0: timer 4276621 1998
irq256: nfe0 4444 2
Total 4376839 2045
remote refid st t when poll reach delay offset jitter
==============================================================================
gw.catspoiler.o .GPS. 1 u 44 128 377 0.177 113790. 105350.
I also hacked a kernel compiled with SMP so that it only finds one CPU
core. The machine still freezes and causes long ping RTTs and ntp
insanity.
BTW, I first tried the above test by disabling the second core using the
machdep.hlt_cpus sysctl knob. The results were most entertaining. When
I tried to run "make index", it would hang early on. It looked like
SCHED_ULE was trying to schedule the processes at both ends of a pipe on
both CPU cores even though it should have only been trying to use one
core. The downstream process would then wait forever in piperd.
More information about the freebsd-stable
mailing list