RELENG_8 pf stack issue (state count spiraling out of control)
Daniel Hartmeier
daniel at benzedrine.cx
Tue May 3 09:22:59 UTC 2011
On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:
> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> ------------------------------------------------------------------------
> vmstat -i
>
> interrupt                  total       rate
> irq4: uart0                54768        912
> irq6: fdc0                     1          0
> irq17: uhci1+                172          2
> irq23: uhci3 ehci1+         2367         39
> cpu0: timer          13183882632  219731377
> irq256: em0            260491055    4341517
> irq257: em1            127555036    2125917
> irq258: ahci0          225923164    3765386
> cpu2: timer          13183881837  219731363
> cpu1: timer          13002196469  216703274
> cpu3: timer          13183881783  219731363
> Total                53167869284  886131154
> ------------------------------------------------------------------------
>
> Here's what a normal "vmstat -i" shows from the command-line:
>
> # vmstat -i
> interrupt                  total       rate
> irq4: uart0                  518          0
> irq6: fdc0                     1          0
> irq23: uhci3 ehci1+          145          0
> cpu0: timer             19041199       1999
> irq256: em0               614280         64
> irq257: em1               168529         17
> irq258: ahci0             355536         37
> cpu2: timer             19040462       1999
> cpu1: timer             19040458       1999
> cpu3: timer             19040454       1999
> Total                   77301582       8119
The cpu0-3 timer totals seem consistent in the first output:
13183881783/1999/60/60/24 matches 76 days of uptime.
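
As a quick check of that arithmetic, here is a minimal sketch in C. The function name timer_total_to_days is made up for illustration, and it assumes the timer ran at a constant 1999 interrupts per second, which is the rate the normal "vmstat -i" output shows.

```c
#include <stdint.h>

/*
 * Convert a per-CPU timer interrupt total into days of uptime.
 * Assumption: the timer fired at a constant rate of hz interrupts
 * per second for the whole uptime.
 */
static double
timer_total_to_days(uint64_t total, uint64_t hz)
{
    /* total / hz gives seconds; then minutes, hours, days. */
    return (double)total / (double)hz / 60.0 / 60.0 / 24.0;
}
```

Calling timer_total_to_days(13183881783, 1999) gives a little over 76 days.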
The high rate in the first output comes from vmstat.c dointr()'s
division of the total by the uptime:
    struct timespec sp;
    clock_gettime(CLOCK_MONOTONIC, &sp);
    uptime = sp.tv_sec;
    for (i = 0; i < nintr; i++) {
        printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
            *intrcnt, *intrcnt / uptime);
    }
From this we can deduce that the value of uptime must have been
13183881783/219731363 = 60 (seconds).
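
The same deduction can be written as a small C sketch. The helper implied_uptime is a hypothetical name, not something in vmstat.c; it just inverts the integer division shown above, so the result is only an estimate of the divisor vmstat used.

```c
#include <stdint.h>

/*
 * vmstat prints rate = total / uptime using integer division, so the
 * uptime it divided by can be estimated as total / rate.
 * Returns 0 when rate is 0, since nothing can be inferred then.
 */
static uint64_t
implied_uptime(uint64_t total, uint64_t rate)
{
    return rate != 0 ? total / rate : 0;
}
```

Feeding it other rows of the first output, for example em0 (260491055, 4341517) or the Total line (53167869284, 886131154), also yields 60, so the whole table is consistent with one uptime value of 60 seconds.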
Since the uptime was 76 days (and not just 60 seconds), the
CLOCK_MONOTONIC clock must have reset, wrapped, or been overwritten.
I don't know how that's possible, but if it means that the kernel
variable time_second may have gone backwards, that could very well
have messed up pf's state purging.
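
To illustrate why a clock that goes backwards would matter, here is a deliberately simplified model of timeout-based expiry. This is not pf's code: struct toy_state and toy_state_expired are invented for the example, and it only assumes that expiry is decided by comparing a stored timestamp plus a timeout against the current clock reading.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified state entry; not pf's actual structure. */
struct toy_state {
    time_t last_seen;   /* clock reading when the state was last used */
    time_t timeout;     /* idle seconds allowed before purging */
};

/* A state may be purged once the clock reaches last_seen + timeout. */
static bool
toy_state_expired(const struct toy_state *s, time_t now)
{
    return now >= s->last_seen + s->timeout;
}
```

In this model, a state stamped at a clock reading of several million seconds with a 30 second timeout would not be considered expired at a reading of 60, and would stay that way until the clock caught up again. States would then accumulate instead of being purged.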
Daniel