Re: Panic: Page Fault in Kernel: Yesterday's CURRENT

From: Michael Butler via freebsd-current <freebsd-current_at_freebsd.org>
Date: Tue, 21 Dec 2021 17:01:50 UTC

I have an old Pentium III that also won't boot kernels built after Dec 6th.

I suspect the commits listed below but, with the device being remote and 
having no DRAC, I'm struggling to test this theory.

The relevant commits:

commit 553af8f1ec71d397b5b4fd5876622b9269936e63
Author: Mark Johnston <markj@FreeBSD.org>
Date:   Mon Dec 6 10:42:19 2021 -0500

     x86: Perform late TSC calibration before LAPIC timer calibration

commit 62d09b46ad7508ae74d462e49234f0a80f91ff69
Author: Mark Johnston <markj@FreeBSD.org>
Date:   Mon Dec 6 10:42:10 2021 -0500

     x86: Defer LAPIC calibration until after timecounters are available

It's currently running git rev e43d081f352, and I have a kernel at git
rev f06f1d1fdb969fa7a0a6eefa030d8536f365eb6e to test later this evening.
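
One check that needs no console access is whether a given revision
actually contains the two suspect commits. A minimal sketch, assuming a
git checkout of the source tree at /usr/src (an assumption, as are the
helper names is_ancestor and report). It relies on
`git merge-base --is-ancestor`, which exits 0 when the first commit is
reachable from the second and 1 when it is not:

```python
import subprocess

# The two commits named in this message.
SUSPECTS = [
    "553af8f1ec71d397b5b4fd5876622b9269936e63",  # late TSC calibration
    "62d09b46ad7508ae74d462e49234f0a80f91ff69",  # deferred LAPIC calibration
]


def is_ancestor(commit, rev, repo="/usr/src", run=subprocess.run):
    """Return True if `commit` is reachable from `rev` in the tree at `repo`.

    `repo` is an assumed checkout location; `run` is injectable so the
    function can be exercised without a real repository.
    """
    result = run(["git", "-C", repo, "merge-base", "--is-ancestor", commit, rev])
    # git uses 0 for "is an ancestor", 1 for "is not"; anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git failed with exit status {result.returncode}")
    return result.returncode == 0


def report(rev, repo="/usr/src", run=subprocess.run):
    """Map each suspect commit to whether `rev` includes it."""
    return {commit: is_ancestor(commit, rev, repo, run) for commit in SUSPECTS}
```

Calling report("e43d081f352") and
report("f06f1d1fdb969fa7a0a6eefa030d8536f365eb6e") would show which of
the two commits each kernel includes. If the theory holds, the revision
that is running now should lack both.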

	Michael


On 12/17/21 15:07, Larry Rosenman wrote:
> On 12/17/2021 1:36 pm, Mark Johnston wrote:
>> On Fri, Dec 10, 2021 at 10:43:19AM -0600, Larry Rosenman wrote:
>>> 14-2021_12_07-1217             -      -          1.87G 2021-12-07 12:17
>>> 14-2021_12_09-1957             NR     /          121G  2021-12-09 19:57
>>>
>>> If that's any help
>>
>> I can't tell what this is saying.  A kernel built on the 7th does not
>> crash, or...?  Which revision did you update from before you started
>> seeing crashes?
>>
>> From a kgdb session it'd be useful to see output from
>>
>> (kgdb) frame 8
>> (kgdb) p/x *tmp
>>
>> to start.
>>
> 
> Correct, the 7th didn't panic, but the 9th did, and yesterday's too.
> 
> Grrr
> ler in borg in /mnt🔒 on ☁️  (us-east-1)
> ❯ kgdb -c /var/crash/vmcore.0  /mnt/boot/kernel/kernel
> GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd14.0".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>      <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /mnt/boot/kernel/kernel...
> (No debugging symbols found in /mnt/boot/kernel/kernel)
> Failed to open vmcore: /var/crash/vmcore.0: Permission denied
> (kgdb) bt
> No stack.
> (kgdb) quit
> 
> ler in borg in /mnt🔒 on ☁️  (us-east-1) took 6s
> ❯ sudo chmod +r /var/crash/*
> 
> ler in borg in /mnt🔒 on ☁️  (us-east-1)
> ❯ kgdb -c /var/crash/vmcore.0  /mnt/boot/kernel/kernel
> Reading symbols from /mnt/boot/kernel/kernel...
> (No debugging symbols found in /mnt/boot/kernel/kernel)
> /wrkdirs/usr/ports/devel/gdb/work-py37/gdb-11.1/gdb/thread.c:1345: 
> internal-error: void switch_to_thread(thread_info *): Assertion `thr != 
> NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Quit this debugging session? (y or n) n
> 
> This is a bug, please report it.  For instructions, see:
> <https://www.gnu.org/software/gdb/bugs/>.
> 
> /wrkdirs/usr/ports/devel/gdb/work-py37/gdb-11.1/gdb/thread.c:1345: 
> internal-error: void switch_to_thread(thread_info *): Assertion `thr != 
> NULL' failed.
> A problem internal to GDB has been detected,
> further debugging may prove unreliable.
> Create a core file of GDB? (y or n) n
> Command aborted.
> (kgdb) bt
> No thread selected.
> (kgdb) fr 8
> No thread selected.
> (kgdb)
> 
>>> On 12/10/2021 10:36 am, Alexander Motin wrote:
>>> > Hi Larry,
>>> >
>>> > This looks like some use-after-free or otherwise corrupted callout
>>> > structure.  Unfortunately the backtrace does not say which callout it
>>> > was.  When was the previous update, so we can look at what changed?
>>> >
>>> > On 10.12.2021 11:24, Larry Rosenman wrote:
>>> >> FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15
>>> >> main-n251537-ab639f2398b: Thu Dec  9 19:45:37 CST 2021
>>> >> root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL
>>> >> amd64
>>> >>
>>> >> VMCORE *IS* available.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Unread portion of the kernel message buffer:
>>> >> kernel trap 12 with interrupts disabled
>>> >>
>>> >>
>>> >> Fatal trap 12: page fault while in kernel mode
>>> >> cpuid = 0; apic id = 20
>>> >> fault virtual address   = 0x0
>>> >> fault code              = supervisor write data, page not present
>>> >> instruction pointer     = 0x20:0xffffffff804e0db4
>>> >> stack pointer           = 0x0:0xfffffe0434de4e10
>>> >> frame pointer           = 0x0:0xfffffe0434de4e70
>>> >> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>> >>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>> >> processor eflags        = resume, IOPL = 0
>>> >> current process         = 82990 (c++)
>>> >> trap number             = 12
>>> >> panic: page fault
>>> >> cpuid = 0
>>> >> time = 1639111198
>>> >> KDB: stack backtrace:
>>> >> #0 0xffffffff8050fc95 at kdb_backtrace+0x65
>>> >> #1 0xffffffff804c468f at vpanic+0x17f
>>> >> #2 0xffffffff804c4503 at panic+0x43
>>> >> #3 0xffffffff807a2195 at trap_fatal+0x385
>>> >> #4 0xffffffff807a21ef at trap_pfault+0x4f
>>> >> #5 0xffffffff80779c78 at calltrap+0x8
>>> >> #6 0xffffffff8045ddb8 at handleevents+0x188
>>> >> #7 0xffffffff8045ea3e at timercb+0x24e
>>> >> #8 0xffffffff807ca9eb at lapic_handle_timer+0x9b
>>> >> #9 0xffffffff8077b9b1 at Xtimerint+0xb1
>>> >> Uptime: 2h28m57s
>>> >> Dumping 12829 out of 131023
>>> >> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>> >>
>>> >> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>>> >> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
>>> >> (offsetof(struct pcpu,
>>> >> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>>> >> #1  doadump (textdump=<optimized out>)
>>> >>     at /usr/src/sys/kern/kern_shutdown.c:399
>>> >> #2  0xffffffff804c428c in kern_reboot (howto=260)
>>> >>     at /usr/src/sys/kern/kern_shutdown.c:487
>>> >> #3  0xffffffff804c46fe in vpanic (fmt=0xffffffff807e1276 "%s",
>>> >>     ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
>>> >> #4  0xffffffff804c4503 in panic (fmt=<unavailable>)
>>> >>     at /usr/src/sys/kern/kern_shutdown.c:844
>>> >> #5  0xffffffff807a2195 in trap_fatal (frame=0xfffffe0434de4d50, eva=0)
>>> >>     at /usr/src/sys/amd64/amd64/trap.c:946
>>> >> #6  0xffffffff807a21ef in trap_pfault (frame=0xfffffe0434de4d50,
>>> >>     usermode=false, signo=<optimized out>, ucode=<optimized out>)
>>> >>     at /usr/src/sys/amd64/amd64/trap.c:765
>>> >> #7  <signal handler called>
>>> >> #8  0xffffffff804e0db4 in callout_process
>>> >> (now=now@entry=38385536922300)
>>> >>     at /usr/src/sys/kern/kern_timeout.c:488
>>> >> #9  0xffffffff8045ddb8 in handleevents (now=now@entry=38385536922300,
>>> >>     fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213
>>> >> #10 0xffffffff8045ea3e in timercb (et=0xffffffff80d475e0 <lapic_et>,
>>> >>     arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:357
>>> >> #11 0xffffffff807ca9eb in lapic_handle_timer
>>> >> (frame=0xfffffe0434de4f40)
>>> >>     at /usr/src/sys/x86/x86/local_apic.c:1364
>>> >> #12 <signal handler called>
>>> >> #13 0x000000080df42bb6 in ?? ()
>>> >> Backtrace stopped: Cannot access memory at address 0x7ffffdef2c90
>>> >> (kgdb)
>
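
For reading the trap message in the original report: the "fault code"
line is a rendering of the x86 page-fault error code. A small sketch of
that decoding follows. The bit values come from the x86 architecture;
the PGEX_* names and the output wording are modelled on the panic text
quoted above and should be treated as assumptions, not a copy of the
kernel source:

```python
# x86 page-fault error code bits.
PGEX_P = 0x01    # set: protection violation; clear: page not present
PGEX_W = 0x02    # set: the access was a write
PGEX_U = 0x04    # set: the access came from user mode
PGEX_RSV = 0x08  # set: a reserved bit was set in a page-table entry
PGEX_I = 0x10    # set: the access was an instruction fetch


def describe_fault(code):
    """Render a page-fault error code in the style of the panic message."""
    who = "user" if code & PGEX_U else "supervisor"
    op = "write" if code & PGEX_W else "read"
    kind = "instruction" if code & PGEX_I else "data"
    if code & PGEX_RSV:
        why = "reserved bits in PTE"
    elif code & PGEX_P:
        why = "protection violation"
    else:
        why = "page not present"
    return f"{who} {op} {kind}, {why}"
```

describe_fault(0x2) gives "supervisor write data, page not present",
the combination reported here. Together with a fault virtual address of
0x0, that means kernel code stored through a NULL pointer.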