Re: Panic: Page Fault in Kernel: Yesterday's CURRENT

From: Larry Rosenman <ler_at_lerctr.org>
Date: Fri, 17 Dec 2021 03:16:21 UTC
On 12/16/2021 9:03 pm, Larry Rosenman wrote:
> On 12/10/2021 10:43 am, Larry Rosenman wrote:
>> 14-2021_12_07-1217             -      -          1.87G 2021-12-07 12:17
>> 14-2021_12_09-1957             NR     /          121G  2021-12-09 19:57
>> 
>> If that's any help
>> 
>> On 12/10/2021 10:36 am, Alexander Motin wrote:
>>> Hi Larry,
>>> 
>>> This looks like some use-after-free or otherwise corrupted callout
>>> structure.  Unfortunately the backtrace does not tell what was the
>>> callout.  When was the previous update to look what could change?
>>> 
>>> On 10.12.2021 11:24, Larry Rosenman wrote:
>>>> FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15
>>>> main-n251537-ab639f2398b: Thu Dec  9 19:45:37 CST 2021    
>>>> root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL  
>>>> amd64
>>>> 
>>>> VMCORE *IS* available.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Unread portion of the kernel message buffer:
>>>> kernel trap 12 with interrupts disabled
>>>> 
>>>> 
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 20
>>>> fault virtual address   = 0x0
>>>> fault code              = supervisor write data, page not present
>>>> instruction pointer     = 0x20:0xffffffff804e0db4
>>>> stack pointer           = 0x0:0xfffffe0434de4e10
>>>> frame pointer           = 0x0:0xfffffe0434de4e70
>>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags        = resume, IOPL = 0
>>>> current process         = 82990 (c++)
>>>> trap number             = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> time = 1639111198
>>>> KDB: stack backtrace:
>>>> #0 0xffffffff8050fc95 at kdb_backtrace+0x65
>>>> #1 0xffffffff804c468f at vpanic+0x17f
>>>> #2 0xffffffff804c4503 at panic+0x43
>>>> #3 0xffffffff807a2195 at trap_fatal+0x385
>>>> #4 0xffffffff807a21ef at trap_pfault+0x4f
>>>> #5 0xffffffff80779c78 at calltrap+0x8
>>>> #6 0xffffffff8045ddb8 at handleevents+0x188
>>>> #7 0xffffffff8045ea3e at timercb+0x24e
>>>> #8 0xffffffff807ca9eb at lapic_handle_timer+0x9b
>>>> #9 0xffffffff8077b9b1 at Xtimerint+0xb1
>>>> Uptime: 2h28m57s
>>>> Dumping 12829 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>> 
>>>> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>>>> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
>>>> (offsetof(struct pcpu,
>>>> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>>>> #1  doadump (textdump=<optimized out>)
>>>>     at /usr/src/sys/kern/kern_shutdown.c:399
>>>> #2  0xffffffff804c428c in kern_reboot (howto=260)
>>>>     at /usr/src/sys/kern/kern_shutdown.c:487
>>>> #3  0xffffffff804c46fe in vpanic (fmt=0xffffffff807e1276 "%s",
>>>>     ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
>>>> #4  0xffffffff804c4503 in panic (fmt=<unavailable>)
>>>>     at /usr/src/sys/kern/kern_shutdown.c:844
>>>> #5  0xffffffff807a2195 in trap_fatal (frame=0xfffffe0434de4d50, eva=0)
>>>>     at /usr/src/sys/amd64/amd64/trap.c:946
>>>> #6  0xffffffff807a21ef in trap_pfault (frame=0xfffffe0434de4d50,
>>>>     usermode=false, signo=<optimized out>, ucode=<optimized out>)
>>>>     at /usr/src/sys/amd64/amd64/trap.c:765
>>>> #7  <signal handler called>
>>>> #8  0xffffffff804e0db4 in callout_process (now=now@entry=38385536922300)
>>>>     at /usr/src/sys/kern/kern_timeout.c:488
>>>> #9  0xffffffff8045ddb8 in handleevents (now=now@entry=38385536922300,
>>>>     fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213
>>>> #10 0xffffffff8045ea3e in timercb (et=0xffffffff80d475e0 <lapic_et>,
>>>>     arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:357
>>>> #11 0xffffffff807ca9eb in lapic_handle_timer (frame=0xfffffe0434de4f40)
>>>>     at /usr/src/sys/x86/x86/local_apic.c:1364
>>>> #12 <signal handler called>
>>>> #13 0x000000080df42bb6 in ?? ()
>>>> Backtrace stopped: Cannot access memory at address 0x7ffffdef2c90
>>>> (kgdb)
>>>> 
>>>> ------------------------------------------------------------------------
>>>> 
> 
> I got a new crash on today's CURRENT:
> ❯ more core.txt.1
> borg.lerctr.org dumped core - see /var/crash/vmcore.1
> 
> Thu Dec 16 17:01:37 CST 2021
> 
> FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #22
> main-n251748-c610426c4de: Thu Dec 16 09:22:52 CST 2021
> root@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL
> amd64
> 
> panic: page fault
> 
> GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD]
> Copyright (C) 2021 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd14.0".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /boot/kernel/kernel...
> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
> 
> Unread portion of the kernel message buffer:
> kernel trap 12 with interrupts disabled
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 20
> fault virtual address   = 0x0
> fault code              = supervisor write data, page not present
> instruction pointer     = 0x20:0xffffffff804e0a34
> stack pointer           = 0x0:0xfffffe03441a0e10
> frame pointer           = 0x0:0xfffffe03441a0e70
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 86479 (c++)
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1639694532
> KDB: stack backtrace:
> #0 0xffffffff8050f9e5 at kdb_backtrace+0x65
> #1 0xffffffff804c430f at vpanic+0x17f
> #2 0xffffffff804c4183 at panic+0x43
> #3 0xffffffff807a2195 at trap_fatal+0x385
> #4 0xffffffff807a21ef at trap_pfault+0x4f
> #5 0xffffffff80779728 at calltrap+0x8
> #6 0xffffffff8045da08 at handleevents+0x188
> #7 0xffffffff8045e68e at timercb+0x24e
> #8 0xffffffff807ca9eb at lapic_handle_timer+0x9b
> #9 0xffffffff8077b461 at Xtimerint+0xb1
> Uptime: 7h7m44s
> Dumping 13614 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=<optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xffffffff804c3f0c in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:487
> #3  0xffffffff804c437e in vpanic (fmt=0xffffffff807e1291 "%s",
>     ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
> #4  0xffffffff804c4183 in panic (fmt=<unavailable>)
>     at /usr/src/sys/kern/kern_shutdown.c:844
> #5  0xffffffff807a2195 in trap_fatal (frame=0xfffffe03441a0d50, eva=0)
>     at /usr/src/sys/amd64/amd64/trap.c:946
> #6  0xffffffff807a21ef in trap_pfault (frame=0xfffffe03441a0d50,
>     usermode=false, signo=<optimized out>, ucode=<optimized out>)
>     at /usr/src/sys/amd64/amd64/trap.c:765
> #7  <signal handler called>
> #8  0xffffffff804e0a34 in callout_process (now=now@entry=110228055503582)
>     at /usr/src/sys/kern/kern_timeout.c:488
> #9  0xffffffff8045da08 in handleevents (now=now@entry=110228055503582,
>     fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213
> #10 0xffffffff8045e68e in timercb (et=0xffffffff80d47660 <lapic_et>,
>     arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:357
> #11 0xffffffff807ca9eb in lapic_handle_timer (frame=0xfffffe03441a0f40)
>     at /usr/src/sys/x86/x86/local_apic.c:1364
> #12 <signal handler called>
> #13 0x0000000003883679 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffffff6f20
> (kgdb)
> 
> 
> Core is ALSO available
> 
> Boot Env:
> ❯ bectl list
> BE                             Active Mountpoint Space Created
> 14-2021-10-26_1554             -      -          1.82G 2021-10-26 15:54
> 14-2021-11-03-1800             -      -          162M  2021-11-03 18:00
> 14-2021_10_19-1900             -      -          1.80G 2021-10-19 18:57
> 14-2021_10_20-0800             -      -          1.94G 2021-10-20 08:01
> 14-2021_11_18-1241             -      -          1.86G 2021-11-18 11:41
> 14-2021_11_20-1417             -      -          1.85G 2021-11-20 13:17
> 14-2021_11_23-1111             -      -          1.87G 2021-11-23 11:11
> 14-2021_11_25-1312             -      -          1.87G 2021-11-25 12:12
> 14-2021_12_04-2220             -      -          13.7M 2021-12-04 22:20
> 14-2021_12_07-1217             -      -          1.87G 2021-12-07 12:17
> 14-2021_12_09-1957             -      -          1.89G 2021-12-09 19:57
> 14-2021_12_14-0224             -      -          1.82G 2021-12-14 02:24
> 14-2021_12_15-0923             -      -          18.6M 2021-12-15 09:23
> 14-2021_12_15-2133             -      -          1.83G 2021-12-15 21:33
> 14-2021_12_15-2257             -      -          1.82G 2021-12-15 22:57
> 14-2021_12_16-0924             NR     /          129G  2021-12-16 09:24
> 14-main-first                  -      -          2.69G 2021-10-02 20:11
> 14.0-CURRENT-2021-10-04_1051   -      -          16.6M 2021-10-04 10:51
> 14.0-CURRENT_2021-10-06_184540 -      -          12.9M 2021-10-06 18:46
> 14.0-CURRENT_2021-11-04_091349 -      -          17.7M 2021-11-04 09:13
> 14.0-CURRENT_2021-12-05_204803 -      -          10.6M 2021-12-05 20:48
> r363086                        -      -          4.19G 2020-07-10 15:37
> 
> What else can I supply?

FTR, both crashes happened during long poudriere runs rebuilding all 800+
ports I use after a __FreeBSD_version bump.

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106