page fault while in kernel mode - after upgrade from 12.2 to 13.0

Mon May 3 19:45:54 UTC 2021

On Mon, May 03, 2021 at 08:04:30PM +0200, Michael Schmiedgen wrote:
> Hi List,
> 
> If I start a Samba jail, after a few seconds the system crashes. Very reproducible.
> 
> System has ~10 jails and 3 bhyve VMs. Dell server, Xeon E3-1240, 64GB RAM, 3-way ZFS mirror.
> 
> It also occurs a few seconds after I start a phone call using the SIP VM of that machine,
> very strange.
> 
> I got some log messages suggesting raising somaxconn, so I did
> 
> kern.ipc.somaxconn=4096
> 
> in sysctl.conf
> 
> 
> Below is some debug information; please let me know if I should provide more.
> 
> Should I open a bug or something?
> 
> Thank you very much!
>    Michael
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80ca52c0
> stack pointer           = 0x28:0xfffffe019d039650
> frame pointer           = 0x28:0xfffffe019d039690
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 649 (devd)
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1620061253
> KDB: stack backtrace:
> #0 0xffffffff80c57345 at kdb_backtrace+0x65
> #1 0xffffffff80c09d21 at vpanic+0x181
> #2 0xffffffff80c09b93 at panic+0x43
> #3 0xffffffff8108b187 at trap_fatal+0x387
> #4 0xffffffff8108b1df at trap_pfault+0x4f
> #5 0xffffffff8108a83d at trap+0x27d
> #6 0xffffffff810617a8 at calltrap+0x8
> #7 0xffffffff80ca51c3 at sbappendaddr_locked+0x93
> #8 0xffffffff80cb437a at uipc_send+0x73a
> #9 0xffffffff80ca9053 at sosend_generic+0x633
> #10 0xffffffff80ca94e0 at sosend+0x50
> #11 0xffffffff80caff2e at kern_sendit+0x20e
> #12 0xffffffff80cb032b at sendit+0x1db
> #13 0xffffffff80cb013d at sys_sendto+0x4d
> #14 0xffffffff8108ba8c at amd64_syscall+0x10c
> #15 0xffffffff810620ce at fast_syscall_common+0xf8
> Uptime: 2m2s
> Dumping 2373 out of 65454 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> (kgdb) list *0xffffffff80ca52c0
> 0xffffffff80ca52c0 is in sbappendaddr_locked_internal (/usr/src/sys/kern/uipc_sockbuf.c:1169).
> 1164            if (ctrl_last)
> 1165                    ctrl_last->m_next = m0; /* concatenate data to control */
> 1166            else
> 1167                    control = m0;
> 1168            m->m_next = control;
> 1169            for (n = m; n->m_next != NULL; n = n->m_next)
> 1170                    sballoc(sb, n);
> 1171            sballoc(sb, n);
> 1172            nlast = n;
> 1173            SBLINKRECORD(sb, m);

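So we are crashing at line 1169 because "n" is somehow NULL?  That seems
difficult to explain: "m" was already written through on line 1168, and the
loop only advances to n->m_next after checking that it is non-NULL.  Can you
show the local variables in this frame ("info locals" in kgdb)?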

Does the panic always have the same stack trace?
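
To illustrate why a NULL "n" is surprising here, below is a minimal userland
sketch of the loop at uipc_sockbuf.c:1169.  It is not the kernel code:
struct fake_mbuf and walk_chain() are made-up names for illustration, and the
sballoc() calls are replaced by a simple counter.  It only shows that on a
well-formed chain "n" is never NULL when the loop finishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf; only the link field matters here. */
struct fake_mbuf {
	struct fake_mbuf *m_next;
};

/*
 * Walk a chain the same way as the loop at uipc_sockbuf.c:1169.
 * Returns the number of buffers visited (the kernel calls sballoc()
 * at each step; this sketch only counts) and stores the last buffer
 * in *lastp.  The caller must pass a non-NULL m, as the kernel code
 * implicitly does by writing m->m_next on the preceding line.
 */
static int
walk_chain(struct fake_mbuf *m, struct fake_mbuf **lastp)
{
	struct fake_mbuf *n;
	int visited = 0;

	for (n = m; n->m_next != NULL; n = n->m_next)
		visited++;		/* sballoc(sb, n) in the kernel */
	visited++;			/* the trailing sballoc(sb, n) */

	/* n only ever takes the value m or a link that tested non-NULL. */
	assert(n != NULL);
	*lastp = n;
	return (visited);
}
```

If the locals really do show n == NULL, the chain would have to have been
modified while it was being walked, or been damaged beforehand.  That is only
a guess until we see the actual values.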