Commit 367705+367706 causes a panic
Peter Blok
pblok at bsd4all.org
Fri Nov 20 11:02:16 UTC 2020
Hi Kristof,
Yes, this is 12-stable. My config also panicked with the previous bridge epochification, the one that was backed out.
I don’t have any local modifications. I did a clean rebuild after removing /usr/obj/usr.
My kernel is custom - I only have zfs.ko, opensolaris.ko, vmm.ko and nmdm.ko as modules. Everything else is statically linked. I have removed all drivers not needed for the hardware at hand.
My bridge connects two vlans from the same trunk, the jail epair devices and the bhyve tap devices.
The panic happens when the jails are starting.
I can try to narrow it down over the weekend and make the crash dump available for analysis.
Previously I had the following crash with 363492:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xffffffff00000410
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80692326
stack pointer = 0x28:0xfffffe00c06097b0
frame pointer = 0x28:0xfffffe00c06097f0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 2030 (ifconfig)
trap number = 12
panic: page fault
cpuid = 2
time = 1595683412
KDB: stack backtrace:
#0 0xffffffff80698165 at kdb_backtrace+0x65
#1 0xffffffff8064d67b at vpanic+0x17b
#2 0xffffffff8064d4f3 at panic+0x43
#3 0xffffffff809cc311 at trap_fatal+0x391
#4 0xffffffff809cc36f at trap_pfault+0x4f
#5 0xffffffff809cb9b6 at trap+0x286
#6 0xffffffff809a5b28 at calltrap+0x8
#7 0xffffffff803677fd at ck_epoch_synchronize_wait+0x8d
#8 0xffffffff8069213a at epoch_wait_preempt+0xaa
#9 0xffffffff807615b7 at ipsec_ioctl+0x3a7
#10 0xffffffff8075274f at ifioctl+0x47f
#11 0xffffffff806b5ea7 at kern_ioctl+0x2b7
#12 0xffffffff806b5b4a at sys_ioctl+0xfa
#13 0xffffffff809ccec7 at amd64_syscall+0x387
#14 0xffffffff809a6450 at fast_syscall_common+0x101
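The "time =" field in each panic message is a Unix timestamp, which can help match a crash to the kernel revision that was running. A minimal Python sketch for converting the two values that appear in this message follows; the function name and the labels are only illustrative, not part of any FreeBSD tooling.

```python
from datetime import datetime, timezone


def panic_time_utc(epoch_seconds: int) -> str:
    """Convert the 'time =' value from a panic message to a UTC string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Values copied from the two panic messages in this thread.
# The labels are illustrative descriptions, not kernel output.
panics = (
    ("earlier panic (ifconfig)", 1595683412),
    ("quoted panic (jail)", 1605811310),
)

for label, ts in panics:
    print(f"{label}: {panic_time_utc(ts)}")
```

The first value converts to 2020-07-25 13:23:32 UTC and the second to 2020-11-19 18:41:50 UTC.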
> On 20 Nov 2020, at 11:30, Kristof Provost <kp at FreeBSD.org> wrote:
>
> On 20 Nov 2020, at 11:18, peter.blok at bsd4all.org wrote:
>> I’m afraid the last Epoch fix for bridge is not solving the problem (or perhaps creates a new one).
>>
> We’re talking about the stable/12 branch, right?
>
>> This seems to happen when the jail epair is added to the bridge.
>>
> There must be something more to it than that. I’ve run the bridge tests on stable/12 without issue, and this is a problem we didn’t see when the bridge epochification initially went into stable/12.
>
> Do you have a custom kernel config? Other patches? What exact commands do you run to trigger the panic?
>
>> kernel trap 12 with interrupts disabled
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 6; apic id = 06
>> fault virtual address = 0xc10
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff80695e76
>> stack pointer = 0x28:0xfffffe00bf14e6e0
>> frame pointer = 0x28:0xfffffe00bf14e720
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> current process = 1686 (jail)
>> trap number = 12
>> panic: page fault
>> cpuid = 6
>> time = 1605811310
>> KDB: stack backtrace:
>> #0 0xffffffff8069bb85 at kdb_backtrace+0x65
>> #1 0xffffffff80650a4b at vpanic+0x17b
>> #2 0xffffffff806508c3 at panic+0x43
>> #3 0xffffffff809d0351 at trap_fatal+0x391
>> #4 0xffffffff809d03af at trap_pfault+0x4f
>> #5 0xffffffff809cf9f6 at trap+0x286
>> #6 0xffffffff809a98c8 at calltrap+0x8
>> #7 0xffffffff80368a8d at ck_epoch_synchronize_wait+0x8d
>> #8 0xffffffff80695c8a at epoch_wait_preempt+0xaa
>> #9 0xffffffff80757d40 at vnet_if_init+0x120
>> #10 0xffffffff8078c994 at vnet_alloc+0x114
>> #11 0xffffffff8061e3f7 at kern_jail_set+0x1bb7
>> #12 0xffffffff80620190 at sys_jail_set+0x40
>> #13 0xffffffff809d0f07 at amd64_syscall+0x387
>> #14 0xffffffff809aa1ee at fast_syscall_common+0xf8
>
> This panic is rather odd. This isn’t even the bridge code. This is during initial creation of the vnet. I don’t really see how this could even trigger panics.
> That panic looks as if something corrupted the net_epoch_preempt, by overwriting the epoch->e_epoch. The bridge patches only access this variable through the well-established functions and macros. I see no obvious way that they could corrupt it.
>
> Best regards,
> Kristof