[Bug 254474] mlx4 causes kernel panic at boot if compiled into the kernel

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Mon Mar 22 03:02:50 UTC 2021


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254474

            Bug ID: 254474
           Summary: mlx4 causes kernel panic at boot if compiled into the
                    kernel
           Product: Base System
           Version: 13.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Keywords: panic
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: matsuo.hiroshi.39 at gmail.com

To try 13.0-RC3 on my machine, which has a Mellanox ConnectX-2 card,
I checked out the 13.0 branch and created a KERNCONF file from GENERIC,
adding and removing a few lines according to the FreeBSD InfiniBand wiki
(see the KERNCONF diff below).

This kernel panics at boot. By contrast, I have confirmed that both of the
following work correctly:
  the 13.0-RC3 GENERIC kernel (with the mlx4 drivers built as modules)
  a 12.2 custom kernel with the mlx4 drivers compiled in (not as modules)

I don't know why compiling the mlx4 drivers into the 13.0 kernel causes a panic.

---<<BOOT>>---
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-RC3 #1: Mon Mar 22 18:37:58 JST 2021
    matsuo@build:/usr/obj/usr/src/amd64.amd64/sys/MICROSERVER-PR amd64
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
VT(vga): resolution 640x480
CPU: AMD Turion(tm) II Neo N54L Dual-Core Processor (2196.39-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f63  Family=0x10  Model=0x6  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
  SVM: NP,NRIP,NAsids=64
  TSC: P-state invariant
real memory  = 8589934592 (8192 MB)
avail memory = 8249397248 (7867 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
random: unblocking device.
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20201113/tbfadt-748)
ioapic0 <Version 2.1> irqs 0-23
Launching APs: 1
Timecounter "TSC-low" frequency 1098192980 Hz quality 800
KTLS: Initialized 2 threads
random: entropy device external interface
[ath_hal] loaded
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 14.0.
kbd1 at kbdmux0
000.000052 [4350] netmap_init               netmap: loaded module
nexus0
vtvga0: <VT VGA driver>
cryptosoft0: <software crypto>
aesni0: No AES or SHA support.
acpi0: <HP ProLiant>
acpi0: Power Button (fixed)
acpi0: _OSC failed: AE_BUFFER_OVERFLOW
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
apei0: <ACPI Platform Error Interface> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem 0xfa000000-0xfbffffff,0xfe7f0000-0xfe7fffff,0xfe600000-0xfe6fffff irq 18 at device 5.0 on pci1
vgapci0: Boot video device
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <serial bus> at device 0.0 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci3: <ACPI PCI bus> on pcib3
bge0: <HP NC107i PCIe Gigabit Server Adapter, ASIC rev. 0x5784100> mem 0xfe9f0000-0xfe9fffff irq 18 at device 0.0 on pci3
bge0: CHIP ID 0x05784100; ASIC REV 0x5784; CHIP REV 0x57841; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5784 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Using defaults for TSO: 65518/35/2048
bge0: Ethernet address: fc:15:b4:90:34:f3
ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe5ffc00-0xfe5fffff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahci0: quirks=0x22000<ATI_PMP_BUG,1MSI>
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ohci0: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fe000-0xfe5fefff irq 18 at device 18.0 on pci0
usbus0 on ohci0
usbus0: 12Mbps Full Speed USB v1.0
ehci0: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff800-0xfe5ff8ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
ohci1: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fd000-0xfe5fdfff irq 18 at device 19.0 on pci0
usbus2 on ohci1
usbus2: 12Mbps Full Speed USB v1.0
ehci1: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff400-0xfe5ff4ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
usbus3: 480Mbps High Speed USB v2.0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib4: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci4: <ACPI PCI bus> on pcib4
ohci2: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe5fc000-0xfe5fcfff irq 18 at device 22.0 on pci0
usbus4 on ohci2
usbus4: 12Mbps Full Speed USB v1.0
ehci2: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe5ff000-0xfe5ff0ff irq 17 at device 22.2 on pci0
usbus5: EHCI version 1.0
usbus5 on ehci2
usbus5: 480Mbps High Speed USB v2.0
acpi_button0: <Power Button> on acpi0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
ugen2.1: <ATI OHCI root HUB> at usbus2
ugen4.1: <ATI OHCI root HUB> at usbus4
uhub0 on usbus2
uhub1 on usbus4
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
uhub1: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen1.1: <ATI EHCI root HUB> at usbus1
ugen0.1: <ATI OHCI root HUB> at usbus0
uhub2 on usbus1
uhub3 on usbus0
uhub2: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub3: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
mlx4_core0: <mlx4_core> mem 0xfe800000-0xfe8fffff,0xfd800000-0xfdffffff irq 18 at device 0.0 on pci2
mlx4_core: Mellanox ConnectX core driver v3.6.0 (December 2020)
mlx4_core: Initializing mlx4_core
ugen5.1: <ATI EHCI root HUB> at usbus5
uhub4 on usbus5
uhub4: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
ugen3.1: <ATI EHCI root HUB> at usbus3
uhub5 on usbus3
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
bge0: link state changed to UP
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada0: Serial Number WD-WMC4N0D37EH5
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors)
ada0: quirks=0x1<4K>
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada1: Serial Number WD-WMC4N0D7W637
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors)
ada1: quirks=0x1<4K>
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada2: Serial Number WD-WMC4N0D6EVLR
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors)
ada2: quirks=0x1<4K>
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD30EZRX-00D8PB0 80.00A80> ACS-2 ATA SATA 3.x device
ada3: Serial Number WD-WMC4N0DA7JCC
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors)
ada3: quirks=0x1<4K>
ada4 at ahcich5 bus 0 scbus5 target 0 lun 0
ada4: <WDC WD5000AAJS-55A8B2 01.03B01> ATA8-ACS SATA 2.x device
ada4: Serial Number WD-WCASY8895731
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 476940MB (976773168 512 byte sectors)
uhub1: 4 ports with 4 removable, self powered
uhub3: 5 ports with 5 removable, self powered
uhub0: 5 ports with 5 removable, self powered
mlx4_core0: Old device ETS support detected
mlx4_core0: Consider upgrading device FW.
mlx4_core0: Unable to determine PCI device chain minimum BW
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.6.0 (December 2020)
<mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
<mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
ib0: link state changed to DOWN
ib0: post srq failed for buf 0 (-22)
ib0: ipoib_cm_post_receive_srq failed for buf 0


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1f4bd438
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ea7f03
stack pointer           = 0x28:0xffffffff829ba990
frame pointer           = 0x28:0xffffffff829ba9b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 0
time = 5
KDB: stack backtrace:
#0 0xffffffff80c60b55 at kdb_backtrace+0x65
#1 0xffffffff80c13771 at vpanic+0x181
#2 0xffffffff80c135e3 at panic+0x43
#3 0xffffffff81135187 at trap_fatal+0x387
#4 0xffffffff811351df at trap_pfault+0x4f
#5 0xffffffff8113483d at trap+0x27d
#6 0xffffffff8110c028 at calltrap+0x8
#7 0xffffffff80ea7794 at ipoib_cm_dev_cleanup+0x94
#8 0xffffffff80ea6976 at ipoib_cm_dev_init+0x536
#9 0xffffffff80eaf242 at ipoib_transport_dev_init+0xf2
#10 0xffffffff80ea98d1 at ipoib_ib_dev_init+0x31
#11 0xffffffff80eaaf07 at ipoib_dev_init+0x97
#12 0xffffffff80eac812 at ipoib_add_one+0x312
#13 0xffffffff80e71848 at ib_register_device+0x768
#14 0xffffffff80ee2013 at mlx4_ib_add+0x1033
#15 0xffffffff80f00d40 at mlx4_add_device+0x40
#16 0xffffffff80f00c68 at mlx4_register_interface+0xb8
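
KDB prints the innermost frame first, so the backtrace above reads from callee to caller. The following is a minimal Python sketch, not part of any FreeBSD tooling, that reverses frames #7 to #16 into caller-to-callee order. The helper names `parse_frames` and `call_chain` are hypothetical and exist only for this illustration; the frame text is copied from the trace above.

```python
import re

# Frames #7-#16 copied verbatim from the KDB backtrace above.
# Frames #0-#6 are the panic/trap handling path and are omitted.
BACKTRACE = """\
#7 0xffffffff80ea7794 at ipoib_cm_dev_cleanup+0x94
#8 0xffffffff80ea6976 at ipoib_cm_dev_init+0x536
#9 0xffffffff80eaf242 at ipoib_transport_dev_init+0xf2
#10 0xffffffff80ea98d1 at ipoib_ib_dev_init+0x31
#11 0xffffffff80eaaf07 at ipoib_dev_init+0x97
#12 0xffffffff80eac812 at ipoib_add_one+0x312
#13 0xffffffff80e71848 at ib_register_device+0x768
#14 0xffffffff80ee2013 at mlx4_ib_add+0x1033
#15 0xffffffff80f00d40 at mlx4_add_device+0x40
#16 0xffffffff80f00c68 at mlx4_register_interface+0xb8
"""

# One frame per line: "#N 0xADDR at symbol+0xOFFSET"
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")


def parse_frames(text):
    """Return a list of (frame_number, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            number, _addr, symbol, offset = match.groups()
            frames.append((int(number), symbol, int(offset, 16)))
    return frames


def call_chain(frames):
    """Order symbols from outermost caller to innermost callee."""
    ordered = sorted(frames, key=lambda frame: frame[0], reverse=True)
    return [symbol for _number, symbol, _offset in ordered]


frames = parse_frames(BACKTRACE)
chain = call_chain(frames)
print(" -> ".join(chain))
```

Read in this order, the chain starts at mlx4_register_interface and ends with ipoib_cm_dev_init calling ipoib_cm_dev_cleanup. That is consistent with the "post srq failed for buf 0 (-22)" messages printed just before the trap, and may mean the fault occurs on the error-handling path of the connected-mode setup. This is an inference from the trace, not a confirmed diagnosis.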

----- KERNCONF diff ----------
--- GENERIC     2021-03-21 03:48:03.373297000 +0900
+++ MICROSERVER-PR      2021-03-22 09:22:06.646143000 +0900
@@ -19,7 +19,7 @@
 # $FreeBSD$

 cpu            HAMMER
-ident          GENERIC
+ident          MICROSERVER-PR

 makeoptions    DEBUG=-g                # Build kernel with gdb(1) debug symbols
 makeoptions    WITH_CTF=1              # Run ctfconvert(1) for DTrace support
@@ -249,9 +249,23 @@

 # Nvidia/Mellanox Connect-X 4 and later, Ethernet only
 # mlx5ib requires ibcore infra and is not included by default
-device         mlx5                    # Base driver
-device         mlxfw                   # Firmware update
-device         mlx5en                  # Ethernet driver
+#device                mlx5                    # Base driver
+#device                mlxfw                   # Firmware update
+#device                mlx5en                  # Ethernet driver
+
+
+# Mellanox
+options        OFED
+options        SDP
+options        IPOIB_CM
+
+device         ipoib
+device         mlx4
+device         mlx4ib
+device         mlx4en
+device         mthca
+
+

 # PCI Ethernet NICs that use the common MII bus controller code.
 # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
