Kernel panic (page fault) on 10.3-STABLE with IB & VIMAGE modules
Justin Clift
justin at postgresql.org
Thu Apr 21 14:16:18 UTC 2016
Hi all,
Have been hitting a kernel panic (page fault) with the IB modules loaded
on 10.3-STABLE (compiled multiple times over the last few days, all panicking).
Spent several hours narrowing down the cause, and it's definitely a bad
interaction between the IB modules (unsure which) + the "VIMAGE" module.
I'll fill out a bug report in a bit. In the meantime, does the below have any
useful info in it that I can use for further investigation? (commands taken from
https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html)
***********************************************************************************
root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq271: mlx4_core0)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807263d0 at kdb_backtrace+0x60
#1 0xffffffff806e8c76 at vpanic+0x126
#2 0xffffffff806e8b43 at panic+0x43
#3 0xffffffff80b8bf3b at trap_fatal+0x36b
#4 0xffffffff80b8c23d at trap_pfault+0x2ed
#5 0xffffffff80b8b8ba at trap+0x47a
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
#9 0xffffffff808fcc98 at ipoib_ib_completion+0x78
#10 0xffffffff80930c43 at mlx4_cq_completion+0x63
#11 0xffffffff80933d43 at mlx4_eq_int+0x2c3
#12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc
#13 0xffffffff806b35cb at intr_event_execute_handlers+0xab
#14 0xffffffff806b3a16 at ithread_loop+0x96
#15 0xffffffff806b104a at fork_exit+0x9a
#16 0xffffffff80b71dce at fork_trampoline+0xe
Uptime: 3m47s
Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93%
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560 mb->m_pkthdr.rcvif = dev;
561 proto = *mtod(mb, uint16_t *);
562 m_adj(mb, IPOIB_ENCAP_LEN);
563
564 IPOIB_MTAP_PROTO(dev, mb, proto);
565 ipoib_demux(dev, mb, ntohs(proto));
566
567 repost:
568 if (has_srq) {
569 if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id)))
Current language: auto; currently minimal
(kgdb) list *0xffffffff807be1a2
0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976).
971 if (dispatch_policy == NETISR_DISPATCH_DIRECT) {
972 nwsp = DPCPU_PTR(nws);
973 npwp = &nwsp->nws_work[proto];
974 npwp->nw_dispatched++;
975 npwp->nw_handled++;
976 netisr_proto[proto].np_handler(m);
977 error = 0;
978 goto out_unlock;
979 }
980
(kgdb) list *0xffffffff80b71892
0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238.
233 .type calltrap,@function
234 calltrap:
235 movq %rsp,%rdi
236 call trap
237 MEXITCOUNT
238 jmp doreti /* Handle any pending ASTs */
239
240 /*
241 * alltraps_noen entry point. Unlike alltraps above, we want to
242 * leave the interrupts disabled. This corresponds to
(kgdb) list *0xffffffff80b8b8ba
0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447).
442
443 KASSERT(cold || td->td_ucred != NULL,
444 ("kernel trap doesn't have ucred"));
445 switch (type) {
446 case T_PAGEFLT: /* page fault */
447 (void) trap_pfault(frame, FALSE);
448 goto out;
449
450 case T_DNA:
451 KASSERT(!PCB_USER_FPU(td->td_pcb),
(kgdb)
***********************************************************************************
Regards and best wishes,
Justin Clift
--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi