Panics on IBM Bladecenter HS20/amd64 blades
Daniel Bond
db at danielbond.org
Fri Oct 6 09:06:02 UTC 2006
Hi,
FreeBSD has been running rock solid on the older i386 HS20s, but the newer ones
with the amd64 configuration keep panicking, and I can't quite figure out why.
Help tracking this issue down would be greatly appreciated.
The panics happen randomly, on average once every 2 days, sometimes with just
20 minutes between panics, and always in the process tcpserver, which
suggests that this is a network-related issue(?).
Another problem is that the system can't reboot by itself, because there is
no keyboard controller, which leaves the filesystems dirty (there is a flag
BROKEN_KEYBOARD_RESET in i386, but not in amd64), so I have to reboot the
machine via the BladeCenter management to get it up again.
If there is anything I can do to provide more useful output, please let me know.
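For anyone trying to reproduce this: the backtrace below enters the kernel
through setsockopt() on a TCP socket and ends up in ip_ctloutput(), just before
the call to ip_pcbopts(). A minimal sketch of a userland loop that exercises a
similar path follows. It is an assumption, not something confirmed by the
trace, that the option involved is IP_OPTIONS with a zero-length value; the
function names and the iteration count are made up for illustration, and the
loop is not known to trigger the panic.

```python
import socket


def clear_ip_options(sock):
    """Issue an IP-level setsockopt on a TCP socket with a zero-length value.

    Assumption: IP_OPTIONS with an empty value is the kind of call that
    reaches the ip_pcbopts() branch seen in the backtrace.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS, b"")


def main(iterations=1000):
    """Create a fresh TCP socket per iteration and clear its IP options."""
    done = 0
    for _ in range(iterations):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            clear_ip_options(s)
            done += 1
    return done


if __name__ == "__main__":
    print(main())
```

Running several copies in parallel on an SMP kernel may come closer to the
conditions on the blade, since the fault is in the adaptive mutex code.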
Trace:
------------------------------------------
mxtwo# kgdb kernel.debug /var/crash/vmcore.3
Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x18c
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff802cf867
stack pointer = 0x10:0xffffffffb3ff38b0
frame pointer = 0x10:0x4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 1363 (tcpserver)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 6m22s
Dumping 2047 MB (2 chunks)
chunk 0: 1MB (154 pages) ... ok
chunk 1: 2047MB (523966 pages) 2031 2015 1999 1983 1967 1951 1935 1919 1903
1887 1871 1855 1839 1823 1807 1791 1775 1759 1743 1727 1711 1695 1679 1663
1647 1631 1615 1599 1583 1567 1551 1535 1519 1503 1487 1471 1455 1439 1423
1407 1391 1375 1359 1343 1327 1311 1295 1279 1263 1247 1231 1215 1199 1183
1167 1151 1135 1119 1103 1087 1071 1055 1039 1023 1007 991 975 959 943 927 911
895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607
591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
#0 doadump () at pcpu.h:172
172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff802cf867
0xffffffff802cf867 is in _mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:544).
539 * If the current owner of the lock is executing on
another
540 * CPU, spin instead of blocking.
541 */
542 owner = (struct thread *)(v & MTX_FLAGMASK);
543 #ifdef ADAPTIVE_GIANT
544 if (TD_IS_RUNNING(owner)) {
545 #else
546 if (m != &Giant && TD_IS_RUNNING(owner)) {
547 #endif
548 turnstile_release(&m->mtx_object);
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x0000000000000004 in ?? ()
#2 0xffffffff802d9bf7 in boot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff802da291 in panic (fmt=0xffffff005b3fa4c0 "@Ó¯[") at
/usr/src/sys/kern/kern_shutdown.c:565
#4 0xffffffff80488bff in trap_fatal (frame=0xffffff005b3fa4c0,
eva=18446742975736173376) at /usr/src/sys/amd64/amd64/trap.c:660
#5 0xffffffff80489126 in trap (frame=
{tf_rdi = 56, tf_rsi = -1097980730176, tf_rdx = 6, tf_rcx = 0, tf_r8 =
0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1098015721464, tf_rbp = 4, tf_r10 =
-2037788432, tf_r11 = -1097980730176, tf_r12 = -1097980730176, tf_r13 =
-1097438414848, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396,
tf_flags = -2143116959, tf_err = 0, tf_rip = -2144536473, tf_cs = 8, tf_rflags
= 65538, tf_rsp = -1275119424, tf_ss = 16}) at
/usr/src/sys/amd64/amd64/trap.c:238
#6 0xffffffff8047449b in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:168
#7 0xffffffff802cf867 in _mtx_lock_sleep (m=0xffffff005929b808,
tid=18446742975728821440, opts=6, file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:542
#8 0xffffffff803826bd in ip_ctloutput (so=0x38, sopt=0xffffffffb3ff3b30) at
/usr/src/sys/netinet/ip_output.c:1193
#9 0xffffffff80393bd5 in tcp_ctloutput (so=0xffffff005a83b738,
sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
#10 0xffffffff80322068 in sosetopt (so=0xffffff005a83b738,
sopt=0xffffffffb3ff3b30) at /usr/src/sys/kern/uipc_socket.c:1563
#11 0xffffffff80328536 in kern_setsockopt (td=0xffffff005b3fa4c0,
s=1619162408, level=56, name=0, val=0x0, valseg=UIO_USERSPACE,
valsize=2257178864) at /usr/src/sys/kern/uipc_syscalls.c:1351
#12 0xffffffff803285ae in setsockopt (td=0x38, uap=0xffffff005b3fa4c0) at
/usr/src/sys/kern/uipc_syscalls.c:1307
#13 0xffffffff80489a51 in syscall (frame=
{tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 =
140737488349992, tf_rax = 105, tf_rbx = 0, tf_rbp = 0, tf_r10 = 0, tf_r11 =
514, tf_r12 = 3, tf_r13 = 140737488350320, tf_r14 = 0, tf_r15 = 0, tf_trapno =
12, tf_addr = 5285992, tf_flags = 12, tf_err = 2, tf_rip = 34368089164, tf_cs
= 43, tf_rflags = 582, tf_rsp = 140737488350040, tf_ss = 35})
at /usr/src/sys/amd64/amd64/trap.c:792
#14 0xffffffff80474638 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:270
#15 0x00000008007f6c4c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up 5
#5 0xffffffff80489126 in trap (frame=
{tf_rdi = 56, tf_rsi = -1097980730176, tf_rdx = 6, tf_rcx = 0, tf_r8 =
0, tf_r9 = 0, tf_rax = 1, tf_rbx = -1098015721464, tf_rbp = 4, tf_r10 =
-2037788432, tf_r11 = -1097980730176, tf_r12 = -1097980730176, tf_r13 =
-1097438414848, tf_r14 = 0, tf_r15 = 1, tf_trapno = 12, tf_addr = 396,
tf_flags = -2143116959, tf_err = 0, tf_rip = -2144536473, tf_cs = 8, tf_rflags
= 65538, tf_rsp = -1275119424, tf_ss = 16}) at
/usr/src/sys/amd64/amd64/trap.c:238
238 trap_fatal(&frame, frame.tf_addr);
(kgdb) up
#6 0xffffffff8047449b in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:168
168 call trap
Current language: auto; currently asm
(kgdb) up
#7 0xffffffff802cf867 in _mtx_lock_sleep (m=0xffffff005929b808,
tid=18446742975728821440, opts=6, file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:542
542 owner = (struct thread *)(v & MTX_FLAGMASK);
Current language: auto; currently c
(kgdb) list
537 #if defined(SMP) && !defined(NO_ADAPTIVE_MUTEXES)
538 /*
539 * If the current owner of the lock is executing on
another
540 * CPU, spin instead of blocking.
541 */
542 owner = (struct thread *)(v & MTX_FLAGMASK);
543 #ifdef ADAPTIVE_GIANT
544 if (TD_IS_RUNNING(owner)) {
545 #else
546 if (m != &Giant && TD_IS_RUNNING(owner)) {
(kgdb) up
#8 0xffffffff803826bd in ip_ctloutput (so=0x38, sopt=0xffffffffb3ff3b30) at
/usr/src/sys/netinet/ip_output.c:1193
1193 INP_LOCK(inp);
(kgdb) list
1188 m->m_len);
1189 if (error) {
1190 m_free(m);
1191 break;
1192 }
1193 INP_LOCK(inp);
1194 error = ip_pcbopts(inp, sopt->sopt_name, m);
1195 INP_UNLOCK(inp);
1196 return (error);
1197 }
(kgdb) print so
$1 = (struct socket *) 0x38
(kgdb) print sopt
$2 = (struct sockopt *) 0xffffffffb3ff3b30
(kgdb) up
#9 0xffffffff80393bd5 in tcp_ctloutput (so=0xffffff005a83b738,
sopt=0xffffffffb3ff3b30) at /usr/src/sys/netinet/tcp_usrreq.c:1038
1038 error = ip_ctloutput(so, sopt);
(kgdb) list
1033 #ifdef INET6
1034 if (INP_CHECK_SOCKAF(so, AF_INET6))
1035 error = ip6_ctloutput(so, sopt);
1036 else
1037 #endif /* INET6 */
1038 error = ip_ctloutput(so, sopt);
1039 return (error);
1040 }
1041 tp = intotcpcb(inp);
(kgdb) up
#10 0xffffffff80322068 in sosetopt (so=0xffffff005a83b738,
sopt=0xffffffffb3ff3b30) at /usr/src/sys/kern/uipc_socket.c:1563
1563 return ((*so->so_proto->pr_ctloutput)
(kgdb) print so->so_proto->pr_ctloutput
$3 = (pr_ctloutput_t *) 0xffffffff80393ae0 <tcp_ctloutput>
(kgdb) list *0xffffffff80393ae0
0xffffffff80393ae0 is in tcp_ctloutput
(/usr/src/sys/netinet/tcp_usrreq.c:1016).
1011 */
1012 int
1013 tcp_ctloutput(so, sopt)
1014 struct socket *so;
1015 struct sockopt *sopt;
1016 {
1017 int error, opt, optval;
1018 struct inpcb *inp;
1019 struct tcpcb *tp;
1020 struct tcp_info ti;
(kgdb) up
#11 0xffffffff80328536 in kern_setsockopt (td=0xffffff005b3fa4c0,
s=1619162408, level=56, name=0, val=0x0, valseg=UIO_USERSPACE,
valsize=2257178864) at /usr/src/sys/kern/uipc_syscalls.c:1351
1351 error = sosetopt(so, &sopt);
(kgdb) list
1346
1347 NET_LOCK_GIANT();
1348 error = getsock(td->td_proc->p_fd, s, &fp);
1349 if (error == 0) {
1350 so = fp->f_data;
1351 error = sosetopt(so, &sopt);
1352 fdrop(fp, td);
1353 }
1354 NET_UNLOCK_GIANT();
1355 return(error);
(kgdb) up
#12 0xffffffff803285ae in setsockopt (td=0x38, uap=0xffffff005b3fa4c0) at
/usr/src/sys/kern/uipc_syscalls.c:1307
1307 return (kern_setsockopt(td, uap->s, uap->level, uap->name,
(kgdb) up
#13 0xffffffff80489a51 in syscall (frame=
{tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 0, tf_r8 = 0, tf_r9 =
140737488349992, tf_rax = 105, tf_rbx = 0, tf_rbp = 0, tf_r10 = 0, tf_r11 =
514, tf_r12 = 3, tf_r13 = 140737488350320, tf_r14 = 0, tf_r15 = 0, tf_trapno =
12, tf_addr = 5285992, tf_flags = 12, tf_err = 2, tf_rip = 34368089164, tf_cs
= 43, tf_rflags = 582, tf_rsp = 140737488350040, tf_ss = 35})
at /usr/src/sys/amd64/amd64/trap.c:792
792 error = (*callp->sy_call)(td, argp);
(kgdb) list
787 if ((callp->sy_narg & SYF_MPSAFE) == 0) {
788 mtx_lock(&Giant);
789 error = (*callp->sy_call)(td, argp);
790 mtx_unlock(&Giant);
791 } else
792 error = (*callp->sy_call)(td, argp);
793 }
794
795 switch (error) {
796 case 0:
#14 0xffffffff80474638 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:270
270 call syscall
Current language: auto; currently asm
(kgdb) list
265 movq %r12,TF_R12(%rsp) /* C preserved */
266 movq %r13,TF_R13(%rsp) /* C preserved */
267 movq %r14,TF_R14(%rsp) /* C preserved */
268 movq %r15,TF_R15(%rsp) /* C preserved */
269 FAKE_MCOUNT(TF_RIP(%rsp))
270 call syscall
271 movq PCPU(CURPCB),%rax
272 testq $PCB_FULLCTX,PCB_FLAGS(%rax)
273 jne 3f
274 1: /* Check for and handle AST's on return to userland */
(kgdb) up
#15 0x00000008007f6c4c in ?? ()
(kgdb) up
Initial frame selected; you cannot go up.
(kgdb) list
275 cli
276 movq PCPU(CURTHREAD),%rax
277 testl $TDF_ASTPENDING | TDF_NEEDRESCHED,TD_FLAGS(%rax)
278 je 2f
279 sti
280 movq %rsp, %rdi
281 call ast
282 jmp 1b
283 2: /* restore preserved registers */
284 MEXITCOUNT
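A note on reading the trap frames above: kgdb prints the tf_* fields as signed
decimal numbers, which makes them hard to compare with the hexadecimal
addresses in the panic message. The short sketch below converts a few of the
values quoted in the trace to unsigned 64-bit hex. The helper name as_u64_hex
is made up for this example; the numbers are copied from frames #5 and #7.

```python
MASK64 = (1 << 64) - 1  # trap frame registers are 64 bits wide on amd64


def as_u64_hex(value):
    """Reinterpret a (possibly negative) decimal value as unsigned 64-bit hex."""
    return hex(value & MASK64)


# Values copied from the trace: frame #5 (trap) and frame #7 (_mtx_lock_sleep).
values = {
    "tf_addr": 396,
    "tf_rip": -2144536473,
    "tid": 18446742975728821440,
}

decoded = {name: as_u64_hex(v) for name, v in values.items()}

if __name__ == "__main__":
    for name, text in decoded.items():
        print(name, text)
```

Decoded this way, tf_addr is 0x18c (the fault virtual address), tf_rip is
0xffffffff802cf867 (the instruction pointer reported in the panic), and tid is
0xffffff005b3fa4c0, the same value as the td pointer shown in frame #11.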
DMESG:
------------------------------------------
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #0: Tue Oct 3 08:33:25 CEST 2006
root at mxtwo.nsn.no:/usr/obj/usr/src/sys/BladeSMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2800.11-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0xf41 Stepping = 1
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,
HTT,TM,PBE>
Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
AMD Features=0x20000800<SYSCALL,LM>
Logical CPUs per core: 2
real memory = 2147213312 (2047 MB)
avail memory = 2064486400 (1968 MB)
kbd0 at kbdmux0
cpu0 on motherboard
pcib0: <Host to PCI bridge> pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pcib1: <PCI-PCI bridge> at device 3.0 on pci0
pci4: <PCI bus> on pcib1
pcib2: <PCI-PCI bridge> at device 0.0 on pci4
pci6: <PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 0.2 on pci4
pci5: <PCI bus> on pcib3
bge0: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xdcff0000-0xdcffffff irq 7
at device 1.0 on pci5
bge0: Ethernet address: 00:14:5e:3c:94:b6
bge1: <Broadcom BCM5704 B0, ASIC rev. 0x2100> mem 0xdcfe0000-0xdcfeffff irq 5
at device 1.1 on pci5
bge1: Ethernet address: 00:14:5e:3c:94:b7
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib4: <PCI-PCI bridge> at device 28.0 on pci0
pci2: <PCI bus> on pcib4
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x4000-0x40ff mem
0xdeff0000-0xdeffffff,0xdefe0000-0xdefeffff irq 10 at device 1.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.15.0
mpt0: Capabilities: ( RAID-1E RAID-1 SAFTE )
mpt0: 1 Active Volume (1 Max)
mpt0: 2 Hidden Drive Members (6 Max)
uhci0: <UHCI (generic) USB controller> port 0x2200-0x221f irq 10 at device
29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2600-0x261f irq 5 at device 29.1
on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: <base peripheral> at device 29.4 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 29.5 (no driver
attached)
pcib5: <PCI-PCI bridge> at device 30.0 on pci0
pci1: <PCI bus> on pcib5
pci1: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 6300ESB UDMA100 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
orm0: <ISA Option ROM> at iomem 0xc0000-0xc8fff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
uhub2: Cypress Semiconductor 4 Port Hub, class 9/0, rev 1.10/0.01, addr 2
uhub2: 4 ports with 4 removable, bus powered
ukbd0: IBM PPC I/F, rev 1.10/0.01, addr 3, iclass 3/1
kbd1 at ukbd0
ums0: IBM PPC I/F, rev 1.10/0.01, addr 3, iclass 3/1
ums0: X report 0x0002 not supported
device_attach: ums0 attach returned 6
ukbd1: IBM HIDK/M, rev 1.10/0.01, addr 4, iclass 3/1
kbd2 at ukbd1
ums0: IBM HIDK/M, rev 1.10/0.01, addr 4, iclass 3/1
ums0: 3 buttons and Z dir.
Timecounter "TSC" frequency 2800109935 Hz quality 800
Timecounters tick every 1.000 msec
IP Filter: v4.1.8 initialized. Default = pass all, Logging = enabled
Waiting 5 seconds for SCSI devices to settle
mpt0:vol0(mpt0:0:0): Settings ( Hot-Plug-Spares )
mpt0:vol0(mpt0:0:0): Using Spare Pool: 0
mpt0:vol0(mpt0:0:0): 2 Members:
(mpt0:0:0): Primary
(mpt0:0:1): Secondary
mpt0:vol0(mpt0:0:0): RAID-1 - Optimal
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:0): Physical (mpt0:0:0), Pass-thru (mpt0:1:0)
(mpt0:vol0:0): Online
(mpt0:vol0:1): Physical (mpt0:0:1), Pass-thru (mpt0:1:1)
(mpt0:vol0:1): Online
pass1 at mpt0 bus 1 target 0 lun 0
pass1: <IBM-ESXS ST973401LC FN B41D> Fixed unknown SCSI-4 device
pass1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing
Enabled
pass2 at mpt0 bus 1 target 1 lun 0
pass2: <IBM-ESXS ST973401LC FN B41D> Fixed unknown SCSI-4 device
pass2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing
Enabled
da0 at mpt0 bus 0 target 0 lun 0
da0: <LSILOGIC 1030 IM IM 1000> Fixed Direct Access SCSI-2 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing
Enabled
da0: 69878MB (143110144 512 byte sectors: 255H 63S/T 8908C)
Trying to mount root from ufs:/dev/da0s1a
WARNING: / was not properly dismounted
WARNING: /usr was not properly dismounted
bge1: link state changed to UP
--
Med vennlig hilsen / Best regards,
------------------------------------------
Daniel Bond
PGP: C822C4BD
------------------------------------------