6.2-R crash; kgdb not cooperating

David Wolfskill david at catwhisker.org
Mon Aug 13 09:47:34 PDT 2007


A machine at work crashed and left a core dump, but I'm having some
trouble getting kgdb to do anything useful with it, and could use a hint.

Machine is a dual Xeon, running:
out03# uname -rms
FreeBSD 6.2-RELEASE amd64
out03# 

Its primary workload is delivering mail to customers (similar to the
role of mx2.freebsd.org, for those familiar with FreeBSD.org
infrastructure).  Like mx2, the MTA in use is Postfix.  It runs a
caching-only name server for its own use, and there's a Perl script that
runs from time to time to scrape data out of /var/log/maillog and feed
said data to some database machine somewhere.  It also runs ntpd & sshd,
and uses IPFW for packet-filtering.

My first approach was to copy the core dump & kernel.debug files
to my work desktop (which runs 6.2-STABLE on i386 as of yesterday;
I'm in the habit of tracking RELENG_6 every Sunday on that machine).
Results of that were succinct:

catmint(6.2-S)[2] ls -ltr kernel.debug vmcore.0
-rw-------  1 dhw  wheel  2913157120 Aug 13 05:49 vmcore.0
-rwxr-xr-x  1 dhw  wheel    29215877 Aug 13 06:02 kernel.debug
catmint(6.2-S)[3] kgdb kernel.debug vmcore.0 
kgdb: bad namelist
catmint(6.2-S)[4] echo $?
1
catmint(6.2-S)[5] 

Ideally, I would continue the work from that machine.
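
One obvious difference between the two environments is the architecture:
the crashed machine is amd64 and the desktop is i386.  I don't know
whether that is what kgdb is objecting to, but as a minimal sketch, the
word size of an ELF file can be read from its header.  The elf_class
function below is a hypothetical helper, not part of kgdb or any FreeBSD
tool, and it does not check the ELF magic number:

```sh
#!/bin/sh
# Hypothetical helper: print the word size recorded in an ELF header.
# Byte offset 4 is EI_CLASS: 1 means 32-bit, 2 means 64-bit.
# Note: this does not verify that the file really is an ELF object.
elf_class() {
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Example: inspect the copied kernel if it is in the current directory.
if [ -f kernel.debug ]; then
    echo "kernel.debug is $(elf_class kernel.debug)-bit"
fi
```

Running file(1) on kernel.debug should report the same information in
more detail.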


On the chance that there's something odd about the different
environments, I tried invoking kgdb on the machine that crashed:

out03# cd /usr/obj/usr/src/sys/SMP_IPFW/
out03# kgdb kernel.debug /var/spool/crash/vmcore.0 
kgdb: kvm_read: 
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
...
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
^Ckvm_read: invalid address (0xffff67e9d231c931)
out03# 

It showed no sign of stopping, and the novelty wore off quickly, so I
interrupted it.  The machine is back in production at the moment, and
I'd prefer to avoid disrupting that.

Checking /var/log/console.log, I see:

...
Aug 13 04:31:28 out03 kernel: 32-bit compatibility ldconfig path: /usr/lib32
Aug 13 04:31:28 out03 kernel: Checking for core dump on /dev/da0s3b...
Aug 13 04:31:28 out03 kernel: savecore: reboot after panic: page fault
Aug 13 04:31:28 out03 kernel: Aug 13 04:31:28 out03 savecore: reboot after panic: page fault
Aug 13 04:31:28 out03 kernel: savecore: writing core to vmcore.0
Aug 13 04:47:17 out03 kernel: Script /etc/rc.d/savecore interrupted
Aug 13 04:47:17 out03 kernel: Initial amd64 initialization:
Aug 13 04:47:17 out03 kernel: .
Aug 13 04:47:17 out03 kernel: Additional ABI support:
...

I hadn't noticed that "Script /etc/rc.d/savecore interrupted" message
before; hmmm...
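
That interruption makes me wonder whether the dump is complete.  As a
rough sanity check, assuming a full dump on this release is roughly the
size of physical memory (an assumption I have not verified), the size of
the core file can be compared with the size of memory.  The name
dump_looks_truncated is hypothetical; the numbers are the vmcore.0 size
from the ls output and the "real memory" figure from the dmesg.boot
output in this message:

```sh
#!/bin/sh
# Hypothetical helper: compare a core file's size with physical memory.
# Usage: dump_looks_truncated CORE_BYTES PHYSMEM_BYTES
# Returns 0 (true) if the core is smaller than memory, 1 otherwise.
# Assumption: a complete full dump is about as large as physical memory.
dump_looks_truncated() {
    core_bytes=$1
    physmem_bytes=$2
    if [ "$core_bytes" -lt "$physmem_bytes" ]; then
        echo "possibly truncated: core is $core_bytes bytes, memory is $physmem_bytes bytes"
        return 0
    fi
    echo "core is at least as large as memory"
    return 1
}

# Sizes reported for vmcore.0 and for real memory on out03.
dump_looks_truncated 2913157120 5100273664
```

On FreeBSD the two arguments could come from
"stat -f %z /var/spool/crash/vmcore.0" and "sysctl -n hw.physmem".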

Here's dmesg.boot from its most recent boot; since I hadn't changed
anything, it should resemble the system from before the crash pretty
closely:

out03# cat /var/run/dmesg.boot 
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Wed Jan 31 09:03:09 PST 2007
    dhw at h239.dhw.mail-abuse.org:/usr/obj/usr/src/sys/SMP_IPFW
ACPI APIC Table: <PTLTD          APIC  >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU            5130  @ 2.00GHz (2000.08-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,<b9>,CX16,<b14>,<b15>,<b18>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory  = 5100273664 (4864 MB)
avail memory = 4122042368 (3931 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  6
 cpu3 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <PTLTD   RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
cpu1: <ACPI CPU> on acpi0
acpi_throttle1: <ACPI CPU Throttling> on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: <ACPI CPU> on acpi0
acpi_throttle2: <ACPI CPU Throttling> on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: <ACPI CPU> on acpi0
acpi_throttle3: <ACPI CPU Throttling> on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci2
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x2000-0x201f mem 0xda000000-0xda01ffff irq 18 at device 0.0 on pci4
em0: Ethernet address: 00:30:48:8b:94:72
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x2020-0x203f mem 0xda020000-0xda03ffff irq 19 at device 0.1 on pci4
em1: Ethernet address: 00:30:48:8b:94:73
pcib5: <ACPI PCI-PCI bridge> at device 0.3 on pci1
pci5: <ACPI PCI bus> on pcib5
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0x3000-0x303f mem 0xd8000000-0xd9ffffff,0xda100000-0xda100fff irq 24 at device 1.0 on pci5
twa0: [FAST]
twa0: INFO: (0x04: 0x003B): Rebuild paused: unit=1
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.04.01.011, BIOS BE9X 3.04.00.002
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci6: <ACPI PCI bus> on pcib6
uhci0: <UHCI (generic) USB controller> port 0x1800-0x181f irq 17 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x1820-0x183f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x1840-0x185f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x1860-0x187f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xda600000-0xda6003ff irq 17 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <EHCI (generic) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
pci7: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1880-0x188f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcc7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
acd0: DMA limited to UDMA33, controller found non-ATA66 cable
acd0: DVDROM <SONY DVD-ROM DDU1615/GYS1> at ata0-master UDMA33
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9550SX-4LP DISK 3.04> Fixed Direct Access SCSI-3 device 
da0: 100.000MB/s transfers
da0: 76283MB (156227584 512 byte sectors: 255H 63S/T 9724C)
da1 at twa0 bus 0 target 1 lun 0
da1: <AMCC 9550SX-4LP DISK 3.04> Fixed Direct Access SCSI-3 device 
da1: 100.000MB/s transfers
da1: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)
Trying to mount root from ufs:/dev/da0s1a
em0: link state changed to UP
twa0: INFO: (0x04: 0x000B): Rebuild started: unit=1
out03# 

Hints and/or clues would be quite welcome; thanks!

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Anything and everything is a (potential) cat toy.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.