kern/66359: repeatable kernel panic during ata(4) device probe
Eugene Grosbein
eugen at grosbein.pp.ru
Fri May 7 11:00:57 PDT 2004
>Number: 66359
>Category: kern
>Synopsis: repeatable kernel panic during ata(4) device probe
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri May 07 11:00:39 PDT 2004
>Closed-Date:
>Last-Modified:
>Originator: Eugene Grosbein
>Release: FreeBSD 4.10-PRERELEASE i386
>Organization:
Svyaz Service JSC
>Environment:
System: FreeBSD grosbein.pp.ru 4.10-PRERELEASE FreeBSD 4.10-PRERELEASE #12: Sat May 8 00:53:46 KRAST 2004 eu at grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV i386
Same for 4.9-RELEASE
Same for 4.8-RELEASE
>Description:
I've found a way to make kernel panic 'reliably'
during device probe stage. It works for 4.8, 4.8 and 4.10-PRE.
I didn't tested 5-CURRENT as I do not have spare CURRENT box.
Today my Maxtor 2B020H1 HDD failed - now it is detected by BIOS
as Maxtor ATHENA. Size and parameters are detected correctly but
the drive aborts all read requests. It seems that its internal
service information is lost and the drive switched into special
'safe technological' mode itself and now should be repaired
in a lab. I will not consider data recovery here.
Better look how ata(4) behaves with the drive. Here is boot log:
>How-To-Repeat:
Take bad hard disk drive that refuses to read its first cylinder
and boot ata(4)-enabled system.
>Fix:
Unknown for me.
>Release-Note:
>Audit-Trail:
>Unformatted:
>> FreeBSD/i386 BOOT
Default: 1:ad(1,a)/boot/loader
boot: -h
Console: serial port
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS drive D: is disk2
BIOS 639kB/523200kB available memory
FreeBSD/i386 bootstrap loader, Revision 0.8
(eu at grosbein.pp.ru, Sun Apr 25 01:02:09 KRAST 2004)
Loading /boot/defaults/loader.conf
/kernel text=0x1eda60 data=0x41b74+0x23940 syms=[0x4+0x332e0+0x4+0x3a3e4]
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [kernel] in 1 second... Booting [kernel]...
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.10-PRERELEASE #12: Sat May 8 00:53:46 KRAST 2004
eu at grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV
Timecounter "i8254" frequency 1193167 Hz
CPU: Intel Celeron (902.03-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x68a Stepping = 10
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 536805376 (524224K bytes)
config> flags atkbd 0x0
config> quit
sio0: gdb debugging port
avail memory = 518496256 (506344K bytes)
Preloaded elf kernel "kernel" at 0xc03c3000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc03c309c.
VESA: v2.0, 65536k memory, flags:0x1, mode table:0xc034cac2 (1000022)
VESA: ATI RADEON 9200
Pentium Pro MTRR support enabled
Using $PIR table, 8 entries at 0xc00fdef0
apm0: <APM BIOS> on motherboard
apm0: found APM BIOS v1.2, connected at v1.2
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
agp0: <Intel 82443BX (440 BX) host to PCI bridge> mem 0xe8000000-0xebffffff at device 0.0 on pci0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <ATI model 5961 graphics accelerator> at 0.0 irq 11
pci1: <ATI model 5941 graphics accelerator> at 0.1
isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xf000-0xf00f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: <Intel 82371AB/EB (PIIX4) USB controller> port 0xe000-0xe01f irq 10 at device 7.2 on pci0
usb0: <Intel 82371AB/EB (PIIX4) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uscanner0: Hewlett-Packard HP ScanJet 2200C, rev 1.10/1.00, addr 2
uhid0: American Power Conversion Back-UPS 500 FW: 6.5.I USB FW: c1, rev 1.10/1.00, addr 3, iclass 3/0
intpm0: <Intel 82371AB Power management controller> port 0x5000-0x500f irq 9 at device 7.3 on pci0
intpm0: I/O mapped 5000
intpm0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
smb0: <SMBus general purpose I/O> on smbus0
intpm0: PM I/O mapped 4000
fxp0: <Intel 82557 Pro/100 Ethernet> port 0xe400-0xe41f mem 0xef000000-0xef0fffff,0xef100000-0xef100fff irq 9 at device 16.0 on pci0
fxp0: Ethernet address 00:a0:c9:89:95:1f
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: <Option ROM> at iomem 0xc0000-0xccfff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model NetMouse/NetScroll Optical, device ID 0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <24 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
pcm0: <Yamaha OPL-SAx> at port 0x220-0x22f,0x530-0x537,0x388-0x38f,0x330-0x331,0x370-0x371 irq 5 drq 0,1 on isa0
DUMMYNET initialized (011031)
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
IPsec: Initialized Security Association Processing.
ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master UDMA33
ad1: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave UDMA33
ad1: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad1: READ command timeout tag=0 serv=0 - resetting
ata0: resetting devices .. done
ad1: READ command timeout tag=0 serv=0 - resetting
ad1: trying fallback to PIO mode
ata0: resetting devices .. done
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x80100004
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc0157f97
stack pointer = 0x10:0xc02f0924
frame pointer = 0x10:0xc02f0924
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio
trap number = 12
panic: page fault
syncing disks...
done
Uptime: 40s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
That was using next settings:
hw.ata.ata_dma: 1
hw.ata.wc: 0
hw.ata.tags: 0
hw.ata.atapi_dma: 0
If I use hw.ata.ata_dma=0 then it boots right:
ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master PIO4
ad1: hard error reading fsbn 40020561 of 0-3 (ad1 bn 40020561; cn 39702 tn 15 sn 0) status=51 error=04
ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave PIO4
And proceedes till multiuser.
It is not possible to get crashdump in this situation so I used
another machine and GDB over serial line.
Here comes debug session script. It correspondes to the moment
after 'Automatic reboot in 15 seconds' message.
As you can see, fault virtual address 0x80100004 is the (junk)
value of request->bp->b_dev in ad_interrupt().
And by the way, is it OK to call splx(s) and return without
calling ATA_UNLOCK_CH(ch) in the ata-all.c:ata_start() inside
'#if NATADISK > 0' block?
Script started on Sat May 8 01:02:20 2004
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf
(kgdb) target remote /dev/cuaa0
Remote debugging using /dev/cuaa0
Debugger (msg=0xc02df569 "manual escape to debugger")
at /usr/local/src/sys/i386/i386/db_interface.c:319
319 in_Debugger = 0;
(kgdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
Debugger (msg=0xc02df569 "manual escape to debugger")
at /usr/local/src/sys/i386/i386/db_interface.c:319
319 in_Debugger = 0;
(kgdb) set radix 16
Input and output radices now set to decimal 16, hex 10, octal 20.
(kgdb) bt full
#0 Debugger (msg=0xc02df569 "manual escape to debugger")
at /usr/local/src/sys/i386/i386/db_interface.c:319
msg = 0x26 <Address 0x26 out of bounds>
in_Debugger = 0x1
#1 0xc026e56e in scgetc (sc=0xc034cd40, flags=0x1)
at /usr/local/src/sys/dev/syscons/syscons.c:3198
scp = (scr_stat *) 0xc0346f80
tp = (struct tty *) 0x26
c = 0x86
this_scr = 0x1
f = 0xffffffff
i = 0x86
#2 0xc026c182 in sccngetch (flags=0x0)
at /usr/local/src/sys/dev/syscons/syscons.c:1565
fkey = {str = '\000' <repeats 15 times>, len = 0x0}
fkeycp = 0x0
scp = (scr_stat *) 0xc0346f80
p = (u_char *) 0xc02e324f ""
cur_mode = 0x1
s = 0xffffffff
c = 0x100
#3 0xc026c002 in sccngetc (dev=0xc0332ea0)
at /usr/local/src/sys/dev/syscons/syscons.c:1488
No locals.
---Type <return> to continue, or q <return> to quit---
#4 0xc017e85f in cngetc () at /usr/local/src/sys/kern/tty_cons.c:433
c = 0xc02e324f
#5 0xc0164875 in shutdown_panic (junk=0x0, howto=0x100)
at /usr/local/src/sys/kern/kern_shutdown.c:435
loop = 0x87
#6 0xc01647cd in boot (howto=0x100)
at /usr/local/src/sys/kern/kern_shutdown.c:378
_el = (struct eventhandler_list *) 0x26
_ep = (struct eventhandler_entry *) 0xc1050e60
howto = 0x100
#7 0xc0164c11 in panic (fmt=0xc02e5c0c "%s")
at /usr/local/src/sys/kern/kern_shutdown.c:656
fmt = 0xc02e5c0c "%s"
bootopt = 0x100
buf = "page fault", '\000' <repeats 245 times>
#8 0xc0280dc0 in trap_fatal (frame=0xc02f08fc, eva=0x80100004)
at /usr/local/src/sys/i386/i386/trap.c:974
frame = (struct trapframe *) 0xc02f08fc
eva = 0x26
code = 0x10
type = 0xc
ss = 0x10
esp = 0x26
softseg = {ssd_base = 0x0, ssd_limit = 0xfffff, ssd_type = 0x1b,
---Type <return> to continue, or q <return> to quit---
ssd_dpl = 0x0, ssd_p = 0x1, ssd_xx = 0xf, ssd_xx1 = 0x2, ssd_def32 = 0x1,
ssd_gran = 0x1}
#9 0xc0280a55 in trap_pfault (frame=0xc02f08fc, usermode=0x0, eva=0x80100004)
at /usr/local/src/sys/i386/i386/trap.c:867
va = 0x80100000
vm = (struct vmspace *) 0x26
map = 0xc
rv = 0x20
ftype = 0x0
p = (struct proc *) 0x0
#10 0xc02805e7 in trap (frame={tf_fs = 0xc1e00010, tf_es = 0x10,
tf_ds = 0x80010, tf_edi = 0xc1e1f200, tf_esi = 0xc1e27000,
tf_ebp = 0xc02f093c, tf_isp = 0xc02f0928, tf_ebx = 0x80100000,
tf_edx = 0xc1e70a40, tf_ecx = 0xc1e04b00, tf_eax = 0x80100000,
tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc0157f97, tf_cs = 0x8,
tf_eflags = 0x90293, tf_esp = 0xc02f095c, tf_ss = 0xc02076ad})
at /usr/local/src/sys/i386/i386/trap.c:466
p = (struct proc *) 0x0
sticks = 0x1
i = 0x0
ucode = 0x0
type = 0xc
code = 0x20
eva = 0x80100004
---Type <return> to continue, or q <return> to quit---
#11 0xc0157f97 in minor (x=0x80100000)
at /usr/local/src/sys/kern/kern_conf.c:198
x = 0x26
#12 0xc02076ad in diskerr (bp=0xc1e1f200, what=0xc02a0ccb "hard error",
pri=0xffffffff, blkdone=0x262aa51, lp=0xc1e2716c)
at /usr/local/src/sys/sys/disklabel.h:459
dev = 0x68c440
bp = (struct buf *) 0xc1e1f200
slice = 0x68c440
part = 0x0
partname = "\000 "
sname = 0xc1e27000 "pKàÁ\001"
sn = 0x68c440
#13 0xc01360eb in ad_interrupt (request=0xc1e70a40)
at /usr/local/src/sys/dev/ata/ata-disk.c:609
adp = (struct ad_softc *) 0xc1e27000
dma_stat = 0xc02a0cbb
#14 0xc012ca1d in ata_intr (data=0xc1e04b00)
at /usr/local/src/sys/dev/ata/ata-all.c:607
ch = (struct ata_channel *) 0xc1e04b00
#15 0xc01cda0d in ipfw_tick (unused=0x0)
at /usr/local/src/sys/netinet/ip_fw2.c:2722
i = 0x400000
s = 0xcc4bb500
---Type <return> to continue, or q <return> to quit---
q = (ipfw_dyn_rule *) 0xc01cda0c
#16 0xc0275353 in doreti_swi ()
No symbol table info available.
(kgdb) frame e
#14 0xc012ca1d in ata_intr (data=0xc1e04b00)
at /usr/local/src/sys/dev/ata/ata-all.c:607
607 if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES)
(kgdb) l
602
603 /* find & call the responsible driver to process this interrupt */
604 switch (ch->active) {
605 #if NATADISK > 0
606 case ATA_ACTIVE_ATA:
607 if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES)
608 return;
609 break;
610 #endif
611 #if DEV_ATAPIALL
(kgdb) frame d
#13 0xc01360eb in ad_interrupt (request=0xc1e70a40)
at /usr/local/src/sys/dev/ata/ata-disk.c:609
609 diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ?
(kgdb) l
604 /* did any real errors happen ? */
605 if ((adp->device->channel->status & ATA_S_ERROR) ||
606 (request->flags & ADR_F_DMA_USED && dma_stat & ATA_BMSTAT_ERROR)) {
607 adp->device->channel->error =
608 ATA_INB(adp->device->channel->r_io, ATA_ERROR);
609 diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ?
610 "UDMA ICRC error" : "hard error", LOG_PRINTF,
611 request->blockaddr + (request->donecount / DEV_BSIZE),
612 &adp->disk.d_label);
613
(kgdb) p request->bp->b_dev
$2 = 0x80100000
More information about the freebsd-bugs
mailing list