kern/66359: repeatable kernel panic during ata(4) device probe

Eugene Grosbein eugen at grosbein.pp.ru
Fri May 7 11:00:57 PDT 2004


>Number:         66359
>Category:       kern
>Synopsis:       repeatable kernel panic during ata(4) device probe
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 07 11:00:39 PDT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Eugene Grosbein
>Release:        FreeBSD 4.10-PRERELEASE i386
>Organization:
Svyaz Service JSC
>Environment:
System: FreeBSD grosbein.pp.ru 4.10-PRERELEASE FreeBSD 4.10-PRERELEASE #12: Sat May 8 00:53:46 KRAST 2004 eu at grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV i386
	Same for 4.9-RELEASE
	Same for 4.8-RELEASE

>Description:

	I have found a way to make the kernel panic reliably
	during the device probe stage. It works on 4.8, 4.9 and 4.10-PRE.
	I have not tested 5-CURRENT, as I do not have a spare CURRENT box.

	Today my Maxtor 2B020H1 HDD failed; the BIOS now detects it
	as Maxtor ATHENA. Size and parameters are detected correctly, but
	the drive aborts all read requests. It seems that its internal
	service information is lost and the drive has switched itself into
	a special 'safe technological' mode, so it now has to be repaired
	in a lab. I will not consider data recovery here.

	What matters here is how ata(4) behaves with this drive. The boot
	log is given below:

>How-To-Repeat:

	Take a bad hard disk drive that refuses to read its first cylinder
	and boot a system with ata(4) enabled.

>Fix:

	Unknown to me.
>Release-Note:
>Audit-Trail:
>Unformatted:
 >> FreeBSD/i386 BOOT
 Default: 1:ad(1,a)/boot/loader
 boot: -h
 Console: serial port
 BIOS drive A: is disk0
 BIOS drive C: is disk1
 BIOS drive D: is disk2
 BIOS 639kB/523200kB available memory
 
 FreeBSD/i386 bootstrap loader, Revision 0.8
 (eu at grosbein.pp.ru, Sun Apr 25 01:02:09 KRAST 2004)
 Loading /boot/defaults/loader.conf 
 /kernel text=0x1eda60 data=0x41b74+0x23940 syms=[0x4+0x332e0+0x4+0x3a3e4]
 Hit [Enter] to boot immediately, or any other key for command prompt.
 Booting [kernel] in 1 second... Booting [kernel]...               
 Copyright (c) 1992-2004 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 	The Regents of the University of California. All rights reserved.
 FreeBSD 4.10-PRERELEASE #12: Sat May  8 00:53:46 KRAST 2004
     eu at grosbein.pp.ru:/usr/local/obj/usr/local/src/sys/DADV
 Timecounter "i8254"  frequency 1193167 Hz
 CPU: Intel Celeron (902.03-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0x68a  Stepping = 10
   Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
 real memory  = 536805376 (524224K bytes)
 config> flags atkbd 0x0
 config> quit
 sio0: gdb debugging port
 avail memory = 518496256 (506344K bytes)
 Preloaded elf kernel "kernel" at 0xc03c3000.
 Preloaded userconfig_script "/boot/kernel.conf" at 0xc03c309c.
 VESA: v2.0, 65536k memory, flags:0x1, mode table:0xc034cac2 (1000022)
 VESA: ATI RADEON 9200
 Pentium Pro MTRR support enabled
 Using $PIR table, 8 entries at 0xc00fdef0
 apm0: <APM BIOS> on motherboard
 apm0: found APM BIOS v1.2, connected at v1.2
 npx0: <math processor> on motherboard
 npx0: INT 16 interface
 pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
 pci0: <PCI bus> on pcib0
 agp0: <Intel 82443BX (440 BX) host to PCI bridge> mem 0xe8000000-0xebffffff at device 0.0 on pci0
 pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
 pci1: <PCI bus> on pcib1
 pci1: <ATI model 5961 graphics accelerator> at 0.0 irq 11
 pci1: <ATI model 5941 graphics accelerator> at 0.1
 isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
 isa0: <ISA bus> on isab0
 atapci0: <Intel PIIX4 ATA33 controller> port 0xf000-0xf00f at device 7.1 on pci0
 ata0: at 0x1f0 irq 14 on atapci0
 ata1: at 0x170 irq 15 on atapci0
 uhci0: <Intel 82371AB/EB (PIIX4) USB controller> port 0xe000-0xe01f irq 10 at device 7.2 on pci0
 usb0: <Intel 82371AB/EB (PIIX4) USB controller> on uhci0
 usb0: USB revision 1.0
 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uscanner0: Hewlett-Packard HP ScanJet 2200C, rev 1.10/1.00, addr 2
 uhid0: American Power Conversion Back-UPS 500 FW: 6.5.I USB FW: c1, rev 1.10/1.00, addr 3, iclass 3/0
 intpm0: <Intel 82371AB Power management controller> port 0x5000-0x500f irq 9 at device 7.3 on pci0
 intpm0: I/O mapped 5000
 intpm0: intr IRQ 9 enabled revision 0
 smbus0: <System Management Bus> on intsmb0
 smb0: <SMBus general purpose I/O> on smbus0
 intpm0: PM I/O mapped 4000 
 fxp0: <Intel 82557 Pro/100 Ethernet> port 0xe400-0xe41f mem 0xef000000-0xef0fffff,0xef100000-0xef100fff irq 9 at device 16.0 on pci0
 fxp0: Ethernet address 00:a0:c9:89:95:1f
 inphy0: <i82555 10/100 media interface> on miibus0
 inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
 orm0: <Option ROM> at iomem 0xc0000-0xccfff on isa0
 pmtimer0 on isa0
 fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
 fdc0: FIFO enabled, 8 bytes threshold
 fd0: <1440-KB 3.5" drive> on fdc0 drive 0
 atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
 atkbd0: <AT Keyboard> irq 1 on atkbdc0
 psm0: <PS/2 Mouse> irq 12 on atkbdc0
 psm0: model NetMouse/NetScroll Optical, device ID 0
 vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
 sc0: <System console> on isa0
 sc0: VGA <24 virtual consoles, flags=0x0>
 sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0
 sio0: type 16550A, console
 sio1 at port 0x2f8-0x2ff irq 3 on isa0
 sio1: type 16550A
 ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
 ppc0: FIFO with 16/16/16 bytes threshold
 lpt0: <Printer> on ppbus0
 lpt0: Interrupt-driven port
 ppi0: <Parallel I/O> on ppbus0
 pcm0: <Yamaha OPL-SAx> at port 0x220-0x22f,0x530-0x537,0x388-0x38f,0x330-0x331,0x370-0x371 irq 5 drq 0,1 on isa0
 DUMMYNET initialized (011031)
 ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to deny, logging unlimited
 IPsec: Initialized Security Association Processing.
 ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master UDMA33
 ad1: READ command timeout tag=0 serv=0 - resetting
 ata0: resetting devices .. done
 ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave UDMA33
 ad1: READ command timeout tag=0 serv=0 - resetting
 ata0: resetting devices .. done
 ad1: READ command timeout tag=0 serv=0 - resetting
 ata0: resetting devices .. done
 ad1: READ command timeout tag=0 serv=0 - resetting
 ad1: trying fallback to PIO mode
 ata0: resetting devices .. done
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address	= 0x80100004
 fault code		= supervisor read, page not present
 instruction pointer	= 0x8:0xc0157f97
 stack pointer	        = 0x10:0xc02f0924
 frame pointer	        = 0x10:0xc02f0924
 code segment		= base 0x0, limit 0xfffff, type 0x1b
 			= DPL 0, pres 1, def32 1, gran 1
 processor eflags	= interrupt enabled, resume, IOPL = 0
 current process		= Idle
 interrupt mask		= bio 
 trap number		= 12
 panic: page fault
 
 syncing disks... 
 done
 Uptime: 40s
 Automatic reboot in 15 seconds - press a key on the console to abort
 Rebooting...
 
 
 	That was with the following settings:
 
 hw.ata.ata_dma: 1
 hw.ata.wc: 0
 hw.ata.tags: 0
 hw.ata.atapi_dma: 0
 
 	If I use hw.ata.ata_dma=0, then it boots correctly:
 
 ad0: 6149MB <WDC AC26400B> [13328/15/63] at ata0-master PIO4
 ad1: hard error reading fsbn 40020561 of 0-3 (ad1 bn 40020561; cn 39702 tn 15 sn 0) status=51 error=04
 ad1: 19541MB <Maxtor ATHENA> [39703/16/63] at ata0-slave PIO4
 
 	And it proceeds to multiuser mode.
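 
 	The block number in the 'hard error' message can be checked against
 	the geometry that ata(4) reports for ad1 ([39703/16/63]). The sketch
 	below is only an illustration of that arithmetic: the function names
 	chs_to_bn() and total_sectors() are made up for this example, and it
 	assumes that the 'sn' value in the message counts sectors from 0,
 	which is the reading under which the numbers in the log agree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a cylinder/track/sector triple to a linear block number.
 * ASSUMPTION: the sector within the track is 0-based, as in "sn 0"
 * from the log, so no "- 1" adjustment is applied.
 */
uint32_t
chs_to_bn(uint32_t cn, uint32_t tn, uint32_t sn,
    uint32_t heads, uint32_t sectors)
{
	/* Reject coordinates that do not fit the geometry. */
	assert(tn < heads && sn < sectors);
	return ((cn * heads + tn) * sectors + sn);
}

/* Total number of sectors for a cylinders/heads/sectors geometry. */
uint32_t
total_sectors(uint32_t cyls, uint32_t heads, uint32_t sectors)
{
	return (cyls * heads * sectors);
}
```

 	With the values from the log, chs_to_bn(39702, 15, 0, 16, 63) gives
 	40020561, which is 63 sectors short of total_sectors(39703, 16, 63)
 	= 40020624, so the reported block is the first sector of the last
 	track. The same number appears as blkdone=0x262aa51 in the diskerr()
 	frame of the backtrace.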
 
 	It is not possible to get a crash dump in this situation, so I used
 	another machine and GDB over a serial line.
 	Here is the script of the debug session. It corresponds to the moment
 	after the 'Automatic reboot in 15 seconds' message.
 	As you can see, the fault virtual address 0x80100004 comes from the
 	(junk) value 0x80100000 of request->bp->b_dev in ad_interrupt().
 
 	And by the way, is it OK to call splx(s) and return without
 	calling ATA_UNLOCK_CH(ch) in ata-all.c:ata_start(), inside the
 	'#if NATADISK > 0' block?
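 
 	The question above is about an early return that restores the
 	interrupt level but leaves the channel locked. Whether that is a bug
 	depends on whether some other path releases the channel later. The
 	toy model below contains no kernel code; every name in it
 	(toy_channel, toy_lock, toy_start, toy_complete) is hypothetical,
 	and it is an assumption, not something this report establishes,
 	that the real driver has a matching completion path.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL toy channel: one flag stands in for the channel lock. */
struct toy_channel {
	bool locked;		/* stand-in for what ATA_LOCK_CH() takes */
	bool op_in_flight;	/* true while a transfer awaits completion */
};

/* Try to take the channel; fails if it is already held. */
bool
toy_lock(struct toy_channel *ch)
{
	if (ch->locked)
		return (false);
	ch->locked = true;
	return (true);
}

/* Release the channel unconditionally. */
void
toy_unlock(struct toy_channel *ch)
{
	ch->locked = false;
}

/*
 * Start path: returns true if a transfer was started.  When the
 * transfer continues asynchronously, the function returns early with
 * the lock still held.
 */
bool
toy_start(struct toy_channel *ch, bool transfer_continues)
{
	if (!toy_lock(ch))
		return (false);		/* channel busy, nothing started */
	if (transfer_continues) {
		ch->op_in_flight = true;
		return (true);		/* early return, lock NOT released */
	}
	toy_unlock(ch);			/* nothing pending, release at once */
	return (false);
}

/* Completion path: the only place that drops a lock kept by toy_start(). */
void
toy_complete(struct toy_channel *ch)
{
	ch->op_in_flight = false;
	toy_unlock(ch);
}
```

 	In this model the early return is harmless only if toy_complete()
 	is eventually called; if it never runs, every later toy_start()
 	fails to take the channel.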
 
 Script started on Sat May  8 01:02:20 2004
 GNU gdb 4.18 (FreeBSD)
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
 Deprecated bfd_read called at /usr/local/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf
 
 (kgdb) target remote /dev/cuaa0
 Remote debugging using /dev/cuaa0
 Debugger (msg=0xc02df569 "manual escape to debugger")
     at /usr/local/src/sys/i386/i386/db_interface.c:319
 319		    in_Debugger = 0;
 (kgdb) c
 Continuing.
 
 Program received signal SIGTRAP, Trace/breakpoint trap.
 Debugger (msg=0xc02df569 "manual escape to debugger")
     at /usr/local/src/sys/i386/i386/db_interface.c:319
 319		    in_Debugger = 0;
 (kgdb) set radix 16
 Input and output radices now set to decimal 16, hex 10, octal 20.
 (kgdb) bt full
 #0  Debugger (msg=0xc02df569 "manual escape to debugger")
     at /usr/local/src/sys/i386/i386/db_interface.c:319
 	msg = 0x26 <Address 0x26 out of bounds>
 	in_Debugger = 0x1
 #1  0xc026e56e in scgetc (sc=0xc034cd40, flags=0x1)
     at /usr/local/src/sys/dev/syscons/syscons.c:3198
 	scp = (scr_stat *) 0xc0346f80
 	tp = (struct tty *) 0x26
 	c = 0x86
 	this_scr = 0x1
 	f = 0xffffffff
 	i = 0x86
 #2  0xc026c182 in sccngetch (flags=0x0)
     at /usr/local/src/sys/dev/syscons/syscons.c:1565
 	fkey = {str = '\000' <repeats 15 times>, len = 0x0}
 	fkeycp = 0x0
 	scp = (scr_stat *) 0xc0346f80
 	p = (u_char *) 0xc02e324f ""
 	cur_mode = 0x1
 	s = 0xffffffff
 	c = 0x100
 #3  0xc026c002 in sccngetc (dev=0xc0332ea0)
     at /usr/local/src/sys/dev/syscons/syscons.c:1488
 No locals.
 ---Type <return> to continue, or q <return> to quit---
 #4  0xc017e85f in cngetc () at /usr/local/src/sys/kern/tty_cons.c:433
 	c = 0xc02e324f
 #5  0xc0164875 in shutdown_panic (junk=0x0, howto=0x100)
     at /usr/local/src/sys/kern/kern_shutdown.c:435
 	loop = 0x87
 #6  0xc01647cd in boot (howto=0x100)
     at /usr/local/src/sys/kern/kern_shutdown.c:378
 	_el = (struct eventhandler_list *) 0x26
 	_ep = (struct eventhandler_entry *) 0xc1050e60
 	howto = 0x100
 #7  0xc0164c11 in panic (fmt=0xc02e5c0c "%s")
     at /usr/local/src/sys/kern/kern_shutdown.c:656
 	fmt = 0xc02e5c0c "%s"
 	bootopt = 0x100
 	buf = "page fault", '\000' <repeats 245 times>
 #8  0xc0280dc0 in trap_fatal (frame=0xc02f08fc, eva=0x80100004)
     at /usr/local/src/sys/i386/i386/trap.c:974
 	frame = (struct trapframe *) 0xc02f08fc
 	eva = 0x26
 	code = 0x10
 	type = 0xc
 	ss = 0x10
 	esp = 0x26
 	softseg = {ssd_base = 0x0, ssd_limit = 0xfffff, ssd_type = 0x1b, 
 ---Type <return> to continue, or q <return> to quit---
   ssd_dpl = 0x0, ssd_p = 0x1, ssd_xx = 0xf, ssd_xx1 = 0x2, ssd_def32 = 0x1, 
   ssd_gran = 0x1}
 #9  0xc0280a55 in trap_pfault (frame=0xc02f08fc, usermode=0x0, eva=0x80100004)
     at /usr/local/src/sys/i386/i386/trap.c:867
 	va = 0x80100000
 	vm = (struct vmspace *) 0x26
 	map = 0xc
 	rv = 0x20
 	ftype = 0x0
 	p = (struct proc *) 0x0
 #10 0xc02805e7 in trap (frame={tf_fs = 0xc1e00010, tf_es = 0x10, 
       tf_ds = 0x80010, tf_edi = 0xc1e1f200, tf_esi = 0xc1e27000, 
       tf_ebp = 0xc02f093c, tf_isp = 0xc02f0928, tf_ebx = 0x80100000, 
       tf_edx = 0xc1e70a40, tf_ecx = 0xc1e04b00, tf_eax = 0x80100000, 
       tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc0157f97, tf_cs = 0x8, 
       tf_eflags = 0x90293, tf_esp = 0xc02f095c, tf_ss = 0xc02076ad})
     at /usr/local/src/sys/i386/i386/trap.c:466
 	p = (struct proc *) 0x0
 	sticks = 0x1
 	i = 0x0
 	ucode = 0x0
 	type = 0xc
 	code = 0x20
 	eva = 0x80100004
 ---Type <return> to continue, or q <return> to quit---
 #11 0xc0157f97 in minor (x=0x80100000)
     at /usr/local/src/sys/kern/kern_conf.c:198
 	x = 0x26
 #12 0xc02076ad in diskerr (bp=0xc1e1f200, what=0xc02a0ccb "hard error", 
     pri=0xffffffff, blkdone=0x262aa51, lp=0xc1e2716c)
     at /usr/local/src/sys/sys/disklabel.h:459
 	dev = 0x68c440
 	bp = (struct buf *) 0xc1e1f200
 	slice = 0x68c440
 	part = 0x0
 	partname = "\000 "
 	sname = 0xc1e27000 "pKàÁ\001"
 	sn = 0x68c440
 #13 0xc01360eb in ad_interrupt (request=0xc1e70a40)
     at /usr/local/src/sys/dev/ata/ata-disk.c:609
 	adp = (struct ad_softc *) 0xc1e27000
 	dma_stat = 0xc02a0cbb
 #14 0xc012ca1d in ata_intr (data=0xc1e04b00)
     at /usr/local/src/sys/dev/ata/ata-all.c:607
 	ch = (struct ata_channel *) 0xc1e04b00
 #15 0xc01cda0d in ipfw_tick (unused=0x0)
     at /usr/local/src/sys/netinet/ip_fw2.c:2722
 	i = 0x400000
 	s = 0xcc4bb500
 ---Type <return> to continue, or q <return> to quit---
 	q = (ipfw_dyn_rule *) 0xc01cda0c
 #16 0xc0275353 in doreti_swi ()
 No symbol table info available.
 (kgdb) frame e
 #14 0xc012ca1d in ata_intr (data=0xc1e04b00)
     at /usr/local/src/sys/dev/ata/ata-all.c:607
 607		if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES)
 (kgdb) l
 602	
 603	    /* find & call the responsible driver to process this interrupt */
 604	    switch (ch->active) {
 605	#if NATADISK > 0
 606	    case ATA_ACTIVE_ATA:
 607		if (!ch->running || ad_interrupt(ch->running) == ATA_OP_CONTINUES)
 608		    return;
 609		break;
 610	#endif
 611	#if DEV_ATAPIALL
 (kgdb) frame d
 #13 0xc01360eb in ad_interrupt (request=0xc1e70a40)
     at /usr/local/src/sys/dev/ata/ata-disk.c:609
 609		diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ?
 (kgdb) l
 604	    /* did any real errors happen ? */
 605	    if ((adp->device->channel->status & ATA_S_ERROR) ||
 606		(request->flags & ADR_F_DMA_USED && dma_stat & ATA_BMSTAT_ERROR)) {
 607		adp->device->channel->error =
 608		    ATA_INB(adp->device->channel->r_io, ATA_ERROR);
 609		diskerr(request->bp, (adp->device->channel->error & ATA_E_ICRC) ?
 610			"UDMA ICRC error" : "hard error", LOG_PRINTF,
 611			request->blockaddr + (request->donecount / DEV_BSIZE),
 612			&adp->disk.d_label);
 613	
 (kgdb) p request->bp->b_dev
 $2 = 0x80100000
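 
 	The backtrace shows minor() entered with x=0x80100000 while the trap
 	reports eva=0x80100004, four bytes higher. That is what a read of a
 	structure member at offset 4 through the junk pointer would produce.
 	The sketch below reproduces only that arithmetic. struct
 	fake_specinfo and udev_load_address() are hypothetical names, and
 	the layout (one 32-bit member ahead of the device number) is an
 	assumption chosen to match the observed offset, not a copy of the
 	kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL stand-in for the structure a dev_t points to.
 * Only the assumption "a 32-bit member precedes the device number"
 * matters here.
 */
struct fake_specinfo {
	uint32_t flags;	/* assumed first member, 4 bytes */
	uint32_t udev;	/* assumed member that minor() reads */
};

/* Address that a load of the udev member touches for a given pointer. */
uint32_t
udev_load_address(uint32_t dev_pointer)
{
	return (dev_pointer + (uint32_t)offsetof(struct fake_specinfo, udev));
}
```

 	Under that assumption, udev_load_address(0x80100000) is 0x80100004,
 	which matches the 'fault virtual address' line of the panic message.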
 
 

