panic on boot

John Baldwin jhb at freebsd.org
Thu Dec 23 14:10:37 UTC 2010


On Thursday, December 23, 2010 8:33:13 am Daniel Braniss wrote:
> > On Thursday, December 23, 2010 1:47:39 am Daniel Braniss wrote:
> > > > On Wednesday, December 22, 2010 10:58:56 am Daniel Braniss wrote:
> > > > > > On Wednesday, December 22, 2010 5:12:03 am Daniel Braniss wrote:
> > > > > > > the hardware is Sun Fire X2200 M2, and it's diskless, PXE booted.
> > > > > > > 
> > > > > > > this seems to have started sometime before 8.2, and it
> > > > > > > 'sometimes happens':
> > > > > > > 
> > > > > > > FreeBSD 8.2-PRERELEASE #15 r4274: Wed Dec 22 09:11:27 IST 2010
> > > > > > >     danny at rnd:/home/obj/rnd/r+d/stable/8/sys/HUJI amd64
> > > > > > > Timecounter "i8254" frequency 1193182 Hz quality 0
> > > > > > > CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2613.40-MHz K8-class CPU)
> > > > > > >   Origin = "AuthenticAMD"  Id = 0x40f13  Family = f  Model = 41  Stepping = 3
> > > > > > >   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
> > > > > > > CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
> > > > > > >   Features2=0x2001<SSE3,CX16>
> > > > > > >   AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
> > > > > > >   AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
> > > > > > > ...
> > > > > > > SMP: AP CPU #3 Launched!
> > > > > > > (cd0:ata0:0:0:0): SCSI status: Check Condition
> > > > > > > cpu3 AP:
> > > > > > > (cd0:ata0:0:0:0): SCSI sense: NOT READY asc:3a,0 (Medium not present)
> > > > > > >      ID: 0x03000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
> > > > > > > (cd0:  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
> > > > > > > ata0:0:  timer: 0x000200ef therm: 0x00010000 err: 0x000000f00: pmc: 0x000104000): 
> > > > > > > Error 6, Unretryable error
> > > > > > > SMP: AP CPU #2 Launched!
> > > > > > > cd0 at ata0 bus 0 scbus0 target 0 lun 0
> > > > > > > cpu2 AP:
> > > > > > > cd0:      ID: 0x02000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
> > > > > > > <TEAC DV-28E-N P.6A> Removable CD-ROM SCSI-0 device 
> > > > > > >   lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
> > > > > > > cd0: 33.300MB/s transfers  timer: 0x000200ef therm: 0x00010000 err: 0x000000f0 ( pmc: 0x00010400UDMA2, 
> > > > > > > ATAPI 12bytes, ioapic0: routing intpin 3 (PIO 65534bytesISA IRQ 3)) to lapic 1 vector 48
> > > > > > > f
> > > > > > > loiwotaapbilce0 :c lreoaunteirn gs tianrttpeidn
> > > > > > >  4 (cd0: Attempt to query device size failed: NOT READY, Medium not present
> > > > > > > ISA IRQ 4) to lapic 2 vector 48
> > > > > > > ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 3 vector 48
> > > > > > > ioapic0: routing intpin 15 (ISA IRQ 15) to lapic 1 vector 49
> > > > > > > ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 2 vector 49
> > > > > > > ioapic0: routing intpin 18 (PCI IRQ 18) to lapic 3 vector 49
> > > > > > > ioapic0: routing intpin 22 (PCI IRQ 22) to lapic 1 vector 50
> > > > > > > ioapic0: routing intpin 23 (PCI IRQ 23) to lapic 2 vector 50
> > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > 
> > > > > > > 
> > > > > > > Fatal trap 12: page fault while in kernel mode
> > > > > > > cpuid = 0; apic id = 00
> > > > > > > fault virtual address   = 0x10
> > > > > > > fault code              = supervisor read data, page not present
> > > > > > > instruction pointer     = 0x20:0xffffffff808b1581
> > > > > > > stack pointer           = 0x28:0xffffffff80ef5b20
> > > > > > > frame pointer           = 0x28:0xffffffff80ef5b50
> > > > > > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > > > > > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > > processor eflags        = resume, IOPL = 0
> > > > > > > current process         = 0 (swapper)
> > > > > > > trap number             = 12
> > > > > > > panic: page fault
> > > > > > > cpuid = 0
> > > > > > > KDB: stack backtrace:
> > > > > > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > > > > > kdb_backtrace() at kdb_backtrace+0x37
> > > > > > > panic() at panic+0x187
> > > > > > > trap_fatal() at trap_fatal+0x290
> > > > > > > trap_pfault() at trap_pfault+0x28f
> > > > > > > trap() at trap+0x3df
> > > > > > > calltrap() at calltrap+0x8
> > > > > > > --- trap 0xc, rip = 0xffffffff808b1581, rsp = 0xffffffff80ef5b20, rbp = 0xffffffff80ef5b50 ---
> > > > > > > intr_execute_handlers() at intr_execute_handlers+0x21
> > > > > > > lapic_handle_intr() at lapic_handle_intr+0x37
> > > > > > > Xapic_isr1() at Xapic_isr1+0xa5
> > > > > > > --- interrupt, rip = 0xffffffff808b6cf3, rsp = 0xffffffff80ef5c40, rbp = 0xffffffff80ef5c60 ---
> > > > > > > spinlock_exit() at spinlock_exit+0x33
> > > > > > > ioapic_assign_cpu() at ioapic_assign_cpu+0x123
> > > > > > > intr_shuffle_irqs() at intr_shuffle_irqs+0x9d
> > > > > > > mi_startup() at mi_startup+0x77
> > > > > > > btext() at btext+0x2c
> > > > > > > Uptime: 2s
> > > > > > 
> > > > > > Can you do 'l *intr_execute_handlers+0x21' and 'l *ioapic_assign_cpu+0x123'
> > > > > > in 'gdb kernel.debug' of your kernel?
> > > > > 
> > > > > sure, as soon as it happens, and it isn't happening now :-(
> > > > > but when it does happen, I don't think it will let me into the debugger
> > > > > - I'll probably have to recompile
> > > > 
> > > > You don't need to trigger the panic; you can just run
> > > > 'gdb /path/to/kernel.debug' (e.g.
> > > > 'gdb /usr/obj/usr/src/sys/GENERIC/kernel.debug').
> > > sorry, missed the gdb part.
> > > 
> > > gdb /d/7/boot/kernel/kernel
> > > ...
> > > (gdb) l *intr_execute_handlers+0x21
> > > 0xffffffff808b1581 is in intr_execute_handlers (/r+d/stable/8/sys/amd64/amd64/intr_machdep.c:243).
> > > 238              * We count software interrupts when we process them.  The
> > > 239              * code here follows previous practice, but there's an
> > > 240              * argument for counting hardware interrupts when they're
> > > 241              * processed too.
> > > 242              */
> > > 243             (*isrc->is_count)++;
> > > 244             PCPU_INC(cnt.v_intr);
> > > 245     
> > > 246             ie = isrc->is_event;
> > > 247     
> > > (gdb) l *ioapic_assign_cpu+0x123
> > > 0xffffffff808b29c3 is in ioapic_assign_cpu (/r+d/stable/8/sys/amd64/amd64/io_apic.c:383).
> > > 378     
> > > 379             /*
> > > 380              * Free the old vector after the new one is established.  This is done
> > > 381              * to prevent races where we could miss an interrupt.
> > > 382              */
> > > 383             if (old_vector) {
> > > 384                     if (isrc->is_handlers > 0)
> > > 385                             apic_disable_vector(old_id, old_vector);
> > > 386                     apic_free_vector(old_id, old_vector, intpin->io_irq);
> > > 387             }
> > > 
> > > BTW, the config has
> > > 	makeoptions     DEBUG=-g
> > > but I don't see any kernel.debug (searched the obj directory, and only found 
> > > old versions)
> > 
> > Hmmm, can you get a crashdump?  If not, try this patch:
> > 
> > Index: local_apic.c
> > ===================================================================
> > --- local_apic.c	(revision 216651)
> > +++ local_apic.c	(working copy)
> > @@ -763,6 +763,9 @@
> >  		panic("Couldn't get vector from ISR!");
> >  	isrc = intr_lookup_source(apic_idt_to_irq(PCPU_GET(apic_id),
> >  	    vector));
> > +	if (isrc == NULL)
> > +		panic("null isrc for APIC %d, vector %d", PCPU_GET(apic_id),
> > +		    vector);
> >  	intr_execute_handlers(isrc, frame);
> >  }
> >  
> > If it triggers this new panic, capture the panic message and the output of
> > 'show apic' from DDB.
> > 
> > -- 
> > John Baldwin
> 
> Because the reboot hung :-), I was able to break into DDB with ~^B, and so:
> 
> db> show apic
> Interrupts bound to lapic 0
> vec 0x31 -> IRQ 21
> vec 0x33 -> IRQ 14
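
As a side note on the original trap: the fault address 0x10 is what a NULL
isrc would produce at intr_machdep.c:243, provided is_count sits 0x10 bytes
into struct intsrc.  The sketch below is only a model under that assumption.
struct intsrc_model, its member order, and model_vector_to_irq() are
hypothetical stand-ins, not the kernel's definitions; the table just mirrors
the two bindings shown above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the kernel's struct intsrc.
 * Assumption: two pointer-sized members precede is_count.
 */
struct intsrc_model {
	void		*is_pic;	/* assumed first member */
	void		*is_event;	/* read at intr_machdep.c:246 in the listing */
	unsigned long	*is_count;	/* dereferenced at intr_machdep.c:243 */
};

/* Address the CPU would read for (*isrc->is_count)++ when isrc is NULL. */
static size_t
null_isrc_fault_offset(void)
{
	return (offsetof(struct intsrc_model, is_count));
}

#define	MODEL_NO_IRQ	(-1)

/* Mirrors the 'show apic' output: 0x31 and 0x33 bound, everything else free. */
static int
model_vector_to_irq(unsigned int vector)
{
	assert(vector < 256);		/* an x86 IDT has 256 vectors */
	switch (vector) {
	case 0x31:
		return (21);
	case 0x33:
		return (14);
	default:
		return (MODEL_NO_IRQ);
	}
}
```

On amd64, where pointers are 8 bytes, null_isrc_fault_offset() comes out as
0x10 for this assumed layout, and model_vector_to_irq(0x32) reports no IRQ,
so a lookup for that vector would have nothing to return.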

Ok, so vec 0x32 was recently moved.  Can you try this patch?

Index: sys/x86/x86/io_apic.c
===================================================================
--- io_apic.c	(revision 216651)
+++ io_apic.c	(working copy)
@@ -359,7 +359,9 @@
 	if (!intpin->io_masked && !intpin->io_edgetrigger) {
 		ioapic_write(io->io_addr, IOAPIC_REDTBL_LO(intpin->io_intpin),
 		    intpin->io_lowreg | IOART_INTMSET);
+		mtx_unlock_spin(&icu_lock);
 		DELAY(100);
+		mtx_lock_spin(&icu_lock);
 	}
 
 	intpin->io_cpu = apic_id;
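
The idea behind dropping icu_lock around the DELAY() can be sketched with a
toy model.  A spin lock holder runs with interrupts disabled, so an interrupt
latched just before the pin is masked cannot be serviced until the lock is
released.  Everything here (struct cpu_model, the model_* functions) is
hypothetical and far simpler than the real io_apic.c; whether this window is
the actual cause of the panic is what the patch is meant to test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one CPU; none of these names exist in the kernel. */
struct cpu_model {
	bool	intr_enabled;		/* false while a spin lock is held */
	bool	pending;		/* interrupt latched but not yet serviced */
	bool	in_window;		/* inside the masked-pin delay window */
	bool	serviced_in_window;	/* serviced while the pin was still masked */
};

/* Service a latched interrupt; only possible with interrupts enabled. */
static void
model_poll(struct cpu_model *c)
{
	if (c->intr_enabled && c->pending) {
		c->pending = false;
		if (c->in_window)
			c->serviced_in_window = true;
	}
}

static void
model_lock_spin(struct cpu_model *c)
{
	assert(c->intr_enabled);	/* no lock nesting in this toy model */
	c->intr_enabled = false;
}

static void
model_unlock_spin(struct cpu_model *c)
{
	c->intr_enabled = true;
	model_poll(c);
}

/*
 * Walk the masked-pin sequence with an interrupt latched just before the
 * mask.  Returns true if it was serviced inside the delay window.
 */
static bool
model_assign_cpu(bool drop_lock_around_delay)
{
	struct cpu_model c = { .intr_enabled = true };

	model_lock_spin(&c);
	c.pending = true;		/* latched before the pin is masked */
	c.in_window = true;		/* pin masked, delay starts */
	if (drop_lock_around_delay) {
		model_unlock_spin(&c);	/* interrupts on: latched interrupt drains */
		model_lock_spin(&c);
	}
	c.in_window = false;		/* delay over; pin would be reprogrammed here */
	model_unlock_spin(&c);		/* without the drop, it drains only now */
	assert(!c.pending);
	return (c.serviced_in_window);
}
```

In this model, model_assign_cpu(true) services the latched interrupt inside
the delay window, while model_assign_cpu(false) leaves it pending until the
final unlock, after the pin would already have been moved.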

-- 
John Baldwin

