panic in em_txeof

Thu Sep 28 17:06:34 PDT 2006

On Thu, Sep 28, 2006 at 01:09:05PM +0200, Michiel Boland wrote:
 > -CURRENT from 25 Sept. (if_em.c has rev 1.147)
 > 
 > em, connected to cisco 2950 at 100 Mb full/duplex with TSO disabled.
 > 
 > At high load, the card stopped passing network traffic. After I 
 > ifconfig-ed the interface down and up again, I got this panic.
 > 
 > Obviously neither the network card malfunction or the panic are any good. 
 > I hope someone can figure out what's going on.
 > 
 > Cheers
 > Michiel
 > 
 > Fatal trap 12: page fault while in kernel mode
 > fault virtual address	= 0x568
 > fault code		= supervisor read, page not present
 > instruction pointer	= 0x20:0xc0464d9a
 > stack pointer	        = 0x28:0xd3358c50
 > frame pointer	        = 0x28:0xd3358c64
 > code segment		= base 0x0, limit 0xfffff, type 0x1b
 > 			= DPL 0, pres 1, def32 1, gran 1
 > processor eflags	= interrupt enabled, resume, IOPL = 0
 > current process		= 11 (swi4: clock sio)
 > trap number		= 12
 > panic: page fault
 > KDB: stack backtrace:
 > kdb_backtrace(100,c20736c0,28,d3358c10,c,...) at kdb_backtrace+0x29
 > panic(c063a952,c065b591,0,0,fffff,...) at panic+0xa8
 > trap_fatal(d3358c10,568,c20736c0,c069d0a0,0,...) at trap_fatal+0x2b6
 > trap_pfault(d3358c10,0,568) at trap_pfault+0x1cb
 > trap(d3350008,c04f0028,c2150028,568,ad,...) at trap+0x38d
 > calltrap() at calltrap+0x5
 > --- trap 0xc, eip = 0xc0464d9a, esp = 0xd3358c50, ebp = 0xd3358c64 ---
 > em_txeof(c20f1000) at em_txeof+0x86
 > em_watchdog(c2131000) at em_watchdog+0xa6
 > if_slowtimo(0) at if_slowtimo+0x66
 > softclock(0) at softclock+0x252
 > ithread_execute_handlers(c2072b04,c2070500) at 
 > ithread_execute_handlers+0x125
 > ithread_loop(c20426c0,d3358d38) at ithread_loop+0x54
 > fork_exit(c04cea10,c20426c0,d3358d38) at fork_exit+0x7a
 > fork_trampoline() at fork_trampoline+0x8
 > --- trap 0x1, eip = 0, esp = 0xd3358d6c, ebp = 0 ---
 > Uptime: 2d23h21m50s
 > Physical memory: 505 MB
 > Dumping 117 MB: 102 86 (CTRL-C to abort)  70 54 38 22 (CTRL-C to abort)  
 > (CTRL-C to abort)  6
 > 
 > #0  doadump () at pcpu.h:166
 > 166		__asm __volatile("movl %%fs:0,%0" : "=r" (td));
 > (kgdb) bt
 > #0  doadump () at pcpu.h:166
 > #1  0xc04e3ca4 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 > #2  0xc04e3f6c in panic (fmt=0xc063a952 "%s") at 
 > /usr/src/sys/kern/kern_shutdown.c:565
 > #3  0xc0616d0a in trap_fatal (frame=0xd3358c10, eva=1384) at 
 > /usr/src/sys/i386/i386/trap.c:867
 > #4  0xc0616a2b in trap_pfault (frame=0xd3358c10, usermode=0, eva=1384) at 
 > /usr/src/sys/i386/i386/trap.c:776
 > #5  0xc0616625 in trap (frame=
 >       {tf_fs = -751501304, tf_es = -1068564440, tf_ds = -1038811096, tf_edi 
 >       = 1384, tf_esi = 173, tf_ebp = -751465372, tf_isp = -751465412, 
 >       tf_ebx = -1038800176, tf_edx = -1039200256, tf_ecx = -865996036, 
 >       tf_eax = 2768, tf_trapno = 12, tf_err = 0, tf_eip = -1069134438, 
 >       tf_cs = 32, tf_eflags = 66054, tf_esp = -1038938112, tf_ss = 231}) at 
 >       /usr/src/sys/i386/i386/trap.c:461
 > #6  0xc060759a in calltrap () at /usr/src/sys/i386/i386/exception.s:138
 > #7  0xc0464d9a in em_txeof (adapter=0xc20f1000) at 
 > /usr/src/sys/dev/em/if_em.c:2956
 > #8  0xc0461ace in em_watchdog (ifp=0xc2131000) at 
 > /usr/src/sys/dev/em/if_em.c:963
 > #9  0xc05576de in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1415
 > #10 0xc04f1ac2 in softclock (dummy=0x0) at 
 > /usr/src/sys/kern/kern_timeout.c:271
 > #11 0xc04ce955 in ithread_execute_handlers (p=0xc2072b04, ie=0xc2070500) at 
 > /usr/src/sys/kern/kern_intr.c:662
 > #12 0xc04cea64 in ithread_loop (arg=0xc20426c0) at 
 > /usr/src/sys/kern/kern_intr.c:745
 > #13 0xc04cd8b6 in fork_exit (callout=0xc04cea10 <ithread_loop>, 
 > arg=0xc20426c0, frame=0xd3358d38) at /usr/src/sys/kern/kern_fork.c:818
 > #14 0xc06075fc in fork_trampoline () at 
 > /usr/src/sys/i386/i386/exception.s:199
 > (kgdb) f 7
 > #7  0xc0464d9a in em_txeof (adapter=0xc20f1000) at 
 > /usr/src/sys/dev/em/if_em.c:2956
 > 2956			num_avail++;
 > (kgdb) info locals
 > i = 173
 > num_avail = 231
 > tx_buffer = (struct em_buffer *) 0x568
 > tx_desc = (struct em_tx_desc *) 0xc2152ad0
 > ifp = (struct ifnet *) 0xc2131000

As Jack said, I am not sure how tx_buffer can end up with a bogus value.
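
One detail in the dump may narrow this down, offered as an observation
rather than a diagnosis: 0x568 is 1384 decimal, which is exactly
i (173) times 8. That is the address you would get by indexing slot 173
of an array whose base pointer is NULL, if each element were 8 bytes.
The 8-byte element size and the helper name below are assumptions for
illustration only; they have not been checked against struct em_buffer
in rev 1.147.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not driver code: compute the address of slot
 * `index` in an array that starts at `base` and holds elements of
 * `elem_size` bytes each.
 */
uintptr_t
slot_address(uintptr_t base, size_t index, size_t elem_size)
{
	assert(elem_size != 0);
	return (base + (uintptr_t)index * elem_size);
}
```

With a NULL base, slot_address(0, 173, 8) gives 0x568, matching both
tx_buffer and the fault virtual address in the report.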

Since the switch to adaptive polling in em(4), em_rxeof() runs without
locks held. If you force the interface down while em_rxeof() is in
progress, that could corrupt the softc or the hardware state. This is
only a vague guess, since no other users have reported this kind of issue.
Removing the em_txeof() call from em_watchdog() may help in your case
(em_watchdog() will eventually reset the hardware anyway), but I don't
think it is the correct fix for the root cause.
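
To make the suspected race concrete, here is a minimal user-space
sketch. It is not em(4) code: pthreads stands in for the kernel's
mutexes, and every name in it (struct ring, ring_init, ring_stop,
ring_clean) is hypothetical. The point it shows is that the teardown
path and the cleanup path take the same lock, and the cleanup path
re-checks the buffer array under that lock before indexing it.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-ring state. */
struct ring {
	pthread_mutex_t lock;
	int *bufs;		/* NULL once the ring has been stopped */
	size_t n;
};

/* Allocate the buffer array and the lock; returns 0 on success. */
int
ring_init(struct ring *r, size_t n)
{
	r->bufs = calloc(n, sizeof(*r->bufs));
	if (r->bufs == NULL)
		return (-1);
	r->n = n;
	if (pthread_mutex_init(&r->lock, NULL) != 0) {
		free(r->bufs);
		r->bufs = NULL;
		return (-1);
	}
	return (0);
}

/* Teardown path ("interface down"): free the array under the lock. */
void
ring_stop(struct ring *r)
{
	pthread_mutex_lock(&r->lock);
	free(r->bufs);
	r->bufs = NULL;
	r->n = 0;
	pthread_mutex_unlock(&r->lock);
}

/*
 * Cleanup path: returns the number of slots cleaned, or -1 if the
 * ring was already stopped. The NULL check happens under the lock,
 * so it cannot race with ring_stop().
 */
int
ring_clean(struct ring *r)
{
	int cleaned = 0;

	pthread_mutex_lock(&r->lock);
	if (r->bufs == NULL) {
		pthread_mutex_unlock(&r->lock);
		return (-1);
	}
	for (size_t i = 0; i < r->n; i++) {
		r->bufs[i] = 0;
		cleaned++;
	}
	pthread_mutex_unlock(&r->lock);
	return (cleaned);
}
```

If ring_clean() skipped the lock or the check, a concurrent ring_stop()
could leave it indexing a freed or NULL array, which is the kind of
failure guessed at above.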

-- 
Regards,
Pyun YongHyeon