sio interrupt-level buffer overflows

Tue Mar 29 22:06:33 PST 2005

On Wed, 23 Mar 2005, Oleg Tarasov wrote:

> About my panics. They persist and when this server panics it somehow
> overloads my network so it stops functioning until reboot. This is
> very, very bad.
>
> Maybe you could tell me where to write, or you could
> personally tell me what should I do.
>
> Using all my theoretical skills I have come to this data I could
> obtain from my dump:
>
> (kgdb) backtrace
> #0  doadump () at pcpu.h:159
> #1  0xc060b063 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:397
> #2  0xc060b389 in panic (fmt=0xc080321d "spin lock held too long")
>    at /usr/src/sys/kern/kern_shutdown.c:553
> #3  0xc060270c in _mtx_lock_spin (m=0xc08d7800, td=0xc19ca320, opts=0,
>    file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:613
> #4  0xc077c165 in siointr (arg=0xc1ab8800) at /usr/src/sys/dev/sio/sio.c:1710
> #5  0xc0790ead in intr_execute_handlers (isrc=0xc19b8890, iframe=0xd541ac94)
>    at /usr/src/sys/i386/i386/intr_machdep.c:203
> #6  0xc07932be in lapic_handle_intr (frame=
>      {if_vec = 52, if_fs = -717160424, if_es = -1067384816, if_ds = 16,
>      if_edi = -1046699232, if_esi = -1064591424, if_ebp = -717116188,
>      if_ebx = -1046425600, if_edx = -1064566184, if_ecx = 0,
>      if_eax = -1046425600, if_eip = -1067440569, if_cs = 8, if_eflags = 582,
>      if_esp = -1045200000, if_ss = 4})
>    at /usr/src/sys/i386/i386/local_apic.c:490
> #7  0xc078d753 in Xapic_isr1 () at apic_vector.s:110
> #8  0x00000034 in ?? ()
> #9  0xd5410018 in ?? ()
> #10 0xc0610010 in coredump (td=0xc08b9fc0) at vnode_if.h:1244
> #11 0xc05f6f46 in ithread_loop (arg=0xc1981c80)
>    at /usr/src/sys/kern/kern_intr.c:546
> #12 0xc05f6001 in fork_exit (callout=0xc05f6df8 <ithread_loop>,
>    arg=0xc1981c80, frame=0xd541ad48) at /usr/src/sys/kern/kern_fork.c:811
> #13 0xc078d3fc in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:209
> ...

I couldn't figure out the problem from this.  Your later mail says that
the problem is caused by ppp not being MPSAFE, at least with sio, so I
won't do much more with this stack trace, but I wonder about some of the
strange entries in it:

#13 - #11 are normal.
#10 is weird.  ithread_loop() shouldn't call coredump().
#8 - #9 seem to be more like stack garbage than module addresses.
#7 is normal, but it looks like someone broke stack traces for interrupts,
    giving the garbage in #8 - #10.
#0 - #6 are normal if the spin lock is already held by the same CPU that
    is handling the interrupt (except this can't happen :-).  I wouldn't
    have thought that broken locking in ppp could cause this.

Bruce