repeating crashes with 8.1

Sat Oct 23 08:21:42 UTC 2010

At 12:41 AM 10/23/2010, Jack Vogel wrote:
>Odd, can you make any connection between this and the em complaints??

I don't think so.  This is on an igb NIC, and the panic and behaviour 
are different. I have the box sitting at the debugger prompt in the 
FreeBSD netperf cluster, so hopefully someone can take a look and see 
what the issue is.
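
For anyone reading the quoted trace below: it faults inside in6_cksum(), 
the routine that computes the IPv6 upper-layer checksum, reached here 
from icmp6_reflect(). The faulting instruction is a 16-bit load, which 
would be consistent with the summing loop reading through a bad pointer, 
though that is only a guess. As a rough illustration of the calculation 
itself (this is not the kernel code, which walks an mbuf chain in C), 
here is a minimal Python sketch of the ICMPv6 checksum as described in 
RFC 4443 and RFC 8200. The function names and the sample addresses in 
the usage note are assumptions made for illustration.

```python
import ipaddress
import struct

ICMPV6 = 58  # IPv6 next-header value for ICMPv6


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int) -> bytes:
    """IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next header."""
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", length, ICMPV6)
    )


def icmp6_checksum(src: str, dst: str, message: bytes) -> int:
    """Checksum over pseudo-header plus message (checksum field set to 0)."""
    data = pseudo_header(src, dst, len(message)) + message
    return ~ones_complement_sum(data) & 0xFFFF


def build_echo_request(src: str, dst: str, ident: int, seq: int,
                       payload: bytes) -> bytes:
    """Build an ICMPv6 echo request (type 128, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 128, 0, 0, ident, seq)
    csum = icmp6_checksum(src, dst, header + payload)
    return struct.pack("!BBHHH", 128, 0, csum, ident, seq) + payload


def checksum_ok(src: str, dst: str, message: bytes) -> bool:
    """A message with a correct checksum sums to 0xFFFF with its pseudo-header."""
    data = pseudo_header(src, dst, len(message)) + message
    return ones_complement_sum(data) == 0xFFFF
```

For example, build_echo_request("2001:db8::3:1", "2001:db8::3:c", 1, 1, 
b"abc") returns a message for which checksum_ok() is true with the same 
two addresses. Every 16-bit word of the packet has to be readable for 
this sum to work, which is why a stale or corrupt buffer pointer would 
show up in this routine.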

         ---Mike

>Jack
>
>
>On Fri, Oct 22, 2010 at 6:59 PM, Mike Tancsa <mike at sentex.net> wrote:
>At 09:11 PM 10/22/2010, Mike Tancsa wrote:
>At 08:01 PM 10/22/2010, Chris Morrow wrote:
>Note, Warren and I attempted to test this this evening on a 10.04 Ubuntu
>box, no crashy-crashy...
>
>
>
>I was able to trigger the issue on box (c).  I was ping6ing box (a) 
>when I did a hard down of (d)'s connected interface. The box then 
>dropped to the debugger:
>
>
>Fatal trap 9: general protection fault while in kernel mode
>cpuid = 0; apic id = 00
>instruction pointer     = 0x20:0xffffffff80740a50
>stack pointer           = 0x28:0xffffff800005a890
>frame pointer           = 0x28:0xffffff800005a930
>
>code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>processor eflags        = interrupt enabled, resume, IOPL = 0
>current process         = 12 (swi4: clock)
>[thread pid 12 tid 100007 ]
>Stopped at      in6_cksum+0x410:        movzwl  (%rsi),%r10d
>db> bt
>Tracing pid 12 tid 100007 td 0xffffff00025083e0
>in6_cksum() at in6_cksum+0x410
>icmp6_reflect() at icmp6_reflect+0x312
>icmp6_error() at icmp6_error+0x1ec
>nd6_llinfo_timer() at nd6_llinfo_timer+0x208
>softclock() at softclock+0x2a6
>intr_event_execute_handlers() at intr_event_execute_handlers+0x66
>ithread_loop() at ithread_loop+0xb2
>fork_exit() at fork_exit+0x12a
>fork_trampoline() at fork_trampoline+0xe
>--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---
>db>
>
>
>
>
>I was able to do it, but not on the box I expected.
>
>4 boxes
>
>(a) Attacking host 2001:db8:1:1/64
>(b) victim, not on a connected interface with (a). Outside interface 
>- em0 - 2001:db8::2:1/64, inside interface - em1 - 2001:db8::3:1/64
>(c) a host behind (b) 2001:db8::3:c/64
>(d) a host behind (b), 2001:db8::3:d/64
>
>
>Hosts (c) and (d) have default gateways to (b).  (c), however, has a 
>next hop for (a) via (d).  So rather than go out its normal default 
>gateway, it takes an extra hop via (d).
>
>Start a ping6 from (a) to (c).  Then down (d)'s interface so that 
>the ping6 fails.  Let the ping keep running for an hour or 
>two.  Eventually (b) gets error messages like
>
>Oct 22 18:38:32 zoo kernel: em1: discard frame w/o packet header
>
>and crashes.
>
>Unfortunately, I thought it would be (c) that crapped out, not (b), 
>and I didn't have crash dumps enabled on the host.  I am just in the 
>process of setting up a better environment.
>
>        ---Mike
>
>-chris
>
>On 10/22/10 16:27, Joel Jaeggli wrote:
> > Ok I'll try testing that on some box I can reach with both hands.
> >
> > fyi nagasaki is:
> >
> > [root at nagasaki ~]# uname -a
> > FreeBSD nagasaki.bogus.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13:
> > Sun May 30 22:19:23 UTC 2010
> > root at nagasaki.bogus.com:/usr/obj/usr/src/sys/GENERIC  i386
> > [root at nagasaki ~]#
> >
> >
> > On 10/22/10 1:17 PM, Randy Bush wrote:
> >>>>>>> Do you know how this panic is triggered ? Are you able to
> >>>>>>> create it on demand ?
> >>>>>>
> >>>>>> no i do not.  bring server up and it'll happen in half an hour.
> >>>>>> and the server was happy for two months.  so i am thinking hardware.
> >>>>>
> >>>>> Perhaps. The reason I ask is that I had a box go down last night with
> >>>>> the same set of errors.  The box has a number of ipv6 routes, but its
> >>>>> next hop was down and the problems started soon after. So I wonder if
> >>>>> it has something to do with that.  Do you have ipv6 on this box and
> >>>>> are all the next hop addresses correct / reachable ?
> >>>>>
> >>>>> Oct 22 02:06:02 i4 kernel: em1: discard frame w/o packet header
> >>>>> Oct 22 02:06:10 i4 kernel: em2: discard frame w/o packet header
> >>>>> Oct 22 02:06:21 i4 kernel: em1: discard frame w/o packet header
> >>>>
> >>>> it was co-incident with a border router being taken down for new router
> >>>> install.  that router was the v6 exit the servers were using.  i have now
> >>>> pointed default6 to a different exit.  the server seems happy.
> >>>
> >>>
> >>> Are your servers still up ?  I guess the question now is how to
> >>> trigger this problem on demand.  Perhaps lots of inbound ipv6 traffic
> >>> with a bad next hop out ?  How recent are your sources ?  The kernel
> >>> said Oct 21st. Were the sources from then too ?
> >>
> >> yes, kernel and world from 21 oct
> >>
> >> chris had an idea on retrigger, install a static for a small dest that
> >> points to a hole.  send a packet to the small dest.
> >>
> >> randy
> >>
>
>
>--------------------------------------------------------------------
>Mike Tancsa,                                      tel +1 519 651 3400
>Sentex Communications,                            mike at sentex.net
>Providing Internet since 1994                    www.sentex.net
>Cambridge, Ontario Canada                         www.sentex.net/mike
>
>
>_______________________________________________
>freebsd-stable at freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike at sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike