kern/113548: [dummynet] [patch] system hangs with dummynet queues

Cristian KLEIN cristi at net.utcluj.ro
Fri Jun 15 07:40:15 UTC 2007


The following reply was made to PR kern/113548; it has been noted by GNATS.

From: Cristian KLEIN <cristi at net.utcluj.ro>
To: Alexey Illarionov <littlesavage at orionet.ru>
Cc: bug-followup at FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Date: Fri, 15 Jun 2007 10:30:43 +0300

 Alexey Illarionov wrote:
 > Cristian KLEIN wrote:
 > 
 >> I think the problem occurs because you use ipfw tags. As far as I know,
 >> ipfw tags are stored as mbuf_tags(9). Dummynet uses mbuf tags too to
 >> mark its own packets. However, I suspect that in dn_tag_get(), dummynet
 >> incorrectly assumes it is the only one using mbuf_tags(9).
 > 
 >> Could you please apply the following patch? Also, could you test whether
 >> removing "tag 1" from ipfw rules has any impact?
 > 
 > Thanks for the fast reply and for the patch. It seems that the panics
 > really were caused by ipfw tags. After I applied the patch there were no
 > panics for several days, but I got the following dump today:
 > 
 > kgdb: kvm_nlist(_stopped_cpus):
 > kgdb: kvm_nlist(_stoppcbs):
 > [GDB will not be able to debug user-mode threads:
 > /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 > GNU gdb 6.1.1 [FreeBSD]
 > Copyright 2004 Free Software Foundation, Inc.
 > GDB is free software, covered by the GNU General Public License, and you are
 > welcome to change it and/or distribute copies of it under certain
 > conditions.
 > Type "show copying" to see the conditions.
 > There is absolutely no warranty for GDB.  Type "show warranty" for details.
 > This GDB was configured as "i386-marcel-freebsd".
 > 
 > Unread portion of the kernel message buffer:
 > 
 > 
 > Fatal trap 12: page fault while in kernel mode
 > fault virtual address   = 0xec221d87
 > fault code              = supervisor read, page not present
 > instruction pointer     = 0x20:0xc05dafc6
 > stack pointer           = 0x28:0xde7b0c24
 > frame pointer           = 0x28:0xde7b0c28
 > code segment            = base 0x0, limit 0xfffff, type 0x1b
 >                         = DPL 0, pres 1, def32 1, gran 1
 > processor eflags        = interrupt enabled, resume, IOPL = 0
 > current process         = 30 (dummynet)
 > trap number             = 12
 > panic: page fault
 > KDB: stack backtrace:
 > kdb_backtrace(100,c52ad480,28,de7b0be4,c,...) at kdb_backtrace+0x29
 > panic(c078df19,c07d4928,0,fffff,c09b,...) at panic+0xa4
 > trap_fatal(de7b0be4,ec221d87,c52ad480,c104b000,ec221000,...) at
 > trap_fatal+0x2b7
 > trap_pfault(de7b0be4,0,ec221d87) at trap_pfault+0x16b
 > trap(8,28,28,1,0,...) at trap+0x331
 > calltrap() at calltrap+0x5
 > --- trap 0xc, eip = 0xc05dafc6, esp = 0xde7b0c24, ebp = 0xde7b0c28 ---
 > m_tag_locate(c55df900,0,f,0) at m_tag_locate+0x36
 > dn_tag_get(c55df900,2ffbd300,1,c05c3e7e,c088e858,...) at dn_tag_get+0x1d
 > ready_event_wfq(c57b0800,de7b0cac,de7b0cb0) at ready_event_wfq+0x50b
 > dummynet_task(0,1) at dummynet_task+0x24c
 > taskqueue_run(c5562a00) at taskqueue_run+0xd1
 > taskqueue_thread_loop(c08ce950,de7b0d38,c08ce950,c05c01e0,0,...) at
 > taskqueue_thread_loop+0x4a
 > fork_exit(c05c01e0,c08ce950,de7b0d38) at fork_exit+0xa8
 > fork_trampoline() at fork_trampoline+0x8
 > --- trap 0x1, eip = 0, esp = 0xde7b0d6c, ebp = 0 ---
 > Uptime: 50m0s
 > Dumping 511 MB (2 chunks)
 >   chunk 0: 1MB (156 pages) ... ok
 >   chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351
 > 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47
 > 31 15
 > 
 > #0  doadump () at pcpu.h:165
 > 165     pcpu.h: No such file or directory.
 >         in pcpu.h
 > (kgdb) bt
 > #0  doadump () at pcpu.h:165
 > #1  0xc059f2a6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 > #2  0xc059f57b in panic (fmt=0xc078df19 "%s") at
 > /usr/src/sys/kern/kern_shutdown.c:565
 > #3  0xc076c1f7 in trap_fatal (frame=0xde7b0be4, eva=3961658759) at
 > /usr/src/sys/i386/i386/trap.c:837
 > #4  0xc076bf0b in trap_pfault (frame=0xde7b0be4, usermode=0,
 > eva=3961658759) at /usr/src/sys/i386/i386/trap.c:745
 > #5  0xc076bb71 in trap (frame=
 >       {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 1, tf_esi = 0, tf_ebp
 > = -562361304, tf_isp = -562361328, tf_ebx = 15, tf_edx = -333308545,
 > tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
 > -1067601978, tf_cs = 32, tf_eflags = 66178, tf_esp = 22, tf_ss =
 > -562361280}) at /usr/src/sys/i386/i386/trap.c:435
 > #6  0xc0758bca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
 > #7  0xc05dafc6 in m_tag_locate (m=0xec221d7f, cookie=0, type=15, t=0x0)
 > at /usr/src/sys/kern/uipc_mbuf2.c:392
 > #8  0xc06279ad in dn_tag_get (m=0xec221d7f) at mbuf.h:881
 > #9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
 > tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
 > #10 0xc06284cc in dummynet_task (context=0x0, pending=0) at
 > /usr/src/sys/netinet/ip_dummynet.c:805
 > #11 0xc05bfe71 in taskqueue_run (queue=0xc5562a00) at
 > /usr/src/sys/kern/subr_taskqueue.c:257
 > #12 0xc05c022a in taskqueue_thread_loop (arg=0x0) at
 > /usr/src/sys/kern/subr_taskqueue.c:376
 > #13 0xc05897b8 in fork_exit (callout=0xc05c01e0 <taskqueue_thread_loop>,
 > arg=0xc08ce950, frame=0xde7b0d38)
 >     at /usr/src/sys/kern/kern_fork.c:821
 > #14 0xc0758c2c in fork_trampoline () at
 > /usr/src/sys/i386/i386/exception.s:208
 > (kgdb) up 9
 > #9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
 > tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
 > 705             dn_tag_get(p->tail)->output_time += t ;
 > (kgdb) p *p
 > $1 = {next = {sle_next = 0xc6713600}, pipe_nr = 1700, bandwidth =
 > 50000000, delay = 0, head = 0x0, tail = 0xc55df900,
 >   scheduler_heap = {size = 16, elements = 1, offset = 0, p =
 > 0xc57b2800}, not_eligible_heap = {size = 16, elements = 0,
 >     offset = 0, p = 0xc57ac700}, idle_heap = {size = 16, elements = 0,
 > offset = 124, p = 0xc56a2800}, V = 9830400,
 >   sum = 10, numbytes = -1090027776, sched_time = 2997985, if_name = '\0'
 > <repeats 15 times>, ifp = 0x0, ready = 0, fs = {
 >     next = {sle_next = 0x0}, fs_nr = 0, flags_fs = 0, pipe = 0xc57b0800,
 > parent_nr = 0, weight = 0, qsize = 50, plr = 0,
 >     flow_mask = {dst_ip = 0, src_ip = 0, dst_port = 0, src_port = 0,
 > proto = 0 '\0', flags = 0 '\0', addr_type = 0 '\0',
 >       dst_ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>,
 > __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {
 >             0, 0, 0, 0}}}, src_ip6 = {__u6_addr = {__u6_addr8 = '\0'
 > <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0,
 >             0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, flow_id6 = 0, frag_id6
 > = 0}, rq_size = 1, rq_elements = 0,
 >     rq = 0xc55791b0, last_expired = 0, backlogged = 0, w_q = 0, max_th =
 > 0, min_th = 0, max_p = 0, c_1 = 0, c_2 = 0,
 >     c_3 = 0, c_4 = 0, w_q_lookup = 0x0, lookup_depth = 0, lookup_step =
 > 0, lookup_weight = 0, avg_pkt_size = 0,
 >     max_pkt_size = 0}}
 > 
 > 
 > When I removed "tag 1" the kernel stopped panicking, but the deadlocks
 > did not go away. When I managed to enter DDB over the serial console, I
 > found dummynet_task() looping on the following code:
 > 
 > 	h = heaps[i];
 > 	while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time)) {
 > 		...
 > 		ready_event_wfq(p, &head, &tail);
 > 		...
 > 	}
 > 
 > It seems to me that the problem is in ready_event_wfq(), in the
 > following code:
 > 
 > 	if (p->bandwidth > 0)
 > 		t = (p->bandwidth - 1 - p->numbytes) / p->bandwidth;
 > 
 > Since p->bandwidth and p->numbytes are signed integers, the result can
 > be negative (I have p->bandwidth=50000000 and p->numbytes=-2147483647).
 > 
 > I am now testing the attached patch. I hope it will help. :)
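
 The arithmetic described in the quoted text can be illustrated with a
 minimal C sketch. This is not the attached patch and not the code in
 ip_dummynet.c; the helper names ticks_wrapping() and ticks_wide() are
 hypothetical, and the sketch assumes the fields behave as 32-bit signed
 integers, as on i386. The first helper reproduces the wraparound, the
 second evaluates the same expression in 64 bits for comparison.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: mimics the quoted expression with 32-bit signed
 * operands. The subtraction is done in uint32_t so the wraparound is
 * well defined; converting the result back to int32_t is
 * implementation-defined in C but wraps on common compilers.
 */
static int32_t
ticks_wrapping(int32_t bandwidth, int32_t numbytes)
{
	uint32_t num;

	assert(bandwidth > 0);
	num = (uint32_t)bandwidth - 1u - (uint32_t)numbytes;
	return ((int32_t)num / bandwidth);
}

/* Hypothetical helper: same expression in 64 bits, so nothing wraps. */
static int64_t
ticks_wide(int32_t bandwidth, int32_t numbytes)
{
	assert(bandwidth > 0);
	return (((int64_t)bandwidth - 1 - (int64_t)numbytes) / bandwidth);
}
```

 With bandwidth=50000000 and numbytes=-2147483647, ticks_wrapping()
 returns -41 while ticks_wide() returns 43, which shows how the computed
 time increment can become negative.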
 
 Could you please test whether SMP has any effect on the bug? That is,
 does an unpatched ip_dummynet without SMP cause panics? I ask because I
 was unable to reproduce this bug on a non-SMP machine.
 
 Also, I see you have "dummynet_task" in your dumps. Are you using
 RELENG_6 or revision 1.93.2.6 of ip_dummynet.c?

