kern/113548: [dummynet] [patch] system hangs with dummynet queues

Alexey Illarionov littlesavage at orionet.ru
Fri Jun 15 07:40:10 UTC 2007


The following reply was made to PR kern/113548; it has been noted by GNATS.

From: Alexey Illarionov <littlesavage at orionet.ru>
To: Cristian KLEIN <cristi at net.utcluj.ro>
Cc: bug-followup at FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Date: Fri, 15 Jun 2007 11:11:39 +0400

 
 Cristian KLEIN wrote:
 
 > I think the problem occurs because you use ipfw tags. As far as I know,
 > ipfw tags are stored as mbuf_tags(9). Dummynet uses mbuf tags too to
 > mark it's own packets. However, I suspect that in dn_tag_get(), dummynet
 > incorrectly assumes it is the only one using mbuf_tags(9).
 
 > Could you please apply the following patch? Also, could you test whether
 > removing "tag 1" from ipfw rules has any impact?
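The difference between taking the first tag on a packet and searching for a specific tag can be sketched as follows. This is only an illustration: `struct tag`, `tag_first()`, `tag_find()` and the two tag ids are hypothetical simplified stand-ins, not the kernel's `struct m_tag` or the mbuf_tags(9) functions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a packet tag list entry. */
struct tag {
    int id;             /* which subsystem owns this tag */
    struct tag *next;   /* next tag attached to the same packet */
};

/* Illustrative ids only; the real kernel constants differ. */
enum { TAG_IPFW = 1, TAG_DUMMYNET = 2 };

/* Fragile: returns whatever tag happens to be first on the packet. */
static struct tag *tag_first(struct tag *head)
{
    return head;
}

/* Robust: walks the list until it finds a tag with the requested id. */
static struct tag *tag_find(struct tag *head, int id)
{
    for (struct tag *t = head; t != NULL; t = t->next)
        if (t->id == id)
            return t;
    return NULL;
}
```

If an ipfw tag is attached ahead of the dummynet tag, `tag_first()` hands back the ipfw tag, while `tag_find(head, TAG_DUMMYNET)` still returns the right one. That is the kind of mix-up the suggested patch is meant to rule out.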
 
 Thanks for the fast reply and for the patch. It seems the panics really
 were caused by ipfw tags. After I applied the patch there were no panics
 for several days, but today I got the following dump:
 
 kgdb: kvm_nlist(_stopped_cpus):
 kgdb: kvm_nlist(_stoppcbs):
 [GDB will not be able to debug user-mode threads:
 /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as "i386-marcel-freebsd".
 
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0xec221d87
 fault code              = supervisor read, page not present
 instruction pointer     = 0x20:0xc05dafc6
 stack pointer           = 0x28:0xde7b0c24
 frame pointer           = 0x28:0xde7b0c28
 code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 30 (dummynet)
 trap number             = 12
 panic: page fault
 KDB: stack backtrace:
 kdb_backtrace(100,c52ad480,28,de7b0be4,c,...) at kdb_backtrace+0x29
 panic(c078df19,c07d4928,0,fffff,c09b,...) at panic+0xa4
 trap_fatal(de7b0be4,ec221d87,c52ad480,c104b000,ec221000,...) at
 trap_fatal+0x2b7
 trap_pfault(de7b0be4,0,ec221d87) at trap_pfault+0x16b
 trap(8,28,28,1,0,...) at trap+0x331
 calltrap() at calltrap+0x5
 --- trap 0xc, eip = 0xc05dafc6, esp = 0xde7b0c24, ebp = 0xde7b0c28 ---
 m_tag_locate(c55df900,0,f,0) at m_tag_locate+0x36
 dn_tag_get(c55df900,2ffbd300,1,c05c3e7e,c088e858,...) at dn_tag_get+0x1d
 ready_event_wfq(c57b0800,de7b0cac,de7b0cb0) at ready_event_wfq+0x50b
 dummynet_task(0,1) at dummynet_task+0x24c
 taskqueue_run(c5562a00) at taskqueue_run+0xd1
 taskqueue_thread_loop(c08ce950,de7b0d38,c08ce950,c05c01e0,0,...) at
 taskqueue_thread_loop+0x4a
 fork_exit(c05c01e0,c08ce950,de7b0d38) at fork_exit+0xa8
 fork_trampoline() at fork_trampoline+0x8
 --- trap 0x1, eip = 0, esp = 0xde7b0d6c, ebp = 0 ---
 Uptime: 50m0s
 Dumping 511 MB (2 chunks)
   chunk 0: 1MB (156 pages) ... ok
   chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351
 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47
 31 15
 
 #0  doadump () at pcpu.h:165
 165     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:165
 #1  0xc059f2a6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 #2  0xc059f57b in panic (fmt=0xc078df19 "%s") at
 /usr/src/sys/kern/kern_shutdown.c:565
 #3  0xc076c1f7 in trap_fatal (frame=0xde7b0be4, eva=3961658759) at
 /usr/src/sys/i386/i386/trap.c:837
 #4  0xc076bf0b in trap_pfault (frame=0xde7b0be4, usermode=0,
 eva=3961658759) at /usr/src/sys/i386/i386/trap.c:745
 #5  0xc076bb71 in trap (frame=
       {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 1, tf_esi = 0, tf_ebp
 = -562361304, tf_isp = -562361328, tf_ebx = 15, tf_edx = -333308545,
 tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
 -1067601978, tf_cs = 32, tf_eflags = 66178, tf_esp = 22, tf_ss =
 -562361280}) at /usr/src/sys/i386/i386/trap.c:435
 #6  0xc0758bca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
 #7  0xc05dafc6 in m_tag_locate (m=0xec221d7f, cookie=0, type=15, t=0x0)
 at /usr/src/sys/kern/uipc_mbuf2.c:392
 #8  0xc06279ad in dn_tag_get (m=0xec221d7f) at mbuf.h:881
 #9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
 tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
 #10 0xc06284cc in dummynet_task (context=0x0, pending=0) at
 /usr/src/sys/netinet/ip_dummynet.c:805
 #11 0xc05bfe71 in taskqueue_run (queue=0xc5562a00) at
 /usr/src/sys/kern/subr_taskqueue.c:257
 #12 0xc05c022a in taskqueue_thread_loop (arg=0x0) at
 /usr/src/sys/kern/subr_taskqueue.c:376
 #13 0xc05897b8 in fork_exit (callout=0xc05c01e0 <taskqueue_thread_loop>,
 arg=0xc08ce950, frame=0xde7b0d38)
     at /usr/src/sys/kern/kern_fork.c:821
 #14 0xc0758c2c in fork_trampoline () at
 /usr/src/sys/i386/i386/exception.s:208
 (kgdb) up 9
 #9  0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
 tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
 705             dn_tag_get(p->tail)->output_time += t ;
 (kgdb) p *p
 $1 = {next = {sle_next = 0xc6713600}, pipe_nr = 1700, bandwidth =
 50000000, delay = 0, head = 0x0, tail = 0xc55df900,
   scheduler_heap = {size = 16, elements = 1, offset = 0, p =
 0xc57b2800}, not_eligible_heap = {size = 16, elements = 0,
     offset = 0, p = 0xc57ac700}, idle_heap = {size = 16, elements = 0,
 offset = 124, p = 0xc56a2800}, V = 9830400,
   sum = 10, numbytes = -1090027776, sched_time = 2997985, if_name = '\0'
 <repeats 15 times>, ifp = 0x0, ready = 0, fs = {
     next = {sle_next = 0x0}, fs_nr = 0, flags_fs = 0, pipe = 0xc57b0800,
 parent_nr = 0, weight = 0, qsize = 50, plr = 0,
     flow_mask = {dst_ip = 0, src_ip = 0, dst_port = 0, src_port = 0,
 proto = 0 '\0', flags = 0 '\0', addr_type = 0 '\0',
       dst_ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>,
 __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {
             0, 0, 0, 0}}}, src_ip6 = {__u6_addr = {__u6_addr8 = '\0'
 <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0,
             0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, flow_id6 = 0, frag_id6
 = 0}, rq_size = 1, rq_elements = 0,
     rq = 0xc55791b0, last_expired = 0, backlogged = 0, w_q = 0, max_th =
 0, min_th = 0, max_p = 0, c_1 = 0, c_2 = 0,
     c_3 = 0, c_4 = 0, w_q_lookup = 0x0, lookup_depth = 0, lookup_step =
 0, lookup_weight = 0, avg_pkt_size = 0,
     max_pkt_size = 0}}
 
 
 After I removed "tag 1" the kernel stopped panicking, but the deadlocks
 did not go away. When I managed to enter DDB over the serial console, I
 found dummynet_task() looping in the following code:
 
 	h = heaps[i];
 	while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time)) {
 		...
 		ready_event_wfq(p, &head, &tail);
 		...
 	}
 
 It seems to me that the problem is in ready_event_wfq(), in the following code:
 if (p->bandwidth > 0)
 	t = (p->bandwidth -1 - p->numbytes) / p->bandwidth ;
 
 Since p->bandwidth and p->numbytes are signed integers, the subtraction
 can overflow and the result can be negative (I have p->bandwidth=50000000
 and p->numbytes=-2147483647).
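
The arithmetic can be checked with a small user-space sketch. It rests on several assumptions: `int` is 32 bits wide as on i386, the overflow wraps in two's complement (strictly speaking, signed overflow is undefined in C, so the wrap is emulated here with unsigned arithmetic), `dn_key` is an unsigned 64-bit type, and `DN_KEY_LEQ` compares keys through a signed 64-bit difference. The helper names `wrap32()`, `ticks_int32()`, `ticks_u64()` and `key_leq()` are hypothetical and do not exist in ip_dummynet.c.

```c
#include <stdint.h>

/* Emulate 32-bit two's-complement wraparound without undefined behaviour. */
static int32_t wrap32(uint32_t u)
{
    return (u <= (uint32_t)INT32_MAX)
        ? (int32_t)u
        : (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Original expression, evaluated the way 32-bit int arithmetic would wrap. */
static int32_t ticks_int32(int32_t bandwidth, int32_t numbytes)
{
    int32_t num = wrap32((uint32_t)bandwidth - 1u - (uint32_t)numbytes);
    return num / bandwidth;
}

/* Patched expression: the cast widens the whole computation to 64 bits. */
static uint64_t ticks_u64(int32_t bandwidth, int32_t numbytes)
{
    return ((uint64_t)bandwidth - 1 - numbytes) / bandwidth;
}

/*
 * Assumed shape of the key comparison: "a is not later than b".
 * Relies on the usual modulo conversion from uint64_t to int64_t.
 */
static int key_leq(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) <= 0;
}
```

With bandwidth=50000000 and numbytes=-2147483647, the numerator 49999999 + 2147483647 = 2197483646 does not fit in 32 bits and wraps to -2097483650, so `ticks_int32()` returns -41 where `ticks_u64()` returns 43. Under the assumptions above, a key of curr_time + (-41) still satisfies the `key_leq(key, curr_time)` test, which would explain why the while loop in dummynet_task() never finishes.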
 
 I am now testing the attached patch. I hope it will help. :)
 
 
 
 
 --------------040704070900010000020204
 Content-Type: text/x-patch;
  name="ip_dummynet.c.patch"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline;
  filename="ip_dummynet.c.patch"
 
 --- ip_dummynet.c_orig	Sun Jun 10 20:19:33 2007
 +++ ip_dummynet.c	Fri Jun 15 07:37:46 2007
 @@ -433,7 +433,7 @@
  static struct dn_pkt_tag *
  dn_tag_get(struct mbuf *m)
  {
 -    struct m_tag *mtag = m_tag_first(m);
 +    struct m_tag *mtag = m_tag_find(m, PACKET_TAG_DUMMYNET, NULL);
      KASSERT(mtag != NULL &&
  	    mtag->m_tag_cookie == MTAG_ABI_COMPAT &&
  	    mtag->m_tag_id == PACKET_TAG_DUMMYNET,
 @@ -698,8 +698,10 @@
      if (p->if_name[0]==0 && p->numbytes < 0) { /* this implies bandwidth >0 */
  	dn_key t=0 ; /* number of ticks i have to wait */
  
 -	if (p->bandwidth > 0)
 -	    t = ( p->bandwidth -1 - p->numbytes) / p->bandwidth ;
 +	if (p->bandwidth > 0) 
 +	    t = ( (u_int64_t)p->bandwidth -1 - p->numbytes) / p->bandwidth ;
 +
 +	KASSERT( (curr_time + t) >= curr_time, ("wfq overflow"));
  	dn_tag_get(p->tail)->output_time += t ;
  	p->sched_time = curr_time ;
  	heap_insert(&wfq_ready_heap, curr_time + t, (void *)p);
 
 --------------040704070900010000020204--

