kern/113548: [dummynet] [patch] system hangs with dummynet queues
Cristian KLEIN
cristi at net.utcluj.ro
Fri Jun 15 07:40:15 UTC 2007
The following reply was made to PR kern/113548; it has been noted by GNATS.
From: Cristian KLEIN <cristi at net.utcluj.ro>
To: Alexey Illarionov <littlesavage at orionet.ru>
Cc: bug-followup at FreeBSD.org
Subject: Re: kern/113548: [dummynet] [patch] system hangs with dummynet queues
Date: Fri, 15 Jun 2007 10:30:43 +0300
Alexey Illarionov wrote:
> Cristian KLEIN wrote:
>
>> I think the problem occurs because you use ipfw tags. As far as I know,
>> ipfw tags are stored as mbuf_tags(9). Dummynet uses mbuf tags too to
>> mark its own packets. However, I suspect that in dn_tag_get(), dummynet
>> incorrectly assumes it is the only one using mbuf_tags(9).
>
>> Could you please apply the following patch? Also, could you test whether
>> removing "tag 1" from ipfw rules has any impact?
>
> Thanks for the fast reply and for the patch. It seems that the panics
> really were caused by ipfw tags. After I applied this patch, there were
> no panics for several days, but I got the following dump today:
>
> kgdb: kvm_nlist(_stopped_cpus):
> kgdb: kvm_nlist(_stoppcbs):
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address = 0xec221d87
> fault code = supervisor read, page not present
> instruction pointer = 0x20:0xc05dafc6
> stack pointer = 0x28:0xde7b0c24
> frame pointer = 0x28:0xde7b0c28
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 30 (dummynet)
> trap number = 12
> panic: page fault
> KDB: stack backtrace:
> kdb_backtrace(100,c52ad480,28,de7b0be4,c,...) at kdb_backtrace+0x29
> panic(c078df19,c07d4928,0,fffff,c09b,...) at panic+0xa4
> trap_fatal(de7b0be4,ec221d87,c52ad480,c104b000,ec221000,...) at
> trap_fatal+0x2b7
> trap_pfault(de7b0be4,0,ec221d87) at trap_pfault+0x16b
> trap(8,28,28,1,0,...) at trap+0x331
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc05dafc6, esp = 0xde7b0c24, ebp = 0xde7b0c28 ---
> m_tag_locate(c55df900,0,f,0) at m_tag_locate+0x36
> dn_tag_get(c55df900,2ffbd300,1,c05c3e7e,c088e858,...) at dn_tag_get+0x1d
> ready_event_wfq(c57b0800,de7b0cac,de7b0cb0) at ready_event_wfq+0x50b
> dummynet_task(0,1) at dummynet_task+0x24c
> taskqueue_run(c5562a00) at taskqueue_run+0xd1
> taskqueue_thread_loop(c08ce950,de7b0d38,c08ce950,c05c01e0,0,...) at
> taskqueue_thread_loop+0x4a
> fork_exit(c05c01e0,c08ce950,de7b0d38) at fork_exit+0xa8
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xde7b0d6c, ebp = 0 ---
> Uptime: 50m0s
> Dumping 511 MB (2 chunks)
> chunk 0: 1MB (156 pages) ... ok
> chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351
> 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47
> 31 15
>
> #0 doadump () at pcpu.h:165
> 165 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:165
> #1 0xc059f2a6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> #2 0xc059f57b in panic (fmt=0xc078df19 "%s") at
> /usr/src/sys/kern/kern_shutdown.c:565
> #3 0xc076c1f7 in trap_fatal (frame=0xde7b0be4, eva=3961658759) at
> /usr/src/sys/i386/i386/trap.c:837
> #4 0xc076bf0b in trap_pfault (frame=0xde7b0be4, usermode=0,
> eva=3961658759) at /usr/src/sys/i386/i386/trap.c:745
> #5 0xc076bb71 in trap (frame=
> {tf_fs = 8, tf_es = 40, tf_ds = 40, tf_edi = 1, tf_esi = 0, tf_ebp
> = -562361304, tf_isp = -562361328, tf_ebx = 15, tf_edx = -333308545,
> tf_ecx = 0, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip =
> -1067601978, tf_cs = 32, tf_eflags = 66178, tf_esp = 22, tf_ss =
> -562361280}) at /usr/src/sys/i386/i386/trap.c:435
> #6 0xc0758bca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #7 0xc05dafc6 in m_tag_locate (m=0xec221d7f, cookie=0, type=15, t=0x0)
> at /usr/src/sys/kern/uipc_mbuf2.c:392
> #8 0xc06279ad in dn_tag_get (m=0xec221d7f) at mbuf.h:881
> #9 0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
> tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
> #10 0xc06284cc in dummynet_task (context=0x0, pending=0) at
> /usr/src/sys/netinet/ip_dummynet.c:805
> #11 0xc05bfe71 in taskqueue_run (queue=0xc5562a00) at
> /usr/src/sys/kern/subr_taskqueue.c:257
> #12 0xc05c022a in taskqueue_thread_loop (arg=0x0) at
> /usr/src/sys/kern/subr_taskqueue.c:376
> #13 0xc05897b8 in fork_exit (callout=0xc05c01e0 <taskqueue_thread_loop>,
> arg=0xc08ce950, frame=0xde7b0d38)
> at /usr/src/sys/kern/kern_fork.c:821
> #14 0xc0758c2c in fork_trampoline () at
> /usr/src/sys/i386/i386/exception.s:208
> (kgdb) up 9
> #9 0xc06281fb in ready_event_wfq (p=0xc57b0800, head=0xde7b0cac,
> tail=0xde7b0cb0) at /usr/src/sys/netinet/ip_dummynet.c:705
> 705 dn_tag_get(p->tail)->output_time += t ;
> (kgdb) p *p
> $1 = {next = {sle_next = 0xc6713600}, pipe_nr = 1700, bandwidth =
> 50000000, delay = 0, head = 0x0, tail = 0xc55df900,
> scheduler_heap = {size = 16, elements = 1, offset = 0, p =
> 0xc57b2800}, not_eligible_heap = {size = 16, elements = 0,
> offset = 0, p = 0xc57ac700}, idle_heap = {size = 16, elements = 0,
> offset = 124, p = 0xc56a2800}, V = 9830400,
> sum = 10, numbytes = -1090027776, sched_time = 2997985, if_name = '\0'
> <repeats 15 times>, ifp = 0x0, ready = 0, fs = {
> next = {sle_next = 0x0}, fs_nr = 0, flags_fs = 0, pipe = 0xc57b0800,
> parent_nr = 0, weight = 0, qsize = 50, plr = 0,
> flow_mask = {dst_ip = 0, src_ip = 0, dst_port = 0, src_port = 0,
> proto = 0 '\0', flags = 0 '\0', addr_type = 0 '\0',
> dst_ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>,
> __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {
> 0, 0, 0, 0}}}, src_ip6 = {__u6_addr = {__u6_addr8 = '\0'
> <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0,
> 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, flow_id6 = 0, frag_id6
> = 0}, rq_size = 1, rq_elements = 0,
> rq = 0xc55791b0, last_expired = 0, backlogged = 0, w_q = 0, max_th =
> 0, min_th = 0, max_p = 0, c_1 = 0, c_2 = 0,
> c_3 = 0, c_4 = 0, w_q_lookup = 0x0, lookup_depth = 0, lookup_step =
> 0, lookup_weight = 0, avg_pkt_size = 0,
> max_pkt_size = 0}}
>
>
> When I removed "tag 1" the kernel stopped panicking, but the deadlocks
> didn't go away. When I managed to enter DDB over the serial console, I
> found dummynet_task() looping on the following code:
>
>     h = heaps[i];
>     while (h->elements > 0 && DN_KEY_LEQ(h->p[0].key, curr_time)) {
>         ...
>         ready_event_wfq(p, &head, &tail);
>         ...
>     }
>
> It seems to me that the problem is in ready_event_wfq(), in the
> following code:
>
>     if (p->bandwidth > 0)
>         t = (p->bandwidth - 1 - p->numbytes) / p->bandwidth;
>
> Since p->bandwidth and p->numbytes are signed integers, the result can
> be negative (I have p->bandwidth=50000000 and p->numbytes=-2147483647).
>
> I am now testing the attached patch. I hope it will help. :)
Could you please test whether SMP has any effect on the bug? That is,
does an unpatched ip_dummynet without SMP cause panics? I ask because I
was unable to reproduce this bug on a non-SMP machine.
Also, I see you have "dummynet_task" in your dumps. Are you using
RELENG_6 or revision 1.93.2.6 of ip_dummynet.c?