Lock order reversals with dummynet (Re: FreeBSD 7.0 Beta, RC, RELEASE (amd64) freezes with dummynet enabled)

matthew matthew at matthew.sk
Fri Feb 29 11:56:18 UTC 2008


I have some more tests, but always the same.


Here is the top output, that i see last on terminal:


65 processes: 4 running, 144 sleeping, 17 waiting
CPU states:  0.0% user,  0.0% nice, 29.9% system,  7.8% interrupt, 62.3% 
idle
Mem: 46M Active, 522M Inact, 350M Wired, 10M Cache, 109M Buf, 35M Free
Swap: 2048M Total, 228K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171 ki31     0K    16K RUN    0 692:10 61.96% idle: cpu0
   11 root        1 171 ki31     0K    16K RUN    1 804:37 51.51% idle: cpu1
   13 root        1 -44    -     0K    16K WAIT   0 325:41 13.18% swi1: net
   42 root        1 -68    -     0K    16K -      0  12:26 12.74% dummynet
72368 ftp         1   4    0 10832K  2080K RUN    0   4:05 10.01% vsftpd
74488 ftp         1   4    0 10832K  2080K zfs:(& 0   0:31  9.62% vsftpd
73690 ftp         1   8    0 10832K  2072K nanslp 0   1:43  8.64% vsftpd
74752 ftp         1   8    0 10832K  2068K nanslp 1   0:09  6.06% vsftpd
74811 ftp         1   8    0 10832K  2072K nanslp 1   0:03  3.79% vsftpd
74869 ftp         1   4    0 10832K  2056K sbwait 1   0:00  3.39% vsftpd


As you can see, there is no high load, enought free memory and CPUs are 
relative idle.



This is the output from nload -o 250000:

Device em0 [10.252.2.3] (1/1):
============================================================================================================================================
Incoming:










 ..    .| .|.  .   .. .  ||...|....            .........| 
.                           .   ..   Curr: 2519.37 kBit/s
####|||######.######################           ############...  ..... 
..||...  .|. ##|#|####.  Avg: 2831.76 kBit/s
####################################|# .. |. | 
######################|#######||##############  Min: 141.74 kBit/s
#########################################.##.#|##############################################  
Max: 6586.57 kBit/s
#############################################################################################  
Ttl: 542932224.00 MByte
Outgoing:




                                                                                      
.
 ..     #  ..  .    .    ##.. ....                  . . 
#                           . #   .#
 ##    ## ###  #  .##.|| #########||           
|###|#####..                        |#.#.#|##
####||.######.######################           ############...  |   .  
..# .|   |. #########.
####################################           ###############.####|# 
####|##  ###|##########
####################################.#    .  . 
##############################||##############
###################################### || ## # 
##############################################  Curr: 127802.60 kBit/s
######################################|## ## 
#|##############################################  Avg: 144300.05 kBit/s
#########################################|###################################################  
Min: 3339.92 kBit/s
#############################################################################################  
Max: 292397.75 kBit/s
#############################################################################################  
Ttl: 14977425408.00 MByte 


There is no rapid change of bw or anything.



Kris Kennaway wrote:

> Adding back the mailing list so others can help.
>
> matthew wrote:
>> Kris Kennaway wrote:
>>
>>> matthew wrote:
>>>> I have posted before that i have a stability issue with the 7.0 branch
>>>> on my servers. Tested on BETA2,BETA4,RC1,RC2,RELEASE
>>>>
>>>> The original thread and my post with details is at:
>>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html 
>>>>
>>>>
>>>> I was waiting for the 7.0-RELEASE, updated the whole servers, and
>>>> enabled dummynet again, but it always freezes after some minutes, 100%
>>>> reproducible.
>>>>
>>>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely no
>>>> problems, but the 7.0 branch keeps freezing.
>>>>
>>>> Again, if it helps to solve this bug, i can rebuild the kernel with
>>>> debug symbols a take some screenshots :)
>>>
>>> Please follow the steps at
>>>
>>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html 
>>>
>>>
>>> Kris
>>> _______________________________________________
>> I added to the kernel these options:
>> makeoptions     DEBUG=-g                # Build kernel with gdb(1) 
>> debug symbols
>> options         INVARIANTS
>> options         INVARIANT_SUPPORT
>> options         WITNESS
>> options         DEBUG_LOCKS
>> options         DEBUG_VFS_LOCKS
>> options         DIAGNOSTIC
>> options         KDB
>> options         DDB
>> options         GDB
>> options         SOCKBUF_DEBUG
>>
>> But still, the server freezes withoutany debug message,
>
> And you can't break to DDB?  It is expected you will have to when you 
> encounter a deadlock.
No, i am unable to do anything, no input from keyboard, the only one 
thing thats work, is to change the num lock indicator on the keyboard :)
>
> > panic whatever,
>> but at leasts it dumps the crash kernel to /var/crash, but the debug 
>> looks like this:
>
> Dumping is an activity the occurs after panicking, so if you have a 
> dump then you must have a panic.
>> root at hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug 
>> /var/crash/vmcore.0
>> [GDB will not be able to debug user-mode threads: 
>> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and 
>> you are
>> welcome to change it and/or distribute copies of it under certain 
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for 
>> details.
>> This GDB was configured as "amd64-marcel-freebsd".
>> Cannot access memory at address 0x57
>> (kgdb) backtrace
>> #0  0x0000000000000000 in ?? ()
>> Cannot access memory at address 0x0
>> (kgdb)                                   I have some debug messages 
>> at boot time:
>
> This usually means you are not running the dump against the 
> kernel.debug corresponding exactly to the kernel that crashed.  
> Probably the dump you found was old.
>
Yes, i think it is possible that it is the previous kernel, without the 
debug options.
>> Feb 28 17:01:29 hanka lock order reversal:
>> Feb 28 17:01:29 hanka 1st 0xffffffff80b0e1e8 PFil hook read/write 
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ 
>> /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2970
>> Feb 28 17:01:29 hanka KDB: stack backtrace:
>> Feb 28 17:01:29 hanka db_trace_self_wrapper() at 
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
>> Feb 28 17:01:29 hanka pf_socket_lookup() at pf_socket_lookup+0x241
>> Feb 28 17:01:29 hanka pf_test_udp() at pf_test_udp+0x890
>> Feb 28 17:01:29 hanka pf_test() at pf_test+0xeb2
>> Feb 28 17:01:29 hanka pf_check_in() at pf_check_in+0x2b
>> Feb 28 17:01:29 hanka pfil_run_hooks() at pfil_run_hooks+0xbc
>> Feb 28 17:01:29 hanka ip_input() at ip_input+0x2c4
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
>> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, 
>> rbp = 0 ---
>
>
>> Feb 28 17:01:29 hanka lock order reversal:
>> Feb 28 17:01:29 hanka 1st 0xffffff0001237430 em0 (EM Core Lock) @ 
>> dev/em/if_em.c:1416
>> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @ 
>> netinet/udp_usrreq.c:385
>> Feb 28 17:01:29 hanka KDB: stack backtrace:
>> Feb 28 17:01:29 hanka db_trace_self_wrapper() at 
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
>> Feb 28 17:01:29 hanka udp_input() at udp_input+0x1a4
>> Feb 28 17:01:29 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
>> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, 
>> rbp = 0 ---
>
>
>> Feb 28 17:01:29 hanka xinetd[1356]: xinetd Version 2.3.14 started 
>> with libwrap loadavg options compiled in.
>> Feb 28 17:01:29 hanka xinetd[1356]: Started working: 1 available service
>> Feb 28 17:01:33 hanka lock order reversal:
>> Feb 28 17:01:33 hanka 1st 0xffffff00042f5b10 inp (tcpinp) @ 
>> netinet/tcp_usrreq.c:781
>> Feb 28 17:01:33 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write 
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:01:33 hanka KDB: stack backtrace:
>> Feb 28 17:01:33 hanka db_trace_self_wrapper() at 
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:33 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:33 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:01:33 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:01:33 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:01:33 hanka tcp_output() at tcp_output+0xacd
>> Feb 28 17:01:33 hanka tcp_usr_send() at tcp_usr_send+0x272
>> Feb 28 17:01:33 hanka sosend_generic() at sosend_generic+0x347
>> Feb 28 17:01:33 hanka soo_write() at soo_write+0x38
>> Feb 28 17:01:33 hanka dofilewrite() at dofilewrite+0x85
>> Feb 28 17:01:33 hanka kern_writev() at
>> Feb 28 17:01:33 hanka kern_writev+0x4c
>> Feb 28 17:01:33 hanka write() at write+0x54
>> Feb 28 17:01:33 hanka syscall() at syscall+0x1f6
>> Feb 28 17:01:33 hanka Xfast_syscall() at Xfast_syscall+0xab
>> Feb 28 17:01:33 hanka --- syscall (4, FreeBSD ELF64, write), rip = 
>> 0x800e1fd3c, rsp = 0x7fffffffea88, rbp = 0x16 ---
>
>
>> Feb 28 17:03:54 hanka lock order reversal:
>> Feb 28 17:03:54 hanka 1st 0xffffffff80b0f308 tcp (tcp) @ 
>> netinet/tcp_input.c:400
>> Feb 28 17:03:54 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write 
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:03:54 hanka KDB: stack backtrace:
>> Feb 28 17:03:54 hanka db_trace_self_wrapper() at 
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:03:54 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:03:54 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:03:54 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:03:54 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:03:54 hanka tcp_respond() at tcp_respond+0x2d5
>> Feb 28 17:03:54 hanka tcp_dropwithreset() at tcp_dropwithreset+0x131
>> Feb 28 17:03:54 hanka tcp_input() at tcp_input+0x6d0
>> Feb 28 17:03:54 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:03:54 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:03:54 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:03:54 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:03:54 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:03:54 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:03:54 hanka swi_net() at swi_net+0xea
>> Feb 28 17:03:54 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:03:54 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:03:54 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:03:54 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, 
>> rbp = 0 ---
>
>
>> Feb 28 17:23:36 hanka lock order reversal:
>> Feb 28 17:23:36 hanka 1st 0xffffffff810f9ef0 tcp_sc_head 
>> (tcp_sc_head) @ netinet/tcp_syncache.c:477
>> Feb 28 17:23:36 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write 
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:23:36 hanka KDB: stack backtrace:
>> Feb 28 17:23:36 hanka db_trace_self_wrapper() at 
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:23:36 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:23:36 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:23:36 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:23:36 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:23:36 hanka syncache_respond() at syncache_respond+0x32a
>> Feb 28 17:23:36 hanka syncache_add() at syncache_add+0x206
>> Feb 28 17:23:36 hanka tcp_input() at tcp_input+0xa77
>> Feb 28 17:23:36 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:23:36 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:23:36 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:23:36 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:23:36 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:23:36 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:23:36 hanka swi_net() at swi_net+0xea
>> Feb 28 17:23:36 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:23:36 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:23:36 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:23:36 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, 
>> rbp = 0 ---
>
> These look useful and should point to the problem.  It is also 
> probably related to something in your script:
>
>> I tried it with pf disabled, only running ipfw without dummynet, 
>> everything is ok.
>> I tried it witout pf disabled, running ipfw with dummynet queue for 
>> small traffic, everything is ok.
>> I tried it with/without pf disabled, running ipfw with dummynet queue 
>> for hight traffic >100Mbit/s, the box freezes, everytime, after 
>> enabling rules with the dummynet queue, after some second or few 
>> minutes, maybe depending on the high traffic.
>>
>> The shaper script:
>> # SHAPER
>> IPFW_PIPE_DOWNLOAD=1
>> IPFW_PIPE_UPLOAD=11
>> IPFW_QUEUE_DOWNLOAD=1
>> IPFW_QUEUE_UPLOAD=11
>>
>> DOWNLOAD_ROOT="450Mbit/s"
>> UPLOAD_ROOT="450Mbit/s"
>> SHAPER_BUCKETS="1024"
>>
>> IPFW_SHAPER_ROOT=1000
>>
>> $CMD pipe $IPFW_PIPE_DOWNLOAD config bw $DOWNLOAD_ROOT buckets 
>> $SHAPER_BUCKETS
>> $CMD pipe $IPFW_PIPE_UPLOAD config bw $UPLOAD_ROOT buckets 
>> $SHAPER_BUCKETS
>>
>> $CMD queue $IPFW_QUEUE_DOWNLOAD config pipe $IPFW_PIPE_DOWNLOAD 
>> buckets $SHAPER_BUCKETS mask dst-ip 0xFFFFFFFF
>> $CMD queue $IPFW_QUEUE_UPLOAD config pipe $IPFW_PIPE_UPLOAD buckets 
>> $SHAPER_BUCKETS mask src-ip 0xFFFFFFFF
>>
>> $CMD delete $IPFW_SHAPER_ROOT
>> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_DOWNLOAD ip from me to 
>> any out  // Share download
>> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_UPLOAD ip from any to me 
>> in // Share upload
>
> Thanks for the information.
>
> Kris

I hope i can help to solve this problem.

Matthew



More information about the freebsd-ipfw mailing list