Kernel hang on 6.x
Brad L. Chisholm
blc at bsdwins.com
Wed Jan 10 22:21:41 UTC 2007
On Thu, 4 Jan 2007 12:53:47 -0500, John Baldwin wrote:
> On Thursday 04 January 2007 10:27, Brian Dean wrote:
> >
> > I believe that I can generate a kernel dump. We tried this yesterday
> > but didn't have a dump device configured.
>
> If this is 6.x, turn on minidumps via the sysctl. The dump size normally is
> the size of RAM. With minidumps it can be a lot smaller. If you get a dump,
> let me know and I'll point you at some gdb scripts to generate 'ps' type
> output, etc.
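[For reference, the sysctl John refers to is presumably debug.minidump; here is a minimal sketch of setting up crash dumps on 6.x. The sysctl name and the dump device are assumptions -- adjust for your hardware:]

```shell
# Turn on minidumps (assumed sysctl name: debug.minidump) -- the dump then
# contains only kernel-mapped pages rather than all of physical RAM:
sysctl debug.minidump=1

# Point the kernel at a dump device (normally a swap partition;
# /dev/da0s1b is just an example):
dumpon /dev/da0s1b

# Or persistently, via /etc/rc.conf:
#   dumpdev="/dev/da0s1b"

# After the panic, savecore(8) recovers the dump from the swap
# partition into /var/crash on the next boot:
savecore /var/crash /dev/da0s1b
```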
>
I work with Brian, and have been helping him analyze this problem. We have
been able to generate kernel dumps, and have also done some additional
analysis under ddb. Here is a summary of our analysis so far. Suggestions
as to how to proceed from here are most welcome.

Summary
-------
There appears to be a VM deadlock, not due to any actual locks or mutexes
being held, but due to an inability to satisfy page faults.
In our scenario, one huge process is essentially consuming most of the
memory on the machine. Depending upon the physical memory configured,
that process will consume 1.5-3 times physical memory. When the machine
hangs, however, there is still plenty of swap space available. One
possibly interesting observation is that the hang always seems to occur
when ~14G of swap has been consumed, regardless of the size of physmem
(6G, 8G, 16G) or amount of swap (32G, 64G) configured.
Dropping to ddb on the serial console and running "show locks",
"show alllocks", and "show lockedvnods" usually indicates that only an
sio lock is held (probably for the serial console itself). A "ps" shows
the pagedaemon in the VMWait state, while a number of other processes
(including the huge memory process) are in the pfault state.
We have been able to generate a full core dump, as well as a couple of
minidumps with differing hw.physmem settings. We can make these dumps
available, if that would help, or we can provide information from gdb
on request. We can usually reproduce the hang within 1.5-2 hours, and
can provide additional information from ddb, as well.

Details
-------
Here is an example of a 'top' session that was running when the machine
became unresponsive. Note the one huge 'sdsbuild' process.
last pid: 3331; load averages: 0.00, 0.27, 0.44 up 0+03:22:40 17:06:05
42 processes: 1 running, 41 sleeping
CPU states: 0.0% user, 0.0% nice, 0.8% system, 0.0% interrupt, 99.2% idle
Mem: 14G Active, 1115M Inact, 535M Wired, 39M Cache, 214M Buf, 21M Free
Swap: 64G Total, 14G Used, 50G Free, 21% Inuse
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
2698 blc 1 114 0 29579M 14635M pfault 30:40 0.00% sdsbuild
3159 blc 1 5 0 7492K 856K ttyin 0:23 0.00% systat
3158 root 1 96 0 6984K 744K RUN 0:04 0.00% top
2096 root 1 96 0 3656K 104K select 0:02 0.00% rlogind
2699 blc 1 -8 0 2492K 212K piperd 0:01 0.00% tee
1092 root 1 96 0 2692K 168K select 0:01 0.00% cvd
773 root 1 96 0 7712K 600K select 0:01 0.00% ntpd
736 root 1 97 0 3984K 0K select 0:00 0.00% <mountd>
2525 root 1 96 0 3656K 104K select 0:00 0.00% rlogind
1986 root 1 96 0 3656K 104K select 0:00 0.00% rlogind
707 root 1 96 0 9760K 352K select 0:00 0.00% amd
817 root 1 96 0 9340K 564K select 0:00 0.00% sendmail
657 root 1 96 0 3644K 304K select 0:00 0.00% syslogd
2000 root 1 20 0 9796K 0K pause 0:00 0.00% <csh>
825 root 1 -16 0 3664K 336K vmwait 0:00 0.00% cron
2527 blc 1 20 0 960K 0K pause 0:00 0.00% <ksh>
676 root 1 96 0 4940K 352K select 0:00 0.00% rpcbind
1956 root 1 96 0 6108K 0K select 0:00 0.00% <inetd>
738 root 1 109 0 2724K 0K select 0:00 0.00% <nfsd>
1987 root 1 8 0 18408K 0K wait 0:00 0.00% <login>
2097 root 1 8 0 18408K 0K wait 0:00 0.00% <login>
2526 root 1 8 0 18408K 0K wait 0:00 0.00% <login>
2098 blc 1 20 0 960K 0K pause 0:00 0.00% <ksh>
793 root 1 96 0 2516K 152K select 0:00 0.00% usbd
1999 blc 1 8 0 18400K 0K wait 0:00 0.00% <su>
1988 blc 1 20 0 960K 0K pause 0:00 0.00% <ksh>
2682 blc 1 20 0 964K 0K pause 0:00 0.00% <ksh>
1982 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
1974 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
1981 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
1978 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
1977 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
1976 root 1 5 0 3600K 0K ttyin 0:00 0.00% <getty>
Here is a 'ps' from ddb corresponding to the 'top' session above:
db> ps
pid ppid pgrp uid state wmesg wchan cmd
3332 817 817 0 N sendmail
3331 825 825 0 N cron
3159 2527 3159 207 S+ ttyin 0xffffff02574d4c10 systat
3158 2000 3158 0 SL+ pfault 0xffffffff809326f8 top
3155 0 0 0 SL mdwait 0xffffff030c9e7800 [md0]
2699 2682 2682 207 S+ piperd 0xffffff004e432000 tee
2698 2682 2682 207 SL+ pfault 0xffffffff809326f8 sdsbuild
2682 2098 2682 207 SW+ pause 0xffffff03253933c0 ksh
2527 2526 2527 207 SW+ pause 0xffffff030b906068 ksh
2526 2525 2526 0 SWs+ wait 0xffffff031af246b0 login
2525 1956 2525 0 Ss select 0xffffffff8091fe50 rlogind
2098 2097 2098 207 SW+ pause 0xffffff030bbc7a70 ksh
2097 2096 2097 0 SWs+ wait 0xffffff030bbc7358 login
2096 1956 2096 0 Ss select 0xffffffff8091fe50 rlogind
2000 1999 2000 0 SW+ pause 0xffffff030663b718 csh
1999 1988 1999 207 SW+ wait 0xffffff030663ba08 su
1988 1987 1988 207 SW+ pause 0xffffff031b2e4718 ksh
1987 1986 1987 0 SWs+ wait 0xffffff031b2e4a08 login
1986 1956 1986 0 Ss select 0xffffffff8091fe50 rlogind
1982 1 1982 0 ?s+ getty
1981 1 1981 0 SWs+ ttyin 0xffffff03e0c42410 getty
1980 1 1980 0 SWs+ ttyin 0xffffff03e0c42010 getty
1979 1 1979 0 SWs+ ttyin 0xffffff03dd03c010 getty
1978 1 1978 0 SWs+ ttyin 0xffffff03e14b6410 getty
1977 1 1977 0 SWs+ ttyin 0xffffff03e0921810 getty
1976 1 1976 0 SWs+ ttyin 0xffffff03e0921410 getty
1975 1 1975 0 SWs+ ttyin 0xffffff03e0200410 getty
1974 1 1974 0 SWs+ ttyin 0xffffff03e0200010 getty
1956 1 1956 0 SWs select 0xffffffff8091fe50 inetd
1092 1 1092 0 Ss select 0xffffffff8091fe50 cvd
825 1 825 0 SLs vmwait 0xffffffff809326f8 cron
817 1 817 0 SLs vmwait 0xffffffff809326f8 sendmail
811 1 811 0 SWs select 0xffffffff8091fe50 sshd
793 1 793 0 Ss select 0xffffffff8091fe50 usbd
773 1 773 0 SLs pfault 0xffffffff809326f8 ntpd
745 738 738 0 SW - 0xffffff03e068bc00 nfsd
742 738 738 0 SW - 0xffffff0000f47600 nfsd
741 738 738 0 SW - 0xffffff03e068b400 nfsd
740 738 738 0 SW - 0xffffff03e068b600 nfsd
738 1 738 0 SWs select 0xffffffff8091fe50 nfsd
736 1 736 0 SWs select 0xffffffff8091fe50 mountd
707 1 707 0 SLs pfault 0xffffffff809326f8 amd
676 1 676 0 Ss select 0xffffffff8091fe50 rpcbind
657 1 657 0 Ss select 0xffffffff8091fe50 syslogd
547 1 547 0 SWs select 0xffffffff8091fe50 devd
41 0 0 0 SL - 0xffffffffbf5ddbe4 [schedcpu]
40 0 0 0 SL sdflush 0xffffffff80931ae0 [softdepflush]
39 0 0 0 SL syncer 0xffffffff8089aa20 [syncer]
38 0 0 0 SL vlruwt 0xffffff03dccfc6b0 [vnlru]
37 0 0 0 SL psleep 0xffffffff80920718 [bufdaemon]
36 0 0 0 SL pgzero 0xffffffff80933460 [pagezero]
35 0 0 0 SL psleep 0xffffffff80932b2c [vmdaemon]
34 0 0 0 SL VMWait 0xffffffff80932ae0 [pagedaemon]
33 0 0 0 WL [swi0: sio]
32 0 0 0 WL [irq15: ata1]
31 0 0 0 WL [irq14: ata0]
30 0 0 0 SL usbevt 0xffffffff94066420 [usb1]
29 0 0 0 SL usbtsk 0xffffffff80895750 [usbtask]
28 0 0 0 SL usbevt 0xffffffff94064420 [usb0]
27 0 0 0 WL [irq19: ohci0 ohci1]
26 0 0 0 SL idle 0xffffffff940562e0 [mpt_raid0]
25 0 0 0 SL idle 0xffffffff94056000 [mpt_recovery0]
24 0 0 0 WL [irq28: mpt0]
23 0 0 0 SL - 0xffffff0000e2d700 [em3 taskq]
22 0 0 0 SL - 0xffffff0000e2dd00 [em2 taskq]
21 0 0 0 SL - 0xffffff0000e2a200 [em1 taskq]
20 0 0 0 SL - 0xffffff0000e36300 [em0 taskq]
19 0 0 0 WL [irq9: acpi0]
18 0 0 0 WL [swi6: Giant taskq]
9 0 0 0 SL - 0xffffff0000db5000 [thread taskq]
17 0 0 0 WL [swi5: +]
8 0 0 0 SL - 0xffffff0000db5300 [acpi_task_2]
7 0 0 0 SL - 0xffffff0000db5300 [acpi_task_1]
6 0 0 0 SL - 0xffffff0000db5300 [acpi_task_0]
5 0 0 0 SL - 0xffffff0000db5400 [kqueue taskq]
16 0 0 0 WL [swi2: cambio]
15 0 0 0 WL [swi6: task queue]
14 0 0 0 SL - 0xffffffff80893300 [yarrow]
4 0 0 0 SL - 0xffffffff80896468 [g_down]
3 0 0 0 SL - 0xffffffff80896460 [g_up]
2 0 0 0 SL - 0xffffffff80896450 [g_event]
13 0 0 0 WL [swi1: net]
12 0 0 0 WL [swi3: vm]
11 0 0 0 WL [swi4: clock sio]
10 0 0 0 RL CPU 0 [idle]
1 0 1 0 SLs wait 0xffffff03e1566000 [init]
0 0 0 0 SLs vmwait 0xffffffff809326f8 [swapper]
Here is where the pagedaemon appears to be blocked whenever the hang
occurs. Note that it is sleeping inside swp_pager_meta_build(), i.e. the
page daemon itself is waiting for free memory while trying to page
memory out:
db> tr 34
Tracing pid 34 tid 100027 td 0xffffff03e1557000
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait() at sleepq_wait+0x2e
msleep() at msleep+0x191
swp_pager_meta_build() at swp_pager_meta_build+0x13b
swap_pager_putpages() at swap_pager_putpages+0x261
vm_pageout_flush() at vm_pageout_flush+0xde
vm_pageout() at vm_pageout+0x1668
fork_exit() at fork_exit+0x87
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffbf56ed00, rbp = 0 ---
Here is the lock info from ddb:
db> show locks
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390
db> show alllocks
db> show lockedvnods
Locked vnodes
db> show pcpu
cpuid = 0
curthread = 0xffffff017f421720: pid 10 "idle"
curpcb = 0xffffffffb6ce2d10
fpcurthread = none
idlethread = 0xffffff017f421720: pid 10 "idle"
spin locks held:
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390
db> show allpcpu
Current CPU: 0
cpuid = 0
curthread = 0xffffff017f421720: pid 10 "idle"
curpcb = 0xffffffffb6ce2d10
fpcurthread = none
idlethread = 0xffffff017f421720: pid 10 "idle"
spin locks held:
exclusive spin mutex sio r = 0 (0xffffffff8097a100) locked @ /usr/src/sys/dev/sio/sio.c:1390
Here is a 'vmstat -s' snapshot taken as the machine hung:
45794937 cpu context switches
20220635 device interrupts
810196 software interrupts
66195618 traps
6615004 system calls
61 kernel threads created
5940 fork() calls
560 vfork() calls
0 rfork() calls
1851 swap pager pageins
13980 swap pager pages paged in
234711 swap pager pageouts
3670484 swap pager pages paged out
1088 vnode pager pageins
5591 vnode pager pages paged in
0 vnode pager pageouts
0 vnode pager pages paged out
744 page daemon wakeups
301408821 pages examined by the page daemon
3228902 pages reactivated
233903 copy-on-write faults
425 copy-on-write optimized faults
55595560 zero fill pages zeroed
10623188 zero fill pages prezeroed
1878 intransit blocking page faults
66078208 total VM faults taken
0 pages affected by kernel thread creation
377526 pages affected by fork()
1282533 pages affected by vfork()
0 pages affected by rfork()
71450207 pages freed
3805 pages freed by daemon
7974727 pages freed by exiting processes
3577239 pages active
294202 pages inactive
13662 pages in VM cache
106557 pages wired down
10548 pages free
4096 bytes per page
3665219 total name lookups
cache hits (75% pos + 0% neg) system 1% per-directory
deletions 0%, falsehits 0%, toolong 0%
Here is a snapshot of 'sysctl vm' as the machine hung:
vm.vmtotal:
System wide totals computed every five seconds: (values in kilobytes)
===============================================
Processes: (RUNQ: 2 Disk Wait: 1 Page Wait: 1 Sleep: 43)
Virtual Memory: (Total: 35242K, Active 30153728K)
Real Memory: (Total: 15917288K Active 15540600K)
Shared Virtual Memory: (Total: 7128K Active: 6576K)
Shared Real Memory: (Total: 2368K Active: 2368K)
Free Memory Pages: 96860K
vm.loadavg: { 2.67 1.84 1.53 }
vm.v_free_min: 25512
vm.v_free_target: 107384
vm.v_free_reserved: 5336
vm.v_inactive_target: 161076
vm.v_cache_min: 107384
vm.v_cache_max: 214768
vm.v_pageout_free_min: 34
vm.pageout_algorithm: 0
vm.swap_enabled: 1
vm.kmem_size_scale: 3
vm.kmem_size_max: 419430400
vm.kmem_size: 419430400
vm.nswapdev: 1
vm.dmmax: 32
vm.swap_async_max: 4
vm.zone_count: 69
vm.zone:
ITEM SIZE LIMIT USED FREE REQUESTS
FFS2 dinode: 256, 0, 38765, 25060, 633124
FFS1 dinode: 128, 0, 0, 0, 0
FFS inode: 192, 0, 38765, 26355, 633124
Mountpoints: 936, 0, 181, 11, 181
SWAPMETA: 288, 116519, 116519, 0, 116551
rtentry: 264, 0, 40, 30, 72
ripcb: 304, 33792, 0, 24, 18
sackhole: 32, 0, 0, 0, 0
tcpreass: 40, 2184, 0, 0, 0
hostcache: 136, 15372, 17, 39, 20
syncache: 128, 15370, 0, 58, 12
tcptw: 80, 6795, 0, 810, 1595
tcpcb: 752, 33795, 34, 21, 1655
inpcb: 304, 33792, 34, 794, 1655
udpcb: 304, 33792, 204, 24, 3543
ipq: 56, 1071, 0, 126, 2604
unpcb: 192, 33800, 12, 28, 577
socket: 616, 33792, 250, 26, 5794
KNOTE: 120, 0, 0, 62, 6251
PIPE: 768, 0, 2, 23, 2416
NFSNODE: 816, 0, 9012, 7898, 65144
NFSMOUNT: 584, 0, 174, 15, 174
DIRHASH: 1024, 0, 1219, 293, 2098
L VFS Cache: 327, 0, 729, 687, 10160
S VFS Cache: 104, 0, 52929, 30231, 714353
NAMEI: 1024, 0, 4, 8, 1395238
VNODEPOLL: 152, 0, 0, 0, 0
VNODE: 648, 0, 47992, 34748, 698512
ata_composit: 376, 0, 0, 0, 0
ata_request: 336, 0, 0, 22, 24
g_bio: 216, 0, 0, 1818, 1418264
ACL UMA zone: 388, 0, 0, 0, 0
mbuf_jumbo_1: 16384, 0, 0, 0, 0
mbuf_jumbo_9: 9216, 0, 0, 0, 0
mbuf_jumbo_p: 4096, 0, 0, 0, 0
mbuf_cluster: 2048, 33792, 1152, 72, 7441
mbuf: 256, 0, 1169, 121, 2197657
mbuf_packet: 256, 0, 1152, 138, 1780924
VMSPACE: 544, 0, 47, 16, 6504
UPCALL: 88, 0, 0, 0, 0
KSEGRP: 136, 0, 112, 18, 112
THREAD: 608, 0, 112, 8, 112
PROC: 856, 0, 88, 24, 6562
Files: 120, 0, 149, 68, 223724
4096: 4096, 0, 351, 8, 9244
2048: 2048, 0, 29, 137, 3815
1024: 1024, 0, 727, 17, 1059555
512: 512, 0, 593, 149, 16584
256: 256, 0, 753, 117, 57787
128: 128, 0, 5084, 1325, 291338
64: 64, 0, 3596, 436, 53093
32: 32, 0, 1895, 226, 27977
16: 16, 0, 3166, 194, 389649
mt_zone: 64, 0, 203, 77, 203
DP fakepg: 120, 0, 0, 0, 0
PV ENTRY: 48, 5678064, 3753306, 116118, 106702024
MAP ENTRY: 112, 0, 1707, 207, 273770
KMAP ENTRY: 112, 98406, 16, 116, 11646
MAP: 352, 0, 7, 15, 7
VM OBJECT: 224, 0, 41676, 8678, 178636
128 Bucket: 1048, 0, 240, 306, 1081
64 Bucket: 536, 0, 38, 32, 73
32 Bucket: 280, 0, 16, 26, 39
16 Bucket: 152, 0, 30, 20, 43
UMA Hash: 256, 0, 5, 10, 8
UMA RCntSlab: 128, 0, 612, 26, 619
UMA Slabs: 128, 0, 4452, 3668, 14083
UMA Zones: 184, 0, 67, 13, 67
UMA Kegs: 240, 0, 67, 8, 67
vm.old_contigmalloc: 0
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.exec_map_entries: 16
vm.stats.misc.zero_page_count: 1415
vm.stats.misc.cnt_prezero: 6936472
vm.stats.vm.v_kthreadpages: 0
vm.stats.vm.v_rforkpages: 0
vm.stats.vm.v_vforkpages: 1282533
vm.stats.vm.v_forkpages: 377616
vm.stats.vm.v_kthreads: 61
vm.stats.vm.v_rforks: 0
vm.stats.vm.v_vforks: 560
vm.stats.vm.v_forks: 5941
vm.stats.vm.v_interrupt_free_min: 2
vm.stats.vm.v_pageout_free_min: 34
vm.stats.vm.v_cache_max: 214768
vm.stats.vm.v_cache_min: 107384
vm.stats.vm.v_cache_count: 13662
vm.stats.vm.v_inactive_count: 294202
vm.stats.vm.v_inactive_target: 161076
vm.stats.vm.v_active_count: 3577241
vm.stats.vm.v_wire_count: 106556
vm.stats.vm.v_free_count: 10547
vm.stats.vm.v_free_min: 25512
vm.stats.vm.v_free_target: 107384
vm.stats.vm.v_free_reserved: 5336
vm.stats.vm.v_page_count: 4035617
vm.stats.vm.v_page_size: 4096
vm.stats.vm.v_tfree: 71450323
vm.stats.vm.v_pfree: 7974792
vm.stats.vm.v_dfree: 3805
vm.stats.vm.v_pdpages: 301408821
vm.stats.vm.v_pdwakeups: 744
vm.stats.vm.v_reactivated: 3228902
vm.stats.vm.v_intrans: 1878
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 5591
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 1088
vm.stats.vm.v_swappgsout: 3670484
vm.stats.vm.v_swappgsin: 13980
vm.stats.vm.v_swapout: 234711
vm.stats.vm.v_swapin: 1851
vm.stats.vm.v_ozfod: 10623195
vm.stats.vm.v_zfod: 55595610
vm.stats.vm.v_cow_optim: 425
vm.stats.vm.v_cow_faults: 233938
vm.stats.vm.v_vm_faults: 66078331
vm.stats.sys.v_soft: 810461
vm.stats.sys.v_intr: 20220649
vm.stats.sys.v_syscall: 6616609
vm.stats.sys.v_trap: 66195742
vm.stats.sys.v_swtch: 45795521
vm.v_free_severe: 15424
vm.max_proc_mmap: 37449
vm.old_msync: 0
vm.msync_flush_flags: 3
vm.boot_pages: 48
vm.pageout_lock_miss: 0
vm.disable_swapspace_pageouts: 0
vm.defer_swapspace_pageouts: 0
vm.swap_idle_enabled: 0
vm.pageout_stats_interval: 5
vm.pageout_full_stats_interval: 20
vm.pageout_stats_max: 107384
vm.max_launder: 32
vm.idlezero_maxrun: 16
vm.idlezero_enable: 1
vm.kvm_free: 1042280448
vm.kvm_size: 2147479552
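[One datum in the vm.zone table above that may explain the ~14G
correlation: the SWAPMETA zone shows USED equal to LIMIT (116519), i.e.
the zone is exhausted, and the pagedaemon's backtrace has it sleeping in
swp_pager_meta_build(), which allocates from that zone. If -- and this is
our assumption, not something verified against the 6.x sources -- each
SWAPMETA entry tracks 32 pages (SWAP_META_PAGES in the swap pager), the
zone limit works out to almost exactly the ~14G of swap at which the
hang occurs, independent of physmem and total swap:]

```shell
# Back-of-the-envelope check; the 32 pages-per-entry figure is assumed.
# 116519 swblock entries * 32 pages/entry * 4096 bytes/page
bytes=$((116519 * 32 * 4096))
echo "SWAPMETA limit covers $((bytes / 1024 / 1024 / 1024)) GiB of swap"
# prints: SWAPMETA limit covers 14 GiB of swap
```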
---
Brad Chisholm
blc at bsdwins.com