RPI3, panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks
bob prohaska
fbsd at www.zefox.net
Fri Nov 29 16:37:27 UTC 2019
This morning an Rpi3 was found reporting "panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks" .
System and kernel are at r355024.
Late last night the machine seemed to have stalled on a make in www/chromium, with
make-related processes showing zero percent cpu in top, low storage activity and no
progress in log files but still responsive to shell commands. This is the same
machine that stalled with "smsc0: failed to create mbuf" on Nov. 25th but didn't
panic by itself in that case.
This time I captured a backtrace:
panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd003ad8f000, blocked for 1802596 ticks
cpuid = 0
time = 1575027501
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
pc = 0xffff000000729e4c lr = 0xffff0000001066c8
sp = 0xffff0000518d2580 fp = 0xffff0000518d2790
db_trace_self_wrapper() at vpanic+0x18c
pc = 0xffff0000001066c8 lr = 0xffff000000400d7c
sp = 0xffff0000518d27a0 fp = 0xffff0000518d2850
vpanic() at panic+0x44
pc = 0xffff000000400d7c lr = 0xffff000000400b2c
sp = 0xffff0000518d2860 fp = 0xffff0000518d28e0
panic() at deadlkres+0x314
pc = 0xffff000000400b2c lr = 0xffff00000039cdf0
sp = 0xffff0000518d28f0 fp = 0xffff0000518d2940
deadlkres() at fork_exit+0x7c
pc = 0xffff00000039cdf0 lr = 0xffff0000003c10d4
sp = 0xffff0000518d2950 fp = 0xffff0000518d2980
fork_exit() at fork_trampoline+0x10
pc = 0xffff0000003c10d4 lr = 0xffff0000007471a4
sp = 0xffff0000518d2990 fp = 0x0000000000000000
KDB: enter: panic
[ thread pid 0 tid 100052 ]
Stopped at 0
db> bt
Tracing pid 0 tid 100052 td 0xfffffd0000bb5560
db_trace_self() at db_stack_trace+0xf8
pc = 0xffff000000729e4c lr = 0xffff000000103b0c
sp = 0xffff0000518d2150 fp = 0xffff0000518d2180
db_stack_trace() at db_command+0x228
pc = 0xffff000000103b0c lr = 0xffff000000103784
sp = 0xffff0000518d2190 fp = 0xffff0000518d2270
db_command() at db_command_loop+0x58
pc = 0xffff000000103784 lr = 0xffff00000010352c
sp = 0xffff0000518d2280 fp = 0xffff0000518d22a0
db_command_loop() at db_trap+0xf4
pc = 0xffff00000010352c lr = 0xffff000000106830
sp = 0xffff0000518d22b0 fp = 0xffff0000518d24d0
db_trap() at kdb_trap+0x1d8
pc = 0xffff000000106830 lr = 0xffff0000004492fc
sp = 0xffff0000518d24e0 fp = 0xffff0000518d2590
kdb_trap() at do_el1h_sync+0xf4
pc = 0xffff0000004492fc lr = 0xffff000000747418
sp = 0xffff0000518d25a0 fp = 0xffff0000518d25d0
do_el1h_sync() at handle_el1h_sync+0x78
pc = 0xffff000000747418 lr = 0xffff00000072c878
sp = 0xffff0000518d25e0 fp = 0xffff0000518d26f0
handle_el1h_sync() at kdb_enter+0x34
pc = 0xffff00000072c878 lr = 0xffff000000448948
sp = 0xffff0000518d2700 fp = 0xffff0000518d2790
kdb_enter() at vpanic+0x1a8
pc = 0xffff000000448948 lr = 0xffff000000400d98
sp = 0xffff0000518d27a0 fp = 0xffff0000518d2850
vpanic() at panic+0x44
pc = 0xffff000000400d98 lr = 0xffff000000400b2c
sp = 0xffff0000518d2860 fp = 0xffff0000518d28e0
panic() at deadlkres+0x314
pc = 0xffff000000400b2c lr = 0xffff00000039cdf0
sp = 0xffff0000518d28f0 fp = 0xffff0000518d2940
deadlkres() at fork_exit+0x7c
pc = 0xffff00000039cdf0 lr = 0xffff0000003c10d4
sp = 0xffff0000518d2950 fp = 0xffff0000518d2980
fork_exit() at fork_trampoline+0x10
pc = 0xffff0000003c10d4 lr = 0xffff0000007471a4
sp = 0xffff0000518d2990 fp = 0x0000000000000000
db>
The last top window reported:
last pid: 50623; load averages: 0.32, 0.35, 0.39 up 3+20:13:48 03:38:19
60 processes: 1 running, 58 sleeping, 1 zombie
CPU: 2.3% user, 0.0% nice, 8.1% system, 0.2% interrupt, 89.4% idle
Mem: 86M Active, 629M Inact, 7536K Laundry, 116M Wired, 26M Buf, 62M Free
Swap: 7194M Total, 1772M Used, 5422M Free, 24% Inuse
packet_write_wait: Connection to 50.1.20.28 port 22: Broken pipe
bob at raspberrypi:~ $ R PRI NICE SIZE RES SWAP STATE C TIME WCPU COMMAND
67897 root 1 23 0 679M 93M 18M vmpfw 1 11:34 0.00% c++
37083 root 1 52 0 12M 3756K 0B ufs 1 4:29 0.00% find
37086 root 1 52 0 11M 2720K 0B piperd 0 0:00 0.00% cat
50623 bob 1 52 0 12M 2652K 0B nanslp 2 0:00 0.00% gstat
37079 root 1 52 0 11M 2172K 0B wait 1 0:00 0.00% sh
37084 root 1 52 0 11M 2168K 0B wait 3 0:00 0.00% sh
36967 root 1 52 0 11M 2164K 0B wait 0 0:00 0.00% sh
37077 root 1 52 0 11M 2164K 0B wait 3 0:00 0.00% sh
37066 root 1 52 0 11M 2160K 0B wait 1 0:00 0.00% sh
37070 root 1 52 0 11M 2156K 0B wait 2 0:00 0.00% sh
37067 root 1 52 0 11M 2156K 0B wait 3 0:00 0.00% sh
36949 root 1 32 0 11M 2156K 0B wait 1 0:00 0.00% sh
36959 root 1 52 0 11M 2156K 0B wait 2 0:00 0.00% sh
37078 root 1 52 0 11M 1860K 0B piperd 0 0:00 0.00% mail
36968 root 1 29 0 11M 1852K 0B piperd 1 0:00 0.00% mail
The swap usage was high, but lower than earlier in the make session. A log of both
make output and system activity was captured, the last entries of which are:
[12229/16483] c++ -MMD -MF obj/chrome/browser/ui/ui/ui_jumbo_41.o.d -DUSE_DBUS -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE
.......(long command line, list of more-or-less routine warnings)
and
0/1530/1530/19178 mbuf clusters in use (current/cache/total/max)
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr mm0 da0 in sy cs us sy id
0 0 9 2472776 63368 7126 159 228 129 6751 24168 0 0 23120 1800 15731 41 22 37
dT: 1.063s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
0 2 0 0 0.0 2 53 3.2 0 0 0.0 0.6 mmcsd0
0 2 0 0 0.0 2 53 3.2 0 0 0.0 0.6 mmcsd0s2
0 2 0 0 0.0 2 53 3.2 0 0 0.0 0.6 mmcsd0s2a
0 2 0 0 0.0 2 53 3.3 0 0 0.0 0.6 ufs/rootfs
Fri Nov 29 03:37:44 PST 2019
Device 1K-blocks Used Avail Capacity
/dev/mmcsd0s2b 4404252 906260 3497992 21%
/dev/da0p6 5242880 907904 4334976 17%
Total 9647132 1814164 7832968 19%
Nov 29 03:37:13 www sshd[50211]: error: PAM: Authentication error for illegal user support from 103.133.104.114
Nov 29 03:37:13 www sshd[50211]: error: Received disconnect from 103.133.104.114 port 60251:14: No more user authentication methods available. [preauth]
The commands used to capture the system activity were:
#!/bin/sh
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
done
About three hours elapsed between the last interactive observation
and the last logfile entry.
The log files are too large to post in their entirety but they can
be put on a webserver if it's useful.
Thanks for reading, and any suggestions
bob prohaska
More information about the freebsd-arm
mailing list