kern/114133: Kernel panic: locking problems between aac driver and
aaccli
Herve Boulouis
amon at sockar.homeip.net
Fri Jun 29 15:10:04 UTC 2007
>Number: 114133
>Category: kern
>Synopsis: Kernel panic: locking problems between aac driver and aaccli
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri Jun 29 15:10:03 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator: Herve Boulouis
>Release: FreeBSD 6.2-STABLE
>Organization:
>Environment:
FreeBSD ridley.nerim.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jun 21 16:13:11 CEST 2007 root at ridley.nerim.net:/usr/src/sys/i386/compile/RIDLEY i386
>Description:
I got the following panic on 2 different servers :
panic: _mtx_lock_sleep: recursed on non-recursive mutex AAC I/O lock @ ../../../dev/aac/aac.c:2985
This is triggered by a supervision script which runs aaccli to check the raid volume health. The 2 machines are supermicro Xeon servers with Adaptec 2200S cards. I have other servers with same hardware which do not exhibit the problem thought the 2 crashing ones are doing more disk i/o than the others.
Compiling the kernel with the standard (from the handbook) lock debugging options gave the following info (same kernel backtrace, same 'sendfib' status & backtrace for aaccli process in all crashes)
db> ps
pid ppid pgrp uid state wmesg wchan cmd
66921 66920 3208 0 S sendfib 0xecdb5c14 aaccli
66920 66919 3208 0 S wait 0xc669c860 sh
66919 3209 3208 0 S piperd 0xc67e7198 perl5.8.6
66918 66629 66625 8 S nanslp 0xc0887d6c sleep
66629 66625 66625 8 S wait 0xc6c1a218 sh
66625 66622 66625 8 Ss wait 0xc69e8218 sh
66622 707 707 0 S piperd 0xc66c4b28 cron
66604 738 719 8 S nanslp 0xc0887d6c sleep
51466 734 734 8 S select 0xc08d28c4 innfeed
51465 734 734 8 S sbwait 0xc692979c perl5.8.6
3209 1 3208 0 S select 0xc08d28c4 snmpd
785 1 785 0 Ss+ ttyin 0xc64d2810 getty
784 1 784 0 Ss+ ttyin 0xc64d9410 getty
783 1 783 0 Ss+ ttyin 0xc64db010 getty
782 1 782 0 Ss+ ttyin 0xc64db410 getty
781 1 781 0 Ss+ ttyin 0xc64dac10 getty
780 1 780 0 Ss+ ttyin 0xc64d9c10 getty
779 1 779 0 Ss+ ttyin 0xc64ce410 getty
778 1 778 0 Ss+ ttyin 0xc64dbc10 getty
--More-- 777 1 777 0 Ss+ ttyin 0xc64dc010 getty
739 736 719 8 S+ nanslp 0xc0887d6c perl5.8.6
738 735 719 8 S+ wait 0xc690d218 sh
736 1 719 8 S+ wait 0xc64b6a78 sh
735 1 719 8 S+ wait 0xc669cc90 sh
734 1 734 8 SLs vmpfw 0xc2a72df8 innd
707 1 707 0 Ss nanslp 0xc0887d6c cron
701 1 701 25 Ss pause 0xc690d464 sendmail
697 1 697 0 Ss select 0xc08d28c4 sendmail
691 1 691 0 Ss select 0xc08d28c4 sshd
634 1 633 80 S kqread 0xc6727000 lighttpd
567 1 567 0 Ss select 0xc08d28c4 syslogd
550 1 550 0 Ss nanslp 0xc0887d6c ipmon
510 1 510 0 Ss select 0xc08d28c4 devd
41 0 0 0 SL - 0xe6abacf8 [schedcpu]
40 0 0 0 SL sdflush 0xc08e0254 [softdepflush]
39 0 0 0 SL syncer 0xc0887adc [syncer]
38 0 0 0 SL vlruwt 0xc6452c90 [vnlru]
37 0 0 0 SL psleep 0xc08d2e4c [bufdaemon]
36 0 0 0 SL pgzero 0xc08e11c4 [pagezero]
--More-- 35 0 0 0 SL psleep 0xc08e0d14 [vmdaemon]
34 0 0 0 SL psleep 0xc08e0cd0 [pagedaemon]
33 0 0 0 SL - 0xc6451a3c [fdc0]
32 0 0 0 WL [swi0: sio]
31 0 0 0 WL [irq1: atkbd0]
30 0 0 0 WL [irq17: ichsmb0]
29 0 0 0 WL [irq15: ata1]
28 0 0 0 WL [irq14: ata0]
27 0 0 0 WL [irq55: em1]
26 0 0 0 RL [irq54: em0]
25 0 0 0 SL idle 0xc6497000 [aic_recovery1]
24 0 0 0 WL [irq33: ahd1]
23 0 0 0 SL idle 0xc6495000 [aic_recovery0]
22 0 0 0 WL [irq32: ahd0]
21 0 0 0 SL aifthd 0xc6452430 [aac0aif]
20 0 0 0 WL [irq9: acpi0]
19 0 0 0 WL [swi2: cambio]
18 0 0 0 SL ccb_scan 0xc087e584 [xpt_thrd]
9 0 0 0 SL - 0xc63da880 [acpi_task_2]
8 0 0 0 SL - 0xc63da880 [acpi_task_1]
--More-- 7 0 0 0 SL - 0xc63da880 [acpi_task_0]
17 0 0 0 RL CPU 0 [swi5: +]
6 0 0 0 SL - 0xc63daa00 [thread taskq]
16 0 0 0 WL [swi6: Giant taskq]
15 0 0 0 WL [swi6: task queue]
5 0 0 0 SL - 0xc63dad00 [kqueue taskq]
14 0 0 0 SL - 0xc0884500 [yarrow]
4 0 0 0 SL - 0xc0885228 [g_down]
3 0 0 0 SL - 0xc0885224 [g_up]
2 0 0 0 SL - 0xc088521c [g_event]
13 0 0 0 WL [swi1: net]
12 0 0 0 WL [swi3: vm]
11 0 0 0 RL [swi4: clock sio]
10 0 0 0 RL [idle]
1 0 1 0 SLs wait 0xc6391000 [init]
0 0 0 0 WLs [swapper]
db> show pcpu
cpuid = 0
curthread = 0xc6392300: pid 17 "swi5: +"
curpcb = 0xe5066d90
fpcurthread = none
idlethread = 0xc638c600: pid 10 "idle"
APIC ID = 0
currentldt = 0x50
spin locks held:
db> show locks
exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009
db> show alllocks
Process 17 (swi5: +) thread 0xc6392300 (100011)
exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009
db> show lockedvnods
Locked vnodes
db> trace
Tracing pid 17 tid 100011 td 0xc6392300
kdb_enter(c0815a6b,c0887740,c0814ee3,e5066bb8,100,...) at kdb_enter+0x30
panic(c0814ee3,c07f3b05,c07f3c43,ba9,c6476134,...) at panic+0xd5
_mtx_lock_sleep(c6476134,c6392300,0,c07f3c43,ba9,...) at _mtx_lock_sleep+0x48
_mtx_lock_flags(c6476134,0,c07f3c43,ba9,c6476134,...) at _mtx_lock_flags+0xe1
aac_ioctl_event(c6476000,c6c4e3e0,ecdb5c14) at aac_ioctl_event+0x4b
aac_release_command(c647e380,c6476158,c647e380,c6476000,e5066c78,...) at aac_release_command+0x112
aac_bio_complete(c647e380,4,e5066c68,e5066c6c,2c,...) at aac_bio_complete+0x2e
aac_complete(c6476000,1,c0818c67,50,c63da99c,...) at aac_complete+0xfb
taskqueue_run(c63da980,e5066cd8,c05dad78,0,e5066cd8,...) at taskqueue_run+0x110
taskqueue_fast_run(0,e5066cd8,c05e9b1c,c0887440,1,...) at taskqueue_fast_run+0x13
ithread_execute_handlers(c6391860,c63da900,c0812eb5,30d,c6392300,...) at ithread_execute_handlers+0x128
ithread_loop(c6372970,e5066d38,c0812cb7,32d,0,...) at ithread_loop+0x83
fork_exit(c05dae70,c6372970,e5066d38) at fork_exit+0xc1
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5066d6c, ebp = 0 ---
db> alltrace
Tracing command aaccli pid 66921 tid 100065 td 0xc69e7000
sched_switch(c69e7000,0,1,15e,5eef17a,...) at sched_switch+0x190
mi_switch(1,0,c0818796,1bf,1,...) at mi_switch+0x2e6
sleepq_switch(ecdb5c14,0,c0818796,211,ecdb5bf8,...) at sleepq_switch+0x112
sleepq_wait(ecdb5c14,c6476134,c07f41f0,0,0,...) at sleepq_wait+0x65
msleep(ecdb5c14,c6476134,0,c07f41f0,0,...) at msleep+0x339
aac_ioctl_sendfib(c6476000,bfbfc704,131,2,c6458b00,...) at aac_ioctl_sendfib+0xc4
aac_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at aac_ioctl+0x1a0
giant_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at giant_ioctl+0x56
devfs_ioctl_f(c6adf6c0,20003802,ecdb5d0c,c6a6b700,c69e7000,...) at devfs_ioctl_f+0xe7
ioctl(c69e7000,ecdb5d04,c,c69e7000,3,...) at ioctl+0x48d
syscall(3b,3b,3b,20003802,6,...) at syscall+0x300
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x815f468, esp = 0xbfbfc5fc, ebp = 0xbfbfc628 ---
<snip the other processes>
My aac.c version is 1.109.2.5
>How-To-Repeat:
Run aaccli regularly (I only use "container list /full") on a machine doing a lot of disk i/o (the servers which crash here run inn and cricket)
>Fix:
The only workaround I found is to to stop running aaccli :(
>Release-Note:
>Audit-Trail:
>Unformatted:
More information about the freebsd-bugs
mailing list