kern/114133: Kernel panic: locking problems between aac driver and aaccli

Herve Boulouis amon at sockar.homeip.net
Fri Jun 29 15:10:04 UTC 2007


>Number:         114133
>Category:       kern
>Synopsis:       Kernel panic: locking problems between aac driver and aaccli
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jun 29 15:10:03 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Herve Boulouis
>Release:        FreeBSD 6.2-STABLE
>Organization:
>Environment:
FreeBSD ridley.nerim.net 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jun 21 16:13:11 CEST 2007     root at ridley.nerim.net:/usr/src/sys/i386/compile/RIDLEY  i386

>Description:
I got the following panic on 2 different servers :

panic: _mtx_lock_sleep: recursed on non-recursive mutex AAC I/O lock @ ../../../dev/aac/aac.c:2985

This is triggered by a supervision script which runs aaccli to check the raid volume health. The 2 machines are supermicro Xeon servers with Adaptec 2200S cards. I have other servers with same hardware which do not exhibit the problem thought the 2 crashing ones are doing more disk i/o than the others.

Compiling the kernel with the standard (from the handbook) lock debugging options gave the following info (same kernel backtrace, same 'sendfib' status & backtrace for aaccli process in all crashes)


db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
66921 66920  3208     0  S       sendfib  0xecdb5c14 aaccli
66920 66919  3208     0  S       wait     0xc669c860 sh
66919  3209  3208     0  S       piperd   0xc67e7198 perl5.8.6
66918 66629 66625     8  S       nanslp   0xc0887d6c sleep
66629 66625 66625     8  S       wait     0xc6c1a218 sh
66625 66622 66625     8  Ss      wait     0xc69e8218 sh
66622   707   707     0  S       piperd   0xc66c4b28 cron
66604   738   719     8  S       nanslp   0xc0887d6c sleep
51466   734   734     8  S       select   0xc08d28c4 innfeed
51465   734   734     8  S       sbwait   0xc692979c perl5.8.6
 3209     1  3208     0  S       select   0xc08d28c4 snmpd
  785     1   785     0  Ss+     ttyin    0xc64d2810 getty
  784     1   784     0  Ss+     ttyin    0xc64d9410 getty
  783     1   783     0  Ss+     ttyin    0xc64db010 getty
  782     1   782     0  Ss+     ttyin    0xc64db410 getty
  781     1   781     0  Ss+     ttyin    0xc64dac10 getty
  780     1   780     0  Ss+     ttyin    0xc64d9c10 getty
  779     1   779     0  Ss+     ttyin    0xc64ce410 getty
  778     1   778     0  Ss+     ttyin    0xc64dbc10 getty
--More--          777     1   777     0  Ss+     ttyin    0xc64dc010 getty
  739   736   719     8  S+      nanslp   0xc0887d6c perl5.8.6
  738   735   719     8  S+      wait     0xc690d218 sh
  736     1   719     8  S+      wait     0xc64b6a78 sh
  735     1   719     8  S+      wait     0xc669cc90 sh
  734     1   734     8  SLs     vmpfw    0xc2a72df8 innd
  707     1   707     0  Ss      nanslp   0xc0887d6c cron
  701     1   701    25  Ss      pause    0xc690d464 sendmail
  697     1   697     0  Ss      select   0xc08d28c4 sendmail
  691     1   691     0  Ss      select   0xc08d28c4 sshd
  634     1   633    80  S       kqread   0xc6727000 lighttpd
  567     1   567     0  Ss      select   0xc08d28c4 syslogd
  550     1   550     0  Ss      nanslp   0xc0887d6c ipmon
  510     1   510     0  Ss      select   0xc08d28c4 devd
   41     0     0     0  SL      -        0xe6abacf8 [schedcpu]
   40     0     0     0  SL      sdflush  0xc08e0254 [softdepflush]
   39     0     0     0  SL      syncer   0xc0887adc [syncer]
   38     0     0     0  SL      vlruwt   0xc6452c90 [vnlru]
   37     0     0     0  SL      psleep   0xc08d2e4c [bufdaemon]
   36     0     0     0  SL      pgzero   0xc08e11c4 [pagezero]
--More--           35     0     0     0  SL      psleep   0xc08e0d14 [vmdaemon]
   34     0     0     0  SL      psleep   0xc08e0cd0 [pagedaemon]
   33     0     0     0  SL      -        0xc6451a3c [fdc0]
   32     0     0     0  WL                          [swi0: sio]
   31     0     0     0  WL                          [irq1: atkbd0]
   30     0     0     0  WL                          [irq17: ichsmb0]
   29     0     0     0  WL                          [irq15: ata1]
   28     0     0     0  WL                          [irq14: ata0]
   27     0     0     0  WL                          [irq55: em1]
   26     0     0     0  RL                          [irq54: em0]
   25     0     0     0  SL      idle     0xc6497000 [aic_recovery1]
   24     0     0     0  WL                          [irq33: ahd1]
   23     0     0     0  SL      idle     0xc6495000 [aic_recovery0]
   22     0     0     0  WL                          [irq32: ahd0]
   21     0     0     0  SL      aifthd   0xc6452430 [aac0aif]
   20     0     0     0  WL                          [irq9: acpi0]
   19     0     0     0  WL                          [swi2: cambio]
   18     0     0     0  SL      ccb_scan 0xc087e584 [xpt_thrd]
    9     0     0     0  SL      -        0xc63da880 [acpi_task_2]
    8     0     0     0  SL      -        0xc63da880 [acpi_task_1]
--More--            7     0     0     0  SL      -        0xc63da880 [acpi_task_0]
   17     0     0     0  RL      CPU 0               [swi5: +]
    6     0     0     0  SL      -        0xc63daa00 [thread taskq]
   16     0     0     0  WL                          [swi6: Giant taskq]
   15     0     0     0  WL                          [swi6: task queue]
    5     0     0     0  SL      -        0xc63dad00 [kqueue taskq]
   14     0     0     0  SL      -        0xc0884500 [yarrow]
    4     0     0     0  SL      -        0xc0885228 [g_down]
    3     0     0     0  SL      -        0xc0885224 [g_up]
    2     0     0     0  SL      -        0xc088521c [g_event]
   13     0     0     0  WL                          [swi1: net]
   12     0     0     0  WL                          [swi3: vm]
   11     0     0     0  RL                          [swi4: clock sio]
   10     0     0     0  RL                          [idle]
    1     0     1     0  SLs     wait     0xc6391000 [init]
    0     0     0     0  WLs                         [swapper]
db> show pcpu
cpuid        = 0
curthread    = 0xc6392300: pid 17 "swi5: +"
curpcb       = 0xe5066d90
fpcurthread  = none
idlethread   = 0xc638c600: pid 10 "idle"
APIC ID      = 0
currentldt   = 0x50
spin locks held:
db> show locks
exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009
db> show alllocks
Process 17 (swi5: +) thread 0xc6392300 (100011)
exclusive sleep mutex AAC I/O lock r = 0 (0xc6476134) locked @ dev/aac/aac.c:1009
db> show lockedvnods
Locked vnodes
db> trace
Tracing pid 17 tid 100011 td 0xc6392300
kdb_enter(c0815a6b,c0887740,c0814ee3,e5066bb8,100,...) at kdb_enter+0x30
panic(c0814ee3,c07f3b05,c07f3c43,ba9,c6476134,...) at panic+0xd5
_mtx_lock_sleep(c6476134,c6392300,0,c07f3c43,ba9,...) at _mtx_lock_sleep+0x48
_mtx_lock_flags(c6476134,0,c07f3c43,ba9,c6476134,...) at _mtx_lock_flags+0xe1
aac_ioctl_event(c6476000,c6c4e3e0,ecdb5c14) at aac_ioctl_event+0x4b
aac_release_command(c647e380,c6476158,c647e380,c6476000,e5066c78,...) at aac_release_command+0x112
aac_bio_complete(c647e380,4,e5066c68,e5066c6c,2c,...) at aac_bio_complete+0x2e
aac_complete(c6476000,1,c0818c67,50,c63da99c,...) at aac_complete+0xfb
taskqueue_run(c63da980,e5066cd8,c05dad78,0,e5066cd8,...) at taskqueue_run+0x110
taskqueue_fast_run(0,e5066cd8,c05e9b1c,c0887440,1,...) at taskqueue_fast_run+0x13
ithread_execute_handlers(c6391860,c63da900,c0812eb5,30d,c6392300,...) at ithread_execute_handlers+0x128
ithread_loop(c6372970,e5066d38,c0812cb7,32d,0,...) at ithread_loop+0x83
fork_exit(c05dae70,c6372970,e5066d38) at fork_exit+0xc1
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe5066d6c, ebp = 0 ---

db> alltrace
   
Tracing command aaccli pid 66921 tid 100065 td 0xc69e7000
sched_switch(c69e7000,0,1,15e,5eef17a,...) at sched_switch+0x190  
mi_switch(1,0,c0818796,1bf,1,...) at mi_switch+0x2e6
sleepq_switch(ecdb5c14,0,c0818796,211,ecdb5bf8,...) at sleepq_switch+0x112
sleepq_wait(ecdb5c14,c6476134,c07f41f0,0,0,...) at sleepq_wait+0x65
msleep(ecdb5c14,c6476134,0,c07f41f0,0,...) at msleep+0x339
aac_ioctl_sendfib(c6476000,bfbfc704,131,2,c6458b00,...) at aac_ioctl_sendfib+0xc4
aac_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at aac_ioctl+0x1a0
giant_ioctl(c6458b00,20003802,ecdb5d0c,7,c69e7000,...) at giant_ioctl+0x56
devfs_ioctl_f(c6adf6c0,20003802,ecdb5d0c,c6a6b700,c69e7000,...) at devfs_ioctl_f+0xe7
ioctl(c69e7000,ecdb5d04,c,c69e7000,3,...) at ioctl+0x48d
syscall(3b,3b,3b,20003802,6,...) at syscall+0x300
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x815f468, esp = 0xbfbfc5fc, ebp = 0xbfbfc628 ---
<snip the other processes>

My aac.c version is 1.109.2.5 
>How-To-Repeat:
Run aaccli regularly (I only use "container list /full") on a machine doing a lot of disk i/o (the servers which crash here run inn and cricket)
>Fix:
The only workaround I found is to to stop running aaccli :(

>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list