Kernel panics in ahc during load with stable built 18th Feb

Tony Frank tfrank at optushome.com.au
Thu Feb 19 05:04:14 PST 2004


Hi all again,

Some more updates on my problems.

On Thu, Feb 19, 2004 at 04:23:36PM +1100, Tony Frank wrote:
> As per the subject I seem to be getting kernel panics in ahc 
> driver since upgrading my kernel & world to -stable.
> This occurs specifically when writing high volume of files to
> vinum raid5 volume spanning 4 scsi drives connected to Adaptec 2940 PCI
> controller.
> 
> The fault appears to be reproducable - every time I try to extract
> a tar file containing a copy of /usr/obj from another system.
> 
> vinum init and a lot of benchmarking (rawio and bonnie) work fine on the
> volume. 
> 
> dmesg lines:
> ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xb400-0xb4ff mem 0xe0800000-0xe080
> 0fff irq 10 at device 11.0 on pci0
> 
> Custom kernel is configured with:
> options 	AHC_ALLOW_MEMIO 
> 
> I setup a serial console and rebuilt kernel to include debugging bits.
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x5c
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc015cab2
> stack pointer           = 0x10:0xc02e2b58
> frame pointer           = 0x10:0xc02e2b68
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = Idle
> interrupt mask          = cam
> kernel: type 12 trap, code=0
> Stopped at      ahc_done+0xc2:  pushl   0x5c(%ebx)
> 
> db> trace
> ahc_done(c0f9a200,c0fb53c0) at ahc_done+0xc2
> ahc_run_qoutfifo(c0f9a200) at ahc_run_qoutfifo+0xf1
> ahc_platform_intr(c0f9a200,0,c02e2bf8,c027ab82,c0322458) at ahc_platform_intr+0x
> 174
> add_interrupt_randomness(c0322458,0,400010,c0300010,c0300010) at add_interrupt_r
> andomness+0xe
> Xresume10() at Xresume10+0x2b
> --- interrupt, eip = 0xc027fa46, esp = 0xc02e2bf0, ebp = 0xc02e2bf8 ---
> cpu_idle(e,633,2,80f9ff,0) at cpu_idle+0xe
> idle_loop() at idle_loop+0x1d

This problem easily occured after ~20 mins of extracting the mentioned tar file.
I rebuilt my kernel without options AHC_ALLOW_MEMIO.
With the new kernel the tar file extracted without any panics.
When I then tried to 'stress' the system a bit more I did get a bunch of messages
from ahc0 driver plus a panic.

My "stress test" consisted of:
FTP download (~500meg file over 100Mbps LAN using fxp0 from server)
tar xf objtest.tar (~500meg tar file containing /usr/obj /usr/src copy)
second tar xf objtest.tar (second time in different filesystem on ata disks)
cvsup (stable from local cvsup mirror server)

This was busy making noise & heat for about 1hr 40 mins before it died as mentioned.
As such without the AHC_ALLOW_MEMIO option it worked a bit longer but still failed.

Details of the new failure are included below:

tar: Skipping to next header
tar: Skipping to next header
tar: Skipping to next header
tar: Archive contains obsolescent base-64 headers
tar: Skipping to next header
tar: Skipping to next header
tar: Error exit delayed from previous errors
> ahc0:A:1: no active SCB for reconnecting target - issuing BUS DEVICE RESET
SAVED_SCSIID == 0x17, SAVED_LUN == 0x0, ARG_1 == 0x27 ACCUM = 0x27
SEQ_FLAGS == 0xc0, SCBPTR == 0xa, BTT == 0xff, SINDEX == 0x31
SCSIID == 0x27, SCB_SCSIID == 0x27, SCB_LUN == 0x0, SCB_TAG == 0x27, SCB_CONTROL
 == 0x64
SCSIBUSL == 0x27, SCSISIGI == 0xe6
SXFRCTL0 == 0x88
SEQCTL == 0x10
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
ahc0: Dumping Card State in Message-in phase, at SEQADDR 0x1b8
Card was paused
ACCUM = 0x27, SINDEX = 0x31, DINDEX = 0x52, ARG_2 = 0xff
HCNT = 0x0 SCBPTR = 0xa
SCSISIGI[0xe6]:(REQI|BSYI|MSGI|IOI|CDI) ERROR[0x0] 
SCSIBUSL[0x27] LASTPHASE[0xe0]:(MSGI|IOI|CDI) SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)
 
SBLKCTL[0x2]:(SELWIDE) SCSIRATE[0x0] SEQCTL[0x10]:(FASTMODE) 
SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED) SSTAT0[0x7]:(DMADONE|SPIORDY|SDONE)
 
SSTAT1[0x3]:(REQINIT|PHASECHG) SSTAT2[0x0] SSTAT3[0x0] 
SIMODE0[0x0] SIMODE1[0xac]:(ENSCSIPERR|ENBUSFREE|ENSCSIRST|ENSELTIMO) 
SXFRCTL0[0x88]:(SPIOEN|DFON) DFCNTRL[0x0] DFSTATUS[0x2d]:(FIFOEMP|DFTHRESH|HDONE
|FIFOQWDEMP) 
STACK: 0x12c 0x0 0x151 0x192
SCB count = 50
Kernel NEXTQSCB = 38
Card NEXTQSCB = 38
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 10:39 11:46 14:39 
QOUTFIFO entries: 
Sequencer Free SCB List: 12 9 6 2 3 7 8 15 1 4 0 5 13 
Sequencer SCB Info: 
  0 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  1 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  2 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  3 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  4 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  5 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  6 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  7 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  8 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] 
SCB_LUN[0x0] SCB_TAG[0xff] 
  9 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x27] 
SCB_LUN[0x0] SCB_TAG[0xff] 
 10 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x27] 
SCB_LUN[0x0] SCB_TAG[0x27] 
 11 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x17] 
SCB_LUN[0x0] SCB_TAG[0x2e] 
 12 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x37] 
SCB_LUN[0x0] SCB_TAG[0xff] 
 13 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x7] 
SCB_LUN[0x0] SCB_TAG[0xff] 
 14 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) SCB_SCSIID[0x17] 
SCB_LUN[0x0] SCB_TAG[0x27] 
 15 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) SCB_SCSIID[0x17] 
SCB_LUN[0x0] SCB_TAG[0xff] 
Pending list: 
 39 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x27] SCB_LUN[0x0] 
 46 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] 
 46 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x17] SCB_LUN[0x0] 
[ ... this line repeats ~256 times ... ]

Kernel Free SCB list: 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3
9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 
39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39
 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3
9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 
39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39
 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 3
9 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 
39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39
 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 39 

<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>


Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x2c
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc015755f
stack pointer	        = 0x10:0xccd72c40
frame pointer	        = 0x10:0xccd72c50
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 165 (top)
interrupt mask		= cam 
kernel: type 12 trap, code=0
Stopped at      ahc_match_scb+0xa3:     movl    0x2c(%eax),%eax
db> trace
ahc_match_scb(c0f9a200,c0fb54c0,1,41,ffffffff,ff,1) at ahc_match_scb+0xa3
ahc_abort_scbs(c0f9a200,1,41,ffffffff,ff) at ahc_abort_scbs+0x2d9
ahc_handle_devreset(c0f9a200,ccd72d44,17,c02acb84,0) at ahc_handle_devreset+0x2c
ahc_handle_scsiint(c0f9a200,64,c0322458,2,ccd72dc0) at ahc_handle_scsiint+0x95f
ahc_platform_intr(c0f9a200,ccd72e48,ccd72e1c,c027ab82,c0322458) at ahc_platform_
intr+0x1f7
add_interrupt_randomness(c0322458,0,1030010,c7f80010,ccd70010) at add_interrupt_
randomness+0xe
Xresume10() at Xresume10+0x2b
--- interrupt, eip = 0xc0188a2f, esp = 0xccd72e08, ebp = 0xccd72e1c ---
sysctl_find_oid(ccd72ef8,2,ccd72e44,ccd72e48,ccd72e70) at sysctl_find_oid+0x1b
sysctl_root(0,ccd72ef8,2,ccd72e70,0) at sysctl_root+0x22
userland_sysctl(c7f825a0,ccd72ef8,2,bfbff9cc,bfbff9d4) at userland_sysctl+0x111
__sysctl(c7f825a0,ccd72f80,2815578c,bfbff9d8,2) at __sysctl+0x5c
syscall2(2f,c107002f,bfbf002f,2,bfbff9d8) at syscall2+0x1f5
Xint0x80_syscall() at Xint0x80_syscall+0x25
db> ps
  pid   proc     addr    uid  ppid  pgrp  flag stat wmesg   wchan   cmd
  354 c7f801e0 cd00f000 1001   249   354 004006  2                  cvsup
  288 c7f80860 ccff5000 1001   240   288 004086  3   ttyin c1115830 ftp
  249 c7f80380 cd00c000 1001   248   249 2004086  3   pause cd00c260 tcsh
  248 c7f80520 cd007000 1001   246   118 000184  3  select c0324168 sshd
  246 c7f806c0 cd002000    0   118   118 000184  3  sbwait cc5bfec8 sshd
  240 c7f80a00 ccfed000 1001   239   240 2004086  3   pause ccfed260 tcsh
  239 c7f80d40 ccf5e000 1001   237   118 000184  2                  sshd
  237 c7f80ee0 ccf7a000    0   118   118 000184  3  sbwait cc5be308 sshd
  223 c7f80ba0 ccf6e000 1001   222   223 004086  3   ttyin c0f9aa30 tcsh
  222 c7f81080 ccf57000 1001   220   118 000184  3  select c0324168 sshd
  220 c7f813c0 cce0b000    0   118   118 000184  3  sbwait cc5be548 sshd
  171 c7f81560 ccded000 1001   153   171 004186  2                  systat
  165 c7f825a0 ccd70000 1001   152   165 004106  2                  top
  159 c7f818a0 ccda1000 1001   158   159 004086  3   ttyin c105b810 tcsh
  158 c7f81a40 ccd9c000    0     1   158 004186  3    wait c7f81a40 login
  153 c7f82dc0 ccd2f000    0     1   153 004186  3    wait c7f82dc0 login
  152 c7f82f60 ccd27000    0     1   152 004186  3    wait c7f82f60 login
  118 c7f81be0 ccd91000    0     1   118 000184  3  select c0324168 sshd
  116 c7f81d80 ccd84000    0     1   116 000484  2                  cron
  109 c7f81f20 ccd80000    0     1   104 000084  3  nfsidl c032a6ac nfsiod
--More--  108 c7f820c0 ccd7c000    0     1   104 000084  3  nfsidl c032a6a8 nfsiod
  107 c7f82260 ccd78000    0     1   104 000084  3  nfsidl c032a6a4 nfsiod
  106 c7f82400 ccd74000    0     1   104 000084  3  nfsidl c032a6a0 nfsiod
  101 c7f82c20 ccd37000    0     1   101 000084  2                  ntpd
   96 c7f82a80 ccd3b000    0     1    96 000004  2                  syslogd
   69 c7f82740 ccd44000    0     1    69 000084  3  select c0324168 dhclient
   29 c7f828e0 ccd3f000    0     1    29 2000084  3   pause ccd3f260 adjkerntz
    9 c7f83100 ccabf000    0     0     0 000204  3  vlruwt c7f83100 vnlru
    8 c7f832a0 ccabc000    0     0     0 000204  2                  syncer
    7 c7f83440 ccab9000    0     0     0 000204  3  vrlock c104c000 bufdaemon
    6 c7f835e0 ccab6000    0     0     0 000204  3  psleep c031b260 vmdaemon
    5 c7f83780 ccab3000    0     0     0 000204  3  psleep c02ffef8 pagedaemon
    4 c7f83920 cc5b8000    0     0     0 000204  3    idle c0f9a200 aic_recovery
0
    3 c7f83ac0 cc5b5000    0     0     0 000204  3    idle c0f9a200 aic_recovery
0
    2 c7f83c60 c856b000    0     0     0 000204  3   tqthr c0324164 taskqueue
    1 c7f83e00 c7f88000    0     0     1 004284  3    wait c7f83e00 init
    0 c0323460 c0473000    0     0     0 000204  3   sched c0323460 swapper


Any assistance is appreciated,

Tony


More information about the freebsd-stable mailing list