6.2 Release - Adaptec 2130SLP driver?? issue - aac driver

Jeff Royle lists at qwirky.net
Wed Jan 24 16:17:50 UTC 2007


Jeff Royle wrote:
> LI Xin wrote:
>> Jeff Royle wrote:
>>> Jeff Royle wrote:
>>>> I could use some advice on this issue I have had with my raid 
>>>> controller.
>>>> I am not really running much on the system yet, postfix, Pf + pflogd,
>>>> rlogind, ssh, bsnmp and ntpd.  While I was just reading a file with
>>>> less the system stopped responding.   I thought it was the network
>>>> interfaces but I was able to ping the interface. Once I plugged a
>>>> monitor into the system I saw this (roughly):
>>>>
>>>> AAC0: COMMAND <SOME HEX> TIMEOUT AFTER X number of seconds
>>>>
>>>> Not good :)
>>>>
>>>> Reset of the system resolved the issue and it booted fine.    Since
>>>> the controller stopped responding nothing was recorded to my logs.
>>>>
>>>> Now I have to figure out how to prevent that from happening again.
>>>>
>>>> Basic run down on the system and some history...
>>>>
>>>> P4 3.2 GHz
>>>> Asus P5MT-S MB
>>>> 2 x 1GB DDR2 667 memory
>>>> Adaptec 2130SLP RAID Controller + battery backup module
>>>> 2 x Seagate Ultra320 73GB 15k RPM (mirrored)
>>>>
>>>> I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2
>>>> without this issue.    I was using the driver released by Adaptec
>>>> while testing the pre-release installs
>>>> (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).  
>>>> You could say I am fairly confident in the hardware itself.   I have
>>>> put this system through a lot of testing since BETA3.
>>>>
>>>> The 6.2 release kernel has not been customized all that much, I just
>>>> pulled out all the drivers I would never use.    To be safe I kept
>>>> just about all scsi devices/card models still in as I continued my
>>>> testing of 6.2 release. Right now I am going to try taking out aac and
>>>> aacp, then try the driver I used in my previous tests.    However,
>>>> since I have run a week without this issue it will be hard/impossible
>>>> to tell if this did anything to resolve it...I almost want a crash on
>>>> the old driver :)
>>>>
>>>> So I need some advice...  How best do I debug this issue?
>>>>
>>>> Thanks in advance for any direction you guys can offer me.
>>>>
>>>> Cheers,
>>>>
>>>> Jeff
>>>>
>>>>
>>> It appears the driver I was using in my pre-release testing is newer
>>> than the release driver.
>>>
>>> Stock driver in 6.2r dmesg:
>>>
>>> aac0: <Adaptec SCSI RAID 2130S> mem
>>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>>> aac0: New comm. interface enabled
>>> aac0: Adaptec Raid Controller 2.0.0-1
>>> aacp0: <SCSI Passthrough Bus> on aac0
>>>
>>> Currently using:
>>>
>>> aacu0: <Adaptec SCSI RAID 2130S> mem
>>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>>> aacu0: New comm. interface enabled
>>> aacu0: Adaptec Raid Controller 2.0.7-1
>>> aacpu0: <SCSI Passthrough Bus> on aacu0
>>>
>>> Going to continue testing with the newer driver.
>>
>> I have some preliminary work on merging the Adaptec driver:
>>
>> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
>>
>> But one of the reviewers has advised me to request broader testing,
>> especially against old cards and CLI tools, so I have held the commit
>> for now.
>>
>> Cheers,
> 
> Well the driver patched fine, no issues to report there.
> 
> The performance is where I expected to see it, using bonnie and
> simple dd tests, based on my previous testing.
> 
> So far the issue I noted above with the TIMEOUT error has not shown 
> itself again; time will tell on this one.
> 
> However, I have encountered an intermittent bug on boot.
> 
> Sometimes, say every 5-10 boots, the system will hang while probing the 
> SCSI bus for the drives.   I have seen this happen once before on the 
> aacdu 2.0.7-1 binary driver I was using in my 6.2-RC 1 / 6.2-RC 2 
> testing.  Now the problem is happening a fair bit more often.
> 
> Here is where it hangs...
> 
> Hung dmesg output:
> 
> --- snip ---
> orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ppc0: parallel port not found.
> Timecounters tick every 1.000 msec
> acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
> aacd0: <RAID 1 (Mirror)> on aac0
> aacd0: 69889MB (143132672 sectors)
> --- end snip ---
> 
> The system does not continue on and probe the drives, as seen in a 
> normal boot dmesg:
> 
> --- snip ---
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ppc0: parallel port not found.
> Timecounters tick every 1.000 msec
> acd0: CDRW <QSI CD-RW/DVD-ROM SBW-243/TX09> at ata0-master UDMA33
> aacd0: <RAID 1 (Mirror)> on aac0
> aacd0: 69889MB (143132672 sectors)
> pass0 at aacp0 bus 0 target 0 lun 0
> pass0: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
> pass0: 3.300MB/s transfers
> pass1 at aacp0 bus 0 target 3 lun 0
> pass1: <SEAGATE ST373207LC 0005> Fixed unknown SCSI-3 device
> pass1: 3.300MB/s transfers
> SMP: AP CPU #1 Launched!
> Trying to mount root from ufs:/dev/aacd0s1a
> --- end snip ---
> 
> In an effort to resolve this I increased the SCSI delay in the kernel 
> from 5 seconds to 10 seconds (the option takes milliseconds):
> 
> options         SCSI_DELAY=10000
> 
> It *may* have helped on one of my reboot tests: I thought it was going 
> to hang again but it proceeded.   However, it definitely did not solve 
> the issue.
> 
> Once I am back in the office I will see if I can get some debug output 
> for you.
> 
> Cheers,
> 
> Jeff

Update ---

I think the TIMEOUT error has been resolved by the aac 2.0.7-1 patch.
None of my tests since has reproduced the timeout.

However, the hard lock on boot while probing the hard drives continues.
In another post someone suggested disabling the fdc device, as there
is a bug in the Intel chipset that can cause issues.   I tried that,
since I have seen the floppy seek for an unusually long time.   No change.

I am assuming at this point this bug is not specific to the aac driver 
since I saw it at least once on the binary 2.0.7-1 driver from Adaptec.

Last reboot test result: reboot #10, hard lock

Unfortunately it will not break into the debugger to get more detailed 
information.

The last thing I am going to try comes from a recent post suggesting
hint.apic.0.disabled=1 might help.  That was to resolve another boot
issue, not exactly the same as mine, but I am willing to try almost
anything at this point.

I admit I don't really understand what exactly hint.apic.0.disabled 
does.   My assumption is that it disables the APIC entirely; we shall 
see :)
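
For anyone wanting to try the same hint, here is a rough sketch of one way
to make it persistent.  It assumes the usual /boot/loader.conf location;
the helper name add_loader_hint is made up for this example and is not
part of FreeBSD.  As far as I can tell the hint keeps the kernel from
using the APIC and falls back to the legacy interrupt controller, but
treat that as my understanding rather than a definitive description.

```shell
# Hypothetical helper: append a line to a loader.conf-style file
# unless that exact line is already present.
# The file path is a parameter so it can be tried on a scratch copy first.
add_loader_hint() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On the real system (as root) the call would look like this:
# add_loader_hint /boot/loader.conf 'hint.apic.0.disabled="1"'
```

For a one-off test it should also be possible to type
"set hint.apic.0.disabled=1" followed by "boot" at the loader prompt,
which avoids editing any file.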

Cheers,

Jeff



More information about the freebsd-stable mailing list