twa: Passthru request timed out! Resetting controller...

Clayton Milos clay at milos.co.za
Wed Nov 15 22:30:45 PST 2006


----- Original Message ----- 
From: "Atanas" <atanas at asd.aplus.net>
To: "Mark Dotson" <mark at dmglobal.net>
Cc: <freebsd-stable at freebsd.org>
Sent: Thursday, November 16, 2006 4:07 AM
Subject: Re: twa: Passthru request timed out! Resetting controller...


> Mark Dotson said the following on 11/14/06 1:18 PM:
>> I've had continued problems with the 3ware series SATA cards and the Tyan 
>> boards.  Specifically, I have a "Tyan S5360-1U" and both a 9500S-4LP and 
>> an 8506 series 3ware card.
>>
>> In my case the first error is different, but the 'resetting' over and 
>> over is VERY familiar.  It could be triggered by a simple file copy 
>> from one part of a container to another, which would degrade the unit 
>> and trigger the resetting crap.  Note that the drives are fine; I tested 
>> that first thing.
>>
>> Sep  8 11:59:23 localhost kernel: 3w-9xxx: scsi0: WARNING: (0x06:0x002C): Unit #1: Command (0x2a) timed out, resetting card.
>> Sep  8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=0.
>> Sep  8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=1.
>>
>> I also found this problem to exist across platforms, not just FreeBSD. 
>> For example, the excerpt above is from a CentOS box.
>>
>> All tests were done with newest firmware for both card and mobo, and 
>> using the newest drivers provided by 3ware.
>>
>> Once I removed the card and drives from the Tyan system and stuck them in 
>> pretty much ANY other system, they worked fantastically.
>>
>> I don't have an answer for the "resetting problem" as of yet... 3ware and 
>> Tyan (and my system vendor "Appro") are still trying to find my specific 
>> problem and solve it.  I believe they are currently doing the "replace 
>> everything" method of troubleshooting.
>>
> Mark, thank you.
>
> It's good to know that the resetting problem exists on other platforms too.
>
> We already found out that replacing the entire box with identical one 
> doesn't help, so unfortunately we'll have to start replacing components by 
> using different brands or models.
>
> I wouldn't like to touch the I/O subsystem (these are already loaded 
> production machines), so like you said, the safest bet would be to try 
> another motherboard.
>
> However, I don't see many dual Opteron boards suggested by 3ware's 
> compatibility list. The next one that comes to mind from that list 
> is the Supermicro H8DC8, but it looks more like a gamer's dream (high-end 
> PCI-e graphics, SLI, etc., but no on-board VGA) than a server board.
>
> I'm quite surprised that the top Opteron motherboard manufacturer 
> listed in 3ware's motherboard compatibility docs:
> http://3ware.com/products/pdf/Motherboard_compatibility_list_9550SX_2006_06.pdf
> makes 2 out of the 5 boards marked as compatible, yet those boards perform 
> so badly with 3ware cards.
>
> I know what happens on this mailing list when somebody asks for good 
> SATA cards (Re: 3ware, 3ware, ...); I've given that answer myself.
>
> So are there any success stories with the 3ware 9550SX (SATA II) and dual AMD 
> Opteron server boards, or is it time to go back to Intel?
>
> Regards,
> Atanas

It's time to go with another SATA2 RAID controller card. I have an Areca 8-port 
PCI-X controller card (www.areca.com.tw).
I'm running it on a Tyan Thunder motherboard with dual Athlon MPs and have had no 
issues with it yet.
I've got 8 drives on it in 2 volumes of 4 drives each. I'm getting what I 
consider to be good read/write speeds to the array.
It also supports features that 3ware did not offer at the time I bought it, such 
as online volume expansion.

homer# dd if=/dev/zero of=test.file bs=65536 count=16384
16384+0 records in
16384+0 records out
1073741824 bytes transferred in 7.000588 secs (153378801 bytes/sec)
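The dd figures above can be checked by hand: 16384 blocks of 65536 bytes is exactly 1 GiB (1073741824 bytes), and dividing that by the 7.000588 s elapsed time gives roughly 153.4 MB/s, or about 146 MiB/s. A minimal Python sketch of that arithmetic follows; the function name dd_stats is hypothetical, and because dd prints the elapsed time rounded to six decimals, the recomputed rate can differ from dd's own figure in the last few digits.

```python
"""Recompute the transfer size and throughput of the dd run quoted above.

dd was run with bs=65536 and count=16384 and printed an elapsed time of
7.000588 s. dd_stats is a hypothetical helper for this check; dd's own
timer is more precise than the printed value, so the last digits of the
recomputed bytes/sec may differ slightly from what dd reported.
"""

BLOCK_SIZE = 65536      # bs=65536 (64 KiB per block)
BLOCK_COUNT = 16384     # count=16384
ELAPSED_S = 7.000588    # elapsed seconds as printed by dd


def dd_stats(block_size: int, block_count: int,
             elapsed_s: float) -> tuple[int, float, float]:
    """Return (total bytes, bytes per second, MiB per second)."""
    total = block_size * block_count
    rate = total / elapsed_s
    return total, rate, rate / 2**20


if __name__ == "__main__":
    total, rate, mib = dd_stats(BLOCK_SIZE, BLOCK_COUNT, ELAPSED_S)
    print(f"{total} bytes, {rate:.0f} bytes/sec, {mib:.1f} MiB/s")
```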

-Clay

>
>
>> Atanas wrote:
>>> Has anyone experienced this:
>>>
>>> twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20
>>> twa0: INFO: (0x16: 0x1108): Resetting controller...:
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
>>> ...
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
>>> twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
>>> twa0: INFO: (0x16: 0x1107): Controller reset done!:
>>>
>>> This happens on 6.2-PRERELEASE i386 (and on 6.1 since its release) on a 
>>> number of machines with the following hardware configuration:
>>>
>>> - Tyan K8SE 2892, 2 AMD Opteron 270 CPUs, 4GB RAM
>>> - 3ware 9550SX-8LP, 8 500GB Seagate ST3500641AS SATA drives
>>>   (configured as 8 SINGLE DISK units, aka JBOD)
>>>
>>> All hardware components, including the server chassis, are listed in the 
>>> 3ware hardware compatibility lists. It doesn't seem to be a cabling or 
>>> power issue. The controller and hard drives are already flashed to the 
>>> latest firmware revisions. I tried turning off NCQ, but it didn't make 
>>> any difference. I also tried switching the kernel from PAE to non-PAE 
>>> (reducing the usable memory to 3GB), but it didn't help either.
>>>
>>> I have other machines with similar I/O configurations (3ware), but 
>>> with Intel motherboards and running FreeBSD-5.5, and these have run fine 
>>> for about a year now. Now I'm thinking about swapping the drives between 
>>> a working Intel box and an AMD-based box, to see whether the controller 
>>> timeouts follow the drives.
>>>
>>> The problem happens sporadically, about once a month, and is very hard 
>>> to reproduce. Sometimes it takes several weeks until the next crash 
>>> happens; sometimes it crashes again in just a few hours.
>>>
>>> When the thing happens, the kernel sometimes panics (most likely due to 
>>> the inconsistent filesystem state caused by the controller reset), 
>>> sometimes just hangs. It can be interrupted (I have a serial console), 
>>> but the only usable thing after that seems to be "call cpu_reset()", 
>>> followed by full (and sometimes painfully long) filesystem check.
>>>
>>> Here are the diffs against the default GENERIC and PAE kernel 
>>> configurations:
>>>
>>> < cpu       I486_CPU
>>> < ident     GENERIC
>>> < options   INET6               # IPv6 communications protocols
>>> < options   SCSI_DELAY=5000     # Delay (in ms) before probing SCSI
>>>
>>>  > options   QUOTA
>>>  > options   SMP                 # Symmetric MultiProcessor Kernel
>>>  > options   BREAK_TO_DEBUGGER
>>>  > options   DDB
>>>  > options   KDB
>>>  > options   KDB_UNATTENDED
>>>
>>>  > options   IPFIREWALL
>>>  > options   DUMMYNET
>>>
>>> I'm attaching the dmesg.boot following the latest crash.
>>>
>>> Regards,
>>> Atanas
>>>
>
>
> 
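Because the twa0 resets reported above occur only every few weeks, it can help to pick them out of the system log after the fact. The following is a minimal Python sketch, not a FreeBSD or 3ware tool: the script name twa_resets.py, the helper names and the regular expression are all hypothetical, inferred only from the twa0 messages quoted in this thread, so other message formats from the twa(4) driver, or the Linux 3w-9xxx messages, will not match.

```python
"""Pick twa(4) controller timeout and reset events out of a kernel log.

Hypothetical helper (not part of FreeBSD or 3ware's tools). The pattern is
inferred from the twa0 messages quoted in this thread, e.g.
    twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20
so other message formats may not match.
"""
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# "<dev>: <LEVEL>: (0xAA: 0xBBBB): <text>", searched anywhere in the line so
# that syslog prefixes such as "Nov 15 22:30:45 host kernel: " are tolerated.
TWA_EVENT = re.compile(
    r"(?P<dev>twa\d+): (?P<level>ERROR|WARNING|INFO): "
    r"\((?P<src>0x[0-9A-Fa-f]+):\s*(?P<code>0x[0-9A-Fa-f]+)\):\s*(?P<text>.*)"
)

# Text fragments (taken from the quoted log) that mark a timeout or a reset.
RESET_MARKERS = ("timed out", "Resetting controller", "Controller reset")


class TwaEvent(NamedTuple):
    dev: str    # e.g. "twa0"
    level: str  # ERROR, WARNING or INFO
    src: int    # first hex field inside the parentheses
    code: int   # second hex field inside the parentheses
    text: str   # message text after the codes
    raw: str    # the original log line, timestamp included


def parse_events(lines: Iterable[str]) -> Iterator[TwaEvent]:
    """Yield a TwaEvent for every line that matches TWA_EVENT."""
    for line in lines:
        m = TWA_EVENT.search(line)
        if m:
            yield TwaEvent(m["dev"], m["level"], int(m["src"], 16),
                           int(m["code"], 16), m["text"].strip(),
                           line.rstrip("\n"))


def reset_events(lines: Iterable[str]) -> list[TwaEvent]:
    """Keep only the events whose text looks like a timeout or a reset."""
    return [e for e in parse_events(lines)
            if any(marker in e.text for marker in RESET_MARKERS)]


if __name__ == "__main__":
    # Usage: python twa_resets.py /var/log/messages [other log files ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for event in reset_events(log):
                print(event.raw)
```

Run against a log such as /var/log/messages, it would print the timeout and reset lines with their syslog timestamps intact while skipping the "Cache synchronization completed" messages, which makes it easier to see how far apart the resets are.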


