twa: Passthru request timed out! Resetting controller...
Clayton Milos
clay at milos.co.za
Wed Nov 15 22:30:45 PST 2006
----- Original Message -----
From: "Atanas" <atanas at asd.aplus.net>
To: "Mark Dotson" <mark at dmglobal.net>
Cc: <freebsd-stable at freebsd.org>
Sent: Thursday, November 16, 2006 4:07 AM
Subject: Re: twa: Passthru request timed out! Resetting controller...
> Mark Dotson said the following on 11/14/06 1:18 PM:
>> I've had continued problems with the 3ware series SATA cards and the Tyan
>> boards. Specifically, I have a "Tyan S5360-1U" and both a 9500S-4LP and
>> an 8506-series 3ware card.
>>
>> In my case the first error is different, but the 'resetting' over and
>> over is VERY familiar. This could be triggered by a simple file copy
>> from one part of a container to another; degrading the unit and
>> triggering the resetting crap. Note that the drives are fine, I tested
>> that first thing.
>>
>> Sep 8 11:59:23 localhost kernel: 3w-9xxx: scsi0: WARNING: (0x06:0x002C): Unit #1: Command (0x2a) timed out, resetting card.
>> Sep 8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=0.
>> Sep 8 11:59:41 localhost kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronized after power fail:unit=1.
>>
>> I also found this problem to exist across platforms, not just FreeBSD.
>> For example, the excerpt above is from a CentOS box.
>>
>> All tests were done with newest firmware for both card and mobo, and
>> using the newest drivers provided by 3ware.
>>
>> Once I removed the card and drives from the Tyan system and stuck them in
>> pretty much ANY other system, they worked fantastically.
>>
>> I don't have an answer for the "resetting problem" as of yet... 3ware and
>> Tyan (And my system vendor "Appro") are still trying to find my specific
>> problem and solve it. I believe they are currently doing the "replace
>> everything" method of troubleshooting.
>>
> Mark, thank you.
>
> It's good to know that the resetting problem exists on other platforms too.
>
> We already found out that replacing the entire box with identical one
> doesn't help, so unfortunately we'll have to start replacing components by
> using different brands or models.
>
> I wouldn't like to touch the I/O subsystem (these are already loaded
> production machines), so like you said, the safest bet would be to try
> another motherboard.
>
> However, I don't see many dual Opteron boards in 3ware's compatibility
> list. The next one that comes to mind from that list is the Supermicro
> H8DC8, but it looks more like a gamer's dream (High-End PCI-e Graphics,
> SLI, etc. but no on-board VGA) than a server board.
>
> I'm quite surprised that the top Opteron motherboard manufacturer
> listed in the 3ware web site motherboard compatibility docs:
> http://3ware.com/products/pdf/Motherboard_compatibility_list_9550SX_2006_06.pdf
> makes 2 of the 5 boards marked as compatible, yet they perform so badly
> with 3ware cards.
>
> I know what happens here in this mailing list when somebody looks for good
> SATA cards (Re: 3ware, 3ware, ...), I replied myself too.
>
> So are there any success stories with the 3ware 9550SX (SATA II) and dual
> AMD Opteron server boards, or is it time to go back to Intel?
>
> Regards,
> Atanas
It's time to go with another SATA2 RAID controller card. I have an Areca
8-port PCI-X controller card (www.areca.com.tw).
I'm running it on a Tyan Thunder motherboard with dual AthlonMP CPUs and
I've had no issues with it yet.
I've got 8 drives on it in 2 volumes of 4 drives each. I'm getting what I
consider to be good read/write speeds to the array.
It also supports several things that 3ware did not at the time I bought it,
such as online volume expansion.
homer# dd if=/dev/zero of=test.file bs=65536 count=16384
16384+0 records in
16384+0 records out
1073741824 bytes transferred in 7.000588 secs (153378801 bytes/sec)
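For anyone comparing numbers, the rate dd prints can be checked by hand from
the byte count and the elapsed time. A minimal Python sketch, using only the
figures from the output above (the function name throughput is just for
illustration, not part of any tool):

```python
def throughput(bytes_written, elapsed_secs):
    """Return the average transfer rate in bytes per second."""
    return bytes_written / elapsed_secs


# Figures copied from the dd run above.
block_size = 65536
block_count = 16384
bytes_written = block_size * block_count   # 1073741824 bytes, i.e. 1 GiB
elapsed_secs = 7.000588

rate = throughput(bytes_written, elapsed_secs)
print(f"{rate / 1e6:.1f} MB/s")     # decimal megabytes per second
print(f"{rate / 2**20:.1f} MiB/s")  # binary mebibytes per second
```

That comes to roughly 153 MB/s, or about 146 MiB/s. It is a single sequential
write of zeros, so take it as a rough indication rather than a benchmark.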
-Clay
>
>
>> Atanas wrote:
>>> Has anyone experienced this:
>>>
>>> twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20
>>> twa0: INFO: (0x16: 0x1108): Resetting controller...:
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0
>>> ...
>>> twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7
>>> twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
>>> twa0: INFO: (0x16: 0x1107): Controller reset done!:
>>>
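Since the resets are sporadic, it may help to tally the twa messages from the
logs over time and compare boxes. A rough Python sketch, assuming each message
sits on one unwrapped line in the layout shown above; the function name
count_twa_events and the sample list are made up for illustration:

```python
import re

# Matches the message layout seen in the excerpt above, e.g.
#   twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1
# This pattern is an assumption based on those lines only.
TWA_LINE = re.compile(
    r"(?P<dev>twa\d+): (?P<sev>[A-Z]+): "
    r"\((?P<src>0x[0-9A-Fa-f]+): (?P<code>0x[0-9A-Fa-f]+)\): "
    r"(?P<msg>.*)"
)


def count_twa_events(lines):
    """Count messages per (severity, code) pair; non-matching lines are skipped."""
    counts = {}
    for line in lines:
        m = TWA_LINE.search(line)
        if m is None:
            continue
        key = (m.group("sev"), m.group("code"))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Sample input: the lines quoted above, unwrapped.
sample = [
    "twa0: ERROR: (0x05: 0x2018): Passthru request timed out!: request = 0xca839d20",
    "twa0: INFO: (0x16: 0x1108): Resetting controller...:",
    "twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=0",
    "twa0: INFO: (0x04: 0x005E): Cache synchronization completed: unit=7",
    "twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=1",
    "twa0: INFO: (0x16: 0x1107): Controller reset done!:",
]

counts = count_twa_events(sample)
for (sev, code), n in sorted(counts.items()):
    print(sev, code, n)
```

Feeding it the lines from wherever your kernel messages are logged would give
a per-code count for each machine.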
>>> This happens on 6.2-PRERELEASE i386 (and on 6.1 since its release) on a
>>> number of machines with the following hardware configuration:
>>>
>>> - Tyan K8SE 2892, 2 AMD Opteron 270 CPUs, 4GB RAM
>>> - 3ware 9550SX-8LP, 8 500GB Seagate ST3500641AS SATA drives
>>> (configured as 8 SINGLE DISK units, aka JBOD)
>>>
>>> All hardware components, including the server chassis, are listed in the
>>> 3ware hardware compatibility lists. It doesn't seem to be a cabling or
>>> power issue. The controller and hard drives are already flashed to the
>>> latest firmware revisions. I tried turning off NCQ, but it didn't make
>>> any difference. I tried also switching the kernel from PAE to non-PAE
>>> (reducing the usable memory to 3GB), but it didn't help either.
>>>
>>> I have other machines with similar I/O configurations (3ware), but
>>> with Intel motherboards and running FreeBSD-5.5, and these have run fine
>>> for about a year already. Now I'm thinking about swapping the drives
>>> between a working Intel box and an AMD box, to see where the controller
>>> timeouts follow.
>>>
>>> The problem happens sporadically once in a month or so and is very hard
>>> to reproduce. Sometimes it takes several weeks until the next crash
>>> happens, sometimes it crashes again in just a few hours.
>>>
>>> When the thing happens, the kernel sometimes panics (most likely due to
>>> the inconsistent filesystem state caused by the controller reset),
>>> sometimes just hangs. It can be interrupted (I have a serial console),
>>> but the only usable thing after that seems to be "call cpu_reset()",
>>> followed by full (and sometimes painfully long) filesystem check.
>>>
>>> Here are the diffs against the default GENERIC and PAE kernel
>>> configurations:
>>>
>>> < cpu I486_CPU
>>> < ident GENERIC
>>> < options INET6 # IPv6 communications protocols
>>> < options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
>>>
>>> > options QUOTA
>>> > options SMP # Symmetric MultiProcessor Kernel
>>> > options BREAK_TO_DEBUGGER
>>> > options DDB
>>> > options KDB
>>> > options KDB_UNATTENDED
>>>
>>> > options IPFIREWALL
>>> > options DUMMYNET
>>>
>>> I'm attaching the dmesg.boot following the latest crash.
>>>
>>> Regards,
>>> Atanas
>>>