twa0 errors and system lockup on amd64

Vinod Kashyap vkashyap at amcc.com
Tue Aug 9 18:39:30 GMT 2005


> -----Original Message-----
> From: owner-freebsd-stable at freebsd.org 
> [mailto:owner-freebsd-stable at freebsd.org] On Behalf Of Jung-uk Kim
> Sent: Monday, August 08, 2005 2:29 PM
> To: freebsd-stable at FreeBSD.org
> Cc: Josh Endries
> Subject: Re: twa0 errors and system lockup on amd64
> 
> On Monday 08 August 2005 05:03 pm, Josh Endries wrote:
> > Hello,
> >
> > Just a little while ago I got this on a test 5.4-stable dual Opteron
> > box I'm setting up (9500S-LP RAID 5 with a hot spare):
> >
> > Aug  8 15:58:21 kernel: twa0: ERROR: (0x05: 0x210b): Request timed out!: request = 0xffffffff80a67700
> > Aug  8 15:58:21 kernel: twa0: INFO: (0x16: 0x1108): Resetting controller...:
> > Aug  8 15:58:21 kernel: twa0: ERROR: (0x15: 0x110b): Can't drain AEN queue after reset: error = 60
> > Aug  8 15:58:21 kernel: twa0: ERROR: (0x16: 0x1105): Controller reset failed: error = 60; attempt 1
> >
> > It attempted twice and then just sat there after that. I couldn't log
> > in at all so I did a hard reset after probably 30+ minutes. I didn't
> > find much online other than driver source code or twe(4) man pages,
> > which suggests that it's a problem between the driver and card. Has
> > anyone else seen this problem? Is it a sign of a flaky card or could
> > it be something else? Maybe it's something to do with AMD64? This
> > system was supposed to go into production tomorrow. I guess it's good
> > that it died today...
> 

This should have nothing to do with amd64.  The firmware seems to have
gotten into a bad state.  Can you reproduce this consistently?  If you
can, please run 'tw_cli /cX show diag', and send the output.

> I have seen it with 9500S-8, which is the same controller 
> with 8 ports.  In fact, I am seeing other problems.
> 
> twa0: INFO: (0x04: 0x000b): Rebuild started: unit=0
> twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=0, port=5
> twa0: INFO: (0x04: 0x000b): Rebuild started: unit=0
> twa0: INFO: (0x04: 0x003b): Rebuild paused: unit=0
> twa0: INFO: (0x04: 0x000b): Rebuild started: unit=0
> twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=5
> twa0: INFO: (0x04: 0x003b): Rebuild paused: unit=0
> twa0: INFO: (0x04: 0x000b): Rebuild started: unit=0
> twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=5
> twa0: INFO: (0x04: 0x0005): Rebuild completed: unit=0
> twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=5
> twa0: ERROR: (0x04: 0x0002): Degraded unit: unit=0, port=5
> 
> I have rebuilt this array many times but it's happening again 
> and again.  It seems this controller/driver has issues with 
> amd64.  FYI, UP kernel or replacing cables didn't fix the problem.
> 

You seem to have a bad drive/cable at port 5.
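
For anyone who wants to check their own logs the same way, here is a
minimal sketch that tallies ERROR events per port. It assumes the
message layout seen in the logs quoted above (device, severity, a pair
of hex codes, message text, then key=value parameters). The function
names parse_event and errors_by_port are made up for this example and
are not part of the driver or of tw_cli.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted log lines, e.g.
# "twa0: ERROR: (0x04: 0x0009): Drive timeout detected: port=5"
EVENT_RE = re.compile(
    r"(?P<dev>twa\d+): (?P<severity>[A-Z]+): "
    r"\((?P<source>0x[0-9a-fA-F]+): (?P<code>0x[0-9a-fA-F]+)\): "
    r"(?P<text>[^:]+):?\s*(?P<params>.*)"
)


def parse_event(line):
    """Return a dict describing one twa message, or None if no match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    event = m.groupdict()
    # Turn "unit=0, port=5" into {"unit": "0", "port": "5"}.
    event["params"] = dict(re.findall(r"(\w+)=(\w+)", event["params"]))
    return event


def errors_by_port(lines):
    """Count ERROR-severity events for each port named in the messages."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event and event["severity"] == "ERROR" and "port" in event["params"]:
            counts[event["params"]["port"]] += 1
    return counts
```

If every error lands on one port, as in the log above, that points at
the drive or cable on that port rather than at the controller or the
platform.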

> Good luck,
> 
> Jung-uk Kim
> 
> > Josh