kern/70360: Random lock-ups with 3ware RAID 5 under FreeBSD 5.2.1

Aaron Gifford astounding at
Thu Aug 12 04:00:37 PDT 2004

>Number:         70360
>Category:       kern
>Synopsis:       Random lock-ups with 3ware RAID 5 under FreeBSD 5.2.1
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Aug 12 11:00:35 GMT 2004
>Originator:     Aaron Gifford
>Release:        FreeBSD 5.2.1 (RELENG_5_2 as of August 2004)
FreeBSD 5.2.1-RELEASE-p9 FreeBSD 5.2.1-RELEASE-p9 #0: Tue Aug 10 14:14:44 MDT 2004     me at  i386  
There have been random system lock-ups at unpredictable intervals on several machines running 5.2.1 with 3ware RAID arrays. During a lock-up the system suddenly becomes unresponsive: there is no unusual output on the console, and neither the keyboard nor the network responds. Affected machines include:

1) A very lightly loaded machine with a 3ware 7506-LP and 4 drives in a single RAID 5 array

2) A mixed-load machine with a 3ware 8506 and 4 drives in a single RAID 10 array

3) And the worst offender, a medium-to-high-load server with a 3ware 8506 and 8 drives in two arrays, one RAID 10 and one RAID 1

As far as I can tell, the lock-ups do not correlate predictably with I/O or system load. Murphy's law ensures that many of them happen in the early A.M., though the exact time differs each time, and they also happen at all other times of day.

With one machine (#2), after the first lock-up it locked up again during background fsck, several times in a row, until background fsck was disabled. I don't know whether this is relevant.

I wish I knew how to reproduce the problem consistently. The machines may stay up without problems for days, weeks, or longer; at other times, one will lock up multiple times in a single day or week.

For now I'll be downgrading to 4.x. Hopefully this will be fixed by 5.3, so that 5.3 can really be -STABLE rather than unstable like 5.2.1 has been.

I'm keeping the low-load machine running 5.2.1. If anyone has an idea how to get the twe driver to print some useful information instead of silently locking up the whole system, I may be able to try it and wait for another occurrence.
