Adaptec 2400A RAID controller corrupting data (4.8)
Matt Staroscik
matt at wrongcrowd.com
Wed Jul 16 13:15:52 PDT 2003
I am going to split this saga into two posts: one with the ugly details for
those who are interested, and one short post with the essential questions
and observations.
>I have that card with 6 60 gig drives and set the box up (freebsd 4.7?)
>and it would run for a day or so and just crash. I also recall having
>similar panics when moving large amounts of data. I've given up on
>using the box for any real work so it's just sitting doing nothing
>waiting... hoping for a solution... a glimmer of hope. ;-)
>
>If you get it working please post.
Here is an update. While I have made progress, I am not 100% hopeful of a
solution that is stable in the long term.
To make a long story short, I seem to have made the system much more stable
by turning off soft updates. I was able to do a make buildworld and then
delete the contents of /usr/obj. Previously, either of those actions was
sure to trigger a panic. Before I tried disabling soft updates, I also did
all of the following, some of which I readily admit is voodoo:
- Replaced the cables
- Jumpered the drives as Master instead of Cable Select
- Moved the RAID card to a different PCI slot
- Wiggled everything
I continued my test by cvsupping my source and doing another make
buildworld. This time, however, it bombed out while working on groff. I
checked the offending file in an editor and it didn't look munged, so I am
not sure whether this is an error in the CVS tree, an innocent file transfer
error, or a sign of deeper issues with my disk subsystem. I am going to
thrash the machine with more builds but avoid CVS for now.
Unfortunately, turning off soft updates isn't a great solution, if indeed
it IS a solution, which I am still testing. It definitely makes things
slower: my buildworld went from about 23 minutes to 34 minutes. Removing
the contents of /usr/obj took about 1 minute, whereas with soft updates it
took only a few seconds (though it panicked afterwards).
Update: I created a custom kernel config (adding only device pcm and
removing nothing) and built it successfully. I then installed it, rebooted,
and ran make installworld. Bomb city! getty dumped core before I even
logged in, and it got worse from there.
Then I tried deleting /usr/obj and I got the kernel panic again. :)
Observation: My last two panics (ffs_blkfree) reported these block numbers:
54608 and 54592. Those are awfully close (only 16 apart). Could my trouble
stem from a defect on one of the disks?
Things I have yet to try:
- Removing the Maxtor 160s from the RAID and trying them individually on
the motherboard controller.
- Applying a hammer to the system
Cheers,
Matt
More information about the freebsd-questions mailing list