Adaptec 2400A RAID controller corrupting data (4.8)

Matt Staroscik matt at wrongcrowd.com
Wed Jul 16 13:15:52 PDT 2003


I am going to break this saga into 2 posts, one with the ugly details for 
those who are interested, and one short post with the essential questions 
and observations.

>I have that card with 6 60 gig drives and set the box up (freebsd 4.7?)
>and it would run for a day or so and just crash.  I also recall having
>similar panics when moving large amounts of data.  I've given up on
>using the box for any real work so it's just sitting doing nothing
>waiting... hoping for a solution... a glimmer of hope.  ;-)
>
>If you get it working please post.

Here is an update. While I have made progress, I am not very hopeful of 
finding a solution that is stable in the long term.

To make a long story short, I seem to have made the system much more stable 
by turning off soft updates. I was able to do a make buildworld, and then 
delete the contents of /usr/obj. Previously, one of those actions was sure 
to trigger a panic. Before I tried disabling soft updates I also tried all 
of the following, some of which I readily admit is voodoo:

- Replaced the cables
- Jumpered the drives to Master instead of Cable Select
- Moved the RAID card to a different PCI slot
- Wiggled everything

I continued my test by cvsupping my source and doing another make 
buildworld. However, this time it bombed out while working on groff. I 
checked the file in an editor and it didn't look munged, so I am not sure 
if there is an error in the cvs tree, an innocent file transfer error, or a 
sign of deeper issues with my disk subsystem. I am going to thrash the 
machine with more builds but avoid CVS for now.
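
For anyone who wants a test that is more direct than repeated builds, here 
is a minimal sketch of a write/read-back check in Python. It is only an 
illustration, not something that was run on this machine: the function name 
write_and_verify and its parameters are made up for the example. It writes 
random data to a temporary file, reads it back, and compares SHA-256 
digests. Note that the read may be served from the buffer cache rather than 
the disk, so a passing result proves little unless the file is much larger 
than RAM.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=1024 * 1024, chunk=64 * 1024):
    """Write random data to a temp file in `directory`, read it back,
    and return True if the SHA-256 digests match.

    Hypothetical helper for illustration only.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Write phase: hash the data as it goes out.
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                data = os.urandom(min(chunk, remaining))
                written.update(data)
                f.write(data)
                remaining -= len(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk

        # Read phase: hash whatever comes back.
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for data in iter(lambda: f.read(chunk), b""):
                read_back.update(data)

        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)
```

Calling write_and_verify("/usr/obj", size_bytes=...) in a loop with a large 
size would exercise the array without involving CVS or the compiler.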

Unfortunately, turning off soft updates isn't a great solution, if indeed 
it IS a solution, which I am still testing. It definitely makes things 
slower. My buildworld went from about 23 minutes to 34 minutes this way. 
Removing the contents of /usr/obj took about 1 minute, whereas with soft 
updates it took only a few seconds (though it panicked afterwards).
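
To put a number on the slowdown, here is the arithmetic as a small Python 
sketch. The timings are the approximate ones quoted above; the variable 
names are made up for illustration.

```python
# Approximate buildworld times quoted above, in minutes.
with_softupdates = 23
without_softupdates = 34

extra_minutes = without_softupdates - with_softupdates
slowdown = without_softupdates / with_softupdates

print(f"extra time: {extra_minutes} min")
print(f"slowdown factor: {slowdown:.2f}x")
```

That works out to 11 extra minutes, or roughly 1.5 times as long per build.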

Update: I created a custom kernel config (adding only device pcm and 
removing nothing) and successfully built it. I then installed it, rebooted, 
and tried to make installworld. Bomb city! getty dumped core before I even 
logged in and it got worse from there.

Then I tried deleting /usr/obj and I got the kernel panic again. :)

Observation: My last 2 panics (ffs_blkfree) reported these block numbers: 
54608, 54592. Those are awfully close. Could my trouble stem from a defect 
on a disk?
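
For reference, here is how close those two numbers are, as a small Python 
sketch. The panic messages do not say what unit the numbers are in, and the 
fragment size of this filesystem is not given here, so the 2048-byte unit 
below is purely an assumption for illustration, as are the function names.

```python
# Block numbers reported by the last two ffs_blkfree panics.
panic_blocks = [54608, 54592]

# ASSUMPTION: size in bytes of one "block" as reported by the panic.
# The real value depends on how the filesystem was created.
ASSUMED_UNIT_SIZE = 2048


def block_distance(a, b):
    """Distance between two block numbers, in blocks."""
    return abs(a - b)


def distance_in_bytes(a, b, unit_size):
    """Same distance converted to bytes for a given unit size."""
    return block_distance(a, b) * unit_size


distance = block_distance(*panic_blocks)
print(f"blocks apart: {distance}")
print(f"bytes apart (assumed unit): "
      f"{distance_in_bytes(*panic_blocks, ASSUMED_UNIT_SIZE)}")
```

The two numbers are 16 apart. Under the assumed unit size that is 32 KB, 
but these are filesystem-relative numbers, and on a RAID array they do not 
necessarily point at the same spot on one physical disk.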

Things I have yet to try:

- Removing the Maxtor 160s from the RAID and trying them individually on 
the motherboard controller.
- Applying a hammer to the system

Cheers,
Matt


