Derek Ragona wrote:
> At 09:00 AM 11/14/2007, Barnaby Scott wrote:
>> I suspect I already know the answer to this, which is that the trouble 
>> I am having is nothing to do with the OS at all, but I have to ask, 
>> because I am otherwise up against a total brick wall!
>> I bought a second-hand Dell PowerEdge 4600 and installed FreeBSD 6.2 
>> earlier this year. I had it set up with RAID5 using its PERC3/DC 
>> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and 
>> it worked faultlessly as a Samba server for several months.
>> At the beginning of October, it went down, reporting a mismatch 
>> between the configuration on the NVRAM and the disks. With help from 
>> Dell support, I managed to recreate the RAID array and it worked again 
>> for a month.
>> In early November it happened again, and has kept happening since. At 
>> one point it appeared that the backplane was faulty, so I replaced 
>> that, but I cannot keep the server up for more than a day or so 
>> without this 'mismatch' problem.
>> What about diagnostics on the hardware you may ask? I have run all the 
>> diagnostic tools that Dell can supply - several times - and the server 
>> declares itself to be totally fault-free.
>> My specific questions therefore:
>> Is there any way at all that FreeBSD could be involved in this 
>> problem? (I did notice for example that the Dell PERC3/DC controller 
>> was not in the list of supported hardware - but then again, why did it 
>> work for several months?)
>> Can I use FreeBSD to tell me anything about the fault that Dell's 
>> diagnostic tools haven't found?
>> (I do hope someone might be able to help - Dell are trying to get me 
>> to switch to a 'supported' OS!)
>> Thanks
>> Barnaby Scott
> It doesn't sound like an OS issue, as you set up the RAID outside the 
> OS.  It may be a bad drive or drives.  Most RAID setups write RAID 
> information to the drives themselves, and if this becomes unreadable 
> you will have RAID faults.
> Another likely culprit is heat.  Overheating drives often fail.  Are you 
> sure the temperatures in the drive enclosure are OK?
> If you can, run diagnostics on the drives, though this usually requires 
> taking the drives out of the RAID array.
>         -Derek

Thanks for replying. As I said, this is a long shot: I'm trying to see 
whether there is any OS involvement.

The drives are fine - I have used two different tools to analyse them 
while the computer was booted from a live CD with the RAID configuration 
cleared on the controller. Besides, you would expect one drive to fail 
at a time, and if that happened, the hot spare would surely be pressed 
into service. Nothing like this has happened, though - the controller 
reports several drives (not always the same ones) as failed 
simultaneously, yet when the array is re-created from the disks, 
everything works fine. The problem is that it goes down again a day or 
so later.

As for heat, nothing is being reported there, and the fans that cool 
that area are working.

Any other ideas gratefully received!

Barnaby Scott

