3Ware Escalade Issues

Wed Feb 22 03:09:14 PST 2006

>-----Original Message-----
>From: owner-freebsd-questions at freebsd.org
>[mailto:owner-freebsd-questions at freebsd.org]On Behalf Of Chuck Swiger
>Sent: Sunday, February 19, 2006 6:47 AM
>To: Don O'Neil
>Cc: freebsd-questions at freebsd.org
>Subject: Re: 3Ware Escalade Issues
>
>
>Don O'Neil wrote:
>> There appears to be a bad sector on one of the drives according to
>> smartctl, but nothing serious.
>
>What that may mean is that there have been many bad sectors, which have
>been corrected using the spares, until no more spare sectors are left
>for replacements.
>
>That drive may well fail catastrophically, soon.
>

_Will_ fail soon!

>> However, every time the system tried to write to that sector in the
>> array, the system would freeze, and then reboot, and of course it
>> would say the file system isn't clean, etc...
>>
>> Since the file system is 1 TB in size, it would take 8+ hours to FSCK
>> it. The array is only striped, and not mirrored or built with
>> redundancy. I'm basically using the card/driver to make one large
>> volume for a web server.
>
>OK.  Well, if this data is important to you, you should give
>consideration to using a RAID-1, RAID-10, or RAID-5 configuration to
>gain redundancy.
>

RAID 0+1 is the way to go; disks are cheap now.  Fry's was selling 300GB
UDMA Seagates yesterday for $69.00 with rebates.  You can find Promise
and Highpoint UDMA RAID 100 cards on eBay for $15 or so.
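
To put rough numbers on the trade-off, here is a minimal Python sketch of
usable capacity and guaranteed failure tolerance for the RAID levels
mentioned above.  The function name raid_summary and the four-disk, 300GB
example are illustrative assumptions, not the original poster's actual
configuration, and real controllers may differ in overhead.

```python
def raid_summary(level, disks, size_gb):
    """Return (usable_gb, failures_survived) for a simple array.

    failures_survived is the number of drive failures the array is
    guaranteed to survive no matter which drives fail.
    Assumes identical drives and no controller overhead.
    """
    if level == "0":
        # Pure striping: all capacity, no redundancy.
        if disks < 2:
            raise ValueError("RAID-0 needs at least 2 disks")
        return disks * size_gb, 0
    if level == "1":
        # Assumption: every disk is a full mirror of the same data.
        if disks < 2:
            raise ValueError("RAID-1 needs at least 2 disks")
        return size_gb, disks - 1
    if level == "10":
        # Striped set of two-way mirrors.
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, 4 or more")
        return (disks // 2) * size_gb, 1
    if level == "5":
        # One disk's worth of capacity goes to parity.
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * size_gb, 1
    raise ValueError("unknown RAID level: %r" % level)


for level in ("0", "1", "10", "5"):
    usable, survives = raid_summary(level, 4, 300)
    print("RAID-%-2s usable=%4d GB, survives %d failure(s)"
          % (level, usable, survives))
```

The point of the comparison is that a striped-only array survives zero
drive failures, which is the situation described in the original post.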

>> I have a few questions:
>>
>> 1) Is this a known bug? I'm running FreeBSD 4.11 (for software
>> compatibility issues at the moment, I will upgrade at some point in
>> the future)

No, it is not a bug.
>
>Normally, the OS will only kill the affected processes using that sector,

No, Chuck, the OS has no knowledge of bad sectors on the disk.

All UNIXes out there assume perfect storage media and perfect RAM.
It is the hardware's job to handle error correction or containment.
All the OS knows when there is a disk error is that it is pulling data
off the disk and doing something with it.  If the data that's pulled off
is corrupted and happens to be a device driver, or some other part of
the kernel (perhaps it was swapped out), then the system will crash.
If the corrupted area is user data, the system won't know the
difference.  If it's a program, the result will be the same as if the
program had a bug in it: it will terminate unexpectedly.

People have lost databases to corruption because they did not know
about disk failures like this.
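
One way to notice this kind of silent corruption in files that are not
supposed to change is to record checksums while the data is known to be
good and compare them later.  The following is only a sketch of that idea
in Python; the helper names file_digest, build_manifest and changed_files
are made up for illustration, and legitimately modified files will of
course show up as changed too.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths):
    """Record a digest for each path while the data is known to be good."""
    return {p: file_digest(p) for p in paths}


def changed_files(manifest):
    """Return the sorted paths whose contents no longer match the manifest.

    A file that cannot be read at all (for example because of an I/O
    error) is reported as changed as well.
    """
    bad = []
    for path, recorded in manifest.items():
        try:
            current = file_digest(path)
        except OSError:
            bad.append(path)
            continue
        if current != recorded:
            bad.append(path)
    return sorted(bad)
```

The manifest itself would need to be stored somewhere other than the
suspect array for the comparison to mean anything.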

>but without knowing where it is, perhaps it's affecting some important
>file like the kernel itself, /bin/sh...?
>
>> 2) How can I trap the errors and eliminate the re-boot issue?
>
>Shut down the system.  Replace the failing hard drive.  Use dd to make
>an exact copy onto the new drive on some other system, and put the new
>drive back into the array.  Note that the replacement drive must be an
>exact match for this to work, otherwise you will have to back up your
>data and rebuild the array.
>
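
When copying off a failing drive, the copy needs to keep going past
unreadable blocks rather than abort at the first error, which is what
dd's conv=noerror,sync is for.  The Python sketch below illustrates the
same idea under stated assumptions: rescue_copy is a hypothetical helper,
it works on any seekable binary file objects, and it is not a substitute
for dd or a dedicated recovery tool.

```python
def rescue_copy(src, dst, total_bytes, block_size=64 * 1024):
    """Copy total_bytes from src to dst, zero-filling unreadable data.

    src and dst are seekable binary file objects.  Returns a list of
    (offset, length) ranges that could not be read from src.
    """
    bad = []
    offset = 0
    while offset < total_bytes:
        want = min(block_size, total_bytes - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Treat a read error as "nothing readable in this block".
            data = b""
        if len(data) < want:
            # Record the missing range, then pad with zeros so that
            # every later block still lands at the same offset.
            bad.append((offset + len(data), want - len(data)))
            data = data.ljust(want, b"\0")
        dst.seek(offset)
        dst.write(data)
        offset += want
    return bad
```

The returned list shows which byte ranges were zero-filled, so it is
clear which parts of the copy cannot be trusted.  A smaller block size
loses less data around each bad spot at the cost of a slower copy.
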
>Speaking of which, do you have known-good backups available?
>
>> 3) Is there some way I can do a faster FSCK, or perhaps 'fool' the
>> system into thinking the file system is clean?
>
>If you update to 5.x or later, you can use background FSCK rather than
>having to wait for the FSCK to complete the way it does under 4.x.
>
>> 4) Any suggestions on how to fix this?
>
>Also, if you update to 5.x, you can run the smartmon tools, which will
>let you do a drive self-test using SMART.  This will give much better
>information about what is going on with the drive, and also give an
>estimate of its remaining lifespan.
>
>How old are the drives, if you know?
>

A lot of the drive manufacturers these days are offering generous
warranties, so it is likely his disk is still under warranty.

Ted

>--
>-Chuck
>