ZFS RaidZ-2 problems
Paul Wootton
paul-freebsd at fletchermoorland.co.uk
Thu Nov 1 09:29:01 UTC 2012
On 10/31/12 17:58, Zaphod Beeblebrox wrote:
> I'd start off by saying "smart is your friend." Install smartmontools
> and study the somewhat opaque "smartctl -a /dev/mydisk" output
> carefully. Try running a short and/or long test, too. Many times the
> disk can tell you what the problem is. If too many blocks are being
> replaced, your drive is dying. If the drive sees errors in commands
> it receives, the cable or the controller are at fault. ZFS itself
> does _exceptionally_ well at trying to use what it has.
I already run smartmontools regularly. One of my drives is starting to
go bad.
The drive that keeps disconnecting actually looks OK on SMART (when it's
connected).
I normally also run a periodic scrub every few days (I've been caught
out a few times before).
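For reference, the routine described above (SMART report, self-test, scrub, status check) can be sketched roughly as below. This is only an illustration: the device /dev/ada0, the pool name "tank", the function name disk_health_pass and the RUN dry-run switch are assumptions, not taken from the actual system.

```shell
#!/bin/sh
# Hypothetical names; substitute the real disk and pool.
DISK="${DISK:-/dev/ada0}"
POOL="${POOL:-tank}"

# disk_health_pass: by default only prints the commands (dry run).
# Set RUN to an empty string to actually execute them.
disk_health_pass() {
    run="${RUN-echo}"
    $run smartctl -a "$DISK"         # full SMART report for the disk
    $run smartctl -t short "$DISK"   # start a short self-test
    $run zpool scrub "$POOL"         # start a scrub of the pool
    $run zpool status -v "$POOL"     # per-device state and error counts
}
```

Calling disk_health_pass as-is just lists the four commands; with RUN set to an empty string it runs them, which needs root and a real pool.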
> I'll also say that bad power supplies make for bad disks. Replacing a
> power supply has often been the solution to bad disk problems I've
> had. Disks are sensitive to under voltage problems. Brown-outs can
> exacerbate this problem. My parents live out where power is very
> flaky. Cheap UPSs didn't help much ... but a good power supply can
> make all the difference.
Maybe... I will not rule out a bad power supply.
> But I've also had bad controllers of late, too. My most recent
> problem had my 9-disk raidZ1 array lose a disk. Smartctl said that
> it was losing blocks fast, so I RMA'd the disk. When the new disk
> came, the array just wouldn't heal... it kept losing the disks
> attached to a certain controller. Now it's possible the controller
> was bad before the disk had died ... or that it died during the first
> attempt at resilver ... or that FreeBSD drivers don't like it anymore
> ... I don't know.
>
> My solution was to get two more 4 drive "pro box" SATA enclosures.
> They use a 1-to-4 SATA breakout and the 6 motherboard ports I have are
> a revision of the ICH11 Intel chipset that supports SATA port
> replication (I already had two of these boxes). In this manner I
> could remove the defective controller and put all disks onto the
> motherboard ICH11 (it actually also allowed me to later expand the
> array... but that's not part of this story).
Again maybe... It might be a controller or cable. It could actually be
the drive.
I am not worried about the hardware side. I can replace the disks,
cables, controllers and power supply without any problems.
As I said before, the issue I have is that I have a 9-disk RAIDZ-2 pack
with only 1 disk showing as offline, yet the pack is showing as faulted.
If the power supply were bouncing and a drive were giving bad data, I
would expect ZFS to report that 2 drives were faulted (1 offline and 1
corrupt).
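To spell out that expectation: RAIDZ-2 carries two disks' worth of parity, so a vdev should survive up to two unavailable disks. A minimal sketch of that rule follows. It is a simplified model only; the function name raidz_state is hypothetical, and real ZFS also takes corrupted data and metadata into account when deciding the pool state.

```shell
#!/bin/sh
# raidz_state PARITY UNAVAILABLE
# Simplified model: prints the state expected for a RAIDZ vdev with
# PARITY parity disks when UNAVAILABLE disks are missing or faulted.
raidz_state() {
    parity=$1
    unavailable=$2
    if [ "$unavailable" -eq 0 ]; then
        echo ONLINE
    elif [ "$unavailable" -le "$parity" ]; then
        echo DEGRADED    # still within the parity budget
    else
        echo FAULTED     # more disks lost than parity can cover
    fi
}

raidz_state 2 1   # RAIDZ-2 with one disk offline: prints DEGRADED
```

By this rule, one offline disk should leave the pack DEGRADED, and FAULTED would need at least three disks to be unusable.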
Is there a way with zdb to see why the pool is showing as faulted? Can
it tell me which drives it thinks are bad or have bad data?
Paul
More information about the freebsd-fs mailing list