ZFS - Unable to offline drive in raidz1 based pool
Aaron Hurt
aaron at goflexitllc.com
Tue Sep 22 01:26:16 UTC 2009
Kurt Touet wrote:
> Apparently you were right Aaron:
>
> monolith# zpool scrub storage
> monolith# zpool status storage
> pool: storage
> state: ONLINE
> scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  1.46M resilvered
>             ad6     ONLINE       0     0     0  2K resilvered
>             ad12    ONLINE       0     0     0  3K resilvered
>             ad4     ONLINE       0     0     0  3K resilvered
>
> errors: No known data errors
> monolith# zpool offline storage ad6
> monolith# zpool online storage ad6
> monolith# zpool status storage
> pool: storage
> state: ONLINE
> scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  67.5K resilvered
>             ad6     ONLINE       0     0     0  671K resilvered
>             ad12    ONLINE       0     0     0  67.5K resilvered
>             ad4     ONLINE       0     0     0  53K resilvered
>
> errors: No known data errors
>
>
> I wonder, then: with the storage array reporting itself as healthy, how
> did it know that one drive had desynced data, and why didn't that
> show up as an error state like DEGRADED?
>
> Cheers,
> -kurt
>
>
> On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet <ktouet at gmail.com> wrote:
>
>> I thought about that possibility as well... but I had scrubbed the
>> array within the last 10 days. I'll give it a shot again today and see
>> if that turns up any other errors (or lets me offline the drive
>> afterwards).
>>
>> Cheers,
>> -kurt
>>
>> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt <aaron at goflexitllc.com> wrote:
>>
>>> Kurt Touet wrote:
>>>
>>>> I am using a ZFS pool based on a 4-drive raidz1 setup for storage. I
>>>> believe that one of the drives is failing, and I'd like to
>>>> remove/replace it. The drive has been causing some issues (such as
>>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>>> like to offline it and then run in degraded mode until I can grab a
>>>> new drive (tomorrow). However, when I disconnected the drive (pulled
>>>> the plug, rather than using a zpool offline command), the following occurred:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     FAULTED      0     0     1
>>>>           raidz1    DEGRADED     0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     UNAVAIL      0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> Note: That's my recreation of the output... not the actual text.
>>>>
>>>> At this point, I was unable to do anything with the pool... and all
>>>> data was inaccessible. Fortunately, after the drive had sat pulled for a
>>>> bit, I tried putting it back into the array, and the system
>>>> booted properly. Of course, I still want to replace it, but this is
>>>> what happens when I try to take it offline:
>>>>
>>>> monolith# zpool status storage
>>>> pool: storage
>>>> state: ONLINE
>>>> scrub: none requested
>>>> config:
>>>>
>>>>         NAME        STATE     READ WRITE CKSUM
>>>>         storage     ONLINE       0     0     0
>>>>           raidz1    ONLINE       0     0     0
>>>>             ad14    ONLINE       0     0     0
>>>>             ad6     ONLINE       0     0     0
>>>>             ad12    ONLINE       0     0     0
>>>>             ad4     ONLINE       0     0     0
>>>>
>>>> errors: No known data errors
>>>> monolith# zpool offline storage ad6
>>>> cannot offline ad6: no valid replicas
>>>> monolith# uname -a
>>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>>> 15:32:08 CST 2009 k at monolith:/usr/obj/usr/src/sys/MONOLITH amd64
>>>>
>>>> If the array is online and healthy, why can't I simply offline a drive
>>>> and then replace it afterwards? Any thoughts? Also, how does a
>>>> degraded raidz1 array end up faulting the entire pool?
>>>>
>>>> Thanks,
>>>> -kurt
>>>> _______________________________________________
>>>> freebsd-fs at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>>>
>>>>
>>>>
>>>>
>>>>
>>> I'm not sure why it would be giving you that message. In a raidz1 you
>>> should be able to sustain one drive failure. The only thing that comes to
>>> mind this early in the morning is that somehow the data replication across
>>> your disks isn't totally in sync. I would suggest you run a scrub and then
>>> see if you can offline the drive afterwards.
>>>
>>> Aaron Hurt
>>> Managing Partner
>>> Flex I.T., LLC
>>> 611 Commerce Street
>>> Suite 3117
>>> Nashville, TN 37203
>>> Phone: 615.438.7101
>>> E-mail: aaron at goflexitllc.com
>>>
>>>
>>>
>
>
>
I had a buggy ATA controller that caused similar problems for me
once upon a time. I replaced the controller card and the drive cables and
never had any more issues with it. It's still one of those things I
just scratch my head over. I'm far from a ZFS code expert, so I couldn't
even begin to tell you the underlying reasons such things might be
related... just my two cents' worth of experience. Anyway, glad it's
working for you now.
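For anyone finding this thread later, the sequence that worked — scrub first so all replicas are in sync, then offline and replace — can be sketched roughly as below. This is just a hedged sketch: the pool name and device names (storage, ad6, ad8) are examples from this thread, and the wrapper only prints the commands so you can review the plan before running anything for real.

```shell
#!/bin/sh
# Sketch of the scrub -> offline -> replace sequence discussed above.
# Substitute your own pool/device names; ad8 is a hypothetical new disk.
POOL=storage
BAD=ad6
NEW=ad8

# Print each command instead of executing it, so the sequence can be
# reviewed first. Drop the echo to run the commands for real.
run() { echo "# $*"; }

run zpool scrub "$POOL"              # bring all replicas back in sync first
run zpool status "$POOL"             # confirm the scrub/resilver finished clean
run zpool offline "$POOL" "$BAD"     # should now succeed with valid replicas
run zpool replace "$POOL" "$BAD" "$NEW"
run zpool status "$POOL"             # watch the resilver onto the new disk
```

The key point from the thread is the ordering: the "no valid replicas" error went away only after the scrub resilvered the stale blocks on ad6's neighbors.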
--
Aaron Hurt
Managing Partner
Flex I.T., LLC
611 Commerce Street
Suite 3117
Nashville, TN 37203
Phone: 615.438.7101
E-mail: aaron at goflexitllc.com