ZFS - Unable to offline drive in raidz1 based pool

Aaron Hurt aaron at goflexitllc.com
Tue Sep 22 01:26:16 UTC 2009


Kurt Touet wrote:
> Apparently you were right, Aaron:
>
> monolith# zpool scrub storage
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h1m with 0 errors on Mon Sep 21 11:37:24 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  1.46M resilvered
>             ad6     ONLINE       0     0     0  2K resilvered
>             ad12    ONLINE       0     0     0  3K resilvered
>             ad4     ONLINE       0     0     0  3K resilvered
>
> errors: No known data errors
> monolith# zpool offline storage ad6
> monolith# zpool online storage ad6
> monolith# zpool status storage
>   pool: storage
>  state: ONLINE
>  scrub: resilver completed after 0h0m with 0 errors on Mon Sep 21 11:40:12 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage     ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad14    ONLINE       0     0     0  67.5K resilvered
>             ad6     ONLINE       0     0     0  671K resilvered
>             ad12    ONLINE       0     0     0  67.5K resilvered
>             ad4     ONLINE       0     0     0  53K resilvered
>
> errors: No known data errors
>
>
> I wonder then, with the storage array reporting itself as healthy, how
> did it know that one drive had desynced data, and why wouldn't that
> have shown up as an error or a DEGRADED state?
>
> Cheers,
> -kurt
>
>
> On Mon, Sep 21, 2009 at 11:21 AM, Kurt Touet <ktouet at gmail.com> wrote:
>   
>> I thought about that possibility as well, but I had scrubbed the
>> array within the last 10 days. I'll give it a shot again today and see
>> if that brings up any other errors (or allows me to offline the drive
>> afterwards).
>>
>> Cheers,
>> -kurt
>>
>> On Mon, Sep 21, 2009 at 4:39 AM, Aaron Hurt <aaron at goflexitllc.com> wrote:
>>     
>>> Kurt Touet wrote:
>>>       
>>>> I am using a ZFS pool based on a 4-drive raidz1 setup for storage.  I
>>>> believe that one of the drives is failing, and I'd like to
>>>> remove/replace it.  The drive has been causing some issues (such as
>>>> becoming non-responsive and hanging the system with timeouts), so I'd
>>>> like to offline it, and then run in degraded mode until I can grab a
>>>> new drive (tomorrow).  However, when I disconnected the drive (pulled
>>>> the plug, not using a zpool offline command), the following occurred:
>>>>
>>>>        NAME        STATE     READ WRITE CKSUM
>>>>        storage     FAULTED       0     0     1
>>>>          raidz1    DEGRADED     0     0     0
>>>>            ad14    ONLINE       0     0     0
>>>>            ad6     UNAVAIL      0     0     0
>>>>            ad12    ONLINE       0     0     0
>>>>            ad4     ONLINE       0     0     0
>>>>
>>>> Note: That's my recreation of the output... not the actual text.
>>>>
>>>> At this point, I was unable to do anything with the pool, and all
>>>> data was inaccessible.  Fortunately, after the drive had sat pulled
>>>> for a bit, I tried putting it back into the array, and the system
>>>> booted properly.  Of course, I still want to replace it, but this is
>>>> what happens when I try to take it offline:
>>>>
>>>> monolith# zpool status storage
>>>>  pool: storage
>>>>  state: ONLINE
>>>>  scrub: none requested
>>>> config:
>>>>
>>>>        NAME        STATE     READ WRITE CKSUM
>>>>        storage     ONLINE       0     0     0
>>>>          raidz1    ONLINE       0     0     0
>>>>            ad14    ONLINE       0     0     0
>>>>            ad6     ONLINE       0     0     0
>>>>            ad12    ONLINE       0     0     0
>>>>            ad4     ONLINE       0     0     0
>>>>
>>>> errors: No known data errors
>>>> monolith# zpool offline storage ad6
>>>> cannot offline ad6: no valid replicas
>>>> monolith# uname -a
>>>> FreeBSD monolith 8.0-RC1 FreeBSD 8.0-RC1 #2 r197370: Sun Sep 20
>>>> 15:32:08 CST 2009     k at monolith:/usr/obj/usr/src/sys/MONOLITH  amd64
>>>>
>>>> If the array is online and healthy, why can't I simply offline a drive
>>>> and then replace it afterwards?  Any thoughts?   Also, how does a
>>>> degraded raidz1 array end up faulting the entire pool?
>>>>
>>>> Thanks,
>>>> -kurt
>>>>
>>> I'm not sure why it would be giving you that message.  In a raidz1 you
>>> should be able to sustain one drive failure.  The only thing that comes
>>> to mind this early in the morning is that the redundant data across
>>> your disks somehow isn't fully in sync.  I would suggest you try a
>>> scrub and then see if you can offline the drive afterwards.
>>>
>>> Aaron Hurt
>>> Managing Partner
>>> Flex I.T., LLC
>>> 611 Commerce Street
>>> Suite 3117
>>> Nashville, TN  37203
>>> Phone: 615.438.7101
>>> E-mail: aaron at goflexitllc.com
>>>
>>>
I had a buggy ATA controller that was causing similar problems for me
once upon a time.  I replaced the controller card and the drive cables
and never had any more issues with it.  That's still one of those things
I just scratch my head over.  I'm far from a ZFS code expert, so I
couldn't even begin to tell you the underlying reasons such things might
be related... just my two cents worth of experience.  Anyway, glad it's
working for you now.
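
For what it's worth, now that the scrub has pulled everything back in
sync, the swap itself should just be the usual offline/replace dance.
A rough sketch, assuming the replacement disk shows up as ad8 (a
placeholder -- use whatever device name the new drive actually gets):

# scrub first and wait for it to finish cleanly
zpool scrub storage
zpool status storage

# take the suspect disk out of service
zpool offline storage ad6

# physically swap the drive, then resilver onto the new one
zpool replace storage ad6 ad8

# watch the resilver and confirm the pool returns to ONLINE
zpool status storage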

-- 

Aaron Hurt
Managing Partner
Flex I.T., LLC
611 Commerce Street
Suite 3117
Nashville, TN  37203
Phone: 615.438.7101
E-mail: aaron at goflexitllc.com


